TabularTextDatastore Properties

Access and modify TabularTextDatastore properties

TabularTextDatastore properties describe the files associated with a TabularTextDatastore object. Specifically, the properties describe the format of the data in the files and control how the data should be read from the datastore. By changing property values, you can modify certain aspects of the datastore. Use dot notation to view or modify a particular property of a TabularTextDatastore object:

ds = datastore('airlinesmall.csv');
ds.TreatAsMissing = 'NA';
ds.MissingValue = 0;

You can also specify the values of TabularTextDatastore properties using name-value pair arguments when you create a datastore using the datastore function:

ds = datastore('airlinesmall.csv','TreatAsMissing','NA',...
    'MissingValue',0)

The first file specified by the Files property determines the variable names and format information. The datastore function reevaluates this information when you change any of the following properties of a TabularTextDatastore:

`Files`	`Delimiter`	`CommentStyle`
`FileEncoding`	`RowDelimiter`	`Whitespace`
`ReadVariableNames`	`TreatAsMissing`	`MultipleDelimitersAsOne`
`NumHeaderLines`

File Properties

`Files` — Files included in datastore
cell array of character vectors

Files included in the datastore, resolved as a cell array of character vectors, where each character vector is a full path to a file. The location argument in the tabularTextDatastore and datastore functions defines these files.

The first file specified by the Files property determines the variable names and format information for all files in the datastore.

Example: {'C:\dir\data\mydata1.csv';'C:\dir\data\mydata2.csv'}

`FileEncoding` — File encoding
`'UTF-8'` (default) | character vector

File encoding, specified as one of the following:

`'IBM866'`	`'ISO-8859-1'`	`'windows-874'`
`'KOI8-R'`	`'ISO-8859-2'`	`'windows-1250'`
`'KOI8-U'`	`'ISO-8859-3'`	`'windows-1251'`
`'Macintosh'`	`'ISO-8859-4'`	`'windows-1252'`
`'US-ASCII'`	`'ISO-8859-5'`	`'windows-1253'`
`'UTF-8'`	`'ISO-8859-6'`	`'windows-1254'`
	`'ISO-8859-7'`	`'windows-1255'`
	`'ISO-8859-8'`	`'windows-1256'`
	`'ISO-8859-9'`	`'windows-1257'`
	`'ISO-8859-11'`	`'windows-1258'`
	`'ISO-8859-13'`
	`'ISO-8859-15'`

If each file in the datastore fits into memory, then FileEncoding also can be one of the following:

`'Big5'`	`'EUC-KR'`	`'GB18030'`	`'Shift_JIS'`
`'Big5-HKSCS'`	`'EUC-JP'`	`'GB2312'`	`'windows-949'`
`'CP949'`	`'EUC-TW'`	`'GBK'`

`ReadVariableNames` — Indicator for reading first row of first file as variable names
logical `true` or `false` | `1` | `0`

Indicator for reading first row of the first file in the datastore as variable names, specified as either true (1) or false (0).

If unspecified, the tabularTextDatastore function detects the presence of variable names automatically.
If true, then the first nonheader row of the first file determines the variable names for the data.
If false, then the first nonheader row of the first file contains the first row of data. The data is assigned default variable names, Var1, Var2, and so on.

Data Types: logical

`VariableNames` — Names of variables
cell array of character vectors

Names of variables in the datastore, specified as a cell array of character vectors. Specify the variable names in the order in which they appear in the files. If you do not specify the variable names, they are detected from the first nonheader line in the first file of the datastore. When modifying the VariableNames property, the number of new variable names must match the number of original variable names.

If ReadVariableNames is false, then VariableNames defaults to {'Var1','Var2', ...}.

Example: {'Time','Name','Quantity'}
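The interaction between ReadVariableNames and VariableNames can be sketched as follows. This is a minimal illustration, assuming a small headerless file written to a temporary location; the file contents and the replacement names Time, Count, and Total are made up for the example.

```matlab
% Write a hypothetical headerless CSV file to a temporary location.
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'1,2,3\n4,5,6\n');
fclose(fid);

% Treat the first row as data, so default variable names are assigned.
ds = tabularTextDatastore(fname,'ReadVariableNames',false);
defaultNames = ds.VariableNames;

% Rename the variables. The number of new names must match the
% number of original names (three in this file).
ds.VariableNames = {'Time','Count','Total'};
T = readall(ds);
```

Here defaultNames holds the default Var1, Var2, Var3 names, and T contains both rows of the file as data.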

Text Format Properties

`NumHeaderLines` — Number of lines to skip at beginning of file
non-negative integer

Number of lines to skip at the beginning of the file, specified as a non-negative integer. If unspecified, the tabularTextDatastore function detects the number of lines to skip automatically.

The tabularTextDatastore function ignores the specified number of header lines before reading the variable names or data.

Data Types: double
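As a rough sketch of how NumHeaderLines is used, the example below writes a file with two lines of preamble before the variable names. The file contents are hypothetical, and the delimiter and ReadVariableNames values are set explicitly here only to keep the example independent of automatic detection.

```matlab
% Write a hypothetical file with two preamble lines before the header row.
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'Exported by a hypothetical logger\n');
fprintf(fid,'Units: seconds and volts\n');
fprintf(fid,'t,v\n0,1.5\n1,2.5\n');
fclose(fid);

% Skip the two preamble lines, then read variable names and data.
ds = tabularTextDatastore(fname,'NumHeaderLines',2, ...
    'Delimiter',',','ReadVariableNames',true);
T = readall(ds);
```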

`Delimiter` — Field delimiter characters
character vector | cell array of character vectors

Field delimiter characters, specified as a character vector or a cell array of character vectors. Specify multiple delimiters in a cell array of character vectors. If unspecified, the tabularTextDatastore function detects the delimiter automatically.

Example: '|'

Example: {';','*'}

Unless MultipleDelimitersAsOne is true, repeated delimiter characters in a file are interpreted as separate delimiters with empty fields between them.

When you specify one of the following escape sequences as a delimiter, it is converted to the corresponding control character:

`\b`	Backspace
`\n`	Newline
`\r`	Carriage return
`\t`	Tab
`\\`	Backslash (`\`)
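The following sketch shows one way to set the Delimiter property, once with a literal character and once with an escape sequence. Both files and their contents are hypothetical and exist only for this illustration.

```matlab
% Hypothetical pipe-delimited file.
pipeFile = [tempname '.txt'];
fid = fopen(pipeFile,'w');
fprintf(fid,'id|score\n1|90\n2|85\n');
fclose(fid);

% Hypothetical tab-delimited file with the same data.
tabFile = [tempname '.txt'];
fid = fopen(tabFile,'w');
fprintf(fid,'id\tscore\n1\t90\n2\t85\n');
fclose(fid);

% A literal delimiter character.
dsPipe = tabularTextDatastore(pipeFile,'Delimiter','|');
Tpipe = readall(dsPipe);

% The escape sequence '\t' is converted to a tab character.
dsTab = tabularTextDatastore(tabFile,'Delimiter','\t');
Ttab = readall(dsTab);
```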

`RowDelimiter` — Row delimiter character
`\r\n` (default) | character vector

Row delimiter character, specified as a character vector that must be either a single character or one of '\r', '\n', or '\r\n'.

Example: ':'

`TreatAsMissing` — Numeric values to treat as missing values
`''` (default) | character vector | cell array of character vectors

Numeric values to treat as missing values, specified as a single character vector or cell array of character vectors. Values specified as TreatAsMissing are substituted with the value defined in the MissingValue property. For instance, if MissingValue is NaN and TreatAsMissing is 'NA', then all occurrences of 'NA' in the imported data are replaced by NaN.

Note: This option only applies to numeric fields. Also, this property is equivalent to the TreatAsEmpty name-value pair argument for the textscan function.

Example: 'NA'

Example: '-99'

Example: {'-',''}

Data Types: char | cell

`MissingValue` — Value for missing numeric fields
`NaN` (default) | scalar

Value for missing numeric fields in delimited text files, specified as a scalar. This property is equivalent to the EmptyValue name-value pair argument for the textscan function.

Data Types: double
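To illustrate how TreatAsMissing and MissingValue work together, here is a minimal sketch using a small hypothetical file. The column names mimic the airline example above but the data is invented, and TextscanFormats is specified explicitly so that the example does not rely on format detection.

```matlab
% Write a hypothetical CSV file where 'NA' marks a missing delay.
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'Year,DepDelay,Carrier\n');
fprintf(fid,'1987,12,PS\n');
fprintf(fid,'1988,NA,AA\n');
fclose(fid);

% With the default MissingValue, 'NA' in a numeric field becomes NaN.
ds = tabularTextDatastore(fname,'TreatAsMissing','NA', ...
    'TextscanFormats',{'%f','%f','%q'});
Tnan = readall(ds);

% Substitute 0 for missing numeric fields instead.
ds.MissingValue = 0;
Tzero = readall(ds);
```

Only the numeric DepDelay field is affected; the text field Carrier is read unchanged.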

Advanced Text Format Properties

`TextscanFormats` — Format of the data fields
cell array of character vectors

Format of the data fields, specified as a cell array of character vectors, where each character vector contains one conversion specifier.

When you specify or modify the TextscanFormats property, you can use the same conversion specifiers that the textscan function accepts for the formatSpec argument. This includes specifiers that skip fields using an asterisk (*) character and specifiers that skip literal text. The number of conversion specifiers must match the number of variables in the VariableNames property.

If the value of TextscanFormats includes conversion specifiers that skip fields using asterisk characters (*), then the value of the SelectedVariableNames property automatically updates. MATLAB® uses the %*q conversion specifier to skip fields omitted by the SelectedVariableNames property and treats the field contents as literal character vectors. For fixed width files, indicate a skipped field using the appropriate conversion specifier along with the field width. For example, %*52c skips a field that contains 52 characters.

If you do not specify a value for TextscanFormats, then datastore determines the format of the data fields by scanning text from the first nonheader line in the first file of the datastore.

Example: {'%s','%s','%f'}

`ExponentCharacters` — Exponent characters
`'eEdD'` (default) | character vector

Exponent characters, specified as a character vector. The default exponent characters are e, E, d, and D.

`CommentStyle` — Style of comments
`''` (default) | character vector | cell array of character vectors

Style of comments in the file, specified as a character vector or cell array of character vectors.

For example, specify '%' to ignore characters that follow a percent sign on the same line. Specify {'/*','*/'} to ignore characters between the opening and closing markers.

When reading from a TabularTextDatastore, the read function checks for comments only at the start of each field, not within a field.

Example: 'CommentStyle',{'/*', '*/'}

Data Types: char | cell

`Whitespace` — White-space characters
`' \b\t'` (default) | character vector

White-space characters, specified as a character vector of one or more characters.

When you specify one of the following escape sequences as any white-space character, datastore converts that sequence to the corresponding control character:

`\b`	Backspace
`\n`	Newline
`\r`	Carriage return
`\t`	Tab
`\\`	Backslash (`\`)

Example: ' \b\t'

Data Types: char

`MultipleDelimitersAsOne` — Multiple delimiter handling
`0 (false)` (default) | `1 (true)`

Multiple delimiter handling, specified as either true or false. If true, then datastore treats consecutive delimiters as a single delimiter. Repeated delimiters separated by white-space are also treated as a single delimiter.

Properties for `preview`, `read`, `readall` Table

`SelectedVariableNames` — Variables to read
cell array of character vectors

Variables to read from the file, specified as a cell array of character vectors, where each character vector contains the name of one variable. You can specify the variable names in any order.

Example: {'Var3','Var7','Var4'}

`SelectedFormats` — Formats of selected variables
cell array of character vectors

Formats of the selected variables to read, specified as a cell array of character vectors, where each character vector contains one conversion specifier. The variables to read are indicated by the SelectedVariableNames property. The number of character vectors in SelectedFormats must match the number of variables to read.

You can use the same conversion specifiers that the textscan function accepts, including specifiers that skip literal text. However, you cannot use a conversion specifier that skips a field. That is, the conversion specifier cannot include an asterisk character (*).

Example: {'%d','%d'}
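A short sketch of how SelectedVariableNames and SelectedFormats might be combined is shown below. The file and its contents are hypothetical; the variable names are borrowed from the VariableNames example earlier on this page.

```matlab
% Write a hypothetical three-column CSV file.
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'Time,Name,Quantity\n');
fprintf(fid,'0.5,bolt,10\n1.5,nut,20\n');
fclose(fid);

ds = tabularTextDatastore(fname);

% Read only two of the three variables.
ds.SelectedVariableNames = {'Name','Quantity'};

% One conversion specifier per selected variable; '%d' reads
% Quantity as a 32-bit signed integer.
ds.SelectedFormats = {'%q','%d'};
T = readall(ds);
```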

`ReadSize` — Amount of data to read
20000 (default) | positive scalar | `'file'`

Amount of data to read in a call to the read function, specified as a positive scalar or 'file'.

If ReadSize is a positive integer, then each call to read reads at most ReadSize rows.
If ReadSize is 'file', then each call to read reads all of the data in one file.

When you change ReadSize from a numeric scalar to 'file' or vice versa, MATLAB resets the datastore to the state where no data has been read from it.
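The two ReadSize modes can be sketched as follows, assuming a hypothetical ten-row file. The loop uses the hasdata and read functions to process the data in chunks of at most four rows.

```matlab
% Write a hypothetical two-column CSV file with ten data rows.
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'x,y\n');
fprintf(fid,'%d,%d\n',[1:10; 11:20]);
fclose(fid);

% Read at most four rows per call and accumulate a running sum.
ds = tabularTextDatastore(fname,'ReadSize',4);
total = 0;
while hasdata(ds)
    T = read(ds);
    total = total + sum(T.x);
end

% Switching to 'file' resets the datastore, so the next call to
% read returns all of the data in the first file.
ds.ReadSize = 'file';
Tfile = read(ds);
```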

`TextType` — Output data type of text variables
`'char'` (default) | `'string'`

Output data type of text variables, specified as 'char' or 'string'. TextType specifies the data type of text variables formatted with %s, %q, or [...]. If TextType is 'char', then the output is a cell array of character vectors. If TextType is 'string', then the output has type string.
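As a small illustration of the TextType property, the sketch below reads the same hypothetical file twice, once with each setting, and leaves the text format to automatic detection.

```matlab
% Write a hypothetical CSV file with one text column.
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'Name,Quantity\nbolt,10\nnut,20\n');
fclose(fid);

% Text variables are returned as a cell array of character vectors.
dsChar = tabularTextDatastore(fname,'TextType','char');
Tchar = readall(dsChar);

% Text variables are returned as a string array.
dsStr = tabularTextDatastore(fname,'TextType','string');
Tstr = readall(dsStr);
```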
