TabularTextDatastore Properties

Access and modify TabularTextDatastore properties

TabularTextDatastore properties describe the files associated with a TabularTextDatastore object. Specifically, the properties describe the format of the data in the files and control how the data should be read from the datastore. By changing property values, you can modify certain aspects of the datastore. Use dot notation to view or modify a particular property of a TabularTextDatastore object:

ds = datastore('airlinesmall.csv');
ds.TreatAsMissing = 'NA';
ds.MissingValue = 0;

You also can specify the value of TabularTextDatastore properties using name-value pair arguments when you create a datastore using the datastore function:

ds = datastore('airlinesmall.csv','TreatAsMissing','NA',...
    'MissingValue',0)

The first file specified by the Files property determines the variable names and format information. The datastore function reevaluates this information when you change any of the following properties of a TabularTextDatastore:

Files              Delimiter       CommentStyle
FileEncoding       RowDelimiter    Whitespace
ReadVariableNames  TreatAsMissing  MultipleDelimitersAsOne
NumHeaderLines

File Properties

Files

Files included in the datastore, resolved as a cell array of character vectors, where each character vector is a full path to a file. The location argument in the tabularTextDatastore and datastore functions defines these files.

The first file specified by the Files property determines the variable names and format information for all files in the datastore.

Example: {'C:\dir\data\mydata1.csv';'C:\dir\data\mydata2.csv'}

FileEncoding

File encoding, specified as one of the following:

'IBM866'     'ISO-8859-1'   'windows-874'
'KOI8-R'     'ISO-8859-2'   'windows-1250'
'KOI8-U'     'ISO-8859-3'   'windows-1251'
'Macintosh'  'ISO-8859-4'   'windows-1252'
'US-ASCII'   'ISO-8859-5'   'windows-1253'
'UTF-8'      'ISO-8859-6'   'windows-1254'
             'ISO-8859-7'   'windows-1255'
             'ISO-8859-8'   'windows-1256'
             'ISO-8859-9'   'windows-1257'
             'ISO-8859-11'  'windows-1258'
             'ISO-8859-13'
             'ISO-8859-15'

If each file in the datastore fits into memory, then FileEncoding also can be one of the following:

'Big5'        'EUC-KR'  'GB18030'  'Shift_JIS'
'Big5-HKSCS'  'EUC-JP'  'GB2312'   'windows-949'
'CP949'       'EUC-TW'  'GBK'

ReadVariableNames

Indicator for reading the first row of the first file in the datastore as variable names, specified as either true (1) or false (0).

  • If unspecified, the tabularTextDatastore function detects the presence of variable names automatically.

  • If true, then the first nonheader row of the first file determines the variable names for the data.

  • If false, then the first nonheader row of the first file contains the first row of data. The data is assigned default variable names, Var1, Var2, and so on.

Data Types: logical

VariableNames

Names of variables in the datastore, specified as a cell array of character vectors. Specify the variable names in the order in which they appear in the files. If you do not specify the variable names, they are detected from the first nonheader line in the first file of the datastore. When you modify the VariableNames property, the number of new variable names must match the number of original variable names.

If ReadVariableNames is false, then VariableNames defaults to {'Var1','Var2', ...}.

Example: {'Time','Name','Quantity'}
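The interaction between ReadVariableNames and VariableNames can be sketched as follows. This is a minimal illustration, not an official example: the temporary file and its contents are made up for the purpose, and the new names Id, Name, and Quantity are arbitrary.

```matlab
% Write a small CSV file without a header row (hypothetical sample data).
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'1,alpha,10\n2,beta,20\n');
fclose(fid);

% Treat the first row as data rather than as variable names.
ds = tabularTextDatastore(fname,'ReadVariableNames',false);
disp(ds.VariableNames)      % default names: Var1, Var2, Var3

% Rename the variables; the number of names must stay the same (three).
ds.VariableNames = {'Id','Name','Quantity'};
T = readall(ds);

delete(fname);              % clean up the temporary file
```

Both rows of the file end up as data rows in T, and the table columns carry the new names.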

Text Format Properties

NumHeaderLines

Number of lines to skip at the beginning of the file, specified as a nonnegative integer. If unspecified, the tabularTextDatastore function detects the number of lines to skip automatically.

The tabularTextDatastore function ignores the specified number of header lines before reading the variable names or data.

Data Types: double

Delimiter

Field delimiter characters, specified as a character vector or a cell array of character vectors. Specify multiple delimiters in a cell array of character vectors. If unspecified, the tabularTextDatastore function detects the delimiter automatically.

Example: '|'

Example: {';','*'}

Repeated delimiter characters in a file are interpreted as separate delimiters with empty fields between them.

When you specify one of the following escape sequences as a delimiter, it is converted to the corresponding control character:

\b    Backspace
\n    Newline
\r    Carriage return
\t    Tab
\\    Backslash (\)
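As a rough sketch of specifying several delimiters at once, the following reads a file whose fields are separated by either a vertical bar or a semicolon. The file contents and the variable names A, B, and C are invented for illustration.

```matlab
% Write a small file that mixes two delimiters (hypothetical sample data).
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'A|B;C\n1|2;3\n4|5;6\n');
fclose(fid);

% Pass both delimiters in a cell array of character vectors.
ds = tabularTextDatastore(fname,'Delimiter',{'|',';'});
T = readall(ds);

delete(fname);              % clean up the temporary file
```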

RowDelimiter

Row delimiter character, specified as a character vector that must be either a single character or one of '\r', '\n', or '\r\n'.

Example: ':'

TreatAsMissing

Text to treat as missing values, specified as a single character vector or a cell array of character vectors. Values specified in TreatAsMissing are replaced with the value defined in the MissingValue property. For instance, if MissingValue is NaN and TreatAsMissing is 'NA', then all occurrences of 'NA' in the imported data are replaced by NaN.

Note: This option only applies to numeric fields. Also, this property is equivalent to the TreatAsEmpty name-value pair argument for the textscan function.

Example: 'NA'

Example: '-99'

Example: {'-',''}

Data Types: char | cell

MissingValue

Value for missing numeric fields in delimited text files, specified as a scalar. This property is equivalent to the EmptyValue name-value pair argument for the textscan function.

Data Types: double
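The way TreatAsMissing and MissingValue work together can be sketched with a small file. The file contents here are invented, and the formats are set explicitly to '%f' so that the example does not depend on automatic format detection.

```matlab
% Write a small CSV file in which 'NA' marks a missing numeric value
% (hypothetical sample data).
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'Id,Delay\n1,5\n2,NA\n3,7\n');
fclose(fid);

% Read both columns as numeric, and replace every 'NA' with 0.
ds = tabularTextDatastore(fname, ...
    'TextscanFormats',{'%f','%f'}, ...
    'TreatAsMissing','NA', ...
    'MissingValue',0);
T = readall(ds);
disp(T.Delay)

delete(fname);              % clean up the temporary file
```

Leaving MissingValue unset would instead fill the second row of Delay with the property's default value.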

Advanced Text Format Properties

TextscanFormats

Format of the data fields, specified as a cell array of character vectors, where each character vector contains one conversion specifier.

When you specify or modify the TextscanFormats property, you can use the same conversion specifiers that the textscan function accepts for the formatSpec argument. This includes specifiers that skip fields using an asterisk (*) character and specifiers that skip literal text. The number of conversion specifiers must match the number of variables in the VariableNames property.

If the value of TextscanFormats includes conversion specifiers that skip fields using asterisk characters (*), then the value of the SelectedVariableNames property automatically updates. MATLAB® uses the %*q conversion specifier to skip fields omitted by the SelectedVariableNames property and treats the field contents as literal character vectors. For fixed width files, indicate a skipped field using the appropriate conversion specifier along with the field width. For example, %*52c skips a field that contains 52 characters.

If you do not specify a value for TextscanFormats, then datastore determines the format of the data fields by scanning text from the first nonheader line in the first file of the datastore.

Example: {'%s','%s','%f'}
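The effect of a skipping specifier on SelectedVariableNames, as described above, can be sketched like this. The file contents are invented for illustration; the variable names match the earlier VariableNames example.

```matlab
% Write a small CSV file with three variables (hypothetical sample data).
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'Time,Name,Quantity\n1,alpha,10\n2,beta,20\n');
fclose(fid);

ds = tabularTextDatastore(fname);

% One specifier per variable; %*q skips the second field (Name).
ds.TextscanFormats = {'%f','%*q','%f'};

% SelectedVariableNames updates to omit the skipped variable.
disp(ds.SelectedVariableNames)
T = readall(ds);

delete(fname);              % clean up the temporary file
```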

ExponentCharacters

Exponent characters, specified as a character vector. The default exponent characters are e, E, d, and D.

CommentStyle

Style of comments in the file, specified as a character vector or cell array of character vectors.

For example, specify '%' to ignore everything that follows a percent sign on the same line. Specify {'/*','*/'} to ignore everything between those two sequences.

When reading from a TabularTextDatastore, the read function checks for comments only at the start of each field, not within a field.

Example: 'CommentStyle',{'/*', '*/'}

Data Types: char | cell

Whitespace

White-space characters, specified as a character vector of one or more characters.

When you specify one of the following escape sequences as any white-space character, datastore converts that sequence to the corresponding control character:

\b    Backspace
\n    Newline
\r    Carriage return
\t    Tab
\\    Backslash (\)

Example: ' \b\t'

Data Types: char

MultipleDelimitersAsOne

Multiple delimiter handling, specified as either true or false. If true, then datastore treats consecutive delimiters as a single delimiter. Repeated delimiters separated by white space are also treated as a single delimiter.

Properties for preview, read, readall Table

SelectedVariableNames

Variables to read from the file, specified as a cell array of character vectors, where each character vector contains the name of one variable. You can specify the variable names in any order.

Example: {'Var3','Var7','Var4'}

SelectedFormats

Formats of the selected variables to read, specified as a cell array of character vectors, where each character vector contains one conversion specifier. The variables to read are indicated by the SelectedVariableNames property. The number of character vectors in SelectedFormats must match the number of variables to read.

You can use the same conversion specifiers that the textscan function accepts, including specifiers that skip literal text. However, you cannot use a conversion specifier that skips a field. That is, the conversion specifier cannot include an asterisk character (*).

Example: {'%d','%d'}
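A minimal sketch of selecting a subset of variables and changing their formats follows. The file contents are invented for illustration; '%d' is the textscan specifier for 32-bit signed integers.

```matlab
% Write a small CSV file with three variables (hypothetical sample data).
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'Time,Name,Quantity\n1,alpha,10\n2,beta,20\n');
fclose(fid);

ds = tabularTextDatastore(fname);

% Read only two of the three variables.
ds.SelectedVariableNames = {'Quantity','Time'};

% One non-skipping specifier per selected variable.
ds.SelectedFormats = {'%d','%d'};
T = readall(ds);

delete(fname);              % clean up the temporary file
```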

ReadSize

Amount of data to read in a call to the read function, specified as a positive integer scalar or 'file'.

  • If ReadSize is a positive integer, then each call to read reads at most ReadSize rows.

  • If ReadSize is 'file', then each call to read reads all of the data in one file.

When you change ReadSize from a numeric scalar to 'file' or vice versa, MATLAB resets the datastore to the state where no data has been read from it.
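The two ReadSize modes can be sketched with a five-row file. The file contents and the counter variables nChunks and nRows are made up for illustration.

```matlab
% Write a CSV file with a header and five data rows (hypothetical data).
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'Id,Value\n');
fprintf(fid,'%d,%d\n',[1:5; 10:10:50]);
fclose(fid);

% Read the file in chunks of at most two rows per call to read.
ds = tabularTextDatastore(fname,'ReadSize',2);
nChunks = 0;
nRows = 0;
while hasdata(ds)
    T = read(ds);
    nChunks = nChunks + 1;
    nRows = nRows + height(T);
end

% Switching from a numeric ReadSize to 'file' resets the datastore,
% so the next call to read returns the whole file again.
ds.ReadSize = 'file';
T = read(ds);

delete(fname);              % clean up the temporary file
```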

TextType

Output data type of text variables, specified as 'char' or 'string'. TextType specifies the data type of text variables formatted with %s, %q, or [...]. If TextType is 'char', then the output is a cell array of character vectors. If TextType is 'string', then the output has type string.
