Access and modify TabularTextDatastore properties
TabularTextDatastore properties describe the files associated with a TabularTextDatastore object. Specifically, the properties describe the format of the data in the files and control how the data should be read from the datastore. By changing property values, you can modify certain aspects of the datastore. Use dot notation to view or modify a particular property of a TabularTextDatastore object:
ds = datastore('airlinesmall.csv');
ds.TreatAsMissing = 'NA';
ds.MissingValue = 0;
You can also specify the value of TabularTextDatastore properties using name-value pair arguments when you create a datastore using the datastore function:
ds = datastore('airlinesmall.csv','TreatAsMissing','NA',...
    'MissingValue',0)
The first file specified by the Files property determines the variable names and format information. The datastore function reevaluates this information when you change any of the following properties of a TabularTextDatastore:
Files | Delimiter | CommentStyle |
FileEncoding | RowDelimiter | Whitespace |
ReadVariableNames | TreatAsMissing | MultipleDelimitersAsOne |
NumHeaderLines |
Files — Files included in datastore
Files included in the datastore, resolved as a cell array of character vectors, where each character vector is a full path to a file. The location argument in the tabularTextDatastore and datastore functions defines these files.
The first file specified by the Files property determines the variable names and format information for all files in the datastore.
Example: {'C:\dir\data\mydata1.csv';'C:\dir\data\mydata2.csv'}
FileEncoding — File encoding
'UTF-8' (default) | character vector
File encoding, specified as a character vector that names an encoding scheme.
If each file in the datastore fits into memory, then FileEncoding also accepts additional encoding scheme names.
ReadVariableNames — Indicator for reading first row of first file as variable names
true or false | 1 | 0
Indicator for reading the first row of the first file in the datastore as variable names, specified as either true (1) or false (0). If unspecified, the tabularTextDatastore function detects the presence of variable names automatically.
If true, then the first nonheader row of the first file determines the variable names for the data.
If false, then the first nonheader row of the first file contains the first row of data. The data is assigned default variable names, Var1, Var2, and so on.
Data Types: logical
VariableNames — Names of variables
Names of variables in the datastore, specified as a cell array of character vectors. Specify the variable names in the order in which they appear in the files. If you do not specify the variable names, they are detected from the first nonheader line in the first file of the datastore. When modifying the VariableNames property, the number of new variable names must match the number of original variable names.
If ReadVariableNames is false, then VariableNames defaults to {'Var1','Var2', ...}.
Example: {'Time','Name','Quantity'}
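The interaction between ReadVariableNames and VariableNames can be sketched as follows. This is a minimal illustration, assuming a small headerless file written to a temporary location. The file contents and the names Left and Right are made up for the example.

```matlab
% Write a two-column file with no header row (hypothetical sample data).
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'1,2\n3,4\n');
fclose(fid);

% Read the first row as data, so the variables get default names.
ds = tabularTextDatastore(fname,'ReadVariableNames',false);
defaultNames = ds.VariableNames;

% Rename the variables; the number of names must stay the same.
ds.VariableNames = {'Left','Right'};
T = readall(ds);

% Remove the temporary file.
delete(fname);
```

Here defaultNames holds the default names described above, and T uses the replacement names.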
NumHeaderLines — Number of lines to skip at beginning of file
Number of lines to skip at the beginning of the file, specified as a non-negative integer. If unspecified, the tabularTextDatastore function detects the number of lines to skip automatically.
The tabularTextDatastore function ignores the specified number of header lines before reading the variable names or data.
Data Types: double
Delimiter — Field delimiter characters
Field delimiter characters, specified as a character vector or a cell array of character vectors. Specify multiple delimiters in a cell array of character vectors. If unspecified, the tabularTextDatastore function detects the delimiter automatically.
Example: '|'
Example: {';','*'}
By default, repeated delimiter characters in a file are interpreted as separate delimiters with empty fields between them. To treat consecutive delimiters as a single delimiter, set MultipleDelimitersAsOne to true.
When you specify one of the following escape sequences as a delimiter, it is converted to the corresponding control character:
\b | Backspace |
\n | Newline |
\r | Carriage return |
\t | Tab |
\\ | Backslash (\) |
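As a rough sketch of specifying a delimiter explicitly, the following reads a pipe-delimited file. The temporary file and its contents are assumptions made for the example, not part of any shipped data set.

```matlab
% Write a small pipe-delimited file (hypothetical sample data).
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'X|Y\n1|2\n3|4\n');
fclose(fid);

% Tell the datastore which character separates the fields.
ds = tabularTextDatastore(fname,'Delimiter','|');
T = readall(ds);

% Remove the temporary file.
delete(fname);
```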
RowDelimiter — Row delimiter character
'\r\n' (default) | character vector
Row delimiter character, specified as a character vector that must be either a single character or one of '\r', '\n', or '\r\n'.
Example: ':'
TreatAsMissing — Numeric values to treat as missing values
'' (default) | character vector | cell array of character vectors
Numeric values to treat as missing values, specified as a single character vector or cell array of character vectors. Values specified as TreatAsMissing are substituted with the value defined in the MissingValue property. For instance, if MissingValue is NaN and TreatAsMissing is 'NA', then all occurrences of 'NA' in the imported data are replaced by NaN.
Note: This option only applies to numeric fields. Also, this property is equivalent to the TreatAsEmpty name-value pair argument for the textscan function.
Example: 'NA'
Example: '-99'
Example: {'-',''}
Data Types: char | cell
MissingValue — Value for missing numeric fields
NaN (default) | scalar
Value for missing numeric fields in delimited text files, specified as a scalar. This property is equivalent to the EmptyValue name-value pair argument for the textscan function.
Data Types: double
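The way TreatAsMissing and MissingValue work together can be sketched as follows. This is an illustrative example only. The temporary file and its contents are made up, and the formats are set explicitly so that the result does not depend on automatic format detection.

```matlab
% Write a small CSV file in which one numeric field contains NA
% (hypothetical sample data).
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'Id,Delay\n1,5\n2,NA\n3,12\n');
fclose(fid);

% Treat NA in numeric fields as missing, and substitute 0 for it.
ds = tabularTextDatastore(fname, ...
    'ReadVariableNames',true, ...
    'TextscanFormats',{'%f','%f'}, ...
    'TreatAsMissing','NA', ...
    'MissingValue',0);
T = readall(ds);

% Remove the temporary file.
delete(fname);
```

With these settings, the NA entry in the Delay variable is read as the MissingValue setting rather than as text.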
TextscanFormats — Format of the data fields
Format of the data fields, specified as a cell array of character vectors, where each character vector contains one conversion specifier.
When you specify or modify the TextscanFormats property, you can use the same conversion specifiers that the textscan function accepts for the formatSpec argument. This includes specifiers that skip fields using an asterisk (*) character and specifiers that skip literal text. The number of conversion specifiers must match the number of variables in the VariableNames property.
If the value of TextscanFormats includes conversion specifiers that skip fields using asterisk characters (*), then the value of the SelectedVariableNames property automatically updates. MATLAB® uses the %*q conversion specifier to skip fields omitted by the SelectedVariableNames property and treats the field contents as literal character vectors. For fixed-width files, indicate a skipped field using the appropriate conversion specifier along with the field width. For example, %*52c skips a field that contains 52 characters.
If you do not specify a value for TextscanFormats, then datastore determines the format of the data fields by scanning text from the first nonheader line in the first file of the datastore.
Example: {'%s','%s','%f'}
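The effect of a skipping specifier on SelectedVariableNames can be sketched as follows. This is a minimal example under assumed data. The temporary file and the variable names Name, Count, and Price are hypothetical.

```matlab
% Write a small three-column CSV file (hypothetical sample data).
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'Name,Count,Price\nwidget,3,1.5\ngadget,7,2.25\n');
fclose(fid);

ds = tabularTextDatastore(fname);

% Skip the second field by using an asterisk in its conversion specifier.
% One specifier is still required for every variable in VariableNames.
ds.TextscanFormats = {'%q','%*q','%f'};

% The skipped variable is no longer among the selected variables.
selected = ds.SelectedVariableNames;
T = readall(ds);

% Remove the temporary file.
delete(fname);
```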
ExponentCharacters — Exponent characters
'eEdD' (default) | character vector
Exponent characters, specified as a character vector. The default exponent characters are e, E, d, and D.
CommentStyle — Style of comments
'' (default) | character vector | cell array of character vectors
Style of comments in the file, specified as a character vector or cell array of character vectors.
For example, specify '%' to ignore the characters that follow a percent sign on the same line. Specify {'/*','*/'} to ignore the characters between those two sequences.
When reading from a TabularTextDatastore, the read function checks for comments only at the start of each field, not within a field.
Example: 'CommentStyle',{'/*', '*/'}
Data Types: char | cell
Whitespace — White-space characters
' \b\t' (default) | character vector
White-space characters, specified as a character vector of one or more characters.
When you specify one of the following escape sequences as any white-space character, datastore converts that sequence to the corresponding control character:
\b | Backspace |
\n | Newline |
\r | Carriage return |
\t | Tab |
\\ | Backslash (\) |
Example: ' \b\t'
Data Types: char
MultipleDelimitersAsOne — Multiple delimiter handling
0 (false) (default) | 1 (true)
Multiple delimiter handling, specified as either true or false. If true, then datastore treats consecutive delimiters as a single delimiter. Repeated delimiters separated by white space are also treated as a single delimiter.
preview, read, readall Table
SelectedVariableNames — Variables to read
Variables to read from the file, specified as a cell array of character vectors, where each character vector contains the name of one variable. You can specify the variable names in any order.
Example: {'Var3','Var7','Var4'}
SelectedFormats — Formats of selected variables
Formats of the selected variables to read, specified as a cell array of character vectors, where each character vector contains one conversion specifier. The variables to read are indicated by the SelectedVariableNames property. The number of character vectors in SelectedFormats must match the number of variables to read.
You can use the same conversion specifiers that the textscan function accepts, including specifiers that skip literal text. However, you cannot use a conversion specifier that skips a field. That is, the conversion specifier cannot include an asterisk character (*).
Example: {'%d','%d'}
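Selecting a subset of variables and then adjusting one of their formats might look like the following sketch. The temporary file and the variable names are assumptions for illustration. The strcmp indexing is one way to change a single format without relying on its position.

```matlab
% Write a small three-column CSV file (hypothetical sample data).
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'Name,Count,Price\nwidget,3,1.5\ngadget,7,2.25\n');
fclose(fid);

ds = tabularTextDatastore(fname);

% Read only two of the three variables.
ds.SelectedVariableNames = {'Name','Count'};

% Change the format of Count to a signed integer specifier.
% SelectedFormats has one entry per selected variable.
ds.SelectedFormats{strcmp(ds.SelectedVariableNames,'Count')} = '%d';
T = readall(ds);

% Remove the temporary file.
delete(fname);
```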
ReadSize — Amount of data to read
positive scalar | 'file'
Amount of data to read in a call to the read function, specified as a positive scalar or 'file'.
If ReadSize is a positive integer, then each call to read reads at most ReadSize rows.
If ReadSize is 'file', then each call to read reads all of the data in one file.
When you change ReadSize from a numeric scalar to 'file' or vice versa, MATLAB resets the datastore to the state where no data has been read from it.
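Reading a datastore in chunks with a numeric ReadSize can be sketched as follows. The temporary ten-row file and the chunk size of 4 are arbitrary choices made for the example.

```matlab
% Write a CSV file with a header row and ten data rows
% (hypothetical sample data).
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'A,B\n');
fprintf(fid,'%d,%d\n',[1:10; 11:20]);
fclose(fid);

% Each call to read returns at most four rows.
ds = tabularTextDatastore(fname,'ReadSize',4);
rowsPerRead = [];
while hasdata(ds)
    T = read(ds);
    rowsPerRead(end+1) = height(T); %#ok<SAGROW>
end

% Remove the temporary file.
delete(fname);
```

The loop ends when hasdata reports that no data remains, so the row counts in rowsPerRead add up to the number of data rows in the file.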
TextType — Output data type of text variables
'char' (default) | 'string'
Output data type of text variables, specified as 'char' or 'string'. TextType specifies the data type of text variables formatted with %s, %q, or [...]. If TextType is 'char', then the output is a cell array of character vectors. If TextType is 'string', then the output has type string.
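The difference that TextType makes can be sketched as follows, again with a made-up temporary file. The variable names Name and Count are hypothetical.

```matlab
% Write a small CSV file with one text column (hypothetical sample data).
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'Name,Count\nwidget,3\ngadget,7\n');
fclose(fid);

% Return text variables as string arrays instead of cell arrays
% of character vectors.
ds = tabularTextDatastore(fname,'TextType','string');
T = readall(ds);

% Remove the temporary file.
delete(fname);
```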