Read formatted data from text file or string
C = textscan(fileID,formatSpec) reads data from an open text file into a cell array, C. The text file is indicated by the file identifier, fileID. Use fopen to open the file and obtain the fileID value. When you finish reading from a file, close the file by calling fclose(fileID).
textscan attempts to match the data in the file to the conversion specifier in formatSpec. The textscan function reapplies formatSpec throughout the entire file and stops when it cannot match formatSpec to the data.
C = textscan(fileID,formatSpec,N) reads file data using the formatSpec N times, where N is a positive integer. To read additional data from the file after N cycles, call textscan again using the original fileID. If you resume a text scan of a file by calling textscan with the same file identifier (fileID), then textscan automatically resumes reading at the point where it terminated the last read.
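The resume behavior described above can be sketched as follows. This is a minimal sketch, not one of the original examples: the scratch file created with tempname, its contents, and the variable names firstTwo and rest are illustrative assumptions.

```matlab
% Write a scratch file with three rows of two integers (illustrative data).
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'%d %d\n',[1 2; 3 4; 5 6]');
fclose(fid);

% Apply formatSpec twice, then call textscan again with the same fileID.
fid = fopen(fname,'r');
firstTwo = textscan(fid,'%d %d',2);   % reads the first two rows
rest = textscan(fid,'%d %d');         % resumes where the first call stopped
fclose(fid);
delete(fname);
```

Because the file position is kept between calls, the second call returns only the third row.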
C = textscan(chr,formatSpec) reads the text from character vector chr into cell array C. When reading text from a character vector, repeated calls to textscan restart the scan from the beginning each time. To restart a scan from the last position, request a position output.
textscan attempts to match the data in character vector chr to the format specified in formatSpec.
C = textscan(chr,formatSpec,N) uses the formatSpec N times, where N is a positive integer.
C = textscan(___,Name,Value) specifies options using one or more Name,Value pair arguments, in addition to any of the input arguments in the previous syntaxes.
[C,position] = textscan(___) returns the position in the file or the character vector at the end of the scan as the second output argument. For a file, this is the value that ftell(fileID) would return after calling textscan. For a character vector, position indicates how many characters textscan read.
Read a character vector containing floating-point numbers.
chr = '0.41 8.24 3.57 6.24 9.27';
C = textscan(chr,'%f');
The specifier '%f' in formatSpec tells textscan to match each field in chr to a double-precision floating-point number.
Display the contents of cell array C.
celldisp(C)
C{1} =
    0.4100
    8.2400
    3.5700
    6.2400
    9.2700
Read the same character vector, and truncate each value to one decimal digit.
C = textscan(chr,'%3.1f %*1d');
The specifier %3.1f indicates a field width of 3 characters and a precision of 1. The textscan function reads a total of 3 characters, including the decimal point and the 1 digit after the decimal point. The specifier %*1d tells textscan to skip the remaining digit.
Display the contents of cell array C.
celldisp(C)
C{1} =
    0.4000
    8.2000
    3.5000
    6.2000
    9.2000
Using a text editor, create a file scan1.dat that contains data in the following form:
09/12/2005 Level1 12.34 45 1.23e10 inf Nan Yes 5.1+3i
10/12/2005 Level2 23.54 60 9e19 -inf 0.001 No 2.2-.5i
11/12/2005 Level3 34.90 12 2e5 10 100 No 3.1+.1i
Open the file, and read each column with the appropriate conversion specifier.
fileID = fopen('scan1.dat');
C = textscan(fileID,'%s %s %f32 %d8 %u %f %f %s %f');
fclose(fileID);
celldisp(C)
C{1}{1} = 09/12/2005
C{1}{2} = 10/12/2005
C{1}{3} = 11/12/2005
C{2}{1} = Level1
C{2}{2} = Level2
C{2}{3} = Level3
C{3} =
   12.3400
   23.5400
   34.9000
C{4} =
   45
   60
   12
C{5} =
   4294967295
   4294967295
       200000
C{6} =
   Inf
  -Inf
    10
C{7} =
        NaN
     0.0010
   100.0000
C{8}{1} = Yes
C{8}{2} = No
C{8}{3} = No
C{9} =
   5.1000 + 3.0000i
   2.2000 - 0.5000i
   3.1000 + 0.1000i
textscan returns a 1-by-9 cell array C.
View the MATLAB® data type of each of the cells in C.
C
C =
  Columns 1 through 5
    {3x1 cell}    {3x1 cell}    [3x1 single]    [3x1 int8]    [3x1 uint32]
  Columns 6 through 9
    [3x1 double]    [3x1 double]    {3x1 cell}    [3x1 double]
For example, C{1} and C{2} are cell arrays. C{5} is of data type uint32, and the first two values in that column of the file (1.23e10 and 9e19) exceed the range of a 32-bit unsigned integer, so the first two elements of C{5} saturate at the maximum value, intmax('uint32').
Remove the literal text 'Level' from each field in the second column of the data from the previous example. Match the literal text in the formatSpec input.
fileID = fopen('scan1.dat');
C = textscan(fileID,'%s Level%d %f32 %d8 %u %f %f %s %f');
fclose(fileID);
C{2}
ans =
     1
     2
     3
View the MATLAB data type of the second cell in C.
class(C{2})
ans = int32
The second cell of the 1-by-9 cell array, C, is now of data type int32.
Read the first column of the file in the previous example into a cell array, skipping the rest of the line.
fileID = fopen('scan1.dat');
dates = textscan(fileID,'%s %*[^\n]');
fclose(fileID);
dates{1}
ans =
    '09/12/2005'
    '10/12/2005'
    '11/12/2005'
textscan returns a 1-by-1 cell array dates.
Using a text editor, create a comma-delimited file, data.csv, that contains
1, 2, 3, 4, , 6
7, 8, 9, , 11, 12
Read the file, converting empty cells to -Inf.
fileID = fopen('data.csv');
C = textscan(fileID,'%f %f %f %f %u8 %f',...
    'Delimiter',',','EmptyValue',-Inf);
fclose(fileID);
column4 = C{4}, column5 = C{5}
column4 =
     4
  -Inf
column5 =
    0
   11
textscan returns a 1-by-6 cell array, C. The textscan function converts the empty value in C{4} to -Inf, where C{4} is associated with a floating-point format. Because MATLAB represents unsigned integer -Inf as 0, textscan converts the empty value in C{5} to 0, and not -Inf.
Using a text editor, create a comma-delimited file, data2.csv, that contains the lines
abc, 2, NA, 3, 4
// Comment Here
def, na, 5, 6, 7
Designate the input that textscan should treat as comments or empty values.
fileID = fopen('data2.csv');
C = textscan(fileID,'%s %n %n %n %n','Delimiter',',',...
    'TreatAsEmpty',{'NA','na'},'CommentStyle','//');
fclose(fileID);
celldisp(C)
C{1}{1} = abc
C{1}{2} = def
C{2} =
     2
   NaN
C{3} =
   NaN
     5
C{4} =
     3
     6
C{5} =
     4
     7
Using a text editor, create a file, data3.csv, that contains
1,2,3,,4
5,6,7,,8
To treat the repeated commas as a single delimiter, use the MultipleDelimsAsOne parameter, and set the value to 1 (true).
fileID = fopen('data3.csv');
C = textscan(fileID,'%f %f %f %f','Delimiter',',',...
    'MultipleDelimsAsOne',1);
fclose(fileID);
celldisp(C)
C{1} =
     1
     5
C{2} =
     2
     6
C{3} =
     3
     7
C{4} =
     4
     8
Using a text editor, create a file, grades.txt, that contains:
Student_ID | Test1 | Test2 | Test3
 1 91.5 89.2 77.3
 2 88.0 67.8 91.0
 3 76.3 78.1 92.5
 4 96.4 81.2 84.6
Read the column headers using the format '%s' four times.
fileID = fopen('grades.txt');
formatSpec = '%s';
N = 4;
C_text = textscan(fileID,formatSpec,N,'Delimiter','|');
Read the numeric data in the file.
C_data0 = textscan(fileID,'%d %f %f %f')
C_data0 =
    [4x1 int32]    [4x1 double]    [4x1 double]    [4x1 double]
The default value for CollectOutput is 0 (false), so textscan returns each column of the numeric data in a separate array.
Set the file position indicator to the beginning of the file.
frewind(fileID);
Reread the file and set CollectOutput to 1 (true) to collect the consecutive columns of the same class into a single array. You can use the repmat function to indicate that the %f conversion specifier should appear three times. This technique is useful when a format repeats many times.
C_text = textscan(fileID,'%s',N,'Delimiter','|');
C_data1 = textscan(fileID,['%d',repmat('%f',[1,3])],'CollectOutput',1)
C_data1 =
    [4x1 int32]    [4x3 double]
The test scores, which are all double, are collected into a single 4-by-3 array.
Close the file, grades.txt.
fclose(fileID);
Read the first and last columns of data from a text file. Skip a column of text and a column of integer data.
In a text editor, create a comma-delimited text file called names.txt that contains:
"Smith, J.","M",38,71.1
"Bates, G.","F",43,69.3
"Curie, M.","F",38,64.1
"Murray, G.","F",40,133.0
"Brown, K.","M",49,64.9
Read the first and last columns of data in the file. Use the conversion specifier, %q, to read the text enclosed by double quotation marks ("). %*q skips the quoted text, %*d skips the integer field, and %f reads the floating-point number. Specify the comma delimiter using the 'Delimiter' name-value pair argument.
fileID = fopen('names.txt','r');
C = textscan(fileID,'%q %*q %*d %f','Delimiter',',');
fclose(fileID);
celldisp(C)
C{1}{1} = Smith, J.
C{1}{2} = Bates, G.
C{1}{3} = Curie, M.
C{1}{4} = Murray, G.
C{1}{5} = Brown, K.
C{2} =
   71.1000
   69.3000
   64.1000
  133.0000
   64.9000
textscan returns a 1-by-2 cell array, C. Double quotation marks enclosing the text are removed.
Create a sample file named myfile.txt that contains comma-separated values. The first column of values contains dates in German and the second and third columns are numeric values.
fileID = fopen('myfile.txt','w','n','ISO-8859-15');
fprintf(fileID,'1 Januar 2014, 20.2, 100.5 \n');
fprintf(fileID,'1 Februar 2014, 21.6, 102.7 \n');
fprintf(fileID,'1 März 2014, 20.7, 99.8 \n');
fclose(fileID);
The sample file looks like this:
1 Januar 2014, 20.2, 100.5
1 Februar 2014, 21.6, 102.7
1 März 2014, 20.7, 99.8
Open the file. Specify the character encoding scheme associated with the file as the last input to fopen.
fileID = fopen('myfile.txt','r','n','ISO-8859-15');
Read the file. Specify the format of the dates in the file using the %{dd MMMM yyyy}D specifier. Specify the locale of the dates using the DateLocale name-value pair argument.
C = textscan(fileID,'%{dd MMMM yyyy}D %f %f',...
    'DateLocale','de_DE','Delimiter',',');
fclose(fileID);
View the contents of the first cell in C.
C{1}
ans =
   01 January 2014
   01 February 2014
   01 March 2014
The dates display in the language MATLAB uses depending on your system locale.
Use sprintf to convert nondefault escape sequences in your data.
Create text that includes a form feed character, \f. Then, to read the text using textscan, call sprintf to explicitly convert the form feed.
lyric = sprintf('Blackbird\fsinging\fin\fthe\fdead\fof\fnight');
C = textscan(lyric,'%s','delimiter',sprintf('\f'));
C{1}
ans =
  7×1 cell array
    'Blackbird'
    'singing'
    'in'
    'the'
    'dead'
    'of'
    'night'
textscan returns a 1-by-1 cell array, C.
Resume scanning from a position other than the beginning.
If you call textscan again on the same text, it reads from the beginning each time. To resume a scan from any other position, use the two-output argument syntax in your initial call to textscan, and then pass only the unread part of the text to the next call.
For example, create a character vector called lyric. Read the first word of the character vector, and then resume the scan.
lyric = 'Blackbird singing in the dead of night';
[firstword,pos] = textscan(lyric,'%9c',1);
lastpart = textscan(lyric(pos+1:end),'%s');
fileID — File identifier
File identifier of an open text file, specified as a number. Before reading a file with textscan, you must use fopen to open the file and obtain the fileID.
Data Types: double
formatSpec — Format of the data fields
Format of the data fields, specified as a character vector of one or more conversion specifiers. When textscan reads a file or a character vector, it attempts to match the data to the format specified in formatSpec. If textscan fails to match a data field, it stops reading and returns all fields read before the failure.
The number of conversion specifiers determines the number of cells in output array, C.
Numeric Fields
This table lists available conversion specifiers for numeric inputs.
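Numeric Input Type | Conversion Specifier | Output Class |
---|---|---|
Integer, signed | %d | int32 |
Integer, signed | %d8 | int8 |
Integer, signed | %d16 | int16 |
Integer, signed | %d32 | int32 |
Integer, signed | %d64 | int64 |
Integer, unsigned | %u | uint32 |
Integer, unsigned | %u8 | uint8 |
Integer, unsigned | %u16 | uint16 |
Integer, unsigned | %u32 | uint32 |
Integer, unsigned | %u64 | uint64 |
Floating-point number | %f | double |
Floating-point number | %f32 | single |
Floating-point number | %f64 | double |
Floating-point number | %n | double |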
Nonnumeric Fields
This table lists available conversion specifiers for inputs that include nonnumeric characters.
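Nonnumeric Input Type | Conversion Specifier | Details |
---|---|---|
Character | %c | Read any single character, including a delimiter. |
Text Array | %s | Read as a cell array of character vectors. |
Text Array | %q | Read as a cell array of character vectors. If the text begins with a double quotation mark ("), omit the leading quotation mark and its accompanying closing mark. Example: '%q' reads '"Smith, J."' as 'Smith, J.'. |
Dates and time | %D | Read the same way as %q, and then convert to a datetime value. |
Dates and time | %{fmt}D | Read the same way as %q, and then convert to a datetime value, where fmt describes the format of the input text. For more information about datetime display formats, see the Format property of datetime arrays. Example: '%{dd MMMM yyyy}D' specifies the format of a date such as '01 March 2014'. |
Category | %C | Read the same way as %q, and then convert to a category in a categorical array. |
Pattern-matching | %[...] | Read as a cell array of character vectors, the characters inside the brackets up to the first nonmatching character. To include ] in the set, specify it first: %[]...]. Example: %[mus] reads 'summer ' as 'summ'. |
Pattern-matching | %[^...] | Exclude characters inside the brackets, reading until the first matching character. To exclude ], specify it first: %[^]...]. Example: %[^xrg] reads 'summer ' as 'summe'. |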
Optional Operators
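Conversion specifiers in formatSpec can include optional operators. Between the percent character (%) and the conversion character, they appear in the following order: the asterisk (*) that marks a field to ignore, the field width, and then the precision. Literal text to ignore is written directly before or after the specifier.
Optional operators include: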
Fields and Characters to Ignore
textscan reads all characters in your file in sequence, unless you tell it to ignore a particular field or a portion of a field.
Insert an asterisk character (*) after the percent character (%) to skip a field or a portion of a character field.
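Operator | Action Taken |
---|---|
%*k | Skip the field, where k is any conversion specifier identifying the field to skip. textscan does not create an output cell for a skipped field. Example: '%s %*s %s %s %*s %*s %s' converts 'Blackbird singing in the dead of night' into four output cells containing 'Blackbird', 'in', 'the', and 'night'. |
'%*ns' | Skip up to n characters, where n is an integer less than or equal to the number of characters in the field. Example: '%*3s %s' converts 'abcdefg' to 'defg'. |
'%*nc' | Skip n characters, including delimiter characters. |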
Field Width
textscan reads the number of characters or digits specified by the field width or precision, or up to the first delimiter, whichever comes first. A decimal point, sign (+ or -), exponent character, and digits in the numeric exponent are counted as characters and digits within the field width. For complex numbers, the field width refers to the individual widths of the real part and the imaginary part. For the imaginary part, the field width includes + or - but not i or j.
Specify the field width by inserting a number after the percent character (%) in the conversion specifier.
Example: %5f reads '123.456' as 123.4.
Example: %5c reads 'abcdefg' as 'abcde'.
When the field width operator is used with single characters (%c), textscan also reads delimiter, white-space, and end-of-line characters.
Example: %7c reads 7 characters, including white-space, so 'Day and night' reads as 'Day and'.
Precision
For floating-point numbers (%n, %f, %f32, %f64), you can specify the number of decimal digits to read.
Example: %7.2f reads '123.456' as 123.45.
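The field width and precision examples above can be tried directly on character vectors. The following is a minimal sketch; the variable names w, p, and c are illustrative, and N is set to 1 so that each call applies formatSpec once and returns only the first match.

```matlab
% N = 1 applies formatSpec a single time, so only the first match is returned.
w = textscan('123.456','%5f',1);    % field width 5: reads '123.4'
p = textscan('123.456','%7.2f',1);  % precision 2: reads '123.45'
c = textscan('abcdefg','%5c',1);    % five characters: 'abcde'
```

Without N, textscan would reapply formatSpec to the characters left over after the first match.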
Literal Text to Ignore
textscan matches literal text placed before or after a conversion specifier in formatSpec against the data and omits it from the output.
Example: Level%u8 reads 'Level1' as 1.
Example: %u8Step reads '2Step' as 2.
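As a quick check of the two examples above, the following sketch reads each character vector with literal text in formatSpec. The variable names a and b are illustrative.

```matlab
% Literal text before the specifier is matched and dropped from the output.
a = textscan('Level1','Level%u8');
% Literal text after the specifier is handled the same way.
b = textscan('2Step','%u8Step');
class(a{1})   % the %u8 specifier produces a uint8 value
```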
N — Number of times to apply formatSpec
Inf (default) | positive integer
Number of times to apply formatSpec, specified as a positive integer.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
chr — Input text
Input text to read.
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: C = textscan(fileID,formatSpec,'HeaderLines',3,'Delimiter',',') skips the first three lines of the data, and then reads the remaining data, treating commas as a delimiter.
Names are not case sensitive.
'CollectOutput' — Logical indicator determining data concatenation
false (default) | true
Logical indicator determining data concatenation, specified as the comma-separated pair consisting of 'CollectOutput' and either true or false. If true, then textscan concatenates consecutive output cells of the same fundamental MATLAB class into a single array.
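A minimal sketch of the difference, using an illustrative character vector rather than a file (the variable names are assumptions for this sketch only):

```matlab
chr = '1 2 3 4';
% Default: one output cell per conversion specifier.
separate = textscan(chr,'%f %f');
% CollectOutput: both double columns are concatenated into one 2-by-2 array.
combined = textscan(chr,'%f %f','CollectOutput',true);
```

Here separate is a 1-by-2 cell array of column vectors, while combined is a 1-by-1 cell array holding a single matrix.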
'CommentStyle' — Symbols designating text to ignore
Symbols designating text to ignore, specified as the comma-separated pair consisting of 'CommentStyle' and a character vector or a cell array of character vectors.
For example, specify a character such as '%' to ignore text following the symbol on the same line. Specify a cell array of two character vectors, such as {'/*', '*/'}, to ignore any text between those sequences.
textscan checks for comments only at the start of each field, not within a field.
Example: 'CommentStyle',{'/*', '*/'}
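The single-symbol form can be sketched with a character vector that contains a comment line. The text and variable names below are illustrative assumptions; sprintf is used only to embed the newline characters and a literal percent sign.

```matlab
% Three lines of text; the middle line is a comment that starts with '%'.
chr = sprintf('1 2\n%% note\n3 4');
C = textscan(chr,'%f %f','CommentStyle','%');
```

The comment line is skipped, so each output cell contains two values.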
'DateLocale' — Locale for reading dates
Locale for reading dates, specified as the comma-separated pair consisting of 'DateLocale' and a character vector in the form xx_YY, where xx is a lowercase ISO 639-1 two-letter code that specifies a language, and YY is an uppercase ISO 3166-1 alpha-2 code that specifies a country. For a list of common values for the locale, see the Locale name-value pair argument for the datetime function.
Use DateLocale to specify the locale in which textscan should interpret month and day of week names and abbreviations when reading text as dates using the %D format specifier.
Example: 'DateLocale','ja_JP'
'Delimiter' — Field delimiter characters
Field delimiter characters, specified as the comma-separated pair consisting of 'Delimiter' and a character vector or a cell array of character vectors. Specify multiple delimiters in a cell array of character vectors.
Example: 'Delimiter',{';','*'}
textscan interprets repeated delimiter characters as separate delimiters, and returns an empty value to the output cell.
Within each row of data, the default field delimiter is white-space. White-space can be any combination of space (' '), backspace ('\b'), or tab ('\t') characters. If you do not specify a delimiter, then the delimiter characters are the same as the white-space characters, and textscan interprets repeated white-space characters as a single delimiter. The default white-space characters are ' ', '\b', and '\t'. Use the 'Whitespace' name-value pair argument to specify alternate white-space characters.
When you specify one of the following escape sequences as a delimiter, textscan converts that sequence to the corresponding control character:
\b | Backspace |
\n | Newline |
\r | Carriage return |
\t | Tab |
\\ | Backslash (\ ) |
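The delimiter rules above can be sketched with two short character vectors. The inputs and variable names are illustrative assumptions.

```matlab
% Two different delimiters, given as a cell array of character vectors.
multi = textscan('1;2*3','%f','Delimiter',{';','*'});
% Repeated delimiters are separate delimiters, so the field between the
% two commas is empty and takes the default EmptyValue, NaN.
gaps = textscan('1,,3','%f','Delimiter',',');
```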
'EmptyValue' — Returned value for empty numeric fields
NaN (default) | scalar
Returned value for empty numeric fields in delimited text files, specified as the comma-separated pair consisting of 'EmptyValue' and a scalar.
'EndOfLine' — End-of-line characters
End-of-line characters, specified as the comma-separated pair consisting of 'EndOfLine' and a character vector. The character vector must be '\r\n' or it must specify a single character. Common end-of-line characters are a newline character ('\n') or a carriage return ('\r'). If you specify '\r\n', then textscan treats any of \r, \n, and the combination of the two (\r\n) as end-of-line characters.
The default end-of-line sequence is \n, \r, or \r\n, depending on the contents of your file.
If there are missing values and an end-of-line sequence at the end of the last line in a file, then textscan returns empty values for those fields. This ensures that individual cells in output cell array, C, are the same size.
Example: 'EndOfLine',':'
'ExpChars' — Exponent characters
'eEdD' (default) | character vector
Exponent characters, specified as the comma-separated pair consisting of 'ExpChars' and a character vector. The default exponent characters are e, E, d, and D.
'HeaderLines' — Number of header lines
0 (default) | positive integer
Number of header lines, specified as the comma-separated pair consisting of 'HeaderLines' and a positive integer. textscan skips the header lines, including the remainder of the current line.
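A minimal sketch of skipping header lines, assuming a scratch file created with tempname (the file name, its contents, and the variable names are illustrative and not part of the original examples):

```matlab
% Write a scratch file with two header lines followed by numeric data.
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'Title line\nx y\n1 2\n3 4\n');
fclose(fid);

% Skip the two header lines, then read the numeric columns.
fid = fopen(fname,'r');
C = textscan(fid,'%f %f','HeaderLines',2);
fclose(fid);
delete(fname);
```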
'MultipleDelimsAsOne' — Multiple delimiter handling
0 (false) (default) | 1 (true)
Multiple delimiter handling, specified as the comma-separated pair consisting of 'MultipleDelimsAsOne' and either true or false. If true, textscan treats consecutive delimiters as a single delimiter. Repeated delimiters separated by white-space are also treated as a single delimiter. You must also specify the Delimiter option.
Example: 'MultipleDelimsAsOne',1
'ReturnOnError' — Behavior when textscan fails to read or convert
1 (true) (default) | 0 (false)
Behavior when textscan fails to read or convert, specified as the comma-separated pair consisting of 'ReturnOnError' and either true or false. If true, textscan terminates without an error and returns all fields read. If false, textscan terminates with an error and does not return an output cell array.
'TreatAsEmpty' — Placeholder text to treat as empty value
Placeholder text to treat as empty value, specified as the comma-separated pair consisting of 'TreatAsEmpty' and a single character vector or a cell array of character vectors. This option only applies to numeric fields.
'Whitespace' — White-space characters
' \b\t' (default) | character vector
White-space characters, specified as the comma-separated pair consisting of 'Whitespace' and a character vector containing one or more characters. textscan adds a space character, char(32), to any specified Whitespace, unless Whitespace is empty ('') and formatSpec includes any conversion specifier.
When you specify one of the following escape sequences as any white-space character, textscan converts that sequence to the corresponding control character:
\b | Backspace |
\n | Newline |
\r | Carriage return |
\t | Tab |
\\ | Backslash (\ ) |
'TextType' — Output data type of text
'char' (default) | 'string'
Output data type of text, specified as the comma-separated pair consisting of 'TextType' and either 'char' or 'string'. If you specify the value 'char', then textscan returns text as a cell array of character vectors. If you specify the value 'string', then textscan returns text as an array of type string.
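The two output types can be compared on the same input. This is a minimal sketch with illustrative text and variable names; it assumes a MATLAB release that supports the string type.

```matlab
chr = 'red green blue';
% Default: text is returned as a cell array of character vectors.
asCell = textscan(chr,'%s');
% 'string': text is returned as a string array.
asString = textscan(chr,'%s','TextType','string');
```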
C — File or text data
File or text data, returned as a cell array.
For each numeric conversion specifier in formatSpec, the textscan function returns a K-by-1 MATLAB numeric vector to the output cell array, C, where K is the number of times that textscan finds a field matching the specifier.
For each text conversion specifier (%s, %q, or %[...]) in formatSpec, the textscan function returns a K-by-1 cell array of character vectors, where K is the number of times that textscan finds a field matching the specifier. For each character conversion that includes a field width operator, textscan returns a K-by-M character array, where M is the field width.
For each datetime or categorical conversion specifier in formatSpec, the textscan function returns a K-by-1 datetime or categorical vector to the output cell array, C, where K is the number of times that textscan finds a field matching the specifier.
position — Position in the file or character vector
Position at the end of the scan, in the file or the character vector, returned as an integer of class double. For a file, ftell(fileID) would return the same value after calling textscan. For a character vector, position indicates how many characters textscan read.
textscan converts numeric fields to the specified output type according to MATLAB rules regarding overflow, truncation, and the use of NaN, Inf, and -Inf. For example, MATLAB represents an integer NaN as zero. If textscan finds an empty field associated with an integer format specifier (such as %d or %u), it returns the empty value as zero and not NaN.
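The following sketch contrasts the two cases for the same input, using an illustrative comma-delimited character vector with an empty middle field. The variable names are assumptions for this sketch.

```matlab
chr = '1,,3';
% Floating-point specifier: the empty field becomes NaN.
asDouble = textscan(chr,'%f','Delimiter',',');
% Integer specifier: the empty field becomes zero, because
% an integer type cannot hold NaN.
asInt = textscan(chr,'%d','Delimiter',',');
```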
When matching data to a text conversion specifier, textscan reads until it finds a delimiter or an end-of-line character. When matching data to a numeric conversion specifier, textscan reads until it finds a nonnumeric character. When textscan can no longer match the data to a particular conversion specifier, it attempts to match the data to the next conversion specifier in the formatSpec. Sign (+ or -), exponent characters, and decimal points are considered numeric characters.
Sign | Digits | Decimal Point | Digits | Exponent Character | Sign | Digits |
---|---|---|---|---|---|---|
Read one sign character if it exists. | Read one or more digits. | Read one decimal point if it exists. | If there is a decimal point, read one or more digits that immediately follow it. | Read one exponent character if it exists. | If there is an exponent character, read one sign character. | If there is an exponent character, read one or more digits that follow it. |
textscan imports any complex number as a whole into a complex numeric field, converting the real and imaginary parts to the specified numeric type (such as %d or %f). Valid forms for a complex number are:
±<real>±<imag>i|j | Example: 5.7-3.1i |
±<imag>i|j | Example: -7j |
Do not include embedded white space in a complex number. textscan interprets embedded white space as a field delimiter.
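A minimal sketch of reading complex values, reusing two of the values from the scan1.dat example as an illustrative character vector (the variable name z is an assumption):

```matlab
% Each complex number is written without embedded white space,
% so it is read as a single field.
z = textscan('5.1+3i 2.2-.5i','%f');
isreal(z{1})   % false: the output vector is complex
```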