Write tall array to disk for checkpointing
write(location,tA)
Write a tall array to disk, and then recover the tall array by creating a new datastore for the written files. This process is useful for saving your work or sharing a tall array with a colleague.
Create a datastore for the airlinesmall.csv data set. Select only the Year, Month, and UniqueCarrier variables, and treat 'NA' values as missing data. Convert the datastore into a tall table.
ds = datastore('airlinesmall.csv');
ds.TreatAsMissing = 'NA';
ds.SelectedVariableNames = {'Month','Year','UniqueCarrier'};
tt = tall(ds)
tt =

  M×3 tall table

    Month    Year    UniqueCarrier
    _____    ____    _____________

     10      1987        'PS'
     10      1987        'PS'
     10      1987        'PS'
     10      1987        'PS'
     10      1987        'PS'
     10      1987        'PS'
     10      1987        'PS'
     10      1987        'PS'
      :        :           :
      :        :           :
Sort the data in descending order by year and extract the top 25 rows. The resulting tall table is unevaluated.
tt_new = topkrows(tt,25,'Year')
tt_new =

  M×3 tall table

    Month    Year    UniqueCarrier
    _____    ____    _____________

      ?       ?            ?
      ?       ?            ?
      ?       ?            ?
      :       :            :
      :       :            :
Save the results to a new folder named ExampleData on the C:\ disk. (You might want to specify a different write location, especially if you are not using a Windows® computer.) The write function evaluates the tall array before writing the files, so you do not need to call the gather function before saving the data.
location = 'C:\ExampleData';
write(location,tt_new)
Writing tall data to folder C:\ExampleData
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0 sec
Evaluation completed in 0 sec
Clear tt and ds from your workspace. To recover the tall table that was written to disk, first create a new datastore that references the same folder. Then convert the datastore into a tall table. Because the tall table was evaluated before being written to disk, the display now includes a preview of the values.
clear tt ds
ds2 = datastore(location);
tt2 = tall(ds2)
tt2 =

  M×3 tall table

    Month    Year    UniqueCarrier
    _____    ____    _____________

      1      2008        'WN'
      1      2008        'WN'
      1      2008        'WN'
      1      2008        'WN'
      1      2008        'WN'
      1      2008        'WN'
      1      2008        'WN'
      1      2008        'WN'
      :        :           :
      :        :           :
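Because C:\ExampleData exists only on Windows, here is a minimal sketch of the same write-and-recover round trip in a portable location. It assumes that airlinesmall.csv is on the MATLAB path (it ships with MATLAB) and that a fresh path returned by tempname is an acceptable place to write; the variable name T2 is illustrative.

```matlab
% Portable sketch of the write-and-recover example.
% Assumption: a new folder under tempdir is an acceptable write location.
ds = datastore('airlinesmall.csv');
ds.TreatAsMissing = 'NA';
ds.SelectedVariableNames = {'Month','Year','UniqueCarrier'};
tt = tall(ds);
tt_new = topkrows(tt,25,'Year');   % unevaluated at this point

location = tempname;               % unique path that does not exist yet
write(location,tt_new)             % evaluates tt_new and writes files to a new folder

% Recover the tall table from the written files. Only 25 rows were
% written, so gathering them into memory is inexpensive.
tt2 = tall(datastore(location));
T2 = gather(tt2);
disp(size(T2))
```

Since topkrows returned 25 rows, T2 should be a 25-by-3 in-memory table with the same variable names as tt_new.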
location — Folder location to write data
Folder location to write data, specified as a character vector or string. location can specify a full or relative path. The specified folder can be either of these options:
Existing empty folder that contains no other files
New folder that write creates
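If you want to confirm the folder requirement yourself before starting a potentially expensive write, a check like the following is one option. This is a minimal sketch, not part of the write API: tA is a small illustrative tall array built from magic(4), and location is a fresh path from tempname, so neither error branch fires here.

```matlab
% Minimal sketch: verify that location is a new folder or an empty one.
% Assumptions: tA is a small illustrative tall array, and location is a
% fresh temporary path, so write creates the folder.
tA = tall(magic(4));
location = tempname;

if isfolder(location)
    listing = dir(location);
    contents = setdiff({listing.name},{'.','..'});   % ignore . and ..
    if ~isempty(contents)
        error('Folder "%s" is not empty. Choose a new or empty folder.',location);
    end
elseif isfile(location)
    error('"%s" is a file, not a folder.',location);
end

write(location,tA)
```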
Additional considerations apply for Hadoop® and Apache Spark™:
If the folder is not available locally, then the full path of the folder must be an internationalized resource identifier (IRI) of the form:
hdfs://hostname:portnumber/path_to_file
Before writing to HDFS™, set the HADOOP_HOME, HADOOP_PREFIX, or MATLAB_HADOOP_INSTALL environment variable to the folder where Hadoop is installed.
Before writing to Apache Spark, set the SPARK_HOME environment variable to the folder where Apache Spark is installed.
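One way to set these environment variables from within MATLAB is the setenv function. In this minimal sketch the install paths are hypothetical placeholders; replace them with the folders on your system. Values set this way apply only to the current MATLAB session and the processes it starts.

```matlab
% Minimal sketch: point MATLAB at the Hadoop and Apache Spark installations.
% The paths below are hypothetical placeholders, not real defaults.
setenv('HADOOP_HOME','/usr/local/hadoop');
setenv('SPARK_HOME','/usr/local/spark');

% Confirm the values that the current MATLAB session now sees.
disp(getenv('HADOOP_HOME'))
disp(getenv('SPARK_HOME'))
```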
Example: location = 'hdfs://myHadoopCluster/some/output/folder'
Example: location = '../../dir/data'
Example: location = 'C:\Users\MyName\Desktop'
Data Types: char | string
tA — Input array
Input array, specified as a tall array.
Use the write function to create checkpoints or snapshots of your data as you work, especially when working with huge data sets. This practice allows you to reconstruct tall arrays directly from files on disk rather than reexecuting all of the commands that produced the tall array.
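The checkpoint pattern can be sketched as follows: write an intermediate result once, then rebuild a tall table from the checkpoint files and continue from there. This is a minimal sketch under stated assumptions: the filtering step stands in for any expensive computation, the choice of the Year and ArrDelay variables is illustrative, and checkpointDir is a hypothetical name for a fresh temporary folder.

```matlab
% Minimal checkpointing sketch.
% Assumptions: airlinesmall.csv is on the MATLAB path, and a fresh folder
% from tempname is an acceptable checkpoint location.
ds = datastore('airlinesmall.csv');
ds.TreatAsMissing = 'NA';                    % read 'NA' entries as NaN
ds.SelectedVariableNames = {'Year','ArrDelay'};
tt = tall(ds);

% Stand-in for an expensive step: keep only flights that arrived late.
delayed = tt(tt.ArrDelay > 0,:);

% Checkpoint: write evaluates delayed and saves the result to disk.
checkpointDir = tempname;
write(checkpointDir,delayed)

% Later, possibly in a new session: rebuild the tall table from the
% checkpoint files instead of repeating the filtering step.
tdelayed = tall(datastore(checkpointDir));
meanDelay = gather(mean(tdelayed.ArrDelay,'omitnan'));
disp(meanDelay)
```

Every row kept by the filter has a positive ArrDelay, so meanDelay is positive.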