This example shows how to find, clean, and delete table rows with missing data.
Load sample data from a comma-separated text file, messy.csv
. The file contains many different missing data indicators:
Empty character vector ('')
period (.)
NA
NaN
-99
To specify the character vectors to treat as empty values, use the 'TreatAsEmpty'
name-value pair argument with the readtable
function.
T = readtable('messy.csv','TreatAsEmpty',{'.','NA'})
T=21×5 table
A B C D E
______ ____ _____ ____ ____
'afe1' 3 'yes' 3 3
'egh3' NaN 'no' 7 7
'wth4' 3 'yes' 3 3
'atn2' 23 'no' 23 23
'arg1' 5 'yes' 5 5
'jre3' 34.6 'yes' 34.6 34.6
'wen9' 234 'yes' 234 234
'ple2' 2 'no' 2 2
'dbo8' 5 'no' 5 5
'oii4' 5 'yes' 5 5
'wnk3' 245 'yes' 245 245
'abk6' 563 '' 563 563
'pnj5' 463 'no' 463 463
'wnn3' 6 'no' 6 6
'oks9' 23 'yes' 23 23
'wba3' NaN 'yes' NaN 14
T
is a table with 21 rows and five variables. 'TreatAsEmpty'
only applies to numeric columns in the file and cannot handle numeric literals, such as '-99'
.
View the data type, description, units, and other descriptive statistics for each variable by creating a table summary using the summary
function.
summary(T)
Variables: A: 21x1 cell array of character vectors B: 21x1 double Values: Min -99 Median 14 Max 563 NumMissing 3 C: 21x1 cell array of character vectors D: 21x1 double Values: Min -99 Median 7 Max 563 NumMissing 2 E: 21x1 double Values: Min -99 Median 14 Max 563
When you import data from a file, the default is for readtable
to read any variables with nonnumeric elements as a cell array of character vectors.
Display the subset of rows from the table, T
, that have at least one missing value.
TF = ismissing(T,{'' '.' 'NA' NaN -99}); T(any(TF,2),:)
ans=5×5 table
A B C D E
______ ___ _____ ___ ___
'egh3' NaN 'no' 7 7
'abk6' 563 '' 563 563
'wba3' NaN 'yes' NaN 14
'poj2' -99 'yes' -99 -99
'gry5' NaN 'yes' NaN 21
readtable
replaced '.'
and 'NA'
with NaN
in the numeric variables, B
, D
, and E
.
Clean the data so that the missing values indicated by code -99
have the standard MATLAB® numeric missing value indicator, NaN
.
T = standardizeMissing(T,-99)
T=21×5 table
A B C D E
______ ____ _____ ____ ____
'afe1' 3 'yes' 3 3
'egh3' NaN 'no' 7 7
'wth4' 3 'yes' 3 3
'atn2' 23 'no' 23 23
'arg1' 5 'yes' 5 5
'jre3' 34.6 'yes' 34.6 34.6
'wen9' 234 'yes' 234 234
'ple2' 2 'no' 2 2
'dbo8' 5 'no' 5 5
'oii4' 5 'yes' 5 5
'wnk3' 245 'yes' 245 245
'abk6' 563 '' 563 563
'pnj5' 463 'no' 463 463
'wnn3' 6 'no' 6 6
'oks9' 23 'yes' 23 23
'wba3' NaN 'yes' NaN 14
standardizeMissing
replaces three instances of -99
with NaN
.
Create a new table, T2
, and replace missing values with values from previous rows of the table. fillmissing
provides a number of ways to fill in missing values.
T2 = fillmissing(T,'previous')
T2=21×5 table
A B C D E
______ ____ _____ ____ ____
'afe1' 3 'yes' 3 3
'egh3' 3 'no' 7 7
'wth4' 3 'yes' 3 3
'atn2' 23 'no' 23 23
'arg1' 5 'yes' 5 5
'jre3' 34.6 'yes' 34.6 34.6
'wen9' 234 'yes' 234 234
'ple2' 2 'no' 2 2
'dbo8' 5 'no' 5 5
'oii4' 5 'yes' 5 5
'wnk3' 245 'yes' 245 245
'abk6' 563 'yes' 563 563
'pnj5' 463 'no' 463 463
'wnn3' 6 'no' 6 6
'oks9' 23 'yes' 23 23
'wba3' 23 'yes' 23 14
Create a new table, T3
, that contains only the rows from T
without missing values.
T3 = rmmissing(T)
T3=16×5 table
A B C D E
______ ____ _____ ____ ____
'afe1' 3 'yes' 3 3
'wth4' 3 'yes' 3 3
'atn2' 23 'no' 23 23
'arg1' 5 'yes' 5 5
'jre3' 34.6 'yes' 34.6 34.6
'wen9' 234 'yes' 234 234
'ple2' 2 'no' 2 2
'dbo8' 5 'no' 5 5
'oii4' 5 'yes' 5 5
'wnk3' 245 'yes' 245 245
'pnj5' 463 'no' 463 463
'wnn3' 6 'no' 6 6
'oks9' 23 'yes' 23 23
'pkn4' 2 'no' 2 2
'adw3' 22 'no' 22 22
'bas8' 23 'no' 23 23
T3
contains 16 rows and five variables.
Sort the rows of T3
in descending order by C
, and then sort in ascending order by A
.
T3 = sortrows(T2,{'C','A'},{'descend','ascend'})
T3=21×5 table
A B C D E
______ ____ _____ ____ ____
'abk6' 563 'yes' 563 563
'afe1' 3 'yes' 3 3
'arg1' 5 'yes' 5 5
'gry5' 23 'yes' 23 21
'jre3' 34.6 'yes' 34.6 34.6
'oii4' 5 'yes' 5 5
'oks9' 23 'yes' 23 23
'poj2' 22 'yes' 22 22
'wba3' 23 'yes' 23 14
'wen9' 234 'yes' 234 234
'wnk3' 245 'yes' 245 245
'wth4' 3 'yes' 3 3
'adw3' 22 'no' 22 22
'atn2' 23 'no' 23 23
'bas8' 23 'no' 23 23
'dbo8' 5 'no' 5 5
In C
, the rows are grouped first by 'yes'
, followed by 'no'
. Then in A
, the rows are listed alphabetically.
Reorder the table so that A
and C
are next to each other.
T3 = T3(:,{'A','C','B','D','E'})
T3=21×5 table
A C B D E
______ _____ ____ ____ ____
'abk6' 'yes' 563 563 563
'afe1' 'yes' 3 3 3
'arg1' 'yes' 5 5 5
'gry5' 'yes' 23 23 21
'jre3' 'yes' 34.6 34.6 34.6
'oii4' 'yes' 5 5 5
'oks9' 'yes' 23 23 23
'poj2' 'yes' 22 22 22
'wba3' 'yes' 23 23 14
'wen9' 'yes' 234 234 234
'wnk3' 'yes' 245 245 245
'wth4' 'yes' 3 3 3
'adw3' 'no' 22 22 22
'atn2' 'no' 23 23 23
'bas8' 'no' 23 23 23
'dbo8' 'no' 5 5 5
fillmissing
| ismissing
| readtable
| rmmissing
| sortrows
| standardizeMissing
| summary