histcounts

Histogram bin counts

Syntax

[N,edges] = histcounts(X)
[N,edges] = histcounts(X,nbins)
[N,edges] = histcounts(X,edges)
[N,edges,bin] = histcounts(___)

N = histcounts(C)
N = histcounts(C,Categories)
[N,Categories] = histcounts(___)

[___] = histcounts(___,Name,Value)

Description

[N,edges] = histcounts(X) partitions the X values into bins, and returns the count in each bin, as well as the bin edges. The histcounts function uses an automatic binning algorithm that returns bins with a uniform width, chosen to cover the range of elements in X and reveal the underlying shape of the distribution.

[N,edges] = histcounts(X,nbins) uses a number of bins specified by the scalar, nbins.

[N,edges] = histcounts(X,edges) sorts X into bins with the bin edges specified by the vector, edges. The value X(i) is in the kth bin if edges(k) ≤ X(i) < edges(k+1). The last bin also includes the right bin edge, so that it contains X(i) if edges(end-1) ≤ X(i) ≤ edges(end).
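The edge rule above can be checked with values that sit exactly on the bin edges. This is a minimal sketch; the data and edges are chosen only for illustration.

```matlab
% Every value in X coincides with a bin edge.
X = [1 2 3 4];
edges = [1 2 3 4];          % three bins: [1,2), [2,3), [3,4]

% Each interior edge belongs to the bin on its right, and the final
% edge (4) belongs to the last bin because that bin is closed on both sides.
N = histcounts(X,edges);
disp(N)                     % displays 1 1 2
```

The value 2 is counted in the second bin, not the first, while 3 and 4 both land in the last bin.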

[N,edges,bin] = histcounts(___) also returns an index array, bin, using any of the previous syntaxes. bin is an array of the same size as X whose elements are the bin indices for the corresponding elements in X. The number of elements in the kth bin is nnz(bin==k), which is the same as N(k).
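As a small illustration of the relationship between bin and N, the counts can be rebuilt from the index array alone. The sketch below assumes every element falls inside the bins, so that bin contains no zeros; the variable name Ncheck is introduced here for illustration.

```matlab
X = [0.5 1.5 1.7 2.2];
edges = [0 1 2 3];
[N,~,bin] = histcounts(X,edges);    % bin is [1 2 2 3]

% Accumulate a 1 for every element at the position given by its bin index.
% accumarray expects column subscripts, hence bin(:).
Ncheck = accumarray(bin(:),1,[numel(N) 1]).';

disp(isequal(N,Ncheck))             % displays 1 (true)
```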

N = histcounts(C), where C is a categorical array, returns a vector, N, that indicates the number of elements in C whose value is equal to each of C's categories. N has one element for each category in C.

N = histcounts(C,Categories) counts only the elements in C whose value belongs to the subset of categories specified by Categories.

[N,Categories] = histcounts(___) also returns the categories that correspond to each count in N using either of the previous syntaxes for categorical arrays.

[___] = histcounts(___,Name,Value) uses additional options specified by one or more Name,Value pair arguments using any of the input or output argument combinations in previous syntaxes. For example, you can specify 'BinWidth' and a scalar to adjust the width of the bins for numeric data. For categorical data, you can specify 'Normalization' and either 'count', 'countdensity', 'probability', 'pdf', 'cumcount', or 'cdf'.

Examples

Bin Counts and Bin Edges

Distribute 100 random values into bins. histcounts automatically chooses an appropriate bin width to reveal the underlying distribution of the data.

X = randn(100,1);
[N,edges] = histcounts(X)

N =

     2    17    28    32    16     3     2


edges =

    -3    -2    -1     0     1     2     3     4

Specify Number of Bins

Distribute 10 numbers into 6 equally spaced bins.

X = [2 3 5 7 11 13 17 19 23 29];
[N,edges] = histcounts(X,6)

N =

     2     2     2     2     1     1


edges =

         0    4.9000    9.8000   14.7000   19.6000   24.5000   29.4000

Specify Bin Edges

Distribute 1,000 random numbers into bins. Define the bin edges with a vector, where the first element is the left edge of the first bin, and the last element is the right edge of the last bin.

X = randn(1000,1);
edges = [-5 -4 -2 -1 -0.5 0 0.5 1 2 4 5];
N = histcounts(X,edges)

N =

     0    24   149   142   195   200   154   111    25     0

Normalized Bin Counts

Distribute all of the prime numbers less than 100 into bins. Specify 'Normalization' as 'probability' to normalize the bin counts so that sum(N) is 1. That is, each bin count represents the probability that an observation falls within that bin.

X = primes(100);
[N,edges] = histcounts(X, 'Normalization', 'probability')

N =

    0.4000    0.2800    0.2800    0.0400


edges =

     0    30    60    90   120

Determine Bin Placement

Distribute 100 random integers between -5 and 5 into bins, and specify 'BinMethod' as 'integers' to use unit-width bins centered on integers. Specify a third output for histcounts to return a vector representing the bin indices of the data.

X = randi([-5,5],100,1);
[N,edges,bin] = histcounts(X,'BinMethod','integers');

Find the bin count for the third bin by counting the occurrences of the number 3 in the bin index vector, bin. The result is the same as N(3).

count = nnz(bin==3)

count =

     8

Categorical Bin Counts

Create a categorical vector that represents votes. The categories in the vector are 'yes', 'no', or 'undecided'.

A = [0 0 1 1 1 0 0 0 0 NaN NaN 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1];
C = categorical(A,[1 0 NaN],{'yes','no','undecided'})

C = 

  Columns 1 through 9

     no      no      yes      yes      yes      no      no      no      no 

  Columns 10 through 16

     undecided      undecided      yes      no      no      no      yes 

  Columns 17 through 25

     no      yes      no      yes      no      no      no      yes      yes 

  Columns 26 through 27

     yes      yes

Determine the number of elements that fall into each category.

[N,Categories] = histcounts(C)

N =

    11    14     2


Categories =

  1×3 cell array

    'yes'    'no'    'undecided'

Input Arguments

`X` — Data to distribute among bins
vector | matrix | multidimensional array

Data to distribute among bins, specified as a vector, matrix, or multidimensional array. If X is not a vector, then histcounts treats it as a single column vector, X(:).

histcounts ignores all NaN values. Similarly, histcounts ignores Inf and -Inf values unless the bin edges explicitly specify Inf or -Inf as a bin edge.
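The handling of NaN and infinite values can be seen by binning the same data twice, once with finite edges and once with infinite outer edges. This is a sketch with made-up data; the names Nfinite and Ninf are introduced only for this illustration.

```matlab
X = [NaN -Inf 1 2 Inf];

% Finite edges: NaN, -Inf, and Inf are all left out of the counts,
% so only the values 1 and 2 are binned.
Nfinite = histcounts(X,[0 1 2 3]);

% Infinite outer edges: -Inf falls in the first bin and Inf in the
% last bin. NaN is still ignored.
Ninf = histcounts(X,[-Inf 0 1 2 Inf]);

disp(sum(Nfinite))   % displays 2
disp(sum(Ninf))      % displays 4
```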

`C` — Categorical data
categorical array

Categorical data, specified as a categorical array. histcounts ignores undefined categorical values.

Data Types: categorical
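A short sketch of how undefined values are skipped follows. The value set {'a','b'} is an assumption made for the example; any element outside it becomes undefined when the array is created.

```matlab
% 'x' is not in the value set {'a','b'}, so it becomes <undefined>.
C = categorical({'a','b','a','x'},{'a','b'});

[N,Categories] = histcounts(C);

disp(N)                      % displays 2 1
disp(nnz(isundefined(C)))    % displays 1: one element was not counted
```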

`nbins` — Number of bins
positive integer

Number of bins, specified as a positive integer. If you do not specify nbins, then histcounts automatically calculates how many bins to use based on the values in X.

Example: [N,edges] = histcounts(X,15) uses 15 bins.

`edges` — Bin edges
vector

Bin edges, specified as a vector. edges(1) is the left edge of the first bin, and edges(end) is the right edge of the last bin.

`Categories` — Categories included in count
all categories (default) | cell vector of character vectors | categorical vector

Categories included in count, specified as a cell vector of character vectors or a categorical vector. By default, histcounts uses a bin for each category in categorical array C. Use Categories to specify a unique subset of the categories instead.

Example: N = histcounts(C,{'Large','Small'}) counts only the categorical data in the categories 'Large' and 'Small'.

Data Types: cell | categorical
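To illustrate counting a subset of categories, the sketch below builds a small categorical array (the size labels are invented for the example) and counts only two of its three categories.

```matlab
C = categorical({'Small','Large','Medium','Small','Large','Small'});

% Count only 'Large' and 'Small'; 'Medium' elements are skipped.
[N,Categories] = histcounts(C,{'Large','Small'});

disp(N)    % displays 2 3
```

Here sum(N) is 5 rather than numel(C), which matches sum(ismember(C(:),Categories)).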

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: [N,edges] = histcounts(X,'Normalization','probability') normalizes the bin counts in N, such that sum(N) is 1.

`'BinLimits'` — Bin limits
two-element vector

Bin limits, specified as a two-element vector, [bmin,bmax]. This option bins only the values in X that fall between bmin and bmax inclusive; that is, X(X>=bmin & X<=bmax).

This option does not apply to categorical data.

Example: [N,edges] = histcounts(X,'BinLimits',[1,10]) bins only the values in X that are between 1 and 10 inclusive.
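The following sketch shows the effect of 'BinLimits' on data that extends past both limits. The data and the choice of three bins are assumptions made for illustration.

```matlab
X = [0 1 5 10 11 20];

% Only 1, 5, and 10 lie inside the closed interval [1,10].
[N,edges] = histcounts(X,3,'BinLimits',[1,10]);

inRange = nnz(X>=1 & X<=10);

disp(sum(N) == inRange)           % displays 1 (true)
disp([edges(1) edges(end)])       % displays 1 10
```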

`'BinMethod'` — Binning algorithm
`'auto'` (default) | `'scott'` | `'fd'` | `'integers'` | `'sturges'` | `'sqrt'`

Binning algorithm, specified as one of the values in this table.

Value	Description
`'auto'`	The default `'auto'` algorithm chooses a bin width to cover the data range and reveal the shape of the underlying distribution.
`'scott'`	Scott's rule is optimal if the data is close to being normally distributed, but is also appropriate for most other distributions. It uses a bin width of `3.5*std(X(:))*numel(X)^(-1/3)`.
`'fd'`	The Freedman-Diaconis rule is less sensitive to outliers in the data, and may be more suitable for data with heavy-tailed distributions. It uses a bin width of `2*IQR(X(:))*numel(X)^(-1/3)`, where `IQR` is the interquartile range of `X`.
`'integers'`	The integer rule is useful with integer data, as it creates a bin for each integer. It uses a bin width of 1 and places bin edges halfway between integers. To avoid accidentally creating too many bins, this rule creates at most 65536 bins (2¹⁶). If the data range is greater than 65536, then the rule uses wider bins instead.
`'sturges'`	Sturges' rule is popular due to its simplicity. It chooses the number of bins to be `ceil(1 + log2(numel(X)))`.
`'sqrt'`	The Square Root rule is another simple rule widely used in other software packages. It chooses the number of bins to be `ceil(sqrt(numel(X)))`.
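The formulas in the table can be evaluated directly to see what each rule suggests for a given data set. This is only a sketch of the raw formulas; it assumes nothing about how histcounts turns them into final bin edges, so the bins that histcounts returns may differ from these values. The variable names are introduced for illustration.

```matlab
X = (1:100).';
n = numel(X);

% Scott's rule: suggested bin width.
scottWidth  = 3.5*std(X(:))*n^(-1/3);

% Sturges' rule and the Square Root rule: suggested number of bins.
sturgesBins = ceil(1 + log2(n));
sqrtBins    = ceil(sqrt(n));

fprintf('Scott width: %.3f\n',scottWidth);
fprintf('Sturges bins: %d, sqrt bins: %d\n',sturgesBins,sqrtBins);
```

For these 100 values, Sturges' rule suggests 8 bins and the Square Root rule suggests 10.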

This option does not apply to categorical data.

Example: [N,edges] = histcounts(X,'BinMethod','integers') uses bins centered on integers.

`'BinWidth'` — Width of bins
scalar

Width of bins, specified as a scalar. If you specify BinWidth, then histcounts can use a maximum of 65,536 bins (or 2¹⁶). If the specified bin width requires more bins, then histcounts uses a larger bin width corresponding to the maximum number of bins.

This option does not apply to categorical data.

Example: [N,edges] = histcounts(X,'BinWidth',5) uses bins with a width of 5.
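A quick way to confirm the requested width is to inspect the spacing of the returned edges. The sketch below reuses the data from the Specify Number of Bins example; the name widths is introduced here.

```matlab
X = [2 3 5 7 11 13 17 19 23 29];
[N,edges] = histcounts(X,'BinWidth',5);

% Every pair of neighboring edges should be 5 apart.
widths = diff(edges);

disp(all(abs(widths - 5) < 1e-12))   % displays 1 (true)
disp(sum(N) == numel(X))             % displays 1 (true): every value is binned
```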

`'Normalization'` — Type of normalization
`'count'` (default) | `'probability'` | `'countdensity'` | `'pdf'` | `'cumcount'` | `'cdf'`

Type of normalization, specified as one of the values in this table.

Value	Description
`'count'`	Default normalization scheme. Each `N` value is equal to the number of observations in the bin, and `sum(N)` is equal to `numel(X)`. For categorical data, each `N` value is the number of elements in each category, and `sum(N)` is equal to `numel(C)` or `sum(ismember(C(:),Categories))`.
`'probability'`	Each `N` value is equal to the relative number of observations in the bin, (number of observations in bin / total number of observations), such that `sum(N)` is `1`. For categorical data, each `N` value is, (number of elements in category / total number of elements in all categories), such that `sum(N)` is `1`.
`'countdensity'`	Each `N` value is equal to the number of observations in the bin divided by the bin width. For categorical data, this is the same as `'count'`.
`'pdf'`	Probability density function estimate. Each `N` value is equal to the relative number of observations in the bin divided by the bin width. For categorical data, this is the same as `'probability'`.
`'cumcount'`	Each `N` value is equal to the cumulative number of observations in the bin and all previous bins. `N(end)` is equal to `numel(X)`. For categorical data, each `N` value is equal to the cumulative number of elements in each category and all previous categories. `N(end)` is equal to `numel(C)` or `sum(ismember(C(:),Categories))`.
`'cdf'`	Cumulative distribution function estimate. Each `N` value is equal to the cumulative relative number of observations in the bin and all previous bins. `N(end)` is equal to `1`. For categorical data, each `N` value is equal to the cumulative relative number of observations in each category and all previous categories. `N(end)` is equal to `1`.

Example: [N,edges] = histcounts(X,'Normalization','pdf') bins the data using the probability density function estimate.
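The normalization types are related to one another by simple arithmetic on the raw counts. The sketch below uses bins of unequal width so that 'probability' and 'pdf' differ; it assumes that every element of X lies inside the bin edges. The data and variable names are chosen for illustration.

```matlab
X = [1 2 2 3 5 8];
edges = [0 2 4 10];
widths = diff(edges);                 % [2 2 6]

count  = histcounts(X,edges);         % [1 3 2]
prob   = histcounts(X,edges,'Normalization','probability');
dens   = histcounts(X,edges,'Normalization','countdensity');
pdfEst = histcounts(X,edges,'Normalization','pdf');
cumc   = histcounts(X,edges,'Normalization','cumcount');
cdfEst = histcounts(X,edges,'Normalization','cdf');

tol = 1e-12;
disp(max(abs(prob   - count/numel(X))) < tol)   % probability = count / total
disp(max(abs(dens   - count./widths))  < tol)   % countdensity = count / width
disp(max(abs(pdfEst - prob./widths))   < tol)   % pdf = probability / width
disp(isequal(cumc,cumsum(count)))               % cumcount = running total
disp(max(abs(cdfEst - cumsum(prob)))   < tol)   % cdf = running probability
```

Each comparison displays 1 (true). With 'pdf', the values sum to 1 only after weighting by the bin widths, that is, sum(pdfEst.*widths) is 1.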

Output Arguments

`N` — Bin counts
row vector

Bin counts, returned as a row vector.

`edges` — Bin edges
vector

Bin edges, returned as a vector. edges(1) is the left edge of the first bin, and edges(end) is the right edge of the last bin.

`bin` — Bin indices
array

Bin indices, returned as an array of the same size as X. Each element in bin describes which numbered bin contains the corresponding element in X.

A value of 0 in bin indicates an element which does not belong to any of the bins (for example, a NaN value).
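Because 0 is not a valid index, code that uses bin to look up per-bin values needs to mask the zeros first. The sketch below maps each element to the center of its bin and leaves unbinned elements as NaN; the names centers, inBin, and binCenter are introduced for this illustration.

```matlab
X = [0.5 NaN 2.5 7];
edges = [0 1 2 3];
[~,~,bin] = histcounts(X,edges);      % bin is [1 0 3 0]: NaN and 7 are unbinned

centers = edges(1:end-1) + diff(edges)/2;

% Index only with the nonzero entries of bin.
inBin = bin > 0;
binCenter = NaN(size(X));
binCenter(inBin) = centers(bin(inBin));

disp(binCenter)                       % displays 0.5 NaN 2.5 NaN
```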

`Categories` — Categories included in count
cell vector of character vectors

Categories included in count, returned as a cell vector of character vectors. Categories contains the categories in C that correspond to each count in N.

More About

Tall Array Support

This function supports tall arrays with the following limitations:

Some input options are not supported. The allowed options are:
- 'BinWidth'
- 'BinLimits'
- 'Normalization'
- 'BinMethod' — The 'auto' and 'scott' bin methods are the same. The 'fd' bin method is not supported.

For more information, see Tall Arrays.

Tips

The behavior of histcounts is similar to that of the discretize function. Use histcounts to find the number of elements in each bin. On the other hand, use discretize to find which bin each element belongs to (without counting).
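The two functions can be compared side by side on the same data and edges. In this sketch all values fall inside the edges, so the third output of histcounts and the output of discretize agree.

```matlab
X = [1 3 5 7];
edges = [0 4 8];

% histcounts returns the counts, and optionally the bin indices.
[N,~,bin] = histcounts(X,edges);

% discretize returns only the bin indices.
idx = discretize(X,edges);

disp(N)                   % displays 2 2
disp(isequal(bin,idx))    % displays 1 (true)
```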

Replace Discouraged Instances of hist and histc


Introduced in R2014b
