## Correlation Percentiles (Scripts) Publisher's description

from Francesco Pozzi

CORRPERC bootstraps the correlation matrix of the input variable Y, using n_iters bootstrap iterations, and computes for each pairwise correlation the percentiles requested in perc, returned as corrsperc. The function also provides the standard deviation corrstd for each correlation.

[corrsperc, corrstd] = corrperc(Y, perc, n_iters) returns corrsperc as a matrix of size (N * (N - 1) / 2)-by-length(perc), where N is the number of columns (variables) of Y. See below for further details.

[corrsperc, corrstd] = corrperc(Y, perc, n_iters, 1) returns corrsperc as a matrix of size N-by-N-by-length(perc).
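To make the idea concrete, here is a minimal sketch of a naive bootstrap that keeps every bootstrap correlation in memory at once. It is not the code of corrperc itself: the variable names R, S, pos, lo, hi and w are illustrative, and the percentile rule (linear interpolation between order statistics) is an assumption that may differ from the convention used by the actual function.

```matlab
% Naive bootstrap of pairwise correlations (illustrative sketch only).
m = 200;  n = 4;  n_iters = 300;        % small sizes chosen for illustration
Y = randn(m, n);
perc = [2.5 50 97.5];

mask    = tril(true(n), -1);            % strictly lower triangle
n_pairs = n * (n - 1) / 2;
R = zeros(n_pairs, n_iters);            % one column per bootstrap replicate

for b = 1:n_iters
    idx = ceil(m * rand(m, 1));         % resample rows with replacement
    C = corrcoef(Y(idx, :));            % n-by-n correlation matrix
    % C is symmetric, so scanning the lower triangle column by column
    % yields a12, a13, ..., a23, ...
    R(:, b) = C(mask);
end

% Percentiles by linear interpolation between sorted values
% (assumed convention, see the note above).
S   = sort(R, 2);
pos = 1 + (n_iters - 1) * perc / 100;
lo  = floor(pos);  hi = ceil(pos);  w = pos - lo;
corrsperc = bsxfun(@times, S(:, lo), 1 - w) + bsxfun(@times, S(:, hi), w);
corrstd   = std(R, 0, 2);               % spread across bootstrap replicates
```

The matrix R has one row per correlation and one column per iteration, which is exactly the object that becomes too large when Y has many columns.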

**WHY I NEEDED THIS FUNCTION**

When Y has many columns and the number of bootstrap iterations n_iters is high, the percentiles generally cannot be computed in one pass: on most machines there is not enough RAM.

For example, assume Y is a 1000-by-500 matrix and we want a bootstrap with 5000 iterations. Then there are 500 * (500 - 1) / 2 = 124750 correlations.

Computing the percentiles in one pass would require a 124750-by-5000 matrix (about 5 GB in double precision), from which the desired percentiles would be extracted. But my machine can't do that! There is not enough RAM.
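The arithmetic behind this example can be checked in a few lines. The 8 bytes per element assumes the default double-precision storage; the variable names are illustrative.

```matlab
n = 500;  n_iters = 5000;
n_pairs   = n * (n - 1) / 2;            % number of distinct correlations
bytes_all = n_pairs * n_iters * 8;      % 8 bytes per double (assumed storage)
fprintf('%d correlations, %.2f GB for the full matrix\n', ...
        n_pairs, bytes_all / 1e9);
```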

So my idea was to compute the percentiles for about 10000 correlations at a time (to change this, edit the parameter named corrs_per_step inside the code). Each step then needs only a 10000-by-5000 matrix, and the computation is repeated 13 times (ceil(124750 / 10000) = 13).

It's slow and not elegant at all, but it works.
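The chunked strategy might look roughly like the sketch below. This is an illustration under assumptions, not the author's implementation: it draws all bootstrap row indices up front so that every chunk sees the same resamples, computes only the correlations of the current chunk from standardized columns, and uses linear interpolation for the percentiles. Only corrs_per_step, n_iters, perc, corrsperc and corrstd are names from the description; everything else is hypothetical.

```matlab
m = 300;  n = 12;  n_iters = 200;
corrs_per_step = 25;                     % chunk size (10000 in the text above)
Y = cumsum(randn(m, n));
perc = [1 50 99];

% Index pairs (I, J) with I < J, ordered a12, a13, ..., a23, ...
[J, I]  = find(tril(true(n), -1));
n_pairs = numel(I);
n_steps = ceil(n_pairs / corrs_per_step);

% Draw every resample once so all chunks use the same bootstrap samples.
idx = ceil(m * rand(m, n_iters));

pos = 1 + (n_iters - 1) * perc / 100;    % assumed percentile convention
lo  = floor(pos);  hi = ceil(pos);  w = pos - lo;

corrsperc = zeros(n_pairs, numel(perc));
corrstd   = zeros(n_pairs, 1);

for step = 1:n_steps
    k = (step - 1) * corrs_per_step + 1 : min(step * corrs_per_step, n_pairs);
    R = zeros(numel(k), n_iters);        % only this chunk is held in memory
    for b = 1:n_iters
        Yb = Y(idx(:, b), :);
        Z  = bsxfun(@minus, Yb, mean(Yb, 1));
        Z  = bsxfun(@rdivide, Z, std(Yb, 0, 1));
        % Pearson correlation of each pair in the chunk
        R(:, b) = sum(Z(:, I(k)) .* Z(:, J(k)), 1).' / (m - 1);
    end
    S = sort(R, 2);
    corrsperc(k, :) = bsxfun(@times, S(:, lo), 1 - w) + ...
                      bsxfun(@times, S(:, hi), w);
    corrstd(k) = std(R, 0, 2);
end
```

Each resample is standardized again in every step, which is one reason an approach like this trades speed for memory.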

**INPUTS**

Input Y is an m-by-n matrix, where m is the number of observations and n is the number of variables.

Input perc is a vector of real numbers in the [0, 100] interval:

0 corresponds to the minimum;

100 corresponds to the maximum;

50 corresponds to the median;

25 and 75 are the first and third quartiles;

10 is the tenth percentile and so on;

[1, 99] corresponds to a 98% Centered Confidence Interval.
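As a small illustration of how values of perc map onto a sample, the sketch below computes percentiles of five numbers by linear interpolation between sorted values. This interpolation rule is an assumption made for the example; corrperc may use a different percentile definition, in which case intermediate percentiles can differ slightly while 0, 50 and 100 still give the minimum, median and maximum.

```matlab
x    = [0.7 0.1 0.5 0.3 0.9];            % e.g. five bootstrap values of one correlation
perc = [0 25 50 75 100];

s   = sort(x);
pos = 1 + (numel(s) - 1) * perc / 100;   % fractional position in the sorted sample
lo  = floor(pos);  hi = ceil(pos);  w = pos - lo;
p   = s(lo) .* (1 - w) + s(hi) .* w;     % p = [0.1 0.3 0.5 0.7 0.9]
disp(p)
```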

Input n_iters is the number of correlation matrices generated in order to compute the desired percentiles. The higher the number of iterations, the higher the precision of the percentile estimates. A good, and possibly slow, choice is n_iters = 1000.

Input matrix3D is a logical variable: if it is 1, output corrsperc is stored in a 3D matrix and output corrstd in a 2D matrix; otherwise corrsperc is stored in a 2D matrix and corrstd in a vector.

**OUTPUTS**

Output corrsperc is an n-by-n-by-length(perc) matrix if matrix3D is 1. Otherwise, it is an (n * (n - 1) / 2)-by-length(perc) matrix. In the latter case, correlations are read row by row from the upper triangle of the n-by-n correlation matrix. For example, if the correlation matrix is 9-by-9, its upper triangle is:

a12, a13, a14, a15, a16, a17, a18, a19

a23, a24, a25, a26, a27, a28, a29

a34, a35, a36, a37, a38, a39

a45, a46, a47, a48, a49

a56, a57, a58, a59

a67, a68, a69

a78, a79

a89

then elements will be chosen in the following order:

a12, a13, a14, a15, a16, a17, a18, a19, a23, a24, a25, a26,

a27, a28, a29, a34, a35, a36, a37, a38, a39, a45, a46, a47,

a48, a49, a56, a57, a58, a59, a67, a68, a69, a78, a79, a89

and will be stored in that order down the rows of corrsperc. Each column of corrsperc holds the percentiles of all correlations for one value of perc.
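The ordering above can be reproduced with a short sketch, which is handy for finding the row that belongs to a given pair of variables. The names pairs and row_of are hypothetical helpers, not part of corrperc.

```matlab
n = 9;

% Scanning the strictly lower triangle column by column visits the pairs
% (1,2), (1,3), ..., (1,9), (2,3), ..., i.e. the row-wise upper-triangle order.
[J, I] = find(tril(true(n), -1));
pairs  = [I, J];

% Closed-form row index of the pair (i, j), with i < j
row_of = @(i, j) (i - 1) * n - i .* (i - 1) / 2 + (j - i);

disp(pairs(1:10, :))                      % (1,2) ... (1,9), (2,3), (2,4)
disp(row_of(2, 3))                        % 9: a23 comes right after a12 ... a19
```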

Output corrstd is an estimate of the standard deviation of each correlation. The estimate becomes more accurate as n_iters increases and as corrs_per_step decreases. If matrix3D is 1, corrstd is an n-by-n symmetric matrix; otherwise corrstd is a vector of length n * (n - 1) / 2.
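If only the vector form is available, a symmetric matrix can be rebuilt from it along the following lines. This is a sketch under assumptions: the vector v is a stand-in for corrstd, and the diagonal is left at zero here, which may not match what corrperc stores on the diagonal when matrix3D is 1.

```matlab
n = 4;
v = (1:n * (n - 1) / 2).' / 10;          % stand-in for a corrstd vector

M = zeros(n);
M(tril(true(n), -1)) = v;                % fill the lower triangle column by column
M = M + M.';                             % mirror into the upper triangle

disp(M)                                  % M(1,2) = v(1), M(1,3) = v(2), M(2,3) = v(4), ...
```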

**Example**

T = 1000;

N = 100;

Y = cumsum(randn(T, N));

perc = [0:100];

n_iters = 250;

[corrsperc, corrstd] = corrperc(Y, perc, n_iters);

% Look at this: 96% Centered Confidence Intervals (2nd to 98th percentile)

% are approximately four times the standard deviations, so half of the

% interval width is close to 2 * corrstd. Cool!

plot((corrsperc(:, 99) - corrsperc(:, 3)) / 2, 2 * corrstd, '.')
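The factor of four can be checked against the normal distribution. Assuming the bootstrap distribution of each correlation is roughly normal (an approximation, not something the function guarantees), the width of a 96% centered interval in units of standard deviation is twice the 98th percentile of the standard normal:

```matlab
z = sqrt(2) * erfinv(2 * 0.98 - 1);      % 98th percentile of the standard normal
width_in_sd = 2 * z;                     % width of the 2%-98% interval
fprintf('%.3f standard deviations\n', width_in_sd);
```

The result is slightly above 4, which is why the points in the plot fall close to, rather than exactly on, the diagonal.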

#### System Requirements:

MATLAB 7 (R14)

**Program Release Status:** New Release

**Program Install Support:** Install and Uninstall