Reductions

MDToolbox.clusterkcentersMethod
clusterkcenters(X::AbstractMatrix, kcluster::Int; nReplicates::Int=10) -> F

Perform clustering with K-center algorithm. Input data X should belong to AbstractMatrix type and its columns corresponds to variables, rows are frames. Also, the number of clusters kcluster should be specified.

Returns a NamedTuple object F which contains cluster index for each sample in F.indexOfCluster, the coordinates of cluster centers in F.center, the indices of cluster centers in F.indexOfCenter, and distances of samples from the nearest centers in F.distanceFromCenter.

Example

julia> using MDToolbox, Plots
julia> X = rand(1000, 2)
julia> F = clusterkcenters(X, 3)
julia> scatter(X[:, 1], X[:, 2], c=F.indexOfCluster)

References

This function uses the method described in
[1] S. Dasgupta and P. M. Long, J. Comput. Syst. Sci. 70, 555 (2005).
[2] J. Sun, Y. Yao, X. Huang, V. Pande, G. Carlsson, and L. J. Guibas, Learning 24, 2 (2009).
source
MDToolbox.clusterkcentersMethod
clusterkcenters(ta::AbstractMatrix, kcluster::Int; nReplicates::Int=10) -> F

Perform clustering with K-center algorithm for a TrjArray variable ta.

source
MDToolbox.compute_covMethod
compute_cov(ta::AbstractMatrix, lagtime::Int=0) -> cov

Compute a variance-covariance or time-lagged covariance matrix from input data X Input data X should belong to AbstractMatrix type and its columns corresponds to variables, rows are frames. Optional input is the lagtime=lagtime for the calculation of the covariance matrix (default is lagtime=0).

Example

julia> X = rand(1000, 100)
julia> cov = compute_cov(X)
source
MDToolbox.rsvdFunction
rsvd(X::AbstractMatrix; k::Number=10) -> F

Perform the randomized SVD for input data X. Input data X should belong to AbstractMatrix type and its columns corresponds to variables, rows are frames. Users can specify the dimension k of subspace onto which the data is randomly projected (by default k=10).

Returns a NamedTuple object F whose members are same as usual SVD, F.V, F.S, F.U.

Example

julia> X = randn(rand(1000, 10))
julia> F = rsvd(X)

References

Halko, Nathan, Per-Gunnar Martinsson, and Joel A. Tropp. 
"Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions." 
SIAM review 53.2 (2011): 217-288.
source
MDToolbox.pcaMethod
pca(X::AbstractMatrix; k=dimension) -> F

Perform principal component analysis (PCA). PCA captures degrees of freedom which have the largest variances. Input data X should belong to AbstractMatrix type and its columns corresponds to variables, rows are frames.

Returns a NamedTuple object F which contains the principal components in F.projection, the prncipal modes in the columns of the matrix F.mode, and the variances of principal components in F.variance.

If k=dimension is specified, the randomized SVD is used for reducing memory. This algorithm first project the data into a randomly selected k+2-dimensional space, then PCA is performed in the projected data. See the references for details. Note that if the dimension of X is larger than 5000, the randomized SVD is forcibly used with k=1000.

Example

julia> using MDToolbox, Plots, Statistics
julia> X = cumsum(rand(1000, 10))
julia> F = pca(X)
julia> plot(F.projection[:, 1], F.projection[:, 2])

References

Halko, N., Martinsson, P.-G., Shkolnisky, Y. & Tygert, M. 
An Algorithm for the Principal Component Analysis of Large Data Sets. 
SIAM J. Sci. Comput. 33, 2580–2594 (2011).
source
MDToolbox.ticaFunction
tica(X::AbstractMatrix, lagtime::Int=1) -> F

This routine performs time-structure based Independent Component Analysis (tICA). tICA captures degrees of freedom which are most important in the sense that their motions are very slow. X belongs to AbstractMatrix type and its columns corresponds to variables, rows are frames. User should specify the lagtime for the calculation of the time-lagged covariance matrix. Returns a NamedTuple object F which contains independent components in F.projection, the independent modes in the columns of the matrix F.mode, and the eigenvalues in F.variance.

If the dimension of X is larger than 5000, the randomized SVD approximation is forcibly used with k=1000 to reduce memory.

Example

julia> using MDToolbox, Plots, Statistics
julia> X = cumsum(rand(1000, 10))
julia> F = tica(X, 10)
julia> plot(F.projection[:, 1], F.projection[:, 2])

References

Naritomi, Y. & Fuchigami, S. 
Slow dynamics in protein fluctuations revealed by time-structure based independent component analysis: 
The case of domain motions. 
The Journal of Chemical Physics 134, 065101 (2011).
source