Reductions
MDToolbox.clusterkcenters — Method
clusterkcenters(X::AbstractMatrix, kcluster::Int; nReplicates::Int=10) -> F
Perform clustering with the K-center algorithm. Input data X should be an AbstractMatrix whose columns correspond to variables and whose rows correspond to frames. The number of clusters kcluster must also be specified.
Returns a NamedTuple F containing the cluster index of each sample in F.indexOfCluster, the coordinates of the cluster centers in F.center, the indices of the cluster centers in F.indexOfCenter, and the distance of each sample from its nearest center in F.distanceFromCenter.
Example
julia> using MDToolbox, Plots
julia> X = rand(1000, 2)
julia> F = clusterkcenters(X, 3)
julia> scatter(X[:, 1], X[:, 2], c=F.indexOfCluster)
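The returned fields can be illustrated with a minimal sketch of greedy farthest-point K-center clustering. This is not the MDToolbox implementation: the function name kcenters_sketch and the rng keyword are hypothetical, the distance is assumed to be Euclidean, and the sketch runs a single pass rather than modeling nReplicates.

```julia
using LinearAlgebra, Random

# Hypothetical helper (not part of MDToolbox): greedy farthest-point K-center.
# Rows of X are frames, columns are variables.
function kcenters_sketch(X::AbstractMatrix, kcluster::Int; rng=Random.default_rng())
    nframe = size(X, 1)
    index_of_center = zeros(Int, kcluster)
    index_of_cluster = ones(Int, nframe)
    distance_from_center = fill(Inf, nframe)

    # Start from a randomly chosen frame.
    index_of_center[1] = rand(rng, 1:nframe)
    for icluster in 1:kcluster
        c = view(X, index_of_center[icluster], :)
        # Reassign every frame that is closer to the new center.
        for iframe in 1:nframe
            d = norm(view(X, iframe, :) .- c)
            if d < distance_from_center[iframe]
                distance_from_center[iframe] = d
                index_of_cluster[iframe] = icluster
            end
        end
        # The next center is the frame farthest from its nearest center.
        if icluster < kcluster
            index_of_center[icluster + 1] = argmax(distance_from_center)
        end
    end
    return (indexOfCluster=index_of_cluster,
            center=X[index_of_center, :],
            indexOfCenter=index_of_center,
            distanceFromCenter=distance_from_center)
end

X = rand(MersenneTwister(1), 200, 2)
F = kcenters_sketch(X, 3; rng=MersenneTwister(2))
```

Each center is itself a frame of X, so its entry in F.distanceFromCenter is zero.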
References
This function uses the method described in
[1] S. Dasgupta and P. M. Long, J. Comput. Syst. Sci. 70, 555 (2005).
[2] J. Sun, Y. Yao, X. Huang, V. Pande, G. Carlsson, and L. J. Guibas, Learning 24, 2 (2009).
MDToolbox.clusterkcenters — Method
clusterkcenters(ta::AbstractMatrix, kcluster::Int; nReplicates::Int=10) -> F
Perform clustering with the K-center algorithm for a TrjArray variable ta.
MDToolbox.compute_cov — Method
compute_cov(ta::AbstractMatrix, lagtime::Int=0) -> cov
Compute a variance-covariance matrix or a time-lagged covariance matrix from input data X. X should be an AbstractMatrix whose columns correspond to variables and whose rows correspond to frames. The optional argument lagtime sets the lag time used for the time-lagged covariance matrix (default: lagtime=0).
Example
julia> X = rand(1000, 100)
julia> cov = compute_cov(X)
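As a rough illustration of what a time-lagged covariance matrix is, the sketch below pairs each frame t with frame t + lagtime after removing the column means. The helper lagged_cov_sketch is hypothetical, and its normalization and lack of symmetrization are assumptions that may differ from compute_cov.

```julia
using Statistics

# Hypothetical helper (not the MDToolbox implementation).
# Rows of X are frames, columns are variables.
function lagged_cov_sketch(X::AbstractMatrix, lagtime::Int=0)
    nframe = size(X, 1)
    Xc = X .- mean(X, dims=1)          # remove the mean of each variable
    A = Xc[1:(nframe - lagtime), :]    # frames t
    B = Xc[(1 + lagtime):nframe, :]    # frames t + lagtime
    return (A' * B) ./ (nframe - lagtime - 1)
end

X = rand(1000, 5)
C0 = lagged_cov_sketch(X)        # lagtime = 0: ordinary covariance matrix
C10 = lagged_cov_sketch(X, 10)   # covariance between frames 10 steps apart
```

With lagtime=0 the sketch reduces to the usual sample covariance, so C0 agrees with Statistics.cov(X).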
MDToolbox.rsvd — Function
rsvd(X::AbstractMatrix; k::Number=10) -> F
Perform the randomized SVD of input data X. X should be an AbstractMatrix whose columns correspond to variables and whose rows correspond to frames. Users can specify the dimension k of the subspace onto which the data are randomly projected (default: k=10).
Returns a NamedTuple F whose members are the same as those of the usual SVD: F.V, F.S, and F.U.
Example
julia> X = randn(1000, 10)
julia> F = rsvd(X)
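The random-projection idea can be sketched in a few lines, following the basic range-finder scheme of Halko et al. The helper rsvd_sketch and its rng keyword are hypothetical; details such as oversampling or power iterations, which the MDToolbox routine may or may not use, are omitted.

```julia
using LinearAlgebra, Random

# Hypothetical helper (not the MDToolbox implementation).
function rsvd_sketch(X::AbstractMatrix; k::Int=10, rng=Random.default_rng())
    nvar = size(X, 2)
    Omega = randn(rng, nvar, k)      # random test matrix
    Q = Matrix(qr(X * Omega).Q)      # orthonormal basis of the sampled range of X
    B = Q' * X                       # small k-by-nvar matrix
    F = svd(B)                       # ordinary SVD of the small matrix
    return (U=Q * F.U, S=F.S, V=F.V)
end

X = randn(MersenneTwister(1), 1000, 10)
F = rsvd_sketch(X; k=10, rng=MersenneTwister(2))
```

Because k equals the number of columns in this example, F.U * Diagonal(F.S) * F.V' reproduces X up to rounding; for smaller k the product is only a low-rank approximation.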
References
Halko, Nathan, Per-Gunnar Martinsson, and Joel A. Tropp.
"Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions."
SIAM Review 53.2 (2011): 217-288.
MDToolbox.pca — Method
pca(X::AbstractMatrix; k=dimension) -> F
Perform principal component analysis (PCA). PCA captures the degrees of freedom that have the largest variances. Input data X should be an AbstractMatrix whose columns correspond to variables and whose rows correspond to frames.
Returns a NamedTuple F containing the principal components in F.projection, the principal modes in the columns of the matrix F.mode, and the variances of the principal components in F.variance.
If k=dimension is specified, the randomized SVD is used to reduce memory usage. The algorithm first projects the data onto a randomly selected (k+2)-dimensional space, and PCA is then performed on the projected data. See the references for details. Note that if the dimension of X is larger than 5000, the randomized SVD is forcibly used with k=1000.
Example
julia> using MDToolbox, Plots, Statistics
julia> X = cumsum(rand(1000, 10), dims=1)
julia> F = pca(X)
julia> plot(F.projection[:, 1], F.projection[:, 2])
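To make the relation between the three returned fields concrete, here is a minimal PCA sketch based on a full SVD of the mean-centered data. The helper pca_sketch is hypothetical and does not cover the randomized-SVD path; the (nframe - 1) normalization of the variances is an assumption.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper (not the MDToolbox implementation).
# Rows of X are frames, columns are variables.
function pca_sketch(X::AbstractMatrix)
    nframe = size(X, 1)
    Xc = X .- mean(X, dims=1)                 # center each variable
    F = svd(Xc)
    mode = F.V                                # principal modes in the columns
    variance = F.S .^ 2 ./ (nframe - 1)       # variance along each mode
    projection = Xc * mode                    # principal components per frame
    return (projection=projection, mode=mode, variance=variance)
end

X = cumsum(rand(1000, 10), dims=1)
P = pca_sketch(X)
```

In this sketch the sample variance of column i of P.projection equals P.variance[i], and the variances come out sorted in decreasing order.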
References
Halko, N., Martinsson, P.-G., Shkolnisky, Y. & Tygert, M.
An Algorithm for the Principal Component Analysis of Large Data Sets.
SIAM J. Sci. Comput. 33, 2580–2594 (2011).
MDToolbox.tica — Function
tica(X::AbstractMatrix, lagtime::Int=1) -> F
Perform time-structure based independent component analysis (tICA). tICA captures the degrees of freedom that are most important in the sense that their motions are the slowest. Input data X should be an AbstractMatrix whose columns correspond to variables and whose rows correspond to frames. The user should specify the lagtime used for the time-lagged covariance matrix.
Returns a NamedTuple F containing the independent components in F.projection, the independent modes in the columns of the matrix F.mode, and the eigenvalues in F.variance.
If the dimension of X is larger than 5000, the randomized SVD approximation is forcibly used with k=1000 to reduce memory usage.
Example
julia> using MDToolbox, Plots, Statistics
julia> X = cumsum(rand(1000, 10), dims=1)
julia> F = tica(X, 10)
julia> plot(F.projection[:, 1], F.projection[:, 2])
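One common way to formulate tICA is as a generalized eigenvalue problem between the symmetrized time-lagged covariance matrix and the ordinary covariance matrix. The sketch below follows that formulation; the helper tica_sketch is hypothetical, and its normalization, symmetrization, and ordering are assumptions rather than a description of the MDToolbox routine.

```julia
using LinearAlgebra, Statistics, Random

# Hypothetical helper (not the MDToolbox implementation).
# Rows of X are frames, columns are variables.
function tica_sketch(X::AbstractMatrix, lagtime::Int=1)
    nframe = size(X, 1)
    Xc = X .- mean(X, dims=1)
    A = Xc[1:(nframe - lagtime), :]           # frames t
    B = Xc[(1 + lagtime):nframe, :]           # frames t + lagtime
    C0 = (Xc' * Xc) ./ (nframe - 1)           # covariance matrix
    Ct = (A' * B) ./ (nframe - lagtime - 1)   # time-lagged covariance matrix
    Ct = (Ct .+ Ct') ./ 2                     # symmetrize
    # Solve Ct * v = lambda * C0 * v (C0 must be positive definite).
    F = eigen(Symmetric(Ct), Symmetric(C0))
    order = sortperm(F.values, rev=true)      # slowest modes first
    mode = F.vectors[:, order]
    return (projection=Xc * mode, mode=mode, variance=F.values[order])
end

# One slow sinusoidal variable plus two white-noise variables.
t = 1:500
X = hcat(sin.(2pi .* t ./ 250), randn(MersenneTwister(3), 500, 2))
T = tica_sketch(X, 10)
```

In this example the leading eigenvalue is close to 1 because the sinusoid is strongly autocorrelated at a lag of 10 frames, while the noise variables yield eigenvalues near 0.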
References
Naritomi, Y. & Fuchigami, S.
Slow dynamics in protein fluctuations revealed by time-structure based independent component analysis:
The case of domain motions.
The Journal of Chemical Physics 134, 065101 (2011).