Title: | Correlation Classification Method |
---|---|
Description: | Classification method described in Dancik et al (2011) <doi:10.1158/0008-5472.CAN-11-2427> that classifies a sample according to the class with the maximum mean (or any other function of) correlation between the test and training samples with known classes. |
Authors: | Garrett M. Dancik and Yuanbin Ru |
Maintainer: | Garrett M. Dancik <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2 |
Built: | 2025-02-20 04:28:30 UTC |
Source: | https://github.com/gdancik/ccm |
Classification method that classifies an observation based on its correlation with observations having known class labels. There are two main functions. The function create.CCM
creates a correlation matrix of correlations between training and test samples. Both Pearson's and Spearman's rank-based correlations are supported. The function predict.CCM
assigns class labels to test observations according to the class that has the highest mean correlation by default. However, any (user-defined) function in addition to the mean (e.g., median, max) can be specified.
For a complete list of functions, use 'library(help="CCM")'
Package: | CCM |
Type: | Package |
Version: | 1.2 |
Date: | 2018-04-05 |
License: | GPL(>=2) |
LazyLoad: | yes |
Garrett M. Dancik and Yuanbin Ru
Maintainer: Garrett M. Dancik <[email protected]>
create.CCM
;
predict.CCM
;
plot.CCM
## load data ## data(data.expr) data(data.gender) ## check within class correlations ## ## outliers may be caused by poor quality ## ## observations or may indicate CCM is not appropriate ## K = cor.by.class(data.expr, data.gender) ## visualize the results ## boxplot(K, xlab = "gender") ## split dataset into training / testing ## train.expr = data.expr[,1:20] test.expr = data.expr[,21:40] train.gender = data.gender[1:20] test.gender = data.gender[21:40] ## CCM using spearman correlation ## K = create.CCM(test.expr, train.expr, method = "spearman") ## predict based on the class with the highest mean correlation (the default) ## p = predict(K, train.gender) table(pred = p, true = test.gender) # check accuracy ## plot correlations for the 3rd observation ## plot(K, train.gender, index = 3, main = "correlations for obs #3", xlab = "gender", ylab = "correlation")
## load data ## data(data.expr) data(data.gender) ## check within class correlations ## ## outliers may be caused by poor quality ## ## observations or may indicate CCM is not appropriate ## K = cor.by.class(data.expr, data.gender) ## visualize the results ## boxplot(K, xlab = "gender") ## split dataset into training / testing ## train.expr = data.expr[,1:20] test.expr = data.expr[,21:40] train.gender = data.gender[1:20] test.gender = data.gender[21:40] ## CCM using spearman correlation ## K = create.CCM(test.expr, train.expr, method = "spearman") ## predict based on the class with the highest mean correlation (the default) ## p = predict(K, train.gender) table(pred = p, true = test.gender) # check accuracy ## plot correlations for the 3rd observation ## plot(K, train.gender, index = 3, main = "correlations for obs #3", xlab = "gender", ylab = "correlation")
Finds within class correlations between samples of each class type, which is useful for identifying extreme observations and assessing whether CCM is appropriate for classification.
cor.by.class(x, y, method = "pearson", use = "complete")
cor.by.class(x, y, method = "pearson", use = "complete")
x |
data matrix with variables in rows and samples in columns |
y |
classes corresponding to the columns of |
method |
the type of correlation to use, either 'pearson' (the default) or 'spearman' |
use |
instructions for handling missing values. See details and |
Calculates correlations between each pair of observations within each class. The correlation between an observation and itself is ignored.
The default correlation is the Pearson product moment correlation. If method
is 'spearman', then the Spearman's rank correlation is used, which is the Pearson correlation calculated using the ranks of the data.
Correlations are calculated class-wise on the matrix of observations of each class separately. Therefore, missing values may be handled differently for different classes.
A list with each element a vector of correlations between samples of a different class.
Garrett M. Dancik and Yuanbin Ru
data(data.expr) data(data.gender) K = cor.by.class(data.expr, data.gender) ## visualize the results ## boxplot(K, xlab = "gender")
data(data.expr) data(data.gender) K = cor.by.class(data.expr, data.gender) ## visualize the results ## boxplot(K, xlab = "gender")
Creates a CCM correlation matrix that calculates correlations between test and training samples or between each test sample
create.CCM(X.test, X.train = NULL, method = "pearson", use = "everything", verbose = 1)
create.CCM(X.test, X.train = NULL, method = "pearson", use = "everything", verbose = 1)
X.test |
matrix of test samples with rows containing variables and columns containing observations |
X.train |
optional matrix of training samples with rows containing variables and columns containing observations |
method |
the type of correlation to use, either 'pearson' (the default) or 'spearman' |
use |
instructions for handling missing values. See details and |
verbose |
A value of '0' will suppress output |
The default correlation is the Pearson product moment correlation. If method
is 'spearman', then the Spearman's rank correlation is used, which is the Pearson correlation calculated using the ranks of the data.
If X.train
is NULL then correlations are calculated between each column of X.test
, but the correlation between a sample and itself will be assigned the value NA.
When X.train
is specified, correlations are calculated sequentially between each test observation and all training observations using apply
. Note that if missing values are present, they may be handled differently for different test samples.
A CCM correlation matrix with element (i,j) as follows: if X.train
is not NULL, then the correlation between the i(th) test sample and the j(th) training sample; otherwise the correlation between the i(th) and j(th) test samples, with NA along the diagonal.
Garrett M. Dancik and Yuanbin Ru
cor
, the function used to calculate correlations;
predict.CCM
, for classification based on the CCM matrix;
plot.CCM
for plotting correlations of test samples
data(data.expr) data(data.gender) ## split dataset into training / testing ## train.expr = data.expr[,1:20] test.expr = data.expr[,21:40] train.gender = data.gender[1:20] test.gender = data.gender[21:40] ## CCM using spearman correlation ## K = create.CCM(test.expr, train.expr, method = "spearman") ## predict based on the class with the highest mean correlation (the default) ## p = predict(K, train.gender) table(pred = p, true = test.gender) # check accuracy ## CCM using pearson correlation ## K = create.CCM(test.expr, train.expr, method = "pearson") ## predict based on the class with the maximum correlation p = predict(K, train.gender, func = max) table(pred = p, true = test.gender) # check accuracy ### leave-one-out cross validation on entire dataset ### K = create.CCM(data.expr, method = "spearman") p = predict(K, data.gender) table(pred = p, true = data.gender) # check accuracy
data(data.expr) data(data.gender) ## split dataset into training / testing ## train.expr = data.expr[,1:20] test.expr = data.expr[,21:40] train.gender = data.gender[1:20] test.gender = data.gender[21:40] ## CCM using spearman correlation ## K = create.CCM(test.expr, train.expr, method = "spearman") ## predict based on the class with the highest mean correlation (the default) ## p = predict(K, train.gender) table(pred = p, true = test.gender) # check accuracy ## CCM using pearson correlation ## K = create.CCM(test.expr, train.expr, method = "pearson") ## predict based on the class with the maximum correlation p = predict(K, train.gender, func = max) table(pred = p, true = test.gender) # check accuracy ### leave-one-out cross validation on entire dataset ### K = create.CCM(data.expr, method = "spearman") p = predict(K, data.gender) table(pred = p, true = data.gender) # check accuracy
Simulated gene expression data
data(data.expr)
data(data.expr)
A 100x40 (probe x sample) data matrix of gene expression data
data(data.expr) data(data.gender)
data(data.expr) data(data.gender)
Simulated gender data that corresponds with data.expr
data(data.gender)
data(data.gender)
A 40 element vector containing simulated gender information as a factor (M/F), corresponding to columns of data.expr
data(data.expr) data(data.gender)
data(data.expr) data(data.gender)
Constructs a boxplot of correlations by class for a test sample
## S3 method for class 'CCM' plot(x, y, index, no.plot, ...)
## S3 method for class 'CCM' plot(x, y, index, no.plot, ...)
x |
a CCM correlation matrix as obtained from |
y |
classes corresponding to the training samples (columns) of 'K' |
index |
the test sample to include in the plot, corresponding to the row of 'K'. |
no.plot |
if TRUE then plotting is turned off and a list of correlations is returned |
... |
additional arguments for |
This function generates a boxplot of correlations between the training samples and a specific test sample by class
if no.plot
is TRUE, then a list of correlations by class
Garrett M. Dancik and Yuanbin Ru
boxplot
;
create.CCM
for creating the CCM correlation matrix
## load data ## data(data.expr) data(data.gender) ## split dataset into training / testing ## train.expr = data.expr[,1:20] test.expr = data.expr[,21:40] train.gender = data.gender[1:20] test.gender = data.gender[21:40] ## CCM using spearman correlation ## K = create.CCM(test.expr, train.expr, method = "spearman") ## plot correlations for the 3rd observation ## plot(K, train.gender, index = 3, main = "correlations for obs #3", xlab = "gender", ylab = "correlation")
## load data ## data(data.expr) data(data.gender) ## split dataset into training / testing ## train.expr = data.expr[,1:20] test.expr = data.expr[,21:40] train.gender = data.gender[1:20] test.gender = data.gender[21:40] ## CCM using spearman correlation ## K = create.CCM(test.expr, train.expr, method = "spearman") ## plot correlations for the 3rd observation ## plot(K, train.gender, index = 3, main = "correlations for obs #3", xlab = "gender", ylab = "correlation")
Classification as a function of a CCM correlation matrix that contains the correlations between test and training samples
## S3 method for class 'CCM' predict(object, y, func = mean, ret.scores = FALSE, ...)
## S3 method for class 'CCM' predict(object, y, func = mean, ret.scores = FALSE, ...)
object |
a CCM correlation matrix object obtained from |
y |
classes corresponding to the training samples (columns) of ‘object’ |
func |
the function that determines how a test sample is classified, defaulting to mean. See details. |
ret.scores |
If set to TRUE then a matrix of results by class are returned (see details); otherwise a vector of classifications/predictions is returned (the default) |
... |
Additional arguments to |
The function func
can be any R function whose first argument is a vector of correlations (x). The CCM assigns each test sample the class that maximizes func(x). If func
is mean (the default), the classification is the class with the highest mean correlation. Other useful values for func
include median and max.
If ret.scores
is TRUE, then a matrix of results by class is returned, where the i(th) column corresponds to the i(th) test sample and each row corresponds to a possible class. Entry (i,j) contains func(x), where x
is a vector of correlations between the i(th) test sample and all training samples with the class in row j.
The test sample classifications as a vector or a matrix of results by class.
If the max function is used for func
, then the CCM is identical to a 1-nearest neighbor classifier with distance = 1 - r, where 'r' is the correlation (pearson or spearman) specified in the call to create.CCM
Garrett M. Dancik and Yuanbin Ru
data(data.expr) data(data.gender) ## split dataset into training / testing ## train.expr = data.expr[,1:20] test.expr = data.expr[,21:40] train.gender = data.gender[1:20] test.gender = data.gender[21:40] ## CCM using spearman correlation ## K = create.CCM(test.expr, train.expr, method = "spearman") ## predict based on the class with the highest mean correlation (the default) ## p = predict(K, train.gender) table(pred = p, true = test.gender) # check accuracy ## CCM using pearson correlation ## K = create.CCM(test.expr, train.expr, method = "pearson") ## predict based on the class with the maximum correlation p = predict(K, train.gender, func = max) table(pred = p, true = test.gender) # check accuracy
data(data.expr) data(data.gender) ## split dataset into training / testing ## train.expr = data.expr[,1:20] test.expr = data.expr[,21:40] train.gender = data.gender[1:20] test.gender = data.gender[21:40] ## CCM using spearman correlation ## K = create.CCM(test.expr, train.expr, method = "spearman") ## predict based on the class with the highest mean correlation (the default) ## p = predict(K, train.gender) table(pred = p, true = test.gender) # check accuracy ## CCM using pearson correlation ## K = create.CCM(test.expr, train.expr, method = "pearson") ## predict based on the class with the maximum correlation p = predict(K, train.gender, func = max) table(pred = p, true = test.gender) # check accuracy