R Code for DDclust and DDclass:

Algorithms described in

"Clustering and Classification based on the L1 data depth"

General information: I have posted this code; feel free to use it. However, please do not redistribute the code without referring to this page and the source paper.

About the code:

R code - not optimized for speed
Required libraries: MASS, class, cluster
Assist functions
Source the functions from the files listed below: Basicfcns.q, Basicfcns2.q
Main programs
The main programs are DDclass.q for classification and DDclust.q for clustering.
Validation and Visualization
ReDplot.q, DDplot.q
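
A minimal sketch of setting up an R session for this code. It assumes the .q files named on this page have been saved in the current working directory; that location and the variable names pkgs, ddfiles and not_found are assumptions made for illustration.

```r
# Packages listed on this page as required
pkgs <- c("MASS", "class", "cluster")
loaded <- vapply(pkgs, require, logical(1),
                 character.only = TRUE, quietly = TRUE)
if (!all(loaded)) {
  warning("Missing packages: ", paste(pkgs[!loaded], collapse = ", "))
}

# Assumption: the .q files sit in the current working directory
ddfiles <- c("Basicfcns.q", "Basicfcns2.q", "DDclass.q", "DDclust.q",
             "DDplot.q", "ReDplot.q")
not_found <- ddfiles[!file.exists(ddfiles)]
if (length(not_found) > 0) {
  message("Not found, skipping: ", paste(not_found, collapse = ", "))
}

# Source the assist functions first, then the main programs and plots
for (f in setdiff(ddfiles, not_found)) source(f)
```

Sourcing the two Basicfcns files before the main programs mirrors the order in which the files are described above.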


Input for DDclass
Required: (nc, data.resptr, data.mattr, data.matte)
nc - the number of classes
data.resptr - the labels of the training set (label range 0 to C-1, where C is the number of classes)

data.mattr - the training data matrix (rows=samples, columns=genes)

data.matte - the test data matrix

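
The label convention above (integers from 0 to C-1) can be prepared as in the following sketch. The class names and the toy matrices are hypothetical and only illustrate the expected shapes; they are not part of the posted code or data.

```r
# Hypothetical raw class names for five training samples
raw_labels <- c("ALL", "AML", "MLL", "ALL", "MLL")

# Recode to integers 0, 1, ..., C-1 (factor levels are sorted alphabetically)
data.resptr <- as.integer(factor(raw_labels)) - 1L
nc <- length(unique(data.resptr))

# Toy matrices: rows = samples, columns = genes
set.seed(1)
data.mattr <- matrix(rnorm(5 * 4), nrow = 5)  # training data
data.matte <- matrix(rnorm(2 * 4), nrow = 2)  # test data

# Basic consistency checks before calling DDclass
stopifnot(
  length(data.resptr) == nrow(data.mattr),
  ncol(data.mattr) == ncol(data.matte),
  all(sort(unique(data.resptr)) == 0:(nc - 1))
)
```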

Output from DDclass
ReDtrain, ReDtest - the relative data depths
Ntest, NtestCV - the predictions
Ntestm, Ntestc (median and centroid prototype), Ntests (SILclass), Ntestk (kNN), NtestsCV (SILclassCV)
IVT - training observations removed by DDclass CV. IV - removed by SILclass CV.
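
To give a feel for what a relative data depth measures, here is a small sketch. It is not taken from the posted files. It assumes the usual definition of the L1 depth (one minus the length of the average unit vector pointing from a point to the observations, corrected for ties), and it assumes the relative depth is the depth with respect to the point's own class minus the largest depth with respect to any other class. The function names l1_depth and relative_depth are hypothetical.

```r
# L1 depth of point y with respect to the rows of matrix X (equal weights)
l1_depth <- function(y, X) {
  diffs <- sweep(X, 2, y)              # rows hold x_i - y
  norms <- sqrt(rowSums(diffs^2))
  at_y  <- norms == 0                  # observations that coincide with y
  if (all(at_y)) return(1)
  f <- mean(at_y)                      # fraction of the data sitting at y
  # average of the unit vectors from y to the other observations
  e_bar <- colSums(diffs[!at_y, , drop = FALSE] / norms[!at_y]) / nrow(X)
  1 - max(0, sqrt(sum(e_bar^2)) - f)
}

# Assumed relative depth: own-class depth minus best competing-class depth
relative_depth <- function(y, X, labels, own) {
  classes <- sort(unique(labels))
  d <- vapply(classes,
              function(k) l1_depth(y, X[labels == k, , drop = FALSE]),
              numeric(1))
  d[classes == own] - max(d[classes != own])
}

# Toy data: two well separated classes of four points each
X <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(0, -1),
           c(11, 10), c(9, 10), c(10, 11), c(10, 9))
labels <- rep(0:1, each = 4)
red_center <- relative_depth(c(0, 0), X, labels, own = 0)
```

Under these assumptions a point deep inside its own class and far from the others gets a relative depth close to 1, while a point assigned to the wrong class gets a negative value.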


Input for DDclust
Required: (X, K, lambda, Th, A=20, T0=0, alpha=.9, lplot=0)
K - the number of clusters
X - the data matrix (rows are clustered)
Th - threshold to identify objects that can be relocated, usually set to 0

Optional: A, T0, alpha, lplot
A - number of iterations, default 20
T0 (1/beta) - default 0
alpha - decay rate, default .9
lplot - tracking convergence, default 0

Output from DDclust
NN - cluster assignment, NN[1,] is the final allocation
Y - the multivariate median cluster representatives
Cost - final value of partition
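
A short sketch of reading the cluster assignment from the output. The Dout list below is a hypothetical stand-in for what DDclust returns, with made-up values and cluster codes 1 and 2 assumed; only the fact that NN[1,] holds the final allocation comes from the description above.

```r
# Hypothetical stand-in for a DDclust result: 6 objects, 2 clusters
Dout <- list(
  NN = rbind(c(1, 1, 2, 2, 2, 1),   # row 1: the final allocation
             c(1, 2, 2, 2, 1, 1))   # further rows are not interpreted here
)

final   <- Dout$NN[1, ]                     # final cluster of each object
sizes   <- table(final)                     # cluster sizes
members <- split(seq_along(final), final)   # row indices of X per cluster
```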

  • Basicfcns.q: Some basic functions, cross validation, data depths etc.
  • Basicfcns2.q: Some basic functions for DDclust
  • DDplot.q: Data depth plot
  • ReDplot.q: Relative data depth plot
  • DDclass.q: Classification algorithm
  • DDclust.q: Clustering algorithm
  • Calling the functions:
    example:

    Dout<-DDclass(3,lander.train,lander.traindat,lander.testdat)

    testerror<-length(Dout$Ntest[Dout$Ntest!=lander.test])

    Dout<-DDclust(lander.dat,3,.5,0)

    ReD<-ReDplot(t(Dout$NN),lander.dat,3)
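
The test error in the example above is a count of misclassified test samples. The sketch below shows the same count, the corresponding error rate, and a confusion table. The vectors predicted and truth are hypothetical stand-ins for Dout$Ntest and lander.test.

```r
# Hypothetical predictions and true labels for six test samples
predicted <- c(0, 1, 2, 2, 1, 0)   # stand-in for Dout$Ntest
truth     <- c(0, 1, 2, 1, 1, 0)   # stand-in for lander.test

n_errors   <- sum(predicted != truth)    # same count as the length(...) form
error_rate <- mean(predicted != truth)   # fraction misclassified

# Rows are true classes, columns are predicted classes
confusion <- table(truth = truth, predicted = predicted)
```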


    05/03