Clustering Algorithms and Dimensional Reduction
I am giving myself a crash course on some data mining techniques for a project I am working on. Here are some things I found useful.
Matteo Matteucci’s site at Politecnico di Milano has a nice little introductory tutorial on clustering algorithms, complete with interactive demos. A similar page is Tariq Rashid’s University of Bristol page.
Also, François Labelle at McGill has a nice overview of reducing the dimensionality of multivariate data using Principal Component Analysis, again with interactive demos that give a nice intuitive feel for the technique. Mathematica supports principal component analysis, so given a data matrix with each observation in a row and each column a dimension, I found I could do the following to get a nice two-dimensional view of the multi-dimensional data:
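(* Load the multivariate statistics package that provides PrincipalComponents *)
<<Statistics`MultiDescriptiveStatistics`
(* Rotate the data onto its principal axes *)
rotated = PrincipalComponents[Transpose[data]];
(* Keep only the first two principal components of every row *)
rotated2d = rotated[[All, {1, 2}]];
ListPlot[rotated2d]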
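For anyone without Mathematica to hand, here is a rough sketch of the same idea in plain Python, to show what the projection is doing under the hood. It is not what Mathematica does internally. It assumes one observation per row, finds the two leading eigenvectors of the covariance matrix by simple power iteration with deflation (a stand-in for a proper eigen-solver), and projects the centered data onto them. The function names and the sample numbers are made up for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(m, v):
    return [dot(row, v) for row in m]

def center(data):
    """Subtract each column's mean so every dimension has mean zero."""
    n, dims = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(dims)]
    return [[row[j] - means[j] for j in range(dims)] for row in data]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dims = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1)
             for b in range(dims)] for a in range(dims)]

def dominant_eigenpair(matrix, iterations=500):
    """Power iteration: returns (eigenvalue, unit eigenvector).

    Assumes the start vector (1, 2, 3, ...) is not orthogonal to the
    dominant eigenvector, which holds for typical data but is not guaranteed.
    """
    v = [float(k + 1) for k in range(len(matrix))]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return dot(v, mat_vec(matrix, v)), v

def project_2d(data):
    """Project each observation onto the first two principal axes."""
    centered = center(data)
    cov = covariance(centered)
    lam1, v1 = dominant_eigenpair(cov)
    # Deflation: remove the first component so the next one becomes dominant.
    dims = len(cov)
    deflated = [[cov[a][b] - lam1 * v1[a] * v1[b] for b in range(dims)]
                for a in range(dims)]
    _, v2 = dominant_eigenpair(deflated)
    return [[dot(row, v1), dot(row, v2)] for row in centered]

# Made-up sample: ten observations in three dimensions.
data = [
    [2.5, 2.4, 0.5], [0.5, 0.7, 1.9], [2.2, 2.9, 0.8], [1.9, 2.2, 1.1],
    [3.1, 3.0, 0.2], [2.3, 2.7, 0.9], [2.0, 1.6, 1.2], [1.0, 1.1, 1.6],
    [1.5, 1.6, 1.4], [1.1, 0.9, 1.7],
]
projected = project_2d(data)
for x, y in projected:
    print(f"{x: .3f} {y: .3f}")
```

The two printed columns play the role of rotated2d above and can be fed to any scatter plot. For real work a library eigen-solver or SVD would be the better choice, since power iteration can converge slowly when eigenvalues are close together.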