MIDDLE EAST TECHNICAL UNIVERSITY
DEPT. OF COMPUTER ENGINEERING
CENG 574 STATISTICAL DATA ANALYSIS
Fall 2009
Instructor Volkan Atalay
office A-404 phone 210 4144 vatalay AT metu.edu.tr
Class Friday 8:40-11:30 (BMB-3)
Office Hour by appointment
Course web page address http://www.ceng.metu.edu.tr/courses/ceng574/
Course Objectives
The objective of this course is to introduce the concepts and techniques of clustering and multivariate and exploratory data analysis. This course also offers an opportunity to perform data analysis by using data visualization and projection. In addition, it allows students to apply these techniques in a specific field, such as bioinformatics.
Prerequisites Knowledge of probability and linear algebra.
Reference Books
W. Härdle and L. Simar (2007) Applied Multivariate Statistical Analysis. Springer.
A. K. Jain and R. C. Dubes (1988) Algorithms for Clustering Data. Prentice Hall.
S. Theodoridis and K. Koutroumbas (2003) Pattern Recognition, 2nd Edition. Academic Press.
B. Everitt, S. Landau, and M. Leese (2001) Cluster Analysis, 4th Edition. Edward Arnold Publishers Ltd.
A. Webb (2002) Statistical Pattern Recognition. Wiley. New York.
E. Alpaydın (2004) Introduction to Machine Learning. The MIT Press.
R. O. Duda, P. E. Hart, and D. G. Stork (2001) Pattern Classification, 2nd Edition. John Wiley.
Course Outline
1 Input representation, distance metrics and similarity measures
2 Brief review of probability and computational prototyping software tools
3 Linear projections and principal component analysis
4 Multi-dimensional scaling; non-linear projections: Isomap and LLE
5 Clustering: hierarchical clustering, k-means clustering, and their variations
6 Evaluation and validity of clusters; clustering by mixtures of Gaussians and the EM algorithm
7 Self organizing maps
8 Spectral clustering
9 Manifold learning and clustering
10 Kernel methods and clustering
11 Semi-supervised learning
Grading
Assignments/Quizzes 50%
Term Paper 20%
Presentations 30%
Assignments, the term paper, and the project must be done individually. Note that Matlab appears to be the most convenient environment for the computational work in this course.