Modern methods for discovering patterns in large-scale databases are introduced, including classification, clustering and association rule analysis. These are contrasted with more traditional ways of extracting information from data, such as database queries. Data pre-processing methods for handling noisy and missing data, and for dimensionality reduction, are reviewed. Hands-on case studies in building data mining models are carried out using a popular software package.
At the completion of this unit students will:
- be able to differentiate between supervised and unsupervised learning;
- know how to apply the main techniques for supervised and unsupervised learning;
- know how to use statistical methods for evaluating data mining models;
- be able to pre-process data that contain outliers, missing values and noise;
- be able to extract and analyse patterns from data using a data mining tool;
- understand the difference between discovering hidden patterns in a dataset and extracting information from it with simple queries;
- understand the different methods available for discovering hidden patterns in a dataset;
- have developed the ability to pre-process data in preparation for data mining experiments;
- have developed the ability to evaluate the quality of data mining models;
- appreciate the need for representative sample input data so that the patterns present in the population can be learned;
- appreciate the need for high-quality input data to produce useful data mining models;
- have acquired the skills to use the common features of data mining tools;
- have acquired the skills to use the visualisation features of data mining tools to facilitate knowledge discovery from a dataset;
- have acquired the skills to compare data mining models on a set of performance criteria;
- be able to work in a team to extract knowledge from a common data set using different data mining methods and techniques.
Examination (3 hours): 60%; In-semester assessment: 40%
Sound fundamental knowledge of maths and statistics. Basic knowledge of databases and computer programming.