
Data Mining II
Code
200029
Academic unit
NOVA Information Management School
Credits
7.5
Teacher in charge
Leonardo Vanneschi
Teaching language
Portuguese. If Erasmus students are enrolled, classes will be taught in English.
Objectives
The main objective of this curricular unit is to introduce the main concepts and methods of supervised Machine Learning. More specifically, we will study how to build predictive models using Decision Trees, Artificial Neural Networks, Genetic Programming, and Support Vector Machines.
Prerequisites
None.
Subject matter
1. Introduction to Machine Learning
- The concept of learning. Learning a function.
- Concept of generalization. Training set and test set.
- Supervised and unsupervised learning.
- Classification and clustering.
- Performance of a classifier. Data splitting. Cross-validation and its variants. Precision and Recall. F-measure. K-statistic.
- The concept of feature. Feature selection.
2. Decision Trees
- General functioning of the method
- Examples of application
3. Neural Networks
- Introduction
- Perceptron:
  - One-neuron model
  - Perceptron learning rule.
  - Perceptron convergence theorem.
- Main activation functions.
- Adaline:
  - General structure
  - Delta rule. The concept of gradient descent.
- Linearly separable and non-linearly separable problems.
- Layers of hidden neurons.
- Universal approximation theorem.
- Backpropagation
- Cyclic or recurrent Neural Networks:
  - Jordan Networks
  - Elman Networks
  - Hopfield Networks (the concept of associative memory, Hebbian learning rule).
- Examples of application
4. Support Vector Machines
- General functioning
- Kernel functions
- Examples of application
5. Genetic Programming
- Representation of solutions and main differences from Genetic Algorithms.
- Genetic operators
- Fitness calculation
- Closure and sufficiency properties
- Steady State.
- Automatically Defined Functions (ADF).
- GP Benchmarks (even parity, multiplexer, symbolic regression, artificial ant on the Santa Fe trail).
- Parallel and Distributed Genetic Programming (definition and experimental study).
- Diversity and premature convergence
- Open issues and new trends in GP:
  - Integration of semantic awareness in GP
Bibliography
- Tom Mitchell, "Machine Learning", McGraw-Hill, 1997.
- D. Kriesel, "A Brief Introduction to Neural Networks", 2007.
- Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, "Introduction to Data Mining", Chapter 4, 2006.
- Robert Burbidge and Bernard Buxton, "An Introduction to Support Vector Machines for Data Mining", 2001.
- Riccardo Poli, William B. Langdon, and Nicholas Freitag McPhee, "A Field Guide to Genetic Programming", 2008.
Teaching method
Theoretical classes: board and slides. Practical classes: slides and projection of exercises and examples using various software environments.
Evaluation method
20% project number 1, 20% project number 2, 60% final exam.