AMS 380, Data Mining
Catalog Description:
This course will teach the basic ingredients of classical and contemporary statistical
data mining methods, including dimension reduction, model selection, pattern recognition,
and predictive modeling. Coverage ranges from traditional general linear models and
generalized linear models to modern statistical learning methods such as decision trees,
random forests, and neural networks. The course will also teach how to employ and implement these methods.
Prerequisite: AMS 210 or MAT 211; and AMS 311
3 credits
First offered spring 2021; offered thereafter in spring, summer, and fall.
Course Materials for Spring 2025:
Required:
"Learning from Data" by Y.S. Abu-Mostafa, M. Magdon-Ismail, and H.T. Lin (LFD)
http://amlbook.com/
https://www.amazon.com/Learning-Data-Yaser-S-Abu-Mostafa/dp/1600490069
e-Chapters at https://amlbook.com/eChapters.html
Recommended:
"Pattern Recognition and Machine Learning" by Christopher Bishop (PRML)
PDF at https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
"An Introduction to Statistical Learning with Python" by Gareth James, Daniela Witten,
Trevor Hastie, and Robert Tibshirani (2023 Edition) (ISL-P)
Available here: https://www.statlearning.com
"Dive into Deep Learning" by Aston Zhang, Zack C. Lipton, Mu Li, and Alex J. Smola (D2L)
PDF at https://d2l.ai/d2l-en.pdf
*******************************
Course Materials for Fall 2024:
Required:
"An Introduction to Statistical Learning (with Applications in Python)" by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani; Springer Publishing, 1st printing July 5, 2023; ISBN13: 9783031387463. This comprehensive resource is essential for understanding statistical learning techniques and their implementation in Python. It is available at https://www.statlearning.com.
"Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" (3rd edition) by Aurélien Géron; published by O'Reilly on November 8, 2022; ISBN: 978-1098125974. This book is highly recommended for students interested in practical machine learning projects using popular Python libraries. It serves as an excellent companion for applying concepts learned in class.
Recommended Materials:
"Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares" by Stephen Boyd and Lieven Vandenberghe, Cambridge University Press, 1st edition, published 2018; ISBN: 978-1316518960. This textbook provides a solid foundation in applied linear algebra, crucial for machine learning applications. The book, along with slides and video lectures, is available at https://web.stanford.edu/~boyd/vmls/.
SYLLABUS
- Some basic statistical tests
- Linear regression and classic variable selection
- Regularized linear regression
- General linear model
- Cluster analysis
- Principal Component Analysis
- Statistical resampling methods
- Random Forests
- Neural Networks
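As a rough illustration of how two of the syllabus topics connect (regularized linear regression and statistical resampling), the following minimal sketch is not part of the course materials. It fits a one-predictor ridge regression and scores several penalty values by k-fold cross-validation. It assumes a single predictor, penalizes only the slope, and uses only the Python standard library rather than NumPy or scikit-learn; the function names ridge_fit_1d and kfold_mse are hypothetical.

```python
import random
from statistics import fmean


def ridge_fit_1d(xs, ys, lam):
    """Fit y ~ b0 + b1 * x by minimizing sum((y - b0 - b1*x)**2) + lam * b1**2.

    Illustrative sketch: only the slope b1 is penalized; the intercept b0 is not.
    With lam = 0 this is ordinary least squares (and fails if all x are equal).
    """
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / (sxx + lam)   # the penalty shrinks the slope toward 0
    b0 = y_bar - b1 * x_bar  # the fitted line passes through the means
    return b0, b1


def kfold_mse(xs, ys, lam, k=5, seed=0):
    """Estimate the test MSE of ridge_fit_1d with k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)          # random fold assignment
    folds = [idx[i::k] for i in range(k)]     # k roughly equal folds
    sq_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        b0, b1 = ridge_fit_1d([xs[i] for i in train],
                              [ys[i] for i in train], lam)
        # score the model only on the observations it did not see
        sq_errors.extend((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in fold)
    return fmean(sq_errors)


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.uniform(0, 10) for _ in range(50)]
    ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]  # synthetic data
    for lam in (0.0, 1.0, 10.0, 100.0):
        print(f"lam={lam:>6}: CV MSE = {kfold_mse(xs, ys, lam):.3f}")
```

Setting lam to 0 reduces the fit to ordinary least squares. Larger values shrink the slope toward zero, and comparing the cross-validated errors suggests how much shrinkage generalizes best to held-out folds.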
Learning Outcomes for AMS 380, Data Mining:
1) Demonstrate understanding of classical and contemporary data mining methods including:
*Dimension reduction;
*Variable selection;
*Pattern recognition.
2) Demonstrate understanding of predictive modeling using:
*Traditional linear models;
*Generalized linear models.
3) Demonstrate understanding of modern statistical learning models, including:
*Classification and regression trees;
*Random forests;
*Neural networks.
4) Demonstrate mastery of applying these statistical procedures using the programming language:
*Python.