ST3SML - Statistical Data Science and Machine Learning
Module Provider: Mathematics and Statistics
Number of credits: 10 [5 ECTS credits]
Level: 6
Terms in which taught: Spring term module
Pre-requisites: MA1MSP Mathematical and Statistical Programming and ST1PS Probability and Statistics or MA1MPRNU Mathematical Programming and ST1PSNU Probability and Statistics
Non-modular pre-requisites:
Co-requisites:
Modules excluded:
Current from: 2023/4
Module Convenor: Dr Fazil Baksh
Email: m.f.baksh@reading.ac.uk
Type of module:
Summary module description:
The topics of Data Science, Machine Learning and Artificial Intelligence have recently become part of the public consciousness, in part due to their successful application in industry (most notably at large technology companies). Many of the most successful techniques used in these fields are underpinned by statistical methods. This module begins by covering some of these underpinning methods, and shows how they may be applied to problems in Data Science and Machine Learning.
Aims:
This module aims to give students a solid understanding of the types of methods that are used in Statistical Machine Learning, and the ability to implement and use some of them. It also aims to connect students with research being conducted in this area.
Assessable learning outcomes:
By the end of the module it is expected that the student will be able to:
- use and explain underpinning statistical methods for Data Science and Machine Learning;
- produce software implementations of the methods taught in the module;
- use statistical learning tools to build and evaluate algorithms for supervised learning.
Additional outcomes:
The student will also gain experience of reading the scientific literature and learning about current research.
Outline content:
The module will begin with an introduction to Data Science, Machine Learning and Artificial Intelligence, then describe the ideas that underpin the statistical approach to these topics. The module focuses on Machine Learning, covering the topics of regression and classification, including: linear and logistic regression; linear and quadratic discriminant analysis; resampling methods; model selection and regularisation; ridge regression; lasso; dimension reduction methods; principal components regression; partial least squares; high dimensional problems; regression splines; generalised additive models; tree-based methods; bagging; stacking; random forests; boosting; neural networks and deep learning; support vector machines.
Brief description of teaching and learning methods:
The core material will be delivered in 16 lectures. These will be supported by material from the book "An Introduction to Statistical Learning with Applications in R", which is freely available online, along with research articles and blog posts. This range of sources will expose students to the way a Data Scientist working in industry or academia would learn their subject. It gives interested students a route to explore the subject more widely, while still providing an easy-to-follow path through the material.
There will be 4 practical PC lab sessions interspersed with the lectures. Each will give students the chance to code up concepts covered in the lectures.
There will be one assignment, handed out at the beginning of the module and due at the end. The assignment will consist of problems whose solution requires software implementations of the algorithms covered in the module. PC labs will cover problems that are very close to those given in the assignment, in order to motivate students to attend the PC labs and engage with the module as it progresses.
Additional support with programming will be offered where required.
Activity | Autumn | Spring | Summer |
Lectures | | 16 | |
Practical classes and workshops | | 4 | |
Guided independent study | | 80 | |
Total hours by term | 0 | 100 | 0 |
Total hours for module | 100 |
Method | Percentage |
Written exam | 70 |
Set exercise | 30 |
Summative assessment - Examinations:
One exam, 2 hours.
The examination for this module will require a narrowly defined time window and is likely to be held in a dedicated exam venue.
Summative assessment - Coursework and in-class tests:
One assignment, with questions that are related to content covered in practicals.
Formative assessment methods:
Feedback given during practicals.
Penalties for late submission:
The Support Centres will apply the following penalties for work submitted late:
- where the piece of work is submitted after the original deadline (or any formally agreed extension to the deadline): 10% of the total marks available for that piece of work will be deducted from the mark for each working day (or part thereof) following the deadline up to a total of five working days;
- where the piece of work is submitted more than five working days after the original deadline (or any formally agreed extension to the deadline): a mark of zero will be recorded.
You are strongly advised to ensure that coursework is submitted by the relevant deadline. You should note that it is advisable to submit work in an unfinished state rather than to fail to submit any work.
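The penalty rule above can be read as simple arithmetic. The following is an illustrative sketch only, not an official calculator: the function name `late_penalised_mark` is hypothetical, marks are assumed to be out of 100, and flooring the result at zero is an assumption that the text does not state.

```python
import math


def late_penalised_mark(raw_mark, working_days_late, total_available=100.0):
    """Illustrative reading of the late-submission rule (not an official tool).

    Assumptions: marks are out of `total_available`, and a penalised mark
    cannot go below zero (the module description does not say this).
    """
    if working_days_late <= 0:
        # Submitted on time (or within an agreed extension): no penalty.
        return float(raw_mark)
    if working_days_late > 5:
        # More than five working days late: a mark of zero is recorded.
        return 0.0
    # "Each working day (or part thereof)": round part-days up.
    days_charged = math.ceil(working_days_late)
    # 10% of the total marks available per working day charged.
    deduction = total_available * days_charged / 10
    return max(0.0, raw_mark - deduction)


on_time = late_penalised_mark(65, 0)
one_and_a_half_days = late_penalised_mark(65, 1.5)
six_days = late_penalised_mark(65, 6)
```

Under these assumptions, a piece of work marked 65 and submitted a day and a half late is charged for two working days and loses 20 marks.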
Assessment requirements for a pass:
A mark of 40% overall.
Reassessment arrangements:
One examination paper of 2 hours' duration in August/September. The resit module mark will be the higher of the exam mark alone (100% exam) and the exam mark combined with the previous coursework mark (70% exam, 30% coursework).
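The resit rule above amounts to taking the larger of two candidate marks. A minimal sketch follows, assuming both marks are percentages on a 0 to 100 scale; the function name `resit_module_mark` is hypothetical and rounding conventions are not specified in this description.

```python
def resit_module_mark(resit_exam_mark, previous_coursework_mark):
    """Illustrative reading of the resit rule; marks assumed to be percentages."""
    # Option 1: the resit exam counts for 100% of the module mark.
    exam_only = float(resit_exam_mark)
    # Option 2: 70% resit exam plus 30% previous coursework.
    weighted = 0.7 * resit_exam_mark + 0.3 * previous_coursework_mark
    # The module mark is the higher of the two options.
    return max(exam_only, weighted)


strong_coursework = resit_module_mark(50, 80)  # weighted option is higher
weak_coursework = resit_module_mark(60, 20)    # exam-only option is higher
```

A strong previous coursework mark can therefore raise the resit module mark, while a weak one cannot lower it below the resit exam mark.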
Additional Costs (specified where applicable):
1) Required text books: None
2) Specialist equipment or materials: None
3) Specialist clothing, footwear or headgear: None
4) Printing and binding: None
5) Computers and devices with a particular specification: None
6) Travel, accommodation and subsistence: None
Last updated: 30 March 2023
THE INFORMATION CONTAINED IN THIS MODULE DESCRIPTION DOES NOT FORM ANY PART OF A STUDENT'S CONTRACT.