ST3SML - Statistical Data Science and Machine Learning
Module Provider: Mathematics and Statistics
Number of credits: 10 [5 ECTS credits]
Level: 6
Terms in which taught: Spring term module
Pre-requisites: (ST1PS Probability and Statistics or ST2PS Probability and Statistics) and (MA1MSP Mathematical and Statistical Programming or MA2MPR Mathematical Programming or MA2NA1 Numerical Analysis I)
Non-modular pre-requisites:
Co-requisites:
Modules excluded:
Current from: 2019/0
Email: m.f.baksh@reading.ac.uk
Type of module:
Summary module description:
The topics of Data Science, Machine Learning and Artificial Intelligence have recently entered the public consciousness, in part due to their successful application in industry (most notably at large technology companies). Many of the most successful techniques used in these fields are underpinned by statistical methods. This module begins by covering some of these underpinning methods, and then shows how they may be applied to problems in Data Science (inference in implicit models) and Machine Learning (classification).
Aims:
This module aims to give students a solid understanding of the main types of methods used in Machine Learning, and the ability to implement and apply some of them. It also aims to introduce students to research currently being conducted in this area.
Assessable learning outcomes:
By the end of the module it is expected that the student will be able to:
- use and explain underpinning statistical methods for Data Science and Machine Learning;
- produce software implementations of the methods taught in the module;
- use approximate Bayesian computation and classification techniques to analyse data.
Additional outcomes:
The student will also gain experience of reading the scientific literature and learning about current research.
Outline content:
The module will begin with an introduction to Data Science, Machine Learning and Artificial Intelligence, then describe the ideas that underpin the statistical approach to these topics (maximum likelihood, Bayesian models and Bayesian inference using Monte Carlo). This leads to a topic in Data Science that has recently become an area of interest to the research community: the use of “approximate Bayesian computation” (ABC) for inference in “implicit” models (statistical models that are defined by a black box simulator). The module then switches attention to Machine Learning, covering the topics of regression and classification, including: linear and logistic regression; simple classifiers and neural networks.
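As an illustration of the idea behind ABC, the following is a minimal sketch of the basic ABC rejection algorithm. It assumes a toy simulator (a Normal model with unknown mean, standing in for a black box), a Uniform(-5, 5) prior, the sample mean as the summary statistic and a hand-chosen tolerance epsilon; all of these choices, and the function names, are illustrative assumptions rather than material taken from the module.

```python
import random
import statistics


def simulator(theta, n, rng):
    # Toy "black box" simulator (an illustrative assumption): n draws from
    # Normal(theta, 1). ABC only needs to run it, never to evaluate its likelihood.
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, prior_sample, simulate, summary, epsilon, n_accept, rng):
    """Basic ABC rejection sampler.

    Draw theta from the prior, simulate a data set of the same size as the
    observed one, and accept theta if the simulated summary statistic is
    within epsilon of the observed summary. The accepted values are draws
    from an approximation to the posterior distribution of theta.
    """
    s_obs = summary(observed)
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample(rng)
        simulated = simulate(theta, len(observed), rng)
        if abs(summary(simulated) - s_obs) <= epsilon:
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic "observed" data generated with theta = 2 (for illustration only).
    observed = simulator(2.0, 50, rng)
    posterior = abc_rejection(
        observed,
        prior_sample=lambda r: r.uniform(-5.0, 5.0),  # Uniform(-5, 5) prior
        simulate=simulator,
        summary=statistics.mean,  # summary statistic: the sample mean
        epsilon=0.05,             # tolerance (hand-chosen)
        n_accept=200,
        rng=rng,
    )
    print("approximate posterior mean:", statistics.mean(posterior))
    print("approximate posterior sd:", statistics.stdev(posterior))
```

Shrinking epsilon makes the approximation to the posterior more accurate but lowers the acceptance rate; this trade-off is the basic tension in ABC.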
Brief description of teaching and learning methods:
The core material will be delivered in 15 lectures. These will be supported by material from the book “Bayesian Reasoning and Machine Learning”, which is freely available online at http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.Online, along with accessible sections of research articles and blog posts. This range of sources will give students exposure to the way a Data Scientist working in industry or academia would learn their subject. It will also give students who are interested in the area a route to explore the subject more widely, whilst supporting them with an easy-to-follow path through the material.
There will be 5 practical PC lab sessions spread between the lectures. Each will give students the chance to learn to implement concepts from the module in code. Concepts will initially be introduced in the practical sessions, where they are treated simply as algorithms, before being covered in the lectures, where the underpinning ideas will be explained. The aim of this is to give students an understanding of the purpose of the methods before they encounter the mathematics.
There will be one assignment, handed out at the beginning of the module and due at the end. The assignment will consist of 5 problems, each of which requires software implementations of algorithms from the module to solve. Each of the 5 PC labs will cover a problem very close to one in the assignment, in order to motivate students to attend the labs and engage with the module as it progresses.
Additional support with programming will be offered where required.
Activity | Autumn | Spring | Summer |
Lectures | | 15 | |
Practical classes and workshops | | 5 | |
Guided independent study | | 80 | |
Total hours by term | | 100 | |
Total hours for module | 100 |
Method | Percentage |
Written exam | 70 |
Set exercise | 30 |
Summative assessment - Examinations:
One exam, 2 hours.
Summative assessment - Coursework and in-class tests:
One assignment, with questions related to the content covered in the practicals.
Formative assessment methods:
Feedback given during practicals.
Penalties for late submission:
The Module Convener will apply the following penalties for work submitted late:
The University policy statement on penalties for late submission can be found at: http://www.reading.ac.uk/web/FILES/qualitysupport/penaltiesforlatesubmission.pdf
You are strongly advised to ensure that coursework is submitted by the relevant deadline. You should note that it is advisable to submit work in an unfinished state rather than to fail to submit any work.
Assessment requirements for a pass:
A mark of 40% overall.
Reassessment arrangements:
One examination paper of 2 hours' duration in August/September. The resit module mark will be the higher of the exam mark alone (100% exam) and the exam mark combined with the previous coursework mark (70% exam, 30% coursework).
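Written as a formula, with E the resit exam mark and C the previous coursework mark (symbols introduced here for illustration only), the resit rule is:

```latex
\text{resit module mark} = \max\left(E,\; 0.7\,E + 0.3\,C\right)
```

For example, with E = 50 and C = 80 the two options give 50 and 0.7(50) + 0.3(80) = 35 + 24 = 59, so the resit module mark would be 59.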
Additional Costs (specified where applicable):
Cost | Amount |
Required textbooks | |
Specialist equipment or materials | |
Specialist clothing, footwear or headgear | |
Printing and binding | |
Computers and devices with a particular specification | |
Travel, accommodation and subsistence | |
Last updated: 24 September 2019
THE INFORMATION CONTAINED IN THIS MODULE DESCRIPTION DOES NOT FORM ANY PART OF A STUDENT'S CONTRACT.