CS3DS19-Data Science Algorithms and Tools
Module Provider: School of Mathematical, Physical and Computational Sciences
Number of credits: 10 [5 ECTS credits]
Level: 6
Terms in which taught: Spring term module
Pre-requisites:
Non-modular pre-requisites:
Co-requisites:
Modules excluded:
Current from: 2019/0
Email: g.difatta@reading.ac.uk
Type of module:
Summary module description:
Automated data collection and mature database technology have led to tremendous amounts of data being stored in databases, data warehouses and other information repositories. In this context, automated data analysis and data modelling tools and algorithms (Data Mining) are becoming essential components of any information system. Application areas of these techniques include scientific computing, business intelligence, direct marketing, customer relationship management, market segmentation, store shelf management, data warehouse management, and fraud detection in e-commerce and in credit card transactions.
Aims:
The study of fundamental techniques and tools for data manipulation and transformation, and of data mining algorithms: classification, regression, clustering and association rule mining. In particular, KNIME, one of the leading platforms for Data Science and Machine Learning, will be introduced and adopted for practical activities. We will also collaborate with KNIME to embed Level 1 and Level 2 industrial certification in the module.
This module also encourages students to develop a set of professional skills, such as problem solving, critical analysis of published literature, creativity, technical report writing for technical and non-technical audiences, professional communication (emails, letters, minutes, etc.), self-reflection and effective use of commercial software.
Assessable learning outcomes:
Students are expected to understand the general Data Mining principles and techniques, and to be able to apply them in different contexts. In a practical project a data workflow is designed and developed using advanced tools for data science to combine data mining algorithms and analyse real-world datasets.
Additional outcomes:
Students will become familiar with the potential applications of data mining techniques in different domains. They will also learn how to carry out experimental tests for algorithm performance evaluations.
During this module the students are also offered the opportunity to gain two levels of the KNIME Certification in Data Science.
Outline content:
- Introduction to Data Mining;
- Introduction to Data Science and Machine Learning platforms;
- KNIME;
- Data preprocessing;
- Proximity measures;
- Regression, Classification and model evaluation;
- Clustering and cluster validity;
- Decision Tree Induction;
- Association Rule Mining.
Brief description of teaching and learning methods:
The module comprises 2 hours of lectures and 2 hours of practical activities per week. The lectures introduce the basic concepts, the tools, and the algorithms used to build Data Science applications. The assessment is based on multiple choice questionnaires and a data science project that allows the students to apply theoretical concepts to a practical case.
| Autumn | Spring | Summer |
Lectures | | 20 | |
Practical classes and workshops | | 16 | |
Guided independent study | | 64 | |
Total hours by term | | 100 | |
Total hours for module | 100 |
Method | Percentage |
Written exam | 50 |
Set exercise | 40 |
Class test administered by School | 10 |
Summative assessment - Examinations:
One examination paper of 90 minutes.
Summative assessment - Coursework and in-class tests:
- In-class test: A test based on a multiple choice questionnaire (10% of credits): this test has been designed to be valid to achieve the KNIME Certification in Data Science Level 1.
- Set exercise: A coursework assignment (40% of credits): part of the coursework has been designed to be valid to achieve the KNIME Certification in Data Science Level 2.
Formative assessment methods:
In-class test: A test based on a multiple choice questionnaire: this test has been designed to be valid to achieve the KNIME Certification in Data Science Level 2.
Penalties for late submission:
The Module Convener will apply the following penalties for work submitted late:
The University policy statement on penalties for late submission can be found at: http://www.reading.ac.uk/web/FILES/qualitysupport/penaltiesforlatesubmission.pdf
You are strongly advised to ensure that coursework is submitted by the relevant deadline. You should note that it is advisable to submit work in an unfinished state rather than to fail to submit any work.
Assessment requirements for a pass:
A mark of 40% overall.
Reassessment arrangements:
One examination paper of 90 minutes' duration in August/September. The resit module mark will be the higher of the exam mark (100% exam) and the exam mark plus previous coursework marks (50% exam, 50% coursework including in-class test).
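As an illustration only, the resit rule above can be sketched as a small calculation. The function name and the 0-100 mark scale are hypothetical, and the sketch assumes that the 50% coursework share keeps the original weights of 40% for the set exercise and 10% for the in-class test; the official calculation is the one made by the School.

```python
def resit_module_mark(resit_exam, set_exercise, class_test):
    """Return the higher of the two candidate resit marks (0-100 scale).

    Hypothetical helper for illustration; not an official calculator.
    """
    # Option A: the resit exam counts for 100% of the module mark.
    exam_only = resit_exam
    # Option B: 50% resit exam plus previous coursework marks.
    # Assumption: coursework keeps its original weights
    # (40% set exercise, 10% in-class test).
    combined = (50 * resit_exam + 40 * set_exercise + 10 * class_test) / 100
    return max(exam_only, combined)


if __name__ == "__main__":
    # Strong coursework lifts the mark above the exam-only option.
    print(resit_module_mark(50, 70, 60))
    # Weak coursework is ignored because the exam-only mark is higher.
    print(resit_module_mark(60, 30, 20))
```

Under these assumptions, a student whose previous coursework marks are lower than the resit exam mark is never penalised by them, because the exam-only option then gives the higher result.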
Additional Costs (specified where applicable):
Last updated: 8 April 2019
THE INFORMATION CONTAINED IN THIS MODULE DESCRIPTION DOES NOT FORM ANY PART OF A STUDENT'S CONTRACT.