CS3TM20-Text Mining and Natural Language Processing
Module Provider: Computer Science
Number of credits: 10 [5 ECTS credits]
Level: 6
Terms in which taught: Spring term module
Pre-requisites:
Non-modular pre-requisites:
Co-requisites:
Modules excluded:
Current from: 2020/1
Email: huizhi.liang@reading.ac.uk
Type of module:
Summary module description:
This module introduces both the theory and practice of Text Mining and Natural Language Processing (NLP).
Aims:
The aim of this module is to introduce the field of text mining and natural language processing. A key focus is the theory and practice of processing text data at the lexical, syntactic, and semantic levels. The module also covers typical application areas such as text classification, topic detection, information extraction, and information retrieval for large-scale text data. Advanced topics such as deep learning for NLP, dialogue systems, machine translation, and current research in the field are also included.
This module also encourages students to develop a set of professional skills, such as problem solving, creativity, technical report writing, organization and time management, self-reflection, software design and development, end-user awareness, action planning and decision making, commercial awareness, critical analysis of published literature, and value of diversity.
Assessable learning outcomes:
By the end of this module, students should be able to:
- Understand and apply the fundamental principles of text mining and natural language processing;
- Apply methods and algorithms to process different types of textual data;
- Empirically evaluate the performance of methods and algorithms using accuracy and efficiency metrics;
- Apply analytical and programming skills using existing NLP methods and tools such as NLTK and scikit-learn (Python).
Additional outcomes:
This module will provide an overview of the field of Text Mining and NLP and its sub-areas, and will introduce and explain its key techniques, including their applicability and limitations. Topics covered will include:
- Regular expressions, text normalization, and edit distance
- N-grams and language models, part-of-speech tagging
- Lexical semantics, word senses and WordNet
- Syntactic and semantic parsing
- Text classification, topic detection, sentiment analysis
- Information extraction, including named entity recognition and relation extraction
- Information retrieval and recommender systems
- Advanced topics: deep learning for NLP
- Advanced topics: question answering, dialogue systems, machine translation
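As an illustration of the first topic in the list above, the following is a minimal sketch of regular-expression tokenization with simple normalization (lower-casing) and of minimum edit distance. It is not taken from the module materials: the function names `tokenize` and `edit_distance` are hypothetical, the token pattern is one simple choice among many, and unit costs are assumed for insertion, deletion, and substitution.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and extract word tokens with a regular expression.

    Assumption: a token is a run of letters or digits, optionally followed
    by an apostrophe and more letters (so "don't" stays one token).
    """
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance, assuming unit cost for every edit operation."""
    # previous[j] holds the distance between the processed prefix of
    # source and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # distance from source[:i] to the empty string
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]
```

For example, under these assumptions `edit_distance("kitten", "sitting")` evaluates to 3. Other cost schemes, such as charging 2 for a substitution, give different values.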
Outline content:
Brief description of teaching and learning methods:
The course material will be introduced through lectures and applied during lab practical sessions. The lab work will support students in developing high-fidelity prototypes from the concepts and storyboards, and in planning for evaluation.
Activity | Hours |
Lectures | 16 |
Practical classes and workshops | 4 |
Guided independent study: | |
Wider reading (independent) | 5 |
Wider reading (directed) | 5 |
Exam revision/preparation | 20 |
Advance preparation for classes | 3 |
Preparation of practical report | 5 |
Completion of formative assessment tasks | 30 |
Revision and preparation | 10 |
Reflection | 2 |
Total hours for module | 100 |
Method | Percentage |
Written exam | 50 |
Set exercise | 50 |
Summative assessment - Examinations:
One 1.5-hour examination paper in May/June.
Summative assessment - Coursework and in-class tests:
An individual assignment.
Formative assessment methods:
Students will be provided with formative feedback towards preparation of the coursework in tutorial sessions.
Penalties for late submission:
The Module Convenor will apply the following penalties for work submitted late:
- where the piece of work is submitted after the original deadline (or any formally agreed extension to the deadline): 10% of the total marks available for that piece of work will be deducted from the mark for each working day (or part thereof) following the deadline up to a total of five working days;
- where the piece of work is submitted more than five working days after the original deadline (or any formally agreed extension to the deadline): a mark of zero will be recorded.
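The arithmetic of the two rules above can be sketched as follows. This is an illustrative reading only, not an official calculator: the function name `penalised_mark` is hypothetical, marks are assumed to be out of 100 (so that 10% of the total marks available equals 10 marks), part days are assumed to have already been rounded up to whole working days, and flooring the result at zero is an assumption that the text does not state.

```python
def penalised_mark(raw_mark: float, working_days_late: int) -> float:
    """Apply the late-submission penalty to a mark out of 100.

    working_days_late counts whole working days after the deadline,
    with any part day already rounded up (assumption).
    """
    if working_days_late <= 0:
        return raw_mark  # submitted on time, no penalty
    if working_days_late > 5:
        return 0.0  # more than five working days late: mark of zero
    # 10 marks deducted per working day late; floor at zero (assumption).
    return max(0.0, raw_mark - 10.0 * working_days_late)
```

Under these assumptions, a piece of work marked at 65 and submitted two working days late would receive 45.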
You are strongly advised to ensure that coursework is submitted by the relevant deadline. You should note that it is advisable to submit work in an unfinished state rather than to fail to submit any work.
Assessment requirements for a pass:
A mark of 40% overall.
Reassessment arrangements:
One 2-hour examination paper in August/September. Note that the resit module mark will be the higher of (a) the mark from this resit exam and (b) an average of this resit exam mark and previous coursework marks, weighted as per the first attempt (50% exam, 50% coursework).
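The resit rule above can be sketched as a short calculation. The function name `resit_module_mark` is hypothetical and marks are assumed to be percentages; the 50/50 weighting is taken from the rule itself.

```python
def resit_module_mark(resit_exam: float, coursework: float) -> float:
    """Return the higher of the resit exam mark and the weighted average."""
    # Option (b): resit exam and previous coursework, weighted 50% each.
    weighted_average = 0.5 * resit_exam + 0.5 * coursework
    # Option (a) is the resit exam mark on its own.
    return max(resit_exam, weighted_average)
```

For instance, a resit exam mark of 40 with a previous coursework mark of 60 gives a module mark of 50, whereas a resit exam mark of 60 with a coursework mark of 40 gives 60, since the exam mark alone is higher.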
Additional Costs (specified where applicable):
Last updated: 16 April 2020
THE INFORMATION CONTAINED IN THIS MODULE DESCRIPTION DOES NOT FORM ANY PART OF A STUDENT'S CONTRACT.