CS3TM20-Text Mining and Natural Language Processing
Module Provider: Computer Science
Number of credits: 10 [5 ECTS credits]
Level: 6
Terms in which taught: Spring term module
Pre-requisites:
Non-modular pre-requisites:
Co-requisites:
Modules excluded:
Current from: 2022/3
Module Convenor: Prof Xia Hong
Email: x.hong@reading.ac.uk
Type of module:
Summary module description:
This module introduces both the theory and practice of Text Mining and Natural Language Processing (NLP).
Aims:
The aim of this module is to introduce the field of text mining and natural language processing. The module focuses on the theory and practice of processing text data at the lexical, syntactic, and semantic levels.
This module also encourages students to develop a set of professional skills, such as problem solving, creativity, technical report writing, organization and time management, self-reflection, software design and development, end-user awareness, action planning and decision making, commercial awareness, critical analysis of published literature, and value of diversity.
Assessable learning outcomes:
By the end of this module, students should be able to:
- Understand and apply the fundamental principles of text mining and natural language processing;
- Apply methods and algorithms to process different types of textual data;
- Empirically evaluate the performance of methods and algorithms using accuracy and efficiency metrics;
- Apply analytical and programming skills using existing NLP methods and tools such as NLTK and scikit-learn (Python).
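As an illustration of the evaluation outcome above, the following is a minimal sketch of how accuracy, precision, recall, and F1 might be computed for a classifier's output, together with a simple wall-clock timing as an efficiency measure. The function names `evaluate` and `timed` and the label lists are hypothetical and are not part of the module material; the standard library is used here only to keep the sketch self-contained, and in practice a library such as scikit-learn offers equivalent metric functions.

```python
import time
from collections import Counter


def evaluate(y_true, y_pred, positive):
    """Return accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    # Count each (gold label, predicted label) combination once.
    pairs = Counter(zip(y_true, y_pred))
    tp = sum(n for (t, p), n in pairs.items() if t == positive and p == positive)
    fp = sum(n for (t, p), n in pairs.items() if t != positive and p == positive)
    fn = sum(n for (t, p), n in pairs.items() if t == positive and p != positive)
    correct = sum(n for (t, p), n in pairs.items() if t == p)
    accuracy = correct / len(y_true) if y_true else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical gold labels and predictions, for illustration only.
gold = ["pos", "pos", "neg", "neg", "pos", "neg"]
pred = ["pos", "neg", "neg", "pos", "pos", "neg"]
scores, seconds = timed(evaluate, gold, pred, "pos")
print(scores, f"computed in {seconds:.6f} s")
```

Reporting precision and recall alongside accuracy is a common choice when classes are imbalanced, since accuracy alone can look high for a classifier that ignores the minority class.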
Additional outcomes:
This module will provide an overview of the field of Text Mining and NLP and its sub-areas, and will introduce and explain its key techniques, including their applicability and limitations. Topics covered will include:
- Regular expressions and text normalization
- N-grams and language models; part-of-speech tagging
- Lexical semantics, word senses, and WordNet
- Syntactic and semantic parsing
- Text classification and sentiment analysis
- Information extraction, including named entity recognition and relation extraction
- Advanced topics: machine learning for NLP, word embeddings, hidden Markov models, and the Viterbi algorithm
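To give a flavour of the first two topics in the list above, here is a minimal sketch that normalizes text with a regular expression and then estimates bigram probabilities with add-one (Laplace) smoothing. The function names, the tokenization pattern, and the three-sentence corpus are assumptions made for illustration; they are not taken from the module material, and a real exercise would more likely use a toolkit such as NLTK.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and return its word tokens."""
    # Letters and digits, optionally followed by an apostrophe suffix ("don't").
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def train_bigram(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + normalize(sentence) + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


# Hypothetical toy corpus, for illustration only.
corpus = ["The cat sat.", "The cat ran!", "A dog sat."]
uni, bi = train_bigram(corpus)

print(normalize("Don't PANIC, it's 2023!"))
print(bigram_prob("the", "cat", uni, bi))  # seen bigram
print(bigram_prob("the", "dog", uni, bi))  # unseen bigram, non-zero after smoothing
```

Smoothing matters because an unsmoothed model assigns zero probability to any bigram missing from the training data, which makes whole sentences containing it impossible under the model.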
Outline content:
Brief description of teaching and learning methods:
The course material is introduced in lectures and applied in lab practical sessions. The lab work supports students in developing high-fidelity prototypes from concepts and storyboards, and in planning their evaluation.
Activity | Hours |
Lectures | 16 |
Practical classes and workshops | 4 |
Guided independent study: | |
Wider reading (independent) | 5 |
Wider reading (directed) | 5 |
Exam revision/preparation | 20 |
Advance preparation for classes | 3 |
Preparation of practical report | 5 |
Completion of formative assessment tasks | 30 |
Revision and preparation | 10 |
Reflection | 2 |
Total hours for module | 100 |
Method | Percentage |
Written exam | 50 |
Set exercise | 50 |
Summative assessment - Examinations:
One 1.5-hour examination paper in May/June.
Summative assessment - Coursework and in-class tests:
An individual assignment.
Formative assessment methods:
Students will receive formative feedback in tutorial sessions to help them prepare the coursework.
Penalties for late submission:
The Support Centres will apply the following penalties for work submitted late:
- where the piece of work is submitted after the original deadline (or any formally agreed extension to the deadline): 10% of the total marks available for that piece of work will be deducted from the mark for each working day (or part thereof) following the deadline up to a total of five working days;
- where the piece of work is submitted more than five working days after the original deadline (or any formally agreed extension to the deadline): a mark of zero will be recorded.
You are strongly advised to ensure that coursework is submitted by the relevant deadline. You should note that it is advisable to submit work in an unfinished state rather than to fail to submit any work.
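The late-submission rule above can be sketched as a small calculation. This is an illustration under stated assumptions, not an official calculator: the function name `penalised_mark` is hypothetical, the caller is assumed to supply the lateness already counted in working days, and flooring the result at zero is an assumption that the rule itself does not state.

```python
import math


def penalised_mark(raw_mark, working_days_late, total_marks=100):
    """Apply the late penalty described above to a raw coursework mark.

    working_days_late is measured from the deadline (or agreed extension);
    part of a working day counts as a whole day.
    """
    if working_days_late <= 0:
        return raw_mark  # submitted on time, no penalty
    if working_days_late > 5:
        return 0  # more than five working days late: mark of zero
    days_charged = math.ceil(working_days_late)
    # 10% of the total marks available per working day charged.
    deduction = total_marks * days_charged / 10
    # Assumption: the mark is not allowed to go below zero.
    return max(0, raw_mark - deduction)


print(penalised_mark(65, 0))    # on time
print(penalised_mark(65, 0.5))  # part of one working day late
print(penalised_mark(65, 2))    # two working days late
print(penalised_mark(65, 6))    # more than five working days late
```

For a piece of work marked out of 100 with a raw mark of 65, two working days of lateness remove 20 marks, which is enough to turn a clear pass into a borderline one.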
Assessment requirements for a pass:
A mark of 40% overall.
Reassessment arrangements:
One 2-hour examination paper in August/September. Note that the resit module mark will be the higher of (a) the mark from this resit exam and (b) an average of this resit exam mark and previous coursework marks, weighted as per the first attempt (50% exam, 50% coursework).
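The resit rule above amounts to taking the larger of two numbers, which the following minimal sketch illustrates. The function name `resit_module_mark` and the example marks are hypothetical, and marks are assumed to be percentages.

```python
def resit_module_mark(resit_exam, coursework, exam_weight=0.5):
    """Return the higher of (a) the resit exam mark alone and
    (b) the resit exam and previous coursework marks, weighted as per
    the first attempt (50% exam, 50% coursework by default)."""
    combined = exam_weight * resit_exam + (1 - exam_weight) * coursework
    return max(resit_exam, combined)


# Hypothetical marks, for illustration only.
print(resit_module_mark(50, 30))  # exam alone is higher
print(resit_module_mark(40, 70))  # weighted average is higher
```

In the first case a weak coursework mark does not drag the resit result down; in the second a strong coursework mark still counts towards the final module mark.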
Additional Costs (specified where applicable):
1) Required textbooks: None
2) Specialist equipment or materials: None
3) Specialist clothing, footwear or headgear: None
4) Printing and binding: None
5) Computers and devices with a particular specification: None
6) Travel, accommodation and subsistence: None
Last updated: 22 February 2023
THE INFORMATION CONTAINED IN THIS MODULE DESCRIPTION DOES NOT FORM ANY PART OF A STUDENT'S CONTRACT.