CS3TM20 - Text Mining and Natural Language Processing

Module Provider: Computer Science
Number of credits: 10 [5 ECTS credits]
Level: 6
Terms in which taught: Spring term module
Pre-requisites:
Non-modular pre-requisites:
Co-requisites:
Modules excluded:
Current from: 2023/4

Module Convenor: Prof Xia Hong
Email: x.hong@reading.ac.uk

Type of module:

Summary module description:

This module introduces both the theory and practice of Text Mining and Natural Language Processing (NLP).


Aims:

The aim of this module is to introduce the field of text mining and natural language processing. The module places particular emphasis on the theory and practice of processing text data at the lexical, syntactic and semantic levels.



This module also encourages students to develop a range of professional skills, such as problem solving, creativity, technical report writing, organization and time management, self-reflection, software design and development, end-user awareness, action planning and decision making, commercial awareness, critical analysis of published literature, and an appreciation of the value of diversity.


Assessable learning outcomes:

By the end of this module, students should be able to:




  • Understand and apply the fundamental principles of text mining and natural language processing;

  • Apply methods and algorithms to process different types of textual data;

  • Empirically evaluate the performance of methods and algorithms using accuracy and efficiency metrics; and

  • Apply analytical and programming skills using existing NLP methods and tools, such as NLTK and scikit-learn (Python).


Additional outcomes:



Outline content:

This module will provide an overview of the field of Text Mining and NLP and its sub-areas, and will introduce and explain its key techniques, including their applicability and limitations. Topics covered will include:




  • Regular expressions, text normalization

  • N-grams and language models, part-of-speech tagging

  • Lexical semantics, word senses and WordNet

  • Syntactic and semantic parsing

  • Text classification, sentiment analysis

  • Information extraction, including named entity recognition and relation extraction

  • Advanced topics: machine learning for NLP, word embeddings, hidden Markov models and the Viterbi algorithm


Brief description of teaching and learning methods:

The course material will be introduced through lectures and practicals, and the lecture material will be applied during lab practical sessions. The lab work will support students in developing high-fidelity prototypes by adopting the concepts and storyboards, and in planning for evaluation.


Contact hours:                              Autumn   Spring   Summer
  Lectures                                               16
  Practical classes and workshops                         4
  Guided independent study:
    Wider reading (independent)                           5
    Wider reading (directed)                              5
    Exam revision/preparation                            20
    Advance preparation for classes                       3
    Preparation of practical report                       5
    Completion of formative assessment tasks             30
    Revision and preparation                             10
    Reflection                                            2

Total hours by term                              0      100        0

Total hours for module                                  100

Summative Assessment Methods:
Method              Percentage
Written exam        50%
Set exercise        50%

Summative assessment - Examinations:

One 1.5-hour examination paper in May/June.


Summative assessment - Coursework and in-class tests:

An individual assignment.


Formative assessment methods:

Students will be provided with formative feedback towards preparation of the coursework in tutorial sessions.


Penalties for late submission:

The Support Centres will apply the following penalties for work submitted late:

  • where the piece of work is submitted after the original deadline (or any formally agreed extension to the deadline): 10% of the total marks available for that piece of work will be deducted from the mark for each working day (or part thereof) following the deadline up to a total of five working days;
  • where the piece of work is submitted more than five working days after the original deadline (or any formally agreed extension to the deadline): a mark of zero will be recorded.
The University policy statement on penalties for late submission can be found at: https://www.reading.ac.uk/cqsd/-/media/project/functions/cqsd/documents/cqsd-old-site-documents/penaltiesforlatesubmission.pdf
You are strongly advised to ensure that coursework is submitted by the relevant deadline. You should note that it is advisable to submit work in an unfinished state rather than to fail to submit any work.
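
To make the late-submission rule concrete, the following Python sketch computes a penalised mark. It is illustrative only and assumes that the mark and the total marks available are on the same scale, that any part of a working day counts as a full day (as stated above), and that a penalised mark cannot fall below zero (the policy does not say this explicitly). The function name late_penalty_mark is hypothetical and is not part of any University system.

```python
import math


def late_penalty_mark(raw_mark: float, total_available: float,
                      working_days_late: float) -> float:
    """Illustrative sketch of the late-submission penalty rule.

    working_days_late counts working days after the original deadline (or any
    formally agreed extension); 0 or less means the work was on time.
    """
    if working_days_late <= 0:
        return raw_mark  # on time: no penalty applied
    # "each working day (or part thereof)": a part day counts as a whole day
    days = math.ceil(working_days_late)
    if days > 5:
        return 0.0  # more than five working days late: a mark of zero
    # 10% of the total marks available is deducted for each working day late
    deduction = total_available * days / 10
    # Assumption (not stated in the policy): the mark cannot go below zero
    return max(raw_mark - deduction, 0.0)


print(late_penalty_mark(65, 100, 2))  # 45.0
```

For example, a piece of work marked 65 out of 100 and submitted two working days late would receive 65 - (2 × 10) = 45, while the same work submitted six working days late would receive zero.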

Assessment requirements for a pass:

A mark of 40% overall.
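
The pass requirement combines the two summative components using the weights listed under Summative Assessment Methods (written exam 50%, set exercise 50%). The Python sketch below illustrates that calculation, assuming both marks are percentages, that no rounding is applied, and that no separate threshold applies to either component (none is stated above). The names overall_mark and passes are hypothetical.

```python
EXAM_WEIGHT = 0.5        # Written exam: 50%
COURSEWORK_WEIGHT = 0.5  # Set exercise: 50%
PASS_MARK = 40.0         # Requirement for a pass: 40% overall


def overall_mark(exam: float, set_exercise: float) -> float:
    """Weighted overall module mark, with both inputs as percentages."""
    return EXAM_WEIGHT * exam + COURSEWORK_WEIGHT * set_exercise


def passes(exam: float, set_exercise: float) -> bool:
    """True if the overall mark meets the 40% pass requirement."""
    return overall_mark(exam, set_exercise) >= PASS_MARK


print(overall_mark(30, 50), passes(30, 50))  # 40.0 True
```

For example, an exam mark of 30% and a set exercise mark of 50% give an overall mark of 0.5 × 30 + 0.5 × 50 = 40%, which meets the pass requirement.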


Reassessment arrangements:

One 2-hour examination paper in August/September. Note that the resit module mark will be the higher of (a) the mark from this resit exam and (b) an average of this resit exam mark and previous coursework marks, weighted as per the first attempt (50% exam, 50% coursework). 
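
The resit rule amounts to taking the larger of two candidate marks. The Python sketch below expresses this, assuming a single previous coursework mark (the module has one individual assignment) and percentage marks without rounding. The name resit_module_mark is hypothetical.

```python
def resit_module_mark(resit_exam: float, coursework: float) -> float:
    """Illustrative sketch of the resit rule, with marks as percentages."""
    # (b): resit exam and previous coursework, weighted as per the first
    # attempt (50% exam, 50% coursework)
    weighted = 0.5 * resit_exam + 0.5 * coursework
    # The module mark is the higher of (a) the resit exam mark alone and (b)
    return max(float(resit_exam), weighted)


print(resit_module_mark(50, 70))  # 60.0
print(resit_module_mark(60, 20))  # 60.0
```

For example, a resit exam mark of 50% with a previous coursework mark of 70% gives max(50, 0.5 × 50 + 0.5 × 70) = max(50, 60) = 60%, whereas a resit exam mark of 60% with coursework of 20% keeps the resit exam mark of 60%.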


Additional Costs (specified where applicable):

1) Required text books:  None

2) Specialist equipment or materials:  None

3) Specialist clothing, footwear or headgear:  None

4) Printing and binding:  None

5) Computers and devices with a particular specification:  None

6) Travel, accommodation and subsistence:  None


Last updated: 30 March 2023

THE INFORMATION CONTAINED IN THIS MODULE DESCRIPTION DOES NOT FORM ANY PART OF A STUDENT'S CONTRACT.
