
2024/2025  KAN-CDSCO1002U  Natural Language Processing and Text Analytics

English Title
Natural Language Processing and Text Analytics

Course information

Language: English
Course ECTS: 7.5
Type: Mandatory (also offered as elective)
Level: Full Degree Master
Duration: One Semester
Start time of the course: Spring
Timetable: Course schedule will be posted at calendar.cbs.dk
Study board
Master of Science (MSc) in Business Administration and Data Science
Course coordinator
  • Rajani Singh - Department of Digitalisation (DIGI)
Main academic disciplines
  • Information technology
  • Statistics and quantitative methods
Teaching methods
  • Blended learning
Last updated on 14-05-2024

Learning objectives
To achieve the grade 12, students should meet the following learning objectives with no or only minor mistakes or errors:
  • Characterize the phenomena of text analytics and Natural Language Processing
  • Summarize different fundamental concepts, techniques and methods of Natural Language Processing
  • Analyze and apply different text analytics techniques for big/business datasets in organizational contexts
  • Understand the linkages between business intelligence and text analytics and the potential benefits for organizations
  • Summarize the application areas, trends, and challenges in text analysis
  • Exhibit deeper knowledge and understanding of the topics in the project; the report should reflect critical awareness of the methodological choices and be written to accepted academic standards
Course prerequisites
This course requires a fundamental understanding of programming in Python, as achieved in, or comparable to, "Foundations of Data Science: Programming and Linear Algebra" in the 1st semester of CM (data science).
Additionally, a fundamental understanding of probability concepts such as the independence assumption, conditional probability, Bayes' theorem, the chain rule, and the Markov assumption would be advantageous.
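For quick reference, the probability concepts named above can be written as below. The notation is an assumption made here for illustration and is not taken from the course material: A and B are events, w_1, ..., w_n is a word sequence, and the Markov assumption is shown in its first-order form.

```latex
\begin{align*}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)} && \text{(conditional probability)} \\
P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)} && \text{(Bayes' theorem)} \\
P(A \cap B) &= P(A)\,P(B) && \text{(independence)} \\
P(w_1, \dots, w_n) &= \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1}) && \text{(chain rule)} \\
P(w_i \mid w_1, \dots, w_{i-1}) &\approx P(w_i \mid w_{i-1}) && \text{(first-order Markov assumption)}
\end{align*}
```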
Prerequisites for registering for the exam (activities during the teaching period)
Number of compulsory activities which must be approved (see section 13 of the Programme Regulations): 2
Compulsory home assignments
Each assignment is 3-5 pages long and is completed in groups of 1-4 students.
The students must have 2 out of 3 assignments approved to qualify for the final exam.

No additional attempts will be offered to students before the ordinary exam. However, if a student is unable to submit an assignment due to a documented illness, or if a student does not have the assignments approved despite making a genuine effort, then the student will be granted one extra attempt before the re-exam. Before the re-exam, there will be one home assignment (max. 10 pages) which will cover 2 mandatory assignments.
Examination
Natural Language Processing and Text Analytics:
Exam ECTS: 7.5
Examination form: Oral exam based on written product

In order to participate in the oral exam, the written product must be handed in before the oral exam, by the set deadline. The grade is based on an overall assessment of the written product and the individual oral performance; see also the rules about examination forms in the programme regulations.
Individual or group exam: Individual oral exam based on written group product
Number of people in the group: 2-4
Size of written product: Max. 15 pages
Assignment type: Project
Release of assignment: An assigned subject is released in class
Duration
Written product to be submitted on specified date and time.
20 min. per student, including the examiners' discussion of the grade and the announcement and explanation of the grade
Grading scale: 7-point grading scale
Examiner(s): Internal examiner and external examiner
Exam period: Summer
Make-up exam/re-exam
Same examination form as the ordinary exam
Students can submit the same project or they can choose to submit a revised project.
Course content, structure and pedagogical approach

The course provides knowledge of various concepts, techniques, and methods related to text analytics. It introduces:

  • Basics of Natural Language Processing (NLP) such as part-of-speech (POS) tagging and named-entity recognition
  • Language Modeling using N-grams
  • Text classification with methods such as Naïve Bayes and logistic regression
  • Lexicon Based Sentiment Analysis
  • Unsupervised methods for NLP and latent models
  • Word embeddings and word vectors
  • Neural Networks for NLP and Neural Language Models
  • Semantic textual similarity
  • Word-sense disambiguation
  • Text summarization
  • Deep Learning Models for NLP

Furthermore, the course gives students practical hands-on experience with text analytics using open-source machine learning libraries in Python, such as scikit-learn, Natural Language Toolkit (NLTK), spaCy, and Gensim. After completing the course, students will be able to apply various NLP techniques such as text classification, sentiment analysis, and topic modelling to textual documents and text corpora.
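As an illustration of the kind of technique listed above, the following is a minimal sketch of a Naïve Bayes text classifier with add-one smoothing. It is not course material: it deliberately uses only the Python standard library rather than the libraries named above, the whitespace tokenizer is a simplification, and the training sentences and function names are hypothetical, invented for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labelled_docs):
    """Collect per-class document counts and word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_docs:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary


def predict(text, model):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data, invented for illustration only.
TRAINING_DOCS = [
    ("great product fast delivery", "positive"),
    ("excellent service great support", "positive"),
    ("terrible product slow delivery", "negative"),
    ("poor service terrible support", "negative"),
]

model = train_naive_bayes(TRAINING_DOCS)
prediction = predict("great support", model)
print(prediction)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied. In practice a library implementation, for example from scikit-learn, would typically replace this hand-written version.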

Description of the teaching methods
The course consists of lectures, exercises, and assignments. Each lecture is followed by an exercise session, and there will be a teaching assistant providing technical support for assignments and course projects.

Students apply the presented theories, concepts, and methods in practice during the exercise sessions. Throughout the semester, students work on a mini project that demonstrates their understanding of the concepts presented in the lectures and exercises. CBS Canvas is used for sharing documents, slides, exercises, etc., as well as for interactive lessons if applicable.
Feedback during the teaching period
In this course, feedback to the students will be provided in the following ways:

1) During the hands-on exercises following each lecture, the teacher and the instructors will help students solve the exercises and give feedback on their work.

2) At the end of each exercise session, we will go through the solutions, discuss alternative techniques and methods for solving the exercises, and answer any questions from the students.

3) Feedback on the mandatory assignments will be provided to students as part of the grading for the mandatory assignments. Since the mandatory assignments are at the group level, the students will receive collective feedback on their group submission.

Student workload
Lectures: 24 hours
Exercises: 24 hours
Preparation for class: 48 hours
Project work & report: 100 hours
Exam and preparation: 10 hours
Total: 206 hours
Expected literature

The literature may change before the semester starts. Students are advised to check the final literature list on Canvas before buying the books.

Textbooks:

  • Jurafsky, D., & Martin, J. H. (2014). Speech and language processing (2nd ed., Pearson international ed.). Prentice Hall, Pearson Education International.

  • Manning, C. D., & Schütze, H. (2003). Foundations of statistical natural language processing (6th printing with corrections). MIT Press.


Notes, articles, chapters and webpages will be handed out/made available during the course.
