NLTK book                                              Mining the Social Web


Class Information:

CMPS 143 - Spring 2017: Introduction to Natural Language Processing

Schedule: Tuesday-Thursday, 1:30-3:05pm

Location: PhysSciences 114


Lab Sections:

  • Wed 5:00 - 7:00 PM
  • Fri 3:00 - 5:00 PM
  • Tues 7:00 - 9:00 PM

Location: MingOng Computer Lab 103


Daniel Hardt


Office Hours: Tue, Thur, 10-11am, E2 247B






Geetanjali Rakshit



Course Description:


This class introduces advanced undergraduates to the theory and practice of Natural Language Processing. We will focus on NLP programming for processing and generation of narratively structured text, such as classic stories like Aesop's Fables as well as personal narratives that can be mined on the web. CMPS 143 provides a combination of homeworks and exams targeted at learning the basics of NLP using the NLTK toolkit and other publicly available software.  Previous experience with Python is a prerequisite.




Natural Language Processing with Python. Available electronically and from the bookstore. Henceforth referred to as NLPP.

We will be using NLTK 3.0 and the updated version of the online book that corresponds to it. The version of the book in the bookstore is slightly out of date with respect to what is on the web.


Additional resources:

Speech and Language Processing. Jurafsky and Martin. Coursera online lectures and parts of the book are available online. Has some useful stuff for getting data off the web.


Grading Policy:


Attendance: 5%
Homeworks and discussion in class: 45%
Project (assignments that include project, and final presentation of project during Finals slot): 25%
Midterm: 25%
Final: 25%
Homework Delivery: Turn it in on eCommons assignments. Please include any code, files, and written documents in a zip file. Written documents should be plain text or PDF only. Multiple uploads (to overwrite) are enabled. Late HW accepted until noon the next day with a 10% penalty.


Special Accommodations:


If you have special needs, we will accommodate you. The Disability Resource Center offers services that are confidential and free of charge.  After you contact the Disability Resource Center, bring your Accommodation Authorization form to me after class or during office hours and we will discuss your accommodations.


Student Responsibilities:


1. Students contact the DRC to determine their eligibility for accommodations. When approved by the DRC, they will receive their Accommodation Authorization form.


2. Students then notify their instructor during office hours or after class of their accommodations, and provide their instructor with their Accommodation Authorization form.


3. Please note that it is the student's responsibility to contact the instructor about their accommodations. If they do not contact their instructor, accommodations will not be made.


4. Students should submit their requests to faculty no later than 7 days before a regular exam and 14 days before a final exam.





Week 1: NLP Pipeline and Basic Text Processing with Python

NLTK Book Ch 1, Ch 2, Ch 3, Ch 5
Lecture 1: Overview of the course structure, NLP pipeline, Word Counts and frequency distributions


Lecture 2: Working with data in NLTK, Tokenization, Sentence segmentation, stemming, collocations, POS tagging, Introduction to lexical resources (WordNet)
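As a small illustration of the word-count and frequency-distribution topics from Lecture 1, here is a minimal sketch that uses only the Python standard library rather than NLTK. The sample sentence, the simple regex tokenizer, and the function names are illustrative assumptions, not course material.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_distribution(text):
    """Map each token to the number of times it occurs in the text."""
    return Counter(tokenize(text))

# Hypothetical sample text, loosely in the style of a fable.
sample = "The crow saw the pitcher. The pitcher held water."
freq = frequency_distribution(sample)

# Show the two most frequent tokens with their counts.
print(freq.most_common(2))
```

NLTK offers its own tokenizers and a frequency-distribution class for the same job; the sketch only shows the underlying counting idea.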





Week 2: What's Beyond Words, POS Tagging, More WordNet, Statistical NLP, Corpus-Based NLP, Language Models & N-Grams, Review of Probability and Regular Expressions

NLTK Book Ch 6

Lecture 3: More WordNet and NLTK API, lexical relations, semantic similarity, introducing statistical NLP, corpus-based approaches, review of probability & conditional probability




Lecture 4: Bayes' Theorem, Language Models, Markov assumption, N-grams, Maximum likelihood estimation, Regular expressions
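To make the Lecture 4 topics concrete, the following is a minimal sketch of a bigram language model under the Markov assumption, where the maximum likelihood estimate is P(word | prev) = count(prev, word) / count(prev). The toy token sequence and the function names are assumptions made for illustration, and no smoothing is applied.

```python
from collections import Counter

def bigram_mle(tokens):
    """Return a function that gives the MLE estimate of P(word | prev)."""
    # Every token except the last one serves as the context of a bigram.
    context_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens, tokens[1:]))

    def prob(prev, word):
        if context_counts[prev] == 0:
            return 0.0  # context never observed; no estimate is available
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob

# Hypothetical toy corpus.
tokens = "the fox saw the grapes and the fox left".split()
prob = bigram_mle(tokens)

# "the" occurs three times as a context and is followed by "fox" twice.
print(prob("the", "fox"))
```

Unseen bigrams receive probability zero under this estimate, which is the problem that smoothing addresses.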


Week 3: Natural Language Understanding I: Text Classification I, Using Sentiment Lexicons, Lexical Resources, Syntax and Parsing.

NLTK Book Ch 8
Lecture 5: More Regex, Uses of regex in NLP, Classifying text, Supervised classification, Getting labels, feature extraction, overfitting, Train/dev/test sets, Classifying with NLTK, extracting features from LIWC, Evaluation measures


Lecture 6: More classification for NLP, movie reviews classification example, feature selection methods, lexical resources, error analysis, Syntax and Parsing


Week 4: More Classification: Feature Analysis, Error Analysis. POS Tagging As A Classification Problem. Syntax and Semantics.


Lecture 7: HW2 Review in class, Naive Bayes Classifier, Bayesian Classification, maximum likelihood estimation, Smoothing, numerical stability, Labov's theory of narratives, Syntax and Parsing.
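The Naive Bayes, smoothing, and numerical stability topics from Lecture 7 can be sketched together as follows. This is a minimal illustration in plain Python, not NLTK's classifier: it assumes bag-of-words features, add-one (Laplace) smoothing, and summed log probabilities to avoid underflow. The four tiny training documents and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """Collect label counts, per-label word counts, and the vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def classify(tokens, model):
    """Return the label with the highest log posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Work in log space so that many small factors do not underflow.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocab:
                continue  # skip words never seen in training
            # Add-one smoothing keeps unseen (word, label) pairs above zero.
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data in the spirit of movie review classification.
training = [
    ("a great fun film".split(), "pos"),
    ("great acting".split(), "pos"),
    ("a dull boring film".split(), "neg"),
    ("boring plot".split(), "neg"),
]
model = train_naive_bayes(training)
print(classify("great fun plot".split(), model))
```

Without the smoothing term, the word "plot" would give the "pos" label a probability of zero, and taking its logarithm would fail.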




Lecture 8: Syntax and Semantics -- approaches to meaning.


Week 5: The Lexicon, Verbs And Their Subcategorization. Discourse & Narrative Meaning.


Lecture 9: HW3 review in class, Supervised and Unsupervised models, different approaches for automatic POS tagging, N-gram tagging, Introduction to SIG and Scheherazade, VerbNet.




Lecture 10 - April 28: Continue Parsing, syntactic ambiguity, treebank, Probabilistic CFG, Lexicalized PCFG, Dependency grammar, dependency parse, types of parsers, shift-reduce parsing, probabilistic dependency parsing, parsing as a classification problem, story intention graph, SIG encodings, Scheherazade annotation tool




Week 6: Midterm


Lecture 11: Midterm Review


Lecture 12: Midterm exam -- May 11




Week 7: Natural Language Understanding II: Chunking, Sentence Structure And Parsing, Natural Language Understanding For Q&A.


Lecture 13 - May 10: HW3 held-out results, Introduction to question-answering, types of questions, IR-based factoid QA, question processing, answer type taxonomy, answer type detection approaches, assignment 6 overview, evaluation metrics for QA, QA pipeline, Question reformulation, introduction to using syntax for QA




Lecture 14 - May 12: What is syntax, using syntactic representation for question-answering, chunking, constituency parse, dependency parse, data structure for using parse trees, manipulating constituency trees, reading dependency graphs, increasing precision using syntax, Stanford parser dependency structure, HW6 stub code demo




Week 8: Question Answering II: Working With NLU Representations For Q&A.




Week 9: Question Answering III: Lexicons & Lexical Semantics for Q&A.




Week 10: Question Answering Competition & Final Exam In Class Slot.

Final Exam: June 8 (normal class period)

Q/A System Presentations: June 14, 12noon-3pm