Sunday, December 18, 2011

Using nltk with Python 3 (overview)

I am finally done getting at least the code from the official nltk book to work under Python 3. Apart from two things that do not work yet (I will cover them later; they might be due to changes in the nltk code base), it runs flawlessly.

Along the way I learned quite a lot about the nltk source code, so from now on I will take a more systematic approach, although I probably still won't change the stream readers for now. I found the official book somewhat lacking. I did not want a general Python tutorial, and I prefer a more consistent introduction to language processing than a book apparently aimed at students offers. So I finally visited the nearest library. Now I will test my changes to the source against the examples and exercises in the following books (in no particular order):
  • McNeil, J. (2010): Python 2.6 Text Processing: The easiest way to learn how to manipulate text with Python.
  • Perkins, J. (2010): Python Text Processing with NLTK 2.0 Cookbook.

If I find the time, I will accompany the simple code changes with some snippets I am working on. I am mainly thinking about implementing parallel NLP tasks, so that I can finally put my basic knowledge of Python multiprocessing and/or MapReduce/Hadoop to use.


Here are the changed files: (see here)
Here is the complete nltk source with all changes: (see here)
Here is a Windows installer for nltk under Python 3 (x86): (see here)
Here is a short but complete list of all changes made to the nltk source: (TBD)

All parts of this post series:
  1. Introduction and overview
  2. nltk chapter 1: Language Processing and Python
  3. nltk chapter 2: Accessing Text Corpora and Lexical Resources
  4. nltk chapter 3: Processing Raw Text
  5. nltk chapter 4: Writing Structured Programs
  6. nltk chapter 5: Categorizing and Tagging Words
  7. nltk chapter 6: Learning to Classify Text
  8. nltk chapter 7: Extracting Information from Text
  9. nltk chapter 8: Analyzing Sentence Structure
  10. nltk chapter 9: Building Feature Based Grammars
  11. nltk chapter 10: Analyzing the Meaning of Sentences
  12. nltk chapter 11: Managing Linguistic Data
