I learned quite a lot about the nltk source code, so from now on I will take a more systematic approach, though I probably still won't change the stream readers for now. I found the official book somewhat lacking: I did not want a general Python tutorial, and I prefer a more consistent introduction to language processing than a book apparently aimed at students offers. So I finally visited the nearest library. I will now test my changes to the source on the examples and exercises found in the following books (in no particular order):
- McNeil, J. (2010): Python 2.6 Text Processing. The easiest way to learn how to manipulate text with Python.
- Perkins, J. (2010): Python Text Processing with NLTK 2.0 Cookbook.
Should I find the time, I will accompany the simple code changes with some snippets I am working on. I am thinking mainly about implementing parallel NLP tasks to finally apply my basic knowledge of Python multiprocessing and/or MapReduce/Hadoop.
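To give an idea of the kind of snippet I have in mind, here is a minimal sketch of a MapReduce-style word count using only the standard library's multiprocessing module. The function names (count_tokens, merge_counts, parallel_word_count) are made up for this illustration, and the whitespace split is only a stand-in for a real nltk tokenizer, so treat it as a rough outline and not as finished code.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool


def count_tokens(text):
    """Map step: count lowercase, whitespace-separated tokens in one document.

    Assumption: str.split() stands in for a proper tokenizer here.
    """
    return Counter(text.lower().split())


def merge_counts(left, right):
    """Reduce step: combine two partial counts into a new Counter."""
    return left + right


def parallel_word_count(texts, processes=2):
    """Count tokens across all texts, handing each document to a worker process."""
    with Pool(processes=processes) as pool:
        partial_counts = pool.map(count_tokens, texts)
    # Merge the per-document counts in the parent process.
    return reduce(merge_counts, partial_counts, Counter())


if __name__ == "__main__":
    # Hypothetical toy documents, just to exercise the functions.
    documents = [
        "the cat sat on the mat",
        "the dog chased the cat",
    ]
    print(parallel_word_count(documents).most_common(2))
```

The map step runs in the worker processes and the merge happens in the parent, which mirrors the MapReduce split on a single machine. For a real corpus, the per-process start-up and pickling overhead would probably only pay off with much larger documents than these.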
Here are the changed files: (see here)
Here is the complete nltk source with all changes: (see here)
Here is a Windows installer for nltk under Python 3 (x86): (see here)
Here is a short but complete list of all changes made to the nltk source: (TBD)
All parts from this post series:
- Introduction and overview
- nltk chapter 1: Language Processing and Python
- nltk chapter 2: Accessing Text Corpora and Lexical Resources
- nltk chapter 3: Processing Raw Text
- nltk chapter 4: Writing Structured Programs
- nltk chapter 5: Categorizing and Tagging Words
- nltk chapter 6: Learning to Classify Text
- nltk chapter 7: Extracting Information from Text
- nltk chapter 8: Analyzing Sentence Structure
- nltk chapter 9: Building Feature Based Grammars
- nltk chapter 10: Analyzing the Meaning of Sentences
- nltk chapter 11: Managing Linguistic Data