Having finished working through the NLTK book, I have finally started working more systematically. I am currently setting up a number of automated tests (the official ones seem to be quite outdated, at least according to the source and the issue tracker) and working through the classes of issues they have identified so far.
I also cleaned up some of the weird fixes I made last time (a general except clause, a loop inside a try block, ...). The next step was to find every file reader that reads bytes and change it into a string (Unicode) reader.
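A minimal sketch of what that change amounts to, assuming UTF-8 encoded files. The function names are hypothetical and are not NLTK's actual reader classes; the point is only the difference between binary and text mode in Python 3:

```python
import os
import tempfile


def read_lines_bytes(path):
    # Hypothetical "old style" reader: binary mode yields bytes in Python 3, not str
    with open(path, "rb") as f:
        return f.read().splitlines()


def read_lines_text(path, encoding="utf-8"):
    # Hypothetical "fixed" reader: text mode decodes to str (Unicode);
    # the UTF-8 default is an assumption about the input files
    with open(path, "r", encoding=encoding) as f:
        return f.read().splitlines()


# Demo: write a small UTF-8 file containing a non-ASCII word
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("café\ntea\n".encode("utf-8"))

byte_lines = read_lines_bytes(path)  # [b'caf\xc3\xa9', b'tea']
text_lines = read_lines_text(path)   # ['café', 'tea']
os.remove(path)

# str and bytes never compare equal in Python 3, so mixed code silently misbehaves
print(byte_lines[1] == "tea")  # False
print(text_lines[1] == "tea")  # True
```

The silent False in the last comparison is what makes these bugs annoying: nothing crashes, lookups just stop matching.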
Everything in chapter two should now work, including the stop words examples (fixed now) and the toolbox (fixed earlier in another chapter).
So far, errors are mainly caused by:
- String/bytes/Unicode
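- Integer division returning a float instead of a floored integer
- Differences between iterable objects and lists (many functions now return iterators or views instead of lists)
- Comparison no longer working with non-comparable objects (especially when sorting lists)
- Renamed Tkinter modules (e.g. Tkinter is now tkinter)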
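Most of these differences can be reproduced in isolation. The snippet below is a standalone sketch in plain Python 3, not code taken from NLTK; the variable names and data are made up for illustration:

```python
# Division: "/" is true division in Python 3; "//" gives the old floored result
total, count = 7, 2
true_avg = total / count    # 3.5 (Python 2 gave 3 for two ints)
floor_avg = total // count  # 3

# Iterables: dict.keys() returns a view and map() a one-shot iterator
freq = {"the": 3, "cat": 1, "sat": 1}
keys = freq.keys()          # no .sort() method on a view
key_list = sorted(freq)     # build a real list instead: ['cat', 'sat', 'the']
upper = map(str.upper, key_list)
first_pass = list(upper)    # ['CAT', 'SAT', 'THE']
second_pass = list(upper)   # [] because the iterator is already exhausted

# Comparison: ordering unrelated types raises TypeError in Python 3
mixed = [3, "b", 1, "a"]
try:
    sorted(mixed)
    mixed_sort_failed = False
except TypeError:
    mixed_sort_failed = True

# One possible workaround: sort on an explicit key that groups values by type name
fixed = sorted(mixed, key=lambda x: (type(x).__name__, x))  # [1, 3, 'a', 'b']

# tkinter: the module was renamed (other Tk modules moved too,
# e.g. tkMessageBox became tkinter.messagebox)
try:
    import tkinter  # Python 3 name
except ImportError:
    try:
        import Tkinter as tkinter  # Python 2 name
    except ImportError:
        tkinter = None  # no Tk bindings installed at all
```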
Here are the changed files: nltk_rev_1_changes.zip
Here is the complete source: nltk_rev_1_complete.zip
Here is a Windows installer (32-bit): nltk_rev_1_win32_installer.msi