While at it, we will fix the included parsers, as I was playing with them and encountered some errors.
First, the parsers. When I tried to run them, I ran into a grave error insisting that some parsers do not exist. The cause was an only partially complete wordnet_app.py. I am not sure why; it might be some git bug, or it might be Windows-specific. However, simply copying the missing lines from the 2.x nltk branch fixes it (only one print function has to be modified). The complete file will be in the download link at the end.
My Python environment here caused a new error to occur, relating to tkinter, Python's standard GUI package. Try entering

    import tkinter
    tkinter._test()

If this results in an error, add your tkinter path to your Python configuration. Simply run

    import sys
    sys.path.append('C:\\Program Files (x86)\\Python3\\Lib\\tkinter')

Change it to your Python path, of course.
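One pitfall when adapting a Windows path like the one above: in a normal Python string, a single backslash followed by `t` is read as a tab character, so every backslash has to be doubled or the string has to be written as a raw string. A minimal sketch, using the install location from the text purely as a placeholder:

```python
# Placeholder install location; replace it with your own Python path.
broken = 'C:\\Program Files (x86)\\Python3\\Lib\tkinter'    # "\t" turns into a tab
escaped = 'C:\\Program Files (x86)\\Python3\\Lib\\tkinter'  # every backslash doubled
raw = r'C:\Program Files (x86)\Python3\Lib\tkinter'         # raw string, no escapes

print('\t' in broken)   # the broken variant really contains a tab
print(escaped == raw)   # both safe spellings name the same path
```

Either of the two safe spellings can be passed to sys.path.append().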
Now on to the parsers.
Chartparser nltk.app.chartparser() in app/chartparser_app.py.
Edit the font calls on lines #976 - #982:

    self._boldfont = tkinter.font.Font(family='helvetica', weight='bold',
                                       size=self._fontsize)
    self._font = tkinter.font.Font(family='helvetica',
                                   size=self._fontsize)
    # See: <http://www.astro.washington.edu/owen/ROTKFolklore.html>
    self._sysfont = tkinter.font.Font(font=tkinter.Button()["font"])
    root.option_add("*Font", self._sysfont)

The same for lines #1696 - #1706:

    self._sysfont = tkinter.font.Font(font=tkinter.Button()["font"])
    root.option_add("*Font", self._sysfont)

    # What's our font size (default=same as sysfont)
    self._size = tkinter.IntVar(root)
    self._size.set(self._sysfont.cget('size'))

    self._boldfont = tkinter.font.Font(family='helvetica', weight='bold',
                                       size=self._size.get())
    self._font = tkinter.font.Font(family='helvetica',
                                   size=self._size.get())

Next is the string joining, which should have been handled by 2to3.
Edit line #1115 to look like this:

    rhs = ' '.join(rhselts)

Line #1213:

    rhs1 = ' '.join(rhs[:pos])
    rhs2 = ' '.join(rhs[pos:])

Line #2075:

    sentence = ' '.join(self._tokens)

Last change here. Weirdly, line #939 is commented out. Simply activate it again:

    self._init_fonts(root)
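For anyone wondering what the string-joining edits amount to, here is a small sketch. It assumes the Python 2 code called the join function from the string module, which no longer exists in Python 3; the original lines are not shown in this post, so treat that as a guess. The list contents are made up for illustration.

```python
rhselts = ["NP", "VP", "PP"]  # made-up right-hand-side symbols
pos = 1                        # made-up dot position

# Python 3 style: join is a method of the separator string.
rhs = ' '.join(rhselts)
rhs1 = ' '.join(rhselts[:pos])
rhs2 = ' '.join(rhselts[pos:])

print(rhs)
print(rhs1, '|', rhs2)
```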
Chunkparser nltk.app.chunkparser() in app/chunkparser_app.py.
Same again: change the tkFont calls and the int conversion in lines #371 - #376:

    self._size = IntVar(top)
    self._size.set(20)
    self._font = tkinter.font.Font(family='helvetica',
                                   size=-self._size.get())
    self._smallfont = tkinter.font.Font(family='helvetica',
                                        size=-int((self._size.get()*14/20)))

To fix a couple of integer errors, now edit chunk/regexp.py, line #132:

    for i in range(int(1+len(brackets)/5000)):
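The reason for the int() wrapper is that `/` always returns a float in Python 3, and range() refuses floats. The sketch below shows the failure and two ways around it; the `brackets` value is a made-up stand-in, not data from nltk, and floor division with `//` is an alternative I am suggesting, not what the patch above uses.

```python
brackets = "{}" * 6000  # made-up stand-in for a long bracket string

true_div = 1 + len(brackets) / 5000       # a float in Python 3
patched = int(1 + len(brackets) / 5000)   # the fix used in the patch above
floor_div = 1 + len(brackets) // 5000     # alternative: floor division stays an int

try:
    range(true_div)
    float_accepted = True
except TypeError:
    float_accepted = False                # range() rejects float arguments

print(patched, floor_div, float_accepted)
```

For non-negative lengths like these, both variants give the same count.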
Collocations in app/collocations_app.py.
The trend should be obvious now: simply change every occurrence of

    tkFont.Font

into

    tkinter.font.Font

The next change is somewhat weird. Change line #195:

    def next(self):

The next error took a while to figure out (thanks, nltk error handling).
Edit corpus/reader/api.py and change line #309 into:

    try:
        file_id, categories = line.decode().split(self._delimiter, 1)
    except:
        file_id, categories = line.split(self._delimiter, 1)
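The try/except above copes with lines that arrive either as bytes or as str. The same idea can be written with an explicit type check instead of catching the exception; this is only a sketch, and the function name, the default delimiter and the sample line are made up for illustration.

```python
def split_category_line(line, delimiter=" "):
    """Split 'file_id<delimiter>categories', accepting bytes or str input."""
    if isinstance(line, bytes):
        line = line.decode()  # bytes.decode() defaults to UTF-8
    file_id, categories = line.split(delimiter, 1)
    return file_id, categories

print(split_category_line(b"ca01 news"))
print(split_category_line("ca01 news"))
```

An explicit check has the advantage that unrelated errors are not silently swallowed by a bare except.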
Concordance in app/concordance_app.py.
Change every occurrence of

    tkFont.Font

into

    tkinter.font.Font

Change line #262 to:

    def next(self):
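Since the tkFont.Font replacement comes up in several of these files, it can also be scripted. The following is a rough sketch, not part of nltk; the helper names are made up, and you should run it on a copy of the files first.

```python
from pathlib import Path

def port_tkfont_calls(source: str) -> str:
    """Return source text with tkFont.Font renamed to tkinter.font.Font."""
    return source.replace("tkFont.Font", "tkinter.font.Font")

def port_file(path: Path) -> bool:
    """Rewrite one file in place; return True if anything changed."""
    original = path.read_text(encoding="utf-8")
    updated = port_tkfont_calls(original)
    if updated != original:
        path.write_text(updated, encoding="utf-8")
        return True
    return False

print(port_tkfont_calls("self._font = tkFont.Font(family='helvetica')"))
```

Keep in mind that a plain `import tkinter` does not load the tkinter.font submodule by itself; if nothing else has imported it already, an explicit `import tkinter.font` is needed as well.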
RDParser nltk.app.rdparser() in app/rdparser_app.py.
Add the following import after line #69:

    import tkinter

Now the font calls for tkinter have to be changed. Edit lines #140 - #147 to look like this:

    self._boldfont = tkinter.font.Font(family='helvetica', weight='bold',
                                       size=self._size.get())
    self._font = tkinter.font.Font(family='helvetica',
                                   size=self._size.get())
    if self._size.get() < 0: big = self._size.get()-2
    else: big = self._size.get()+2
    self._bigfont = tkinter.font.Font(family='helvetica', weight='bold',
                                      size=big)
SRParser nltk.app.srparser() in app/srparser_app.py.
Change every occurrence of

    tkFont.Font

into

    tkinter.font.Font

Now open up draw/util.py.
Change line #1794:

    for x in range(left, right-w, int((right-left-w)/10)):

Change line #1798:

    for y in range(top, bot-h, int((bot-top-h)/10)):

Change line #1799:

    for x in range(left, right-w, int((right-left-w)/10)):
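One thing to watch with these loops: the third argument of range() is the step, and a step of 0 raises a ValueError. With int((right-left-w)/10) that happens whenever the available space is less than 10 units. The sketch below guards against this; the function name and the max(..., 1) guard are my own additions, not something found in draw/util.py.

```python
def grid_steps(start, stop, width, parts=10):
    """Integer positions from start up to stop - width, in roughly `parts` strides."""
    step = (stop - start - width) // parts  # floor division keeps the step an int
    step = max(step, 1)                     # range() raises ValueError for step 0
    return list(range(start, stop - width, step))

print(grid_steps(0, 110, 10))  # plenty of room: ten positions
print(grid_steps(0, 15, 10))   # cramped: falls back to a step of 1
```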
Wordnet.
Not sure yet, I had trouble getting the 2.x version to run as well, so we will come back here later.
Finally, on to chapter three.
Quite a lot works instantly now.

    >>> sents = sent_tokenizer.tokenize(text)
      File "...\lib\site-packages\nltk\tokenize\punkt.py", line 1150, in _slices_from_text
        for match in self._lang_vars.period_context_re().finditer(text):
    TypeError: can't use a string pattern on a bytes-like object

In punkt.py, line #1150, convert text to a string object. Add before the offending line:

    try: text = text.decode('utf-8','ignore')
    except: pass

And that should be chapter three.
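To see why the decode helps, the TypeError can be reproduced outside nltk with nothing but the re module: a pattern compiled from a str cannot be applied to bytes. The pattern and the sample text below are made up for the demonstration and have nothing to do with punkt's real regular expressions.

```python
import re

pattern = re.compile(r"\w+\.")   # a str pattern (made up for this demo)
raw = b"Hello world. Bye."       # e.g. text that was read in binary mode

try:
    pattern.findall(raw)
    bytes_ok = True
except TypeError:
    bytes_ok = False             # str pattern on bytes is rejected

text = raw.decode("utf-8", "ignore")  # same conversion as in the patch
matches = pattern.findall(text)

print(bytes_ok, matches)
```

Decoding the text once, right after reading it, would avoid having to patch punkt.py at all.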