Sunday, December 18, 2011

Using nltk with Python 3 (9)

Chapter 9: Building Feature Based Grammars

>>> nltk.data.show_cfg('grammars/book_grammars/feat0.fcfg')
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python322\lib\site-packages\nltk\data.py", line 646, in show_cfg
    if l.startswith(escape): continue
TypeError: startswith first arg must be bytes or a tuple of bytes, not str
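Nothing surprising: under Python 3 the line l arrives as bytes, while escape is a str, and startswith refuses to mix the two. Decode l first by adding the following before line 646 in data.py: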
try:
    l = l.decode()
except AttributeError:
    pass
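
The failure and the fix can be reproduced without nltk. This is only a minimal sketch: `to_text` is a hypothetical helper name used for illustration, not something nltk provides, and the sample line is made up.

```python
def to_text(line):
    """Return line as str, decoding it first if it arrived as bytes."""
    try:
        return line.decode()
    except AttributeError:
        # str has no decode() method in Python 3, so it is already text
        return line

escape = '#'
raw = b'# a comment line'

try:
    raw.startswith(escape)          # bytes.startswith(str) is not allowed
except TypeError as err:
    print('bytes vs str:', err)

# After decoding, the comparison is str against str and works
print(to_text(raw).startswith(escape))
```

The try/except form leaves lines that are already str untouched, so the patch should be harmless whichever type the file reader hands over.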

>>> cp = load_parser('grammars/book_grammars/feat0.fcfg', trace=2)
Traceback (most recent call last):
  File "...\lib\site-packages\nltk\grammar.py", line 1205, in parse_grammar
    line = continue_line + line.strip()
AttributeError: 'int' object has no attribute 'strip'
Add before line #1196 in grammar.py:
try:
    input = input.decode('utf-8','ignore')
except AttributeError:
    pass
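
The 'int' object in that traceback is presumably a single byte: iterating over a bytes object in Python 3 yields integers, not lines. A small sketch with a made-up grammar fragment (not the contents of feat0.fcfg) shows the difference:

```python
data = b'S -> NP VP\nNP -> Det N\n'

# Iterating over bytes gives integers, one per byte
first = next(iter(data))
print(type(first), first)

# Decoding first gives a str that can be split into lines and stripped
text = data.decode('utf-8', 'ignore')
lines = [line.strip() for line in text.split('\n') if line.strip()]
print(lines)
```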

File "...\lib\site-packages\nltk\featstruct.py", line 388, in _freeze
    for (fname, fval) in sorted(self._items()):
TypeError: unorderable types: Feature() < str()
The loop causing that error is quite weird and could probably be optimized easily (though I suspect efficiency is not an issue here). I hesitate to do so, as I could not find any documentation about its behaviour. Well, let's fix this error (in three places as well!). In featstruct.py, change line #339 to:
for (fname, fval) in sorted(self.items(),key=lambda keysort: repr(keysort[0])):
and line #388 to:
for (fname, fval) in sorted(self.items(),key=lambda keysort: repr(keysort[0])):
and line #753 to:
for (fname, fval) in sorted(self.items(),key=lambda keysort: repr(keysort[0])):
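
Why the key function helps can be seen in isolation. In the sketch below, `Feature` is a hypothetical stand-in class, not nltk's real Feature, and the dictionary contents are invented. Python 3 refuses to order objects of unrelated types, whereas sorting by repr() compares plain strings. The resulting order is consistent, though it need not match what Python 2 produced.

```python
class Feature:
    """Hypothetical stand-in for nltk's Feature; it defines no ordering."""
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return '*%s*' % self.name

items = {'num': 'sg', Feature('type'): 'NP', 'agr': '3'}.items()

try:
    sorted(items)                   # compares str keys with Feature keys
except TypeError as err:
    print('mixed keys:', err)

# Sorting on the repr of each key only ever compares strings
ordered = sorted(items, key=lambda keysort: repr(keysort[0]))
print(ordered)
```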
Change line #851 to:
nameline = int((len(fval_lines)-1)/2)
and line #872 to:
idline = int((len(lines)-1)/2)
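
The int() calls are needed because / always returns a float in Python 3. A short sketch with a made-up list shows the behaviour, along with the floor-division operator // as an alternative spelling. The two agree for non-negative values but differ for negative ones, since int() truncates toward zero while // rounds down.

```python
fval_lines = ['a', 'b', 'c', 'd']

true_div = (len(fval_lines) - 1) / 2        # float in Python 3
nameline = int((len(fval_lines) - 1) / 2)   # the fix used above
floor_div = (len(fval_lines) - 1) // 2      # alternative: floor division

print(true_div, nameline, floor_div)

# The two spellings diverge once the numerator is negative
print(int(-1 / 2), -1 // 2)
```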

>>> trees = cp.nbest_parse(tokens)
Traceback (most recent call last):
  File "...\lib\site-packages\nltk\parse\chart.py", line 765, in pp_leaves
    header += tok[:width-1].center(width-1)+'.'
TypeError: slice indices must be integers or None or have an __index__ method
Again, a float where an integer is required. In parse/chart.py, change line #765 to:
header += tok[:int(width-1)].center(int(width-1))+'.'
Insert before line #735:
width = int(width)
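
Slicing and str.center both insist on integers, so a width computed with / has to be converted before use. A minimal sketch, with an invented token and width rather than the values chart.py actually computes:

```python
tok = 'walks'
width = 8 / 2                       # a float, as produced by true division

try:
    tok[:width - 1]                 # float slice index is rejected
except TypeError as err:
    print('float index:', err)

width = int(width)                  # convert once, up front
header = tok[:width - 1].center(width - 1) + '.'
print(header)
```

Converting width once, as in the line inserted before #735, avoids repeating int() at every later use.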
The parser works now but does not show the same results as its 2.x counterpart. Comparing the source shows that quite a lot was changed. So much, in fact, that I hesitate to make any more changes here, as it might be working as intended.
