Saturday, December 17, 2011

Using nltk with Python 3 (7)

Chapter seven: Text extraction.
>>> print(result)
Traceback (most recent call last):
  File "...\lib\site-packages\nltk\tree.py", line 659, in _pprint_flat
    string.join(childstrs), parens[1])
AttributeError: 'module' object has no attribute 'join'
This is a well-known error by now: string.join no longer exists in Python 3, so replace it with the str.join method. In tree.py, starting at line #657:
if isinstance(self.node, str):
    return '%s%s%s %s%s' % (parens[0], self.node, nodesep,
            ' '.join(childstrs), parens[1])
else:
    return '%s%r%s %s%s' % (parens[0], self.node, nodesep,
            ' '.join(childstrs), parens[1])
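To see why the fix works, here is a small standalone sketch. The function pprint_flat is a hypothetical stand-in that copies only the formatting expression from the patched tree.py; it is not part of NLTK's API.

```python
import string

# Python 3 removed the string.join() function; joining is a str method now.
print(hasattr(string, 'join'))  # False on Python 3


def pprint_flat(node, childstrs, nodesep='', parens='()'):
    """Hypothetical stand-in for the patched formatting code in tree.py."""
    if isinstance(node, str):
        # String labels are inserted as-is.
        return '%s%s%s %s%s' % (parens[0], node, nodesep,
                                ' '.join(childstrs), parens[1])
    # Any other label is shown through its repr().
    return '%s%r%s %s%s' % (parens[0], node, nodesep,
                            ' '.join(childstrs), parens[1])


print(pprint_flat('NP', ['the/DT', 'cat/NN']))  # (NP the/DT cat/NN)
```

The separator string owns the join, so ' '.join(childstrs) is the direct replacement for the old string.join(childstrs), whose default separator was a single space.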

File "...\lib\site-packages\nltk\corpus\reader\ieer.py", line 110, in _read_block
    return ['\n'.join(out)]
TypeError: sequence item 0: expected str instance, bytes found
In corpus/reader/ieer.py, change the function _read_block, starting at line #98, so that every line read from the stream is decoded from bytes to str:
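# Skip any preamble.
while True:
    line = stream.readline()
    try:
        # Decode bytes so that '\n'.join(out) receives str items.
        line = line.decode('utf-8', 'ignore')
    except AttributeError:
        pass  # the line is already a str
    if not line: break
    if line.strip() == '<DOC>': break
out.append(line)
# Read the document
while True:
    line = stream.readline()
    try:
        line = line.decode('utf-8', 'ignore')
    except AttributeError:
        pass
    if not line: break
    # ... the rest of the loop stays unchanged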
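The following is a minimal sketch of the decode-or-pass-through idea, runnable without NLTK. read_doc_block and to_str are hypothetical names, and the sketch assumes a document runs from a '<DOC>' line to a '</DOC>' line; it is a simplification, not NLTK's actual _read_block.

```python
import io


def to_str(line):
    """Decode bytes to str, dropping undecodable bytes; pass str through unchanged."""
    try:
        return line.decode('utf-8', 'ignore')
    except AttributeError:
        return line  # already a str


def read_doc_block(stream):
    """Hypothetical, simplified version of ieer.py's _read_block.

    Assumes a document runs from a '<DOC>' line to a '</DOC>' line.
    """
    out = []
    # Skip any preamble up to the '<DOC>' line.
    while True:
        line = to_str(stream.readline())
        if not line:
            break
        if line.strip() == '<DOC>':
            break
    out.append(line)
    # Collect lines until '</DOC>' or the end of the stream.
    while True:
        line = to_str(stream.readline())
        if not line:
            break
        out.append(line)
        if line.strip() == '</DOC>':
            break
    return ['\n'.join(out)]


# Without decoding, joining bytes items fails just like in the traceback.
try:
    '\n'.join([b'<DOC>\n'])
except TypeError as err:
    print(err)  # sequence item 0: expected str instance, bytes found

print(read_doc_block(io.BytesIO(b'preamble\n<DOC>\nsome text\n</DOC>\n')))
```

Catching only AttributeError keeps the helper working for both binary streams (bytes have decode) and text streams (str does not), without silently hiding unrelated errors.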

Traceback (most recent call last):
  File "...\lib\site-packages\nltk\sem\relextract.py", line 78, in _join
    return join(lst, sep=sep)
NameError: global name 'join' is not defined
Again, a join call that was not converted. In sem/relextract.py, change line #78 to:
return sep.join(lst)
Do the same on line #81:
return sep.join([tup[0] for tup in lst])
and line #83:
return sep.join([tuple2str(tup) for tup in lst]) 


  File "...\lib\site-packages\nltk\data.py", line 1027, in _char_seek_forward
    chars, bytes_decoded = self._incr_decode(bytes[:est_bytes])
TypeError: slice indices must be integers or None or have an __index__ method

I must have overlooked that one last time.
Change line #1027 in data.py to:
chars, bytes_decoded = self._incr_decode(bytes[:int(est_bytes)])
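The slice fails because est_bytes is a float. Presumably it comes from a / division, which always returns a float in Python 3 (an assumption: the computation itself is not shown here). The sketch below reproduces the error; head is a hypothetical helper mirroring the int() fix.

```python
def head(data, est_bytes):
    """Hypothetical helper mirroring the fix: slice with int(est_bytes)."""
    # int() truncates toward zero, so about 2.67 becomes 2.
    return data[:int(est_bytes)]


data = b'abcdefgh'
est_bytes = len(data) / 3  # true division: a float (about 2.67) in Python 3

try:
    data[:est_bytes]
except TypeError as err:
    print('TypeError:', err)  # the same kind of error as in the traceback

print(head(data, est_bytes))  # b'ab'
```

For non-negative values, computing the estimate with // (floor division) would give an int directly and has the same effect as wrapping it in int().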
That's it.
