And on to some work with chapter five.
>>> nltk.pos_tag(text)
Traceback (most recent call last):
File "...\lib\site-packages\nltk\tag\__init__.py", line 64, in pos_tag
tagger = nltk.data.load(_POS_TAGGER)
File "...\lib\site-packages\nltk\data.py", line 594, in load
resource_val = pickle.load(_open(resource_url))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcb in position 0: ordinal not in range(128)
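Now we are finally getting to the root of the string/bytes problem. The traceback suggests the tagger pickle holds 8-bit strings written by Python 2, which Python 3's pickle.load decodes as ASCII by default. In data.py, change line #594 to: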
try:
    resource_val = pickle.load(_open(resource_url))
except UnicodeDecodeError:
    # Retry, decoding Python 2 8-bit strings as latin-1.
    resource_val = pickle.load(_open(resource_url), fix_imports=True, encoding='latin-1', errors="ignore")
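The same fallback can be sketched on its own, outside NLTK. This is only a rough illustration, not NLTK's code: the helper name `load_legacy_pickle` and the hand-built sample bytes are made up here. It assumes the stream is seekable, so it rewinds and retries instead of opening the resource a second time.

```python
import io
import pickle


def load_legacy_pickle(stream):
    """Load a pickle, retrying with latin-1 if it holds Python 2 8-bit strings."""
    start = stream.tell()
    try:
        return pickle.load(stream)
    except UnicodeDecodeError:
        # Rewind and decode the old 8-bit strings as latin-1, which maps
        # every byte value to a code point and so cannot fail.
        stream.seek(start)
        return pickle.load(stream, fix_imports=True, encoding="latin-1")


# Hypothetical sample: a protocol 2 pickle of the Python 2 str '\xcb'.
legacy_bytes = b"\x80\x02U\x01\xcbq\x00."
value = load_legacy_pickle(io.BytesIO(legacy_bytes))
print(repr(value))
```

Because latin-1 can decode any byte, the errors="ignore" argument should make no practical difference with that encoding.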
>>> nltk.corpus.sinica_treebank.tagged_words()
Traceback (most recent call last):
File "...\lib\site-packages\nltk\corpus\reader\sinica_treebank.py", line 60, in _read_block
sent = IDENTIFIER.sub('', sent)
TypeError: can't use a string pattern on a bytes-like object
In corpus/reader/sinica_treebank.py, add after line #59:
try:
    sent = sent.decode()
except AttributeError:
    # sent is already a str
    pass
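A slightly more explicit way to write that guard is to check the type before decoding. The sketch below is an assumption-laden illustration: `ensure_text` is a made-up helper, and the `IDENTIFIER` pattern is a stand-in, not necessarily the one in sinica_treebank.py.

```python
import re

# Hypothetical stand-in for the module's IDENTIFIER pattern.
IDENTIFIER = re.compile(r"^#\S+\s")


def ensure_text(sent, encoding="utf-8"):
    """Return sent as str, decoding it first if it arrived as bytes."""
    if isinstance(sent, bytes):
        return sent.decode(encoding)
    return sent


raw = b"#1001 hello"
# A str pattern can only be applied once the bytes have been decoded.
cleaned = IDENTIFIER.sub("", ensure_text(raw))
print(cleaned)
```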
>>> nltk.corpus.conll2002.tagged_words()
Traceback (most recent call last):
File "C:\Program Files (x86)\Python322\lib\site-packages\nltk\corpus\reader\util.py", line 577, in read_blankline_block
line = stream.readline().decode('utf-8','ignore')
AttributeError: 'str' object has no attribute 'decode'
In corpus/reader/util.py, change line #577 to:
line = stream.readline()
try:
    line = line.decode('utf-8','ignore')
except AttributeError:
    # line is already a str
    pass
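The point of this change is that the stream may hand back either bytes or str, depending on how it was opened. Here is a small self-contained sketch of that idea. `read_text_line` is a hypothetical helper, not part of NLTK, and the in-memory streams just stand in for corpus files.

```python
import io


def read_text_line(stream):
    """Read one line and return it as str, whatever the stream yields."""
    line = stream.readline()
    if isinstance(line, bytes):
        # Binary stream: decode, dropping any undecodable bytes.
        line = line.decode("utf-8", "ignore")
    return line


from_bytes = read_text_line(io.BytesIO("El niño\n".encode("utf-8")))
from_text = read_text_line(io.StringIO("El niño\n"))
print(from_bytes == from_text)
```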
>>> nltk.tag.brill.demo()
Traceback (most recent call last):
File "...\lib\site-packages\nltk\tag\brill.py", line 1308, in demo
print_rules = file(rule_output, 'w')
NameError: global name 'file' is not defined
Well, the file builtin no longer exists in Python 3. Change it to open in brill.py, line #1308:
print_rules = open(rule_output, 'w')
While we are at it, change line #1313 as well:
error_file = open(error_output, 'w')
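As a side note, open also works as a context manager, which closes the file even if something goes wrong while writing. A minimal sketch, assuming a throwaway temp path and a made-up rule line (neither comes from brill.py):

```python
import os
import tempfile

# Hypothetical output path, standing in for rule_output in the demo.
rule_output = os.path.join(tempfile.mkdtemp(), "rules.out")

# open() replaces the old file() builtin; the with block closes the file.
with open(rule_output, "w") as print_rules:
    print_rules.write("example rule\n")

with open(rule_output) as check:
    written = check.read()
print(written)
```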
That's it. That was chapter five. We are getting nearer to the core.
change line #577,
try:
    line = line.decode('utf-8','ignore')
except:
    pass
You have just saved me, thanks very much! However, in order to make pos_tag work I had only to do step number 1. Thanks nevertheless!