
How To Pickle Customized Vectorizer?

I'm having trouble pickling a vectorizer after I customize it.

from sklearn.feature_extraction.text import TfidfVectorizer
import pickle
tfidf_vectorizer = TfidfVectorizer(analyz

Solution 1:

This is not so much a scikit-learn problem as a general Python problem. In the Python 2.7 session below, the built-in method str.split cannot be pickled at all:

>>> pickle.dumps(str.split)
Traceback (most recent call last):
  File "<ipython-input-7-7d3648c78b22>", line 1, in <module>
    pickle.dumps(str.split)
  File "/usr/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
  File "/usr/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/usr/lib/python2.7/pickle.py", line 306, in save
    rv = reduce(self.proto)
  File "/usr/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle method_descriptor objects
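The underlying rule is that pickle stores a function as a reference to its module and qualified name, not as code, so a callable is picklable only if pickle can look it up again by that name. A minimal Python 3 sketch of that rule, using only the standard library; `is_picklable` and `make_nested` are hypothetical helpers for illustration, and the exact exception raised for an unpicklable function varies between Python versions, so several are caught:

```python
import pickle


def split(s):
    """Module-level function: pickle can store it as a name reference."""
    return s.split()


def is_picklable(obj):
    """Hypothetical helper: True if pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # No importable name to look up (lambda, nested function, ...).
        return False
    return True


def make_nested():
    # A function defined inside another function has a qualified name
    # like 'make_nested.<locals>.nested', which pickle cannot import.
    def nested(s):
        return s.split()
    return nested


print(is_picklable(split))                # module-level function
print(is_picklable(lambda s: s.split()))  # lambda
print(is_picklable(make_nested()))        # nested (local) function
```

Run as a script, this prints True for the module-level function and False for the other two, so an analyzer written as a lambda or a nested function would fail in the same way when the vectorizer holding it is pickled.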

The solution is to use a picklable analyzer, such as a function defined at module level:

>>> def split(s):
...     return s.split()
... 
>>> pickle.dumps(split)
'c__main__\nsplit\np0\n.'
>>> tfidf_vectorizer = TfidfVectorizer(analyzer=split)
>>> type(pickle.dumps(tfidf_vectorizer))
<type 'str'>
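Pickling the vectorizer pickles its attributes too, and the analyzer you pass to the constructor is stored as one of them, which is why the analyzer itself has to be picklable. The sketch below illustrates this with a hypothetical ToyVectorizer stand-in (not scikit-learn's class) so it runs with the standard library alone; it round-trips the object through pickle and checks that the restored copy still tokenizes:

```python
import pickle


def split(s):
    # Module-level, so pickle can store it by reference (__main__.split).
    return s.split()


class ToyVectorizer:
    """Hypothetical stand-in that keeps the analyzer callable as an
    attribute, the way scikit-learn estimators keep constructor params."""

    def __init__(self, analyzer):
        self.analyzer = analyzer

    def vocabulary(self, docs):
        # Sorted set of tokens produced by the analyzer over all docs.
        return sorted({tok for doc in docs for tok in self.analyzer(doc)})


vec = ToyVectorizer(analyzer=split)
restored = pickle.loads(pickle.dumps(vec))  # the analyzer travels with it
print(restored.vocabulary(["the cat", "the dog"]))  # ['cat', 'dog', 'the']

try:
    pickle.dumps(ToyVectorizer(analyzer=lambda s: s.split()))
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    # Same failure as above, now surfacing from the object's attributes.
    print("not picklable:", type(exc).__name__)
```

Note that unpickling looks split up again by name, so the module that defines the analyzer must be importable wherever the pickle is loaded.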
