Sep and fastai_tokenizer #4

Open · jrlinton opened this issue Jun 24, 2020 · 0 comments

jrlinton commented Jun 24, 2020

Brilliant work here, Morgan - really looking forward to using this with my students on a project. Deepest apologies if I'm not doing this right - I'm very new to GitHub and also not a particularly good programmer.

It looks like the fastai v2 team may have made a change to Tokenizer that makes it choke on the sep argument when instantiating your custom tokenizer in the fasthugs_language_model notebook.

class MLMTokenizer(Tokenizer): 
    def __init__(self, tokenizer, rules=None, counter=None, lengths=None, mode=None, **kwargs):  # removed sep=' '
        super().__init__(tokenizer, rules, counter, lengths, mode)  # removed sep
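
If it's useful: rather than deleting sep outright, a version-tolerant variant could forward it only when the installed Tokenizer still accepts it. A minimal sketch, assuming the fastai v2 Tokenizer signature and the fastai2-era import path - I haven't tested this against the notebook:

import inspect
from fastai2.text.all import Tokenizer  # `from fastai.text.all import Tokenizer` on later releases

class MLMTokenizer(Tokenizer):
    def __init__(self, tokenizer, rules=None, counter=None, lengths=None, mode=None, **kwargs):
        # Forward sep=' ' only if the installed Tokenizer.__init__ still accepts it,
        # so the subclass works on either side of the upstream signature change
        if 'sep' in inspect.signature(Tokenizer.__init__).parameters:
            kwargs.setdefault('sep', ' ')
        super().__init__(tokenizer, rules=rules, counter=counter, lengths=lengths, mode=mode, **kwargs)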

Taking the sep argument out seemed to fix the issue at first, but then fastai_tokenizer kept the datasets from being created. I checked the various other components and isolated the issue to the tokenizer, but wasn't able to parse the resulting error message.

tfms=[attrgetter("text"), fastai_tokenizer, AddSpecialTokens(tokenizer), MLMTokensLabels(tokenizer)]
dsets = Datasets(df, splits=splits, tfms=[tfms], dl_type=SortedDL)

Here are the head and tail of the resulting ten or so pages of error message (again, apologies if I'm not following protocol here):

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-100-070c60545587> in <module>
     11 
     12 #dsets = Datasets(df, splits=splits, tfms=[tfms], dl_type=SortedDL)
---> 13 dsets = Datasets(df, splits=splits, tfms=[tfms], dl_type=SortedDL)
     14 
     15 dsets[0][0][:20], dsets[0][1][:20]

<ipython-input-99-0553a9fb405f> in __init__(self, items, tfms, tls, n_inp, dl_type, **kwargs)
      4     "Doesn't create a tuple in __getitem__ as x is already a tuple"
      5     def __init__(self, items=None, tfms=None, tls=None, n_inp=None, dl_type=None, **kwargs):
----> 6         super().__init__(items=items, tfms=tfms, tls=tls, n_inp=n_inp, dl_type=dl_type, **kwargs)
      7 
      8     def __getitem__(self, it):

.
.  (Pages later)
.

~\.conda\envs\fastai2\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 
     62 #

AttributeError: Can't pickle local object 'parallel_gen.<locals>.f'
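
In case it helps with triage: the tail points at multiprocessing rather than the transforms themselves. parallel_gen's worker function is a local closure, and on Windows (the ~\.conda path) child processes are spawned, so that closure has to be pickled - and it can't be. If that's right, forcing the tokenization to run in-process should sidestep the error. A minimal sketch - this assumes the notebook's tokenizer picks up fastcore's defaults.cpus for its worker count, which I haven't verified:

from fastcore.utils import defaults  # `defaults` lives in fastcore.foundation/basics depending on version

# fastcore's parallel helpers (including parallel_gen) default n_workers to
# defaults.cpus; with 0 workers they fall back to a plain serial loop in the
# main process, so nothing needs to be pickled for a spawned child
defaults.cpus = 0

dsets = Datasets(df, splits=splits, tfms=[tfms], dl_type=SortedDL)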

Anyway, I hope this is helpful. Please keep up the amazing work!
