
[early WIP] Fix/rationalize loss-tallying #2922

Draft · wants to merge 3 commits into develop from loss-fixes
Conversation

@gojomo (Collaborator) commented Aug 24, 2020

PR to eventually address loss-tallying issues: #2617, #2735, #2743. Early tinkering stage.

@gojomo force-pushed the loss-fixes branch 6 times, most recently from 8c61787 to 33ef202 on August 28, 2020 18:43
@gojomo (Collaborator, Author) commented Sep 3, 2020

Changes so far in Word2Vec:

  • using float64 for all loss tallying
  • resetting tally to 0.0 per epoch - but remembering history elsewhere for duration of current train() call
  • micro-tallying into a per-batch value rather than the global tally
  • then, adding to global tally rather than replacing it

Though the real goal is sensible loss-tallying across all the relevant classes, I think these small changes (sketched roughly below) already remedy #2735 (float32 precision swallows large loss values) & #2743 (workers' loss tallies clobber each other).
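
A rough Python rendering of that tallying scheme, for illustration only: the real accumulation happens inside the Cython training loops, and names like `LossTally` are not gensim's.

```python
import numpy as np

class LossTally:
    """Illustrative stand-in for the per-epoch tallying scheme described above."""

    def __init__(self):
        self.epoch_loss = np.float64(0.0)  # float64 so large tallies aren't swallowed (see #2735)
        self.epoch_history = []            # per-epoch totals, kept for the duration of train()

    def start_epoch(self):
        self.epoch_loss = np.float64(0.0)  # reset the running tally at each epoch start

    def tally_batch(self, example_losses):
        # micro-tally into a per-batch value first...
        batch_loss = np.float64(0.0)
        for loss in example_losses:
            batch_loss += loss
        # ...then ADD it to the epoch tally rather than replacing it, so one
        # worker's batch no longer clobbers another's contribution (see #2743)
        self.epoch_loss += batch_loss

    def end_epoch(self):
        self.epoch_history.append(self.epoch_loss)

# toy usage: two epochs of large, float32-unfriendly per-example losses
tally = LossTally()
for _ in range(2):
    tally.start_epoch()
    tally.tally_batch([1.5e8, 2.5e8])
    tally.end_epoch()
print(tally.epoch_history)   # two epoch totals of 4e8 each
```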

An oddity from looking at per-epoch loss across a full run: all my hs runs have shown increasing loss every epoch, which makes no sense to me. And yet, the models at the end have moved word-vectors to more useful places (thus passing our minimal sanity-tests). I don't think my small changes could have caused this oddity (but maybe); I suspect something pre-existing in HS-mode loss-tallying is the real reason. When I have a chance, I'll compare against the loss patterns for similar modes & similar data in something like the Facebook FastText code, which also reports a running loss.
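
For anyone wanting to watch the per-epoch tally themselves, here's a minimal sketch using gensim's existing callback hook. The class name and toy corpus are just illustrative, and note that in released gensim get_latest_training_loss() is a running total for the whole train() call, whereas with this PR's changes the tally restarts each epoch.

```python
from gensim.models import Word2Vec
from gensim.models.callbacks import CallbackAny2Vec

class EpochLossLogger(CallbackAny2Vec):
    """Print and record the loss tally reported at the end of each epoch."""

    def __init__(self):
        self.epoch = 0
        self.losses = []

    def on_epoch_end(self, model):
        loss = model.get_latest_training_loss()
        self.losses.append(loss)
        print(f"epoch {self.epoch}: tallied loss {loss:.1f}")
        self.epoch += 1

# `corpus` is any iterable of token lists (placeholder, not a real dataset)
corpus = [["the", "quick", "brown", "fox"], ["jumps", "over", "the", "lazy", "dog"]] * 100
model = Word2Vec(
    sentences=corpus,
    hs=1, negative=0, sg=0,   # HS CBOW, roughly matching the runs discussed here
    compute_loss=True,        # loss is only tallied at all when this is set
    callbacks=[EpochLossLogger()],
    epochs=5,
    min_count=1,
)
```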

@gojomo (Collaborator, Author) commented Sep 8, 2020

Training FB fasttext (HS, CBOW, no n-grams: ./fasttext cbow -verbose 5 -maxn 0 -bucket 0 -lr 0.025 -loss hs -thread 3 -input ~/Documents/Dev/gensim/enwik9 -output enwik9-cbow-nongrams-lr025-hs) shows decreasing loss reports over the course of training, as expected and unlike the strangely increasing per-epoch loss our code (at least in this PR) reports. But final results on a few quick most_similar ops seem very similar. So something remains odd about our loss reporting, especially in HS mode.

@gojomo (Collaborator, Author) commented Sep 8, 2020

As a point of comparison, Facebook's fasttext reports an "average loss", divided over some trial-count, like so:

(base) gojomo@Gobuntu-2020:~/Documents/Dev/fasttext/fastText-0.9.2$ time ./fasttext cbow -verbose 5 -maxn 0 -bucket 0 -lr 0.025 -loss hs -thread 3 -input ~/Documents/Dev/gensim/enwik9 -output enwik9-cbow-nongrams-lr025-hs
Read 142M words
Number of words:  847816
Number of labels: 0
Progress:  39.8% words/sec/thread:  431099 lr:  0.015052 avg.loss:  5.263475 ETA:   0h 5m31s
Progress:  45.4% words/sec/thread:  429306 lr:  0.013645 avg.loss:  4.725245 ETA:   0h 5m 1s
Progress:  58.6% words/sec/thread:  426932 lr:  0.010339 avg.loss:  3.865230 ETA:   0h 3m50s
Progress: 100.0% words/sec/thread:  422384 lr:  0.000000 avg.loss:  2.483185 ETA:   0h 0m 0s

Gensim should probably collect & report 2Vec-class training loss in a comparable way, so that numbers on algorithmically-analogous runs are broadly similar, for familiarity to users & as a cross-check of whatever it is we're doing.
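
A minimal sketch of the general shape of such reporting follows. The divisor here is assumed to be a count of individual loss computations; whether that matches what FB's code actually divides by is exactly the open question discussed below, so treat this only as the rough shape, not their algorithm.

```python
import numpy as np

class AverageLossReporter:
    """Illustrative fasttext-style 'avg.loss': cumulative loss over a cumulative count."""

    def __init__(self):
        self.loss_sum = np.float64(0.0)
        self.count = 0    # assumed: number of individual loss computations

    def add(self, loss, n=1):
        self.loss_sum += loss
        self.count += n

    @property
    def avg_loss(self):
        return float(self.loss_sum / max(self.count, 1))

# toy usage
reporter = AverageLossReporter()
for batch_losses in ([5.3, 5.2], [4.8, 4.6]):
    reporter.add(sum(batch_losses), n=len(batch_losses))
print(f"avg.loss: {reporter.avg_loss:.6f}")   # 4.975000
```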

@piskvorky (Owner) commented Sep 8, 2020

+1 on matching FB's logic. What is "trial-count"? Is the average taken over words or something else?

@gojomo (Collaborator, Author) commented Sep 8, 2020

Unsure; their C++ (with a separate class for 'loss') is different enough from our code that I couldn't tell at a glance & will need to study it a bit more.

@piskvorky (Owner) commented Feb 19, 2022

@gojomo cleaning up the loss-tallying logic is still very much welcome. Did you figure out the "increasing loss" mystery?

We're planning to make a Gensim release soon – whether this PR gets in now or later, it will be a great addition.

@gojomo (Collaborator, Author) commented Feb 21, 2022

These changes would likely apply, & help a bit in Word2Vec, with just a little adaptation to current develop. I could take a look this week & wouldn't expect any complications.

But getting consistent loss-tallying working in Doc2Vec & FastText, & ensuring a similar calculation & roughly similar loss magnitudes as other libraries (mainly Facebook FastText), would require more effort that's hard to estimate. We kind of need someone who both (1) needs it & (2) can get deep into understanding the code, to rationalize the whole thing.

Never figured out why our hs mode reports growing loss despite the model improving as expected on other checks.
