maintenance #49

Open
mathias3 opened this issue Nov 18, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@mathias3

Hi Oliver,
You have merged a few PRs but still have not issued a new version. Is there a chance of that in the near future? Or, if you are busy with other projects, could you add maintainers to your repo? Me or the guy who did the recent PR would be more than happy to contribute and help this package survive. @oborchers

@oborchers
Owner

oborchers commented Nov 27, 2021

Hi @mathias3! Help would be very welcome. Although I don't have much time for this repository, it's still a good thing to have and fills a niche that cannot be filled by more recent advancements in the NLP world.

I've created version 0.1.17 and fixed the most glaring issues with the repository, mainly related to Gensim and Python incompatibilities.

There is also still the develop branch, which contains many fixes and new features that I originally planned to implement or that are partially implemented. For example, the code for the following is fully or partially there:

  • Added Hierarchical (Convolutional) Embeddings for all Models
  • Added MaxPooling
  • Added Features to Sentencevectors
  • Added further unittests
  • Workaround for NumPy memmap issue (numpy/numpy#13172, "ENH: Add madvise for memmap objects")
  • SVD ram subsampling for SIF / uSIF (customizable, standard is 1 GB of RAM)
  • Minor fixes for nan-handling
  • Minor fixes for sentencevectors class

There are a few things which might make sense to add to the roadmap:

  • Newer models (I don't know, not up to date in this regard)
  • Working the hierarchical op into the main averaging cython routine
  • Support for a user-definable embedding class (i.e. fse version of BaseKeyedVectors to get away from the Gensim dependency)
  • Different CI (Travis free mode no longer available)
  • Add pre_inference and post_inference (I think I forgot this one)
  • Refactoring the horribly complicated Input class
  • Reworking the threading (at least from my last experience the input thread is the bottleneck, not the actual computation)
  • Untangling the bad design decision to actually store the BaseKeyedVector from Gensim internally. If users want mmap, they can just load that and pass it.
  • Edit: Approximate nearest neighbor search (e.g. via Annoy support)?
  • Return vectors only above a certain threshold (see #34, "Returning vectors with similarity above threshold for most_similar()")
  • Fix zero division error (see #47, "Encounter "Divided 0 Error"")
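
To make the last two items concrete, here is a minimal sketch in plain Python. It is not fse's code or API: `cosine` and `most_similar_above` are hypothetical names, and the default threshold of 0.5 is an arbitrary assumption. It shows one way to return only neighbors above a similarity threshold while treating all-zero vectors as similarity 0 rather than dividing by zero.

```python
import math


def cosine(a, b, eps=1e-12):
    """Cosine similarity that returns 0.0 when either vector has (near) zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm < eps:
        # Guard against the zero division that an empty or all-zero vector would cause
        return 0.0
    return dot / norm


def most_similar_above(query, vectors, threshold=0.5):
    """Hypothetical helper: (index, similarity) pairs with similarity >= threshold."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    kept = [(i, s) for i, s in scored if s >= threshold]
    # Most similar first
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Calling `most_similar_above([1.0, 0.0], vectors, threshold=0.5)` would then drop both dissimilar and all-zero rows instead of raising an error.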

Happy to work on some of the issues as well; I should have more time next year.

Who might be interested to help?
@mathias3 @grantmwilliams @AlexMRuch

@oborchers
Owner

@mathias3: There is also a new version on PyPI: 0.1.17

@oborchers
Owner

Fixed / added in 0.2.0:

  • Offering pretrained models and making them accessible
  • Fix zero division error
  • Bugfixes for Python 3.8 builds
  • Code reformatted to Black style
