Contributions #32

Open
ziodave opened this issue Mar 1, 2021 · 3 comments

Contributor

ziodave commented Mar 1, 2021

Hello @wetneb

I forked the project and made several contributions in my fork. I haven't reached out until now because I first wanted to understand what improvements I could bring to the project.

I'll try to enumerate them here:

  1. To avoid highlighting stop words and similar noise (ref Remove noisy words #29), I added a Solr schema that supports query optimization for more than 40 languages, by removing stop words and/or using OpenNLP to keep only the most important words (nouns). Selecting the proper analyzer may require a new language parameter (ref Allow language selection to prevent aliases adding noise instead of helping #3).
  2. Currently it's not possible to configure the profile with a subclassOf class higher in the hierarchy, because the SPARQL query times out. I provide another implementation which performs several queries to retrieve (and cache) the children. As part of this improvement I also send a User-Agent, as per Wikidata requirements (to avoid rate limiting).
  3. Add relevant sameAs properties to the index. For places, also add geo coordinates. For events, also add start/end times.

As part of 1., I also intend to provide a Solr Docker image with the required libraries and models, provided that they're compatible with the project license (partially related to #31).
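
To illustrate the idea in 2., here is a minimal sketch, not the fork's actual code: it walks the subclass hierarchy one class at a time with small SPARQL queries, caches each result, and sends a descriptive User-Agent. The function names, the User-Agent string, and the in-memory dict cache are all assumptions made for illustration.

```python
import json
import re
import urllib.parse
import urllib.request
from collections import deque

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
# Hypothetical User-Agent; a real one should identify the tool and a contact.
USER_AGENT = "example-tagger/0.1 (https://example.org/contact)"


def fetch_direct_subclasses(qid):
    """Ask the query service for the direct subclasses (P279) of one class."""
    if not re.fullmatch(r"Q\d+", qid):
        raise ValueError("not a Wikidata item id: %r" % qid)
    query = "SELECT ?child WHERE { ?child wdt:P279 wd:%s }" % qid
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    # Each binding holds an entity URI; keep only the trailing Q-id.
    return [b["child"]["value"].rsplit("/", 1)[-1]
            for b in data["results"]["bindings"]]


def collect_subclasses(root, fetch=fetch_direct_subclasses, cache=None):
    """Breadth-first walk over subclasses, one small query per class.

    `cache` maps a class id to its direct subclasses, so a class is
    never fetched twice (assumed here to be a plain dict).
    """
    cache = {} if cache is None else cache
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if current not in cache:
            cache[current] = fetch(current)
        for child in cache[current]:
            if child not in seen:  # also guards against cycles
                seen.add(child)
                queue.append(child)
    seen.discard(root)
    return seen
```

Passing `fetch` as a parameter keeps the traversal testable without network access, and passing the same `cache` dict across calls lets later runs reuse earlier results.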

I quite like the results so far, and I would be happy to open PRs in an orderly fashion so that you can review the changes and merge them upstream.

Let me know what you think ⭐️


Member

wetneb commented Mar 3, 2021

Hi @ziodave, this is absolutely fantastic! Yes, I would be very interested in integrating such changes in this repository. If you could make the PRs fairly granular (introducing one feature at a time, basically), that would make them easier to review, but if that is too much effort I am sure we can find other ways.

Contributor Author

ziodave commented Mar 4, 2021

Great! In the next few days, I'll clean things up a bit and start making PRs.

Collaborator

eracle commented Aug 9, 2022

Any news on that?
