Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add occurrence count in the json output for search engine #1076

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

panglesd
Copy link
Collaborator

Since #972 there is a compile-index command which outputs a json index containing all searchable entries. The intended purpose of this json index is to pass it to a search engine.

Since #976 there is also a count-occurrences command which creates a table containing the number of occurrences for each identifier.

This PR makes the compile-index be able to use the information from count-occurrence. It adds a --occurrences argument to get the path to the table.

The first commit isolate the occurrence code in its own library.
The second commit implements the --occurrences argument of ``compile-index`.

@panglesd
Copy link
Collaborator Author

Note that there will be a conflict with #1067 which we will need to resolve (in whichever PR gets merged last).

panglesd added a commit to panglesd/odoc that referenced this pull request Jan 31, 2024
Signed-off-by: Paul-Elliot <peada@free.fr>
Copy link
Contributor

@gpetiot gpetiot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment about the displaying of the result, otherwise looks good to me!

("direct", `Float (float_of_int occ.Odoc_occurrences.Table.direct));
("indirect", `Float (float_of_int occ.indirect));
]
| None -> `Null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit-picking but it would look more consistent to me if we had 0 for both counters here (see the output in the tests).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about "no 'occurrences' field at all"?
I'm not very keen on choosing a wrong value when it was not possible to compute one because the table was not given...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, at least if we don't put this field at all it reduces the diff of this PR!

Copy link
Collaborator Author

@panglesd panglesd Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With 6205f97, if the occurrence table is not given, the field is not included.
However, if the occurrence table is given, but does not contain the occurrence, a use of 0 is counted.

Indeed, that removed that the diff for search without occurrences.

Thanks for the comment, I think that's better now!

Signed-off-by: Paul-Elliot <peada@free.fr>
This adds the occurrence count to each element of the json index

Signed-off-by: Paul-Elliot <peada@free.fr>
Signed-off-by: Paul-Elliot <peada@free.fr>
Signed-off-by: Paul-Elliot <peada@free.fr>
Signed-off-by: Paul-Elliot <peada@free.fr>
doc/driver.mld Outdated Show resolved Hide resolved
panglesd and others added 2 commits February 19, 2024 10:20
Signed-off-by: Paul-Elliot <peada@free.fr>
Co-authored-by: Guillaume "Liam" Petiot <guillaume@tarides.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants