Intermittent error when using "not" when searching with index Subject #3895

Open
wesleybl opened this issue Jan 24, 2024 · 6 comments

@wesleybl
Member

wesleybl commented Jan 24, 2024

BUG/PROBLEM REPORT (OR OTHER COMMON ISSUE)

When we do a search in the Subject index, with "not", the result is sometimes wrong. Sometimes it returns an empty list, when it should return content.

What I did:

  1. Create a document with the Tag: "Bulletin"
  2. Create a Python script in ZMI with the content:
return str(context.portal_catalog(Subject={"not": ["Bulletin"]}))
  3. Run the script multiple times.

What I expect to happen:

The search must return all content that does not contain the "Bulletin" Tag

What actually happened:

Sometimes an empty list is returned.

What version of Plone/ Addons I am using:

Plone 6.0.9

@wesleybl
Member Author

In fact, the problem occurs even if the Subject is made up of just one word. I updated the description to accommodate this.

@wesleybl
Member Author

I debugged this error and came to the following conclusions:

  • The content that is missing from the result does not have a Subject.
  • The search works as follows. If the query has "not", it first gets all the keys from the Subject index:

https://github.com/zopefoundation/Products.ZCatalog/blob/5df390192393faffe182d858808037f3bd532187/src/Products/PluginIndexes/unindex.py#L507

  • Then remove the key that corresponds to the "not" parameter:

https://github.com/zopefoundation/Products.ZCatalog/blob/5df390192393faffe182d858808037f3bd532187/src/Products/PluginIndexes/unindex.py#L508-L510

  • After that, it searches for all records that have one of the keys:

https://github.com/zopefoundation/Products.ZCatalog/blob/5df390192393faffe182d858808037f3bd532187/src/Products/PluginIndexes/unindex.py#L586-L620

  • Note that at this point, all content that does not have a Subject has already been left out.
  • Finally, from the set of content that has at least one Subject, the items that have a value listed in "not" are removed:

https://github.com/zopefoundation/Products.ZCatalog/blob/5df390192393faffe182d858808037f3bd532187/src/Products/PluginIndexes/unindex.py#L681-L683
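
To make the steps above easier to follow, here is a minimal toy model in plain Python. It is not the Products.ZCatalog code. The index is assumed to be a simple dict of key -> set of document ids, and the names `index`, `all_docs` and `query_not` are made up for this illustration.

```python
# Toy keyword index: key -> set of document ids (hypothetical data).
index = {
    "Bulletin": {1},
    "News": {2},
}
all_docs = {1, 2, 3}  # document 3 has no Subject at all


def query_not(index, not_values):
    """Follow the four steps described above for a query that only has "not"."""
    # Step 1: start from every key stored in the index.
    keys = set(index)
    # Step 2: drop the keys named in "not".
    keys -= set(not_values)
    # Step 3: collect every document that has one of the remaining keys.
    result = set()
    for key in keys:
        result |= index[key]
    # Step 4: remove documents that also carry an excluded value.
    for value in not_values:
        result -= index.get(value, set())
    return result


found = query_not(index, ["Bulletin"])
# Documents that should match but were never considered.
missing = (all_docs - index["Bulletin"]) - found
print(found, missing)
```

In this model document 3 never appears in any key of the index, so it cannot be picked up in step 3, even though it does not have the "Bulletin" tag.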

But why does it sometimes work? Let's go:

  • The search is cumulative for each index used in the search:

https://github.com/zopefoundation/Products.ZCatalog/blob/5df390192393faffe182d858808037f3bd532187/src/Products/ZCatalog/Catalog.py#L620-L627

  • For all searches, even if not explicitly requested, the index allowedRolesAndUsers is always used:

kw["allowedRolesAndUsers"] = self._listAllowedRolesAndUsers(user)

  • So, when we search for the Subject index, two indexes are used internally (Subject and allowedRolesAndUsers).
  • Searches in these indexes are done here:

https://github.com/zopefoundation/Products.ZCatalog/blob/5df390192393faffe182d858808037f3bd532187/src/Products/ZCatalog/Catalog.py#L620

  • Sometimes the first index in this list is Subject and sometimes it is allowedRolesAndUsers.
  • As the search in indexes is cumulative, when the first index is allowedRolesAndUsers, all records are included in the results, including those that do not have the Subject field. So it works correctly without missing content.
  • When Subject is the first index, there is the problem of leaving content without Subject behind, because we don't have all the records pre-selected.

So this problem can occur in all indexes where not all content has a value, not just with Subject.
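
The order dependence described above can be sketched with a small toy model. This is only an illustration of the behaviour as I understand it from debugging, not the real catalog code. The function `search` and its parameters are hypothetical, and `None` is used here to mean "no index has been applied yet".

```python
def search(index_order, subject_index, allowed_docs, not_values):
    """Toy cumulative search: apply each index in turn to the running result."""
    result = None  # None: no index applied yet, nothing pre-selected
    for name in index_order:
        if name == "allowedRolesAndUsers":
            hits = set(allowed_docs)
            result = hits if result is None else result & hits
        elif name == "Subject":
            excluded = set()
            for value in not_values:
                excluded |= subject_index.get(value, set())
            if result is None:
                # Nothing pre-selected: only documents found under some
                # other Subject key can become candidates.
                candidates = set()
                for key, docs in subject_index.items():
                    if key not in not_values:
                        candidates |= docs
                result = candidates - excluded
            else:
                # Records were pre-selected: just remove the excluded ones.
                result = result - excluded
    return result


# One document tagged "Bulletin"; documents 2 and 3 have no Subject.
subject_index = {"Bulletin": {1}}
allowed = {1, 2, 3}

subject_first = search(["Subject", "allowedRolesAndUsers"], subject_index, allowed, ["Bulletin"])
roles_first = search(["allowedRolesAndUsers", "Subject"], subject_index, allowed, ["Bulletin"])
print(subject_first, roles_first)
```

With Subject first the toy search ends up empty, which matches the empty list from the report. With allowedRolesAndUsers first, documents 2 and 3 are returned.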

How to fix this problem?

I could think about forcing the index allowedRolesAndUsers to always be first. But it is a Plone-specific index, which Products.ZCatalog "does not know".

Could we force indexes with "not" to always be last?
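
A rough sketch of what "indexes with not go last" could mean, assuming the query is a plain dict of index name -> query value. The helper names `has_not` and `not_last` are made up here and nothing like them exists in Products.ZCatalog as far as I know.

```python
def has_not(value):
    """True if the query value for an index uses the "not" operator."""
    return isinstance(value, dict) and "not" in value


def not_last(query):
    """Return the index names with "not" queries moved to the end.

    sorted() is stable, so the relative order of the other indexes is kept.
    """
    return sorted(query, key=lambda name: has_not(query[name]))


query = {
    "Subject": {"not": ["Bulletin"]},
    "allowedRolesAndUsers": ["Anonymous"],
}
print(not_last(query))
```

This would only help when the query contains at least one index without "not". A query made up only of "not" conditions would still have nothing pre-selected.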

Or do something like: If the search result is empty and the index contains a "not", search all records first. But how and where to do this?

@mauritsvanrees @davisagli @mamico @jensens I'm mentioning you because you recently messed with Products.ZCatalog. Opinions?

@wesleybl
Member Author

wesleybl commented Jan 26, 2024

> I could think about forcing the index allowedRolesAndUsers to always be first. But it is a Plone-specific index, which Products.ZCatalog "does not know".

In fact, allowedRolesAndUsers exists in Zope too:

https://github.com/zopefoundation/Products.CMFCore/blob/c73cdf4f0fdcca4b9bb95813ade7a374282dd801/src/Products/CMFCore/CatalogTool.py#L208

@dataflake @icemac any thoughts here?

@dataflake

The index is not in Zope, it's in Products.CMFCore where the only "consumer" is Plone. I am not a ZCatalog expert, sorry.

@mamico
Member

mamico commented Jan 28, 2024

@wesleybl I think changing the order of the indexes should mitigate the problem, but in the end it won't be the real solution.

However, if you want to experiment, you can do so by monkey-patching the method Products.ZCatalog.Catalog.Catalog._sorted_search_indexes. You can find inspiration here: https://github.com/RedTurtle/redturtle.volto/blob/master/src/redturtle/volto/catalogplan.py
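
A minimal sketch of such a monkey patch, only to show the pattern. It assumes the patched method takes the query and returns a list of index names. To keep the example runnable without Zope, it patches a stand-in class (`StandInCatalog`, hypothetical) instead of the real Catalog class, and `patch_index_order` is a made-up helper. Whether the real catalog always goes through this method is not checked here.

```python
import functools


class StandInCatalog:
    """Hypothetical stand-in so the sketch runs without Zope installed."""

    def _sorted_search_indexes(self, query):
        # Assumed shape: take the query, return the index names in search order.
        return sorted(query)


def patch_index_order(catalog_class, first="allowedRolesAndUsers"):
    """Wrap _sorted_search_indexes so that `first` is always searched first."""
    original = catalog_class._sorted_search_indexes

    @functools.wraps(original)
    def patched(self, query):
        names = list(original(self, query))
        if first in names:
            names.remove(first)
            names.insert(0, first)
        return names

    catalog_class._sorted_search_indexes = patched


query = {
    "Subject": {"not": ["Bulletin"]},
    "allowedRolesAndUsers": ["Anonymous"],
}
before = StandInCatalog()._sorted_search_indexes(query)
patch_index_order(StandInCatalog)
after = StandInCatalog()._sorted_search_indexes(query)
print(before, after)
```

Wrapping the original method instead of replacing it keeps whatever ordering logic is already there and only moves one name to the front.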

In the meantime, I would try opening an issue or a PR (starting with a test that breaks) on Products.ZCatalog.

I also see a similar problem, not the same one, here zopefoundation/Products.ZCatalog#35 and some work done, but probably not fully completed, here zopefoundation/Products.ZCatalog#74

\cc @andbag @d-maurer

@wesleybl
Member Author

> I think changing the order of the indexes should mitigate the problem, but in the end it won't be the real solution.

@mamico I think this would solve the problem in a simpler way. At least it would solve the problem for those using Plone or Products.CMFCore, which I believe are the biggest users of Zope.

Any other solution would be more complex and would have to allow returning all objects in the catalog before applying the filter with not.
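
For comparison, this is roughly what the "return all objects first" approach looks like in a toy model. The index is again assumed to be a dict of key -> set of document ids, and `query_not_with_universe` is a hypothetical name, not an existing API.

```python
def query_not_with_universe(index, all_docids, not_values):
    """Start from every document id, then subtract the excluded ones."""
    excluded = set()
    for value in not_values:
        excluded |= index.get(value, set())
    return set(all_docids) - excluded


index = {"Bulletin": {1}, "News": {2}}
all_docids = {1, 2, 3}  # document 3 has no Subject

result = query_not_with_universe(index, all_docids, ["Bulletin"])
print(result)
```

The result no longer depends on the order of the indexes, but the index needs access to the full set of document ids in the catalog, which is the extra complexity mentioned above.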
