Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for SMARTS filter language (filter_smarts) #398

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

merkys
Copy link
Member

@merkys merkys commented Jan 26, 2022

In #368 support for SMARTS filter language was suggested. This PR:

  1. Describes how external filter languages should be supported in OPTIMADE;
  2. Introduces the first one of them, SMARTS.

In short, SMARTS queries are to be passed using filter_smarts URL parameter. If appearing together with filter in the same request, different filters are combined with logical AND.

@ml-evs ml-evs added topic/filtering-language Issue discussing changes and improvements to the query and filtering language type/proposal Proposal for addition/removal of features. May need broad discussion to reach consensus. topic/query-string Issues relating to the query string sent to the OPTIMaDe api, excluding the filter language. PR/requires-discussion labels Jun 1, 2022
@rartino
Copy link
Contributor

rartino commented Jun 17, 2022

SMARTS isn't really my field, so even if this PR is simple and I see no formal issues with it, I think it should be reviewed and approved by people who want this feature in the standard and can see themselves implement it down the line.

(I could add that I am a bit concerned over the lack of standardization of the SMARTS language discussed in #368 where it seems one may find different results depending on what SMARTS library the backend is using, so are these SMARTS queries really interoperable? But this isn't an issue I feel strongly about if there are people that want this feature.)

@merkys
Copy link
Member Author

merkys commented Jun 17, 2022

(I could add that I am a bit concerned over the lack of standardization of the SMARTS language discussed in #368 where it seems one may find different results depending on what SMARTS library the backend is using, so are these SMARTS queries really interoperable? But this isn't an issue I feel strongly about if there are people that want this feature.)

I am concerned as well about the same, but SMARTS seems to be the state of the art for at least a decade now. I suggested SMARTS both as a constructive solution for chemical structure search (need expressed in #368), as well as an argument to my thesis that there is no simple way to do it. We may as well scrap the SMARTS proposal by leaving each implementation to sort out how they would like to do chemical substructure search, but my gut feeling is that most of them would still rely on SMARTS under-the-hood.

@rartino
Copy link
Contributor

rartino commented Jun 17, 2022

We may as well scrap the SMARTS proposal by leaving each implementation to sort out how they would like to do chemical substructure search, but my gut feeling is that most of them would still rely on SMARTS under-the-hood.

Right; if these fields are not interoperable - i.e., database A's filter_smarts does not really mean the same thing as database B's - is there really a benefit in using a standardized field rather than _a_filter_smarts and _b_filter_smarts?

@merkys
Copy link
Member Author

merkys commented Jun 17, 2022

Right; if these fields are not interoperable - i.e., database A's filter_smarts does not really mean the same thing as database B's - is there really a benefit in using a standardized field rather than _a_filter_smarts and _b_filter_smarts?

I am fine with not having OPTIMADE standardization for chemical (sub)structure search. I believe it was @JPBergsma who wanted such queries standardized.

@rartino
Copy link
Contributor

rartino commented Jun 29, 2022

@JPBergsma @d-beltran

I think you are the only ones active in the OPTIMADE repo that I suspect may have some experience with SMARTS? This is a short PR, perhaps you can look it over and just confirm if what is written makes sense to you?

@merkys

If we cannot find two people who are invested in getting this merged, it seems highly unlikely that this will be implemented any time soon. If so, I think we should close this PR for now to de-clutter the active PR list, and re-open it if someone shows up and asks for this feature (probably via #368).

@rartino rartino added PR/ready-for-review Add this flag if you are the author of the PR and you want it to be reviewed. Remove it when editing and removed PR/requires-discussion labels Jun 29, 2022
@merkys
Copy link
Member Author

merkys commented Jun 29, 2022

If we cannot find two people who are invested in getting this merged, it seems highly unlikely that this will be implemented any time soon. If so, I think we should close this PR for now to de-clutter the active PR list, and re-open it if someone shows up and asks for this feature (probably via #368).

Tagging @vaitkus who may as well be interested. We come from the same database, thus it would be best to hear from others as well.

I am not particularly happy about closing prospective PRs though. Closed PRs come very close to being invisible - I guess no one visiting a repository looks at closed PRs. And open PRs give an impression of directions where a project is moving towards. Moreover, they invite for discussion. Thus I would vote for keeping all prospective PRs open. Maybe marking them with some tag (a la PR/requires-discussion) is enough.

@d-beltran
Copy link

Sorry, I have no experience with SMARTS

@JPBergsma
Copy link
Contributor

JPBergsma commented Jul 6, 2022

I think I originally started the whole SMILES/SMARTS discussion after talking with someone from the Ocelot database. They have a database with crystal structures of organic molecules. For them being able top query the topology of the molecule, e.g., the SMILES string, is essential to be able to select the entries.
I think we ended up with the idea of supporting SMARTS because we did not want to treat the SMILES field any different from a standard string. Because multiple valid SMILES strings can be generated for one molecule, a direct string comparison is not possible. So as a compromise, we decided to implement SMARTS, so databases would have a method to implement querying the SMILES string/molecular structure.

I have not worked with SMARTS queries myself.
As we use an external standard, I however do not think that that is a problem. In the end, the important part of this PR is how we treat custom filters.

As I may have mentioned before, it is no longer possible to use one query for all databases with this PR,
as a database that has not implemented a filter will return an error. This used to be one of the selling points of the OPTIMADE query language.
I am therefore wondering whether we could use a virtual field, e.g., a field that can be used for querying like any normal OPTIMADE field but is not present in the returned JSON file. For example:

:query-url:http://example.org/optimade/v1/structures?filter=_filter_smarts=CaaO

would return all entries that contain a carbon atom that is attached to an aromatic atom, which in turn is bound to an aromatic atom bound to an oxygen atom.

This would maintain the full querying flexibility we have now, and if we apply a prefix it should be backwards compatible with existing databases, who would ignore it like any other unknown field with a known prefix.

@rartino
Copy link
Contributor

rartino commented Jul 6, 2022

@JPBergsma

Thanks for the clarification! So, are you (or someone else?) willing to seek out the Ocelot database and ask them to look at this PR? I mean, I understand it isn't exactly trivial to review a PR of a project you are not actively contributing to, but I think we should not merge a feature without some form of direct input from someone with experience of and/or stake in said feature.

As I may have mentioned before, it is no longer possible to use one query for all databases with this PR, as a database that has not implemented a filter will return an error. This used to be one of the selling points of the OPTIMADE query language.

One will never be able to do a SMARTS query on a database built on a backend with no support for it, so to allow this as an optional, but standardized, extension is only positive for interoperability. It is better that databases that support SMARTS do it in one unified way rather than inventing their own non-standard extensions. This is the design idea we've followed for many other optional features.

I am therefore wondering whether we could use a virtual field, e.g., a field that can be used for querying like any normal OPTIMADE field but is not present in the returned JSON file. For example:

:query-url:http://example.org/optimade/v1/structures?filter=_filter_smarts=CaaO

Maybe I'm missing something, but I don't see how this increases interoperability? What is a database that have no means of evaluating _filter_smarts going to do with this query?

@rartino rartino mentioned this pull request Aug 8, 2022
@rartino rartino mentioned this pull request Jan 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR/ready-for-review Add this flag if you are the author of the PR and you want it to be reviewed. Remove it when editing topic/filtering-language Issue discussing changes and improvements to the query and filtering language topic/query-string Issues relating to the query string sent to the OPTIMaDe api, excluding the filter language. type/proposal Proposal for addition/removal of features. May need broad discussion to reach consensus.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants