fix: Exclude characters with special meaning in Lucene Query Parser syntax from searchbylabel search (DEV-1446) #2269

Merged

@irinaschubert irinaschubert commented Oct 26, 2022

Issue Number: DEV-1446

Pull Request Checklist

Basic Requirements

Please check if your PR fulfills the following requirements:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

PR Type

What kind of change does this PR introduce?

  • Bugfix: represents bug fixes
  • Refactor: represents production code refactoring
  • Feature: represents a new feature
  • Documentation: documentation changes (no production code change)
  • Chore: maintenance tasks (no production code change)
  • Style: styles updates (no production code change)
  • Test: all about tests: adding, refactoring tests (no production code change)
  • Other... Please describe:

Does this PR introduce a breaking change?

  • Yes
  • No

Does this PR change client-test-data?

  • Yes (don't forget to update the JS-LIB team about the change)
  • No

Other information

swarmia bot commented Oct 26, 2022

@irinaschubert irinaschubert self-assigned this Oct 26, 2022
codecov bot commented Oct 27, 2022

Codecov Report

Base: 86.85% // Head: 86.99% // Increases project coverage by +0.13% 🎉

Coverage data is based on head (b79723d) compared to base (c5c98ce).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2269      +/-   ##
==========================================
+ Coverage   86.85%   86.99%   +0.13%     
==========================================
  Files         241      242       +1     
  Lines       27967    28066      +99     
==========================================
+ Hits        24292    24415     +123     
+ Misses       3675     3651      -24     
Impacted Files Coverage Δ
...c/main/scala/org/knora/webapi/core/AppServer.scala 89.55% <ø> (ø)
...ebapi/instrumentation/InstrumentationSupport.scala 100.00% <ø> (ø)
...p-shared/src/main/scala/dsp/valueobjects/Iri.scala 94.18% <100.00%> (-0.14%) ⬇️
...sp-shared/src/main/scala/dsp/valueobjects/V2.scala 89.85% <100.00%> (+5.79%) ⬆️
.../org/knora/webapi/messages/OntologyConstants.scala 99.62% <100.00%> (+<0.01%) ⬆️
...la/org/knora/webapi/messages/StringFormatter.scala 90.18% <100.00%> (+0.01%) ⬆️
...sp-shared/src/main/scala/dsp/valueobjects/Id.scala 65.38% <0.00%> (-3.85%) ⬇️
...n/scala/org/knora/webapi/routing/HealthRoute.scala 69.81% <0.00%> (-1.89%) ⬇️
...r/permissionsmessages/PermissionsMessagesADM.scala 86.39% <0.00%> (-0.32%) ⬇️
...ages/util/search/gravsearch/GravsearchParser.scala 68.13% <0.00%> (-0.26%) ⬇️
... and 13 more


@irinaschubert irinaschubert marked this pull request as ready for review October 27, 2022 18:20
Collaborator

@BalduinLandolt BalduinLandolt left a comment

LGTM (and you beat me to it - I was about to comment on the duplicated test data file when you removed it ^^)

Comment on lines -432 to 436
-      val searchString =
+      val sparqlEncodedSearchString =
         stringFormatter.toSparqlEncodedString(
           searchval,
           throw BadRequestException(s"Invalid search string: '$searchval'")
         )
Collaborator

Is this step still necessary if we replace some special characters anyway? I haven't checked, but I thought toSparqlEncodedString just replaces some characters too.

Author

toSparqlEncodedString returns an error when it is handed an empty string or a string containing a newline. Also, the characters it replaces differ from the ones that are replaced because of the Lucene Query Parser syntax. I thought it better to keep the two steps separate, although there is some overlap (\" and \\).
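
For readers following the thread, the Lucene-specific step described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the object name `LuceneQuerySketch`, the helper `stripLuceneSpecialChars`, and the choice to replace each special character with a space (rather than escaping or deleting it) are all assumptions. The character list is the one the Lucene classic query parser documentation gives as special: + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /

```scala
// Hypothetical sketch; names and replacement strategy are assumptions,
// not the implementation from this pull request.
object LuceneQuerySketch {

  // Characters with special meaning in the Lucene classic query parser syntax.
  // "&&" and "||" are two-character operators; treating single '&' and '|'
  // as special covers both.
  private val luceneSpecialChars: Set[Char] =
    Set('+', '-', '&', '|', '!', '(', ')', '{', '}', '[', ']',
        '^', '"', '~', '*', '?', ':', '\\', '/')

  // Replace every special character with a space, then collapse runs of
  // whitespace so the result is a clean list of search terms.
  def stripLuceneSpecialChars(searchString: String): String =
    searchString
      .map(c => if (luceneSpecialChars.contains(c)) ' ' else c)
      .split("\\s+")
      .filter(_.nonEmpty)
      .mkString(" ")
}
```

With this sketch, `LuceneQuerySketch.stripLuceneSpecialChars("foo+bar")` yields `"foo bar"`. Keeping it separate from the SPARQL encoding step matches the reasoning in the reply: each function guards against a different syntax.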

- ?resource <http://jena.apache.org/text#query> "@searchTerm.generateLiteralForLuceneIndexWithoutExactSequence" .
+ ?resource <http://jena.apache.org/text#query> (rdfs:label "@searchTerm.generateLiteralForLuceneIndexWithoutExactSequence") .
Collaborator

What effect does this change have? Has our search by label been defective all along?

Author

The change doesn't affect behavior. According to the Apache Jena documentation, it's just good practice to be as explicit as possible, so I thought I'd add this.
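
To make the difference between the two forms concrete, here is a small sketch that builds both triple patterns. In Jena's text query, the property is an optional first element of the argument list, and omitting it searches the index's default field. The object and method names below are hypothetical, and plain string interpolation stands in for the Twirl template used in the actual code. That the two forms behave the same here presumably means rdfs:label is the default field in this project's index configuration, which is an inference from the reply, not something the thread states.

```scala
// Hypothetical sketch; names are illustrative and not taken from the PR.
object TextQueryPatternSketch {

  private val textQuery = "<http://jena.apache.org/text#query>"

  // Implicit form: no property given, so the index's default field is searched.
  def implicitFieldPattern(term: String): String =
    s"""?resource $textQuery "$term" ."""

  // Explicit form: names rdfs:label as the property to search.
  def explicitFieldPattern(term: String): String =
    s"""?resource $textQuery (rdfs:label "$term") ."""
}
```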

@irinaschubert irinaschubert changed the title from "fix: Exclude special characters in searchbylabel search (DEV-1446)" to "fix: Exclude characters with special meaning in Lucene Query Parser syntax from searchbylabel search (DEV-1446)" Oct 28, 2022
@irinaschubert irinaschubert merged commit b359916 into main Oct 28, 2022
@irinaschubert irinaschubert deleted the wip/dev-1446-exclude-special-characters-in-searchbylabel branch October 28, 2022 08:33