Use elementpath for finding via XPath (2.0) #204

KeyWeeUsr · 2020-10-05T19:56:29Z

The problem is that lxml uses XPath 1.0. That version does not know how to escape those characters as those are special/forbidden. Instead, we can use elementpath which seems to be quite popular (among other XPath 2.0 related packages), but it breaks the tests.

For XPath 1.0 fixing #123 doesn't seem possible, so perhaps rewriting everything to elementpath.select(root, path) would work if the rest of the codebase was adjusted as well.

Closes #123

Originally using re:test() which is: boolean regexp:test(string, string, string?) http://exslt.org/regexp/functions/test/index.html In XPath 2.0 it's unnecessary to import 3rd-party stuff to use regex and the func signature matches. The trick is not to support "None" as "flags" or everything crashes. https://www.w3.org/TR/xpath-functions/#func-matches Also, this e-mail chain: https://exslt.fourthought.narkive.com/qFTvc1I2/exslt-for-xslt-2-0-a-new-life-for-exslt

KeyWeeUsr · 2020-10-08T22:57:40Z

Unfortunately the group lookup returns an empty list from the elementpath.select() and I have no clue why, so only for this single particular XPath I've changed it back to the tree.xpath() call and the rest works nicely with XPath 2.0. All tests passed, so the quotes + the rest works.

.......................................................................................
----------------------------------------------------------------------
Ran 87 tests in 12.947s

OK

Evidlo · 2020-10-18T06:48:26Z

How about this function (taken from here):

def escape_string_for_xpath(s):
    if '"' in s and "'" in s:
        return 'concat(%s)' % ", '\"',".join('"%s"' % x for x in s.split('"'))
    elif '"' in s:
        return "'%s'" % s
    return '"%s"' % s

Then we'd modify the strings in xpath.py and remove the double quotes around the {} substitutions and let the function above figure out quoting.

e.g.

>>> kp = PyKeePass('test4.kdbx', 'password', 'test4.key')
>>> title_xpath = '/String/Key[text()="Title"]/../Value[text()={}]/../..'
>>> title = 'quote test -> " <-'
>>> kp._xpath('//Entry' + title_xpath.format(escape_string_for_xpath(title)), cast=True, first=True)
Entry: "quote test -> " <- (None)"

Evidlo · 2020-11-11T16:43:49Z

Is there a reason we should prefer elementpath to the above solution? For the sake of people packaging pykeepass, I want to avoid adding any more dependencies unless its really necessary.

Evidlo · 2020-12-13T00:26:05Z

Bump.

KeyWeeUsr · 2020-12-14T20:30:04Z

One of the reason is that elementpath allows usage of XPath 1.0 as well as XPath 2.0 and by using the latter we're leveraging out-of-box support for quotes and other fixes, so overall it'd be better to use already newer model which will be even easier for switching to 3.x if/when required.

The other reason is that basically the escaping function is error prone and if there's any other character that XPath 1.0 doesn't like it'll break. Creating a list of unsupported characters and trying to escape them this way is rather ugly.

Re the dependencies reason:

we can't just be without deps nowadays, CPython itself has quite a lot of them already just for working and lxml is one of them
if we don't want to handle XML parsing ourselves, we need something/someone whose primary use-case is actually doing that
elementpath has no deps currently and is pure Python

pschmitt · 2020-12-15T07:07:37Z

Big +1 for merging this

Evidlo · 2021-01-16T04:21:35Z

The other reason is that basically the escaping function is error prone and if there's any other character that XPath 1.0 doesn't like it'll break. Creating a list of unsupported characters and trying to escape them this way is rather ugly.

If there are other characters besides " which are problematic, then I agree we shouldn't try to handle this ourselves. However, I'm not aware of any others characters that have this problem yet. Also it seems like this PR still needs special quote escaping.

As for all the new features of XPath 2.0, I don't see a need right now, as our queries in xpath.py are pretty simple.

In my opinion, we should keep this PR on the backburner until new issues arise, then take a second look.

KeyWeeUsr added 5 commits October 5, 2020 21:02

use elementpath for finding via XPath (2.0)

42da47d

escape only double quotes, apostrophe is ok

a0286a5

curly bracket is a reserved symbol in regex

6434fa5

fallback to original tree.xpath() for Group ancestor lookup

a0a02b5

KeyWeeUsr changed the title ~~[WIP] use elementpath for finding via XPath (2.0)~~ Use elementpath for finding via XPath (2.0) Oct 8, 2020

Evidlo mentioned this pull request Jan 16, 2021

unescaped slashes for entry names/paths #205

Closed

shadow1runner mentioned this pull request Mar 23, 2021

entry._get_string_field fails if key contains a double quote due to missing XPath escaping #254

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use elementpath for finding via XPath (2.0) #204

Use elementpath for finding via XPath (2.0) #204

KeyWeeUsr commented Oct 5, 2020

KeyWeeUsr commented Oct 8, 2020

Evidlo commented Oct 18, 2020 •

edited

Evidlo commented Nov 11, 2020

Evidlo commented Dec 13, 2020

KeyWeeUsr commented Dec 14, 2020

pschmitt commented Dec 15, 2020

Evidlo commented Jan 16, 2021

Use elementpath for finding via XPath (2.0) #204

Are you sure you want to change the base?

Use elementpath for finding via XPath (2.0) #204

Conversation

KeyWeeUsr commented Oct 5, 2020

KeyWeeUsr commented Oct 8, 2020

Evidlo commented Oct 18, 2020 • edited

Evidlo commented Nov 11, 2020

Evidlo commented Dec 13, 2020

KeyWeeUsr commented Dec 14, 2020

pschmitt commented Dec 15, 2020

Evidlo commented Jan 16, 2021

Evidlo commented Oct 18, 2020 •

edited