Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use elementpath for finding via XPath (2.0) #204

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

KeyWeeUsr
Copy link
Contributor

The problem is that lxml uses XPath 1.0. That version does not know how to escape those characters as those are special/forbidden. Instead, we can use elementpath which seems to be quite popular (among other XPath 2.0 related packages), but it breaks the tests.

For XPath 1.0 fixing #123 doesn't seem possible, so perhaps rewriting everything to elementpath.select(root, path) would work if the rest of the codebase was adjusted as well.

Closes #123

@KeyWeeUsr KeyWeeUsr changed the title [WIP] use elementpath for finding via XPath (2.0) Use elementpath for finding via XPath (2.0) Oct 8, 2020
@KeyWeeUsr
Copy link
Contributor Author

Unfortunately the group lookup returns an empty list from the elementpath.select() and I have no clue why, so only for this single particular XPath I've changed it back to the tree.xpath() call and the rest works nicely with XPath 2.0. All tests passed, so the quotes + the rest works.

.......................................................................................
----------------------------------------------------------------------
Ran 87 tests in 12.947s

OK

@Evidlo
Copy link
Member

Evidlo commented Oct 18, 2020

How about this function (taken from here):

def escape_string_for_xpath(s):
    if '"' in s and "'" in s:
        return 'concat(%s)' % ", '\"',".join('"%s"' % x for x in s.split('"'))
    elif '"' in s:
        return "'%s'" % s
    return '"%s"' % s

Then we'd modify the strings in xpath.py and remove the double quotes around the {} substitutions and let the function above figure out quoting.

e.g.

>>> kp = PyKeePass('test4.kdbx', 'password', 'test4.key')
>>> title_xpath = '/String/Key[text()="Title"]/../Value[text()={}]/../..'
>>> title = 'quote test -> " <-'
>>> kp._xpath('//Entry' + title_xpath.format(escape_string_for_xpath(title)), cast=True, first=True)
Entry: "quote test -> " <- (None)"

@Evidlo
Copy link
Member

Evidlo commented Nov 11, 2020

Is there a reason we should prefer elementpath to the above solution? For the sake of people packaging pykeepass, I want to avoid adding any more dependencies unless its really necessary.

@Evidlo
Copy link
Member

Evidlo commented Dec 13, 2020

Bump.

@KeyWeeUsr
Copy link
Contributor Author

One of the reason is that elementpath allows usage of XPath 1.0 as well as XPath 2.0 and by using the latter we're leveraging out-of-box support for quotes and other fixes, so overall it'd be better to use already newer model which will be even easier for switching to 3.x if/when required.

The other reason is that basically the escaping function is error prone and if there's any other character that XPath 1.0 doesn't like it'll break. Creating a list of unsupported characters and trying to escape them this way is rather ugly.

Re the dependencies reason:

  • we can't just be without deps nowadays, CPython itself has quite a lot of them already just for working and lxml is one of them
  • if we don't want to handle XML parsing ourselves, we need something/someone whose primary use-case is actually doing that
  • elementpath has no deps currently and is pure Python

@pschmitt
Copy link
Member

Big +1 for merging this

@Evidlo
Copy link
Member

Evidlo commented Jan 16, 2021

The other reason is that basically the escaping function is error prone and if there's any other character that XPath 1.0 doesn't like it'll break. Creating a list of unsupported characters and trying to escape them this way is rather ugly.

If there are other characters besides " which are problematic, then I agree we shouldn't try to handle this ourselves. However, I'm not aware of any others characters that have this problem yet. Also it seems like this PR still needs special quote escaping.

As for all the new features of XPath 2.0, I don't see a need right now, as our queries in xpath.py are pretty simple.

In my opinion, we should keep this PR on the backburner until new issues arise, then take a second look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

_xpath function throws exception when title or username contains quote
3 participants