Fixed "regexp error" when using libxml2 to load the xsd file #118

Open
wants to merge 5 commits into
base: main

@pushrbx pushrbx commented Mar 24, 2022

I'm currently working on a template system that generates XML files, and I want to validate its output. I use libxml2 indirectly from Python, via lxml, to validate the generated files against the TrustFrameworkPolicy_0.3.0.0.xsd schema, but the schema itself fails to load: libxml2 reports that line 3689 of the xsd file contains an invalid regular expression pattern.

From xmllint:

regexp error : failed to compile: Wrong escape sequence, misuse of character '\'
regexp error : failed to compile: xmlFAParseCharClass: ']' expected
regexp error : failed to compile: xmlFAParseRegExp: extra characters
../policies/TrustFrameworkPolicy_0.3.0.0.xsd:3689: element pattern: Schemas parser error : Element '{http://www.w3.org/2001/XMLSchema}pattern': The value '^urn:[a-z0-9][a-z0-9-]{0,31}:[a-z0-9()+,\/\-.:=@;$_!*'%\/?#]+$' of the facet 'pattern' is not a valid regular expression.
WXS schema ../policies/TrustFrameworkPolicy_0.3.0.0.xsd failed to compile

From Python (in WSL/Ubuntu):

Validating files...
Traceback (most recent call last):
  File "/mnt/c/Users/pushrbx/PycharmProjects/aad-b2c-extensions/pman.py", line 169, in <module>
    main()
  File "/mnt/c/Users/pushrbx/PycharmProjects/aad-b2c-extensions/pman.py", line 161, in main
    build(config)
  File "/mnt/c/Users/pushrbx/PycharmProjects/aad-b2c-extensions/pman.py", line 99, in build
    validate_built_xml_files()
  File "/mnt/c/Users/pushrbx/PycharmProjects/aad-b2c-extensions/pman.py", line 45, in validate_built_xml_files
    xmlschema = etree.XMLSchema(xmlschema_doc)
  File "src/lxml/xmlschema.pxi", line 89, in lxml.etree.XMLSchema.__init__
lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}pattern': The value '^urn:[a-z0-9][a-z0-9-]{0,31}:[a-z0-9()+,\/\-.:=@;$_!*'%\/?#]+$' of the facet 'pattern' is not a valid regular expression., line 3689
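The "Wrong escape sequence" message is consistent with the `\/` sequences in the pattern: the XML Schema regular-expression grammar permits only a fixed set of characters after a backslash, and `/` is not among them. The following is a rough sketch, not libxml2's actual parser and not necessarily what this PR changes, that flags escapes outside that set. The function name `find_invalid_xsd_escapes` is hypothetical and introduced only for illustration.

```python
# Characters allowed after a backslash by the XML Schema regex grammar.
SINGLE_CHAR_ESCAPES = set("nrt\\|.?*+(){}-[]^")
MULTI_CHAR_ESCAPES = set("sSiIcCdDwW")
CATEGORY_ESCAPES = set("pP")  # \p{...} and \P{...}
ALLOWED = SINGLE_CHAR_ESCAPES | MULTI_CHAR_ESCAPES | CATEGORY_ESCAPES

# The pattern quoted in the error message above.
REPORTED = r"^urn:[a-z0-9][a-z0-9-]{0,31}:[a-z0-9()+,\/\-.:=@;$_!*'%\/?#]+$"


def find_invalid_xsd_escapes(pattern):
    """Return (index, escape) pairs for escapes outside the XSD grammar.

    Hypothetical helper: it only inspects the character after each
    backslash and does not validate the rest of the expression.
    """
    invalid = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\":
            nxt = pattern[i + 1 : i + 2]  # empty for a trailing backslash
            if nxt not in ALLOWED:
                invalid.append((i, "\\" + nxt))
            i += 2  # skip the escaped character
        else:
            i += 1
    return invalid


if __name__ == "__main__":
    for index, escape in find_invalid_xsd_escapes(REPORTED):
        print(f"offset {index}: {escape!r} is not a valid XSD escape")
```

Since `/` has no special meaning in an XSD pattern, it should not need escaping at all, whereas `\-` inside a character class is in the permitted set.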

You can also reproduce the issue with the command line tools of libxml2:

  1. On Ubuntu: sudo apt install libxml2-utils
  2. xmllint --schema TrustFrameworkPolicy_0.3.0.0.xsd TrustFrameworkBase.xml --noout

With Python you can reproduce it the following way:

  1. Python 3.8+ is required.
  2. pip install lxml==4.8.0 cython==0.29.28
  3. Create a Python file named repro.py
  4. Put the following in repro.py:
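from lxml import etree

# Compiling the schema is the step that raises XMLSchemaParseError.
with open("TrustFrameworkPolicy_0.3.0.0.xsd") as f:
    xmlschema_doc = etree.parse(f)
    xmlschema = etree.XMLSchema(xmlschema_doc)

# Not reached while the schema fails to compile.
with open("TrustFrameworkBase.xml") as xml_file:
    doc = etree.parse(xml_file)
    xmlschema.assertValid(doc)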
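To check a single pattern without loading the whole schema, the facet can be isolated in a tiny XSD. This is a minimal sketch under a few assumptions: the helper names `minimal_schema_for_pattern` and `compiles_with_libxml2` are hypothetical, the wrapper element `value` is invented for the test schema, and lxml is treated as optional so the schema text can still be generated without it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def minimal_schema_for_pattern(pattern: str) -> str:
    """Return a tiny XSD whose only constraint is the given pattern facet."""
    return (
        f'<xs:schema xmlns:xs="{XSD_NS}">'
        '<xs:element name="value">'
        "<xs:simpleType>"
        '<xs:restriction base="xs:string">'
        f"<xs:pattern value={quoteattr(pattern)}/>"
        "</xs:restriction>"
        "</xs:simpleType>"
        "</xs:element>"
        "</xs:schema>"
    )


def compiles_with_libxml2(pattern: str):
    """Return True or False if lxml is installed, or None if it is not."""
    try:
        from lxml import etree
    except ImportError:
        return None
    doc = etree.fromstring(minimal_schema_for_pattern(pattern).encode("utf-8"))
    try:
        etree.XMLSchema(doc)  # schema compilation is where the facet is parsed
    except etree.XMLSchemaParseError:
        return False
    return True


if __name__ == "__main__":
    reported = r"^urn:[a-z0-9][a-z0-9-]{0,31}:[a-z0-9()+,\/\-.:=@;$_!*'%\/?#]+$"
    print(compiles_with_libxml2(reported))
```

Running the same check against a candidate replacement pattern gives a quick way to confirm that a change to the facet compiles before validating real policy files.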

This PR addresses the issue. I still need to test it with VSCode, but I don't use VSCode on a day-to-day basis, so it would be great if somebody could test this or point me in the right direction so I can set it up myself.

P.S.: Sorry about the whitespace changes.

ghost commented Mar 24, 2022

CLA assistant check
All CLA requirements met.

@pushrbx pushrbx changed the title Fixed "invalid pattern" error when using libxml2 to load the xsd file Fixed "regexp error" when using libxml2 to load the xsd file Mar 24, 2022