Question about parsing XML attribute names #640

Open
jasonkhanlar opened this issue Mar 22, 2022 · 0 comments

jasonkhanlar commented Mar 22, 2022

I have been using Node.js exec() with jq/yq/xq to convert between XML and JSON. When I compare the XML-to-JSON output of xml2js with the output of xq, the JSON data returned is different.

const { error, stderr, stdout } = await exec(`cat ${file}|xq`, {maxBuffer: 1024 * 1024 * 1024});

The value of stdout appears as:

{
  mediawiki: {
    '@xmlns': 'http://www.mediawiki.org/xml/export-0.10/',
    '@xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
    '@xsi:schemaLocation': 'http://www.mediawiki.org/xml/export-0.10/ http://www.mediawiki.org/xml/export-0.10.xsd',
    '@version': '0.10',
    '@xml:lang': 'en',
    siteinfo: {
      sitename: 'Wikipedia',
      dbname: 'enwiki',
      base: 'https://en.wikipedia.org/wiki/Main_Page',
      generator: 'MediaWiki 1.38.0-wmf.24',
      case: 'first-letter',
      namespaces: [Object]
    },
    page: [
      [Object], [Object], [Object], [Object], [Object], [Object]
    ]
  }
}

but when using:

// fs.promises.readFile returns a promise and takes no callback, so errors surface through await
xml2js.parseString(await fs.promises.readFile(file, 'utf8'), function (err, result) { if (err) throw err; console.dir(result); });

the output appears as:

{
  mediawiki: {
    '$': {
      xmlns: 'http://www.mediawiki.org/xml/export-0.10/',
      'xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
      'xsi:schemaLocation': 'http://www.mediawiki.org/xml/export-0.10/ http://www.mediawiki.org/xml/export-0.10.xsd',
      version: '0.10',
      'xml:lang': 'en'
    },
    siteinfo: [ [Object] ],
    page: [
      [Object], [Object], [Object], [Object], [Object], [Object],
    ]
  }
}

Is there a reason for the difference, or a way to configure the output to match?
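
For illustration, the two outputs above differ in where attributes live ('$' versus '@'-prefixed keys) and in whether single child elements are wrapped in arrays. A minimal sketch of reshaping the default xml2js result into the xq-style layout follows. The helper name `toXqShape` and the `sample` data are hypothetical, and mapping the xml2js text key '_' to '#text' is an assumption about xq's convention, since neither key appears in the outputs above.

```javascript
// Hypothetical helper: reshape a default xml2js result into an xq-style layout.
function toXqShape(node) {
  // Text-only elements are plain strings in xml2js output; pass them through.
  if (typeof node !== 'object' || node === null) return node;

  const out = {};
  for (const [key, value] of Object.entries(node)) {
    if (key === '$') {
      // Attributes: lift them out of '$' and prefix each name with '@'.
      for (const [attr, attrValue] of Object.entries(value)) {
        out['@' + attr] = attrValue;
      }
    } else if (key === '_') {
      // Text that sits next to attributes (assumed to map to '#text').
      out['#text'] = value;
    } else if (Array.isArray(value)) {
      // Child elements: unwrap single-item arrays, keep repeated elements as arrays.
      const items = value.map((item) => toXqShape(item));
      out[key] = items.length === 1 ? items[0] : items;
    } else {
      out[key] = toXqShape(value);
    }
  }
  return out;
}

// Small made-up input in the default xml2js shape.
const sample = {
  mediawiki: {
    $: { version: '0.10', 'xml:lang': 'en' },
    siteinfo: [{ sitename: ['Wikipedia'], dbname: ['enwiki'] }],
    page: [{ title: ['A'] }, { title: ['B'] }],
  },
};
const reshaped = toXqShape(sample);
console.dir(reshaped, { depth: null });
```

This sketch does not try to reproduce how xq represents empty elements, so results for those may still differ.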


Edited to add:

let parser = new xml2js.Parser({
    attrNameProcessors: [ function (name) { console.log(name); return name } ]
});

With this parser, the names of the attributes still do not have the '@' symbol at the beginning. I couldn't find any option that keeps the data exactly as it appears in the source XML, without modification. Is there something I missed?
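
A possible explanation, offered as a sketch rather than a confirmed answer: the processor above returns each name unchanged, so it adds no prefix. xml2js uses whatever a name processor returns as the attribute key, so a processor that prepends '@' may get closer to the xq layout when combined with the `mergeAttrs`, `explicitArray`, and `charkey` options. The names `prefixAt` and `xqLikeOptions` are hypothetical, and the '#text' value for `charkey` is an assumption about xq's convention.

```javascript
// Name processor: xml2js calls this with each attribute name and uses the return value.
const prefixAt = (name) => '@' + name;

// Assumed option set for an xq-like layout; intended for `new xml2js.Parser(...)`.
const xqLikeOptions = {
  mergeAttrs: true,               // place attributes beside child elements instead of under '$'
  explicitArray: false,           // use arrays only when an element repeats
  attrNameProcessors: [prefixAt], // rename each attribute before it is stored
  charkey: '#text',               // key for text in elements that also carry attributes
};

console.log(prefixAt('xml:lang'));
```

The options object would then be passed as `new xml2js.Parser(xqLikeOptions)` in place of the options in the snippet above. Whether the result matches xq in every case (for example, empty elements) is untested here.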
