lowlighter/xml


XML stringifier/parser (TypeScript, no dependencies)

jsr.io deno.land/x

Basic usage

import { parse } from "./mod.ts"

console.log(parse(
  `
  <root>
    <!-- This is a comment -->
    <text>hello</text>
    <array>world</array>
    <array>monde</array>
    <array>世界</array>
    <array>🌏</array>
    <number>42</number>
    <boolean>true</boolean>
    <complex attribute="value">content</complex>
  </root>
`,
  { reviveNumbers: true, reviveBooleans: true },
))
/*
  Same nodes are grouped into arrays, while numbers and booleans are auto-parsed (can be disabled)
  Nodes with attributes will not be flattened and you'll be able to access them with "@" prefix while
  text nodes are available through "#text" key and comment nodes are available through "#comment" key
  {
    root: {
      "#comment": "This is a comment",
      text: "hello",
      array: ["world", "monde", "世界", "🌏"],
      number: 42,
      boolean: true,
      complex: {
        "@attribute": "value",
        "#text": "content",
      }
    }
  }
*/
import { stringify } from "./mod.ts"

console.log(stringify({
  root: {
    "#comment": "This is a comment",
    text: "hello",
    array: ["world", "monde", "世界", "🌏"],
    number: 42,
    boolean: true,
    complex: {
      "@attribute": "value",
      "#text": "content",
    },
  },
}))

Features

Follows the patterns described in XML.com's "Converting between XML and JSON".

  • Support basic XML features (tags, self-closed tags, nested tags, attributes, ...)
  • Support XML.parse and XML.stringify
  • Support <?xml ?> prolog declaration
  • Support <!DOCTYPE> declaration
  • Support <![CDATA[ ]]> strings
  • Support <!-- --> comments
  • Support XML entities (&amp;, &#38;, &#x26;, ...)
  • Support auto-conversion of primitives (strings, booleans, numbers, null, ...)
  • Support strings or streams (ReaderSync & SeekerSync) inputs
  • Support custom revivers and replacers
  • Support metadata (parent, name, ...) hidden in a non-enumerable property
  • Auto-group nodes into arrays when same tag is used
  • Auto-unwrap nodes when they only have text content

How reliable is deno.land/x/xml? Check parse tests and stringify tests 🧪

Limitations

  • When using mixed content of texts and child nodes, it will be parsed as a text node
  • When using a mixed group of nodes, XML.stringify(XML.parse()) may result in a different order
    • Example: <a><b/><c/><b/></a> will result in <a><b/><b/><c/></a>
    • This may or may not be acceptable depending on your use case
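
The reordering follows from grouping same-named tags under a single key. The following is a minimal, self-contained sketch of that effect; it does not use the library, and `groupByTag` and `serialize` are hypothetical helpers written only for illustration:

```typescript
// Hypothetical, simplified model: a parent holds a list of [tag, text] children.
type Child = [string, string]

// Group children by tag name, as an object-shaped result has to do.
function groupByTag(children: Child[]): Record<string, string[]> {
  const grouped: Record<string, string[]> = {}
  for (const [tag, text] of children) {
    if (!(tag in grouped)) grouped[tag] = []
    grouped[tag].push(text)
  }
  return grouped
}

// Serialize the grouped object back, walking keys in insertion order.
function serialize(grouped: Record<string, string[]>): string {
  return Object.entries(grouped)
    .flatMap(([tag, texts]) =>
      texts.map((text) => (text ? `<${tag}>${text}</${tag}>` : `<${tag}/>`))
    )
    .join("")
}

// <b/><c/><b/> : the second <b/> joins the first one under the "b" key
const children: Child[] = [["b", ""], ["c", ""], ["b", ""]]
const roundTripped = serialize(groupByTag(children))
console.log(roundTripped) // <b/><b/><c/>
```

Once both `<b/>` nodes share one key, their position relative to `<c/>` is no longer recorded, so it cannot be restored on output.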

Revivers

By default, node contents will be converted to:

  • null when empty, unless emptyToNull = false
  • number when matching finite numbers, if reviveNumbers = true
  • boolean when matching true or false (case insensitive), if reviveBooleans = true
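
These rules can be sketched as a small standalone function. This is an illustration only, not the library's implementation: `reviveText` is a hypothetical name, and the whitespace trimming and the default values shown here are assumptions.

```typescript
// Option names mirror the ones listed above; defaults are assumptions.
interface ReviveOptions {
  emptyToNull?: boolean
  reviveNumbers?: boolean
  reviveBooleans?: boolean
}

// Hypothetical helper: converts raw node text into a primitive.
function reviveText(
  text: string,
  options: ReviveOptions = {},
): string | number | boolean | null {
  const { emptyToNull = true, reviveNumbers = false, reviveBooleans = false } = options
  const trimmed = text.trim() // assumption: surrounding whitespace is ignored
  if (trimmed === "") return emptyToNull ? null : ""
  if (reviveBooleans && /^(?:true|false)$/i.test(trimmed)) {
    return trimmed.toLowerCase() === "true"
  }
  if (reviveNumbers) {
    const parsed = Number(trimmed)
    if (Number.isFinite(parsed)) return parsed // rejects NaN and Infinity
  }
  return text
}

console.log(reviveText("42", { reviveNumbers: true })) // 42
console.log(reviveText("TRUE", { reviveBooleans: true })) // true
console.log(reviveText("")) // null
console.log(reviveText("42")) // "42" (number revival not enabled)
```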

XML entities (e.g. &amp;, &#38;, &#x26;, ...) will be unescaped automatically.
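
As a rough illustration of what unescaping involves, the sketch below decodes the five predefined XML entities plus decimal and hexadecimal character references. `unescapeEntities` is a hypothetical helper and makes no claim about how the library does it internally.

```typescript
// The five entities predefined by the XML specification.
const NAMED_ENTITIES = new Map<string, string>([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
])

// Hypothetical helper: decodes entities in a single pass, so that
// "&amp;lt;" becomes "&lt;" and is not decoded a second time.
function unescapeEntities(text: string): string {
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body: string) => {
    if (body.startsWith("#")) {
      const code = body.startsWith("#x")
        ? parseInt(body.slice(2), 16)
        : parseInt(body.slice(1), 10)
      // Leave out-of-range references untouched instead of throwing
      return code <= 0x10ffff ? String.fromCodePoint(code) : match
    }
    // Unknown named entities are left as-is
    return NAMED_ENTITIES.get(body) ?? match
  })
}

console.log(unescapeEntities("&amp; &#38; &#x26;")) // & & &
```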

It is also possible to provide a custom reviver for complex transformations:

import { parse } from "./mod.ts"
console.log(parse(
  `
  <prices>
    <product>dakimakura</product>
    <price currency="usd">10.5</price>
    <price currency="eur">10.5</price>
    <price currency="yen">10.5</price>
    <useless/>
  </prices>
`,
  {
    reviver({ value, key, tag, properties }) {
      //Apply special processing for tag, attributes and properties
      if (tag === "price") {
        if (key === "@currency") {
          return { usd: "$", eur: "€", yen: "¥" }[value as string] ?? "?"
        }
        if (key === "#text") {
          delete this["@currency"]
          return `${value}${properties?.["@currency"]}`
        }
      }
      //Filter out useless elements
      if (tag === "useless") {
        return undefined
      }
      return value
    },
  },
))
/*
  Like JSON.parse's reviver, computed value can be transformed before being returned.
  - `this` refers to the node being edited, so any change made to it is reflected
    in the final parsed value.
  - `properties` can only be accessed after all of the node's other properties have
    been parsed.
  - returning `undefined` (or nothing) filters out the current value.
  {
    prices: {
      product: "dakimakura",
      price: [ "10.5$", "10.5€", "10.5¥" ]
    }
  }
*/

Replacers

By default, node contents will be converted to:

  • "" when null, unless nullToEmpty = false
  • XML entities (e.g. &, <, >, ", ' ...) will be escaped when needed, unless escapeAllEntities = true, in which case they will always be escaped
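
To make the difference between the two escaping modes concrete, here is a standalone sketch. `escapeText` is a hypothetical helper, and the reading of "when needed" (only & and < in text content) is an assumption, not a description of the library's exact rules.

```typescript
// Hypothetical helper illustrating the two escaping modes.
function escapeText(text: string, escapeAllEntities = false): string {
  // "&" is replaced first; otherwise the "&" of freshly produced
  // entities such as "&lt;" would be escaped a second time.
  let escaped = text.replace(/&/g, "&amp;").replace(/</g, "&lt;")
  if (escapeAllEntities) {
    // Assumption: these characters are optional to escape in text content
    escaped = escaped
      .replace(/>/g, "&gt;")
      .replace(/"/g, "&quot;")
      .replace(/'/g, "&apos;")
  }
  return escaped
}

console.log(escapeText('a < b & "c"')) // a &lt; b &amp; "c"
console.log(escapeText('a < b & "c"', true)) // a &lt; b &amp; &quot;c&quot;
```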

XML metadata

Several metadata properties, such as the parent node and the node name, can be accessed through the $XML symbol.

import { $XML, parse } from "./mod.ts"
console.log($XML)
console.log(Deno.inspect(
  parse(
    `
  <root>
    <child>hello world</child>
  </root>
`,
    { flatten: false },
  ),
  { showHidden: true, compact: false },
))
/*
  Symbol("x/xml")
  {
    root: {
      child: {
        "#text": "hello world",
        [Symbol("x/xml")]: {
          name: "child",
          parent: [Circular]
        }
      },
      [Symbol("x/xml")]: {
        name: "root",
        parent: null
      }
    }
  }
*/
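
The reason such metadata stays out of the way can be shown without the library. In the sketch below, `META` is a hypothetical stand-in for `$XML`, and `withMeta` and `getMeta` are illustrative helpers; the point is only that a non-enumerable, symbol-keyed property is skipped by `Object.keys`, `JSON.stringify` and object spread.

```typescript
// Hypothetical stand-in for the library's $XML symbol.
const META = Symbol("meta")

interface NodeMeta {
  name: string
  parent: object | null
}

// Attach metadata as a non-enumerable, symbol-keyed property.
function withMeta<T extends object>(node: T, meta: NodeMeta): T {
  Object.defineProperty(node, META, { value: meta, enumerable: false })
  return node
}

// Read the metadata back (Reflect.get accepts symbol keys).
function getMeta(node: object): NodeMeta | undefined {
  return Reflect.get(node, META)
}

const root = withMeta({} as Record<string, unknown>, { name: "root", parent: null })
const child = withMeta({ "#text": "hello world" }, { name: "child", parent: root })
root.child = child

console.log(Object.keys(root)) // [ "child" ]
// No circular reference error: the parent link lives behind the symbol
console.log(JSON.stringify(root)) // {"child":{"#text":"hello world"}}
console.log(getMeta(child)?.name) // child
console.log(getMeta(child)?.parent === root) // true
```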

Stringify as CDATA

The Symbol("x/xml") of the root document may contain an Array<string[]>, where each entry is an XML path to a node that should be wrapped in CDATA.

For more complex transformations, use a replacer instead.

Parsing large files

Parsing large files of several megabytes can take some time. You can use the progress option to pass a callback that is called each time a node has been parsed.

import { parse } from "./mod.ts"
const file = await Deno.open("my.xml")
const { size } = await file.stat()
console.log(parse(file, {
  progress(bytes) {
    Deno.stdout.writeSync(
      new TextEncoder().encode(
        `Parsing document: ${(100 * bytes / size).toFixed(2)}%\r`,
      ),
    )
  },
}))

Why does this use a synchronous API?

There is no official specification for XML.parse and XML.stringify, but this library is designed to resemble the native JSON handler, which is why it is synchronous and supports replacers and revivers.
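
For comparison, this is what the native JSON counterparts look like. The snippet uses only the standard `JSON.parse` and `JSON.stringify`; the sample data is made up for illustration.

```typescript
// JSON.parse is synchronous and accepts a reviver, called for every key/value pair
const revived = JSON.parse('{"price":"10.5","useless":true}', (key, value) => {
  if (key === "useless") return undefined // returning undefined drops the key
  if (key === "price") return Number(value)
  return value
})
console.log(revived) // { price: 10.5 }

// JSON.stringify is synchronous and accepts a replacer with the same shape
const replaced = JSON.stringify(
  { price: 10.5, secret: "x" },
  (key, value) => (key === "secret" ? undefined : value),
)
console.log(replaced) // {"price":10.5}
```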

About

📑 An XML parser/stringifier for Deno written in TypeScript and with no dependencies
