unified-doc

unified document APIs.


Intro

Vast amounts of human knowledge are stored digitally in different document formats. It is cheap to create, store, render, and manage content within a single document format, but much harder to perform the same operations on content across different formats. A unified bridge is needed to lower the friction of working across formats, which in turn improves the sharing of human knowledge.

Instead of implementing custom programs per format to parse/render/search/annotate/export content, unified-doc implements a set of unified document APIs for supported content types. As a result, existing APIs extend to newly introduced content types, and supported content types benefit from future API methods.

With unified-doc, we can easily:

  • compile and render any content to HTML.
  • format and style the document.
  • mark or annotate the document.
  • search on the document's text content.
  • export the document in a variety of file formats.
  • preserve the semantic structure of the source content.
  • retrieve useful representations of the document (e.g. source, html, text, syntax tree).
  • enrich the document through an ecosystem of plugins.
  • evolve with interoperable web technologies.

Document formats

unified-doc supports the following document formats by implementing a parser for the MIME type associated with each format:

  • most source code supported by syntax highlighting libraries (e.g. .txt, .json, .js, .css, .sh, .py, .r, .cpp)
  • .html
  • .md
  • .csv
  • .docx
  • .epub
  • .pdf
  • .tex
  • .mathml
  • .rtf
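
To make the MIME-type association concrete, here is a minimal sketch of how a filename could be resolved to a parser. It is not unified-doc's implementation: `mimeTypes`, `parsers`, `inferMimeType`, and `selectParser` are hypothetical names, the extension table is a small illustrative subset, and the parsers return plain string arrays as stand-ins for real syntax trees.

```typescript
// Hypothetical sketch: none of these names are unified-doc exports.
// A parser here returns string[] as a stand-in for a real syntax tree.
type Parser = (content: string) => string[];

// Illustrative subset of file extensions mapped to MIME types.
const mimeTypes: Record<string, string> = {
  ".md": "text/markdown",
  ".html": "text/html",
  ".csv": "text/csv",
  ".txt": "text/plain",
};

// Used whenever no parser is registered for a MIME type.
const fallbackParser: Parser = (content) => content.split("\n");

// Parsers are registered per MIME type, not per file extension.
const parsers: Record<string, Parser> = {
  "text/csv": (content) => content.split(","),
  "text/plain": fallbackParser,
};

// Infer the MIME type from the filename's extension (case-insensitive).
function inferMimeType(filename: string): string {
  const dot = filename.lastIndexOf(".");
  const extension = dot === -1 ? "" : filename.slice(dot).toLowerCase();
  return mimeTypes[extension] ?? "text/plain";
}

// Resolve the parser for a file, falling back to plain text.
function selectParser(filename: string): Parser {
  return parsers[inferMimeType(filename)] ?? fallbackParser;
}
```

Keying parsers by MIME type rather than by extension means several extensions can share one parser, and a new format only needs one new registry entry.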

Spec

Please refer to the Spec documentation for more details on goals, definitions, and implementations in unified-doc.

Packages

The following packages are managed under the unified-doc project.

APIs

Unified document APIs for Node, CLI, and DOM.

Parsers

Parsers parse source content into hast trees.
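
As an illustration of what "parsing into a hast tree" means, the sketch below turns plain text into a tree of `root`, `element`, and `text` nodes whose shapes follow the hast specification. The parser itself, `parseParagraphs`, is a hypothetical example and not one of the unified-doc parser packages; it assumes each non-empty line should become a paragraph.

```typescript
// Minimal hast node shapes, following the hast specification.
interface HastText {
  type: "text";
  value: string;
}
interface HastElement {
  type: "element";
  tagName: string;
  properties: Record<string, unknown>;
  children: Array<HastElement | HastText>;
}
interface HastRoot {
  type: "root";
  children: Array<HastElement | HastText>;
}

// Hypothetical parser (not a unified-doc export):
// every non-empty line of the source becomes a <p> element.
function parseParagraphs(content: string): HastRoot {
  const children = content
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map(
      (line): HastElement => ({
        type: "element",
        tagName: "p",
        properties: {},
        children: [{ type: "text", value: line }],
      })
    );
  return { type: "root", children };
}
```

Because every parser produces the same tree shape, later steps such as rendering, searching, and annotating do not need to know which source format the content came from.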

Search Algorithms

Search algorithms implement a unified search interface: given a query, they return search results matched against a document's textContent.
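
A shared search interface could look roughly like the sketch below. The `SearchResult` shape (character offsets into the text content plus the matched value) and the `SearchAlgorithm` type are assumptions made for illustration, and `searchSubstring` is a hypothetical algorithm rather than one of the packaged ones.

```typescript
// Assumed result shape: offsets into the document's textContent.
interface SearchResult {
  start: number; // inclusive offset of the match
  end: number; // exclusive offset of the match
  value: string; // matched text as it appears in the document
}

// Assumed interface: every algorithm takes the same inputs
// and returns the same result shape.
type SearchAlgorithm = (textContent: string, query: string) => SearchResult[];

// Hypothetical algorithm: case-insensitive, non-overlapping substring search.
// Assumes lowercasing does not change string length (true for ASCII text).
const searchSubstring: SearchAlgorithm = (textContent, query) => {
  const results: SearchResult[] = [];
  if (query === "") return results;

  const haystack = textContent.toLowerCase();
  const needle = query.toLowerCase();

  let index = haystack.indexOf(needle);
  while (index !== -1) {
    const end = index + needle.length;
    results.push({ start: index, end, value: textContent.slice(index, end) });
    // Continue after the current match so results never overlap.
    index = haystack.indexOf(needle, end);
  }
  return results;
};
```

Returning offsets instead of tree nodes keeps algorithms independent of the document structure, so an exact-match, regular-expression, or fuzzy search could be swapped in behind the same type.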

Hast Utils

hast utilities operate on and transform hast trees.
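
The sketch below shows the two kinds of utility this describes: one that reads from a tree and one that returns a transformed copy. Node shapes follow the hast specification; `textContent` and `mapText` are hypothetical helpers written for this example, not the utilities shipped in this project.

```typescript
// Minimal hast node union, following the hast specification.
type HastNode =
  | { type: "text"; value: string }
  | {
      type: "element";
      tagName: string;
      properties: Record<string, unknown>;
      children: HastNode[];
    }
  | { type: "root"; children: HastNode[] };

// Hypothetical "operate" utility:
// concatenate the values of all text nodes, depth-first.
function textContent(node: HastNode): string {
  if (node.type === "text") return node.value;
  return node.children.map((child) => textContent(child)).join("");
}

// Hypothetical "transform" utility:
// return a new tree with `fn` applied to every text node.
// The input tree is left unmodified.
function mapText(node: HastNode, fn: (value: string) => string): HastNode {
  if (node.type === "text") return { type: "text", value: fn(node.value) };
  return { ...node, children: node.children.map((child) => mapText(child, fn)) };
}
```

Returning a new tree rather than mutating the input makes such utilities easy to chain and to test in isolation.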

Wrappers

Wrappers implement unified-doc APIs in other interfaces.

Types

Shared TypeScript typings used across unified-doc packages.

Development

This project is:

  • implemented with the unified interface.
  • linted with xo + prettier + tsc.
  • developed and built with microbundle.
  • tested with jest.
  • softly-typed with TypeScript using checkJs (only public APIs are typed).
  • managed with lerna.

Monorepo scripts:

# install dependencies and bootstrap with lerna
npm run bootstrap

# build all packages with microbundle
npm run build

# clean all packages (rm dist + node_modules)
npm run clean

# watch/rebuild all packages with microbundle
npm run dev

# lint all packages with xo + prettier + tsc
npm run lint

# test all packages with jest in --watch mode (make sure to run the 'dev' script)
npm run test

# test all packages in a single run
npm run test:run

# publish all packages with lerna
npm run publish