PDF corpus

This project lets you quickly create hand-crafted PDF files. The main Python script, pdf-corpus.py, is an ad-hoc template engine for prototyping new PDFs.

Installation

To compile the corpus, run make (you need a Python interpreter). Every .txt file in the corpus/ folder is then converted into a PDF.
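To make the build step concrete, here is a minimal sketch of the file discovery it implies. It is not taken from the project's Makefile or script: the function name planned_outputs, the recursive search, and the choice to write each PDF next to its .txt source are all assumptions made for illustration.

```python
from pathlib import Path


def planned_outputs(corpus_dir: str) -> dict[Path, Path]:
    """Map every .txt description under corpus_dir to a .pdf path.

    Hypothetical helper: the real build may name or place outputs differently.
    """
    root = Path(corpus_dir)
    # rglob walks sub-folders too, e.g. corpus/name/ or corpus/number/.
    return {src: src.with_suffix(".pdf") for src in sorted(root.rglob("*.txt"))}


if __name__ == "__main__":
    for src, dst in planned_outputs("corpus").items():
        print(f"{src} -> {dst}")
```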

Description

Each PDF in the corpus is described by a .txt file that indicates the template to use and the content to insert in the template. The following templates are defined, but you can easily create your own by tweaking the Python code.

contentstream: A simple document containing one page in A4 format. You define the graphic commands to put in the page's content stream (see my cheat sheet). For convenience, a font resource is declared as /F1.
objects: A lower-level template that lets you declare objects directly. Simple streams can be defined, and the template computes their /Length field for you.
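The two templates above can be illustrated with a short sketch. It is not the project's actual template code: the function name build_pdf, the object numbering, and the choice of Helvetica for /F1 are assumptions. It shows the general idea of wrapping a content stream in a one-page A4 document (595 x 842 points, rounded) and computing /Length and the cross-reference offsets automatically.

```python
def build_pdf(content: bytes) -> bytes:
    """Wrap a content stream in a minimal one-page A4 PDF (illustrative only)."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        # Assumption: /F1 is a standard Type 1 font; the real template may differ.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        # /Length is the byte count of the data between "stream" and "endstream".
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "num 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for the free object 0
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


if __name__ == "__main__":
    page = b"BT /F1 24 Tf 72 770 Td (Hello) Tj ET"
    with open("hello.pdf", "wb") as f:
        f.write(build_pdf(page))
```

Computing the length and offsets in code is what makes hand-written descriptions practical: you can edit the stream freely without recounting bytes.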

Available corpus

The corpus already contains some files. These examples are classified into the following categories.

corpus/contentstream/: Playing with graphics instructions.
corpus/name/: Escape sequences in names.
corpus/number/: How numbers are parsed.
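As a rough illustration of what the name and number categories exercise, the sketch below decodes the #xx hexadecimal escapes allowed in PDF names and checks whether a token looks like a PDF number (optional sign, digits, optional decimal point, no exponent). The helper names decode_name and is_pdf_number are hypothetical, and the rules are a simplified reading of the specification. Real readers may be more lenient or more strict, which is exactly what the corpus files probe.

```python
import re


def decode_name(token: bytes) -> bytes:
    """Decode a PDF name token such as b'/A#20B' (leading slash removed)."""
    if not token.startswith(b"/"):
        raise ValueError("a PDF name starts with '/'")
    # Each '#' followed by two hex digits stands for one raw byte.
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        token[1:],
    )


# Integers and reals: optional sign, digits with an optional decimal point.
_NUMBER = re.compile(rb"[+-]?(\d+\.?\d*|\.\d+)")


def is_pdf_number(token: bytes) -> bool:
    """Return True if token matches the simplified number syntax above."""
    return _NUMBER.fullmatch(token) is not None


if __name__ == "__main__":
    print(decode_name(b"/A#20B"))
    for tok in (b"+17", b"-.002", b"4.", b"1e3"):
        print(tok, is_pdf_number(tok))
```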

To learn more about how these examples work, see my blog posts (introduction to PDF syntax) and my one-page cheat sheets about PDF. For further details, you can also dive into the PDF specification.

Disclaimer

Once compiled, these example files may not be fully compliant with the specification. In particular, they may be interpreted differently by different PDF readers.

License

MIT
