davidcurie/thesis-generator

A thesis generator for Vanderbilt University

This repository provides an automated mechanism to generate all parts of a thesis for submission to the Vanderbilt University Graduate School. It also provides a means to generate draft documents of individual chapters in a variety of formats for distribution to reviewers.

Why

Despite how powerful and elegant LaTeX is, writing a thesis as a series of .tex documents is cumbersome. Online tools like Overleaf simplify this tremendously and provide useful features like collaborative editing. Vanderbilt University even provides an official Overleaf template for writing graduate theses. These approaches still require you to write in pure LaTeX syntax.

Pandoc allows you to convert many document types to LaTeX. This means you can write in any format you want, using any editor you want, and convert your document to LaTeX syntax for professional typesetting. Additionally, Pandoc seamlessly handles the conversion of many image types, so if you include image files in your source documents in .svg format, Pandoc will automatically convert them to .png.

Applying special thesis formatting to files generated by Pandoc is possible, but often requires many command-line options. Wrapping these pandoc incantations in a single makefile allows for pain-free generation of target files simply by using make <target> instead.

See the available commands provided by this Makefile below.

If you haven't used Make before, learn more about Reproducibility with Make, or view the basics on how you can use markdown and makefiles to automate publishing.

Requirements

  • Make
  • Pandoc version >= 3.1.8
  • LaTeX distribution with pdf-engine: xelatex
  • Shell with access to basic bash commands: rm, touch, mkdir.
    • Windows: Windows Subsystem for Linux (WSL2)
    • MacOS: Terminal or similar

The Pandoc version supplied by Ubuntu repositories frequently lags behind the latest Pandoc release, sometimes by several years. If you are running WSL with Ubuntu 22.04 LTS, consider installing Pandoc directly from source in your Ubuntu environment. If you install Pandoc through Anaconda/Miniconda, opt for the version from the conda-forge channel.
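To confirm that an installed Pandoc meets the minimum version listed above, a check along the following lines may help. This is only a sketch: the helper name version_ge is made up for illustration, and it assumes a sort that supports the -V (version sort) flag, as GNU coreutils does.

```shell
#!/bin/sh
# Succeed if version $1 is greater than or equal to version $2.
# Assumption: `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="3.1.8"

if command -v pandoc >/dev/null 2>&1; then
    # The first line of `pandoc --version` has the form "pandoc X.Y.Z".
    installed="$(pandoc --version | head -n 1 | awk '{print $2}')"
    if version_ge "$installed" "$required"; then
        echo "pandoc $installed is new enough"
    else
        echo "pandoc $installed is older than $required"
    fi
else
    echo "pandoc not found on PATH"
fi
```

A plain string comparison would rank 3.1.10 below 3.1.8, which is why the sketch relies on a version-aware sort.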

The following additional package is required if you want to make use of native SVG conversion in your figures:

  • rsvg-convert

rsvg-convert is included in Inkscape in MacOS, and available as a standalone package on MacOS (brew install librsvg), Ubuntu (apt install librsvg2-bin), or Windows (choco install rsvg-convert).

Required files to edit

  • Content in the chapters/ directory
  • An abstract in the extras/ directory
  • A YAML metadata file in the extras/ directory with some required fields

While the make recipes in this project should work on any standard input file, writing the source files in Pandoc markdown format allows for more seamless conversion to other output formats, like EPUB or HTML, which are more tablet-friendly formats.

Chapters are written in the chapters/ directory and will be included in name-sorted order, which means you can control the inserted order by using file names 01-intro and 02-experiment.
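The effect of name-sorted ordering can be previewed with a short sketch like the one below. The file names and the .md extension are hypothetical, and the exact sort used by the Makefile is not shown in this document, so a byte-wise sort is assumed here.

```shell
#!/bin/sh
# Create a throwaway directory with hypothetical chapter files,
# deliberately created out of order.
demo="$(mktemp -d)"
mkdir "$demo/chapters"
touch "$demo/chapters/02-experiment.md" \
      "$demo/chapters/10-conclusion.md" \
      "$demo/chapters/01-intro.md"

# List the chapters in name-sorted order. LC_ALL=C makes the sort
# byte-wise, so the result does not depend on the user's locale.
chapter_order="$(cd "$demo/chapters" && ls | LC_ALL=C sort)"
echo "$chapter_order"

rm -r "$demo"
```

Zero-padding the numeric prefix matters once there are ten or more chapters: without it, 10-conclusion would sort before 2-experiment.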

The title page for the printed thesis is generated automatically from a fixed template and relevant title words are extracted from user-defined metadata. In particular, this project makes use of the following assumed metadata fields:

title: "A title"
author: First Last
institution: Some University
degree: Doctor of Philosophy
major: Major
date: Month Day, Year
location: City, State
chair:
    name: First Last
    title: Ph.D.
committee:
  - name: First Last
    title: Ph.D.
  - name: First Last
    title: Ph.D.
  - name: First Last
    title: Ph.D.
  - name: First Last # Repeat only as necessary
    title: Ph.D.

Extras

Bibliographies, additional user-defined metadata, and target-specific styling parameters belong in the extras/ directory.

The extras/ directory is explicitly aware of a few file types and file names (case-insensitive) during the Make process.

  • Any file with the word abstract in the file name will be recognized as contents to be included in the required abstract. This file is expected to be content only; no special section markers are necessary, but stylized text is allowed.
  • Any file with the words copyright, dedication, or acknowledgment in the file name will be automatically detected as front matter and included in the appropriate order during a full build.
  • Any and all files with the word appendix in the file name will be automatically detected as an appendix entry and included in name-sorted order after the bibliography. Each appendix should include its own top-level section header.
  • Any and all files with .yaml or .yml extension will be treated as Pandoc metadata options and will be passed into Pandoc with the --metadata-file flag in name-sorted order.
  • Any .bib file in the extras/ directory will be automatically used by Pandoc during each of the build operations. Specifically, Pandoc will use the --citeproc filter to convert generalized Pandoc citations to target-specific syntax (i.e. \cite{} for LaTeX, <div class=csl-entry> for HTML).
  • Any .csl file placed in the extras/ directory will be automatically applied to the bibliography when using Pandoc.
  • Any and all .css files placed in the extras/css/ directory will be applied to the HTML build process.
  • Any and all files placed in the extras/js directory will be inserted verbatim into the header of each HTML file using the --header-includes Pandoc option.
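The detection rules above can be summarized in a small classifier. This is an illustrative sketch rather than the logic used by the project's Makefile: the function name classify_extra and its output labels are invented here, and giving file extensions priority over keywords in the name is an assumption.

```shell
#!/bin/sh
# Classify a file name from extras/ according to the rules listed above.
# Matching is case-insensitive, so the name is lower-cased first.
# Assumption: extensions (.yaml, .bib, .csl) take priority over keywords.
classify_extra() {
    name="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    case "$name" in
        *.yaml|*.yml) echo "metadata" ;;
        *.bib)        echo "bibliography" ;;
        *.csl)        echo "citation-style" ;;
        *abstract*)   echo "abstract" ;;
        *copyright*|*dedication*|*acknowledgment*) echo "front-matter" ;;
        *appendix*)   echo "appendix" ;;
        *)            echo "other" ;;
    esac
}

# Hypothetical file names, for demonstration only.
for f in Abstract.md 00-settings.yaml references.bib Appendix-A.md notes.txt; do
    printf '%s -> %s\n' "$f" "$(classify_extra "$f")"
done
```

Because the match is on substrings, a name such as Acknowledgments.md is also recognized as front matter.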

In addition to the required fields outlined in the previous section that must be specified, users can extend their preferences with additional Pandoc metadata fields.

The following snippet placed into any YAML file adds a subtitle to the title page, adjusts the line spacing of the main content with linestretch, adjusts the settings to the LaTeX geometry package, and specifies some extra packages and tweaks to be inserted in the preamble of a LaTeX document.

subtitle: "An optional subtitle"
linestretch: 1.5
geometry:
  - layout=letterpaper
  - top=1in
  - bottom=1in
  - inner=1.5in
  - outer=1in
  - heightrounded
header-includes:
  - \usepackage{parskip}
  - \setlength{\parindent}{20pt}
  - \def\thechapter{\arabic{chapter}}

Getting started

After cloning, forking, or downloading this repository to your preferred location, your publishing process can be as simple as:

  1. Create and edit the required files described above.

  2. Open a terminal at the root of this cloned project.

cd path/to/thesis/

  3. Generate your desired target.

make thesis

To explore a minimum working example of a thesis project, see this example branch on GitHub.

If you cloned this repository:

git switch example

Updating

The templates in this repository may need to be updated from time to time, either because of clarifications in the official Overleaf template provided by the Vanderbilt University Graduate School, or because future versions of Pandoc introduce newer features that are incompatible with existing templates.

Re-download

If you don't track your changes in Git, you can delete and re-download or re-clone this repository.

Make a copy of your extras/ and chapters/ folders, re-download this entire repository, and place your extras/ and chapters/ folders in the fresh project.

Git pull

If you cloned this repository, navigate to this repository in a terminal that has access to Git and run the following:

git pull

or the more verbose equivalent command:

git pull origin main

GitHub sync fork

If you forked this repository to your own GitHub account so that you could track your own changes in Git, update your upstream reference with one of the two methods below:

  • Under the Code drop-down menu at the top of the repository on GitHub, select "Sync fork".

OR

  • In your local terminal, run the following commands:
git fetch upstream
git merge upstream/main

git merge can equivalently be replaced by git rebase if you have a preference on your own local integration strategy.

How it works

The templates/ directory contains various Pandoc templates, Pandoc settings, and LaTeX styles that are used to generate preconfigured LaTeX files necessary for a complete build. These templates strive to follow published guidelines set forth by the Vanderbilt University Graduate School and do not need to be modified.

The _tmp/ directory holds pre-compiled LaTeX pages for front matter that has non-standard styling, such as the title page. If not present, this directory will be created and populated at runtime. The construction of its files is determined by the template files and from contents in the user-controlled extras/ directory. After a build is complete, the files in _tmp/ may be safely discarded; they will be generated again if needed.

During the build process, the intermediate contents of the _tmp/ directory and the source files in the chapters/ and extras/ directory are converted and assembled with Pandoc all at once. The final output depends on the arguments supplied to make, but all final files will end up in some subfolder of a _build/ directory. If not present, this directory will be created after a make <target> command.

In general, make <target> will build a target directly from the intermediate files. If the intermediate file responsible for the build does not yet exist, it will be created from the source documents. The makefile knows about the dependencies of intermediate files it creates, so if a change occurs to the source file responsible for generating an intermediate file, the intermediate file will first be updated before the target is rebuilt.
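The timestamp comparison behind this behavior can be imitated in a few lines of shell. The sketch below is not taken from the project's Makefile: the helper needs_rebuild and the file names are hypothetical, and it assumes a shell whose test builtin supports the -nt (newer than) operator, as bash and dash do.

```shell
#!/bin/sh
# Succeed when the target is missing or older than the dependency,
# which is the condition under which make reruns a recipe.
# Assumption: the shell's `[` supports -nt (bash and dash both do).
needs_rebuild() {
    dependency="$1"
    target="$2"
    [ ! -e "$target" ] || [ "$dependency" -nt "$target" ]
}

demo="$(mktemp -d)"
# Hypothetical files with explicit timestamps (format: CCYYMMDDhhmm).
touch -t 202401010900 "$demo/abstract.md"    # source
touch -t 202401011000 "$demo/abstract.tex"   # intermediate, edited later
touch -t 202401010930 "$demo/abstract.pdf"   # target, older than the .tex

if needs_rebuild "$demo/abstract.md" "$demo/abstract.tex"; then
    echo "intermediate is stale"
else
    echo "intermediate is up to date"
fi

if needs_rebuild "$demo/abstract.tex" "$demo/abstract.pdf"; then
    echo "target is stale"
else
    echo "target is up to date"
fi

rm -r "$demo"
```

With these timestamps the intermediate file is reported as up to date and the target as stale, which corresponds to hand-editing an intermediate file and then asking make for the target.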

This project supplies some required options and safe defaults as metadata options stored in the templates/ directory. These are combined with the user options defined in the extras/ directory. In the case where competing metadata entries are present across several additional files, the value from the last-loaded file persists.

The makefile detects and loads metadata in the following order: DEFAULTS, USER, REQUIRED. This order ensures users cannot accidentally override strictly required styles. If multiple user metadata files are detected, they are loaded in name-sorted order.
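As a rough sketch of that load order, the following builds the list of --metadata-file arguments in the sequence DEFAULTS, USER, REQUIRED. The two file names under templates/ and the function name metadata_args are placeholders invented for illustration; the real names used by the Makefile are not given in this document. The sketch also assumes file names without spaces.

```shell
#!/bin/sh
# Hypothetical placeholders for the project-supplied metadata files.
defaults="templates/defaults.yaml"
required="templates/required.yaml"

# Print --metadata-file arguments in the documented order:
# DEFAULTS first, USER files in name-sorted order, REQUIRED last.
# Arguments: the user metadata files found in extras/.
metadata_args() {
    printf '%s%s ' '--metadata-file=' "$defaults"
    for f in $(printf '%s\n' "$@" | LC_ALL=C sort); do
        printf '%s%s ' '--metadata-file=' "$f"
    done
    printf '%s%s\n' '--metadata-file=' "$required"
}

# User files are passed out of order on purpose; the function sorts them.
metadata_args extras/20-style.yaml extras/10-thesis.yml
```

Since the value from the last-loaded file persists, placing REQUIRED at the end is what keeps user files from overriding it.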

Due to the way dictionary merges are handled in Pandoc, metadata fields like geometry: or header-includes: that accept an array of sub-options are completely replaced by the presence of a new entry in a later file. This means that partially supplied fields from multiple documents do not stack, even if the subfields within that entry do not conflict.

See Pandoc's notes on yaml metadata blocks for more details.

Commands

Syntax

make <target>

where <target> is replaced by several options below.

Generate a thesis

Generate a fully compiled thesis (title, optional front matter if present, main content, bibliography, optional appendix if present) and a separate abstract page formatted for submission to print.

make thesis

The results are stored under _build/pandoc/thesis.pdf and _build/pandoc/thesis_abstract.pdf.

Generate review documents

If you wish to share copies of your source documents with your reviewers, you can convert each source document to a target format specified below. The file names of each source document will be preserved in the output target.

Make a PDF of each chapter:

make pdf

Make a draft PDF of each chapter:

make draft

A draft is a PDF document with explicit line numbers displayed in the margins for easier referencing.

Make a responsive HTML page of each chapter:

make html

Make a Word document of each chapter:

make doc

File maintenance

Generate a thesis, abstract, and a review document of each chapter in all target formats:

make all

Remove all target build files:

make clean

Removing all build files will trigger a rebuild of any target from the contents in the _tmp/ directory on the next invocation of make.

Remove all target build files and intermediate files:

make purge

Advanced use

If you prefer manual tweaks to your LaTeX source documents and can't figure out how to achieve it through Pandoc or the Makefile, you can trigger the generation of intermediate LaTeX files in the temporary directory.

Regenerate automatically detected dependencies

Note that these intermediate files may already be generated from make commands, but it may be helpful to manually trigger these builds if your expected make <target> command has runtime errors.

For any front matter or appendix file that is dynamically built, you can run the following to see the LaTeX source generated behind the scenes, assuming you have the appropriate prerequisite source file:

make _tmp/abstract.tex

Available targets: _tmp/title.tex, _tmp/copyright.tex, _tmp/dedication.tex, _tmp/acknowledgments.tex, _tmp/appendix.tex, _tmp/abstract.tex

Any edits you make to these raw files will trigger a rebuild of any target that depends on them the next time you run make on that target (see footnote 1). This means you can edit the abstract page if you don't like a setting automatically generated from the template, and then re-run make abstract to regenerate the abstract with your modified _tmp/abstract.tex source.

If you are making the same tweaks often, you can edit the appropriate template file to your liking, but beware that these changes may be destroyed if you re-sync your repository to this one on GitHub.

Edit a LaTeX version of a thesis

Because the Pandoc process handles conversion and assembly of the source documents in one streamlined process during make thesis, there is no intermediate main.tex file that you can edit to control explicit thesis formatting.

You can force the generation of the intermediate LaTeX document produced by Pandoc with the following command:

make latex

This option inserts all source documents in their converted LaTeX form into one document, _tmp/pandoc.tex, according to the minimally modified default Pandoc LaTeX template in templates/pandoc.tex. The make process then runs pdflatex and bibtex in a defined sequence, more akin to what the MiKTeX or TeXShop programs do on Windows or Mac. The result is a PDF file stored in the _build/latex directory, but the auxiliary files are explicitly preserved in _tmp/. From here, you can edit _tmp/pandoc.tex and re-run make latex to update the generated PDF.

The net results of make latex and make thesis are not identical because many of the Pandoc run-time options expire upon generation of the intermediate .tex file, at which point the compilation responsibility is handed off to another program that is not Pandoc. The most crucial change is that in order to run bibtex on the intermediate pandoc.tex file, the make latex build process has to replace the --citeproc option with the --natbib option. The undesired consequence of this is that Pandoc no longer handles the styling of the bibliography according to your supplied .csl file or metadata options.

To fix this, you will likely need to edit the bibliography section in your intermediate _tmp/pandoc.tex file and adjust the style to your liking.

Be mindful that if you edit the source documents in chapters/, any metadata file in extras/, or any dependent front matter or appendix files in _tmp/, the _tmp/pandoc.tex file will be rebuilt. If you find yourself making frequent adjustments to _tmp/pandoc.tex, you can edit templates/pandoc.tex. Saving edits to this file will also trigger a rebuild of _tmp/pandoc.tex on the next invocation of make latex.

In a similar fashion, you can generate an intermediate LaTeX thesis file more exactly aligned to the official Vanderbilt University Overleaf template published in 2021. This uses a heavily modified Pandoc template with all but a limited number of Pandoc options explicitly disabled. In particular, it fills in the hyperref PDF attributes from the title: and author: metadata, inserts any header-includes: statements in the preamble, and checks if any optional front matter or appendix files are present in the extras/ directory and automatically inserts them into their appropriate location in the LaTeX file.

make overleaf

As before, this relies on pdflatex and bibtex under the hood, so the same problems outlined in make latex apply here. The intermediate files in _tmp/ will have overleaf in their name, and the final PDF will be placed in _build/overleaf/. After editing _tmp/overleaf.tex, you can run make overleaf again to update the generated PDF.

Also as before, _tmp/overleaf.tex will be rebuilt on any changes to the source files in chapters/ or any metadata file in extras/. If you find yourself making frequent adjustments to _tmp/overleaf.tex, you can edit templates/overleaf.tex. Saving edits to this file will also trigger a rebuild of _tmp/overleaf.tex. Be mindful that your changes to templates/overleaf.tex may be overwritten if you re-sync your repository.

In addition to the bibliography differences that are described for the make latex process, the additional difference in output format between make thesis and make overleaf is that the Overleaf version uses the article document class, whereas the default Pandoc build uses the book document class. If you want to print a one-sided document like in the article class, you can supply the following option in any user-defined metadata files:

classoption: oneside

Notes on vector images

Both the _tmp/overleaf.tex and the _tmp/pandoc.tex intermediate files include the CTAN package svg to handle .svg graphics, but their format is first rendered to a .pdf and .pdf_tex through Inkscape. For the pdflatex process to complete without errors, you need to have inkscape accessible from the PATH of your terminal. Linux and MacOS users who install Inkscape will have this automatically configured, but Windows users may need to do more. See this post for more background. The key difference in result between make thesis and either make latex or make overleaf is that available text fields from any .svg figure will be rendered by LaTeX instead of their fonts as specified in the source file.

Notes on explicit LaTeX builds

Both the _tmp/overleaf.tex and _tmp/pandoc.tex file will be similar to what you would use directly on Overleaf or in a LaTeX software suite except that the paths of the input files and graphics path are specified as relative paths from the root directory of this project. In Overleaf, you might normally keep your main.tex file in the project root and specify figures and other input paths relative to the source document. In this project, the source document is placed in the intermediate directory. This means you cannot run pdflatex from the _tmp/ directory and expect things to work as normal.

If you want to migrate the results of this build process to your preferred LaTeX suite, the following checklist may be helpful:

  • Copy all contents of _tmp/ to a new directory of your choice (e.g. MyThesis/)
  • Copy all figures from this project to a directory of your choice (e.g. MyThesis/Figures/)
  • Copy the bibliography and any style sheets to a directory of your choice (e.g. MyThesis/)
  • Update the graphics path in overleaf.tex to point to your figures directory relative to overleaf.tex (e.g. \graphicspath{{Figures/}}; note the trailing slash)
  • Adjust the input path in overleaf.tex for any additional files to be referenced relative to overleaf.tex (e.g. adjust \input{_tmp/title} to \input{title})
  • Ensure the name and location of the bibliography in overleaf.tex is specified relative to overleaf.tex. (e.g. \bibliography{MyBibliography})
  • Compile your LaTeX document from the directory as you normally would (e.g. compile LaTeX -> run BibTeX -> compile LaTeX -> compile LaTeX)
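The input-path adjustment in the checklist above could be scripted rather than done by hand. The following is a minimal sketch that assumes the inputs appear in the form \input{_tmp/NAME}; check your generated overleaf.tex first, since the exact form may differ. The function name strip_tmp_prefix is made up for illustration.

```shell
#!/bin/sh
# Rewrite \input{_tmp/NAME} to \input{NAME} on standard input.
# Assumption: inputs are written with braces and a _tmp/ prefix.
strip_tmp_prefix() {
    sed 's|\\input{_tmp/|\\input{|g'
}

# Demonstration on two sample lines instead of a real file.
printf '%s\n' '\input{_tmp/title}' '\input{_tmp/abstract}' | strip_tmp_prefix
```

To apply it to a copied file, redirect the input and output, for example strip_tmp_prefix < overleaf.tex > overleaf-fixed.tex, and review the result before compiling.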

Summary of make targets

| Target   | Rebuilt on changes to | Result in _build/                             | Engine   |
|----------|-----------------------|-----------------------------------------------|----------|
| thesis   | chapters/*, extras/*  | pandoc/thesis.pdf, pandoc/thesis_abstract.pdf | pandoc   |
| pdf      | chapters/*            | pdf/*.pdf                                     | pandoc   |
| html     | chapters/*            | html/*.html                                   | pandoc   |
| doc      | chapters/*            | doc/*.docx                                    | pandoc   |
| draft    | chapters/*            | draft/*.pdf                                   | pandoc   |
| latex    | _tmp/pandoc.tex       | latex/thesis.pdf                              | pdflatex |
| abstract | _tmp/abstract.tex     | latex/thesis_abstract.pdf                     | pdflatex |
| overleaf | _tmp/overleaf.tex     | overleaf/thesis.pdf                           | pdflatex |

Footnotes

  1. Builds are triggered whenever a dependency is newer than a target. The dependency hierarchy goes from source -> intermediate -> target. If you ask for a target with make target after editing a source file, the intermediate file will be updated before target is built. If a source file remains untouched and an intermediate file is adjusted, the intermediate file will not be regenerated, but the target will recognize the updated intermediate file and trigger a rebuild.
