Refactor lektor.builder: note source metadata early; other cleanups #1148


@dairiki dairiki commented Jun 5, 2023

Changes

Track source file metadata as early as possible in the Artifact build process

As an Artifact is built, its sources are tracked as FileInfo objects (which are obtained from the PathCache). Each FileInfo notes metadata about its source file: size, mtime, and (sometimes) a hash of the file's (or directory's) contents. After the Artifact has been built, the metadata for all of the Artifact's source files is written to the build state database. That data is used to detect when a source has changed, thus triggering a rebuild of the Artifact.

The current implementation of FileInfo defers computation of metadata until it is referenced. This means the metadata stored in the build state database corresponds to some indeterminate time between the beginning and end of the Artifact build.
This creates a race condition. The source metadata stored in the build state database may come from a time after the source was used to compute the artifact. If the source file was modified between when it was used to build the artifact and when the metadata was recorded, that modification will be missed by Lektor's dependency tracking system.
Furthermore, different pieces of metadata (e.g. the file size and the file checksum) may correspond to significantly different points in time within that period. If the file changed between those two points, the recorded size and checksum can describe different versions of the source file. Eek.

In this PR we change things so that source file metadata is computed as early as possible in the Artifact build cycle.
As soon as a source file is declared, all metadata that will eventually be stored in the build-state database at the completion of the current Artifact build is noted. This way, if a source file changes at any time after it has been declared as a source, that change will be detected on the next build cycle and the artifact will be rebuilt. As long as the artifact build algorithm makes sure to declare a source as a dependency before using any data in that source, this eliminates any race conditions for the detection of changes in the sources.
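
To illustrate the idea of capturing everything at declaration time, here is a minimal sketch. It is not the code in this PR: `SourceSnapshot`, `snapshot_source`, and `has_changed` are hypothetical names, and the use of SHA-1 for the checksum is an assumption made for the example.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceSnapshot:
    """Hypothetical stand-in for FileInfo: every field is captured up front."""

    path: str
    size: int
    mtime_ns: int
    checksum: Optional[str]


def snapshot_source(path: str, with_checksum: bool = True) -> SourceSnapshot:
    """Record size, mtime and (optionally) a checksum in one step.

    Intended to be called when the source is declared, before any of its
    data is used to build the artifact.
    """
    st = os.stat(path)
    checksum = None
    if with_checksum:
        # SHA-1 is an assumption for this sketch.
        with open(path, "rb") as f:
            checksum = hashlib.sha1(f.read()).hexdigest()
    return SourceSnapshot(path, st.st_size, st.st_mtime_ns, checksum)


def has_changed(recorded: SourceSnapshot) -> bool:
    """Compare the recorded snapshot against the file's current state."""
    current = snapshot_source(recorded.path, recorded.checksum is not None)
    return current != recorded
```

Because the snapshot is immutable and taken before the source is read, any later modification makes `has_changed()` return true on the next build cycle, whether it happened during or after the artifact build.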

Add type annotations

Type annotations are added for the code in lektor.builder.

Clean up database connection handling

The current code opens a new database connection for nearly every query. Here we open one persistent connection per thread. We also make use of various context managers to clean up the commit/roll-back of transactions and the closing of cursors.
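
As a rough sketch of this pattern (not the actual code in this PR), one connection per thread can be kept in a `threading.local`, with a context manager handling commit, roll-back, and cursor cleanup. The class name `BuildStateDB` and its methods are hypothetical.

```python
import sqlite3
import threading
from contextlib import closing, contextmanager
from typing import Iterator


class BuildStateDB:
    """Hypothetical wrapper: one persistent sqlite3 connection per thread."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._local = threading.local()

    def connection(self) -> sqlite3.Connection:
        """Return this thread's connection, opening it on first use."""
        con = getattr(self._local, "con", None)
        if con is None:
            con = sqlite3.connect(self._path)
            self._local.con = con
        return con

    @contextmanager
    def cursor(self) -> Iterator[sqlite3.Cursor]:
        """Yield a cursor inside a transaction.

        The transaction is committed on normal exit and rolled back if an
        exception escapes; the cursor is closed in either case.
        """
        con = self.connection()
        with con:
            with closing(con.cursor()) as cur:
                yield cur
```

A query then looks like `with db.cursor() as cur: cur.execute(...)`, and repeated calls from the same thread reuse the same connection.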

Record st_mtime_ns instead of int(st_mtime)

Python's os.stat() returns an integer-valued st_mtime_ns that gives the mtime to sub-second precision (if the OS supports it). We might as well use it.
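
The difference can be seen in a small sketch. The helper names `mtime_seconds` and `mtime_ns` are made up for this example; `os.stat()` and `os.utime(..., ns=...)` are standard library calls. Whether the two nanosecond values actually differ depends on the timestamp resolution of the filesystem.

```python
import os
import tempfile


def mtime_seconds(path: str) -> int:
    """Old behaviour: mtime truncated to whole seconds."""
    return int(os.stat(path).st_mtime)


def mtime_ns(path: str) -> int:
    """New behaviour: integer mtime in nanoseconds."""
    return os.stat(path).st_mtime_ns


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name

base_ns = 1_700_000_000 * 10**9

# Two modification times 500 ms apart, within the same second.
os.utime(demo_path, ns=(base_ns, base_ns + 100_000_000))
first = (mtime_seconds(demo_path), mtime_ns(demo_path))

os.utime(demo_path, ns=(base_ns, base_ns + 600_000_000))
second = (mtime_seconds(demo_path), mtime_ns(demo_path))

os.remove(demo_path)

# The truncated values are identical, so a change within the same second
# would go unnoticed; the nanosecond values differ on filesystems that
# store sub-second timestamps.
print(first, second)
```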

Issue(s) Resolved

Related Issues / Links

Description of Changes

  • Wrote at least one-line docstrings (for any new functions)
  • Added unit test(s) covering the changes (if testable)

In order to minimize the possibility of race conditions, the idea here
is to capture source state (mtime, size, and possibly checksum) as
early as possible in the build process.  This allows us to detect
(and rebuild during the next build cycle) the case where a source file
changes during the time that an artifact is being generated.

Previously, the computation and caching of that file metadata was
deferred until it was requested. (That deferral did not save any
computational effort since, at the end of the build, the metadata is
always computed and written into the build state database.)