Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure we only rebuild indexes when necessary #265

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

SteveMarshall
Copy link
Member

@SteveMarshall SteveMarshall commented Mar 26, 2024

Previously, we were reindexing on every container rebuild, regardless of whether the mission data or backend code had changed.

Given these are the only things that actually impact indexing, though, we can copy just those to the container, reindex, then copy everything else, meaning that in the majority of cases, container builds are much faster because they're not spending 3-4 minutes reindexing.

To make this work, I've made the Makefile more accurately describe the dependency relationship between stats graphs (at least, the 0th graph, which every mission will have) and transcript files. That way, the later call to make collectstatic doesn't try to re-run the previously always-run make statsporn.

This also enables single-mission stats generation, and clarifies some instructions in the README around reindexing (and some other minor bits and bobs).

@jaylett jaylett marked this pull request as ready for review March 28, 2024 19:00
@jaylett
Copy link
Member

jaylett commented Mar 28, 2024

Breaking the automatic creation of stats for collectstatic is absolutely fine, but I wonder if a better way is to rely less on make for docker builds and drive those Django commands directly. I assume (but may be wrong) that the preferred way (if not now then in theory in future) of doing dev is via docker, so make stops being a sensible choice fairly quickly. Prior to this PR the semantics of collectstatic are "create all static files as necessary then process them into the right place for them to serve correctly". After this PR it's "create some static files but not others then process all of them into the right place for them to server correctly".

My instinct is to break the productioncss > collectstatic link (and introduce a new allstatic target in make if that's necessary). In future the trivial make targets could be elided in the dockerfile, but that would leave us in a semantically clean space.

OR don't do that and change the commit message to state that it's fine but slightly confusing but we can't eliminate use of make because of the current complexity of CSS building.

@jaylett jaylett marked this pull request as draft March 28, 2024 19:06
Base automatically changed from upgrade-to-python-3 to main March 30, 2024 13:55
@SteveMarshall SteveMarshall marked this pull request as ready for review March 30, 2024 18:35
@SteveMarshall SteveMarshall marked this pull request as draft March 30, 2024 18:36
@SteveMarshall
Copy link
Member Author

SteveMarshall commented Mar 30, 2024

@jaylett The more I think about it, the more I think the correct solution here is for me to not be quite so lazy, and to make the Makefile more accurately reflect the dependencies inherent in the graph generation, so I've done that.

To make that work well, though, I've had to remove the mission name from the graph files (which appear to date back to the fort, and are the only images with that namespacing, and seem to work fine without) and make it possible to build a single mission's graphs.

It doesn't seem like Docker's COPY operations properly maintain file timestamps, so the images are always regenerated when building the container (no change from before) but, importantly, don't get regenerated when we call collectstatic.

To your wider point: I'm already using Docker as the default way to develop (and updated the readme to suggest that's the default, too), but am of the opinion that the Dockerfile should describe the environment setup, and the Makefile should describe what we do inside that environment. That way, both stay relatively focussed and easy(ish) to follow. Were it not for the significant slowdown caused by reindexing and rebuilding stats graphs, I probably wouldn't do the splitting-up of make all that I'm doing in this PR, but I think this is a case where it makes sense to make use of the Docker cache.

(I've pushed a fixup to update the commit you already reviewed, so will update the original message to include more of this detail if you agree this all makes sense.)

Let me know what you think!

@jaylett
Copy link
Member

jaylett commented Mar 31, 2024

Yes, that looks better to me than what I was suggesting. Does there need to be a note somewhere about what will Just Work and what Required Commands when developing? Since some of it will be baked into the container, I assume that sometimes you need to stop and rebuild, depending on what you're editing?

Previously, we were reindexing on every container rebuild, regardless
of whether the mission data or backend code had changed.

Given these are the only things that actually impact indexing, though,
we can copy just those to the container, reindex, then copy everything
else, meaning that in the majority of cases, container builds are much
faster because they're not spending 3-4 minutes reindexing.

To make this work, I've made the Makefile more accurately describe the
dependency relationship between stats graphs (at least, the 0th graph,
which every mission will have) and transcript files. That way, the
later call to `make collectstatic` doesn't try to re-run the previously
always-run `make statsporn`.

Because of a limitation in GNU Make's patterns (pattern targets can
only have one `%`), we have to remove the mission ID from the graph
files. This mission ID doesn't appear to serve any purpose (and are
only used on graph images), and dates back to the fort.

There's a slightly weird behaviour in Docker's `COPY` that means it
doesn't appear to properly maintain file timestamps when building
containers, so the images are always regenerated when building the
container (no change from before), but CI builds always start clean, so
that's only a slight issue locally (where we might have already built
the images).
This allows us to pass a list of mission names for generation. It could
probably be smarter, de-duping and so on, but we don't really need that.
I missed a few things when updating the README to migrate us to a
Docker-first world. Specifically, the guidance on reindexing mission
content wasn't clear enough on what to do.

This also fixes a couple of typos, and updates the deployment guidance
now we have continuous deployment to Fly set up.
@SteveMarshall
Copy link
Member Author

SteveMarshall commented Apr 1, 2024

Most of that is actually documented already, but I've noticed a few inconsistencies in the README, so have gone through with a fine-toothed comb to update the guidance on reindexing.

I've also rebased to integrate the fixup, and updated the commit message for the initial commit.

@SteveMarshall SteveMarshall marked this pull request as ready for review April 1, 2024 09:19
@SteveMarshall
Copy link
Member Author

SteveMarshall commented Apr 1, 2024

I've just noticed, however, that the current build process (even prior to these changes) results in non-existent graphs in the dev environment unless you've already built them another way (or run the build process in-container with the external filesystem mounted). I'll work out a fix for that separately (as with getting search working in dev).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants