size of apparatus files #49

Open
LucasHorseshoeBend opened this issue Feb 25, 2021 · 7 comments
Assignees
Labels
presentation Relates to how information should be presented in the website

Comments

@LucasHorseshoeBend
Collaborator

The "apparatus" files can be quite long.
The Editors' citations file is 2.3 MB (occupying 87 A4 pages).
Mueller's publications is 1.8 MB and 136 pages, but with some shortish, introductory, distinct components. There is a bookmarked pdf version in the dropbox: VMCP/Apparatus files/Mueller's publications/M Pubs 3 ed to 21 Feb 2021-draft.pdf that shows the structure. This is a file likely to be browsed as well as searched.
Honours, awards and memberships, at 118 KB, is almost certainly going to be mainly browsed, as there are things there that no one is likely to search for unless they are very knowledgeable. The relevant pdf is bookmarked at two levels, country and city within country.
Complete files are easier for us to maintain, so I would prefer not to have to split the files, as I have done as an example in the case of the interim biographical register, where there are 19 segments.

I have not tried using bookmarks in Word, but should I try to create a test file to
a) see that I can master it and
b) see how it behaves in the pipeline?

@Conal-Tuohy
Owner

Conal-Tuohy commented Mar 1, 2021

Are you suggesting a table of contents, for navigating a hierarchical set of bookmarks?

I see the documents you mention already have headings styled Heading 1, Heading 2, Heading 3, which can be converted to bookmarks in the TEI. There's no need to add them in Word.

I'm going to leave this issue open though to record the need to provide such navigation to the end user in the new presentation system.
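
As a rough illustration of the kind of navigation meant here, a flat run of headings can be nested into a table of contents. This is only a sketch: the `(level, title)` input format, the `TocEntry` class and the sample headings are assumptions made for the example, not how the pipeline or the TEI conversion actually represents headings.

```python
from dataclasses import dataclass, field


@dataclass
class TocEntry:
    title: str
    level: int
    children: list = field(default_factory=list)


def build_toc(headings):
    """Nest a flat, document-ordered list of (level, title) pairs into a tree.

    Levels are assumed to start at 1 (Heading 1), 2 (Heading 2), and so on.
    """
    root = TocEntry("root", 0)
    stack = [root]  # stack[-1] is the most recent entry that can take children
    for level, title in headings:
        # Climb back up until the top of the stack is a shallower heading.
        while stack[-1].level >= level:
            stack.pop()
        entry = TocEntry(title, level)
        stack[-1].children.append(entry)
        stack.append(entry)
    return root.children


# Hypothetical headings, loosely modelled on a country / city structure.
sample = [
    (1, "Australia"),
    (2, "Melbourne"),
    (2, "Sydney"),
    (1, "Germany"),
    (2, "Berlin"),
]
toc = build_toc(sample)
```

Each top-level entry in `toc` then carries its own `children`, which is the shape a nested list of bookmarks would need.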

@Conal-Tuohy Conal-Tuohy self-assigned this Mar 1, 2021
@Conal-Tuohy Conal-Tuohy added the presentation Relates to how information should be presented in the website label Mar 1, 2021
@LucasHorseshoeBend
Collaborator Author

Thanks.

@LucasHorseshoeBend
Collaborator Author

I will revisit this, but it is not urgent now: finding specific items is easy using the browser search when the apparatus file is open.

Bookmarking would facilitate browsing those files, but I suspect that not many will want to do it, and those that do would be savvy enough to use sensible search terms to go to a place in the lists.

Defer?

@LucasHorseshoeBend
Collaborator Author

I understand from Niels that this is something you are now thinking about.

I have revisited this since the site has gone live. In making it live the apparatus files have been converted to static files, but with some problems in the resulting displays. These files no longer feed off the pipeline, and are to be updated manually by RBGV about once a month when I supply updated .odt files. I don't pretend to understand the rationale for making them static.

They do need attention as they are slow to load. In doing so we ought to try to make them easy to link from footnotes to the letters.

The important point to remember is that they are of two different types:
Honours &c and Mueller's publications are effectively stand-alone products, and are structured that way. There are very few changes expected in these two files.
Biographical Register and editors citations are much less structured, comprising introductory material plus what are effectively a series of discrete entries, each of which could in principle be a stand-alone small file. These two are being added to almost daily.

Thus a one-solution-fits-all approach may not be the best, but whether that is so I can't judge.

Depending on how RBGV decide to make the Honours &c and Mueller's publications files available as complete products (perhaps downloadable pdfs from somewhere outside the VMCP site?), it may be possible to treat each of the items in Mueller's publications like entries for the other discrete entry files, but for Honours &c I think that the entries cannot be broken down more finely than to country.

@Conal-Tuohy
Owner

I think the solution here must involve splitting these over-large pages into smaller chunks.

If you have an apparatus file called X you can break it into X-part-1, X-part-2, etc., or X-a-e, X-f-l, X-m-z, or X-australia, X-europe, or whatever division is appropriate. The web pages in that set can then be associated using the menu system: e.g. the menu item called X that points at the X page can be replaced with a menu item X that contains a sub-menu of items, each pointing at a part of the original X, in the appropriate sequence. If this is done, the drop-down menus will reflect that organisation, and the sub-menu listing all the parts of X will also appear as a kind of table of contents in all the X pages.
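
To make the idea concrete, here is a minimal sketch of replacing a single menu item with one that holds a sub-menu of parts. The dictionary layout, the `label`/`href`/`submenu` keys and the `split_menu` helper are all hypothetical, chosen for illustration; the site's real menu configuration may look quite different.

```python
def split_menu(name, parts):
    """Return a menu item for `name` whose sub-menu lists each part in order.

    `parts` are the suffixes of the split files, e.g. "a-e" for the page X-a-e.
    The structure returned here is an assumed, illustrative format.
    """
    return {
        "label": name,
        "submenu": [
            {"label": f"{name} {part}", "href": f"{name}-{part}"}
            for part in parts
        ],
    }


# The original single item "X" becomes "X" with three ordered sub-items.
menu_item = split_menu("X", ["a-e", "f-l", "m-z"])
```

Because the sub-menu is kept as an ordered list, the same data could drive both the drop-down menu and the table of contents shown on each X page.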

@LucasHorseshoeBend
Collaborator Author

I need to have some idea of acceptable sizes, with subdivisions that make sense to users.
For Mueller's publications decades seem to be too big.
This is what the breakdown would be, saved as .doc to fit with the pipeline:
[Image: Screenshot 2023-04-26 at 11 33 21]
Looking at those file sizes it would seem as if the 1860s to 1890s would need to be broken down to at least half decades? But they are still big files:
[Image: Screenshot 2023-04-26 at 11 53 50]
For comparison, the longest letter, 56-06-16, is reported as 116 KB.
So perhaps we go the whole hog and do them by year, giving in the order of 50 files? It would not be hard to do, and a lot depends on what you think looks sensible in terms of dropdown menus, and web response times.

@Conal-Tuohy
Owner

It's a bit arbitrary, but I'd say we definitely ought to aim for no more than about 200kb of HTML per page, and even smaller is probably better; maybe 100kb. At the moment the "publications" web page is about 2300kb. So yes, perhaps one page per year? Though you could always start with 5-year chunks and then reduce the size further if it feels necessary.
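
For a feel of the numbers, here is a small sketch of the arithmetic, plus one possible way to pack consecutive years into pages under a size budget. The function names and the per-year sizes are made up for illustration; only the 2300kb total and the 200kb and 100kb targets come from the figures above.

```python
import math


def pages_needed(total_kb, budget_kb):
    """Minimum number of pages if the content could be split evenly."""
    return math.ceil(total_kb / budget_kb)


def group_years(sizes_by_year, budget_kb):
    """Greedily pack consecutive years into chunks of at most budget_kb.

    A single year that is larger than the budget gets a chunk of its own.
    """
    chunks, current, current_kb = [], [], 0
    for year in sorted(sizes_by_year):
        size = sizes_by_year[year]
        # Start a new chunk when adding this year would exceed the budget.
        if current and current_kb + size > budget_kb:
            chunks.append(current)
            current, current_kb = [], 0
        current.append(year)
        current_kb += size
    if current:
        chunks.append(current)
    return chunks


print(pages_needed(2300, 200))  # 12
print(pages_needed(2300, 100))  # 23

# Hypothetical per-year HTML sizes in kb, not measured from the real files.
example_sizes = {1860: 80, 1861: 90, 1862: 50, 1863: 250, 1864: 30}
print(group_years(example_sizes, 200))
# [[1860, 1861], [1862], [1863], [1864]]
```

So the 200kb target implies at least a dozen pages and the 100kb target about twice that. The publications are unlikely to be spread evenly across years, so a per-year or variable-width split would probably end up with more pages than these minimums.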
