-
-
Notifications
You must be signed in to change notification settings - Fork 991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move image copyrights to image #8863
Comments
One problem with this is that some of the tools for keeping image size down will strip EXIF data; I'm not sure if |
Just like @cooljeanius said, I remember that one of the things that |
I could also remove it from the CI check and then update everything en-masse as part of the release work for each point release. |
I'm not convinced by either of those excuses.
|
That sounds like it would shift the meta-work to you, and doesn't seem right if there is a better way. Maybe there isn't a better way, but I'd like to try. A side benefit of using the EXIF data is that the artists, Wesnoth, and CC licenses could get marginally more exposure, as the image is grabbed and reused. |
Probably using EXIF data in the webp files would be fine, since optiwebp doesn't do anything besides check if converting from png/jpg to webp results in a meaningful size reduction. It doesn't touch existing webp images or do anything else to reduce file size, though I have no idea if EXIF data would survive the conversion to webp for existing images. |
I think this is definitely worth exploring. Having author and copyright info in the files themselves would mean they are still there even if someone uses them in their own project without copying the copyright.csv entries in some form. Currently utils/woptipng.py calls convert -strip which removes nearly all meta data. Didn't test utils/optiwebp.py since you can't even run it on a path you want. cwebp itself appears to only care for meta data when given the -metadata option. So our tooling would certainly need adjusting. Btw, not stripping meta data at all is not a good option as can be seen in e3ee472 for example where image sizes increase by thousands of percent in size just because of the added meta data. |
I've got a bit of a learning curve to get through to actually make this happen, but it looks like it should be possible to do this in CI:
Then as a manually run script (not necessarily all one script):
|
Even if the copyright information is kept in the images themselves, we could still keep a file around with the hashes of all the images files and check against that. For the actual question: IIRC it might have something to do with the shallow checkout ( |
Right, I was trying to say that was part of the deal: there is a bookkeeping file that is human readable, and there are scripts to deal with it. It's just not part of CI; the images and whatever .cfg or .lua references them are all that need be touched in a PR. Storing the image hashes there makes sense, and there is no reason the bookkeeping file can't be automatically updated. Eventually... maybe...
I hadn't looked at that, it's something to try, thanks. It's very frustrating that the info is right there, it just can't be touched without a whole lot of BS overhead. |
I finally got out of that github cathole, and have a mostly working metadata check. (My struggle, that completely trashed the commit history of one of my repositories, seems to have more to do with github webhooks than anything purely git related.)
Figuring out how to make a PR for this sounds like another cathole I don't want to squish around in right now, probably best to get something mostly finished first. I'll update the first post when there's progress. But for now, the specific check I'm looking at is:
|
I am a bit worried that this will result in a noticable size increase of the wesnoth github repo since it could mean that we will change the image files more often in the future. Unlike text files, git usually cannot handle binay files very well and afaik any change to them is stored as a full replacement of the image file. |
How worried are you, really? I'm not seeing it, you've got to explain. |
@gfgtdf I assume that'd be pretty easy to test out? |
So i'm not really an expert on the git details, so what i'm saying might be wrong, but i think i remember from an earlier discussion that it's a bad idea to change binary files too often in git repos because the (git history) compression cannot really store diffs of them, so that the git history can become very large if binary files get changed often. (a quick search on the internet gave me different contradictory results on this topic so i actually don't know. Some sources say this only applies extra compressed binary files and that compression part of git doesn't really make a diffrence between binary and text files, and other pages say git simply cannot do this for any kind of binary files at all) I also don't know how often this change will actually change the files, (except once of course to add the first metadata, which would be fine in i any case i guess). My worries were more about cases where we later decide to add extra data, change the format, add authors etc.
Its a bit tricky actually since one needs to make sure to compare compressed git histoy to compressed git history, but sureley possible. |
I mean that all you'd need to do would be to make a metadata change and then check how much data changed in the diff, wouldn't you? If the diff only says that 50 bytes where added, then surely git doesn't do a full replacement anyway. |
I doubt you can make such an assumption. It's certainly possible to do a binary diff which works out very similar in functionality to a text diff, but even if only 50 bytes were added, there's a pretty high chance that a lot more bytes were changed. And there's a point where the diff would end up being bigger than the original file. I'm not sure what it would take to hit that point with image or sound files though. |
Based on https://stackoverflow.com/a/59346690 and https://stackoverflow.com/a/53648035, the answer seems to be:
So I don't think ballooning the git history is a reason this can't be done. |
This was the source of my confusion; I don't understand why we'd suddenly start changing this info frequently, if at all. Moot point, I guess, if what Pentarctagon found is true. BTW, where does one find a repository's resource usage? If we continue to use the |
What do you mean by a repository's resource usage, specifically? |
Probably something similar to the total size of the repository history on disk? |
Total resources that Wesnoth is being granted by the Microsoft Corporation, or any subset of that (disk space, data transfer, anything else). I can't find any online meter, but I'm not curious enough to clone and check my disk usage. I was mostly wondering about the context of gfgtdf's objection. Back on topic, I've got a maintenance script (derived from If no one has any other suggestions, I'll consider it settled that these are the tags:
|
Describe the desired feature
We currently (since 97c8feb) track image copyrights in a CSV, with some support script and CI enforcement. This makes sure we don't end up with images of unknown or forgotten authors/copyrights. But then this involves meta-work and potential for merge conflicts whenever an image is added or modified.
We could use the EXIF tags of the WEBP, PNG, JPEG images to hold all the necessary info, and check that added/updated images have the info. A human-readable summary file (essentially replacing the current CSV) can be printed whenever, no reason for it to create merge conflicts.
EDIT: I'm working on a draft of this in one of my own repositories, because it's not clear to me how to test this locally before incorporating into the Wesnoth repository. This matters most for the CI workflow, the maintenance scripts can probably be a proper PR.
Once the scripts are done, I don't think it's too much to ask that anyone deciding to batch run some image optimizing script also run the Exif metadata update.
EDIT2: This is a functional draft of the maintenance script to read the Exif data from the images and write to CSV
export_copyrights.txt
The text was updated successfully, but these errors were encountered: