Parsing repo info #41

Open
spyoungtech opened this issue Mar 6, 2021 · 4 comments

@spyoungtech

spyoungtech commented Mar 6, 2021

Thank you so much for your work on this.

I'm wondering if there's a known tool or method to parse the repo info MD files or if the data is available in another format, like JSON.

I want to parse this data to build a tool that helps Docker users reconstitute old Docker builds: for example, inspecting an image to determine the tag(s) and SHA256 image digest of its upstream base image(s). Any thoughts or suggestions would be greatly appreciated.

@spyoungtech
Author

I made a short Python script to extract some of the info I was looking for from the MD files.

Can be seen here: https://gist.github.com/spyoungtech/3506d5ae9a9888ec709c8fcad33cfc34

Could be extended to parse additional info, I suppose.
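For readers who want a starting point, here is a minimal sketch of this kind of extraction. It is not the contents of the gist above. It assumes the repo info MD files contain a heading of the form ## `repo:tag` followed by a console line `$ docker pull repo@sha256:...`; that layout, the `parse_repo_info` helper, and the sample digests are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Built at runtime so this example contains no literal Markdown fence.
FENCE = "`" * 3

# Hypothetical excerpt in the ASSUMED layout of a repo info MD file.
# The digests are placeholders, not real data.
SAMPLE = "\n".join([
    "## `debian:buster`",
    "",
    FENCE + "console",
    "$ docker pull debian@sha256:" + "a" * 64,
    FENCE,
    "",
    "## `debian:10`",
    "",
    FENCE + "console",
    "$ docker pull debian@sha256:" + "a" * 64,
    FENCE,
    "",
    "## `debian:bullseye`",
    "",
    FENCE + "console",
    "$ docker pull debian@sha256:" + "b" * 64,
    FENCE,
])

# Matches only second-level tag headings such as: ## `debian:buster`
HEADING_RE = re.compile(r"^## `(?P<repo>[^:`]+):(?P<tag>[^`]+)`\s*$")
# Matches the pull line that carries the digest.
PULL_RE = re.compile(
    r"^\$ docker pull (?P<repo>\S+)@(?P<digest>sha256:[0-9a-f]{64})\s*$"
)


def parse_repo_info(markdown):
    """Return {digest: [tags]} using the first pull line under each tag heading."""
    tags_by_digest = defaultdict(list)
    current_tag = None
    for line in markdown.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            current_tag = heading.group("tag")
            continue
        pull = PULL_RE.match(line)
        if pull and current_tag is not None:
            tags_by_digest[pull.group("digest")].append(current_tag)
            current_tag = None  # ignore any later pull lines under this heading
    return dict(tags_by_digest)


tags_by_digest = parse_repo_info(SAMPLE)
for digest, tags in tags_by_digest.items():
    print(digest[:19], tags)
```

Grouping by digest rather than by tag makes the "which tags pointed at this image" lookup direct. If the real files differ from the assumed layout, the two regular expressions are the only parts that should need adjusting.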

@tianon
Member

tianon commented Aug 24, 2021

Yeah, this is something I've thought a lot about (and I know a few people have done it privately in the past).

Ideally, this whole repository would be replaced by a real proper service with an API (and boatloads of historical data), but that's a bit more complicated. 😅

Some of this is something Docker Hub really ought to provide out of the box (IMO), but it's not exactly straightforward for them to do so, I think.

@spyoungtech
Author

spyoungtech commented Aug 27, 2021

@tianon yeah, this is more or less the exact thing I set out to do and accomplished :- )

Thanks to this project, I was able to get historical information at least for official docker images.

Registries on their own aren't well-suited to the tasks we had in front of us, and they obviously can't relate information across registries. Amazon's ECR even times out on certain operations for sufficiently large repositories.

Anyhow, I ended up putting the information from this project into a Postgres database. The main use case for my project was to be able to examine docker images and determine the base image(s) used and their historical tags.

For example, like this:

In [23]: elixer
Out[23]: <Image: elixir@sha256:c3ee088c737bf55150dc5da229ca69e92c5a31eb6ba9da976ade722942c885d3>

In [24]: for base in elixer.bases():
    ...:     print(base)
    ...:     print(base.tags)
debian@sha256:0ba0446bc007a3196501ecbe91aabd4193db81085b23f4a99685448445762396 
['10.9', '10', 'buster-20210511', 'buster', 'latest']
erlang@sha256:d5d8e6be8de1b9e7946c6ed1a6278db1a60ae8ee89c0a65e7165238145ff9b54 
['24-slim', '24.0-slim', '24.0.1-slim', '24.0.1.0-slim', 'slim']

(Django ORM)

So, for example, this particular elixir image was evidently derived from erlang:24.0.1-slim (the tag at the time, anyhow), and the erlang image was derived from debian:buster-20210511.
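The thread does not say how `bases()` is implemented. One common heuristic, sketched here purely as an assumption, is that an image built `FROM` another image starts with that image's layers, so any known image whose layer list is a proper prefix of the inspected image's layer list is a candidate base. The `find_bases` helper, the `KNOWN` table, and the layer digests below are hypothetical.

```python
def find_bases(image_layers, known_images):
    """Return names of known images whose layers are a proper prefix of image_layers.

    Results are ordered from the closest base (longest shared prefix) to the
    most distant one. This is a heuristic: squashed images or images that only
    copy files from another stage will not be detected.
    """
    matches = []
    for name, layers in known_images.items():
        is_proper_prefix = (
            0 < len(layers) < len(image_layers)
            and image_layers[: len(layers)] == layers
        )
        if is_proper_prefix:
            matches.append(name)
    return sorted(matches, key=lambda n: len(known_images[n]), reverse=True)


# Hypothetical layer digests, shortened for readability (not real data).
KNOWN = {
    "debian:buster-20210511": ["sha256:l1"],
    "erlang:24.0.1-slim": ["sha256:l1", "sha256:l2"],
    "alpine:3.13": ["sha256:x1"],
}

elixir_layers = ["sha256:l1", "sha256:l2", "sha256:l3"]
bases = find_bases(elixir_layers, KNOWN)
print(bases)
```

In a database-backed version, the `KNOWN` mapping would presumably come from the stored historical layer data, with each match joined back to the tags that pointed at that digest at the time.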

And all the other usual information about an image can be had, exposed through the REST API.

I may be able to open source this in the future or rebuild something similar in the open source space.

@captn3m0

captn3m0 commented Oct 7, 2022

@spyoungtech Did you end up publishing this somewhere?
