Automate snapshot updates #58

Open
lidel opened this issue Sep 9, 2019 · 7 comments

Comments

@lidel
Member

lidel commented Sep 9, 2019

This is a placeholder issue.
It will be updated with more details once we have a better understanding of what is needed here.

In the long run, we want to introduce CI/CD automation that does something along these lines:

Then, a maintainer would review the PR and merge it.
Updating the manifest in master would trigger an update of the DNSLink under <lang>.wikipedia-on-ipfs.org, propagating the change to the collaborative cluster, etc.

@kelson42

kelson42 commented Sep 9, 2019

@lidel For the updates, we have started to advertise and use our OPDS feed (which works like an Atom feed). I would recommend using that in the future. See https://wiki.kiwix.org/wiki/OPDS (still in beta).

@lidel
Member Author

lidel commented Sep 9, 2019

@kelson42 that sounds very useful!
What would be a valid query to return the latest snapshot of the English or Turkish wiki?

I tried https://library.kiwix.org/catalog/search?lang=en&tag=wikipedia but it points at an old snapshot: wikipedia_en_wp1-0.8_orig_2010-12.zim

@kelson42

kelson42 commented Sep 9, 2019

@lidel This feed delivers the most recent ZIM files... but a few of them are simply not newly generated. Let me know if you find a recent file which is not in it.

@lidel
Member Author

lidel commented Sep 10, 2019

@kelson42 I think things like kiwix/kiwix-tools#231 and kiwix/kiwix-tools#316 need to land before we can use the OPDS feed.

Right now, I was unable to come up with filters to get the latest English Wikipedia with pictures and without video (wikipedia_en_all_novid).

Looking at https://download.kiwix.org/zim/wikipedia/ directly sounds like a more robust solution atm.
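
To illustrate what "looking at the directory directly" could mean, here is a minimal sketch in Python. It assumes the listing is plain HTML with one `href` per file, and that file names follow the `<prefix>_<YYYY-MM>.zim` shape seen in names like `wikipedia_en_wp1-0.8_orig_2010-12.zim`. The function name `latest_snapshot` is hypothetical, not something that exists in this repo.

```python
import re
from typing import Optional

# Assumption: each file appears as href="<prefix>_<YYYY-MM>.zim" in the listing.
SNAPSHOT_RE = re.compile(
    r'href="(?P<name>(?P<prefix>[^"/]+)_(?P<date>\d{4}-\d{2})\.zim)"'
)


def latest_snapshot(listing_html: str, prefix: str) -> Optional[str]:
    """Return the newest file name with exactly this prefix, or None."""
    candidates = [
        (m.group("date"), m.group("name"))
        for m in SNAPSHOT_RE.finditer(listing_html)
        if m.group("prefix") == prefix
    ]
    # YYYY-MM strings sort chronologically when compared as text.
    return max(candidates)[1] if candidates else None
```

Calling `latest_snapshot(html, "wikipedia_en_all_novid")` on the downloaded listing would then pick the newest novid build while ignoring other flavours that share the `wikipedia_en_all` stem.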

@mkg20001
Contributor

mkg20001 commented Sep 10, 2019

> Right now, I was unable to come up with filters to get the latest English wikipedia with pictures and without video (wikipedia_en_all_novid)

In my solution I'm using a dynamic parser, which should solve that

https://github.com/ipfs/distributed-wikipedia-mirror/pull/40/files#diff-31235a619c2d46324cca9e5429d49b3cR106-R132

@kelson42

@lidel Looks like you have pretty well identified what needs to be done. An alternative would be to rely on https://download.kiwix.org/library/library_zim.xml (it is not dynamic like the OPDS feed, but easier to parse than HTML)... and more robust.
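
A rough sketch of the XML route, in Python. The layout is an assumption on my side: it supposes the file contains `<book>` elements carrying a `url` attribute that ends in a dated ZIM file name (`<prefix>_<YYYY-MM>.zim`, possibly followed by a suffix). The function name `latest_book_url` is hypothetical; check the real file before relying on any of this.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: the last path segment of each url looks like <prefix>_<YYYY-MM>.zim
DATED_ZIM_RE = re.compile(r"/(?P<prefix>[^/]+)_(?P<date>\d{4}-\d{2})\.zim")


def latest_book_url(library_xml: str, prefix: str) -> Optional[str]:
    """Return the url of the newest <book> whose file name has this prefix."""
    root = ET.fromstring(library_xml)
    best = None
    for book in root.iter("book"):  # assumed element name
        url = book.get("url", "")  # assumed attribute name
        match = DATED_ZIM_RE.search(url)
        if match and match.group("prefix") == prefix:
            key = (match.group("date"), url)
            if best is None or key > best:
                best = key
    return best[1] if best else None
```

Compared with scraping the HTML listing, this only depends on the XML being well-formed, so a change in the directory page markup would not break it.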

@alzinging

> @kelson42 that sounds very useful!
> what would be a valid query to return the latest snapshot of english or turkish wiki?
>
> Tried https://library.kiwix.org/catalog/search?lang=en&tag=wikipedia but it points at old snapshot: wikipedia_en_wp1-0.8_orig_2010-12.zim

We need to be working off MWDumper.pl and the XML bz2 dataset from Wikipedia ... I will do an export to static HTML and collect the required code again; it's "known working".

I'd like to see more functionality here; we need "search and editing". Afaik there is not yet a good marriage of git or wiki and IPFS, and it should be core to ... us.
