
bzwikipedia:

  Serve Wikimedia-format sites (Wikipedia, Wiktionary, Wikinews, etc.) from
  .xml.bz2 compressed dump files.

  It is intended to run on your own laptop, using few resources once the
  initial title caching is done, so you always have local access to
  Wikipedia.

Features:

  * Serves Wikipedia pages/articles using limited resources: 7.2GB on disk
    and 10-20MB RAM (up to 100MB bursts while searching).

  * Fast wiki page access; title search is fast for the resources given.

  * Advanced title search that ignores punctuation, spaces and case (see
    the sketch after this list).

  * Quick and easy setup.

  * Optionally ignores redirect articles. (Default: ignores redirects)

  * Optionally ignores certain pages. (Default: ignores metadata pages)
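
  Here is a minimal Go sketch of what "ignoring punctuation, spaces and
  case" can mean for a title-search key. The helper name and exact rules
  are assumptions for illustration; this is not bzwikipedia's actual
  implementation.

    package main

    import (
        "fmt"
        "strings"
        "unicode"
    )

    // normalizeTitle keeps only letters and digits, lowercased, so that
    // differently punctuated or spaced titles map to the same search key.
    func normalizeTitle(title string) string {
        var b strings.Builder
        for _, r := range title {
            if unicode.IsLetter(r) || unicode.IsDigit(r) {
                b.WriteRune(unicode.ToLower(r))
            }
        }
        return b.String()
    }

    func main() {
        fmt.Println(normalizeTitle("Foo-Bar (baz)")) // foobarbaz
        fmt.Println(normalizeTitle("foo bar, baz"))  // foobarbaz
    }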

Initial setup:

  Things should work out of the box on anything that has a Go compiler and
  bzip2recover.

1) Download the pages-articles .xml.bz2 file from:

  http://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia

2) Drop the .xml.bz2 you just downloaded into the drop/ directory.

  If there is only one .xml.bz2 file, then bzwikipedia will use that. If
  there is more than one, bzwikipedia will use the one with the most recent
  timestamp in its filename
  (e.g. enwiki-20110803-pages-articles.xml.bz2).
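
  As an illustration of that selection rule, the following Go sketch picks
  the drop/*.xml.bz2 file with the most recent YYYYMMDD stamp in its name
  (or the single file, if there is only one). The function names are
  hypothetical; this is not the project's own code.

    package main

    import (
        "fmt"
        "path/filepath"
        "regexp"
    )

    var stamp = regexp.MustCompile(`[0-9]{8}`)

    // newestDump returns the *.xml.bz2 file whose name carries the
    // lexically greatest 8-digit stamp (YYYYMMDD sorts correctly as text).
    func newestDump(dir string) (string, error) {
        files, err := filepath.Glob(filepath.Join(dir, "*.xml.bz2"))
        if err != nil {
            return "", err
        }
        best, bestStamp := "", ""
        for _, f := range files {
            s := stamp.FindString(filepath.Base(f))
            if best == "" || s > bestStamp {
                best, bestStamp = f, s
            }
        }
        if best == "" {
            return "", fmt.Errorf("no .xml.bz2 files in %s", dir)
        }
        return best, nil
    }

    func main() {
        if f, err := newestDump("drop"); err != nil {
            fmt.Println(err)
        } else {
            fmt.Println("would use:", f)
        }
    }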

3) Optionally: Edit bzwikipedia.conf to adjust the settings to your liking.

4) When using a different wiki: Edit namespace.conf to reflect that.

  The default setup is for the English version of Wikipedia. For other
  language versions of Wikipedia, or for different sites entirely (such as
  Wiktionary), you'll need to make some changes here.

5) Start the server:

  Linux: Run "StartWikiServer.sh"

  It will perform initial setup on its own. This can take up to a few hours
  the first time and any time you drop a new .xml.bz2 file into the drop/
  directory.

  NOTE: Unfortunately, when it parses the .xml.bz2 file, it can chew up
  close to a GB of RAM. This happens only once per dump, and I'm considering
  a way to let people download pre-generated titlecache.dat and
  bzwikipedia.dat files.

To access:

Go to http://localhost:2012
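
  If you want to check the server from code rather than a browser, a quick
  Go probe of the address above looks like this. The article URL scheme
  isn't documented here, so this only fetches the front page.

    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        resp, err := http.Get("http://localhost:2012/")
        if err != nil {
            fmt.Println("server not reachable:", err)
            return
        }
        defer resp.Body.Close()
        body, _ := io.ReadAll(resp.Body)
        fmt.Println("status:", resp.Status, "-", len(body), "bytes")
    }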

How to UPDATE:

  Simply kill the server, drop an updated pages-articles .xml.bz2 file with
  a newer timestamp in its filename (e.g. enwiki-20110803-pages-articles
  replaces enwiki-20110403-pages-articles) into the drop/ directory, and
  start the server again.

  Alternatively, if you aren't using timestamps in the filenames, run
  ForceUpdate.sh.
