rm gui.py, mv launcher and uploader to subfolders #213

Open
wants to merge 19 commits into
base: python3
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ Git can be counterintuitive, and [GitHub Desktop](https://desktop.github.com/) o

VSCode has [a guide to source control](https://code.visualstudio.com/docs/sourcecontrol/overview), and it has [an extension for working with GitHub](https://marketplace.visualstudio.com/items?itemName=GitHub.vscode-pull-request-github) which you may also find convenient.

In addition to the tools listed in the basic installation instructions in the main [README](./README.md), you can install [`pre-commit`](https://pre-commit.com/) in order to check and verify your work before submitting it.
In addition to the tools listed in the basic [installation instructions](./INSTALLATION.md), you can install [`pre-commit`](https://pre-commit.com/) in order to check and verify your work before submitting it.

## Contributing Code

34 changes: 15 additions & 19 deletions PUBLISHING.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,27 +10,23 @@ Instructions on using the scripts `launcher` and `uploader` are in the file [Usa

Just use `uploader` (especially if you have multiple wikis): the script takes the filename of a list of wikis as argument and uploads their dumps to archive.org. You only need to:

- Check the 7z compressed dumps are in the same directory as `listfile`. The file `listfile` contains a list of the api.php URLs of the wikis to upload, one per line.
- [Retrieve your S3 keys](http://www.archive.org/account/s3.php), save them one per line (in the order provided) in a keys.txt file in the same directory as `uploader`.
- Run the script `uploader listfile`.
* Check the 7z compressed dumps are in the same directory as `listfile`. The file `listfile` contains a list of the api.php URLs of the wikis to upload, one per line.
* [Retrieve your S3 keys](http://www.archive.org/account/s3.php), save them one per line (in the order provided) in a keys.txt file in the same directory as `uploader`.
* Run the script `uploader listfile`.
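
The two input files described above can be read with a few lines of Python. This is only a sketch, not the code `uploader` itself uses: the helper names `read_s3_keys` and `read_wiki_list` are hypothetical, and it assumes the first line of keys.txt is the access key and the second is the secret key.

```python
from pathlib import Path


def read_s3_keys(keys_path):
    """Return (access_key, secret_key) from a two-line keys file.

    Assumption: line 1 is the access key and line 2 is the secret key.
    """
    text = Path(keys_path).read_text(encoding="utf-8")
    keys = [line.strip() for line in text.splitlines() if line.strip()]
    if len(keys) < 2:
        raise ValueError(f"{keys_path} must contain two keys, one per line")
    return keys[0], keys[1]


def read_wiki_list(list_path):
    """Return the api.php URLs in the list file, one per non-empty line."""
    text = Path(list_path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Checking both files this way before starting a long upload run makes a missing key or an empty list show up early.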

## Manual publishing

- After running dumpgenerator, in each dump folder, select all files, right-click on the selection, click 7-Zip, click `Add to archive...` and click OK.
- At Archive.org, for each wiki [create a new item](http://archive.org/create/).
- Click `Upload files`. Then either drag and drop the 7-Zip archive onto the box or click `Choose files` and select the 7-Zip archive.
- `Page Title` and `Page URL` will be filled in by the uploader.
- Add a short `Description`, such as a descriptive name for the wiki.
- Add `Subject Tags`, separated by commas. These are the keywords that will help the archive show up in an Internet Archive search, e.g. wikiteam,wiki,subjects of the wiki, and so on.
- `Creator`, can be left blank.
- `Date`, can be left blank.
- `Collection`, select `Community texts`.
- `Language`, select the language of the wiki.
- `License`, click to expand and select Creative Commons, Allow Remixing, Require Share-Alike for a CC-BY-SA licence.
- Click `Upload and Create Your Item`.
* After running dumpgenerator, in each dump folder, select all files, right-click on the selection, click 7-Zip, click `Add to archive...` and click OK.
* At Archive.org, for each wiki [create a new item](http://archive.org/create/).
* Click `Upload files`. Then either drag and drop the 7-Zip archive onto the box or click `Choose files` and select the 7-Zip archive.
* `Page Title` and `Page URL` will be filled in by the uploader.
* Add a short `Description`, such as a descriptive name for the wiki.
* Add `Subject Tags`, separated by commas. These are the keywords that will help the archive show up in an Internet Archive search, e.g. wikiteam,wiki,subjects of the wiki, and so on.
* `Creator`, can be left blank.
* `Date`, can be left blank.
* `Collection`, select `Community texts`.
* `Language`, select the language of the wiki.
* `License`, click to expand and select Creative Commons, Allow Remixing, Require Share-Alike for a CC-BY-SA licence.
* Click `Upload and Create Your Item`.

With the subject tag of wikiteam and collection of community texts, your uploads should appear in a search for [subject:"wikiteam" AND collection:opensource](https://archive.org/search?query=subject%3A%22wikiteam%22+AND+collection%3Aopensource).
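
The search link above is an ordinary URL-encoded query, so it can be rebuilt for other subject tags. A minimal sketch using only the Python standard library; the function name `search_url` is hypothetical.

```python
from urllib.parse import urlencode


def search_url(subject, collection):
    """Build an archive.org search URL for a subject tag and a collection."""
    query = f'subject:"{subject}" AND collection:{collection}'
    # urlencode percent-encodes the colon and quotes and turns spaces into "+".
    return "https://archive.org/search?" + urlencode({"query": query})


print(search_url("wikiteam", "opensource"))
```

Called with `wikiteam` and `opensource`, this reproduces the link given above.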

## Info for developers

- [Internet Archive’s S3 like server API](https://archive.org/developers/ias3.html).
15 changes: 14 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,14 @@ For prerequisites and installation see [Installation](./INSTALLATION.md)

For usage see [Usage](./USAGE.md)

## Types of dump

There are two types of backups that can be made: XML dumps (current and history) and image dumps. Both can be done in one dump.

An XML dump contains the meta-data of the edits (author, date, comment) and the wikitext. An XML dump may be "current" or "history". A "history" dump contains the complete history of every page, which is better for CC-BY-SA licensing and is the default. A "current" dump contains only the last edit for every page.

An image dump contains all the images available in a wiki, plus their descriptions.

## Publishing the dump

Please consider publishing your wiki dump(s). You can do it yourself as explained in [Publishing](./PUBLISHING.md).
Expand All @@ -31,7 +39,12 @@ Please consider publishing your wiki dump(s). You can do it yourself as explaine

## Contributing

For information on reporting bugs and proposing changes, please see the [Contributing](./Contributing.md) guide.
For information on reporting bugs and proposing changes, please see the [Contributing](./CONTRIBUTING.md) guide.

### Further info for developers

* [MediaWiki Action API](https://www.mediawiki.org/wiki/API:Main_page)
* [The Internet Archive Python Library](https://archive.org/developers/internetarchive/)

## Code of Conduct

30 changes: 23 additions & 7 deletions USAGE.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,11 +56,13 @@ In the above example, `--path` is only necessary if the download path is not the

### Checking dump integrity

If you want to check the XML dump integrity, type this into your command line to count title, page and revision XML tags:
#### 1. Count title, page and revision XML tags

Enter this into the command line:

```bash
grep -Ec "<title(.*?)>" *.xml;grep -Ec "<page(.*?)>" *.xml;grep -Ec "</page>" *.xml; \
grep -Ec "<revision(.*?)>" *.xml;grep -Ec "</revision>" *.xml
grep -c "<title.*>" *.xml;grep -c "<page.*>" *.xml;grep -c "</page>" *.xml; \
grep -c "<revision.*>" *.xml;grep -c "</revision>" *.xml
```

You should see something similar to this (not the actual numbers) - the first three numbers should be the same and the last two should be the same as each other:
Expand All @@ -75,6 +77,16 @@ You should see something similar to this (not the actual numbers) - the first th

If your first three numbers or your last two numbers differ, your XML dump is corrupt (it contains one or more unfinished ```</page>``` or ```</revision>``` elements). This is uncommon in small wikis, but large or very large wikis may fail this check because XML pages were truncated while exporting and merging. The solution is to delete the XML dump and download it again; this is tedious, and the new download can fail in the same way.
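
The same comparison can be scripted so that the counts are checked automatically. The following is a sketch under stated assumptions, not part of the toolset: the helper names `count_tag_lines` and `looks_consistent` are hypothetical, and, like `grep -c`, it counts matching lines rather than individual tags.

```python
import re

# One pattern per count printed by the grep commands above.
PATTERNS = {
    "title": re.compile(r"<title.*?>"),
    "page_open": re.compile(r"<page.*?>"),
    "page_close": re.compile(r"</page>"),
    "revision_open": re.compile(r"<revision.*?>"),
    "revision_close": re.compile(r"</revision>"),
}


def count_tag_lines(path):
    """Count the lines of an XML dump that contain each tag."""
    counts = dict.fromkeys(PATTERNS, 0)
    # Read line by line so that very large dumps are not loaded into memory.
    with open(path, encoding="utf-8", errors="replace") as dump:
        for line in dump:
            for name, pattern in PATTERNS.items():
                if pattern.search(line):
                    counts[name] += 1
    return counts


def looks_consistent(counts):
    """Apply the rule above: first three counts equal, last two equal."""
    pages_ok = counts["title"] == counts["page_open"] == counts["page_close"]
    revisions_ok = counts["revision_open"] == counts["revision_close"]
    return pages_ok and revisions_ok
```

A dump for which `looks_consistent(count_tag_lines(path))` is false should be treated as corrupt and downloaded again.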

#### 2. Confirm the XML dump closes with ```</mediawiki>```

Enter this into the command line:

```bash
tail *.xml | grep '</mediawiki>'
```

You should see ```</mediawiki>``` printed to stdout.
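
For scripted checks, the end of the file can be inspected without reading the whole dump. A minimal sketch; the helper name `ends_with_mediawiki` and the 1024-byte window are assumptions chosen for illustration.

```python
import os


def ends_with_mediawiki(path, tail_bytes=1024):
    """Return True if the last tail_bytes of the file contain </mediawiki>."""
    with open(path, "rb") as dump:
        dump.seek(0, os.SEEK_END)
        size = dump.tell()
        # Jump close to the end; files shorter than the window are read whole.
        dump.seek(max(0, size - tail_bytes))
        return b"</mediawiki>" in dump.read()
```

Reading only the tail keeps the check fast even for very large dumps.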

## Viewing MediaWiki XML Dumps

* [XML namespaces](https://www.mediawiki.org/xml/)
Expand Down Expand Up @@ -105,13 +117,17 @@ Each wiki will be stored into files containing a stripped version of the url an
By default, the `7z` executable is looked up on `PATH`. The `--7z-path` argument can be used to use a specific executable instead.

The `--generator-arg` or `-g` argument can be used on the command line to pass through arguments to the `generator` instances that are spawned. For example:
- `--generator-arg=--xmlrevisions` to use the modern MediaWiki API for retrieving revisions
- `--generator-arg=--delay=2` to use a delay of 2 seconds between requests
- `-g=--user -g=USER -g=--pass -g=PASSWORD` to dump a wiki that only logged-in users can read
* `--generator-arg=--xmlrevisions` to use the modern MediaWiki API for retrieving revisions
* `--generator-arg=--delay=2` to use a delay of 2 seconds between requests
* `-g=--user -g=USER -g=--pass -g=PASSWORD` to dump a wiki that only logged-in users can read
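
A repeatable pass-through option like this is commonly built on `argparse` with `action="append"`. The sketch below only illustrates the pattern and is not taken from the `launcher` source; the function name `parse_generator_args` is hypothetical.

```python
import argparse


def parse_generator_args(argv):
    """Collect every -g / --generator-arg value, in order, into one list."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-g",
        "--generator-arg",
        action="append",
        default=[],
        dest="generator_args",
    )
    return parser.parse_args(argv).generator_args


# The "=" form lets a value that itself starts with "--" pass through intact.
print(parse_generator_args(["-g=--user", "-g=USER", "-g=--pass", "-g=PASSWORD"]))
```

The collected list can then be appended unchanged to the command line of each spawned process.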

## `Uploader`

The script `uploader` is a way to upload a set of already-generated wiki dumps to the Internet Archive with a single invocation.
The script `uploader` is a way to upload a set of already-generated wiki dumps to the Internet Archive with a single invocation. The script takes the filename of a list of wikis as argument and uploads their dumps to archive.org. You only need to:

* Check the 7z compressed dumps are in the same directory as `listfile`. The file `listfile` contains a list of the api.php URLs of the wikis to upload, one per line.
* [Retrieve your S3 keys](http://www.archive.org/account/s3.php), save them one per line (in the order provided) in a keys.txt file in same directory as `uploader`.
* Run the script `uploader listfile`.

Usage:

19 changes: 19 additions & 0 deletions wikiteam3/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
# WikiTeam3 is a project to port the WikiTeam toolset to Python 3 and PyPI.
#
# Copyright (C) 2011-2023 WikiTeam developers and MediaWiki Client Tools
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
# To learn more, read the documentation at
# https://github.com/mediawiki-client-tools/mediawiki-dump-generator
10 changes: 5 additions & 5 deletions wikiteam3/dumpgenerator/__init__.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
#!/usr/bin/env python3

# DumpGenerator A generator of dumps for wikis
# Copyright (C) 2011-2018 WikiTeam developers
# DumpGenerator a generator of dumps for wikis
#
# Copyright (C) 2011-2023 WikiTeam developers and MediaWiki Client Tools
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
Expand All @@ -11,12 +11,12 @@
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#

# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

# To learn more, read the documentation:
# https://github.com/WikiTeam/wikiteam/wiki
# https://github.com/mediawiki-client-tools/mediawiki-dump-generator


from wikiteam3.dumpgenerator.dump import DumpGenerator
4 changes: 3 additions & 1 deletion wikiteam3/dumpgenerator/cli/greeter.py
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,9 @@ def bye():
print("")
print("If this is a public wiki, please, consider publishing this dump.")
print("Do it yourself as explained in:")
print(" https://github.com/WikiTeam/wikiteam/wiki/Tutorial#Publishing_the_dump")
print(
" https://github.com/mediawiki-client-tools/mediawiki-dump-generator/blob/python3/PUBLISHING.md"
)
print("")
print("Good luck! Bye!")
print("")