
Commit

0.3.10 (#18)
* Add categorytree
* New exception type: MediaWikiCategoryTreeError
* Update documentation
* Additional test coverage
* Simplify references and categories properties
barrust committed Jan 14, 2017
1 parent 76fa271 commit c2c6b4a
Showing 14 changed files with 277,514 additions and 57,592 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,11 @@

## Current

### Version 0.3.10

* Add categorytree support
* Remove adding 'http:' to references if missing

### Version 0.3.9

* Fix infinite loop on continued queries: [issue #15](https://github.com/barrust/mediawiki/issues/15)
1 change: 1 addition & 0 deletions docs/source/code.rst
@@ -37,6 +37,7 @@ Indices and tables
==================

* :ref:`home`
* :ref:`quickstart`
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
3 changes: 2 additions & 1 deletion docs/source/conf.py
@@ -120,8 +120,9 @@
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
# html_theme = 'alabaster'
# html_theme = 'bizstyle'
html_theme = 'flask_small'

# html_theme = 'kr'

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
20 changes: 18 additions & 2 deletions docs/source/index.rst
@@ -2,14 +2,29 @@

MediaWiki
=========
.. image:: https://badge.fury.io/py/pymediawiki.svg
:target: https://badge.fury.io/py/pymediawiki
.. image:: https://travis-ci.org/barrust/mediawiki.svg?branch=master
:target: https://travis-ci.org/barrust/mediawiki
:alt: Build Status
.. image:: https://coveralls.io/repos/github/barrust/mediawiki/badge.svg?branch=master
:target: https://coveralls.io/github/barrust/mediawiki?branch=master
:alt: Test Coverage
.. image:: https://api.codacy.com/project/badge/Grade/afa87d5f5b6e4e66b78e15dedbc097ec
:target: https://www.codacy.com/app/barrust/mediawiki?utm_source=github.com&utm_medium=referral&utm_content=barrust/mediawiki&utm_campaign=Badge_Grade
:alt: Codacy Review
.. image:: https://img.shields.io/badge/license-MIT-blue.svg
:target: https://opensource.org/licenses/MIT/
:alt: License

MediaWiki is a python library to help pull information from MediaWiki sites
using the MediaWiki API. It provides a simple and, hopefully, intuitive
manner of accessing the data and returning it in standard python data types.

MediaWiki wraps the `MediaWiki API <https://www.mediawiki.org/wiki/API>`_
so you can focus on leveraging your favorite MediaWiki site's data,
not getting it.
not getting it. Please check out the code on
`github <https://www.github.com/barrust/mediawiki>`_!

.. code: python
@@ -24,7 +39,8 @@ Go to the :ref:`quickstart` to start using ``mediawiki`` now, or see the :ref:`a
Indices and tables
******************

* :ref:`api`
* :ref:`quickstart`
* :ref:`genindex`
* :ref:`modindex`
* :ref:`api`
* :ref:`search`
18 changes: 13 additions & 5 deletions docs/source/quickstart.rst
@@ -16,16 +24,24 @@ please see :ref:`api`.
Install
^^^^^^^

Using pip
"""""""""

::

$ pip install pymediawiki

From source
"""""""""""

Begin by installing mediawiki: simply clone the
`repository on GitHub <https://github.com/barrust/Wikipedia>`__,
`repository on GitHub <https://github.com/barrust/mediawiki>`__,
then run the following command from the extracted folder:

::

$ python setup.py install

Hopefully in the near future `mediawiki` will be available using pip

Setup
^^^^^

@@ -141,7 +149,7 @@ The revision id of the page
pageid
"""""""""""

The revision id of the page
The page id of the page

.. code: python
@@ -162,7 +170,7 @@ The revision id of the page
parent_id
"""""""""""

The parent_id id of the page
The parent id of the page

.. code: python
6 changes: 4 additions & 2 deletions mediawiki/__init__.py
@@ -4,7 +4,8 @@
from .mediawiki import (MediaWiki, MediaWikiPage, URL, VERSION)
from .exceptions import (MediaWikiException, PageError, MediaWikiGeoCoordError,
RedirectError, DisambiguationError,
MediaWikiAPIURLError, HTTPTimeoutError)
MediaWikiAPIURLError, HTTPTimeoutError,
MediaWikiCategoryTreeError)

__author__ = 'Tyler Barrus'
__maintainer__ = 'Tyler Barrus'
@@ -16,4 +17,5 @@

__all__ = ['MediaWiki', 'PageError', 'RedirectError', 'MediaWikiException',
'DisambiguationError', 'MediaWikiAPIURLError',
'HTTPTimeoutError', 'MediaWikiGeoCoordError']
'HTTPTimeoutError', 'MediaWikiGeoCoordError',
'MediaWikiCategoryTreeError']
14 changes: 14 additions & 0 deletions mediawiki/exceptions.py
@@ -120,3 +120,17 @@ def __init__(self, error):
'error: {0} - Please use valid coordinates or a proper '
'page title.').format(self.error)
super(MediaWikiGeoCoordError, self).__init__(msg)


class MediaWikiCategoryTreeError(MediaWikiBaseException):
'''
Exception when the category tree is unable to complete for an unknown
reason
'''

def __init__(self, category):
self.category = category
msg = ("Categorytree threw an exception for trying to get the "
"same category '{}' too many times. Please try again later "
"and perhaps use the rate limiting option.").format(category)
super(MediaWikiCategoryTreeError, self).__init__(msg)
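The message describes a bounded retry: keep asking for the same category, and give up once it has failed too many times. A minimal, self-contained sketch of that pattern is below. `CategoryTreeError` is a stand-in class and `fetch_with_retries` and its `fetch` argument are hypothetical names used only for illustration; the attempt count here is also an assumption and may differ by one from the loop in `categorytree`.

```python
import time


class CategoryTreeError(Exception):
    """Stand-in for MediaWikiCategoryTreeError (illustration only)."""

    def __init__(self, category):
        self.category = category
        msg = ("Categorytree threw an exception for trying to get the "
               "same category '{}' too many times.").format(category)
        super(CategoryTreeError, self).__init__(msg)


def fetch_with_retries(fetch, category, max_tries=10, delay=1.0):
    """Call fetch(category) until it succeeds or max_tries calls have failed."""
    tries = 0
    while True:
        if tries >= max_tries:
            raise CategoryTreeError(category)
        try:
            return fetch(category)
        except Exception:
            # Treat any failure as transient: count it, pause, try again.
            tries += 1
            time.sleep(delay)
```

A caller that hits the cap can catch the error and read `exc.category` to see which category failed.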
146 changes: 126 additions & 20 deletions mediawiki/mediawiki.py
@@ -14,11 +14,12 @@
from .exceptions import (MediaWikiException, PageError,
RedirectError, DisambiguationError,
MediaWikiAPIURLError, HTTPTimeoutError,
MediaWikiGeoCoordError, ODD_ERROR_MESSAGE)
MediaWikiGeoCoordError, MediaWikiCategoryTreeError,
ODD_ERROR_MESSAGE)
from .utilities import memoize

URL = 'https://github.com/barrust/mediawiki'
VERSION = '0.3.9'
VERSION = '0.3.10'


class MediaWiki(object):
@@ -504,6 +505,8 @@ def categorymembers(self, category, results=10, subcategories=True):
:param subcategories: Include subcategories (**True**) or not \
(**False**)
:type subcategories: Boolean
:returns: Either a tuple ([pages], [subcategories]) or just the \
list of pages
.. note:: Set results to **None** to get all results
'''
@@ -520,16 +523,16 @@ def categorymembers(self, category, results=10, subcategories=True):
subcats = list()
returned_results = 0
finished = False
last_continue = dict()
last_cont = dict()
while not finished:
params = search_params.copy()
params.update(last_continue)
raw_results = self.wiki_request(params)
params.update(last_cont)
raw_res = self.wiki_request(params)

self._check_error_response(raw_results, category)
self._check_error_response(raw_res, category)

current_pull = len(raw_results['query']['categorymembers'])
for rec in raw_results['query']['categorymembers']:
current_pull = len(raw_res['query']['categorymembers'])
for rec in raw_res['query']['categorymembers']:
if rec['type'] == 'page':
pages.append(rec['title'])
elif rec['type'] == 'subcat':
@@ -538,22 +541,126 @@ def categorymembers(self, category, results=10, subcategories=True):
tmp = tmp[9:]
subcats.append(tmp)

if 'continue' not in raw_results:
if 'continue' not in raw_res or last_cont == raw_res['continue']:
break

returned_results = returned_results + current_pull
if results is None or (results - returned_results > 0):
last_continue = raw_results['continue']
last_cont = raw_res['continue']
else:
finished = True

# end while loop

if subcategories:
return pages, subcats
else:
return pages
# end categorymembers
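The guard added to this loop stops paging when the API returns the same `continue` value twice in a row, which would otherwise repeat the same request forever. A self-contained sketch of that pattern follows. `collect_members` and its `request` callable are hypothetical stand-ins for the library's request plumbing, and the canned responses are invented for the example.

```python
def collect_members(request, limit=None):
    """Gather 'categorymembers' titles across continued responses.

    request is a hypothetical callable: it takes a dict of continuation
    parameters and returns a dict shaped like a MediaWiki API response.
    """
    titles = []
    last_cont = {}
    while True:
        raw_res = request(dict(last_cont))
        titles.extend(rec['title']
                      for rec in raw_res['query']['categorymembers'])

        # Stop when there is nothing more to fetch, or when the server
        # hands back the continue token that was just used.
        if 'continue' not in raw_res or raw_res['continue'] == last_cont:
            break
        if limit is not None and len(titles) >= limit:
            break
        last_cont = raw_res['continue']
    return titles


# Canned responses: the second one repeats the first continue token.
responses = [
    {'query': {'categorymembers': [{'title': 'A'}, {'title': 'B'}]},
     'continue': {'cmcontinue': 'token-1'}},
    {'query': {'categorymembers': [{'title': 'C'}]},
     'continue': {'cmcontinue': 'token-1'}},
]
calls = []


def fake_request(params):
    calls.append(params)
    return responses[min(len(calls) - 1, len(responses) - 1)]


titles = collect_members(fake_request)
```

Without the token comparison, the second response would trigger a third, identical request, and so on.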

# @memoize
def categorytree(self, category, depth=5):
''' Generate the Category Tree for the given categories
:param category: Category name
:type category: string or list of strings
:param depth: Depth to traverse the tree
:type depth: integer or None
:returns: Dictionary of the category tree structure
:rtype: Dictionary
:Return Data Structure: Subcategory contains the same recursive \
structure
>>> {
'category': {
'depth': Number,
'links': list,
'parent-categories': list,
'sub-categories': dict
}
}
.. versionadded:: 0.3.10
.. note:: Set depth to **None** to get the whole tree
'''
def __cat_tree_rec(cat, depth, tree, level, categories, links):
''' recursive function to build out the tree '''
tree[cat] = dict()
tree[cat]['depth'] = level
tree[cat]['sub-categories'] = dict()
tree[cat]['links'] = list()
tree[cat]['parent-categories'] = list()
parent_cats = list()

if cat not in categories:
tries = 0
while True:
if tries > 10:
raise MediaWikiCategoryTreeError(cat)
try:
categories[cat] = self.page('Category:{0}'.format(cat))
parent_cats = categories[cat].categories
links[cat] = self.categorymembers(cat, results=None,
subcategories=True)
break
except PageError:
raise PageError('Category:{0}'.format(cat))
except Exception:
tries = tries + 1
time.sleep(1)
else:
parent_cats = categories[cat].categories

for pcat in parent_cats:
tree[cat]['parent-categories'].append(pcat)

for link in links[cat][0]:
tree[cat]['links'].append(link)

if depth and level >= depth:
for ctg in links[cat][1]:
tree[cat]['sub-categories'][ctg] = None
else:
for ctg in links[cat][1]:
__cat_tree_rec(ctg, depth,
tree[cat]['sub-categories'], level + 1,
categories, links)
return
# end __cat_tree_rec

# ###################################
# ### Actual Function Code ###
# ###################################

# make it simple to use both a list or a single category term
if not isinstance(category, list):
cats = [category]
else:
cats = category

# parameter verification
if len(cats) == 1 and (cats[0] is None or cats[0] == ''):
msg = ("CategoryTree: Parameter 'category' must either "
"be a list of one or more categories or a string; "
"provided: '{}'".format(category))
raise ValueError(msg)

if depth is not None and depth < 1:
msg = ("CategoryTree: Parameter 'depth' must be either None (for "
"the full tree) or greater than 0")
raise ValueError(msg)

results = dict()
categories = dict()
links = dict()

for cat in cats:
if cat is None or cat == '':
continue
__cat_tree_rec(cat, depth, results, 0, categories, links)
return results
# end categorytree
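The returned dictionary is recursive: each category maps to a node with `depth`, `links`, `parent-categories`, and `sub-categories`, and a sub-category that was listed but not expanded because of the depth cut-off maps to `None`. A short sketch of walking that structure is below. `iter_tree_links` is a hypothetical helper, and `sample` is hand-written data in the documented shape, not real API output.

```python
def iter_tree_links(tree):
    """Yield (category, depth, link) for every link in a category tree dict."""
    for name, node in tree.items():
        if node is None:
            # Sub-category that was listed but not expanded (depth cut-off).
            continue
        for link in node['links']:
            yield name, node['depth'], link
        for item in iter_tree_links(node['sub-categories']):
            yield item


# Hand-written sample in the documented shape (illustration only).
sample = {
    'Chess': {
        'depth': 0,
        'links': ['Chess', 'Outline of chess'],
        'parent-categories': ['Board games'],
        'sub-categories': {
            'Chess openings': {
                'depth': 1,
                'links': ['Sicilian Defence'],
                'parent-categories': ['Chess'],
                'sub-categories': {'Open games': None},
            },
        },
    },
}

for category, depth, link in iter_tree_links(sample):
    print('{0}{1}: {2}'.format('  ' * depth, category, link))
```

Checking for `None` before descending matters because unexpanded sub-categories carry no node of their own.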

def page(self, title=None, pageid=None, auto_suggest=True, redirect=True,
preload=False):
''' Get MediaWiki page based on the provided title or pageid
@@ -569,6 +676,9 @@ def page(self, title=None, pageid=None, auto_suggest=True, redirect=True,
:param preload: **True:** Load most page properties
:type preload: Boolean
:raises ValueError: when title is blank or None and no pageid is \
provided
:raises `mediawiki.exceptions.PageError`: if page does not exist
.. note:: Title takes precedence over pageid if both are provided
'''
if (title is None or title.strip() == '') and pageid is None:
@@ -865,11 +975,7 @@ def references(self):
params = {'prop': 'extlinks', 'ellimit': 'max'}
self._references = list()
for link in self._continued_query(params):
if link['*'].startswith('http'):
url = link['*']
else:
url = 'http:{0}'.format(link['*'])
self._references.append(url)
self._references.append(link['*'])
self._references = sorted(self._references)
return self._references
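With this change, `references` stores each external link exactly as the API reports it, so a link without a scheme is no longer prefixed with `http:`. Callers that need absolute URLs can add a scheme themselves. A minimal sketch, assuming such links are protocol-relative (they start with `//`); `absolutize` is a hypothetical helper and not part of the library:

```python
def absolutize(url, scheme='https'):
    """Prefix a protocol-relative URL ('//host/path') with a scheme."""
    if url.startswith('//'):
        return '{0}:{1}'.format(scheme, url)
    return url


refs = ['//example.org/a', 'http://example.com/b']
absolute_refs = [absolutize(ref) for ref in refs]
```

Leaving the choice of scheme to the caller avoids forcing `http` onto sites that are served over `https`.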

@@ -889,10 +995,10 @@ def categories(self):
'clshow': '!hidden'
}
for link in self._continued_query(params):
if link['title'].startswith('Category:'):
self._categories.append(link['title'][9:])
else:
self._categories.append(link['title'])
cat = link['title']
if cat.startswith('Category:'):
cat = cat[9:]
self._categories.append(cat)
self._categories = sorted(self._categories)
return self._categories

3 changes: 1 addition & 2 deletions setup.py
@@ -31,6 +31,5 @@
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6'
],
test_suite = 'tests',
test_requires = ['unittest']
test_suite = 'tests'
)
