Waldir Pimenta edited this page May 31, 2014 · 2 revisions

Site.__init__ connects to the given MediaWiki site and creates a new Site object.

Parameters

  • host (str): Hostname of the MediaWiki server to connect to.
  • (optional) path (str): The URL path to api.php on the remote MediaWiki server. (default: '/w/')
  • (optional) ext (str): Script extension used on the MediaWiki server for "api" and "index". (default: '.php')
  • (optional) pool (HTTPPool): Pool object to use for connections. None to create a new pool. (default: None)
  • (optional) retry_timeout (int): Seconds to wait between retries. (default: 30)
    • Note: retry_timeout is used for the first retry, and the wait increases by retry_timeout for each subsequent retry.
  • (optional) max_retries (int): Maximum number of times to retry a single API operation. (default: 25)
  • (optional) wait_callback (function): A hook called whenever the API waits before a retry. (default: a no-op function)
  • (optional) max_lag (int): Maximum database replication lag, in seconds, that mwclient tolerates. (default: 3)
    • If the server is lagging by more than this, mwclient automatically waits and retries. See Manual:Maxlag.
  • (optional) compress (bool): True to have API results compressed with gzip during transmission. (default: True)
  • (optional) force_login (bool): True to forbid editing pages while logged out (anonymous editing). (default: True)
  • (optional) do_init (bool): True to call Site.site_init immediately. (default: True)
    • If set to False, site_init will be called at Site.login time. This can slightly improve performance, but no operations can be performed until after login.
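
The retry_timeout note above describes a wait that grows linearly. A minimal sketch of that schedule, assuming the n-th retry waits n * retry_timeout seconds (retry_waits is a hypothetical helper written for illustration, not part of mwclient):

```python
def retry_waits(retry_timeout=30, max_retries=25):
    """Return the assumed wait, in seconds, before each retry.

    Assumption: the first retry waits retry_timeout seconds and each
    later retry waits retry_timeout seconds longer than the previous one.
    """
    return [retry_timeout * n for n in range(1, max_retries + 1)]


waits = retry_waits()
print(waits[:3])   # waits before the first three retries with the defaults
print(len(waits))  # one entry per allowed retry
```

Under this assumption the defaults give waits of 30, 60 and 90 seconds for the first three retries.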

Result

A new Site object for the given parameters. If do_init is True, the object is fully initialized and can be used immediately.

Errors

Note: These errors can occur only if do_init is True (the default).

  • MediaWikiVersionError: Either the MediaWiki version on the site was not new enough for use with mwclient, or the MediaWiki version could not be retrieved/parsed successfully.
    • Note: mwclient currently requires MediaWiki 1.11 or later due to outstanding compatibility issues with earlier versions.
  • APIDisabledError: The MediaWiki API is disabled for this site (see Manual:$wgEnableAPI). mwclient is a MediaWiki API client and cannot be used with HTML-only MediaWiki sites.
  • APIError: An API error occurred getting the site info and user info.
  • HTTPRedirectError: The site sent an HTTP redirect to another URL. mwclient does not support client-side redirects.
  • HTTPStatusError: The server returned an HTTP error whose status code is outside 500-599 and is not otherwise handled.
  • MaximumRetriesExceeded: API call to get site info and user info failed and was retried until all retries were exhausted.
    • This may indicate a variety of issues, such as repeated HTTP errors, repeated internal database connection errors, or a server with long-term replication lag. Investigate by querying the API directly using your web browser. If long-term replication lag is the problem, you can work around it by increasing max_lag.

Examples

The hostname must be supplied. It is the part of the URL after "http://" and before the next "/". For example, for the English Wikipedia:

site = mwclient.Site('en.wikipedia.org')

Do not include the leading http://. By default Site assumes api.php is located at path /w/ (e.g. http://en.wikipedia.org/w/api.php). If the site you are connecting to places it in a different location, you must specify this (path must end in a /):

site = mwclient.Site('conservapedia.com', path='/')
site = mwclient.Site('sourceforge.net', path='/apps/mediawiki/mwclient/')

Typically api.php is in the same location as index.php. If you edit or view source of a page, you should see the path of index.php in the URL (e.g. http://en.wikipedia.org/w/index.php?title=Apple&action=edit).
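
To check a host/path combination before passing it to Site, it can help to spell out the URL those arguments imply. The sketch below only mirrors the rules stated above (no scheme in host, path ending in /). api_url and its scheme argument are hypothetical names for illustration, not mwclient functions:

```python
def api_url(host, path='/w/', ext='.php', scheme='http'):
    """Build the URL where api.php is expected for the given arguments.

    Hypothetical helper: it follows the rules described in the text and
    is not mwclient's own code.
    """
    if '://' in host:
        raise ValueError("host must not include a scheme such as 'http://'")
    if not path.endswith('/'):
        raise ValueError("path must end in '/'")
    return f"{scheme}://{host}{path}api{ext}"


print(api_url('en.wikipedia.org'))
print(api_url('conservapedia.com', path='/'))
```

Opening the resulting URL in a browser is a quick way to confirm that the API really lives at that location.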

For a bot, it is recommended to only edit while logged in to make the bot easier to block in case of a malfunction. If you want to edit anonymously (while not logged in), you must set force_login to False:

site = mwclient.Site('en.wikipedia.org', force_login=False)

Under heavy load, replication servers lag, resulting in out-of-date data being read. If you have no problem with reading stale results, you can increase max_lag to obtain results more quickly:

site = mwclient.Site('en.wikipedia.org', max_lag=1000)

Note that this will increase load on heavily-loaded servers, and so is not recommended.
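
For context, the MediaWiki API accepts a maxlag request parameter (see Manual:Maxlag). The sketch below shows how such a parameter looks in a query string, assuming max_lag is passed through to the server as maxlag. siteinfo_query is a hypothetical helper, and the exact set of parameters mwclient sends may differ:

```python
from urllib.parse import urlencode


def siteinfo_query(max_lag=3):
    """Return a query string for a siteinfo request that carries maxlag."""
    params = {
        'action': 'query',
        'meta': 'siteinfo',
        'format': 'json',
        'maxlag': max_lag,  # seconds of replication lag the client accepts
    }
    return urlencode(params)


print(siteinfo_query())              # default tolerance
print(siteinfo_query(max_lag=1000))  # accepts very stale replicas
```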

If you frequently receive the error MaximumRetriesExceeded, you can increase the wait between retries and/or the maximum number of retries:

site = mwclient.Site('en.wikipedia.org', retry_timeout=60, max_retries=50)
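
Before raising both values, it is worth estimating how long a single call could then block. A rough sketch, assuming retry n waits n * retry_timeout seconds as described in the parameter notes (worst_case_wait is a hypothetical helper, not part of mwclient):

```python
def worst_case_wait(retry_timeout, max_retries):
    """Total seconds spent waiting if every retry is used.

    Assumption: retry n waits n * retry_timeout seconds, so the total is
    retry_timeout * (1 + 2 + ... + max_retries).
    """
    return retry_timeout * max_retries * (max_retries + 1) // 2


for timeout, retries in [(30, 25), (60, 50)]:
    total = worst_case_wait(timeout, retries)
    print(f"retry_timeout={timeout}, max_retries={retries}: {total} s")
```

Under this assumption the defaults allow up to 9750 seconds of waiting (roughly 2.7 hours), while retry_timeout=60 with max_retries=50 allows up to 76500 seconds (roughly 21 hours), so large values are best reserved for unattended jobs.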

Conversely, if you want your calls to raise an error if they cannot produce a result immediately, you can set max_retries to zero:

site = mwclient.Site('en.wikipedia.org', max_retries=0)

Questions

Q. Can I use a custom User-Agent header?

A. Not without changing the code. mwclient currently uses a fixed User-Agent header, MwClient/<version> (https://github.com/mwclient/mwclient). If you need a different value, you can modify your copy of http.py.


This page was originally imported from the old mwclient wiki at SourceForge. The imported version was dated from 02:26, 13 November 2011, and its only editor was Derrickcoetzee (@dcoetzee).