This document describes the internal architecture of Wikipedia-API,
explains how the classes interact with each other, and provides a
step-by-step guide for adding support for a new MediaWiki API call.
Table of Contents
- Overview
- File Layout
- Class Hierarchy
- Transport Layer
- API Layer
- Dispatch Helpers
- Request Lifecycle
- Adding a New API Call
- Step 1 – Choose the Right Dispatcher
- Step 2 – Add a Return-Type Attribute to BaseWikipediaPage
- Step 3 – Add the Parameter Builder
- Step 4 – Add the Response Parser
- Step 5 – Add the Sync Method to WikipediaResource
- Step 6 – Add the Async Method to AsyncWikipediaResource
- Step 7 – Add a Lazy Property to WikipediaPage
- Step 8 – Add a Lazy Coroutine Property to AsyncWikipediaPage
- Step 9 – Add Tests
- Invariants and Conventions
- Command Line Interface
Wikipedia-API is structured around two independent concerns:
- HTTP transport – how to make HTTP requests (sync vs. async, retries, rate-limit handling).
- API logic – how to build MediaWiki query parameters and parse the JSON responses into Python objects.
Each concern is implemented as an abstract mixin. Concrete client
classes are assembled by combining one transport mixin with one API
mixin through Python's multiple inheritance. This keeps the two layers
entirely decoupled: the API logic never imports httpx, and the
transport layer knows nothing about MediaWiki.
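The decoupling can be illustrated with a minimal, self-contained sketch. The class names below are hypothetical stand-ins for this illustration only, not the library's real classes:

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseTransport(ABC):
    """Transport side: knows how to fetch, not what to fetch."""

    @abstractmethod
    def _get(self, language: str, params: dict[str, Any]) -> dict:
        ...


class FakeTransport(BaseTransport):
    """Stand-in for a transport mixin: returns a canned response, no I/O."""

    def _get(self, language: str, params: dict[str, Any]) -> dict:
        return {"query": {"echo": params, "language": language}}


class ResourceMixin:
    """API side: builds params and parses responses, never does I/O itself."""

    def extracts(self, language: str, title: str) -> dict:
        params = {"action": "query", "prop": "extracts", "titles": title}
        # self._get is supplied by whichever transport mixin is combined in.
        return self._get(language, params)


class Client(ResourceMixin, FakeTransport):
    """Concrete client assembled from one API mixin and one transport mixin."""


client = Client()
result = client.extracts("en", "Python (programming language)")
```

Swapping `FakeTransport` for an async transport (and making the API mixin's methods `async def`) is exactly the move the real `AsyncWikipedia` makes.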
wikipediaapi/
├── __init__.py                      # Public exports
├── cli.py                           # Command line interface (main entry point)
├── commands/                        # CLI command modules
│   ├── __init__.py
│   ├── base.py                      # Shared utilities and common options
│   ├── page_commands.py             # Page content commands
│   ├── link_commands.py             # Link-related commands
│   ├── category_commands.py         # Category commands
│   ├── geo_commands.py              # Geographic commands
│   ├── image_commands.py            # Image file commands
│   └── search_commands.py           # Search and discovery commands
├── _http_client/                    # Transport layer package
│   ├── __init__.py
│   ├── base_http_client.py          # Shared retry & config logic
│   ├── sync_http_client.py          # Blocking httpx.Client
│   ├── async_http_client.py         # Non-blocking httpx.AsyncClient
│   ├── retry_utils.py               # Retry utilities
│   └── retry_after_wait.py          # Retry-After header handling
├── _resources/                      # API layer package
│   ├── __init__.py
│   ├── base_wikipedia_resource.py   # Param builders, parsers, dispatchers
│   ├── wikipedia_resource.py        # Sync public API methods
│   └── async_wikipedia_resource.py  # Async public API methods
├── _types/                          # Typed dataclasses package
│   ├── __init__.py
│   ├── coordinate.py                # Coordinate dataclass
│   ├── geo_point.py                 # GeoPoint dataclass
│   ├── geo_box.py                   # GeoBox dataclass
│   ├── geo_search_meta.py           # GeoSearchMeta dataclass
│   ├── image_info.py                # ImageInfo dataclass
│   ├── search_meta.py               # SearchMeta dataclass
│   └── search_results.py            # SearchResults dataclass
├── _params/                         # Query parameter dataclasses package
│   ├── __init__.py
│   ├── base_params.py               # Base parameter class
│   ├── coordinates_params.py        # CoordinatesParams
│   ├── geo_search_params.py         # GeoSearchParams
│   ├── images_params.py             # ImagesParams
│   ├── random_params.py             # RandomParams
│   ├── search_params.py             # SearchParams
│   └── protocols.py                 # Protocol constants
├── _pages_dict/                     # PagesDict and ImagesDict package
│   ├── __init__.py
│   ├── base_pages_dict.py           # Base PagesDict functionality
│   ├── pages_dict.py                # PagesDict (sync)
│   ├── async_pages_dict.py          # AsyncPagesDict
│   ├── images_dict.py               # ImagesDict (sync)
│   └── async_images_dict.py         # AsyncImagesDict
├── _enums/                          # Enums package
│   ├── __init__.py
│   ├── coordinate_type.py           # CoordinateType enum
│   ├── coordinates_prop.py          # CoordinatesProp enum
│   ├── direction.py                 # Direction enum
│   ├── geosearch_sort.py            # GeoSearchSort enum
│   ├── globe.py                     # Globe enum
│   ├── namespace.py                 # Namespace enum
│   ├── redirect_filter.py           # RedirectFilter enum
│   ├── search_info.py               # SearchInfo enum
│   ├── search_prop.py               # SearchProp enum
│   ├── search_qi_profile.py         # SearchQiProfile enum
│   ├── search_sort.py               # SearchSort enum
│   └── search_what.py               # SearchWhat enum
├── exceptions/                      # Exception classes package
│   ├── __init__.py
│   ├── wikipedia_exception.py       # Base exception
│   ├── wiki_connection_error.py     # Connection errors
│   ├── wiki_http_error.py           # HTTP errors
│   ├── wiki_http_timeout_error.py   # Timeout errors
│   ├── wiki_invalid_json_error.py   # JSON parsing errors
│   └── wiki_rate_limit_error.py     # Rate limiting errors
├── _wikipedia/                      # Concrete client package
│   ├── __init__.py
│   ├── wikipedia.py                 # Wikipedia (sync concrete client)
│   └── async_wikipedia.py           # AsyncWikipedia (async concrete client)
├── _page/                           # Page object package
│   ├── __init__.py
│   ├── _base_wikipedia_page.py      # BaseWikipediaPage (shared page state & methods)
│   ├── wikipedia_page.py            # WikipediaPage (lazy sync page object)
│   ├── async_wikipedia_page.py      # AsyncWikipediaPage (lazy async page object)
│   └── wikipedia_page_section.py    # WikipediaPageSection
├── _image/                          # Image/file page object package
│   ├── __init__.py
│   ├── _base_wikipedia_image.py     # BaseWikipediaImage (shared image state & methods)
│   ├── wikipedia_image.py           # WikipediaImage (lazy sync file page object)
│   └── async_wikipedia_image.py     # AsyncWikipediaImage (lazy async file page object)
├── extract_format.py                # ExtractFormat enum (WIKI / HTML)
└── namespace.py                     # Legacy namespace module (redirects to _enums.namespace)
The inheritance chains are:
BaseHTTPClient
├── SyncHTTPClient
└── AsyncHTTPClient

BaseWikipediaResource
├── WikipediaResource
└── AsyncWikipediaResource

BaseWikipediaPage
├── WikipediaPage
└── AsyncWikipediaPage

BaseWikipediaImage
├── WikipediaImage
└── AsyncWikipediaImage
Concrete clients compose one transport and one API mixin:
Wikipedia(WikipediaResource, SyncHTTPClient)
AsyncWikipedia(AsyncWikipediaResource, AsyncHTTPClient)
Page objects hold a back-reference to the client and call it lazily:
WikipediaPage(BaseWikipediaPage)      ──back-ref──▶  Wikipedia
AsyncWikipediaPage(BaseWikipediaPage) ──back-ref──▶  AsyncWikipedia
BaseWikipediaPage holds all state (_attributes, _called,
_section_mapping, …) and all code whose behaviour is identical
regardless of sync vs. async: ATTRIBUTES_MAPPING, __init__,
the language/variant/title/ns properties,
sections_by_title, and section_by_title.
The subclasses are responsible for the fundamentally different parts:
- `_fetch` – `def` in sync, `async def` in async.
- `_info_attr(name)` – sync helper returns the cached info attribute (fetching if needed); the async version is `async def`.
- `sections` property – sync auto-fetches; async requires an explicit `await page.summary` first.
- `exists()` – sync auto-fetches via `self.pageid`; async is a coroutine method that lazily fetches `pageid` via `info`. Invariant: when `exists()` returns `True`, `pageid` returns a positive integer; when `exists()` returns `False`, `pageid` returns a negative integer. Both values are deterministic based on `abs(hash(title))`.
- All data-fetching surface (`summary`, `langlinks`, `pageid`, …) – explicit `@property` in both; async properties return coroutines (`await page.summary`, `await page.pageid`, etc.).
- `WikipediaPage` also overrides `sections_by_title` to trigger an automatic `extracts` fetch (the base version is read-only from cache).
Transport layer:

    ┌──────────────────────────────┐
    │ BaseHTTPClient               │
    │  __init__(...)               │
    │  _get(lang, params)          │
    │  _check_and_correct_params() │
    └───────────────┬──────────────┘
                    │
           ┌────────┴──────────┐
           │                   │
    ┌──────┴──────┐     ┌──────┴──────┐
    │ SyncHTTP    │     │ AsyncHTTP   │
    │ Client      │     │ Client      │
    │             │     │             │
    │ _get()      │     │ _get()      │
    │ (sync)      │     │ (async)     │
    └─────────────┘     └─────────────┘

API layer:

    ┌──────────────────────────────┐
    │ BaseWikipediaResource        │
    │  _construct_params()         │
    │  _make_page()                │
    │  _common_attributes()        │
    │  _create_section()           │
    │  _build_extracts()           │
    │  _build_info()               │
    │  _build_langlinks()          │
    │  _build_links()              │
    │  _build_backlinks()          │
    │  _build_categories()         │
    │  _build_categorymembers()    │
    │  _process_prop_response()    │
    │  _dispatch_prop()            │
    │  _async_dispatch_prop()      │
    │  _dispatch_prop_paginated()  │
    │  _async_dispatch_prop_       │
    │    paginated()               │
    │  _dispatch_list()            │
    │  _async_dispatch_list()      │
    │  _dispatch_standalone_list() │
    │  _async_dispatch_            │
    │    standalone_list()         │
    │  _build_normalization_map()  │
    │  _extracts_params()          │
    │  _info_params()              │
    │  _langlinks_params()         │
    │  _links_params()             │
    │  _backlinks_params()         │
    │  _categories_params()        │
    │  _categorymembers_params()   │
    │  _coordinates_params()       │
    │  _images_params()            │
    │  _geosearch_params()         │
    │  _random_params()            │
    │  _search_params()            │
    └───────────────┬──────────────┘
                    │
           ┌────────┴────────────┐
           │                     │
    ┌──────┴───────────┐  ┌──────┴────────────┐
    │ Wikipedia        │  │ AsyncWikipedia    │
    │ Resource         │  │ Resource          │
    │                  │  │                   │
    │ page()           │  │ _make_page()      │
    │ article()        │  │ page()            │
    │ extracts()       │  │ article()         │
    │ info()           │  │ extracts()        │
    │ langlinks()      │  │ info()            │
    │ links()          │  │ langlinks()       │
    │ backlinks()      │  │ links()           │
    │ categories()     │  │ backlinks()       │
    │ categorymembers()│  │ categories()      │
    │ coordinates()    │  │ categorymembers() │
    │ images()         │  │ coordinates()     │
    │ geosearch()      │  │ images()          │
    │ random()         │  │ geosearch()       │
    │ search()         │  │ random()          │
    │ batch_           │  │ search()          │
    │   coordinates()  │  │ batch_            │
    │ batch_images()   │  │   coordinates()   │
    │                  │  │ batch_images()    │
    └──────────────────┘  └───────────────────┘

Concrete clients combine one box from each layer through multiple
inheritance (MRO), adding only __init__():

    Wikipedia(WikipediaResource, SyncHTTPClient)
    AsyncWikipedia(AsyncWikipediaResource, AsyncHTTPClient)
Page objects (share a common base; hold back-reference to their wiki instance):
   ┌──────────────────────────────────────────────────────────────┐
   │ BaseWikipediaPage                                            │
   │                                                              │
   │  ATTRIBUTES_MAPPING (class var)                              │
   │  __init__(wiki, title, ns, language, variant, url)           │
   │  language, variant, title, ns (properties, no fetch)         │
   │  sections_by_title(title) → list (reads cache)               │
   │  section_by_title(title) → opt (delegates to above)          │
   └────────────────┬──────────────────────────────┬──────────────┘
                    │                              │
   ┌────────────────┴────────┐    ┌────────────────┴─────────┐
   │ WikipediaPage           │    │ AsyncWikipediaPage       │
   │                         │    │                          │
   │ _fetch (def)            │    │ _fetch (async def)       │
   │ _info_attr(name)        │    │ _info_attr(name) (async) │
   │ sections_by_title       │    │ sections (property,      │
   │   (override: auto-      │    │   no auto-fetch)         │
   │   fetches extracts)     │    │ exists() (coroutine)     │
   │ sections (auto-fetch)   │    │ summary (await. prop)    │
   │ exists() (auto-fetch)   │    │ text (await. prop)       │
   │ summary (property)      │    │ langlinks (await. prop)  │
   │ text (property)         │    │ links (await. prop)      │
   │ langlinks (property)    │    │ backlinks (await. prop)  │
   │ links (property)        │    │ categories (await. prop) │
   │ backlinks (property)    │    │ categorymembers          │
   │ categories (property)   │    │   (awaitable prop)       │
   │ categorymembers (prop)  │    │ coordinates (await. prop)│
   │ coordinates (property)  │    │ images (await. prop)     │
   │ images (property)       │    │ geosearch_meta (property)│
   │ geosearch_meta (prop)   │    │ search_meta (property)   │
   │ search_meta (property)  │    │ pageid (await. prop)     │
   │ pageid (property)       │    │ fullurl (await. prop)    │
   │ fullurl (property)      │    │ displaytitle (await.)    │
   │ displaytitle (property) │    │ + 18 more info props     │
   │ + 18 more info props    │    │                          │
   │                         │    │ _wiki ─▶ AsyncWikipedia  │
   │ _wiki ─▶ Wikipedia      │    │          instance        │
   │          instance       │    │                          │
   └─────────────────────────┘    └──────────────────────────┘
_http_client/ package implements the HTTP transport layer with three classes.
`BaseHTTPClient` is the abstract base in base_http_client.py. It holds
shared configuration (language, variant, user-agent, extract format,
retry parameters, extra API params) and the _check_and_correct_params()
validator. It does not make HTTP requests directly.
`SyncHTTPClient` (sync_http_client.py) provides a blocking
_get(language, params) -> dict method backed by httpx.Client.
Retry logic uses tenacity with exponential backoff; Retry-After
headers are honoured for HTTP 429 responses.
`AsyncHTTPClient` (async_http_client.py) provides an
async def _get(language, params) -> dict coroutine backed by
httpx.AsyncClient. Retry logic mirrors SyncHTTPClient but uses
tenacity's AsyncRetrying.
Both clients construct the endpoint URL as:
https://{language}.wikipedia.org/w/api.php
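As a quick sanity check, the URL scheme can be expressed as a one-line helper (the function name is illustrative, not part of the library):

```python
def endpoint_url(language: str) -> str:
    # Mirrors the scheme above; the clients build this URL internally.
    return f"https://{language}.wikipedia.org/w/api.php"
```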
Additional utilities:
- `retry_utils.py` – common retry utilities and helpers
- `retry_after_wait.py` – Retry-After header handling logic
_resources/ package implements the API layer with three classes.
`BaseWikipediaResource` is a pure mixin in base_wikipedia_resource.py
with no HTTP transport. It contains:

- Parameter builders (`_*_params`) – each returns a `dict` ready to pass to the dispatcher.
- Response parsers (`_build_*`) – each accepts a raw API response fragment and a `WikipediaPage`, populates the page in-place, and returns the parsed value.
- Dispatch helpers – generic methods that call `self._get` (provided by the transport mixin), handle pagination, and delegate to a `_build_*` method. See Dispatch Helpers below.
`WikipediaResource` is a thin synchronous mixin in wikipedia_resource.py.
Each public API method (extracts, info, langlinks, links, backlinks,
categories, categorymembers) is a one-liner that delegates to the
appropriate sync dispatch helper:
def extracts(self, page, **kwargs):
    return self._dispatch_prop(
        page, self._extracts_params(page, **kwargs),
        "", self._build_extracts,
    )
`AsyncWikipediaResource` (async_wikipedia_resource.py) mirrors
WikipediaResource using the async dispatch helpers:
async def extracts(self, page, **kwargs):
    return await self._async_dispatch_prop(
        page, self._extracts_params(page, **kwargs),
        "", self._build_extracts,
    )
_make_page is overridden to return AsyncWikipediaPage instead
of WikipediaPage so that stub pages created during response parsing
are automatically async-capable.
Four dispatch patterns cover all current MediaWiki API query shapes. Each has a sync and an async variant.
| Helper | When to use | Pagination key |
|---|---|---|
| `_dispatch_prop` | Prop query, result fits in one response. Response: `raw["query"]["pages"]` | (none) |
| `_dispatch_prop_paginated` | Prop query, result may span pages. Accumulates `raw["query"]["pages"][page_id][list_key]` across pages. | `raw["continue"][continue_key]` |
| `_dispatch_list` | List query, result may span pages. Accumulates `raw["query"][list_key]` across pages. Requires a page object. | `raw["continue"][continue_key]` |
| `_dispatch_standalone_list` | List query that does not require a page object. Accumulates `raw["query"][list_key]` and returns the raw response. | `raw["continue"][continue_key]` |
Current mapping:
- `extracts`, `info`, `langlinks`, `categories` → `_dispatch_prop`
- `links` → `_dispatch_prop_paginated` (cursor: `plcontinue`, list key: `links`)
- `backlinks` → `_dispatch_list` (cursor: `blcontinue`, list key: `backlinks`)
- `categorymembers` → `_dispatch_list` (cursor: `cmcontinue`, list key: `categorymembers`)
- `coordinates` → custom per-page dispatch with per-parameter caching (cursor: `cocontinue`, uses `_dispatch_prop_paginated` internally)
- `images` → custom per-page dispatch with per-parameter caching (cursor: `imcontinue`)
- `geosearch` → single `_get` call (no pagination)
- `random` → single `_get` call (no pagination)
- `search` → single `_get` call (no pagination)
Warning
geosearch, random, and search deliberately bypass
_dispatch_standalone_list and make a single API request.
The caller's limit parameter already tells the MediaWiki API
how many results to return. Using the paginating dispatcher would
cause an infinite loop for random (the API always offers more
random pages) and near-infinite loops for search and
geosearch (broad queries can match thousands of pages).
Only use _dispatch_standalone_list for list queries where
exhaustive fetching is the desired behaviour.
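The paginating helpers all follow the same loop shape. The sketch below is illustrative of the spirit of `_dispatch_list`, with a fake transport standing in for `self._get`; it is not the library's real implementation:

```python
from typing import Any, Callable


def dispatch_list_sketch(
    get: Callable[[dict[str, Any]], dict[str, Any]],
    params: dict[str, Any],
    continue_key: str,
    list_key: str,
) -> list[dict[str, Any]]:
    """Repeat the request, feeding the continuation cursor back into the
    params dict, until the API stops returning a ``continue`` block."""
    results: list[dict[str, Any]] = []
    while True:
        raw = get(params)
        results.extend(raw.get("query", {}).get(list_key, []))
        cont = raw.get("continue")
        if not cont or continue_key not in cont:
            return results
        # Params dict is mutated in place, matching the documented invariant.
        params[continue_key] = cont[continue_key]


# Fake transport that serves two response pages of backlinks.
responses = iter([
    {"query": {"backlinks": [{"title": "A"}]},
     "continue": {"blcontinue": "1|x"}},
    {"query": {"backlinks": [{"title": "B"}]}},
])
out = dispatch_list_sketch(
    lambda p: next(responses), {"list": "backlinks"}, "blcontinue", "backlinks"
)
```

This loop shape is also why `random` must not use it: the API never stops offering a `continue` cursor for random pages.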
user code: page.summary
    │
    ▼
WikipediaPage.summary (property, checks _summary cache)
    │
    ▼
WikipediaPage._fetch("extracts")
    │
    ▼
Wikipedia.extracts(page)            ◀── from WikipediaResource
    │
    ▼
BaseWikipediaResource._dispatch_prop(
    page, params, empty="", builder=_build_extracts)
    │
    ├──▶ _construct_params(page, params) → merged dict
    │
    ├──▶ SyncHTTPClient._get(language, merged_params)
    │      │
    │      ├──▶ httpx.Client.get(url, params=…)
    │      └──▶ tenacity retry loop (429 / 5xx / timeout)
    │             → raw JSON dict
    │
    └──▶ _process_prop_response(raw, page, empty, builder)
           │
           └──▶ _build_extracts(extract, page)
                  │
                  ├──▶ populate page._summary
                  ├──▶ populate page._section_mapping
                  └──▶ return page._summary
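From the caller's side, the whole chain collapses into "first access fetches, later accesses hit the cache". A toy model of that contract (this class is a stand-in for illustration, not the real `WikipediaPage`):

```python
from typing import Optional


class LazyPageSketch:
    """Toy stand-in for WikipediaPage's lazy properties."""

    def __init__(self) -> None:
        self._summary: Optional[str] = None
        self._called = {"extracts": False}
        self.fetch_count = 0  # counts simulated network round-trips

    def _fetch(self, call: str) -> None:
        # Stands in for Wikipedia.extracts(page) reaching the network.
        self.fetch_count += 1
        self._summary = "Python is a programming language."
        self._called[call] = True

    @property
    def summary(self) -> str:
        # First access triggers the fetch; later accesses hit the cache.
        if not self._called["extracts"]:
            self._fetch("extracts")
        return self._summary


page = LazyPageSketch()
first = page.summary   # triggers the (simulated) extracts fetch
second = page.summary  # served from page._summary, no second fetch
```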
user code: await page.summary
    │
    ▼
AsyncWikipediaPage.summary (explicit @property, returns coroutine)
    │
    ▼
AsyncWikipediaPage._fetch (async, called inside the coroutine)
    │
    ▼
AsyncWikipedia.extracts(page)       ◀── from AsyncWikipediaResource
    │
    ▼
BaseWikipediaResource._async_dispatch_prop(
    page, params, empty="", builder=_build_extracts)
    │
    ├──▶ _construct_params(page, params) → merged dict
    │
    ├──▶ await AsyncHTTPClient._get(language, merged_params)
    │      │
    │      ├──▶ await httpx.AsyncClient.get(url, params=…)
    │      └──▶ tenacity AsyncRetrying loop
    │             → raw JSON dict
    │
    └──▶ _process_prop_response(raw, page, empty, builder)
           │
           └──▶ _build_extracts(extract, page)
                  └──▶ return page._summary
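The "property that returns a coroutine" pattern at the top of this chain can be modelled in isolation (the class below is a stand-in for illustration, not the real `AsyncWikipediaPage`):

```python
import asyncio


class AsyncLazyPageSketch:
    """Toy stand-in for AsyncWikipediaPage: the property returns a coroutine."""

    def __init__(self) -> None:
        self._summary = None
        self._called = {"extracts": False}

    async def _fetch(self, call: str) -> None:
        await asyncio.sleep(0)  # stands in for the awaited HTTP round-trip
        self._summary = "Python is a programming language."
        self._called[call] = True

    @property
    def summary(self):
        # The nested coroutine is what the caller awaits: `await page.summary`.
        async def _get() -> str:
            if not self._called["extracts"]:
                await self._fetch("extracts")
            return self._summary

        return _get()


async def main() -> str:
    page = AsyncLazyPageSketch()
    return await page.summary


result = asyncio.run(main())
```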
This section walks through a complete example: adding support for the
templates prop, which returns a list of templates used on a page.
MediaWiki reference: https://www.mediawiki.org/w/api.php?action=help&modules=query%2Btemplates
Inspect the API response structure:
- Single-fetch prop (result in `raw["query"]["pages"]`, no `continue` key expected in practice) → `_dispatch_prop`.
- Paginated prop (`continue` key uses a `*continue` cursor, data nested under `raw["query"]["pages"][id][list_key]`) → `_dispatch_prop_paginated`.
- List query (`action=query&list=…`, data under `raw["query"][list_key]`) → `_dispatch_list`.
`templates` uses `prop=templates`, may paginate with `tlcontinue`, and
stores results under `raw["query"]["pages"][id]["templates"]`.
→ Use `_dispatch_prop_paginated`.
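Concretely, the dispatcher will stitch together responses of this shape (the values below are illustrative, not real API output):

```python
# A non-final response page carries a "continue" block with the cursor.
first_response = {
    "query": {
        "pages": {
            "12345": {
                "pageid": 12345,
                "ns": 0,
                "title": "Example",
                "templates": [
                    {"ns": 10, "title": "Template:Citation needed"},
                ],
            }
        }
    },
    "continue": {"tlcontinue": "12345|10|Infobox", "continue": "||"},
}

# The final response page has no "continue" block: pagination stops here.
last_response = {
    "query": {
        "pages": {
            "12345": {
                "pageid": 12345,
                "ns": 0,
                "title": "Example",
                "templates": [
                    {"ns": 10, "title": "Template:Infobox"},
                ],
            }
        }
    }
}
```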
In _page/_base_wikipedia_page.py, add a cache slot in
BaseWikipediaPage.__init__:
self._templates: dict[str, Any] = {}
In BaseWikipediaResource (_resources/base_wikipedia_resource.py), add:
def _templates_params(self, page: WikipediaPage) -> dict[str, Any]:
    """
    Build params for the ``templates`` prop query.

    Requests up to 500 templates per API response page. Pagination
    is handled automatically by :meth:`_dispatch_prop_paginated`
    using the ``tlcontinue`` cursor.

    :param page: source page (provides ``title``)
    :return: base params dict; merge kwargs at the call site
    """
    return {
        "action": "query",
        "prop": "templates",
        "titles": page.title,
        "tllimit": 500,
    }
In BaseWikipediaResource (_resources/base_wikipedia_resource.py), add:
def _build_templates(
    self, extract: Any, page: WikipediaPage
) -> PagesDict:
    """
    Build the templates map from a ``templates`` API response.

    :param extract: single page entry from ``raw["query"]["pages"]``
    :param page: page object whose ``_templates`` dict is replaced
    :return: ``{title: WikipediaPage}`` mapping
    """
    page._templates = {}
    self._common_attributes(extract, page)
    for tpl in extract.get("templates", []):
        page._templates[tpl["title"]] = self._make_page(
            title=tpl["title"],
            ns=int(tpl["ns"]),
            language=page.language,
            variant=page.variant,
        )
    return page._templates
In WikipediaResource (_resources/wikipedia_resource.py), add:

def templates(
    self, page: WikipediaPage, **kwargs: Any
) -> PagesDict:
    """
    Fetch all templates used on a page, keyed by title.

    Follows API pagination automatically (``tlcontinue`` cursor).

    :param page: source page
    :param kwargs: extra API parameters forwarded verbatim
    :return: ``{title: WikipediaPage}``; ``{}`` if page missing
    :raises WikiHttpTimeoutError: if the request times out
    :raises WikiConnectionError: if a connection cannot be established
    :raises WikiRateLimitError: if the API returns HTTP 429
    :raises WikiHttpError: if the API returns a non-success HTTP status
    :raises WikiInvalidJsonError: if the response is not valid JSON
    """
    return self._dispatch_prop_paginated(
        page,
        {**self._templates_params(page), **kwargs},
        "tlcontinue",
        "templates",
        self._build_templates,
    )
In AsyncWikipediaResource (_resources/async_wikipedia_resource.py), add:

async def templates(
    self, page: WikipediaPage, **kwargs: Any
) -> PagesDict:
    """
    Async version of :meth:`WikipediaResource.templates`.
    """
    return await self._async_dispatch_prop_paginated(
        page,
        {**self._templates_params(page), **kwargs},
        "tlcontinue",
        "templates",
        self._build_templates,
    )
In _page/wikipedia_page.py:
@property
def templates(self) -> PagesDict:
    """Returns templates used on this page."""
    if not self._called["templates"]:
        self._fetch("templates")
    return self._templates
In _page/async_wikipedia_page.py, the @property returns a coroutine
created by a nested async def; callers do await page.templates:
@property
def templates(self) -> Any:
    """Awaitable: returns templates used on this page."""
    async def _get() -> PagesDict:
        if not self._called["templates"]:
            await self._fetch("templates")
        return self._templates
    return _get()
Add mock data to tests/mock_data.py:
"Template:A": {
    "query": {
        "pages": {
            "1": {
                "pageid": 1,
                "ns": 0,
                "title": "Template:A",
                "templates": [
                    {"ns": 10, "title": "Template:A"},
                ],
            }
        }
    }
},
Add a test file tests/templates_test.py:
import unittest
from unittest.mock import patch

import wikipediaapi
from tests.mock_data import mock_data


class TestTemplates(unittest.TestCase):
    def setUp(self):
        self.wiki = wikipediaapi.Wikipedia(
            user_agent="test", language="en"
        )

    def _mock_get(self, language, params):
        return mock_data[params["titles"]]

    def test_templates(self):
        with patch.object(self.wiki, "_get", side_effect=self._mock_get):
            page = self.wiki.page("Template:A")
            templates = self.wiki.templates(page)
            self.assertIn("Template:A", templates)
The following invariants hold throughout the codebase and must be preserved when adding new functionality.
Parameter builders (``_*_params``)
- Always return a plain `dict[str, Any]`.
- Never call `_construct_params` – dispatchers do that.
- Never mutate the page object.
- For props: include `"action": "query"` and `"prop": "<name>"`.
- For lists: include `"action": "query"` and `"list": "<name>"`.
Response parsers (``_build_*``)
- Accept `(extract: Any, page: WikipediaPage)` as the first two positional arguments.
- Reset the relevant cache attribute (`page._links = {}`, etc.) before populating it.
- Call `_common_attributes(extract, page)` to copy standard fields.
- Always return the populated cache attribute.
- Use `_make_page()` to create stub child pages so that the correct page type (`WikipediaPage` vs. `AsyncWikipediaPage`) is produced automatically.
Dispatch helpers
- `_dispatch_prop` / `_async_dispatch_prop` – for props where the full result fits in one API response.
- `_dispatch_prop_paginated` / `_async_dispatch_prop_paginated` – for props that may paginate. The params dict is mutated in place to add the continuation key on each subsequent request.
- `_dispatch_list` / `_async_dispatch_list` – for `list=` queries that may paginate. Requires a page object for language context.
- `_dispatch_standalone_list` / `_async_dispatch_standalone_list` – for `list=` queries that are not tied to a specific page (e.g. `geosearch`, `random`, `search`). These accept a `language` string instead of a page object and return the raw merged response.
Public API methods
- Sync methods in `WikipediaResource` must never use `await`.
- Async methods in `AsyncWikipediaResource` must always be defined with `async def` and use `await`.
- Both sync and async methods must share the same `_*_params` and `_build_*` implementations without duplication.
- All raises must be documented in the docstring.
Typed data (``_types/`` package)
- `coordinate.py` – `Coordinate` frozen `@dataclass` value object
- `geo_point.py` – `GeoPoint` frozen `@dataclass` value object
- `geo_box.py` – `GeoBox` frozen `@dataclass` value object
- `geo_search_meta.py` – `GeoSearchMeta` frozen `@dataclass` value object
- `search_meta.py` – `SearchMeta` frozen `@dataclass` value object
- `search_results.py` – `SearchResults` wrapper around `PagesDict`
Parameter dataclasses (``_params/`` package)
Each query submodule has a frozen `@dataclass` (e.g. `CoordinatesParams`,
`ImagesParams`) that maps clean Python names to MediaWiki API parameter
names with a configurable prefix.

- Pipe-separated MediaWiki parameters (for example `prop`, `info`, and `images`) are exposed as iterable-only inputs in the Python API. They are normalized to `"|"`-joined strings in `__post_init__` before API serialization.
- The `to_api()` method returns the `dict[str, str]` ready for the API call; `cache_key()` returns a hashable tuple for per-parameter caching.
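A sketch of the pattern, using a hypothetical `TemplatesParamsSketch` (not one of the real `_params` classes; normalization is shown in `to_api()` here rather than `__post_init__` to keep the example short):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplatesParamsSketch:
    """Hypothetical params dataclass in the style of the _params package."""

    limit: int = 500
    namespaces: tuple[int, ...] = ()

    def to_api(self) -> dict[str, str]:
        # Clean Python names map to "tl"-prefixed MediaWiki names.
        params = {"tllimit": str(self.limit)}
        if self.namespaces:
            # Iterable input normalized to a "|"-joined string.
            params["tlnamespace"] = "|".join(str(ns) for ns in self.namespaces)
        return params

    def cache_key(self) -> tuple:
        # Hashable tuple usable as a per-parameter cache key.
        return (self.limit, self.namespaces)


p = TemplatesParamsSketch(limit=100, namespaces=(0, 10))
```

Freezing the dataclass is what makes `cache_key()` safe: the key cannot drift after the instance is used to build a request.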
Enums (``_enums/`` package)
Strongly-typed enums for API parameters:
* coordinate_type.py β CoordinateType enum for coordinate filtering
* coordinates_prop.py β CoordinatesProp enum for coordinate properties
* direction.py β Direction enum for sort direction
* geosearch_sort.py β GeoSearchSort enum for geographic search sorting
* globe.py β Globe enum for celestial bodies
* namespace.py β Namespace enum for MediaWiki namespaces
* redirect_filter.py β RedirectFilter enum for redirect filtering
* search_info.py β SearchInfo enum for search metadata
* search_prop.py β SearchProp enum for search properties
* search_qi_profile.py β SearchQiProfile enum for query-independent ranking
* search_sort.py β SearchSort enum for search sorting
* search_what.py β SearchWhat enum for search type
Exceptions (``exceptions/`` package)
- `wikipedia_exception.py` – `WikipediaException` base exception
- `wiki_connection_error.py` – `WikiConnectionError` for connection failures
- `wiki_http_error.py` – `WikiHttpError` for HTTP errors
- `wiki_http_timeout_error.py` – `WikiHttpTimeoutError` for timeouts
- `wiki_invalid_json_error.py` – `WikiInvalidJsonError` for JSON parsing errors
- `wiki_rate_limit_error.py` – `WikiRateLimitError` for rate limiting
Per-parameter caching
- `coordinates` and `images` support different parameter sets per page. Results are cached in `page._param_cache[name][cache_key]` via `_get_cached` / `_set_cached` on `BaseWikipediaPage`.
- The `NOT_CACHED` sentinel (a singleton `_Sentinel` instance) distinguishes "never fetched" from "fetched, result is `None`".
- Page-level properties (`page.coordinates`, `page.images`) use default parameters; calling `wiki.coordinates(page, primary="all")` caches under a separate key.
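The sentinel trick in isolation, as a minimal sketch (the class here is an illustrative stand-in, not the library's `_Sentinel`):

```python
class _SentinelSketch:
    """Illustrative stand-in for the NOT_CACHED singleton."""

    def __repr__(self) -> str:
        return "<NOT_CACHED>"


NOT_CACHED = _SentinelSketch()

# Simplified model of page._param_cache[name][cache_key].
param_cache: dict[tuple, object] = {}


def get_cached(key: tuple) -> object:
    # An `is` check against the singleton distinguishes "never fetched"
    # from a legitimately cached None result.
    return param_cache.get(key, NOT_CACHED)


param_cache[("coordinates", ())] = None   # fetched; the page has no coordinates
missing = get_cached(("coordinates", ("primary", "all")))  # never fetched
hit = get_cached(("coordinates", ()))                      # cached None
```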
Batch methods
- `batch_coordinates(pages)` and `batch_images(pages)` send multi-title API requests (up to 50 titles per request) and distribute results to each page's per-parameter cache.
- `PagesDict.coordinates()` and `PagesDict.images()` are convenience methods that delegate to the batch methods on the wiki client.
- Batch methods use `_build_normalization_map(raw)` to handle MediaWiki title normalization (e.g. `Test_1` → `Test 1`).
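The 50-title batching can be sketched as a small helper (the function name is illustrative, not the library's internal code):

```python
def chunk_titles(titles: list[str], size: int = 50) -> list[list[str]]:
    """Split titles into MediaWiki-sized batches (at most `size` per request)."""
    return [titles[i : i + size] for i in range(0, len(titles), size)]


batches = chunk_titles([f"Page {i}" for i in range(120)])

# Each batch would be sent as a single pipe-joined `titles` parameter:
titles_param = "|".join(batches[2])
```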
Page objects
- A page is created lazily via `wiki.page(title)` – no network call at construction time.
- Properties cache their result in a `_<name>` attribute; the first access triggers the API call, subsequent accesses return the cached value.
- `WikipediaPage._fetch(call)` calls `getattr(self.wiki, call)(self)` and marks `_called[call] = True`; the matching async version `AsyncWikipediaPage._fetch(call)` does the same with `await`.
- `geosearch_meta` and `search_meta` are plain `@property` in both sync and async – they are set by `geosearch()` / `search()` on the wiki client and require no network call on the page itself.
The CLI provides a command-line tool for querying Wikipedia using Wikipedia-API. It is organized into a modular structure for better maintainability.
Architecture
The CLI is split into a main entry point and functional command modules:
wikipediaapi/
├── cli.py                   # Main CLI entry point (54 lines)
└── commands/                # CLI command modules
    ├── __init__.py
    ├── base.py              # Shared utilities and common options
    ├── page_commands.py     # Page content commands
    ├── link_commands.py     # Link-related commands
    ├── category_commands.py # Category commands
    ├── geo_commands.py      # Geographic commands
    └── search_commands.py   # Search and discovery commands
Main Entry Point (``cli.py``)
- Sets up the Click command group with version and help options
- Imports and registers all command modules
- Provides the `main()` function for the console script entry point
- Reduced from 1481 lines to 54 lines for better maintainability
Base Module (``commands/base.py``)
- Contains shared utilities: TypedDict classes, enum validators, formatters
- Defines common Click options used across all commands
- Provides helper functions for Wikipedia instance creation and page fetching
- Centralizes formatting functions for consistent output
Command Modules
Each command module groups related functionality:
- `page_commands.py` – `summary`, `text`, `sections`, `section`, `page`
- `link_commands.py` – `links`, `backlinks`, `langlinks`
- `category_commands.py` – `categories`, `categorymembers`
- `geo_commands.py` – `coordinates`, `images`, `geosearch`
- `search_commands.py` – `search`, `random`
Command Pattern
Each command module follows this pattern:
- Business logic functions – pure functions that handle Wikipedia API calls
- Formatting functions – convert results to text/JSON output
- Click command decorators – define the CLI interface with options and arguments
- Register function – registers commands with the main CLI group
Benefits of Modular Structure
- Maintainable file sizes – each module is 150-430 lines vs. one 1481-line file
- Logical organization – related commands are grouped together
- Easier development – changes to specific functionality are isolated to the relevant module
- Better testing – command modules can be tested independently
- Full backward compatibility – all CLI commands work identically to before
Usage Examples
The CLI supports all original commands with identical interfaces:
wikipedia-api summary "Python (programming language)"
wikipedia-api links "Python (programming language)" --language cs
wikipedia-api categories "Python (programming language)" --json
wikipedia-api coordinates "Mount Everest"
wikipedia-api geosearch --coord "51.5074|-0.1278"
wikipedia-api search "Python programming"
Adding New Commands
To add a new CLI command:
- Choose the appropriate command module based on functionality
- Add business logic function (following existing patterns)
- Add formatting function for output
- Add Click command with proper options and documentation
- Register the command in the module's `register_commands()` function
The modular structure makes it easy to extend the CLI while maintaining clean organization.
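The four-part command pattern can be sketched without Click, to show the shape each module follows (function and registry names below are illustrative; the real modules use `@click.command()` decorators and register against the Click group):

```python
def summary_logic(title: str) -> str:
    """Business logic: the real module would call the Wikipedia API here."""
    return f"summary of {title}"


def format_text(result: str) -> str:
    """Formatting function: convert the result to text output."""
    return result.upper()


COMMANDS: dict[str, object] = {}


def register_commands(registry: dict) -> None:
    """Each command module exposes one of these; cli.py calls them all."""
    registry["summary"] = lambda title: format_text(summary_logic(title))


register_commands(COMMANDS)
out = COMMANDS["summary"]("Python")
```

Keeping business logic and formatting as plain functions, with registration as a separate step, is what makes the modules testable without invoking the CLI.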