Guide for AI agents (and developers) on how to install, build, and test the Wikipedia-API project.
CRITICAL: Always read the design and API documentation first
Before making any changes, read these two documents to understand the architecture, class hierarchy, and public API:
- DESIGN.rst: internal architecture, class hierarchy, request lifecycle, dispatch helpers, and a step-by-step guide for adding new API calls.
- API.rst: public API reference covering every method, property, and attribute available on Wikipedia, AsyncWikipedia, WikipediaPage, AsyncWikipediaPage, WikipediaPageSection, and the CLI.
Skipping this step risks duplicating existing logic, violating established conventions, or breaking the sync/async symmetry.
CRITICAL: The sync and async APIs MUST stay in perfect symmetry.
WikipediaPage and AsyncWikipediaPage are parallel classes.
Every public attribute or method on one must have the same kind of
interface on the other:
| If WikipediaPage has ... | then AsyncWikipediaPage MUST have ... |
|---|---|
| @property foo | awaitable property await page.foo (explicit @property returning a coroutine) |
| plain method foo() | coroutine method await page.foo() |
| plain @property (no fetch) | plain @property (no fetch) |
Never convert a property to a method (or vice versa) in one class without making the matching change in the other. Violations break the documented API contract and confuse callers who switch between the two clients.
Examples of correct symmetry currently in place:
- page.summary, page.text, page.langlinks, page.links, page.backlinks, page.categories, page.categorymembers: @property in sync; explicit @property returning a coroutine in async.
- page.pageid, page.fullurl, page.displaytitle, and all other info attributes: same pattern, @property in sync and awaitable @property in async.
- page.exists(): plain method in sync; coroutine method await page.exists() in async (both use call syntax ()).
- page.sections, page.title, page.ns, page.language, page.variant: plain @property in both (no awaiting needed).
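
As a minimal sketch of the symmetry rule, the two classes below show the three rows of the table side by side: a fetching property, a method, and a plain property. SyncPage, AsyncPage, and the _fetch helper are hypothetical stand-ins invented for illustration; they are not the library's real classes.

```python
import asyncio
from collections.abc import Awaitable


class SyncPage:
    """Hypothetical stand-in for a synchronous page class."""

    def __init__(self, title: str) -> None:
        self._title = title

    @property
    def title(self) -> str:
        # Plain property: no fetch, identical in both classes.
        return self._title

    @property
    def summary(self) -> str:
        # Fetching property: returns the value directly.
        return self._fetch("summary")

    def exists(self) -> bool:
        # Plain method: call syntax in both classes.
        return bool(self._fetch("summary"))

    def _fetch(self, field: str) -> str:
        # Placeholder for a blocking network call.
        return f"{field} of {self._title}"


class AsyncPage:
    """Hypothetical stand-in for the asynchronous counterpart."""

    def __init__(self, title: str) -> None:
        self._title = title

    @property
    def title(self) -> str:
        return self._title

    @property
    def summary(self) -> Awaitable[str]:
        # Still a property, but the value is a coroutine to be awaited.
        return self._fetch("summary")

    async def exists(self) -> bool:
        # Coroutine method: awaited with call syntax.
        return bool(await self._fetch("summary"))

    async def _fetch(self, field: str) -> str:
        await asyncio.sleep(0)  # placeholder for non-blocking I/O
        return f"{field} of {self._title}"


async def demo() -> tuple[str, bool, str]:
    page = AsyncPage("Python")
    summary = await page.summary  # property, no parentheses
    exists = await page.exists()  # method, with parentheses
    return summary, exists, page.title
```

A caller switching clients only adds await; the attribute names and the property-versus-method shape stay the same.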
Prefer explicit type annotations and minimize Any.
When writing or updating Python code in this repository:
- Use inline type annotations directly on variables, attributes, parameters, and return values (e.g. value: dict[str, int] = {}), instead of legacy # type: comments.
- Avoid Any whenever a more specific type can be expressed.
- Use Any only when it is absolutely necessary (for example, dynamic external payloads or framework boundaries where precise typing is not practical).
- If Any is required, keep its scope as small as possible and prefer typed wrappers/conversions at the boundary.
- Validate typing-related changes by running make run-pre-commit before submitting.
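
One way to keep Any confined to a boundary is sketched below. The PageInfo dataclass, the parse_page_info function, and the payload shape are all assumptions made for this example; they do not describe the project's actual types.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class PageInfo:
    """Typed view of the fields this sketch cares about (hypothetical)."""

    pageid: int
    title: str
    langlinks: dict[str, str]


def parse_page_info(raw: dict[str, Any]) -> PageInfo:
    """Convert an untyped payload into a typed object at the boundary."""
    # Any appears only in this function's parameter; everything built
    # from it is converted to a precise type before it leaves.
    langlinks: dict[str, str] = {
        str(lang): str(url) for lang, url in raw.get("langlinks", {}).items()
    }
    return PageInfo(
        pageid=int(raw["pageid"]),
        title=str(raw["title"]),
        langlinks=langlinks,
    )


def count_languages(info: PageInfo) -> int:
    # Downstream code only ever sees precise types, never Any.
    return len(info.langlinks)
```

With this shape, a type checker can verify every caller of count_languages without special cases for the raw payload.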
Write descriptive docstrings with consistent structure.
All functions, methods, classes, and modules must have descriptive docstrings that follow this structure:

    def example_function(param1: str, param2: int) -> bool:
        """One-line summary of the function's purpose.

        Detailed description of the function's behavior, including important
        implementation details, usage patterns, or context.

        Args:
            param1: Description of the first parameter, including expected format
                and constraints.
            param2: Description of the second parameter, including valid ranges
                or special values.

        Returns:
            Description of the return value, including its type and meaning.
            Include possible return values and their significance.

        Raises:
            ExceptionType: Description of when this exception is raised,
                including the conditions that trigger it.
            AnotherException: Description of when this exception occurs.

        Invariants:
            - Any conditions that remain true before and after execution
            - State guarantees that the function maintains
            - Thread safety considerations if applicable
        """

- One-line summary: Must be a complete sentence ending with a period that concisely describes what the function does.
- Detailed description: Expand on the summary with implementation details, usage examples, or important context.
- Parameters section (Args): Document all parameters with:
  - Parameter name (matching the function signature)
  - Description including expected format, constraints, and valid values
  - Use proper indentation and formatting
- Return values section (Returns): Document the return value with:
  - Type information (if not obvious from type hints)
  - Meaning and significance of different return values
  - Special cases or conditions
- Exceptions section (Raises): Document all exceptions that can be raised:
  - Exception type name
  - Conditions that trigger the exception
  - Any recovery strategies or expected handling
- Invariants section (Invariants): Document any guarantees:
  - Pre/post-conditions that remain true
  - State guarantees and thread safety
  - Side effects and their implications

- Properties: Use the same format but replace Args with appropriate sections
- Methods: Include self parameter documentation if relevant
- Classes: Include overall purpose, usage examples, and important attributes
- Modules: Describe the module's purpose and main exports

    def page_exists(self) -> bool:
        """Check if the Wikipedia page exists on the server.

        Performs a lightweight check to determine if the page exists without
        fetching the full page content. This is useful for validation before
        attempting expensive operations.

        Returns:
            True if the page exists and is accessible, False if the page
            does not exist or cannot be accessed.

        Raises:
            requests.exceptions.ConnectionError: If the network connection fails.
            requests.exceptions.Timeout: If the request times out.

        Invariants:
            - Does not modify the page's internal state
            - Safe to call multiple times without side effects
            - Thread-safe for concurrent access
        """

All docstrings must be checked during code review and should pass the pre-commit hooks without warnings.

- Python 3.10+ (supported: 3.10, 3.11, 3.12, 3.13, 3.14)
- uv (Python package manager and installer)
- Make
Install all dependencies (runtime, dev, docs, build):

    make requirements-all

Or install individual dependency groups:

- Runtime dependencies: make requirements (installs core dependencies)
- Dev dependencies: make requirements-dev (installs ruff, coverage, ty, pre-commit, tox, etc.)
- Doc dependencies: make requirements-doc (installs sphinx)
- Build dependencies: make requirements-build (installs rst2html, setuptools, wheel)

Note: uv automatically creates and manages a virtual environment in .venv/.
Build the source distribution package:

    make build-package

Generate PyPI HTML documentation preview:

    make pypi-html

Generate Sphinx HTML documentation:

    make html

Run the test suite:

    make run-tests

This command runs both the unit tests and the CLI verification tests.

- The unit tests are executed via uv run pytest tests/. All test files are in the tests/ directory and follow the *_test.py naming pattern.
- The CLI verification tests are run using ./tests/cli/test_cli.sh verify.

You can run the CLI tests independently.
- Verify CLI: make run-test-cli-verify
  - This runs the CLI tests to verify that the output matches the recorded snapshots. It uses ./tests/cli/test_cli.sh verify.
- Record CLI Snapshots: make run-test-cli-record
  - This command updates the CLI test snapshots with the current output. Use this when you have intentionally changed the CLI's output. It runs ./tests/cli/test_cli.sh record.

    make run-coverage

Produces a coverage report and coverage.xml for the wikipediaapi package using pytest and pytest-cov.

CRITICAL: Always maintain code coverage at or above 90%

Before submitting any changes, ensure that:

- Run coverage check: make run-coverage
- Verify coverage: All modules must have ≥90% coverage
- CLI module special attention: The wikipediaapi/cli.py module must maintain ≥90% coverage
- If coverage drops: Add appropriate tests to bring coverage back to at least 90%
- Coverage report: Check the output for any modules below 90% and address them

Current coverage targets:
- Overall project: ≥90%
- CLI module: ≥90% (currently 96%)
- Core modules: ≥95%
The coverage report will show:

    Name                   Stmts   Miss  Cover   Missing
    wikipediaapi/cli.py      289     11    96%   28, 38, 60-61, 367-376, 730

If any module shows coverage below 90%, you must:
- Identify the missing lines in the Missing column
- Write tests to cover the uncovered code paths
- Re-run coverage until all modules meet the 90% threshold
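
The first step above (finding modules under the threshold and their Missing lines) can be sketched as a small parser over the text report. This is an illustrative helper, not part of the project; modules_below is a hypothetical name, and the second row of the sample report is invented to show a failing module.

```python
import re

# Matches data rows such as "wikipediaapi/cli.py 289 11 96% 28, 38, 60-61".
# The header row does not match because "Stmts" is not a number.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<stmts>\d+)\s+(?P<miss>\d+)\s+"
    r"(?P<cover>\d+)%\s*(?P<missing>.*)$"
)


def modules_below(report: str, threshold: int = 90) -> dict[str, str]:
    """Return {module: missing lines} for rows whose coverage is below threshold."""
    failing: dict[str, str] = {}
    for line in report.splitlines():
        match = ROW.match(line.strip())
        if match is None or match["name"] == "TOTAL":
            continue  # header, separator, blank line, or summary row
        if int(match["cover"]) < threshold:
            failing[match["name"]] = match["missing"].strip()
    return failing


SAMPLE_REPORT = """\
Name                             Stmts   Miss  Cover   Missing
wikipediaapi/cli.py                289     11    96%   28, 38, 60-61, 367-376, 730
wikipediaapi/example_module.py     100     20    80%   5-24
"""

# Only the invented example_module.py row falls below 90%.
FAILING = modules_below(SAMPLE_REPORT)
```

The returned mapping points directly at the line ranges that still need tests.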
CRITICAL: All tests making HTTP requests must use tests/mock_data.py
When writing or updating tests that make HTTP requests to Wikipedia's API:
- Always use tests/mock_data.py: All HTTP request-based tests must import and use mock fixtures from tests/mock_data.py
- Never make real HTTP requests in unit tests: Tests should be deterministic and fast, not dependent on network connectivity
- Use existing mock fixtures: Import functions like create_mock_wikipedia(), create_mock_page(), and predefined API responses
- Add new mock data when needed: If testing new API endpoints, add appropriate mock responses to tests/mock_data.py
- Keep mock data consistent: Ensure mock responses match the actual Wikipedia API structure and format
Examples of tests that must use mock_data.py:
- All *_test.py files that test API attributes and methods (page.summary, page.text, wiki.search(), etc.)
- Integration tests for Wikipedia, AsyncWikipedia, WikipediaPage, AsyncWikipediaPage
- CLI tests that verify API command output
- Error handling tests for network-related failures
Tests that don't need mock_data.py:
- Pure unit tests for enums, converters, parameter validation
- Type checking and validation logic tests
- Internal algorithm tests that don't touch external APIs
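
The general idea of a network-free test is sketched below with the standard library's unittest.mock. Everything named here (Transport, TinyClient, MOCK_RESPONSE, make_mock_transport) is hypothetical and exists only for this illustration; real tests in this repository should use the fixtures from tests/mock_data.py, whose signatures are not shown in this guide.

```python
from typing import Protocol
from unittest.mock import MagicMock

# Canned payload standing in for a predefined API response.
# The shape is invented for this sketch.
MOCK_RESPONSE: dict[str, dict[str, str]] = {
    "Python": {"extract": "Python is a programming language."},
}


class Transport(Protocol):
    """Anything that can return a payload for a page title."""

    def get(self, title: str) -> dict[str, str]: ...


class TinyClient:
    """Hypothetical client whose only network access goes through a transport."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def summary(self, title: str) -> str:
        payload: dict[str, str] = self._transport.get(title)
        return payload.get("extract", "")


def make_mock_transport() -> MagicMock:
    """Build a test double that answers from MOCK_RESPONSE, never the network."""
    transport = MagicMock()
    transport.get.side_effect = lambda title: MOCK_RESPONSE.get(title, {})
    return transport


def test_summary_uses_mock_data() -> None:
    transport = make_mock_transport()
    client = TinyClient(transport)
    assert client.summary("Python") == "Python is a programming language."
    assert client.summary("Missing") == ""
    # The double records calls, so the test can check what was requested.
    transport.get.assert_any_call("Python")
```

Because the double is deterministic, the test gives the same result with or without network access.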
CRITICAL: All new public APIs must include VCR integration tests
When adding new public API methods or modifying existing ones that make HTTP requests:
- Always add VCR tests: Create integration tests in the tests/vcr_*_test.py files
- Record real API responses: Use --record-mode=once to record actual Wikipedia API responses
- Test both sync and async: Ensure both synchronous and asynchronous APIs have VCR coverage
- Cover all parameter combinations: Test different parameter values, enum options, and edge cases
- Use appropriate test files:
  - tests/vcr_wiki_client_sync_test.py / tests/vcr_wiki_client_async_test.py for Wikipedia / AsyncWikipedia methods
  - tests/vcr_page_sync_test.py / tests/vcr_page_async_test.py for WikipediaPage / AsyncWikipediaPage methods
  - tests/vcr_pages_dict_sync_test.py / tests/vcr_pages_dict_async_test.py for PagesDict / AsyncPagesDict methods

VCR Test Workflow:

    # Record new API responses (run once when adding tests)
    uv run pytest tests/vcr_wiki_client_sync_test.py::TestClassName::test_method --record-mode=once

    # Run integration tests using recorded cassettes
    make run-tests-integration

    # Run specific VCR test files
    uv run pytest tests/vcr_page_sync_test.py tests/vcr_page_async_test.py --record-mode=none -v

VCR Test Requirements:

- Real API validation: Tests must use actual Wikipedia API responses (not mocked data)
- Complete parameter coverage: Test all enum values, optional parameters, and edge cases
- Both sync/async variants: Maintain perfect symmetry between sync and async tests
- Deterministic results: Tests should be fast and reliable using recorded cassettes
- Proper cassette naming: Use descriptive test method names for clear cassette identification
Examples of APIs requiring VCR tests:
- New wiki.search() parameters or options
- New wiki.geosearch() functionality
- New page.coordinates() properties
- Any new method that makes HTTP requests to Wikipedia's API
- Modified parameter handling in existing HTTP-based methods
VCR integration tests complement unit tests by ensuring the library works correctly against real Wikipedia API responses while maintaining test speed and reliability.
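
To make the record-then-replay idea concrete, here is a conceptual sketch in plain Python. It is not the VCR tooling this project uses and does not reproduce its implementation; replay_or_record, CassetteMissError, and the JSON cassette format are invented for illustration. The "once" and "none" branches loosely follow how those record modes are generally documented for VCR-style tools: "once" records only while no cassette file exists, and "none" never records.

```python
import json
import tempfile
from collections.abc import Callable
from pathlib import Path


class CassetteMissError(RuntimeError):
    """Raised when replay is required but no recording exists for the request."""


def replay_or_record(
    cassette: Path,
    key: str,
    fetch: Callable[[str], str],
    record_mode: str = "none",
) -> str:
    """Return the recorded response for key, recording it first if allowed."""
    existed = cassette.exists()
    recorded: dict[str, str] = json.loads(cassette.read_text()) if existed else {}
    if key in recorded:
        return recorded[key]  # replay: fetch is never called
    # "once" may record only while the cassette file does not exist yet.
    if record_mode == "once" and not existed:
        recorded[key] = fetch(key)
        cassette.write_text(json.dumps(recorded))
        return recorded[key]
    raise CassetteMissError(f"no recording for {key!r} in {cassette.name}")


def demo() -> tuple[str, str, int, bool]:
    calls: list[str] = []

    def fake_fetch(key: str) -> str:
        calls.append(key)  # counts how often the "network" is hit
        return f"response for {key}"

    with tempfile.TemporaryDirectory() as tmp:
        cassette = Path(tmp) / "test_search.json"
        first = replay_or_record(cassette, "search:Python", fake_fetch, "once")
        second = replay_or_record(cassette, "search:Python", fake_fetch, "none")
        try:
            replay_or_record(cassette, "search:Java", fake_fetch, "once")
            new_request_rejected = False
        except CassetteMissError:
            new_request_rejected = True
    return first, second, len(calls), new_request_rejected
```

The second call is served from the cassette, which is why recorded tests stay fast and deterministic.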
CRITICAL: All pre-commit hooks must pass
Before submitting any changes, ensure that:
- Run pre-commit checks: make run-pre-commit
- All hooks must pass: No failures allowed
- Fix any issues: Address linting, formatting, type checking, and other violations
- Re-run until clean: Continue fixing and re-running until all checks pass
- Check project configuration: Verify line length limits and linter settings in pyproject.toml match project requirements
The pre-commit hooks include:
- ruff: Linting and import sorting (replaces flake8 + isort)
- ruff-format: Code formatting (replaces black, max 100 characters per line)
- ty: Type checking
- pyupgrade: Python syntax upgrades
- trailing whitespace: Whitespace cleanup
- YAML validation: YAML file checks
Common issues to fix:
- Remove unused imports (F401)
- Fix line length violations (E501: max 100 characters)
- Resolve type checking errors
- Fix undefined variables (F821)
- Avoid lambda assignments (E731)
- Fix redefinition errors (F811)
Project Configuration: Check pyproject.toml for:
- tool.ruff.line-length = 100
- Linter configurations in [tool.ruff.*] sections

    make run-tox

Runs the test suite against Python 3.10 through 3.14 via tox.

    make run-pre-commit

This runs ruff (lint + format), ty, pyupgrade, and other checks (trailing whitespace, YAML validation, etc.).

- Type checking: uv run ty check wikipediaapi/
- Linting & formatting: make run-ruff (runs ruff check and ruff format --check)

CRITICAL: Keep all documentation, examples, and tests in sync
After completing any change, go through this checklist before committing:
Ensure each of the following files accurately reflects the change:
- API.rst: add or update entries for any new or modified methods, properties, or attributes.
- DESIGN.rst: update the class hierarchy, file layout, diagrams, or step-by-step guide if the architecture changed.
- index.rst: update usage examples or feature descriptions if user-facing behaviour changed (note: README.rst is identical in content and must be kept in sync with index.rst).
- README.rst: mirror any changes made to index.rst.

- example_sync.py: add or update usage of any new or changed synchronous API (Wikipedia, WikipediaPage).
- example_async.py: add or update usage of any new or changed asynchronous API (AsyncWikipedia, AsyncWikipediaPage).

Both files serve as living documentation and must exercise every publicly available method and attribute.
Whenever the public API of Wikipedia or AsyncWikipedia changes
(new methods, renamed parameters, changed return types), the
command-line interface and its tests must be updated in lockstep:
- wikipediaapi/cli.py: add or update CLI commands and their helper functions to expose the new or changed API functionality.
- tests/cli/test_cli.sh: add new test entries to the TESTS array for every new command, then run make run-test-cli-record to generate the expected output fixtures.
- tests/cli_test.py: add or update unit tests for the CLI helper functions (e.g. get_*, format_*).
- CLI.rst: document new commands, options, and examples.

- Add or update unit tests in tests/ for every new or modified code path.
- For synchronous code: tests/wikipedia_page_test.py and related files.
- For asynchronous code: tests/async_wikipedia_page_test.py and related files.
- Add VCR integration tests: For any new public API methods that make HTTP requests, add corresponding tests in the appropriate tests/vcr_*_test.py files. Record real API responses using --record-mode=once and ensure both sync and async variants are covered.
- Keep tests/test_sync_async_symmetry.py up to date: When adding new properties or methods to either WikipediaPage or AsyncWikipediaPage, update the TestSyncAsyncPropertySymmetry class to include the new attributes in the appropriate property lists (construction_props, awaitable_props, collection_props, etc.). The test automatically discovers all public attributes using dir() and will alert you to any missing properties that need to be added to the test lists. This ensures sync/async symmetry is maintained for all API features.
- Run the full suite and coverage check (see the test and coverage commands above) before committing.
- Run make run-validate-attributes-mappping whenever you add or modify a property, method, or attribute on WikipediaPage, AsyncWikipediaPage, Wikipedia, or AsyncWikipedia. This validates that ATTRIBUTES_MAPPING in _base_wikipedia_page.py is in sync with the actual page properties. If the script fails, add the missing entries to ATTRIBUTES_MAPPING (use an empty list [] for properties that use _param_cache instead of _called).

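
The dir()-based discovery described for tests/test_sync_async_symmetry.py can be sketched as follows. This is a simplified illustration, not the contents of that test file: public_names, symmetry_gaps, kind_mismatches, and the two Demo classes are hypothetical names, and the Demo classes are deliberately out of sync so the checks have something to report.

```python
import inspect


def public_names(cls: type) -> set[str]:
    """Collect public attribute names the way dir()-based discovery would."""
    return {name for name in dir(cls) if not name.startswith("_")}


def is_property(cls: type, name: str) -> bool:
    # getattr_static returns the raw descriptor without invoking it.
    return isinstance(inspect.getattr_static(cls, name), property)


def symmetry_gaps(sync_cls: type, async_cls: type) -> dict[str, set[str]]:
    """Report public names present on one class but not the other."""
    sync_names = public_names(sync_cls)
    async_names = public_names(async_cls)
    return {
        "missing_in_async": sync_names - async_names,
        "missing_in_sync": async_names - sync_names,
    }


def kind_mismatches(sync_cls: type, async_cls: type) -> set[str]:
    """Report shared names that are a property on one side and not the other."""
    shared = public_names(sync_cls) & public_names(async_cls)
    return {
        name
        for name in shared
        if is_property(sync_cls, name) != is_property(async_cls, name)
    }


class DemoSyncPage:
    @property
    def summary(self) -> str:
        return "text"

    @property
    def title(self) -> str:
        return "T"

    def exists(self) -> bool:
        return True


class DemoAsyncPage:
    @property
    def summary(self) -> str:
        return "text"

    def title(self) -> str:  # wrong kind: a method instead of a property
        return "T"

    # exists() is deliberately omitted to show a detected gap.
```

A real symmetry test would assert that both reports are empty for the actual page classes.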
Run the full validation suite (pre-commit, type check, ruff, coverage, pypi-html, tox, example):

    make pre-release-check

CRITICAL: Always store script output to log files for analysis
When running scripts, especially for debugging or validation purposes:
- Always use uv run: Use uv run python script.py instead of .venv/bin/python script.py
- Always redirect output to a timestamped log file: Use 2>&1 | tee script_$(date +%Y%m%d_%H%M%S).log to capture both stdout and stderr
- Read and analyze the log file: Use the read_file tool to examine the complete output instead of multiple command executions
- Avoid repeated command executions: Don't use multiple grep, head, tail commands on the same script run; read the log once and analyze it
- Include timestamps for debugging: Add timestamps to script output when investigating timing issues
- Use structured logging: Format output in a consistent way for easy parsing
Example:

    # Good practice - single execution, complete capture with timestamp
    uv run python script.py 2>&1 | tee script_$(date +%Y%m%d_%H%M%S).log

    # Then analyze the complete output
    # (use read_file tool to examine the timestamped log file)

    # Bad practice - multiple executions
    .venv/bin/python script.py | grep "error"
    .venv/bin/python script.py | tail -10
    .venv/bin/python script.py | head -20

This approach ensures consistent analysis and avoids wasting time with repeated command executions.

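
When the capture has to happen from Python rather than a shell, the same single-run pattern can be sketched with subprocess. run_and_log is a hypothetical helper, and the demo runs an inline command through sys.executable only to stay self-contained; in this repository the command would instead be something like ["uv", "run", "python", "script.py"]. Unlike tee, this sketch does not echo output to the terminal.

```python
import subprocess
import sys
import tempfile
from datetime import datetime
from pathlib import Path


def run_and_log(command: list[str], log_dir: Path, prefix: str = "script") -> Path:
    """Run command once and write combined stdout/stderr to a timestamped log."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_path = log_dir / f"{prefix}_{stamp}.log"
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as 2>&1
        text=True,
        check=False,  # keep the log even when the script fails
    )
    log_path.write_text(result.stdout)
    return log_path


def demo() -> tuple[str, str]:
    """Run a tiny inline script and return the log file name and contents."""
    inline = "import sys; print('out'); print('err', file=sys.stderr)"
    with tempfile.TemporaryDirectory() as tmp:
        log_path = run_and_log([sys.executable, "-c", inline], Path(tmp))
        return log_path.name, log_path.read_text()
```

The log can then be read once and analyzed in full, instead of re-running the script with different filters.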
- wikipediaapi/: Main package (library code and the CLI module cli.py)
- tests/: Unit tests (*_test.py files, mock_data.py for test fixtures)
- pyproject.toml: Package metadata, dependencies, and build configuration
- Makefile: All build, test, and release commands
- tox.ini: Multi-Python test configuration
- .pre-commit-config.yaml: Pre-commit hook definitions