Commit 4e34a7c

docs(fix): "Set up incremental updates" shouldn't be nested under "Create a data support main server"
1 parent 7f3a681 commit 4e34a7c

2 files changed

Lines changed: 40 additions & 40 deletions

docs/deploy/servers/data-registry.rst

Lines changed: 2 additions & 2 deletions
@@ -13,8 +13,8 @@ Tinyproxy
 #. Update the allowed IP addresses in the ``pillar/tinyproxy.sls`` file.
 #. Deploy the ``docs`` service, when ready.
 
-Update Salt and halt jobs
-~~~~~~~~~~~~~~~~~~~~~~~~~
+Update Salt configuration and halt jobs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 #. Check that ``docker.uid`` in the server's Pillar file matches the entry in the ``/etc/passwd`` file for the ``docker.user`` (``deployer``).
 #. Change ``cron.present`` to ``cron.absent`` in the ``salt/registry/init.sls`` file.

docs/deploy/servers/data-support.rst

Lines changed: 38 additions & 38 deletions
@@ -1,6 +1,42 @@
 Data support
 ============
 
+Set up incremental updates
+--------------------------
+
+This creates a cron job to run a ``scrapy crawl`` command. The `DatabaseStore <https://kingfisher-collect.readthedocs.io/en/latest/contributing/extensions/database_store.html>`__ extension implements the incremental updates.
+
+#. `Choose a spider <https://kingfisher-collect.readthedocs.io/en/latest/spiders.html>`__ that collects the desired data. Prefer the spider that:
+
+   - Accepts a ``from_date`` spider argument, preferably at the same granularity as the cron schedule
+   - Is fastest: for example, ``_bulk``, instead of ``_api``
+   - Reduces processing: for example, a spider that yields compiled releases
+
+   If needed, improve the spider in `Kingfisher Collect <https://github.com/open-contracting/kingfisher-collect>`__.
+#. Add an entry to the ``python_apps.kingfisher_collect.crawls`` section of the ``pillar/kingfisher_main.sls`` file:
+
+   ``identifier``
+     An uppercase, underscore-separated name, like ``DOMINICAN_REPUBLIC``.
+   ``spider``
+     The spider's name, like ``dominican_republic_api``.
+   ``crawl_time``
+     The current date, like ``'2025-05-06'`` (though, any date works).
+   ``spider_arguments`` (optional)
+     Any `spider arguments <https://kingfisher-collect.readthedocs.io/en/latest/spiders.html#spider-arguments>`__.
+
+     If the spider doesn't yield compiled releases, add ``-a compile_releases=true``.
+   ``cardinal`` (optional)
+     ``True``, to enable a pipeline involving `Cardinal <https://cardinal.readthedocs.io/en/latest/>`__.
+   ``users`` (optional)
+     A list of additional :ref:`PostgreSQL users<pg-users>` that need read access to the database.
+   ``day`` (optional)
+     The day of the month on which to run the cron job.
+
+     Required if an incremental update takes longer than a day.
+
+#. If an *initial crawl* would take longer than a day, run the `scrapy crawl <https://github.com/open-contracting/deploy/blob/main/salt/kingfisher/collect/files/cron.sh>`__ command manually.
+#. :doc:`Deploy the server<../deploy>`.
+
 Create a data support main server
 ---------------------------------

@@ -19,8 +55,8 @@ Dependents
 
 #. Notify RBC Group of the new domain name for the new PostgreSQL server.
 
-Update Salt and halt jobs
-~~~~~~~~~~~~~~~~~~~~~~~~~
+Update Salt configuration and halt jobs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 #. Check that ``docker.uid`` in the server's Pillar file matches the entry in the ``/etc/passwd`` file for the ``docker.user`` (``deployer``).
 #. Change ``cron.present`` to ``cron.absent`` in the ``salt/pelican/backend/init.sls`` file.
@@ -63,42 +99,6 @@ Kingfisher Collect
 
 Once DNS has propagated, :ref:`update-spiders`.
 
-Set up incremental updates
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-This creates a cron job to run a ``scrapy crawl`` command. The `DatabaseStore <https://kingfisher-collect.readthedocs.io/en/latest/contributing/extensions/database_store.html>`__ extension implements the incremental updates.
-
-#. `Choose a spider <https://kingfisher-collect.readthedocs.io/en/latest/spiders.html>`__ that collects the desired data. Prefer the spider that:
-
-   - Accepts a ``from_date`` spider argument, preferably at the same granularity as the cron schedule
-   - Is fastest: for example, ``_bulk``, instead of ``_api``
-   - Reduces processing: for example, a spider that yields compiled releases
-
-   If needed, improve the spider in `Kingfisher Collect <https://github.com/open-contracting/kingfisher-collect>`__.
-#. Add an entry to the ``python_apps.kingfisher_collect.crawls`` section of the ``pillar/kingfisher_main.sls`` file:
-
-   ``identifier``
-     An uppercase, underscore-separated name, like ``DOMINICAN_REPUBLIC``.
-   ``spider``
-     The spider's name, like ``dominican_republic_api``.
-   ``crawl_time``
-     The current date, like ``'2025-05-06'`` (though, any date works).
-   ``spider_arguments`` (optional)
-     Any `spider arguments <https://kingfisher-collect.readthedocs.io/en/latest/spiders.html#spider-arguments>`__.
-
-     If the spider doesn't yield compiled releases, add ``-a compile_releases=true``.
-   ``cardinal`` (optional)
-     ``True``, to enable a pipeline involving `Cardinal <https://cardinal.readthedocs.io/en/latest/>`__.
-   ``users`` (optional)
-     A list of additional :ref:`PostgreSQL users<pg-users>` that need read access to the database.
-   ``day`` (optional)
-     The day of the month on which to run the cron job.
-
-     Required if an incremental update takes longer than a day.
-
-#. If an *initial crawl* would take longer than a day, run the `scrapy crawl <https://github.com/open-contracting/deploy/blob/main/salt/kingfisher/collect/files/cron.sh>`__ command manually.
-#. :doc:`Deploy the server<../deploy>`.
-
 Copy incremental data
 ^^^^^^^^^^^^^^^^^^^^^

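For orientation, the ``crawls`` entry described in the relocated "Set up incremental updates" section might look roughly like the fragment below. This is a sketch, not a copy of the real Pillar file: it assumes that ``crawls`` is a list of mappings nested under ``python_apps.kingfisher_collect``, and the ``users`` and ``day`` values are hypothetical placeholders. The other values are the examples given in the documentation text. Check the existing entries in ``pillar/kingfisher_main.sls`` for the actual layout before copying anything.

```yaml
# pillar/kingfisher_main.sls (illustrative fragment only)
# Assumption: `crawls` is a list with one mapping per crawl.
python_apps:
  kingfisher_collect:
    crawls:
      - identifier: DOMINICAN_REPUBLIC      # uppercase, underscore-separated
        spider: dominican_republic_api      # the spider's name
        crawl_time: '2025-05-06'            # quoted so YAML keeps it a string
        # The remaining keys are optional.
        spider_arguments: -a compile_releases=true  # only if the spider doesn't yield compiled releases
        cardinal: True                      # enable the Cardinal pipeline
        users:
          - example_reader                  # hypothetical PostgreSQL user needing read access
        day: 1                              # hypothetical day of the month for the cron job
```

Per the documentation text, ``day`` is required only if an incremental update takes longer than a day.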