Skip to content

Express indicators requirements using DSL rules and move towards #62#106

Draft
jpmckinney wants to merge 12 commits into
mainfrom
98-indicators
Draft

Express indicators requirements using DSL rules and move towards #62#106
jpmckinney wants to merge 12 commits into
mainfrom
98-indicators

Conversation

@jpmckinney
Copy link
Copy Markdown
Member

@jpmckinney jpmckinney commented Apr 22, 2026

Working on #98 and toward #62

This is where I left off when I got pulled onto new tasks.

The notes I had left myself about next steps:

  • - Add the new extracted functions to kingfisher-colab
    • Leave the column names ("number of indicators", etc.) and Boolean labels ("yes", "no") up to the calling code
    • Add tests
  • - Review indicators.schema.json
    • Make strict using ocdskit schema-strict (uniqueItems, no additionalProperties, min/max keywords, etc.)
    • Move indicators.schema.json to data-support, and validate it on pre-commit *
  • Compare existing notebooks to test whether the new code generates the same results
  • Update README.md as needed

* I think maybe I wanted to move the schema to data-support so that it could be retrievable by the registry? Not sure.


Some Claude notes I had kept around that I think I still need to work through. This related to some of the remaining differences across the red flags and other indicator files.

┌──────────────────┬───────────────────────────────────────────┬────────────────────────────┐
│      Aspect      │        check_usability_indicators         │ check_red_flags_indicators │
├──────────────────┼───────────────────────────────────────────┼────────────────────────────┤
│ Language support │ Yes (English/Spanish)                     │ No                         │
├──────────────────┼───────────────────────────────────────────┼────────────────────────────┤
│ Worksheet index  │ 0                                         │ 1                          │
├──────────────────┼───────────────────────────────────────────┼────────────────────────────┤
│ Columns selected │ [0, 3, 4, 9] or [0, 3, 4, 5, 9]           │ [0, 5, 6, 7]               │
├──────────────────┼───────────────────────────────────────────┼────────────────────────────┤
│ Merge key        │ "U_id"                                    │ "R_id"                     │
├──────────────────┼───────────────────────────────────────────┼────────────────────────────┤
│ Post-processing  │ Spanish: rename columns, translate values │ None                       │
└──────────────────┴───────────────────────────────────────────┴────────────────────────────┘

Common pattern:

rows = authenticate_gspread().open_by_key(spreadsheet_key).get_worksheet(N).get_all_values()
indicators = pd.DataFrame(rows).pipe(lambda df: df.rename(columns=df.iloc[0]).drop(df.index[0]))
return result.merge(indicators.iloc[:, columns], on=merge_key)

Refactoring options:

  1. Extract fetch helper - Create _fetch_indicator_metadata(spreadsheet_key, worksheet_index) that returns the DataFrame. Each function calls this then does its specific merge/transforms. Minimal change, removes 2 duplicate lines.
  2. Unified function with config - Single merge_indicator_metadata(result, spreadsheet_key, worksheet_index, columns, merge_key) plus separate Spanish translation logic. More DRY but the functions differ enough (language handling) that it may not be worth it.
  3. Keep separate - The functions have different enough signatures and behavior (language support vs none) that the 2-line duplication may be acceptable.

Given that usability has language-specific logic that red flags doesn't need, I'd suggest Option 1 (extract fetch helper) as the pragmatic choice.


Separately, I had noted "Retrieve _evaluate_rule to use in registry", which refers to this code, which might be useful for when we implement indicators in the registry.

def _evaluate_rule(rule, fields_list):
    """
    Return a table of the usability checks.
    Evaluate a DSL rule against a list of available fields.

    Rules can be:
    - A string: the field must be present in fields_list
    - {"all": [...]}: all sub-rules must be satisfied (logical AND)
    - {"any": [...]}: at least one sub-rule must be satisfied (logical OR)

    Returns True if the rule is satisfied, False otherwise.
    """
    if isinstance(rule, str):
        return rule in fields_list
    if isinstance(rule, dict):
        if "all" in rule:
            return all(_evaluate_rule(sub_rule, fields_list) for sub_rule in rule["all"])
        if "any" in rule:
            return any(_evaluate_rule(sub_rule, fields_list) for sub_rule in rule["any"])
    return False

@yolile
Copy link
Copy Markdown
Member

yolile commented Apr 23, 2026

Move indicators.schema.json to data-support, and validate it on pre-commit *

I'm moving it to kingfisher-colab, where pre-commit is running and failing now, but I will fix the issues there

ref open-contracting/kingfisher-colab#96

@yolile
Copy link
Copy Markdown
Member

yolile commented Apr 27, 2026

@jpmckinney do we still want to import everything all the notebooks need in component_enviroment or can we now import each function where needed only?

@jpmckinney
Copy link
Copy Markdown
Member Author

Up to you and the rest of the team - whatever is more ergonomic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants