FPM-946: Categorical flagging system#72

Merged
richjam merged 6 commits into main from feature/FPM-945-simple-flagging-system
Apr 14, 2026

@richjam commented Apr 9, 2026

Overview

The existing bitwise flagging system requires all flag values to be powers of two, which enables bitwise combination of multiple flags on a single column. This design suits our FDRI workflows, where multiple flag conditions can co-exist on a single observation.

However, many users arrive with existing flagging schemes that do not follow this convention, for example legacy datasets using integer codes like {good: 0, bad: 1, suspect: 2} or string codes like {good: "G", bad: "B", suspect: "S"}. Some of these categorical schemes are mutually exclusive: each row holds exactly one flag value, not a combination. Others allow several flags per row, but because their values cannot be combined bitwise, the values need to be appended to a list instead.
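To illustrate the difference, here is a minimal sketch in plain Python (not using time_stream). The power-of-two names and values are hypothetical; the legacy codes are the ones quoted above.

```python
# Hypothetical power-of-two flags: each value occupies its own bit.
MISSING, SUSPECT, CORRECTED = 1, 2, 4

combined = MISSING | CORRECTED           # both flags stored in one integer
has_missing = bool(combined & MISSING)   # each flag can be tested independently
has_suspect = bool(combined & SUSPECT)

# Legacy categorical codes: values are labels, not bit positions.
legacy = {"good": 0, "bad": 1, "suspect": 2}

# OR-ing them does not give a usable combination:
# "bad" | "suspect" is 3, which is not a defined code,
# and "good" | "bad" is 1, which is indistinguishable from "bad" alone.
undefined_code = legacy["bad"] | legacy["suspect"]
collides_with_bad = legacy["good"] | legacy["bad"]

# For schemes that allow several flags per row, a list keeps every value intact.
row_flags = []
for name in ("bad", "suspect", "bad"):
    value = legacy[name]
    if value not in row_flags:  # skip values that are already present
        row_flags.append(value)
```

After running this, `combined` is 5 and `row_flags` is `[1, 2]`.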

This PR implements a categorical flag system as an alternative to the existing bitwise system.

Main changes

  • Categorical flag system

    • CategoricalFlag supports arbitrary int or str flag values with no power-of-two constraint. Columns operate in either scalar mode (one value per row) or list mode (multiple values per row). Validation ensures that duplicate values are rejected.
    • A complementary CategoricalFlagColumn is introduced that defines how flags are added and removed via the add_flag and remove_flag methods on the base TimeFrame class.
      • Scalar mode: add_flag sets the value where the expression is true (with an overwrite option to control whether existing values are replaced); remove_flag sets the value to null.
      • List mode: add_flag appends the value to the list (if it doesn't already exist); remove_flag removes it from the list.
  • flags/ directory

    • bitwise.py and flag_manager.py have been moved into src/time_stream/flags/ and split across four modules:
      • flag_system.py - FlagSystemBase (mixin defining the shared interface: system_name, to_dict, get_flag, value_type, validate_column) and FlagMeta (shared metaclass base providing __repr__, __eq__, and __hash__ on the class itself rather than its instances).
      • bitwise_flag_system.py - BitwiseFlag, now inheriting from FlagSystemBase and BitwiseMeta.
      • categorical_flag_system.py - CategoricalFlag, inheriting from FlagSystemBase and CategoricalMeta.
      • flag_manager.py - updated FlagManager registry supporting both system types, with FlagColumn promoted to an abstract base class. BitwiseFlagColumn and CategoricalFlagColumn are concrete subclasses implementing the appropriate add_flag, remove_flag, encode, and decode semantics.
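For readers unfamiliar with putting comparison methods on a metaclass, the following is a rough sketch of the idea behind FlagMeta, not the code in this PR. FlagMetaSketch, make_flag_class, and the members attribute are hypothetical names, and comparing by name and members is an assumption about what equality should mean.

```python
class FlagMetaSketch(type):
    """Hypothetical stand-in for FlagMeta: these methods act on classes."""

    def __repr__(cls) -> str:
        members = ", ".join(f"{k}={v!r}" for k, v in cls.members.items())
        return f"<{cls.__name__}({members})>"

    def __eq__(cls, other: object) -> bool:
        if not isinstance(other, FlagMetaSketch):
            return NotImplemented
        # Assumption: two flag classes are equal if name and members match.
        return cls.__name__ == other.__name__ and cls.members == other.members

    def __hash__(cls) -> int:
        # Must be defined alongside __eq__ so classes stay usable in sets/dicts.
        return hash((cls.__name__, tuple(sorted(cls.members.items()))))


def make_flag_class(name: str, members: dict) -> FlagMetaSketch:
    """Build a flag class dynamically, as a registry might do."""
    return FlagMetaSketch(name, (), {"members": dict(members)})


qc_a = make_flag_class("qc", {"good": 0, "missing": 1})
qc_b = make_flag_class("qc", {"good": 0, "missing": 1})
qc_c = make_flag_class("qc", {"good": "G", "missing": "M"})
```

Here `qc_a` and `qc_b` are distinct class objects that compare equal and hash the same, while `qc_c` differs because its values differ.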

Other changes

  • TimeWindow.from_tuple() class method added to construct a TimeWindow from a 2- or 3-element tuple rather than doing this within methods in the TimeFrame class.
  • TimeManager property configuration refactored: the single _configure_period_properties() method is split into individual @staticmethod methods (_configure_resolution_property, _configure_offset_property, _configure_alignment_property, _configure_periodicity_property).
  • Operation.register and Operation.get type signatures now use a TypeVar bound to Operation instead of Self, fixing an issue that caused IDE warnings.
  • ComparisonCheck (qc.py): is_in operator now wraps a scalar compare_to value in a list, fixing a case where a non-list value was passed directly to pl.Expr.is_in.
  • get_date_columns (calculations.py): added explicit return None at the end of the function to satisfy type checkers.
  • New exception classes: CategoricalFlagError, CategoricalFlagTypeError, CategoricalFlagValueError, CategoricalFlagUnknownError.
  • Other minor type hint fixes and IDE warning cleanup
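As a rough illustration of the is_in fix listed above, the normalisation could look something like this. The helper name as_is_in_values is hypothetical, and which container types count as "already a collection" is an assumption; the actual ComparisonCheck code may differ.

```python
from typing import Any


def as_is_in_values(compare_to: Any) -> list:
    """Return compare_to as a list suitable for a membership test."""
    # Assumption: lists, tuples and sets are passed through as collections.
    if isinstance(compare_to, (list, tuple, set, frozenset)):
        return list(compare_to)
    # Anything else, including a string, is treated as a single scalar value.
    return [compare_to]


scalar_case = as_is_in_values(5)
string_case = as_is_in_values("G")
list_case = as_is_in_values([1, 2])

# With polars, the result would then be passed on, e.g.:
#   pl.col("value").is_in(as_is_in_values(compare_to))
```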

Note

Files under src/time_stream/flags/ and tests/time_stream/flags/ will appear as new files in the diff rather than renames, because the content changed significantly during the move. The old bitwise.py and flag_manager.py at the top-level time_stream/ package have been deleted. Sorry this makes the git diff difficult to follow...

Usage examples

Categorical flag system - integer values (scalar mode)

import polars as pl
from time_stream import TimeFrame
from datetime import datetime

tf = TimeFrame(
    pl.DataFrame({
        "time": [datetime(2025, 1, 1), datetime(2025, 1, 2), datetime(2025, 1, 3)],
        "value": [10.0, None, 30.0],
    }),
    "time",
)

tf.register_flag_system("qc", {"good": 0, "missing": 1, "suspect": 2}, flag_type="categorical")
tf.init_flag_column("qc", "qc_flag")

tf.add_flag("qc_flag", "missing", pl.col("value").is_null())
tf.add_flag("qc_flag", "suspect", pl.col("value").gt(25), overwrite=False)

print(tf.df)
 
# qc_flag: [None, 1, 2]

Categorical flag system - string values (scalar mode)

tf.register_flag_system("qc2", {"good": "G", "missing": "M", "suspect": "S"}, flag_type="categorical")
tf.init_flag_column("qc2", "qc2_flag")

tf.add_flag("qc2_flag", "M", pl.col("value").is_null())
tf.add_flag("qc2_flag", "S", pl.col("value").gt(25), overwrite=False)

print(tf.df)

# qc2_flag: [None, M, S]

List mode - multiple flags per row

tf.register_flag_system("qc3", {"A": 1, "B": 2, "C": 3}, flag_type="categorical")
tf.init_flag_column("qc3", "qc3_flag", list_mode=True)

tf.add_flag("qc3_flag", "A", pl.col("value").gt(0))
tf.add_flag("qc3_flag", "C", pl.col("value").gt(25))

print(tf.df)
 
# qc3_flag: [[1], [], [1, 3]]
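
The scalar-mode and list-mode behaviour in the examples above can be sketched in plain Python, using lists in place of polars columns and boolean masks in place of expressions. The function names are hypothetical, and this is only a model of the semantics described in this PR. One assumption: the scalar remove here only nulls rows that currently hold the given value.

```python
from typing import List, Optional, Union

FlagValue = Union[int, str]


def add_scalar(column: List[Optional[FlagValue]], mask: List[bool],
               value: FlagValue, overwrite: bool = True) -> List[Optional[FlagValue]]:
    """Set value where mask is true; keep existing values if overwrite is False."""
    return [value if hit and (overwrite or current is None) else current
            for current, hit in zip(column, mask)]


def remove_scalar(column: List[Optional[FlagValue]], mask: List[bool],
                  value: FlagValue) -> List[Optional[FlagValue]]:
    """Null out value where mask is true (assumed: only rows holding it)."""
    return [None if hit and current == value else current
            for current, hit in zip(column, mask)]


def add_list(column: List[List[FlagValue]], mask: List[bool],
             value: FlagValue) -> List[List[FlagValue]]:
    """Append value where mask is true, unless the row already contains it."""
    return [row + [value] if hit and value not in row else list(row)
            for row, hit in zip(column, mask)]


def remove_list(column: List[List[FlagValue]], mask: List[bool],
                value: FlagValue) -> List[List[FlagValue]]:
    """Drop value from each row where mask is true."""
    return [[v for v in row if v != value] if hit else list(row)
            for row, hit in zip(column, mask)]


values = [10.0, None, 30.0]
is_null = [v is None for v in values]
gt_0 = [v is not None and v > 0 for v in values]
gt_25 = [v is not None and v > 25 for v in values]

# Scalar mode, mirroring the integer example (missing=1, suspect=2).
scalar_col = add_scalar([None, None, None], is_null, 1)
scalar_col = add_scalar(scalar_col, gt_25, 2, overwrite=False)

# List mode, mirroring the qc3 example (A=1, C=3).
list_col = add_list([[], [], []], gt_0, 1)
list_col = add_list(list_col, gt_25, 3)
```

This yields `[None, 1, 2]` for `scalar_col` and `[[1], [], [1, 3]]` for `list_col`, matching the outputs shown in the examples.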

Decode and encode

decoded = tf.decode_flag_column("qc_flag")
# qc_flag values: 0 -> "good", 1 -> "missing", 2 -> "suspect"
 
encoded = decoded.encode_flag_column("qc_flag")
# back to raw integer values
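
Conceptually, decoding and encoding are a pair of dictionary lookups. The sketch below uses plain Python rather than the time_stream API, and the variable names are hypothetical. Because duplicate flag values are rejected, the value-to-name mapping is unambiguous.

```python
flags = {"good": 0, "missing": 1, "suspect": 2}

# Invert the mapping; safe because flag values are unique.
names_by_value = {value: name for name, value in flags.items()}

raw = [None, 1, 2]

# Decode: raw value -> flag name, leaving nulls untouched.
decoded_sketch = [None if v is None else names_by_value[v] for v in raw]

# Encode: flag name -> raw value, again leaving nulls untouched.
encoded_sketch = [None if n is None else flags[n] for n in decoded_sketch]
```

The round trip returns the original raw values.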

@richjam richjam requested a review from simsta87 April 9, 2026 15:55
Comment thread src/time_stream/flags/flag_manager.py Outdated
Comment thread src/time_stream/flags/categorical_flag_system.py
@richjam richjam requested a review from simsta87 April 13, 2026 14:01
@richjam richjam merged commit db439c7 into main Apr 14, 2026
3 checks passed
@richjam richjam deleted the feature/FPM-945-simple-flagging-system branch April 14, 2026 11:57