orgdatacore - Python

Python port of the Go orgdatacore library for organizational data management.

This library provides thread-safe access to organizational data, including employees, teams, organizations, pillars, and team groups.

Installation

Using UV (Recommended)

UV is a fast Python package installer.

# Install the package
uv pip install -e .

# With GCS support (recommended for production)
uv pip install -e ".[gcs]"

# Or use uv sync for development (installs dev dependencies)
uv sync

# With GCS support
uv sync --extra gcs

Using pip

# Install from source
pip install -e .

# With GCS support (recommended for production)
pip install -e ".[gcs]"

# With development dependencies
pip install -e ".[dev]"

Quick Start

Using GCS (Recommended for Production)

from orgdatacore import Service, GCSConfig
from orgdatacore.datasources import GCSDataSourceWithSDK
from datetime import timedelta

# Configure GCS data source
config = GCSConfig(
    bucket="your-bucket",
    object_path="path/to/org_data.json",
    project_id="your-project",
    check_interval=timedelta(minutes=5),
)
source = GCSDataSourceWithSDK(config)

# Option 1: Constructor injection (recommended for simple cases)
service = Service(data_source=source)

# Option 2: Lazy loading (matches Go API, good for deferred loading)
# service = Service()
# service.load_from_data_source(source)

# Query employees
employee = service.get_employee_by_uid("jsmith")
if employee:
    print(f"Found: {employee.full_name}")

Using a Custom DataSource (S3, Azure, etc.)

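The library supports pluggable data sources. Implement the DataSource interface for your storage backend. The example below subclasses DataSource explicitly, although any class that provides the same methods also works (see DataSource Protocol):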

from orgdatacore import Service
from orgdatacore.interface import DataSource
from typing import BinaryIO, Callable, Optional
from io import BytesIO

class S3DataSource(DataSource):
    """Example custom DataSource for AWS S3."""
    
    def __init__(self, bucket: str, key: str):
        self.bucket = bucket
        self.key = key
    
    def load(self) -> BinaryIO:
        import boto3
        s3 = boto3.client('s3')
        response = s3.get_object(Bucket=self.bucket, Key=self.key)
        return BytesIO(response['Body'].read())
    
    def watch(self, callback: Callable[[], Optional[Exception]]) -> Optional[Exception]:
        # Implement watching logic (polling, S3 events, etc.)
        return None
    
    def __str__(self) -> str:
        return f"s3://{self.bucket}/{self.key}"

# Use your custom data source
service = Service()
source = S3DataSource("my-org-bucket", "data/org_data.json")
service.load_from_data_source(source)
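For local development or tests, the same three methods can be backed by a file on disk. The following is a minimal sketch, not part of the library: the class name `LocalFileDataSource` and the `file://` description format are assumptions made for illustration, and `watch` deliberately does nothing.

```python
from io import BytesIO
from pathlib import Path
from typing import BinaryIO, Callable, Optional


class LocalFileDataSource:
    """Hypothetical data source that reads org data from a local JSON file."""

    def __init__(self, path: str):
        self.path = Path(path)

    def load(self) -> BinaryIO:
        # Read the whole file up front so the caller gets an independent,
        # seekable stream that does not keep the file handle open.
        return BytesIO(self.path.read_bytes())

    def watch(self, callback: Callable[[], Optional[Exception]]) -> Optional[Exception]:
        # No change detection in this sketch: the data is loaded once.
        return None

    def __str__(self) -> str:
        return f"file://{self.path}"
```

Because it exposes `load`, `watch`, and `__str__`, an instance should be usable wherever the S3 example above is used, for instance `service.load_from_data_source(LocalFileDataSource("org_data.json"))`.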

API Reference

Service

The main class providing access to organizational data.

Core Data Access

  • get_employee_by_uid(uid: str) -> Employee | None
  • get_employee_by_slack_id(slack_id: str) -> Employee | None
  • get_employee_by_github_id(github_id: str) -> Employee | None
  • get_manager_for_employee(uid: str) -> Employee | None
  • get_team_by_name(team_name: str) -> Team | None
  • get_org_by_name(org_name: str) -> Org | None
  • get_pillar_by_name(pillar_name: str) -> Pillar | None
  • get_team_group_by_name(team_group_name: str) -> TeamGroup | None

Membership Queries

  • get_teams_for_uid(uid: str) -> list[str]
  • get_teams_for_slack_id(slack_id: str) -> list[str]
  • get_team_members(team_name: str) -> list[Employee]
  • is_employee_in_team(uid: str, team_name: str) -> bool
  • is_slack_user_in_team(slack_id: str, team_name: str) -> bool
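A common use of the membership queries is gating an action on team membership. The helper below is a sketch built only on the `is_slack_user_in_team` signature listed above. The helper name `slack_user_in_any_team` and the `TeamMembership` protocol are hypothetical and are not provided by the library.

```python
from typing import Iterable, Protocol


class TeamMembership(Protocol):
    """The single Service method this helper relies on."""

    def is_slack_user_in_team(self, slack_id: str, team_name: str) -> bool: ...


def slack_user_in_any_team(
    service: TeamMembership, slack_id: str, allowed_teams: Iterable[str]
) -> bool:
    """Return True if the Slack user belongs to at least one allowed team."""
    # any() stops at the first matching team, so later teams are not queried.
    return any(
        service.is_slack_user_in_team(slack_id, team) for team in allowed_teams
    )
```

Typing the parameter as a protocol rather than as `Service` keeps the helper easy to test with a small stub object.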

Organization Queries

  • is_employee_in_org(uid: str, org_name: str) -> bool
  • is_slack_user_in_org(slack_id: str, org_name: str) -> bool
  • get_user_organizations(slack_user_id: str) -> list[OrgInfo]

Data Management

  • get_version() -> DataVersion
  • load_from_data_source(source: DataSource) -> None
  • start_data_source_watcher(source: DataSource) -> None

Hierarchy Queries

  • get_hierarchy_path(entity_name: str, entity_type: str) -> list[HierarchyPathEntry]
  • get_descendants_tree(entity_name: str) -> HierarchyNode | None
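The result of `get_hierarchy_path` can be rendered as a one-line breadcrumb. This sketch assumes each entry exposes `type` and `name` attributes, as the async example later in this README does. The function name `format_hierarchy_path` is hypothetical, and entries are kept in whatever order the service returns them.

```python
from typing import Iterable, Protocol


class PathEntry(Protocol):
    """Attributes assumed to exist on HierarchyPathEntry."""

    type: str
    name: str


def format_hierarchy_path(entries: Iterable[PathEntry], separator: str = " > ") -> str:
    """Join hierarchy entries into a single 'type:name' breadcrumb string."""
    return separator.join(f"{entry.type}:{entry.name}" for entry in entries)
```

For example, `format_hierarchy_path(service.get_hierarchy_path("platform-team", "team"))` would produce one readable line for logging.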

Jira Queries

  • get_jira_projects() -> list[str]
  • get_jira_components(project: str) -> list[str]
  • get_teams_by_jira_project(project: str) -> list[JiraOwnerInfo]
  • get_teams_by_jira_component(project: str, component: str) -> list[JiraOwnerInfo]
  • get_jira_ownership_for_team(team_name: str) -> list[dict]

Component Queries

  • get_component_by_name(name: str) -> Component | None
  • get_all_components() -> list[Component]

Enumeration

  • get_all_employee_uids() -> list[str]
  • get_all_team_names() -> list[str]
  • get_all_org_names() -> list[str]
  • get_all_pillar_names() -> list[str]
  • get_all_team_group_names() -> list[str]

DataSource Protocol

Implement this protocol for custom storage backends (no inheritance needed):

from typing import BinaryIO, Callable, Optional

class MyDataSource:  # No inheritance needed!
    def load(self) -> BinaryIO:
        """Return a file-like object containing JSON data."""
        ...
    
    def watch(self, callback: Callable[[], Optional[Exception]]) -> Optional[Exception]:
        """Start watching for changes, call callback when data updates."""
        ...
    
    def __str__(self) -> str:
        """Return a description of this data source."""
        ...
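The `watch` method is where change detection lives. One possible approach is polling, sketched here for a local file with a content hash as the change signal. Everything in this block is an assumption for illustration: `PollingFileSource`, `check_once`, `stop`, and the `interval_seconds` parameter are not part of the library, and whether the service expects `watch` to return immediately is not confirmed by this README.

```python
import hashlib
import threading
from io import BytesIO
from pathlib import Path
from typing import BinaryIO, Callable, Optional


class PollingFileSource:
    """Hypothetical source that polls a local file for content changes."""

    def __init__(self, path: str, interval_seconds: float = 30.0):
        self.path = Path(path)
        self.interval_seconds = interval_seconds
        self._last_digest: Optional[str] = None
        self._stop = threading.Event()

    def load(self) -> BinaryIO:
        data = self.path.read_bytes()
        # Remember what was loaded so the next check can detect a change.
        self._last_digest = hashlib.sha256(data).hexdigest()
        return BytesIO(data)

    def check_once(self, callback: Callable[[], Optional[Exception]]) -> bool:
        """Call callback if the content differs from the last load or check."""
        digest = hashlib.sha256(self.path.read_bytes()).hexdigest()
        if digest == self._last_digest:
            return False
        self._last_digest = digest
        callback()
        return True

    def watch(self, callback: Callable[[], Optional[Exception]]) -> Optional[Exception]:
        def loop() -> None:
            # Event.wait returns False on timeout and True once stop() is called.
            while not self._stop.wait(self.interval_seconds):
                try:
                    self.check_once(callback)
                except OSError:
                    continue  # transient read failure; retry on the next tick

        # Daemon thread so the poller never blocks interpreter shutdown.
        threading.Thread(target=loop, daemon=True).start()
        return None

    def stop(self) -> None:
        self._stop.set()

    def __str__(self) -> str:
        return f"file://{self.path} (polled every {self.interval_seconds}s)"
```

Hashing the content rather than comparing modification times avoids missed updates on filesystems with coarse timestamp resolution, at the cost of reading the file on every tick.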

Data Sources

GCSDataSource / GCSDataSourceWithSDK

For production use with Google Cloud Storage (requires google-cloud-storage):

from orgdatacore import GCSConfig
from orgdatacore.datasources import GCSDataSourceWithSDK
from datetime import timedelta

config = GCSConfig(
    bucket="your-bucket",
    object_path="path/to/data.json",
    project_id="your-project",
    check_interval=timedelta(minutes=5),
)
source = GCSDataSourceWithSDK(config)

Async API

For asyncio-based applications (FastAPI, aiohttp, etc.), use AsyncService:

Basic Async Usage

import asyncio
from orgdatacore import AsyncService, GCSConfig
from orgdatacore._async import AsyncGCSDataSource

async def main():
    # Configure async GCS data source
    config = GCSConfig(
        bucket="your-bucket",
        object_path="path/to/org_data.json",
        project_id="your-project",
    )
    source = AsyncGCSDataSource(config)

    # Create and initialize async service
    service = AsyncService()
    await service.load_from_data_source(source)

    # Query employees (all methods are async)
    employee = await service.get_employee_by_uid("jsmith")
    if employee:
        print(f"Found: {employee.full_name}")

    # Check team membership
    is_member = await service.is_employee_in_team("jsmith", "platform-team")

    # Get hierarchy path
    path = await service.get_hierarchy_path("platform-team", "team")
    for entry in path:
        print(f"  {entry.type}: {entry.name}")

asyncio.run(main())

Using with Data Watcher

import asyncio
from orgdatacore import AsyncService, GCSConfig
from orgdatacore._async import AsyncGCSDataSource

async def run_with_auto_reload():
    config = GCSConfig(
        bucket="your-bucket",
        object_path="org_data.json",
        project_id="your-project",
    )
    source = AsyncGCSDataSource(config)
    service = AsyncService()

    # Start watcher - loads data and monitors for changes
    await service.start_data_source_watcher(source)

    try:
        # Service auto-reloads when data changes
        while True:
            teams = await service.get_all_team_names()
            print(f"Currently tracking {len(teams)} teams")
            await asyncio.sleep(60)
    finally:
        await service.stop_watcher()

asyncio.run(run_with_auto_reload())

AsyncService API Reference

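AsyncService mirrors the Service API. Query and data-loading methods are coroutines and must be awaited; is_healthy(), is_ready(), and get_version() remain synchronous: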

Core Data Access

  • await get_employee_by_uid(uid) -> Employee | None
  • await get_employee_by_email(email) -> Employee | None
  • await get_employee_by_slack_id(slack_id) -> Employee | None
  • await get_employee_by_github_id(github_id) -> Employee | None
  • await get_manager_for_employee(uid) -> Employee | None
  • await get_team_by_name(name) -> Team | None
  • await get_org_by_name(name) -> Org | None
  • await get_pillar_by_name(name) -> Pillar | None
  • await get_team_group_by_name(name) -> TeamGroup | None
  • await get_component_by_name(name) -> Component | None

Membership Queries

  • await get_teams_for_uid(uid) -> list[str]
  • await get_teams_for_slack_id(slack_id) -> list[str]
  • await get_team_members(team_name) -> tuple[Employee, ...]
  • await get_org_members(org_name) -> tuple[Employee, ...]
  • await is_employee_in_team(uid, team_name) -> bool
  • await is_slack_user_in_team(slack_id, team_name) -> bool
  • await is_employee_in_org(uid, org_name) -> bool
  • await is_slack_user_in_org(slack_id, org_name) -> bool

Hierarchy Queries

  • await get_hierarchy_path(entity_name, entity_type) -> list[HierarchyPathEntry]
  • await get_descendants_tree(entity_name) -> HierarchyNode | None
  • await get_user_organizations(uid) -> tuple[OrgInfo, ...]

Jira Queries

  • await get_jira_projects() -> list[str]
  • await get_jira_components(project) -> list[str]
  • await get_teams_by_jira_project(project) -> list[JiraOwnerInfo]
  • await get_teams_by_jira_component(project, component) -> list[JiraOwnerInfo]
  • await get_jira_ownership_for_team(team_name) -> list[dict]

Enumeration

  • await get_all_employee_uids() -> list[str]
  • await get_all_team_names() -> list[str]
  • await get_all_org_names() -> list[str]
  • await get_all_pillar_names() -> list[str]
  • await get_all_team_group_names() -> list[str]
  • await get_all_employees() -> tuple[Employee, ...]
  • await get_all_teams() -> tuple[Team, ...]
  • await get_all_orgs() -> tuple[Org, ...]
  • await get_all_pillars() -> tuple[Pillar, ...]
  • await get_all_team_groups() -> tuple[TeamGroup, ...]
  • await get_all_components() -> tuple[Component, ...]

Data Management

  • await load_from_data_source(source) -> None
  • await start_data_source_watcher(source) -> None
  • await stop_watcher() -> None
  • is_healthy() -> bool (sync)
  • is_ready() -> bool (sync)
  • get_version() -> DataVersion (sync)

Thread Safety

The Service class is thread-safe. All read operations can be performed concurrently, and data reloading is atomic.

The AsyncService class is asyncio-safe, using asyncio.Lock to protect data access during concurrent async operations.
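To illustrate what "atomic reloading" means for readers, the sketch below shows one common pattern: build the new data set completely, then publish it with a single reference swap. This is an illustration of the general technique only. It is not the library's implementation, and `SnapshotStore` and its methods are hypothetical names.

```python
import threading
from types import MappingProxyType
from typing import Mapping, Optional


class SnapshotStore:
    """Hypothetical store whose readers never observe a half-loaded data set."""

    def __init__(self) -> None:
        self._write_lock = threading.Lock()  # serializes writers only
        self._snapshot: Mapping[str, str] = MappingProxyType({})

    def reload(self, employees: Mapping[str, str]) -> None:
        # Copy and freeze the new data before making it visible.
        new_snapshot = MappingProxyType(dict(employees))
        with self._write_lock:
            # Rebinding one attribute replaces the whole data set at once.
            self._snapshot = new_snapshot

    def get(self, uid: str) -> Optional[str]:
        # Readers take no lock: they see the old or the new snapshot in full.
        return self._snapshot.get(uid)
```

A reader that needs several consistent lookups should first bind the snapshot to a local variable and query that, since a reload may happen between two separate `get` calls.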

Development

Using UV (Recommended)

# Set up development environment
uv sync

# Run tests
uv run pytest

# Run tests with coverage
uv run pytest --cov=orgdatacore --cov-report=html

# Type checking
uv run mypy orgdatacore

# Code formatting
uv run black orgdatacore tests
uv run isort orgdatacore tests

# Linting
uv run ruff check orgdatacore tests

Using pip

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=orgdatacore --cov-report=html

# Type checking
mypy orgdatacore

# Code formatting
black orgdatacore tests
isort orgdatacore tests

Examples

Run the GCS demo with real data:

# Make sure you're logged in
gcloud auth application-default login

# Install GCS support
pip install -e ".[gcs]"

# Run the demo
python examples/gcs_demo.py

License

Apache-2.0