diff --git a/docs/architecture/hld.md b/docs/architecture/hld.md
new file mode 100644
index 00000000..830a99fc
--- /dev/null
+++ b/docs/architecture/hld.md
@@ -0,0 +1,251 @@
+# High Level Design (HLD) – SAP Sector
+
+## 1. Purpose
+
+This document provides a **High Level Design (HLD)** for the SAP Sector service.
+
+It exists to:
+- support **assurance and assessment**
+- clearly document the service for **developers, newcomers, and external reviewers**
+- explain **all major service components, user types, data types, and how they interact**
+
+For deeper technical detail (class/module/interface level), see the Low Level Design (LLD).
+
+---
+
+## 2. Scope
+
+### In scope
+- User types and access patterns
+- Service components and responsibilities
+- Data stores and search subsystem
+- External integrations (DfE Sign-in)
+- High-level interaction flows (search, view details, compare)
+- High-level operational and security considerations
+
+### Out of scope
+- Method/class-level design, internal module structure (LLD)
+- Full database schema (ERD)
+
+---
+
+## 3. Service overview
+
+SAP Sector is a web application that enables schools and local authorities to:
+- search for schools
+- view school details
+- compare similar schools
+
+The service is implemented as an ASP.NET Core MVC application with:
+- Single Sign-On using **DfE Sign-in**
+- **PostgreSQL** as the authoritative data store
+- **Lucene** for fast full-text search and filtering
+- GOV.UK / DfE frontend assets (npm-managed and served from `wwwroot`)
+
+---
+
+## 4. Users and user types
+
+### 4.1 End user types
+
+1. **Unauthenticated user (if supported)**
+   - Accesses public pages and journeys that do not require sign-in.
+
+2. **Authenticated user (DfE Sign-in)**
+   - Typical users: school staff and local authority staff.
+   - Access to features is determined by claims/roles/policies.
+
+### 4.2 Operational user types
+
+3. **Service operator / Support**
+   - Views health endpoints and operational metrics/logs.
+   - Investigates incidents and monitors uptime and performance.
+   - May perform operational actions (for example, initiating data refresh/reindex if implemented).
+
+---
+
+## 5. Data and information types
+
+### 5.1 Authoritative service data (PostgreSQL)
+
+PostgreSQL is the system of record for persisted data used by the service. Typical categories include:
+- School identity and attributes (e.g., URN, name, address, LA)
+- Attributes required for filtering and comparison (phase/type/status)
+- Metrics or performance-related values (if stored)
+- Relationships used for comparisons (if stored rather than derived)
+
+> The full schema is documented separately in the ERD.
+
+### 5.2 Search/index data (Lucene)
+
+Lucene stores **derived** search data optimised for:
+- full-text search
+- fast filtering
+- paging/ordering results
+
+Lucene is not authoritative. It is built from authoritative sources (commonly PostgreSQL and/or imported datasets).
+
+### 5.3 Identity and access data (DfE Sign-in)
+
+DfE Sign-in provides:
+- authentication identity
+- claims/roles/organisation context (as configured)
+
+This information is used for route protection and feature access decisions.
+
+---
+
+## 6. Service components
+
+### 6.1 Web Application (ASP.NET Core MVC)
+
+Responsibilities:
+- Handles HTTP requests and responses
+- Input validation and routing
+- Authentication/authorisation enforcement
+- Rendering Razor views
+- Mapping application results into ViewModels
+
+Key characteristics:
+- Thin controllers (orchestration only)
+- No direct database or Lucene access from controllers
+
+---
+
+### 6.2 Application/Service Layer (Use-cases)
+
+Responsibilities:
+- Implements user journeys as **use-cases**, for example:
+  - Search schools
+  - View school details
+  - Compare similar schools
+- Coordinates repository access and search access
+- Enforces business rules and validation
+- Returns application-level DTOs/results to the Web layer
+
+---
+
+### 6.3 Infrastructure Layer
+
+Responsibilities:
+- PostgreSQL access via repository implementations
+- Lucene indexing and searching implementation
+- External adapters and technical services (where applicable)
+- Configuration binding and environment integration
+
+Key characteristic:
+- Owns all persistence and Lucene-specific concerns
+
+---
+
+### 6.4 Data Stores
+
+- **PostgreSQL**: authoritative persistent store
+- **Lucene index**: derived, optimised store for search
+
+---
+
+## 7. Interactions and key flows (high level)
+
+### 7.1 Search schools (primary journey)
+
+1. User submits a search term and optional filters
+2. Web layer validates input and calls the Search use-case
+3. Application service builds a safe search request
+4. Infrastructure queries Lucene and returns paged results
+5. Web layer maps results to a ViewModel and renders the page
+
+Systems involved:
+- Web (MVC)
+- Application services
+- Lucene
+
+---
+
+### 7.2 View school details
+
+1. User selects a school from search results
+2. Web calls the “Get school details” use-case
+3. Application service queries PostgreSQL repository (authoritative)
+4. Details are mapped and rendered
+
+Systems involved:
+- Web (MVC)
+- Application services
+- PostgreSQL
+
+---
+
+### 7.3 Compare similar schools
+
+1. User selects a school and chooses “compare”
+2. Web calls the Compare use-case
+3. Application service determines a set of similar schools:
+   - may be derived via rules and/or Lucene and/or stored relationships (depending on implementation)
+4. Application service loads required details from PostgreSQL
+5. Web renders comparison view
+
+Systems involved:
+- Web (MVC)
+- Application services
+- PostgreSQL and possibly Lucene
+
+---
+
+### 7.4 Sign-in and protected routes
+
+1. User requests a protected feature/page
+2. Web layer redirects to DfE Sign-in for authentication
+3. User returns with identity/claims
+4. Authorisation rules/policies determine access
+
+Systems involved:
+- Web (MVC)
+- DfE Sign-in
+
+---
+
+## 8. Diagrams
+
+### 8.1 System context
+
+WIP
+
+
+### 8.2 Component view
+
+WIP
+
+## 9. Non-functional considerations (assurance)
+
+### 9.1 Security
+- DfE Sign-in for authentication
+- Explicit authorisation controls for protected routes
+- Strict Content Security Policy (CSP) approach
+- No secrets in source control
+- Avoid logging sensitive data, tokens, or unnecessary PII
+
+### 9.2 Performance
+- Search requests are paged and limited
+- PostgreSQL is used as source of truth for detail pages
+- Indexing strategy ensures Lucene remains performant
+
+### 9.3 Availability and monitoring
+
+- Health endpoints provide basic and detailed status:
+  - /healthcheck
+  - /health
+- Health responses must not expose sensitive infrastructure detail
+
+## 10. Assumptions and constraints
+
+- The service is maintained as a public repository and must follow secure development practices.
+- Lucene is a derived data store and may require reindexing after schema or data model changes.
+- PostgreSQL is authoritative for persisted entities.
+
+## 11. References
+
+- Developer handbook: /docs/developers/
+- ADRs (decisions and rationale): /docs/adrs/
+- ERD (data model): /docs/data/erd.md
+- Low Level Design (LLD): /docs/architecture/lld.md
\ No newline at end of file
diff --git a/docs/architecture/lld.md b/docs/architecture/lld.md
new file mode 100644
index 00000000..9fef19fb
--- /dev/null
+++ b/docs/architecture/lld.md
@@ -0,0 +1,302 @@
+# Low Level Design (LLD) – SAP Sector
+
+## 1. Purpose
+
+This document provides the **Low Level Design (LLD)** for the SAP Sector service.
+
+It describes:
+- internal service components and boundaries
+- data types and models used at each layer
+- user types and how requests flow through the system
+- how services, repositories, and search interact
+
+This document exists to support:
+- assurance and assessment
+- developer onboarding
+- clear understanding of internal interactions
+
+For a high-level overview, see `/docs/architecture/hld.md`.
+For architectural rationale, see `/docs/adrs/`.
+
+---
+
+## 2. Architectural layering
+
+The service uses a **layered architecture** with strict dependency rules.
+
+### Layers
+
+1. **Web (ASP.NET Core MVC)**
+2. **Core (Domain + Application)**
+3. **Infrastructure**
+4. **Data stores**
+
+### Dependency rules
+
+- Web depends on Core
+- Core defines interfaces and business logic
+- Infrastructure implements Core interfaces
+- Infrastructure depends on external systems (Postgres, Lucene)
+- Dependencies always point **inwards**
+- Controllers must never access repositories or Lucene directly
+
+---
+
+## 3. User types (implementation view)
+
+### 3.1 End users
+
+- **Unauthenticated users**
+  - Access public routes (if enabled)
+  - Cannot access protected features
+
+- **Authenticated users (DfE Sign-in)**
+  - Identified by DfE Sign-in
+  - Access controlled via ASP.NET Core authorisation
+  - Claims treated as untrusted input
+
+### 3.2 Operational users
+
+- **Service operators**
+  - Monitor health endpoints
+  - Investigate logs and telemetry
+  - Perform operational tasks (e.g. reindex) if implemented
+
+---
+
+## 4. Web layer (SAPSec.Web)
+
+### Responsibilities
+
+- Handle HTTP requests and routing
+- Enforce authentication and authorisation
+- Validate input (model binding / ModelState)
+- Map request data into application DTOs
+- Call application services (use-cases)
+- Map service results into ViewModels
+- Render Razor views or return appropriate responses
+
+### Key rules
+
+- Controllers must be thin
+- No database access
+- No Lucene access
+- No business rules
+- No infrastructure types exposed to views
+
+### Typical components
+
+- Controllers (feature-based)
+- ViewModels
+- Razor Views
+- Filters and middleware (cross-cutting)
+
+---
+
+## 5. Application layer (SAPSec.Core)
+
+### Responsibilities
+
+The application layer implements **use-cases** that represent user actions.
+
+Examples:
+- Search schools
+- View school details
+- Compare similar schools
+
+Each use-case:
+- accepts input DTOs
+- validates business rules
+- coordinates repositories and search services
+- returns result DTOs or result objects
+
+### Key rules
+
+- No MVC dependencies
+- No direct database or Lucene access
+- Depends only on interfaces
+- Testable without ASP.NET hosting
+
+---
+
+## 6. Infrastructure layer (SAPSec.Infrastructure)
+
+### Responsibilities
+
+- Implement repository interfaces (PostgreSQL)
+- Implement search interfaces (Lucene)
+- Handle indexing and querying logic
+- Integrate with external systems if required
+- Map persistence/search results into DTOs
+
+### Key rules
+
+- Owns all Postgres and Lucene details
+- No MVC dependencies
+- Does not expose ORM or Lucene types outside the layer
+
+---
+
+## 7. Data access (PostgreSQL)
+
+### Repository design
+
+Repositories:
+- are async-only
+- expose domain-oriented methods
+- return DTOs or domain models (not ORM entities)
+- enforce paging for list operations
+
+Examples:
+- Get school by URN
+- Get multiple schools by URN list
+- Load attributes required for comparison
+
+### Data rules
+
+- PostgreSQL is the authoritative data store
+- All queries must be parameterised
+- Avoid N+1 queries
+- Use appropriate indexes (documented in ERD)
+
+---
+
+## 8. Search subsystem (Lucene)
+
+### Responsibilities
+
+- Provide fast full-text and filtered search
+- Support paging and ordering
+- Return candidate results for further processing
+
+### Search rules
+
+- Lucene is **not** authoritative
+- Input must be sanitised and escaped
+- Query length and result limits enforced
+- Lucene-specific types never escape Infrastructure
+
+### Indexing rules
+
+- Index schema defined in one place
+- Index derived from authoritative data
+- Reindex required when schema or key data changes
+
+---
+
+## 9. Data models and boundaries
+
+### ViewModels (Web)
+- Used only for rendering views
+- May contain formatted/display-friendly values
+- Must not include persistence or search types
+
+### Application DTOs (Core)
+- Represent inputs and outputs of use-cases
+- Independent of UI and infrastructure
+- Used in services and tests
+
+### Domain models (Core)
+- Represent business concepts and rules
+- No UI or persistence concerns
+
+### Persistence/search models (Infrastructure)
+- Map to SQL rows or Lucene documents
+- Must not be exposed outside Infrastructure
+
+---
+
+## 10. Key interactions (low-level flows)
+
+### 10.1 Search schools
+
+1. Controller receives search input
+2. Input validated at Web boundary
+3. Search use-case invoked with DTO
+4. Search service calls Lucene implementation
+5. Lucene returns paged result DTO
+6. Controller maps to ViewModel
+7. View rendered
+
+---
+
+### 10.2 View school details
+
+1. Controller receives school identifier
+2. Use-case invoked
+3. Repository queried (PostgreSQL)
+4. Data mapped to DTO
+5. Controller maps DTO to ViewModel
+6. View rendered
+
+---
+
+### 10.3 Compare similar schools
+
+1. Controller invokes compare use-case
+2. Use-case determines similar school set:
+   - via rules and/or Lucene and/or stored relationships
+3. Repository loads authoritative details
+4. DTO assembled for comparison
+5. Controller maps to ViewModel
+6. Comparison view rendered
+
+---
+
+## 11. Authentication and authorisation (DfE Sign-in)
+
+### Implementation rules
+
+- Protected routes explicitly use `[Authorize]`
+- Policy-based authorisation used for complex rules
+- Claims treated as untrusted input
+- Tokens and headers are never logged
+- Services do not read `HttpContext` directly
+
+---
+
+## 12. Error handling and logging
+
+### Error handling
+
+- Expected outcomes returned as result objects
+- Unexpected exceptions handled globally
+- User-facing errors are safe and non-technical
+
+### Logging
+
+- Structured logging only
+- Avoid PII, tokens, and secrets
+- Logs support operational monitoring and assurance
+
+---
+
+## 13. Testing considerations (design-level)
+
+- Unit tests:
+  - application services
+  - business rules
+  - search query construction
+- Integration tests:
+  - repositories against Postgres (where feasible)
+  - Lucene indexing and querying (where feasible)
+- End-to-end tests:
+  - critical user journeys via Playwright
+
+---
+
+## 14. Trust boundaries (assurance)
+
+- User input is untrusted at Web boundary
+- Claims from DfE Sign-in are untrusted and validated
+- PostgreSQL is the source of truth
+- Lucene is derived data for performance only
+
+---
+
+## 15. References
+
+- High Level Design: `/docs/architecture/hld.md`
+- ERD (data model): `/docs/data/erd.md`
+- ADRs (decisions): `/docs/adrs/`
+- Developer handbook: `/docs/developers/`
diff --git a/docs/data/erd.md b/docs/data/erd.md
new file mode 100644
index 00000000..99c28517
--- /dev/null
+++ b/docs/data/erd.md
@@ -0,0 +1,146 @@
+# Entity Relationship Diagram (ERD) – SAP Sector
+
+## 1. Purpose
+
+This document describes the **logical PostgreSQL data model** for the SAP Sector service, including:
+- the main data entities
+- relationships (PK/FK)
+- how data supports user journeys (search, view, compare)
+
+This is provided for:
+- assurance and assessment
+- onboarding (new developers and reviewers)
+- clear understanding of how data is stored and accessed
+
+> **Source of truth:** the database schema/migrations.
+> Update this ERD whenever schema changes.
+
+---
+
+## 2. Data stores and scope
+
+### PostgreSQL (authoritative)
+PostgreSQL stores the authoritative data used by the service (school records, attributes, metrics, etc.).
+
+### Lucene (derived, not authoritative)
+Lucene holds a derived search index built from authoritative sources (typically PostgreSQL and/or imported datasets).
+Lucene is documented in developer docs and the HLD/LLD; it is not modelled as an ERD entity because it is not relational.
+
+---
+
+## 3. Core entities (logical model)
+
+The service domain suggests these core logical entities:
+
+- **LocalAuthority**
+- **School**
+- **SchoolCharacteristic** (key/value attributes used for display/filter/compare)
+- **SchoolMetric** (numeric/time-series values used for compare)
+- **SimilarSchool** (optional: stored similarity relationships)
+
+If your implementation does not store some of these (e.g., similarity is computed dynamically), remove those entities from this ERD.
+
+---
+
+## 4. Logical ERD (Mermaid)
+
+> WIP.
+
+---
+
+## 5. Relationship definitions
+
+### LOCAL_AUTHORITY → SCHOOL (1-to-many)
+- One local authority oversees many schools.
+- A school belongs to one local authority (if known).
+
+### SCHOOL → SCHOOL_CHARACTERISTIC (1-to-many)
+- A school has multiple characteristics (e.g., governance, admissions, etc.).
+- Characteristics should be stable, queryable, and safe to display.
+
+### SCHOOL → SCHOOL_METRIC (1-to-many)
+- A school has multiple metrics.
+- Metrics may be time-bound (academic year) or current.
+
+### Similar schools (logical relationship)
+The concept of “similar schools” is represented logically rather than as a direct relational join in the Mermaid ERD.
+
+If similarity is persisted:
+- a join table (e.g. `SIMILAR_SCHOOL`) stores relationships between two schools
+- both references point to `SCHOOL.urn`
+- this relationship is many-to-many and self-referencing
+
+If similarity is derived:
+- similarity is calculated dynamically using rules and/or search
+- no persistent relationship table exists
+
+This approach avoids circular dependencies in the ERD and reflects the implementation accurately.
+
+---
+
+## 6. Data used by key user journeys
+
+### 6.1 Search schools
+- Primary search is served by Lucene (derived index).
+- Filters typically rely on DB-backed attributes such as:
+  - phase
+  - school_type
+  - local_authority_id
+  - status / is_open
+
+When a user selects a result, the service should load authoritative details from PostgreSQL.
+
+### 6.2 View school details
+- Fetch from `SCHOOL` and any related tables needed for display:
+  - characteristics
+  - metrics
+  - local authority
+
+### 6.3 Compare similar schools
+- Identify a comparison set:
+  - from a similarity join table (if stored), or
+  - derived dynamically via rules/search
+- Load authoritative details for all schools being compared from PostgreSQL.
+
+---
+
+## 7. Indexing and constraints (assurance)
+
+Document the indexes that support performance and integrity.
+
+### Suggested constraints
+- WIP
+
+
+
+### Suggested indexes (examples)
+- WIP
+
+If similarity is stored:
+- WIP
+
+---
+
+## 8. Consistency with Lucene (derived data)
+
+Document how the Lucene index stays consistent with PostgreSQL:
+
+- **Indexed fields:** list the fields that are copied/derived into Lucene
+- **Index trigger:** when indexing occurs (on import, scheduled job, manual run)
+- **Reindex strategy:** how schema changes or data updates are handled
+
+Example (update to reality):
+- School name and postcode are indexed for text search
+- Phase/type/status are indexed for filtering
+- A rebuild is required when index schema changes
+
+---
+
+## 9. Change control
+
+Any schema change must update:
+- database migrations/schema
+- this ERD (`docs/data/erd.md`)
+- dependent repository queries and mapping
+- Lucene indexing schema if indexed fields change
+- tests (unit/integration) covering impacted flows
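
Reviewer note on the ERD's "Consistency with Lucene" and "Change control" sections: both state that a rebuild is required when the index schema changes, but neither says how a change would be detected. One possible approach, sketched below, is to fingerprint the list of indexed fields and compare it with the fingerprint stored alongside the last built index. This is a language-neutral illustration in Python, not the service's ASP.NET Core implementation. The field list is taken from the ERD's "update to reality" example, and the names `INDEXED_FIELDS`, `schema_fingerprint` and `needs_rebuild` are hypothetical.

```python
import hashlib
import json

# Hypothetical field list based on the ERD's illustrative example;
# the real index schema lives in the service's indexing code and may differ.
INDEXED_FIELDS = [
    {"name": "school_name", "kind": "text"},
    {"name": "postcode", "kind": "text"},
    {"name": "phase", "kind": "filter"},
    {"name": "school_type", "kind": "filter"},
    {"name": "status", "kind": "filter"},
]


def schema_fingerprint(fields):
    """Return a stable SHA-256 hex digest of the index field definitions."""
    # Sort by field name so that reordering the list does not change the hash.
    ordered = sorted(fields, key=lambda field: field["name"])
    canonical = json.dumps(ordered, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_rebuild(stored_fingerprint, fields):
    """True when no fingerprint is stored or the field definitions have changed."""
    return stored_fingerprint != schema_fingerprint(fields)
```

An indexing job could store the fingerprint next to the index after each successful build and call `needs_rebuild` at startup. A check like this covers schema changes only; data updates would still rely on the index trigger the ERD asks to be documented.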