Add 14 generator types to meet the 60+ claim documented on the website #47
… claim gap

New generators: TITLE, JOB_TITLE, NATIONALITY, COMPANY_NAME, DEPARTMENT, CURRENCY_CODE, DOMAIN_NAME, USER_AGENT, LATITUDE, LONGITUDE, TIME_ZONE, BOOLEAN, LOREM, TIMESTAMP

Updated: backend enum, GeneratorService logic, frontend enum, tests, README, website guide, and user guide documentation.

Agent-Logs-Url: https://github.com/MaximumTrainer/OpenDataMask/sessions/93036016-950b-41e8-8c56-906e6a23008c
Co-authored-by: MaximumTrainer <1376575+MaximumTrainer@users.noreply.github.com>
Pull request overview
This PR expands OpenDataMask’s generator catalog so the codebase matches the “60+ generator types” claim by adding 14 new Datafaker-backed generator types across backend, frontend, tests, and docs.
Changes:
- Added 14 new GeneratorType values (47 → 61) to backend + frontend enums.
- Implemented generation logic for the new types in GeneratorService and added corresponding unit tests.
- Updated docs/website and enum-count assertions to reflect 61 generator types.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| README.md | Updates the documented generator count and examples. |
| frontend/src/views/tests/ConnectionsView.test.ts | Updates enum-count assertion from 47 to 61. |
| frontend/src/types/index.ts | Adds the 14 new GeneratorType enum members. |
| docs/website/guide.html | Updates generator table to include new generator types/categories. |
| docs/user-guide.md | Adds example rows demonstrating new generator outputs. |
| backend/src/test/kotlin/com/opendatamask/domain/model/EnumAlignmentTest.kt | Updates canonical generator set to include the new types. |
| backend/src/test/kotlin/com/opendatamask/application/service/GeneratorServiceTest.kt | Adds tests covering the 14 new generators’ outputs/types. |
| backend/src/main/kotlin/com/opendatamask/domain/model/ColumnGenerator.kt | Adds the 14 new backend enum values. |
| backend/src/main/kotlin/com/opendatamask/application/service/GeneratorService.kt | Implements generation logic for the new enum values. |
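The enum-alignment and enum-count assertions listed above guard against the backend and frontend enums drifting apart. A minimal sketch of that kind of check, assuming a trimmed-down hypothetical `GeneratorType` and a hand-written canonical set (the real `EnumAlignmentTest.kt` is not shown on this page and may be structured differently):

```kotlin
// Hypothetical, trimmed-down enum; the real GeneratorType has 61 members.
enum class GeneratorType { TITLE, JOB_TITLE, BOOLEAN, LOREM, TIMESTAMP }

// Hypothetical canonical names that every layer is expected to expose.
val canonicalGeneratorTypes = setOf("TITLE", "JOB_TITLE", "BOOLEAN", "LOREM", "TIMESTAMP")

// Returns (names missing from the enum, names in the enum but not canonical).
fun enumDrift(): Pair<Set<String>, Set<String>> {
    val actual = GeneratorType.values().map { it.name }.toSet()
    return (canonicalGeneratorTypes - actual) to (actual - canonicalGeneratorTypes)
}

fun main() {
    val (missing, unexpected) = enumDrift()
    check(missing.isEmpty()) { "Missing from enum: $missing" }
    check(unexpected.isEmpty()) { "Not in canonical set: $unexpected" }
    println("GeneratorType matches the canonical set (${GeneratorType.values().size} members)")
}
```

Comparing sets of names rather than only a count reports which member is missing or unexpected, which a bare `47 → 61` count assertion cannot do.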
    GeneratorType.CURRENCY_CODE -> faker.currency().code()
    GeneratorType.DOMAIN_NAME -> faker.internet().domainName()
    GeneratorType.USER_AGENT -> faker.internet().userAgent().replace(Regex("\\s+"), " ")
    GeneratorType.LATITUDE -> faker.address().latitude()
USER_AGENT generation compiles a new Regex("\\s+") on every call via replace(Regex(...)), which can be a hot path when generating large datasets. Consider using a precompiled regex constant (e.g., in a companion object) or an alternative that avoids per-call regex construction.
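One way to apply this suggestion is sketched below. The `UserAgentNormalizer` object and its `normalize` function are hypothetical names used only for illustration; in the PR itself the constant would more likely sit in a companion object of `GeneratorService`, which is not shown here.

```kotlin
// Hypothetical holder for the precompiled pattern; not part of the PR.
object UserAgentNormalizer {
    // Compiled once when the object is first used, then reused on every call.
    private val WHITESPACE = Regex("\\s+")

    // Collapses each run of whitespace (spaces, tabs, newlines) into one space.
    fun normalize(raw: String): String = raw.replace(WHITESPACE, " ")
}

fun main() {
    // Stand-in input; the real value would come from faker.internet().userAgent().
    println(UserAgentNormalizer.normalize("Mozilla/5.0  (X11;\n Linux)"))
    // prints: Mozilla/5.0 (X11; Linux)
}
```

The behaviour is the same as the inline `replace(Regex("\\s+"), " ")`; only the point at which the pattern is compiled changes.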
    GeneratorType.TIME_ZONE -> faker.address().timeZone()
    GeneratorType.BOOLEAN -> faker.bool().bool()
    GeneratorType.LOREM -> faker.lorem().paragraph()
    GeneratorType.TIMESTAMP -> java.sql.Timestamp(faker.date().past(365 * 10, java.util.concurrent.TimeUnit.DAYS).time)
TIMESTAMP uses a fully-qualified java.util.concurrent.TimeUnit.DAYS while earlier generators in the same method use the imported TimeUnit.DAYS. For consistency/readability, use the same TimeUnit reference throughout this when block.
Suggested change:
    - GeneratorType.TIMESTAMP -> java.sql.Timestamp(faker.date().past(365 * 10, java.util.concurrent.TimeUnit.DAYS).time)
    + GeneratorType.TIMESTAMP -> java.sql.Timestamp(faker.date().past(365 * 10, TimeUnit.DAYS).time)
    | `JOB_TITLE` | `Senior Software Engineer` |
    | `NATIONALITY` | `Canadian` |
    | `DOMAIN_NAME` | `example.com` |
    | `TIMESTAMP` | `2024-06-15 14:30:00` |
The TIMESTAMP example (2024-06-15 14:30:00) likely won’t match actual output: java.sql.Timestamp.toString() typically includes fractional seconds (e.g., .0 or milliseconds). Consider updating the example to include fractional seconds or clarifying the displayed format.
Suggested change:
    - | `TIMESTAMP` | `2024-06-15 14:30:00` |
    + | `TIMESTAMP` | `2024-06-15 14:30:00.0` |
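The formatting difference can be checked with the JDK alone. The sketch below prints `java.sql.Timestamp` values directly and then through a hypothetical `formatToSeconds` helper (not part of the PR) that would produce the format the original doc example shows, if that format is the one actually wanted.

```kotlin
import java.sql.Timestamp
import java.text.SimpleDateFormat

// Hypothetical helper: formats to whole seconds in the JVM's default time zone.
fun formatToSeconds(ts: Timestamp): String =
    SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(ts)

fun main() {
    val whole = Timestamp.valueOf("2024-06-15 14:30:00")
    val withMillis = Timestamp.valueOf("2024-06-15 14:30:00.123")

    println(whole)                       // 2024-06-15 14:30:00.0
    println(withMillis)                  // 2024-06-15 14:30:00.123
    println(formatToSeconds(withMillis)) // 2024-06-15 14:30:00
}
```

Since a generated timestamp will usually carry non-zero milliseconds, the documented example either needs a fractional part or the output needs explicit formatting along these lines.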
    <tr><td>Personal</td><td>NAME, FIRST_NAME, LAST_NAME, EMAIL, PHONE, BIRTH_DATE, GENDER, TITLE, JOB_TITLE, NATIONALITY</td></tr>
    <tr><td>Address</td><td>ADDRESS, STREET_ADDRESS, CITY, STATE, ZIP_CODE, COUNTRY, GPS_COORDINATES, LATITUDE, LONGITUDE, TIME_ZONE</td></tr>
This row is labeled Address but now includes LATITUDE, LONGITUDE, and TIME_ZONE, which are described as "Location" generators in the PR description. Consider renaming the category (e.g., Location) or splitting into a separate row to keep the table taxonomy consistent.
Website and docs claim "60+ Generator Types" but only 47 exist. The other 6 advertised features (Auto PII Detection, Compliance Reports, Job Scheduling, Workspace Inheritance, Webhooks, REST API & CLI) are fully implemented with no gaps.
New generators (47 → 61)
- TITLE, JOB_TITLE, NATIONALITY
- COMPANY_NAME, DEPARTMENT
- CURRENCY_CODE
- DOMAIN_NAME, USER_AGENT
- LATITUDE, LONGITUDE, TIME_ZONE
- BOOLEAN, LOREM, TIMESTAMP

All backed by Datafaker, consistent with existing generator patterns.
Files touched
- ColumnGenerator.kt, GeneratorService.kt
- types/index.ts
- EnumAlignmentTest.kt canonical set, GeneratorServiceTest.kt (14 new test cases), ConnectionsView.test.ts count assertion
- README.md, guide.html, user-guide.md