Releases: litedatum/validatelite
**Release v0.5.0: Advanced Schema Soft Validation and Streamlined Definitions**
We are excited to announce the release of ValidateLite v0.5.0, a significant update focused on providing more intelligent, flexible, and user-friendly schema validation capabilities. This release introduces the concept of "Soft Validation" and marks a major step forward in data quality assurance.
✨ Key Features & Enhancements
🚀 New: Schema "Soft Validation" with desired_type
You can now validate data based on its convertibility to a desired type, not just its stored physical type. The new desired_type attribute in your JSON schema definitions enables powerful new validation scenarios.
For example, you can now verify:
- If a `string` field contains data that can be safely converted to a `float(12,2)`.
- If a `string` representing a date, like `"20250911"`, conforms to a specific `datetime('yyyymmdd')` format.
This feature is perfect for validating data integrity during ETL processes and ensuring data is ready for downstream applications.
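As an illustrative sketch of the first scenario (the attribute layout around `desired_type` is assumed from the column/type examples elsewhere in these notes, not a verbatim spec), a schema entry could look like:

```json
{
  "column": "amount",
  "type": "string",
  "desired_type": "float(12,2)"
}
```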
🧠 Smart Compatibility Engine
To optimize performance, ValidateLite now includes a smart compatibility engine. Before scanning data, it analyzes the source (native) data type and the target desired_type:
- Compatible Conversions are Skipped: Safe conversions, like from `integer(10)` to `string(20)`, are automatically passed without performing a costly data-level scan.
- Incompatible Conversions are Validated: Potentially problematic conversions, such as `string` to `integer` or `float(12,3)` to `float(10,2)`, trigger a precise data-level validation.
- Conflicting Conversions Fail Fast: Impossible conversions, like `float` to `datetime`, raise an immediate error, saving you time and resources.
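The three outcomes above can be sketched as a small classifier. This is an illustrative sketch only: the function and enum names are hypothetical, and the real engine's compatibility matrix is richer (it also weighs declared lengths, precision, and scale).

```python
from enum import Enum

class Compatibility(Enum):
    COMPATIBLE = "skip data scan"          # safe widening, passed without a scan
    NEEDS_VALIDATION = "data-level scan"   # possible but lossy; scan the rows
    CONFLICTING = "fail fast"              # impossible; error immediately

def classify(native: str, desired: str) -> Compatibility:
    """Classify a native -> desired_type conversion (hypothetical sketch)."""
    base = lambda t: t.split("(", 1)[0].strip()   # "float(12,3)" -> "float"
    n, d = base(native), base(desired)
    if d == "string":
        # Widening to text is treated as safe here; the real engine would
        # still compare declared lengths.
        return Compatibility.COMPATIBLE
    if {n, d} == {"float", "datetime"}:
        return Compatibility.CONFLICTING
    # Everything else (string -> integer, float(12,3) -> float(10,2), ...)
    # requires a precise data-level validation.
    return Compatibility.NEEDS_VALIDATION
```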
Simplified DDL-style Type Definitions
We've made defining types more intuitive and concise. You can now use a familiar DDL-like syntax, reducing verbosity in your JSON files.
Old Format:

```json
{
  "column": "name",
  "type": "string",
  "max_length": 50
}
```

New, Simplified Format:

```json
{
  "column": "name",
  "type": "string(50)"
}
```

This new format is supported for `string`, `integer`, `float`, `datetime`, and `binary` types. This change is fully backward-compatible; your existing schema files will continue to work without any modifications.
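A hypothetical parser for this shorthand (the real CLI's accepted spellings and error handling may differ) could map both formats onto the older attribute names:

```python
import re

def parse_ddl_type(spec: str) -> dict:
    """Split 'string(50)' / 'float(12,2)' / 'datetime' into base type and sizes.

    Illustrative sketch; not the project's actual parser.
    """
    m = re.fullmatch(r"(\w+)(?:\((\d+)(?:,(\d+))?\))?", spec.strip())
    if not m:
        raise ValueError(f"unrecognized type spec: {spec!r}")
    base, a, b = m.groups()
    out = {"type": base}
    if a is not None and b is not None:
        out["precision"], out["scale"] = int(a), int(b)   # float(12,2)
    elif a is not None:
        # string(50)/binary(50) carry a length; numeric types a precision.
        out["max_length" if base in ("string", "binary") else "precision"] = int(a)
    return out
```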
🛠️ Other Improvements
- New `length_rule`: A dedicated rule for length and precision checks, forming a core part of the new soft validation engine.
- Enhanced Rules: The `regex_rule` has been improved to support numeric pattern matching, and the `date_time_rule` now features better cross-database compatibility, including for SQLite and PostgreSQL.
🏛️ Architectural Note
This release introduces a sophisticated, CLI-driven two-phase validation strategy. The CLI first fetches metadata from the database, then intelligently generates a precise set of validation rules based on our new compatibility logic. This approach minimizes changes to the core execution engine, maximizing stability and reusing existing validation capabilities for peak performance.
We believe v0.5.0 will make your data validation workflows more powerful and efficient. Thank you for your continued support!
Enhance Schema Rule to Validate Length, Precision, and Scale
The implementation will extend the existing SCHEMA rule to be "dialect-aware" and fully backward-compatible.
- Extend Rule Definition: Users will be able to specify the following optional attributes for a column in their rules file:
  - `length`: For string and binary types.
  - `precision`: For integer and float types.
  - `scale`: For float types.
  - `datetime_precision`: For datetime types.
- Dialect-Aware Implementation: The core validation logic will leverage the existing `DatabaseDialect` system to fetch detailed column metadata. This ensures the solution is robust, extensible, and avoids fragile string parsing of data types.
- Backward Compatibility: This is a critical requirement. If a user's rule file does not contain these new attributes, the SCHEMA rule's behavior will be identical to the current implementation.
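Under these definitions, a rule entry exercising the new attributes might look like the following sketch (the `field`/`type` keys mirror the rules-file examples elsewhere in these notes; the exact key names are otherwise an assumption):

```json
{
  "field": "price",
  "type": "float",
  "precision": 12,
  "scale": 2
}
```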
What's Changed
- chore: update docs by @litedatum in #33
- Feature/enhance schema rule by @litedatum in #35
- Enhanced Schema Validation (v0.4.3) by @litedatum in #36
Full Changelog: 0.4.2...0.4.3
Release v0.4.2
Overview
The core objective is to improve the CLI parameter design and enhance the functionality of the schema command.
We will transition from the current pattern of using positional parameters <data_source> to a clearer, more flexible option parameter (--conn, --table) pattern. This will be a key step toward supporting multi-table validation.
Requirement Details
1. `check` Command Interface Standardization
- Objective: Make the `check` command interface more explicit by separating data source connection information from table names.
- Current State (As-Is): `vlite check <connection_and_table_string> --rules <file.json>`
- Future State (To-Be): `vlite check --conn <connection_string> --table <table_name> --rules <file.json>`
- Acceptance Criteria:
  - Remove dependency on the positional parameter `<data_source>`.
  - Introduce two new, required option parameters:
    - `--conn <string>`: Used to specify the database connection string or file path.
    - `--table <string>`: Used to specify the table name or file name to validate.
  - The old `vlite check <data_source>` format should be marked as "deprecated" with clear prompt messages guiding users to use the new format.
  - Update the `vlite check --help` documentation to reflect the new parameter design.
2. `schema` Command Functionality Enhancement: Multi-Table Validation Support
- Objective: Enable the `schema` command to use a single rules file to validate multiple table structures in a specified data source at once.
- Current State (As-Is): `vlite schema <connection_and_table_string> --rules <single_table_schema.json>`
- Future State (To-Be): `vlite schema --conn <connection_string> --rules <multi_table_schema.json>`
- Acceptance Criteria:
  - CLI Interface Changes:
    - Similar to the `check` command, remove dependency on the positional parameter `<data_source>`.
    - Introduce the required `--conn <string>` option parameter.
    - Note: The `schema` command in multi-table mode does not require the `--table` parameter, as all tables to be validated will be defined in the rules file.
  - Redefine `--rules` File Structure:
    - To support multiple tables, introduce a new JSON structure: a top-level object with table names as keys and schema definitions for those tables as values.
    - Example (multi_table_schema.json):

```json
{
  "users": {
    "rules": [
      { "field": "id", "type": "integer", "required": true },
      { "field": "age", "type": "integer", "min": 0, "max": 120 },
      { "field": "gender", "type": "string", "enum": ["M", "F"] },
      { "field": "email", "type": "string", "required": true },
      { "field": "created_at", "type": "datetime" }
    ]
  },
  "products": {
    "rules": [
      { "field": "product_id", "type": "integer" },
      { "field": "price", "type": "float" }
    ],
    "strict_mode": false
  }
}
```

  - Update `schema` Command Execution Logic:
    - After the program loads the `--rules` file, iterate through all top-level keys of the JSON object (`users`, `products`, etc.).
    - For each key (table name), retrieve the actual schema information for that table from the data source specified by `--conn`.
    - Compare the actual schema with the expected schema defined in the rules file.
  - Optimize Output Information: The validation report must be clearly grouped, indicating the validation results for each table.
    - Example Output:

```
Schema validation results for connection: mysql://...

📋 Table: users (2,000 records)
  ✓ id: OK
  ✓ email: OK

📋 Table: products (100 records)
  ✓ product_id: OK
  ✗ price: FAILED - Expected type 'float', found 'decimal'.
```

  - Excel multi-sheet file support as data source.
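The execution logic above can be sketched as a small loop. This is a sketch under stated assumptions, not the project's implementation: `validate_schemas` and `fetch_actual_schema` are hypothetical names, and the metadata lookup against the `--conn` data source is injected as a callable so the example stays self-contained.

```python
import json

def validate_schemas(rules_json: str, fetch_actual_schema) -> dict:
    """Iterate top-level table keys and compare expected vs. actual types.

    fetch_actual_schema(table) is assumed to return {column: native_type}.
    """
    rules = json.loads(rules_json)
    report = {}
    for table, spec in rules.items():          # top-level keys are table names
        actual = fetch_actual_schema(table)
        results = []
        for rule in spec["rules"]:
            field, expected = rule["field"], rule.get("type")
            if field not in actual:
                results.append((field, "FAILED - field missing"))
            elif expected and actual[field] != expected:
                results.append((field, f"FAILED - Expected type '{expected}', "
                                       f"found '{actual[field]}'."))
            else:
                results.append((field, "OK"))
        report[table] = results
    return report
```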
Release v0.4.1
Fix CI and tag issues.
Feature: Schema Validation Command
Summary
Add a new CLI command to validate dataset schema definitions against data sources. The command reads a JSON rules file, decomposes it into atomic validation rules, dispatches them to the core rule engine, and aggregates results for CLI output. No inline rules for schema are supported initially.
Motivation
- Ensure data sources conform to predefined schema (field presence and type).
- Reuse existing rule execution infrastructure while keeping CLI changes isolated.
- Provide a scalable path to higher-level schema authoring, while core focuses on atomic checks.
Scope
- New CLI command: `schema`.
- CLI-only rule decomposition from schema JSON to atomic rules.
- Core: add a new `Schema` rule type for field existence and data type matching.
- Output and error handling aligned with existing `check` behavior.
- Tests, docs, and CI integration to maintain coverage and quality.
CLI Specification
- Command: `vlite schema "data-source" --rules schema.json`
- Arguments
  - `data-source`: same format and resolution logic as `check` (e.g., connection string, path, table selector).
  - `--rules`/`-r`: path to a JSON rules file (no inline supported).
  - Table resolution: in v1 the table is derived exclusively from `data-source`. If a `table` field is present in the rules file, it is ignored and a warning is emitted.
  - Optional flags (matching existing conventions): `--output json|table`, `--fail-on-error`, `--max-errors N`, `--verbose`.
- Exit codes
  - 0: all validations passed.
  - 1: validation failures.
  - 2: CLI/configuration error (e.g., unreadable file, invalid JSON).
- Output
  - Human-readable table by default; JSON when `--output json` is used.
  - Aggregated result summarizing total checks, failures, and per-field details.
Rules File Format
- Single-table file (v1); do not include a top-level `table`. The target table is resolved from `data-source`.
- Example:

```json
{
  "rules": [
    { "field": "id", "type": "integer", "required": true },
    { "field": "age", "type": "integer", "required": true, "min": 0, "max": 120 },
    { "field": "has_children", "enum": [0, 1] },
    { "field": "income", "type": "float", "required": true, "min": 0 },
    { "field": "job_category", "type": "string", "enum": ["engineer", "teacher", "doctor", "other"] }
  ]
}
```

- Supported properties
  - `field` (string, required)
  - `type` (enum via `shared/enums`: STRING, INTEGER, FLOAT, BOOLEAN, DATE, DATETIME). Length/precision are not considered in v1.
  - `required` (boolean)
  - `enum` (array)
  - `min`/`max` (numeric; applies to numeric types)
- Limitations
  - No inline schema rules.
  - Initial version supports one table per file; multi-table files considered later.
  - No `jsonschema` dependency in v1; the CLI performs minimal manual validation of the rules file.
Behavior and Rule Decomposition
- CLI maps each entry into:
- Schema rule: verifies field exists and type matches.
  - not_null rule: for `required: true`.
  - range rule: for numeric `min`/`max`.
  - enum rule: for enumerations.
- CLI sends decomposed rules to core, receives results, and aggregates them back into field-level outcomes.
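The mapping above can be sketched as follows. The atomic-rule dict shape and the `decompose` name are hypothetical; only the mapping itself comes from the list above.

```python
def decompose(entry: dict) -> list:
    """Map one rules-file entry to a list of atomic rule dicts (sketch)."""
    field = entry["field"]
    # Schema rule always present: field existence + type match.
    rules = [{"rule": "schema", "field": field, "type": entry.get("type")}]
    if entry.get("required"):
        rules.append({"rule": "not_null", "field": field})
    if "min" in entry or "max" in entry:
        rules.append({"rule": "range", "field": field,
                      "min": entry.get("min"), "max": entry.get("max")})
    if "enum" in entry:
        rules.append({"rule": "enum", "field": field, "values": entry["enum"]})
    return rules
```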
Aggregation and Prioritization
- Evaluation order per field: existence → type → not_null → range/enum.
- If the field is missing, report a single failure for the field with reason "FIELD_MISSING" and mark dependent checks as "SKIPPED".
- If the type mismatches, report a single failure with reason "TYPE_MISMATCH" and mark not_null/range/enum as "SKIPPED".
- Only when existence and type pass will not_null/range/enum be executed and reported.
- CLI output aggregates per field, prioritizing the most fundamental cause; skipped dependents are visible in JSON output (when requested) with their skip reason, but are not duplicated as failures in human-readable output.
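The prioritization above can be sketched per field as follows; the function name and result-dict shape are hypothetical, while the ordering and skip reasons follow the rules listed.

```python
def aggregate_field(field_exists: bool, type_ok: bool, dependents: dict) -> dict:
    """dependents maps check name ('not_null', 'range', 'enum') -> passed?

    Existence and type are fundamental: if either fails, report a single
    failure and mark all dependent checks as skipped.
    """
    if not field_exists:
        return {"status": "FAILED", "reason": "FIELD_MISSING",
                "skipped": sorted(dependents)}
    if not type_ok:
        return {"status": "FAILED", "reason": "TYPE_MISMATCH",
                "skipped": sorted(dependents)}
    failed = sorted(name for name, ok in dependents.items() if not ok)
    return {"status": "FAILED" if failed else "PASSED",
            "failed_checks": failed, "skipped": []}
```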
Acceptance Criteria
- New command works with valid JSON rule files and fails gracefully on invalid input.
- Core `Schema` rule verifies presence and type using `shared/enums` and `shared/utils`.
- CLI output mirrors `check` style; exit codes match spec.
- Unit and integration tests; ≥80% coverage maintained.
- Docs updated: `README.md`, `DEVELOPMENT.md`, `CHANGELOG.md`.
- Table name, if present in the rules file, is ignored with a warning; the table is derived from `data-source`.
- Aggregation behavior follows the prioritization rules above; dependent checks are marked as skipped when blocked.
Release 0.3.1
Full Changelog: 0.3.0...0.3.1
Release v0.3.0 - Enhanced Project Maturity
What's New in v0.3.0
This release marks a significant milestone in ValidateLite's development, reflecting enhanced project maturity and stability.
✨ Key Enhancements
**Enhanced Project Maturity**
- Comprehensive test coverage improvements
- Robust CI/CD pipeline with automated testing and security scanning
- Advanced rule engine supporting complex validation scenarios
🛡️ Improved Reliability
- Enhanced error handling and classification system
- Better error reporting and user experience
- Improved configuration management and validation
🔧 Developer Experience
- Comprehensive documentation and development guides
- Pre-commit hooks and code quality enforcement
- Performance optimizations and monitoring capabilities
🗄️ Database Support
- Enhanced support for multiple database dialects
- Improved connection type handling
- Better integration with MySQL, PostgreSQL, and SQLite
📦 Installation
```
pip install validatelite==0.3.0
```

Migration from v0.1.0
This release is backward compatible with v0.1.0. No breaking changes have been introduced.
Documentation
🧪 Testing
The project now includes:
- Comprehensive unit, integration, and E2E tests
- Automated security scanning
- Code quality enforcement
- Performance monitoring
🤝 Contributing
We welcome contributions! Please see our Contributing Guidelines and Code of Conduct.
📋 Full Changelog
For detailed changes, see CHANGELOG.md.
Thank you for using ValidateLite! 🎯
If you encounter any issues or have questions, please open an issue on GitHub.