
Releases: litedatum/validatelite

**Release v0.5.0: Advanced Schema Soft Validation and Streamlined Definitions**

19 Sep 03:32
70bab5e


We are excited to announce the release of ValidateLite v0.5.0, a significant update focused on providing more intelligent, flexible, and user-friendly schema validation capabilities. This release introduces the concept of "Soft Validation" and marks a major step forward in data quality assurance.


✨ Key Features & Enhancements

🚀 New: Schema "Soft Validation" with desired_type

You can now validate data based on its convertibility to a desired type, not just its stored physical type. The new desired_type attribute in your JSON schema definitions enables powerful new validation scenarios.

For example, you can now verify:

  • If a string field contains data that can be safely converted to a float(12,2).
  • If a string representing a date, like "20250911", conforms to a specific datetime('yyyymmdd') format.

This feature is perfect for validating data integrity during ETL processes and ensuring data is ready for downstream applications.
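
To make the idea concrete, a convertibility check of this kind might be sketched as follows. The helper names are hypothetical illustrations, not ValidateLite's API, and quoted formats like 'yyyymmdd' are mapped to strptime codes here:

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

def convertible_to_float(value: str, precision: int, scale: int) -> bool:
    """Check whether a string can be safely stored as float(precision, scale)."""
    try:
        d = Decimal(value)
    except InvalidOperation:
        return False
    _, digits, exponent = d.as_tuple()
    frac_digits = max(0, -exponent)                  # digits after the decimal point
    int_digits = max(1, len(digits) - frac_digits)   # digits before it
    return frac_digits <= scale and int_digits <= precision - scale

def convertible_to_datetime(value: str, fmt: str = "%Y%m%d") -> bool:
    """Check whether a string conforms to a datetime format like 'yyyymmdd'."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

print(convertible_to_float("1234567890.12", 12, 2))  # True: fits float(12,2)
print(convertible_to_float("0.123", 12, 2))          # False: too many decimal digits
print(convertible_to_datetime("20250911"))           # True
print(convertible_to_datetime("2025-09-11"))         # False: wrong format
```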

🧠 Smart Compatibility Engine

To optimize performance, ValidateLite now includes a smart compatibility engine. Before scanning data, it analyzes the source (native) data type and the target desired_type:

  • Compatible Conversions are Skipped: Safe conversions, like from integer(10) to string(20), are automatically passed without performing a costly data-level scan.
  • Incompatible Conversions are Validated: Potentially problematic conversions, such as string to integer or float(12,3) to float(10,2), trigger a precise data-level validation.
  • Conflicting Conversions Fail Fast: Impossible conversions, like float to datetime, will raise an immediate error, saving you time and resources.
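
The three-way decision above could be sketched roughly like this; the compatibility table here is illustrative only and is far simpler than ValidateLite's actual rules (e.g. the width comparison for integer-to-string widening is glossed over):

```python
from enum import Enum

class Compatibility(Enum):
    COMPATIBLE = "skip scan"       # safe conversion, no data-level scan
    INCOMPATIBLE = "validate rows" # run a precise data-level validation
    CONFLICTING = "fail fast"      # impossible conversion, raise immediately

# Illustrative conflict pairs only -- not ValidateLite's actual table.
CONFLICTS = {("float", "datetime"), ("datetime", "float"), ("datetime", "integer")}

def classify(native: str, desired: str) -> Compatibility:
    base_n, base_d = native.split("(")[0], desired.split("(")[0]
    if (base_n, base_d) in CONFLICTS:
        return Compatibility.CONFLICTING
    if native == desired:
        return Compatibility.COMPATIBLE
    # Safe widening: an integer always fits into a sufficiently wide string.
    if base_n == "integer" and base_d == "string":
        return Compatibility.COMPATIBLE
    # Everything else needs a row-level check in this sketch.
    return Compatibility.INCOMPATIBLE

print(classify("integer(10)", "string(20)").value)   # skip scan
print(classify("string(20)", "integer(10)").value)   # validate rows
print(classify("float(12,3)", "float(10,2)").value)  # validate rows
print(classify("float(12,2)", "datetime").value)     # fail fast
```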

📝 Simplified DDL-style Type Definitions

We've made defining types more intuitive and concise. You can now use a familiar DDL-like syntax, reducing verbosity in your JSON files.

Old Format:

{
  "column": "name",
  "type": "string",
  "max_length": 50
}

New, Simplified Format:

{
  "column": "name",
  "type": "string(50)"
}

This new format is supported for string, integer, float, datetime, and binary types. This change is fully backward-compatible; your existing schema files will continue to work without any modifications.
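
A parser for this DDL-style syntax might look roughly like the sketch below. It is illustrative only (quoted format arguments such as datetime('yyyymmdd') are omitted for brevity), not ValidateLite's actual parser:

```python
import re

def parse_ddl_type(spec: str) -> dict:
    """Parse a DDL-style type like 'string(50)' or 'float(12,2)' into parts."""
    m = re.fullmatch(r"(\w+)(?:\((\d+)(?:,\s*(\d+))?\))?", spec.strip())
    if not m:
        raise ValueError(f"unrecognized type spec: {spec!r}")
    name, p1, p2 = m.groups()
    out = {"type": name}
    if name == "string" and p1:
        out["max_length"] = int(p1)        # string(50) -> max_length 50
    elif p1:
        out["precision"] = int(p1)         # float(12,2) -> precision 12
        if p2:
            out["scale"] = int(p2)         # ... scale 2
    return out

print(parse_ddl_type("string(50)"))   # {'type': 'string', 'max_length': 50}
print(parse_ddl_type("float(12,2)"))  # {'type': 'float', 'precision': 12, 'scale': 2}
print(parse_ddl_type("datetime"))     # {'type': 'datetime'}
```

Because the expanded form (separate "type" and "max_length" keys) is still accepted, old schema files keep working unchanged.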

🛠️ Other Improvements

  • New length_rule: A dedicated rule for length and precision checks, forming a core part of the new soft validation engine.
  • Enhanced Rules: The regex_rule has been improved to support numeric pattern matching, and the date_time_rule now features better cross-database compatibility, including for SQLite and PostgreSQL.

🏛️ Architectural Note

This release introduces a sophisticated, CLI-driven two-phase validation strategy. The CLI first fetches metadata from the database, then intelligently generates a precise set of validation rules based on our new compatibility logic. This approach minimizes changes to the core execution engine, maximizing stability and reusing existing validation capabilities for peak performance.

We believe v0.5.0 will make your data validation workflows more powerful and efficient. Thank you for your continued support!

Enhance Schema Rule to Validate Length, Precision, and Scale

08 Sep 01:17
c577111


The implementation will extend the existing SCHEMA rule to be "dialect-aware" and fully backward-compatible.

  1. Extend Rule Definition: Users will be able to specify the following optional attributes for a column in their rules file:

    • length: For string and binary types.
    • precision: For integer and float types.
    • scale: For float types.
    • datetime_precision: For datetime types.
  2. Dialect-Aware Implementation: The core validation logic will leverage the existing DatabaseDialect system to fetch detailed column metadata. This ensures the solution is robust, extensible, and avoids fragile string parsing of data types.

  3. Backward Compatibility: This is a critical requirement. If a user's rule file does not contain these new attributes, the SCHEMA rule's behavior will be identical to the current implementation.
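
A sketch of how such a metadata comparison might work; the check_column helper and its dict-shaped metadata are hypothetical stand-ins for what the DatabaseDialect layer would return, not the actual API:

```python
def check_column(meta: dict, rule: dict) -> list:
    """Compare fetched column metadata against optional rule attributes.

    Returns a list of mismatch messages; an empty list means the column passes.
    """
    failures = []
    for attr in ("length", "precision", "scale", "datetime_precision"):
        if attr in rule and meta.get(attr) != rule[attr]:
            failures.append(f"{attr}: expected {rule[attr]}, found {meta.get(attr)}")
    return failures

# Backward compatible: a rule without the new attributes checks nothing extra.
print(check_column({"length": 50}, {}))               # []
print(check_column({"length": 50}, {"length": 100}))  # ['length: expected 100, found 50']
```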

What's Changed

Full Changelog: 0.4.2...0.4.3

Release v0.4.2

28 Aug 02:56
0a60502


Overview

The core objective is to improve the CLI parameter design and enhance the functionality of the schema command.

We will transition from the current positional-parameter pattern (<data_source>) to a clearer, more flexible option-parameter pattern (--conn, --table). This is a key step toward supporting multi-table validation.


Requirement Details

1. check Command Interface Standardization

  • Objective: Make the check command interface more explicit by separating data source connection information from table names.
  • Current State (As-Is): vlite check <connection_and_table_string> --rules <file.json>
  • Future State (To-Be): vlite check --conn <connection_string> --table <table_name> --rules <file.json>
  • Acceptance Criteria:
    1. Remove dependency on positional parameter <data_source>.
    2. Introduce two new, required option parameters:
      • --conn <string>: Used to specify database connection string or file path.
      • --table <string>: Used to specify the table name or file name to validate.
    3. The old vlite check <data_source> format should be marked as "deprecated" with clear prompt messages guiding users to use the new format.
    4. Update the vlite check --help help documentation to reflect the new parameter design.

2. schema Command Functionality Enhancement: Multi-Table Validation Support

  • Objective: Enable the schema command to use a single rules file to validate multiple table structures in a specified data source at once.
  • Current State (As-Is): vlite schema <connection_and_table_string> --rules <single_table_schema.json>
  • Future State (To-Be): vlite schema --conn <connection_string> --rules <multi_table_schema.json>
  • Acceptance Criteria:
    1. CLI Interface Changes:
      • Similar to the check command, remove dependency on positional parameter <data_source>.
      • Introduce required --conn <string> option parameter.
      • Note: The schema command in multi-table mode does not require the --table parameter, as all tables to be validated will be defined in the rules file.
    2. Redefine --rules File Structure:
      • To support multiple tables, introduce a new JSON structure: a top-level object with table names as keys and schema definitions for those tables as values.
      • Example (multi_table_schema.json):
        {
          "users": {
            "rules": [
              { "field": "id", "type": "integer", "required": true },
              { "field": "age", "type": "integer", "min": 0, "max": 120 },
              { "field": "gender", "type": "string", "enum": ["M", "F"] },
              { "field": "email", "type": "string", "required": true },
              { "field": "created_at", "type": "datetime" }
            ]
          },
          "products": {
            "rules": [
              { "field": "product_id", "type": "integer" },
              { "field": "price", "type": "float" }
            ],
            "strict_mode": false
          }
        }
    3. Update schema Command Execution Logic:
      • After the program loads the --rules file, iterate through all top-level keys of the JSON object (users, products, etc.).
      • For each key (table name), retrieve the actual schema information for that table from the data source specified by --conn.
      • Compare the actual schema with the expected schema defined in the rules file.
    4. Optimize Output Information: The validation report must be clearly grouped, indicating the validation results for each table.
      • Example Output:
        Schema validation results for connection: mysql://...
        
        📋 Table: users (2,000 records)
        ✓ id: OK
        ✓ email: OK
        
        📋 Table: products (100 records)
        ✓ product_id: OK
        ✗ price: FAILED - Expected type 'float', found 'decimal'.
        
    5. Support Excel multi-sheet files as a data source.
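
The execution logic in step 3 could be sketched as follows; fetch_schema is a hypothetical stand-in for retrieving actual column metadata via --conn, and the failure message mirrors the example output above:

```python
import json

MULTI_TABLE_RULES = """
{
  "users":    {"rules": [{"field": "id", "type": "integer"}]},
  "products": {"rules": [{"field": "price", "type": "float"}]}
}
"""

def fetch_schema(table: str) -> dict:
    # Pretend metadata from the data source; 'price' is stored as decimal.
    return {"users": {"id": "integer"}, "products": {"price": "decimal"}}[table]

def validate_all(rules_json: str) -> dict:
    results = {}
    for table, spec in json.loads(rules_json).items():  # iterate top-level keys
        actual = fetch_schema(table)
        results[table] = {
            r["field"]: "OK" if actual.get(r["field"]) == r["type"]
            else f"FAILED - Expected type '{r['type']}', found '{actual.get(r['field'])}'."
            for r in spec["rules"]
        }
    return results

for table, cols in validate_all(MULTI_TABLE_RULES).items():
    print(f"Table: {table} -> {cols}")
```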

Release v0.4.1

15 Aug 00:17
4808e74


Fix CI and tag issues.

Feature: Schema Validation Command

14 Aug 22:06
71e6502


Summary

Add a new CLI command to validate dataset schema definitions against data sources. The command reads a JSON rules file, decomposes it into atomic validation rules, dispatches them to the core rule engine, and aggregates results for CLI output. No inline rules for schema are supported initially.

Motivation

  • Ensure data sources conform to predefined schema (field presence and type).
  • Reuse existing rule execution infrastructure while keeping CLI changes isolated.
  • Provide a scalable path to higher-level schema authoring, while core focuses on atomic checks.

Scope

  • New CLI command: schema.
  • CLI-only rule decomposition from schema JSON to atomic rules.
  • Core: add a new Schema rule type for field existence and data type matching.
  • Output and error handling aligned with existing check behavior.
  • Tests, docs, and CI integration to maintain coverage and quality.

CLI Specification

  • Command
    • vlite schema "data-source" --rules schema.json
  • Arguments
    • data-source: same format and resolution logic as check (e.g., connection string, path, table selector).
    • --rules/-r: path to a JSON rules file (no inline supported).
    • Table resolution: in v1 the table is derived exclusively from data-source. If a table field is present in the rules file, it is ignored and a warning is emitted.
    • Optional flags (matching existing conventions): --output json|table, --fail-on-error, --max-errors N, --verbose.
  • Exit codes
    • 0: all validations passed.
    • 1: validation failures.
    • 2: CLI/configuration error (e.g., unreadable file, invalid JSON).
  • Output
    • Human-readable table by default; JSON when --output json is used.
    • Aggregated result summarizing total checks, failures, and per-field details.

Rules File Format

  • Single-table file (v1); do not include a top-level table. The target table is resolved from data-source.
  • Example:
    {
      "rules": [
        { "field": "id", "type": "integer", "required": true },
        { "field": "age", "type": "integer", "required": true, "min": 0, "max": 120 },
        { "field": "has_children", "enum": [0, 1] },
        { "field": "income", "type": "float", "required": true, "min": 0 },
        { "field": "job_category", "type": "string", "enum": ["engineer", "teacher", "doctor", "other"] }
      ]
    }
  • Supported properties
    • field (string, required)
    • type (enum via shared/enums: STRING, INTEGER, FLOAT, BOOLEAN, DATE, DATETIME). Length/precision are not considered in v1.
    • required (boolean)
    • enum (array)
    • min/max (numeric; applies to numeric types)
  • Limitations
    • No inline schema rules.
    • Initial version supports one table per file; multi-table files considered later.
    • No jsonschema dependency in v1; the CLI performs minimal manual validation of the rules file.
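
That minimal manual validation (no jsonschema dependency) might look like this sketch; the function name and error messages are illustrative, not the CLI's actual code:

```python
def validate_rules_file(doc: dict) -> list:
    """Minimal manual validation of a v1 rules file; returns error messages."""
    rules = doc.get("rules")
    if not isinstance(rules, list):
        return ["top-level 'rules' must be a list"]
    allowed_types = {"string", "integer", "float", "boolean", "date", "datetime"}
    errors = []
    for i, rule in enumerate(rules):
        if not isinstance(rule.get("field"), str):
            errors.append(f"rules[{i}]: 'field' is required and must be a string")
        rtype = rule.get("type")
        if rtype is not None and rtype not in allowed_types:
            errors.append(f"rules[{i}]: unknown type {rtype!r}")
        if "enum" in rule and not isinstance(rule["enum"], list):
            errors.append(f"rules[{i}]: 'enum' must be an array")
    return errors

print(validate_rules_file({"rules": [{"field": "id", "type": "integer"}]}))  # []
print(validate_rules_file({"rules": [{"type": "integer"}]}))  # field error
```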

Behavior and Rule Decomposition

  • CLI maps each entry into:
    • Schema rule: verifies field exists and type matches.
    • not_null rule: for required: true.
    • range rule: for numeric min/max.
    • enum rule: for enumerations.
  • CLI sends decomposed rules to core, receives results, and aggregates them back into field-level outcomes.
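
The mapping above can be sketched as follows (illustrative rule dicts, not the CLI's actual internal representation):

```python
def decompose(entry: dict) -> list:
    """Map one rules-file entry to atomic rules, per the decomposition above."""
    atomic = [{"rule": "schema", "field": entry["field"], "type": entry.get("type")}]
    if entry.get("required"):
        atomic.append({"rule": "not_null", "field": entry["field"]})
    if "min" in entry or "max" in entry:
        atomic.append({"rule": "range", "field": entry["field"],
                       "min": entry.get("min"), "max": entry.get("max")})
    if "enum" in entry:
        atomic.append({"rule": "enum", "field": entry["field"],
                       "values": entry["enum"]})
    return atomic

rules = decompose({"field": "age", "type": "integer",
                   "required": True, "min": 0, "max": 120})
print([r["rule"] for r in rules])  # ['schema', 'not_null', 'range']
```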

Aggregation and Prioritization

  • Evaluation order per field: existence → type → not_null → range/enum.
  • If the field is missing, report a single failure for the field with reason "FIELD_MISSING" and mark dependent checks as "SKIPPED".
  • If the type mismatches, report a single failure with reason "TYPE_MISMATCH" and mark not_null/range/enum as "SKIPPED".
  • Only when existence and type pass will not_null/range/enum be executed and reported.
  • CLI output aggregates per field, prioritizing the most fundamental cause; skipped dependents are visible in JSON output (when requested) with their skip reason, but are not duplicated as failures in human-readable output.
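
The prioritization rules above might be sketched as follows (field names, reasons, and statuses follow the spec; the helper itself is illustrative):

```python
def aggregate(field: str, existence_ok: bool, type_ok: bool,
              dependents: dict) -> dict:
    """Report the most fundamental failure; mark blocked checks as SKIPPED."""
    if not existence_ok:
        return {"field": field, "status": "FAILED", "reason": "FIELD_MISSING",
                "dependents": {name: "SKIPPED" for name in dependents}}
    if not type_ok:
        return {"field": field, "status": "FAILED", "reason": "TYPE_MISMATCH",
                "dependents": {name: "SKIPPED" for name in dependents}}
    # Existence and type passed: dependent checks actually ran.
    status = "PASSED" if all(dependents.values()) else "FAILED"
    return {"field": field, "status": status,
            "dependents": {n: "PASSED" if ok else "FAILED"
                           for n, ok in dependents.items()}}

print(aggregate("age", existence_ok=False, type_ok=False,
                dependents={"not_null": True, "range": True}))
```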

Acceptance Criteria

  • New command works with valid JSON rule files and fails gracefully on invalid input.
  • Core Schema rule verifies presence and type using shared/enums and shared/utils.
  • CLI output mirrors check style; exit codes match spec.
  • Unit and integration tests; ≥80% coverage maintained.
  • Docs updated: README.md, DEVELOPMENT.md, CHANGELOG.md.
  • Table name, if present in the rules file, is ignored with a warning; the table is derived from data-source.
  • Aggregation behavior follows the prioritization rules above; dependent checks are marked as skipped when blocked.

Release 0.3.1

06 Aug 02:26


Full Changelog: 0.3.0...0.3.1

Release v0.3.0 - Enhanced Project Maturity

05 Aug 21:53


What's New in v0.3.0

This release marks a significant milestone in ValidateLite's development, reflecting enhanced project maturity and stability.

✨ Key Enhancements

**Enhanced Project Maturity**

  • Comprehensive test coverage improvements
  • Robust CI/CD pipeline with automated testing and security scanning
  • Advanced rule engine supporting complex validation scenarios

🛡️ Improved Reliability

  • Enhanced error handling and classification system
  • Better error reporting and user experience
  • Improved configuration management and validation

🔧 Developer Experience

  • Comprehensive documentation and development guides
  • Pre-commit hooks and code quality enforcement
  • Performance optimizations and monitoring capabilities

🗄️ Database Support

  • Enhanced support for multiple database dialects
  • Improved connection type handling
  • Better integration with MySQL, PostgreSQL, and SQLite

📦 Installation

pip install validatelite==0.3.0

Migration from v0.1.0

This release is backward compatible with v0.1.0. No breaking changes have been introduced.

Documentation

🧪 Testing

The project now includes:

  • Comprehensive unit, integration, and E2E tests
  • Automated security scanning
  • Code quality enforcement
  • Performance monitoring

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines and Code of Conduct.

📋 Full Changelog

For detailed changes, see CHANGELOG.md.


Thank you for using ValidateLite! 🎯

If you encounter any issues or have questions, please open an issue on GitHub.