Metadata-Version: 2.4
Name: ZogPy
Version: 0.1.1
Summary: A library for quick data verification
Author: Tristan Hertzog
License: MIT
Project-URL: Homepage, https://github.com/Tmanhertzog/ZogPy
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.24
Dynamic: license-file

# ZogPy
A test PyPI package for quick data verification with pandas.


# To Implement

#### Schema & Structure Validation
* validate_schema(df, required_cols, dtypes=None)
  * Checks required columns exist
  * Optional dtype enforcement
  * Detects unexpected columns
* validate_dtypes(df, dtype_map)
  * Enforces numeric / categorical / datetime
  * Handles nullable pandas dtypes
* validate_shape(df, min_rows=None, max_rows=None)
  * Dataset sanity checks
  * Empty dataset detection
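
Since these functions are still to be implemented, the following is only a minimal sketch of how `validate_schema` might behave. The return type (a list of problem strings) and the string-based dtype comparison are assumptions, not the package's actual design.

```python
import pandas as pd


def validate_schema(df, required_cols, dtypes=None):
    """Hypothetical sketch: return a list of problems (empty list means OK)."""
    problems = []
    # Required columns that are absent from the frame.
    missing = [c for c in required_cols if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Columns present in the frame but not declared as required.
    unexpected = [c for c in df.columns if c not in required_cols]
    if unexpected:
        problems.append(f"unexpected columns: {unexpected}")
    # Optional dtype enforcement, compared by dtype name (an assumption).
    for col, expected in (dtypes or {}).items():
        if col in df.columns and str(df[col].dtype) != str(expected):
            problems.append(f"{col}: expected {expected}, got {df[col].dtype}")
    return problems


df = pd.DataFrame({"user_id": [1, 2], "age": [30.5, 41.0], "note": ["a", "b"]})
problems = validate_schema(df, ["user_id", "age", "email"], dtypes={"age": "float64"})
print(problems)
```

Returning the problems instead of raising keeps the check easy to log; a raising variant would be an equally reasonable choice.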

#### Missing Data Validation
* check_missing(df, threshold=0.0)
  * Percent missing per column
  * Fail if above threshold
* require_non_null(df, cols)
  * Strong constraint for critical fields
* report_missing_summary(df)
  * Returns structured summary
  * Useful for logging / dashboards
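
One possible shape for `check_missing`, sketched under the assumption that `threshold` is a fraction between 0 and 1 and that exceeding it raises `ValueError`. Both choices are illustrative; the list above does not fix them.

```python
import pandas as pd


def check_missing(df, threshold=0.0):
    """Hypothetical sketch: fraction missing per column; raise if any exceeds threshold."""
    frac = df.isna().mean()  # mean of a boolean mask = fraction of missing values
    too_high = frac[frac > threshold]
    if not too_high.empty:
        raise ValueError(
            f"columns above missing threshold {threshold}: {too_high.to_dict()}"
        )
    return frac


df = pd.DataFrame(
    {"user_id": [1, 2, 3, 4], "email": ["a@x.io", None, None, "d@x.io"]}
)
frac = check_missing(df, threshold=0.5)  # passes: email is exactly at the threshold
print(frac)
```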

#### Range & Value Constraints
* validate_numeric_range(df, col, min=None, max=None)
  * Non-negative values
  * Physical constraints (age, price, distance)
* validate_allowed_values(df, col, allowed)
  * Enums / categorical safety
  * Prevents category explosion
* validate_boolean(df, col)
  * Ensures only {0,1} or {True,False}
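
A rough sketch of the range and allowed-value checks, assuming each returns a boolean mask that is `True` for violating rows and that missing values are left to the missing-data checks. The mask return type is an assumption.

```python
import pandas as pd


def validate_numeric_range(df, col, min=None, max=None):
    """Hypothetical sketch: boolean mask, True where a value is out of range."""
    values = df[col]
    bad = pd.Series(False, index=df.index)
    if min is not None:
        bad |= values < min  # comparisons with NaN are False, so nulls are not flagged
    if max is not None:
        bad |= values > max
    return bad


def validate_allowed_values(df, col, allowed):
    """Hypothetical sketch: boolean mask, True where a non-null value is not allowed."""
    return df[col].notna() & ~df[col].isin(allowed)


df = pd.DataFrame(
    {"age": [25, -3, 130, None], "status": ["open", "closed", "pending", None]}
)
out_of_range = validate_numeric_range(df, "age", min=0, max=120)
not_allowed = validate_allowed_values(df, "status", ["open", "closed"])
print(df[out_of_range | not_allowed])
```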

#### Uniqueness & Key Constraints
* validate_unique(df, cols)
  * Primary key enforcement
  * Composite keys supported
* check_duplicates(df, cols=None)
  * Full row or column subset
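
Both checks could be built on `DataFrame.duplicated`. In this sketch, `check_duplicates` returns the offending rows and `validate_unique` raises on any duplicate; those behaviors are assumptions made for illustration.

```python
import pandas as pd


def check_duplicates(df, cols=None):
    """Hypothetical sketch: rows duplicated on cols (all columns when cols is None)."""
    return df[df.duplicated(subset=cols, keep=False)]


def validate_unique(df, cols):
    """Hypothetical sketch: raise if cols do not form a unique (possibly composite) key."""
    dupes = check_duplicates(df, cols)
    if not dupes.empty:
        raise ValueError(f"{len(dupes)} rows violate uniqueness of {cols}")


df = pd.DataFrame({"order_id": [1, 1, 2], "line": [1, 2, 1]})
validate_unique(df, ["order_id", "line"])  # composite key is unique, no error
dupes = check_duplicates(df, ["order_id"])  # order_id alone is not unique
print(dupes)
```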

#### Cross-Column Logic
* validate_column_relationship(df, col_a, col_b, op)
  * EX: start <= end
* validate_conditional_null(df, if_col, if_val, then_required)
  * EX: if status == "closed" → closed_at must not be null
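
The two examples above might look like this in practice. The sketch assumes `op` is passed as a string such as `"<="` and that both functions return a mask of violating rows; the `_OPS` lookup table is a hypothetical helper.

```python
import operator

import pandas as pd

# Hypothetical mapping from operator strings to comparison functions.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def validate_column_relationship(df, col_a, col_b, op):
    """Hypothetical sketch: mask, True where `col_a op col_b` does not hold."""
    return ~_OPS[op](df[col_a], df[col_b])


def validate_conditional_null(df, if_col, if_val, then_required):
    """Hypothetical sketch: mask, True where if_col == if_val but then_required is null."""
    return (df[if_col] == if_val) & df[then_required].isna()


df = pd.DataFrame(
    {
        "start": [1, 5, 3],
        "end": [2, 4, 3],
        "status": ["open", "closed", "closed"],
        "closed_at": [None, "2024-01-02", None],
    }
)
bad_order = validate_column_relationship(df, "start", "end", "<=")
bad_closed = validate_conditional_null(df, "status", "closed", "closed_at")
print(df[bad_order | bad_closed])
```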

#### Statistical & Distribution Checks
* detect_outliers(df, col, method="iqr")
  * method is "iqr" or "zscore"
  * Flags, doesn’t auto-remove
  * Returns indices or mask
* check_distribution_shift(df_train, df_new, col)
  * Mean / variance change
  * KS test
  * Drift detection
* validate_cardinality(df, col, max_unique)
  * Prevents feature blow-up
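
As an illustration of the flag-only behavior, here is a sketch of `detect_outliers` that returns a boolean mask. The `k` and `z` cutoff parameters are hypothetical additions with conventional defaults, not part of the signature listed above.

```python
import pandas as pd


def detect_outliers(df, col, method="iqr", k=1.5, z=3.0):
    """Hypothetical sketch: boolean mask of outliers; nothing is removed."""
    values = df[col]
    if method == "iqr":
        q1, q3 = values.quantile(0.25), values.quantile(0.75)
        iqr = q3 - q1
        # Flag values outside [q1 - k*IQR, q3 + k*IQR].
        return (values < q1 - k * iqr) | (values > q3 + k * iqr)
    if method == "zscore":
        # Flag values more than z sample standard deviations from the mean.
        return ((values - values.mean()) / values.std()).abs() > z
    raise ValueError(f"unknown method: {method!r}")


df = pd.DataFrame({"price": [10, 11, 12, 13, 14, 100]})
iqr_mask = detect_outliers(df, "price", method="iqr")
z_mask = detect_outliers(df, "price", method="zscore", z=2.0)
print(df[iqr_mask])
```

On small samples the z-score of even an extreme value stays modest, which is why the usage example lowers the cutoff to 2.0.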

#### Formatting & Parsing Validation
* validate_regex(df, col, pattern)
  * Emails
  * IDs
  * Codes
* validate_datetime(df, col, allow_future=False)
  * Timestamp sanity
  * Log/event data validation

#### Referential Integrity
* validate_foreign_key(df, col, reference_set)
  * No orphan rows
  * Common in joins
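
A foreign-key check reduces to a membership test. This sketch assumes the function returns the orphan rows and ignores nulls in the key column; both are illustrative choices.

```python
import pandas as pd


def validate_foreign_key(df, col, reference_set):
    """Hypothetical sketch: rows whose non-null key is absent from reference_set."""
    return df[df[col].notna() & ~df[col].isin(reference_set)]


orders = pd.DataFrame({"order_id": [10, 11, 12], "user_id": [1, 2, 9]})
known_users = {1, 2, 3}
orphans = validate_foreign_key(orders, "user_id", known_users)
print(orphans)
```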

#### Dataset-Level Quality Checks
* validate_row_count_change(df_old, df_new, max_delta_pct)
  * Detect broken ingestion jobs
* validate_freshness(df, timestamp_col, max_age)
  * Streaming / batch safety
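
Both dataset-level checks can be sketched as functions returning `True` when the data looks healthy. This assumes `max_delta_pct` is a percentage and `max_age` is anything `pd.Timedelta` accepts; the `now` parameter is a hypothetical addition for reproducibility.

```python
import pandas as pd


def validate_row_count_change(df_old, df_new, max_delta_pct):
    """Hypothetical sketch: True if the row count changed by at most max_delta_pct percent."""
    old, new = len(df_old), len(df_new)
    if old == 0:
        return new == 0  # no baseline to compare against
    return abs(new - old) / old * 100 <= max_delta_pct


def validate_freshness(df, timestamp_col, max_age, now=None):
    """Hypothetical sketch: True if the newest timestamp is at most max_age old."""
    now = pd.Timestamp.now() if now is None else pd.Timestamp(now)
    latest = pd.to_datetime(df[timestamp_col]).max()
    return bool((now - latest) <= pd.Timedelta(max_age))


old = pd.DataFrame({"ts": ["2024-01-01"] * 100})
new = pd.DataFrame({"ts": ["2024-01-01"] * 59 + ["2024-01-03"]})
count_ok = validate_row_count_change(old, new, max_delta_pct=10)
fresh_ok = validate_freshness(new, "ts", "2 days", now="2024-01-04")
print(count_ok, fresh_ok)
```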

#### Reporting & DX
* validate_all(df, rules)
  * EX: validate_all(df, [require_non_null("user_id"), validate_unique(["user_id"]), validate_numeric_range("age", 0, 120)])
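
The example call passes rules without `df`, which suggests each rule is a factory returning a callable. The sketch below follows that reading; the closure-based design, the message strings, and the list-of-errors return value are all assumptions.

```python
import pandas as pd


def require_non_null(col):
    """Hypothetical rule factory: returns a rule that reports nulls in col."""
    def rule(df):
        n = int(df[col].isna().sum())
        return None if n == 0 else f"{col}: {n} null value(s)"
    return rule


def validate_unique(cols):
    """Hypothetical rule factory: returns a rule that reports duplicated keys."""
    def rule(df):
        n = int(df.duplicated(subset=cols, keep=False).sum())
        return None if n == 0 else f"{cols}: {n} duplicated row(s)"
    return rule


def validate_numeric_range(col, min=None, max=None):
    """Hypothetical rule factory: returns a rule that reports out-of-range values."""
    def rule(df):
        bad = pd.Series(False, index=df.index)
        if min is not None:
            bad |= df[col] < min
        if max is not None:
            bad |= df[col] > max
        n = int(bad.sum())
        return None if n == 0 else f"{col}: {n} value(s) outside [{min}, {max}]"
    return rule


def validate_all(df, rules):
    """Hypothetical sketch: run every rule and collect the failure messages."""
    return [msg for rule in rules if (msg := rule(df)) is not None]


df = pd.DataFrame({"user_id": [1, 2, 2, None], "age": [30, 150, 45, 20]})
errors = validate_all(
    df,
    [
        require_non_null("user_id"),
        validate_unique(["user_id"]),
        validate_numeric_range("age", 0, 120),
    ],
)
print(errors)
```

Collecting every failure in one pass, instead of stopping at the first, tends to suit reporting, since a single run then shows all the problems in a dataset.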
