Metadata-Version: 2.4
Name: goquality
Version: 0.1.0
Summary: AI-Native Data Governance: TypeScript for Databases
Author: GoQuality
License: MIT
Keywords: ai,data-quality,governance,postgres,validation
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Requires-Dist: anthropic>=0.18.0
Requires-Dist: httpx>=0.25.0
Requires-Dist: ibis-framework[duckdb,postgres]>=9.0.0
Requires-Dist: openai>=1.0.0
Requires-Dist: psycopg2-binary>=2.9.0
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0.0
Requires-Dist: typer>=0.9.0
Provides-Extra: dev
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pandas>=2.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# GoQuality CLI

**AI-Native Data Governance: TypeScript for Databases**

GoQuality brings type safety to your data. Define types once, validate everywhere. AI generates the types; you govern the rules.

```
┌─────────────────────────────────────────────────────────────────┐
│  Database  →  AI Inference  →  YAML Types  →  Validation  →  ✓ │
│                                                                 │
│  "email"      Email           pattern: ^...   99.8% valid      │
│  "amount"     USD             min: 0          100% valid       │
│  "status"     OrderStatus     enum: [...]     98.2% valid      │
└─────────────────────────────────────────────────────────────────┘
```

## Installation

```bash
# Basic installation
pip install goquality

# Extras are quoted so that shells such as zsh do not treat the brackets as glob patterns

# With PostgreSQL support
pip install "goquality[postgres]"

# With Snowflake support
pip install "goquality[snowflake]"

# With BigQuery support
pip install "goquality[bigquery]"

# With all database drivers
pip install "goquality[all]"

# Development installation
pip install "goquality[dev]"
```

## Quick Start

```bash
# 1. Initialize a new project
goquality init

# 2. Generate types from your database using AI
goquality generate --source postgres://user:pass@localhost/mydb

# 3. Review and edit the generated goquality.yaml

# 4. Run validation checks
goquality check --source postgres://user:pass@localhost/mydb

# 5. Diagnose any issues
goquality doctor --source postgres://user:pass@localhost/mydb
```

## Commands

### `goquality init`

Initialize a new GoQuality configuration file.

```bash
goquality init [OPTIONS]
```

**Options:**
| Option | Short | Description |
|--------|-------|-------------|
| `--source` | `-s` | Database connection string to test |
| `--path` | `-p` | Path for configuration file (default: `goquality.yaml`) |

**Examples:**
```bash
# Create default config
goquality init

# Create config and test database connection
goquality init --source postgres://localhost/mydb

# Create config at custom path
goquality init --path config/goquality.yaml
```

---

### `goquality generate`

Generate type mappings using AI inference. Profiles your database schema and uses an LLM to suggest appropriate types for each column.

```bash
goquality generate [OPTIONS]
```

**Options:**
| Option | Short | Description |
|--------|-------|-------------|
| `--source` | `-s` | Database connection string (required) |
| `--output` | `-o` | Output file path (default: `goquality.yaml`) |
| `--schema` | | Database schema to profile |
| `--provider` | | LLM provider: `openai`, `anthropic`, `ollama` (default: `openai`) |

**Environment Variables:**
- `OPENAI_API_KEY` - Required for OpenAI provider
- `ANTHROPIC_API_KEY` - Required for Anthropic provider
- `OLLAMA_HOST` - Ollama server URL (default: `http://localhost:11434`)

**Examples:**
```bash
# Generate using OpenAI (default)
goquality generate --source postgres://localhost/mydb

# Generate using Anthropic Claude
goquality generate --source postgres://localhost/mydb --provider anthropic

# Generate for specific schema
goquality generate --source postgres://localhost/mydb --schema public

# Generate using local Ollama
OLLAMA_HOST=http://localhost:11434 goquality generate \
  --source postgres://localhost/mydb \
  --provider ollama
```

---

### `goquality check`

Run validation checks against your database. This is the core command that validates your data against the defined types.

```bash
goquality check [OPTIONS]
```

**Options:**
| Option | Short | Description |
|--------|-------|-------------|
| `--config` | `-c` | Configuration file path (default: `goquality.yaml`) |
| `--source` | `-s` | Database connection string |
| `--table` | `-t` | Only check this specific table |
| `--output` | `-o` | Output format: `table`, `json`, `yaml`, `csv`, `markdown` |
| `--fail-threshold` | | Percentage of failures allowed (0-100) |
| `--fail-on-error/--no-fail-on-error` | | Exit with error code on failures (default: true) |
| `--quiet` | `-q` | Only show errors and summary |

**Exit Codes:**
- `0` - All checks passed (or within threshold)
- `1` - Validation failures detected (above threshold)
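
When `goquality check` is driven from a Python script, the exit codes above are all the caller needs to branch on. The following is a minimal sketch, not part of GoQuality: `build_check_command` and `run_check` are hypothetical helper names, and only flags documented in the options table are used.

```python
import subprocess
from typing import List, Optional


def build_check_command(
    source: str,
    table: Optional[str] = None,
    fail_threshold: Optional[float] = None,
) -> List[str]:
    """Build the argv list for `goquality check` (hypothetical helper)."""
    cmd = ["goquality", "check", "--source", source]
    if table is not None:
        cmd += ["--table", table]
    if fail_threshold is not None:
        cmd += ["--fail-threshold", str(fail_threshold)]
    return cmd


def run_check(source: str, **options) -> bool:
    """Run the CLI and return True when it exits with code 0."""
    completed = subprocess.run(build_check_command(source, **options))
    return completed.returncode == 0
```

Passing the arguments as a list avoids shell quoting problems with connection strings that contain special characters.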

**Examples:**
```bash
# Basic check
goquality check --source postgres://localhost/mydb

# Check specific table
goquality check --source postgres://localhost/mydb --table users

# Output as JSON (for CI/CD pipelines)
goquality check --source postgres://localhost/mydb --output json

# Allow up to 5% failures
goquality check --source postgres://localhost/mydb --fail-threshold 5

# Generate markdown report
goquality check --source postgres://localhost/mydb --output markdown > report.md

# Quiet mode for scripts
goquality check --source postgres://localhost/mydb --quiet

# Don't fail on errors (always exit 0)
goquality check --source postgres://localhost/mydb --no-fail-on-error
```

**Output Formats:**

*Table (default):*
```
┏━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Column   ┃ Type   ┃ Rows   ┃ Valid % ┃ Status   ┃ Details       ┃
┡━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ email    │ Email  │ 10,000 │ 99.8%   │ ✓ PASS   │               │
│ status   │ Status │ 10,000 │ 98.2%   │ ✗ FAIL   │ 180 invalid   │
└──────────┴────────┴────────┴─────────┴──────────┴───────────────┘
```

*JSON:*
```json
{
  "summary": {
    "total_checks": 5,
    "passed": 4,
    "failed": 1,
    "failure_rate": 20.0,
    "threshold": 0.0,
    "threshold_passed": false
  },
  "tables": [...]
}
```
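
In a pipeline, the `summary` object is usually the only part a gate needs. The sketch below reads it with the Python standard library. It assumes only the `summary` keys shown in the sample above; `summarize_report` is a hypothetical helper, not a GoQuality API.

```python
import json
from typing import Tuple


def summarize_report(report_text: str) -> Tuple[bool, str]:
    """Return (ok, message) from the text printed by `goquality check --output json`."""
    summary = json.loads(report_text)["summary"]
    ok = bool(summary["threshold_passed"])
    message = (
        f"{summary['passed']}/{summary['total_checks']} checks passed, "
        f"failure rate {summary['failure_rate']:.1f}% "
        f"(threshold {summary['threshold']:.1f}%)"
    )
    return ok, message


# Sample input mirroring the documented summary; the "tables" list is left empty here.
sample = """
{
  "summary": {
    "total_checks": 5,
    "passed": 4,
    "failed": 1,
    "failure_rate": 20.0,
    "threshold": 0.0,
    "threshold_passed": false
  },
  "tables": []
}
"""

ok, message = summarize_report(sample)
print(ok, message)
```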

---

### `goquality validate`

Validate configuration file syntax without connecting to a database.

```bash
goquality validate [OPTIONS]
```

**Options:**
| Option | Short | Description |
|--------|-------|-------------|
| `--config` | `-c` | Configuration file to validate (default: `goquality.yaml`) |

**Examples:**
```bash
# Validate default config
goquality validate

# Validate specific config
goquality validate --config staging.yaml
```

---

### `goquality types`

List and search available types in the standard library.

```bash
goquality types [OPTIONS]
```

**Options:**
| Option | Short | Description |
|--------|-------|-------------|
| `--search` | `-s` | Search types by name or description |
| `--tag` | `-t` | Filter by tag |
| `--show` | | Show details for a specific type |

**Examples:**
```bash
# List all types
goquality types

# Search for email types
goquality types --search email

# Filter by tag
goquality types --tag finance
goquality types --tag healthcare
goquality types --tag regional

# Show type details
goquality types --show Email
goquality types --show CreditCardNumber
```

**Available Tags:**
- `core` - Basic string/number types
- `finance` - Currency, banking, payments
- `healthcare` - Medical codes, identifiers
- `ecommerce` - Products, orders, shipping
- `saas` - API keys, tokens, SaaS identifiers
- `regional` - Country-specific formats
- `analytics` - Metrics, percentages, scores
- `iot` - Sensors, devices, protocols
- `pii` - Personally identifiable information

---

### `goquality doctor`

Diagnose your GoQuality environment and configuration.

```bash
goquality doctor [OPTIONS]
```

**Options:**
| Option | Short | Description |
|--------|-------|-------------|
| `--config` | `-c` | Configuration file to check |
| `--source` | `-s` | Database connection to test |
| `--verbose` | `-v` | Show detailed information |

**Checks Performed:**
- Python version compatibility
- Core dependencies installed
- Database drivers available
- LLM providers configured
- Type library loading
- Configuration file validity
- Database connectivity
- Environment variables

**Examples:**
```bash
# Basic diagnostics
goquality doctor

# Check with database connection
goquality doctor --source postgres://localhost/mydb

# Verbose output
goquality doctor --verbose
```

---

### `goquality stats`

Show statistics about the type library and configuration.

```bash
goquality stats [OPTIONS]
```

**Options:**
| Option | Short | Description |
|--------|-------|-------------|
| `--config` | `-c` | Configuration file path |

**Examples:**
```bash
goquality stats
```

---

### `goquality version`

Show version information.

```bash
goquality version
```

---

## Configuration File

GoQuality uses YAML configuration files. The default file is `goquality.yaml`.

### Full Example

```yaml
# GoQuality Configuration
# https://goquality.dev/docs

# Custom type definitions (extend or override stdlib)
types:
  # Simple type with pattern
  - name: EmployeeId
    description: "Internal employee identifier"
    base: String
    pattern: "^EMP-[0-9]{6}$"
    min_length: 10
    max_length: 10

  # Type extending stdlib
  - name: CorporateEmail
    description: "Company email address"
    base: String
    extends: Email
    pattern: "^[a-z.]+@acme\\.com$"

  # Numeric type with range
  - name: DiscountPercent
    description: "Discount percentage"
    base: Decimal
    min: 0
    max: 100
    precision: 2

  # Enum type
  - name: Department
    description: "Company department"
    base: String
    enum: ["engineering", "sales", "marketing", "hr", "finance"]

  # Type with uniqueness constraint
  - name: ProductSKU
    description: "Unique product SKU"
    base: String
    pattern: "^[A-Z]{2}-[0-9]{6}$"
    unique: true

# Model mappings (table → column types)
models:
  - table: public.users
    columns:
      - name: id
        type: UUID
      - name: email
        type: CorporateEmail
      - name: employee_id
        type: EmployeeId
      - name: department
        type: Department
      - name: created_at
        type: Timestamp

  - table: public.orders
    columns:
      - name: id
        type: UUID
      - name: user_id
        type: UUID
      - name: total_amount
        type: USD
      - name: discount
        type: DiscountPercent
        allow_null: true  # Override type's nullability
      - name: status
        type: OrderStatus

# Ad-hoc checks (quick SQL rules)
checks:
  - "on": orders
    name: "Order integrity"
    rules:
      - "total_amount >= 0"
      - "created_at <= NOW()"
      - "status IS NOT NULL"

  - "on": users
    name: "User constraints"
    rules:
      - "email IS NOT NULL"
      - "created_at <= NOW()"
```
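
Every `type:` under `models` has to resolve either to an entry in `types:` or to a standard library type. As a rough illustration of that cross-reference, the sketch below works on the configuration as a plain Python dict (so no YAML parser is needed). `find_unknown_types` and the small `stdlib_types` set are assumptions made for the example, not GoQuality internals.

```python
from typing import List, Set, Tuple


def find_unknown_types(config: dict, stdlib_types: Set[str]) -> List[Tuple[str, str, str]]:
    """Return (table, column, type) for columns whose type is neither custom nor stdlib."""
    custom = {t["name"] for t in config.get("types", [])}
    known = custom | stdlib_types
    unknown = []
    for model in config.get("models", []):
        for column in model.get("columns", []):
            if column["type"] not in known:
                unknown.append((model["table"], column["name"], column["type"]))
    return unknown


# Trimmed version of the example above, with one deliberately misspelled type.
config = {
    "types": [{"name": "EmployeeId"}, {"name": "Department"}],
    "models": [
        {
            "table": "public.users",
            "columns": [
                {"name": "id", "type": "UUID"},
                {"name": "employee_id", "type": "EmployeeId"},
                {"name": "department", "type": "Departmnt"},  # typo on purpose
            ],
        }
    ],
}

# Hand-picked subset of standard library names, for illustration only.
print(find_unknown_types(config, stdlib_types={"UUID", "Email", "Timestamp"}))
```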

### Type Definition Fields

| Field | Type | Description |
|-------|------|-------------|
| `name` | string | PascalCase type name (required) |
| `description` | string | Human-readable description (required) |
| `base` | string | Base type: `String`, `Integer`, `Decimal`, `Boolean`, `Date`, `Timestamp` |
| `extends` | string | Parent type to inherit from |
| `pattern` | string | Regex pattern (String types) |
| `min_length` | int | Minimum string length |
| `max_length` | int | Maximum string length |
| `not_empty` | bool | Reject empty/whitespace strings |
| `min` | number | Minimum value (numeric types) |
| `max` | number | Maximum value (numeric types) |
| `precision` | int | Decimal places (Decimal type) |
| `enum` | array | Allowed values |
| `allow_null` | bool | Whether NULL is permitted (default: false) |
| `unique` | bool | Values must be unique |
| `foreign_key` | string | Reference `table.column` for FK validation |
| `tags` | array | Searchable tags |
| `examples` | array | Example valid values |
| `deprecated` | bool | Mark as deprecated |

---

## Connection Strings

GoQuality supports multiple database backends via connection strings.

### PostgreSQL

```bash
# Full format
postgres://user:password@host:port/database

# Examples
postgres://postgres:secret@localhost:5432/mydb
postgresql://user:pass@db.example.com/production
postgres://localhost/mydb  # Local with defaults
```
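
The parts of a PostgreSQL connection string can be inspected with Python's standard `urllib.parse`, which is handy when debugging a URL that will not connect. This is a sketch for illustration; `parse_postgres_url` is a hypothetical helper and says nothing about how GoQuality parses the string internally.

```python
from urllib.parse import urlsplit


def parse_postgres_url(url: str) -> dict:
    """Split a postgres:// or postgresql:// URL into its components."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"not a PostgreSQL URL: {url!r}")
    return {
        "user": parts.username,      # None when omitted
        "password": parts.password,  # None when omitted
        "host": parts.hostname,
        "port": parts.port,          # None when omitted
        "database": parts.path.lstrip("/") or None,
    }


print(parse_postgres_url("postgres://postgres:secret@localhost:5432/mydb"))
print(parse_postgres_url("postgres://localhost/mydb"))
```

Fields that the URL omits come back as `None`, which matches the "local with defaults" form above, where the driver supplies the missing values.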

### DuckDB

```bash
# In-memory database
duckdb://:memory:

# File database
duckdb:///path/to/database.db

# CSV/Parquet files (auto-detected)
/path/to/data.csv
/path/to/data.parquet
./relative/path/data.csv
```

### Snowflake

```bash
# Full format
snowflake://user@account/database/schema?warehouse=WAREHOUSE

# Examples
snowflake://john@xy12345/analytics/public?warehouse=COMPUTE_WH
snowflake://user@account/db/schema?warehouse=WH&role=ANALYST
```

**Environment Variables:**
```bash
export SNOWFLAKE_ACCOUNT=xy12345
export SNOWFLAKE_USER=john
export SNOWFLAKE_PASSWORD=secret
export SNOWFLAKE_DATABASE=analytics
export SNOWFLAKE_SCHEMA=public
export SNOWFLAKE_WAREHOUSE=COMPUTE_WH
```
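
The environment variables map onto the URL format shown above. The sketch below assembles such a URL from a dict of variables. Whether GoQuality builds the URL this way internally is an assumption; `snowflake_url_from_env` is a hypothetical helper, and the password is deliberately left out because the documented format does not include one.

```python
from urllib.parse import quote, urlencode


def snowflake_url_from_env(env: dict) -> str:
    """Build snowflake://user@account/database/schema?warehouse=... from SNOWFLAKE_* values."""
    user = quote(env["SNOWFLAKE_USER"], safe="")
    account = env["SNOWFLAKE_ACCOUNT"]
    database = env["SNOWFLAKE_DATABASE"]
    schema = env["SNOWFLAKE_SCHEMA"]
    query = urlencode({"warehouse": env["SNOWFLAKE_WAREHOUSE"]})
    return f"snowflake://{user}@{account}/{database}/{schema}?{query}"


example_env = {
    "SNOWFLAKE_ACCOUNT": "xy12345",
    "SNOWFLAKE_USER": "john",
    "SNOWFLAKE_DATABASE": "analytics",
    "SNOWFLAKE_SCHEMA": "public",
    "SNOWFLAKE_WAREHOUSE": "COMPUTE_WH",
}
print(snowflake_url_from_env(example_env))
```

In a real script, pass `os.environ` instead of `example_env`.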

### BigQuery

```bash
# Format
bigquery://project-id/dataset

# Examples
bigquery://my-project/analytics
bigquery://prod-data-warehouse/sales
```

**Environment Variables:**
```bash
export GOOGLE_CLOUD_PROJECT=my-project
export BIGQUERY_DATASET=analytics
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
```

---

## Connection Configuration File

Store multiple database connections in a YAML file for easy switching.

### File Location

GoQuality looks for connection configs in:
1. `.goquality/connections.yaml`
2. `goquality-connections.yaml`
3. `~/.config/goquality/connections.yaml`
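
The lookup can be pictured as a first-match search over those three paths. The sketch below assumes that the list is ordered by precedence and that the first existing file wins; the README does not state this explicitly. `find_connections_file` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def find_connections_file(project_dir: Path, home_dir: Path) -> Optional[Path]:
    """Return the first existing connections file, or None if there is none."""
    candidates = [
        project_dir / ".goquality" / "connections.yaml",
        project_dir / "goquality-connections.yaml",
        home_dir / ".config" / "goquality" / "connections.yaml",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

A typical call would be `find_connections_file(Path.cwd(), Path.home())`.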

### Example

```yaml
# .goquality/connections.yaml

# Default connection to use
default: dev

connections:
  local:
    connection_string: duckdb://:memory:
    description: Local testing with DuckDB

  dev:
    dialect: postgres
    host: localhost
    port: 5432
    database: myapp_dev
    user: developer
    password: devpass
    description: Development database

  staging:
    dialect: postgres
    host: ${STAGING_DB_HOST}
    database: myapp_staging
    user: ${STAGING_DB_USER}
    password: ${STAGING_DB_PASSWORD}
    description: Staging environment

  prod:
    connection_string: postgres://${PROD_USER}:${PROD_PASS}@prod.example.com/myapp
    description: Production database (read-only)

  warehouse:
    dialect: snowflake
    host: xy12345.snowflakecomputing.com
    database: analytics
    schema: public
    user: ${SNOWFLAKE_USER}
    password: ${SNOWFLAKE_PASSWORD}
    options:
      warehouse: COMPUTE_WH
      role: ANALYST
```
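
The `${VAR}` placeholders in the example are filled from environment variables. A minimal sketch of that substitution follows. The exact rules GoQuality applies (for instance, what happens when a variable is unset) are not documented here, so raising an error on a missing variable is an assumption, and `expand_env` is a hypothetical helper.

```python
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: dict) -> str:
    """Replace every ${NAME} in `value` with env[NAME]; fail loudly if NAME is unset."""

    def replace(match: "re.Match") -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(replace, value)


print(
    expand_env(
        "postgres://${PROD_USER}:${PROD_PASS}@prod.example.com/myapp",
        {"PROD_USER": "reader", "PROD_PASS": "s3cret"},
    )
)
```

In a real script, pass `os.environ` as `env`.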

### Using Named Connections

```bash
# Use default connection
goquality check

# Use named connection
goquality check --source dev
goquality check --source staging
goquality check --source warehouse
```
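
Since `--source` accepts connection strings, file paths, and connection names, it helps to have a mental model of how one value could be told apart from another. The rules in the sketch below are assumptions chosen for illustration (URL first, then file extension, then name lookup); GoQuality's actual resolution order may differ. `classify_source` is a hypothetical helper.

```python
from typing import Set


def classify_source(source: str, named_connections: Set[str]) -> str:
    """Guess what kind of value was passed to --source (illustrative rules only)."""
    if "://" in source:
        return "connection_string"
    if source.lower().endswith((".csv", ".parquet")):
        return "file"
    if source in named_connections:
        return "named_connection"
    raise ValueError(f"cannot interpret source: {source!r}")


# Names taken from the connections.yaml example above.
names = {"local", "dev", "staging", "prod", "warehouse"}
for value in ("postgres://localhost/mydb", "./relative/path/data.csv", "dev"):
    print(value, "->", classify_source(value, names))
```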

---

## CI/CD Integration

### GitHub Actions

```yaml
name: Data Quality

on:
  push:
    branches: [main]
  schedule:
    - cron: '0 6 * * *'  # Daily at 06:00 UTC

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install GoQuality
        run: pip install goquality[postgres]

      - name: Validate Configuration
        run: goquality validate

      - name: Run Data Quality Checks
        run: |
          goquality check \
            --source "${{ secrets.DATABASE_URL }}" \
            --output json \
            --fail-threshold 1 \
            > results.json

      - name: Upload Results
        uses: actions/upload-artifact@v4
        with:
          name: quality-report
          path: results.json
```

### GitLab CI

```yaml
data-quality:
  image: python:3.11
  stage: test
  script:
    - pip install goquality[postgres]
    - goquality validate
    - goquality check --source "$DATABASE_URL" --output markdown > report.md
  artifacts:
    paths:
      - report.md
    expire_in: 1 week
```

### Pre-commit Hook

```yaml
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: goquality-validate
        name: Validate GoQuality Config
        entry: goquality validate
        language: system
        files: goquality\.yaml$
        pass_filenames: false
```

---

## Standard Library Types

GoQuality includes 300+ pre-defined types organized by category.

### Core Types

| Type | Base | Description |
|------|------|-------------|
| `Email` | String | Email address |
| `EmailNullable` | String | Optional email |
| `UUID` | String | UUID v4 |
| `URL` | String | HTTP/HTTPS URL |
| `PhoneNumber` | String | International phone |
| `Hostname` | String | DNS hostname |

### Finance Types

| Type | Base | Description |
|------|------|-------------|
| `USD` | Decimal | US Dollar amount |
| `EUR` | Decimal | Euro amount |
| `CreditCardNumber` | String | Credit card (Luhn) |
| `IBAN` | String | International bank account |
| `BIC` | String | Bank identifier code |
| `ABARoutingNumber` | String | US routing number |

### Healthcare Types

| Type | Base | Description |
|------|------|-------------|
| `ICD10` | String | ICD-10 diagnosis code |
| `CPT` | String | CPT procedure code |
| `NPI` | String | National Provider ID |
| `NDC` | String | National Drug Code |
| `LOINC` | String | Lab test code |

### E-commerce Types

| Type | Base | Description |
|------|------|-------------|
| `SKU` | String | Stock keeping unit |
| `UPC` | String | UPC-A barcode |
| `EAN13` | String | EAN-13 barcode |
| `ASIN` | String | Amazon product ID |
| `ISBN13` | String | Book ISBN-13 |

### Regional Types

| Type | Base | Description |
|------|------|-------------|
| `SSN` | String | US Social Security |
| `USZipCode` | String | US ZIP code |
| `USState` | String | US state code |
| `GermanVATNumber` | String | German VAT |
| `UKPostcode` | String | UK postcode |
| `IndianPAN` | String | Indian tax ID |

### Analytics Types

| Type | Base | Description |
|------|------|-------------|
| `Percentage` | Decimal | 0-100 percentage |
| `Rate` | Decimal | 0-1 rate |
| `Score` | Decimal | 0-100 score |
| `MRR` | Decimal | Monthly recurring revenue |
| `NPSScore` | Integer | Net promoter score |

Browse all types:
```bash
goquality types
goquality types --tag finance
goquality types --search email
```

---

## Custom Validators (Plugins)

GoQuality supports custom validation logic via Python plugins.

### Creating a Validator

```python
# .goquality/plugins/my_validators.py

from goquality.plugins import register_validator

@register_validator("is_palindrome", description="Check if string is palindrome")
def is_palindrome(value: str) -> bool:
    clean = value.lower().replace(" ", "")
    return clean == clean[::-1]

# The registered name matches what the function checks: divisibility by 3 only.
@register_validator("divisible_by_three", description="Check divisibility by 3")
def divisible_by_three(value: int) -> bool:
    return value % 3 == 0
```

### Built-in Advanced Validators

| Validator | Description |
|-----------|-------------|
| `luhn` | Luhn checksum (credit cards) |
| `iban` | IBAN checksum |
| `isbn10` | ISBN-10 checksum |
| `isbn13` | ISBN-13 checksum |
| `ean13` | EAN-13 barcode checksum |
| `upc` | UPC-A barcode checksum |
| `email_format` | Email format validation |
| `ipv4` | IPv4 address format |
| `ipv6` | IPv6 address format |
| `mac_address` | MAC address format |
| `json` | Valid JSON string |
| `base64` | Valid Base64 encoding |
| `future_date` | Date in the future |
| `past_date` | Date in the past |

---

## Troubleshooting

### Common Issues

**"Config file not found"**
```bash
# Create a config file
goquality init

# Or specify path
goquality check --config path/to/config.yaml
```

**"Unknown type: X"**
```bash
# List available types
goquality types --search X

# Check if custom type is defined in config
goquality validate
```

**"Connection failed"**
```bash
# Run diagnostics
goquality doctor --source YOUR_CONNECTION_STRING

# Check if driver is installed
pip install goquality[postgres]  # or [snowflake], [bigquery]
```

**"LLM API error"**
```bash
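# Check API key is set (without printing the secret)
[ -n "$OPENAI_API_KEY" ] && echo "OPENAI_API_KEY is set" || echo "OPENAI_API_KEY is missing"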

# Try different provider
goquality generate --source ... --provider anthropic
goquality generate --source ... --provider ollama
```

### Debug Mode

```bash
# Enable verbose logging
GOQUALITY_DEBUG=1 goquality check --source ...
```

### Getting Help

```bash
# General help
goquality --help

# Command-specific help
goquality check --help
goquality generate --help
```

---

## License

MIT License - see [LICENSE](LICENSE) for details.

## Contributing

Contributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## Links

- Documentation: https://goquality.dev/docs
- GitHub: https://github.com/goquality/goquality
- PyPI: https://pypi.org/project/goquality/
