Metadata-Version: 2.4
Name: pagespeed
Version: 2.0.0
Summary: CLI tool for batch Google PageSpeed Insights analysis with CSV/JSON/HTML reports
Project-URL: Homepage, https://github.com/volkanunsal/pagespeed
Project-URL: Repository, https://github.com/volkanunsal/pagespeed
Project-URL: Issues, https://github.com/volkanunsal/pagespeed/issues
License-Expression: MIT
License-File: LICENSE
Keywords: core-web-vitals,lighthouse,pagespeed,performance,web-vitals
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Site Management
Requires-Python: >=3.13
Requires-Dist: httpx
Requires-Dist: pandas
Requires-Dist: rich
Description-Content-Type: text/markdown

# PageSpeed Insights Batch Analysis Tool

A command-line tool that automates Google PageSpeed Insights analysis across multiple URLs, extracting performance metrics (lab + field data) into structured CSV, JSON, and HTML reports.

## Installation

### Run instantly with `uvx` (recommended, no install needed)

```bash
uvx pagespeed quick-check https://example.com
```

### Install with `pip` or `pipx`

```bash
pip install pagespeed   # or, for an isolated install: pipx install pagespeed
pagespeed quick-check https://example.com
```

### Run from URL (just needs `uv`)

```bash
uv run https://raw.githubusercontent.com/volkanunsal/pagespeed/main/pagespeed_insights_tool.py quick-check https://example.com
```

### Development

```bash
git clone https://github.com/volkanunsal/pagespeed.git
cd pagespeed
uv run pagespeed_insights_tool.py quick-check https://example.com
```

## Prerequisites

- **Python 3.13+**
- **Google API key** (optional): without one, requests are throttled much more aggressively; with one, the quota is ~25,000 queries/day (see [Rate Limits](#rate-limits))

## Getting an API Key

1. Go to the [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project (or select an existing one)
3. Navigate to **APIs & Services > Library**
4. Search for **PageSpeed Insights API** and enable it
5. Go to **APIs & Services > Credentials**
6. Click **Create Credentials > API Key**
7. Copy the key and set it:
   ```bash
   export PAGESPEED_API_KEY=your_key_here
   ```
   Or add it to your `pagespeed.toml` config file (see [Configuration](#configuration)).

## Usage

### `quick-check` — Fast single-URL spot check

Prints a formatted report to the terminal. No files written.

```bash
# Mobile only (default)
pagespeed quick-check https://www.google.com

# Both mobile and desktop
pagespeed quick-check https://www.google.com --device both

# With specific categories
pagespeed quick-check https://www.google.com --categories performance accessibility
```

Sample output:

```
============================================================
  URL:      https://www.google.com
  Strategy: mobile
============================================================
  Performance Score: 92/100 (GOOD)

  --- Lab Data ---
  First Contentful Paint............. 1200ms
  Largest Contentful Paint........... 1800ms
  Cumulative Layout Shift............ 0.0100
  Speed Index........................ 1500ms
  Total Blocking Time................ 150ms
  Time to Interactive................ 2100ms
```

### `audit` — Full batch analysis

Analyzes multiple URLs and writes CSV/JSON reports.

```bash
# From a file of URLs
pagespeed audit -f urls.txt

# Multiple strategies and output formats
pagespeed audit -f urls.txt --device both --output-format both

# Inline URLs with custom output path
pagespeed audit https://a.com https://b.com -o report

# With a named profile
pagespeed audit -f urls.txt --profile full

# Piped input
cat urls.txt | pagespeed audit

# Include full Lighthouse audit data in JSON output
pagespeed audit -f urls.txt --full --output-format json

# Stream results as NDJSON to stdout as they complete
pagespeed audit -f urls.txt --stream

# Pipe streamed results into jq for real-time filtering
pagespeed audit -f urls.txt --stream | jq '.performance_score'

# Stream and filter to only failing URLs
pagespeed audit -f urls.txt --stream | jq 'select(.performance_score < 50)'
```

#### `--full` flag

Pass `--full` to embed the complete raw `lighthouseResult` object from the PageSpeed API into each result in the JSON output. This includes all Lighthouse audits, opportunities, diagnostics, and metadata — useful for deep analysis or feeding into other tools.

- **JSON**: each result gains a top-level `lighthouseResult` key containing the full API object.
- **CSV**: `--full` is silently ignored; the raw object is never written to CSV.
- **File naming**: auto-named files get a `-full` suffix (e.g., `20260219T143022Z-mobile-full.json`).

#### `--stream` flag

Pass `--stream` to print results to stdout as **NDJSON** (one JSON object per line) as each URL/strategy completes, instead of buffering everything and writing files at the end. This lets you pipe results into `jq`, `grep`, or other tools without waiting for the full batch to finish.

- **Output**: each result is written to stdout as a single `json.dumps` line as soon as it completes.
- **File output**: skipped — no CSV/JSON files are written in stream mode.
- **Summary**: the post-run audit summary table is suppressed (not useful when piping).
- **Progress bar**: still shown on stderr so you can track progress while piping stdout.
- **Budget**: still evaluated if `--budget` is set, using the complete result set.

```bash
# Stream all results to stdout
pagespeed audit -f urls.txt --stream

# Extract a single field from each result
pagespeed audit -f urls.txt --stream | jq '.performance_score'

# Filter to only URLs below a score threshold
pagespeed audit -f urls.txt --stream | jq 'select(.performance_score < 50)'

# Save streamed results to a file while also viewing them
pagespeed audit -f urls.txt --stream | tee results.ndjson | jq '.url'
```

Each NDJSON line is a flat JSON object with the same fields as a CSV row (`url`, `strategy`, `performance_score`, `lab_fcp_ms`, etc.). `null` is used where a value is not available.
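
If you prefer Python to `jq`, the stream can be consumed line by line. The following is a minimal sketch that relies only on the fields named above; the helper name `failing_results`, its `threshold` parameter, and the sample records are illustrative and not part of the tool.

```python
import json
from collections.abc import Iterable, Iterator


def failing_results(lines: Iterable[str], threshold: int = 50) -> Iterator[dict]:
    """Yield NDJSON results whose performance_score is below `threshold`.

    Hypothetical helper for this sketch; not shipped with pagespeed.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        result = json.loads(line)
        score = result.get("performance_score")
        # A null score (value not available) is skipped rather than compared.
        if score is not None and score < threshold:
            yield result


# Made-up sample records standing in for real streamed output.
sample = [
    '{"url": "https://a.com", "strategy": "mobile", "performance_score": 92}',
    '{"url": "https://b.com", "strategy": "mobile", "performance_score": 41}',
    '{"url": "https://c.com", "strategy": "mobile", "performance_score": null}',
]
for result in failing_results(sample):
    print(result["url"], result["performance_score"])
```

In a real pipe, pass `sys.stdin` in place of `sample`, since any iterable of lines works.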

The URL file passed with `-f` lists one URL per line. Lines starting with `#` are treated as comments:

```
# Main pages
https://example.com
https://example.com/about
https://example.com/contact
```
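
The same format is easy to read from your own scripts. This is a rough sketch of one way to parse it, not the tool's actual parser; `parse_url_lines` is a hypothetical name, and skipping blank lines is an assumption.

```python
def parse_url_lines(text: str) -> list[str]:
    """Return the URLs in URL-file text, skipping `#` comment lines.

    Skipping blank lines is an assumption of this sketch.
    """
    urls = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls


example = """# Main pages
https://example.com
https://example.com/about
"""
print(parse_url_lines(example))
```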

### `compare` — Compare two reports

Loads two previous report files and shows per-URL score changes.

```bash
# Compare before and after
pagespeed compare before.csv after.csv

# Custom threshold (flag changes >= 10%)
pagespeed compare --threshold 10 old.json new.json
```

Output flags regressions with `!!` and improvements with `++`.
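
To make the flagging rule concrete, here is a hedged sketch of how a threshold could be applied. It assumes the change is measured as a percentage of the earlier score, which the tool may or may not do; `flag_change`, its default threshold, and the sample scores are all hypothetical.

```python
def flag_change(before: float, after: float, threshold: float = 5.0) -> str:
    """Return "!!" for a regression, "++" for an improvement, "" otherwise.

    Assumption: change is a percentage of the `before` score. The default
    threshold of 5.0 is illustrative only.
    """
    if before == 0:
        return ""  # percentage change is undefined; leave unflagged
    change_pct = (after - before) / before * 100
    if change_pct <= -threshold:
        return "!!"
    if change_pct >= threshold:
        return "++"
    return ""


# Made-up scores keyed by URL, standing in for two loaded reports.
before = {"https://a.com": 80, "https://b.com": 60}
after = {"https://a.com": 70, "https://b.com": 75}
for url in sorted(before.keys() & after.keys()):
    flag = flag_change(before[url], after[url], threshold=10)
    print(url, before[url], "->", after[url], flag)
```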

### `report` — Generate HTML dashboard

Creates a self-contained HTML report from a results file.

```bash
# Generate HTML from CSV results
pagespeed report results.csv

# Custom output path
pagespeed report results.json -o dashboard.html

# Auto-open in browser
pagespeed report results.csv --open
```

The HTML report includes:
- Summary cards (total URLs, average/best/worst scores)
- Color-coded score table (green/orange/red)
- Core Web Vitals pass/fail indicators
- Bar charts comparing scores across URLs
- Field data table (when available)
- Sortable columns (click headers)

### `run` — Low-level direct access

Full control with every CLI flag. Same internals as `audit`.

```bash
pagespeed run https://example.com --device desktop --categories performance accessibility --delay 2.0
```

### `pipeline` — End-to-end analysis

Resolves URLs from a sitemap (or file/inline), runs the analysis, writes CSV/JSON data files, and generates an HTML report — all in one command. Optionally evaluates a performance budget.

```bash
# From a sitemap (auto-detected from URL shape)
pagespeed pipeline https://example.com/sitemap.xml

# Limit URLs and auto-open report in browser
pagespeed pipeline https://example.com/sitemap.xml --sitemap-limit 20 --open

# Filter to a section of the sitemap, both devices
pagespeed pipeline https://example.com/sitemap.xml --sitemap-filter "/blog/" --device both

# Inline URLs
pagespeed pipeline https://a.com https://b.com --device both

# From a URL file
pagespeed pipeline -f urls.txt --open

# Data files only — skip HTML report generation
pagespeed pipeline -f urls.txt --no-report --output-format json

# Evaluate Core Web Vitals budget (exits 2 on failure)
pagespeed pipeline https://example.com/sitemap.xml --budget cwv

# Custom budget with GitHub Actions output format
pagespeed pipeline https://example.com/sitemap.xml --budget budget.toml --budget-format github
```

**Sitemap auto-detection**: when a single positional argument looks like a sitemap (ends in `.xml`, contains `sitemap` in the path, or the file content starts with `<?xml`), it is treated as a sitemap source automatically. Pass `--sitemap` explicitly to use a sitemap alongside inline URLs.
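
The three checks described above can be sketched as a small predicate. This is an approximation for illustration, not the tool's code; `looks_like_sitemap` and `content_head` are hypothetical names.

```python
from urllib.parse import urlparse


def looks_like_sitemap(source: str, content_head: str = "") -> bool:
    """Heuristic mirroring the three documented checks.

    `content_head` is the start of a local file's content, if one was read.
    """
    path = urlparse(source).path.lower()
    if path.endswith(".xml"):
        return True  # ends in .xml
    if "sitemap" in path:
        return True  # "sitemap" appears in the path
    # Fall back to sniffing the content for an XML declaration.
    return content_head.lstrip().startswith("<?xml")


print(looks_like_sitemap("https://example.com/sitemap.xml"))
print(looks_like_sitemap("https://a.com"))
```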

#### `pipeline` flags

| Flag | Short | Default | Description |
|------|-------|---------|-------------|
| `source` | — | `[]` | Sitemap URL/path (auto-detected) or plain URLs |
| `--file` | `-f` | None | File with one URL per line |
| `--sitemap` | — | None | Explicit sitemap URL or local path |
| `--sitemap-limit` | — | None | Max URLs to extract from sitemap |
| `--sitemap-filter` | — | None | Regex to filter sitemap URLs |
| `--open` | — | `False` | Auto-open HTML report in browser after completion |
| `--no-report` | — | `False` | Skip HTML report; write data files only |
| `--budget` | — | None | Budget file (TOML) or `cwv` preset — exits 2 on failure |
| `--budget-format` | — | `text` | Budget output format: `text`, `json`, or `github` |
| `--webhook` | — | None | Webhook URL for budget result notifications |
| `--webhook-on` | — | `always` | When to send webhook: `always` or `fail` |

All `audit` flags (`--device`, `--output-format`, `--output`, `--output-dir`, `--delay`, `--workers`, `--categories`) also apply.

## Configuration

### Config file: `pagespeed.toml`

An optional TOML file for persistent settings and named profiles. The tool searches for it in:
1. Current working directory (`./pagespeed.toml`)
2. User config directory (`~/.config/pagespeed/config.toml`)

You can also pass an explicit path with `--config path/to/config.toml`.

```toml
[settings]
api_key = "YOUR_API_KEY"       # or use PAGESPEED_API_KEY env var
urls_file = "urls.txt"         # default URL file for -f
delay = 1.5                    # seconds between API requests
device = "mobile"              # mobile, desktop, or both
output_format = "csv"          # csv, json, or both
output_dir = "./reports"       # directory for output files
workers = 4                    # concurrent workers (1 = sequential)
categories = ["performance"]   # Lighthouse categories
verbose = false

[profiles.quick]
device = "mobile"
output_format = "csv"
categories = ["performance"]

[profiles.full]
device = "both"
output_format = "both"
categories = ["performance", "accessibility", "best-practices", "seo"]

[profiles.core-vitals]
device = "both"
output_format = "csv"
categories = ["performance"]

[profiles.client-report]
urls_file = "client_urls.txt"
device = "both"
output_format = "both"
output_dir = "./client-reports"
categories = ["performance", "accessibility", "seo"]
```

### Config resolution order

Settings are merged with the following priority (highest wins):

1. **CLI flags** — explicit command-line arguments
2. **Profile values** — via `--profile name`
3. **`[settings]`** — defaults from config file
4. **Built-in defaults** — hardcoded in the script

### Global flags

| Flag | Short | Default | Description |
|------|-------|---------|-------------|
| `--api-key` | — | config/env | Google API key |
| `--config` | `-c` | auto-discovered | Path to config TOML |
| `--profile` | `-p` | None | Named profile from config |
| `--verbose` | `-v` | False | Verbose output to stderr |
| `--version` | — | — | Print version and exit |

### `audit` / `run` flags

| Flag | Short | Default | Description |
|------|-------|---------|-------------|
| `urls` | — | `[]` | Positional URLs |
| `--file` | `-f` | None | File with one URL per line |
| `--device` | — | `mobile` | `mobile`, `desktop`, or `both` |
| `--output-format` | — | `csv` | `csv`, `json`, or `both` |
| `--output` | `-o` | auto-timestamped | Explicit output file path |
| `--output-dir` | — | `./reports/` | Directory for auto-named files |
| `--delay` | `-d` | `1.5` | Seconds between requests |
| `--workers` | `-w` | `4` | Concurrent workers |
| `--categories` | — | `performance` | Lighthouse categories |
| `--full` | — | `False` | Embed raw `lighthouseResult` in JSON output (ignored for CSV) |
| `--stream` | — | `False` | Print results as NDJSON to stdout as they complete (skips file output) |

## Output Formats

### File naming

By default, output files use UTC timestamps:

```
{output_dir}/{YYYYMMDD}T{HHMMSS}Z-{strategy}.{ext}
```

Examples:
```
./reports/20260216T143022Z-mobile.csv
./reports/20260216T150000Z-both.json
./reports/20260216T143022Z-report.html
```

Use `-o` to override with an explicit path.

### CSV

Flat table with one row per (URL, strategy) pair. Columns:

| Column | Description |
|--------|-------------|
| `url` | The analyzed URL |
| `strategy` | `mobile` or `desktop` |
| `performance_score` | 0-100 Lighthouse score |
| `lab_fcp_ms` | First Contentful Paint (ms) |
| `lab_lcp_ms` | Largest Contentful Paint (ms) |
| `lab_cls` | Cumulative Layout Shift |
| `lab_speed_index_ms` | Speed Index (ms) |
| `lab_tbt_ms` | Total Blocking Time (ms) |
| `lab_tti_ms` | Time to Interactive (ms) |
| `field_*` | Field (CrUX) metrics (when available) |
| `error` | Error message if the request failed |
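
Because the CSV is flat, the standard-library `csv` module is enough to post-process it. The sketch below ranks rows by `lab_lcp_ms`; `slowest_lcp` is a hypothetical helper, and the sample rows (including the error text) are made up for illustration.

```python
import csv
import io


def slowest_lcp(csv_text: str, limit: int = 3) -> list[tuple[str, str, float]]:
    """Return (url, strategy, lab_lcp_ms) tuples, slowest first."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip failed requests and rows without an LCP value.
        if row.get("error") or not row.get("lab_lcp_ms"):
            continue
        rows.append((row["url"], row["strategy"], float(row["lab_lcp_ms"])))
    return sorted(rows, key=lambda r: r[2], reverse=True)[:limit]


# Made-up sample using a subset of the documented columns.
SAMPLE_CSV = """url,strategy,performance_score,lab_lcp_ms,error
https://a.com,mobile,92,1800,
https://b.com,mobile,41,5200,
https://c.com,mobile,,,Request timed out
"""
for url, strategy, lcp in slowest_lcp(SAMPLE_CSV):
    print(f"{url} ({strategy}): {lcp:.0f}ms")
```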

### JSON

Structured with metadata header:

```json
{
  "metadata": {
    "generated_at": "2026-02-16T14:30:22+00:00",
    "total_urls": 5,
    "strategies": ["mobile", "desktop"],
    "tool_version": "2.0.0"
  },
  "results": [
    {
      "url": "https://example.com",
      "strategy": "mobile",
      "performance_score": 92,
      "lab_metrics": { "lab_fcp_ms": 1200, "lab_lcp_ms": 1800, ... },
      "field_metrics": { "field_lcp_ms": 2100, "field_lcp_category": "FAST", ... },
      "error": null
    }
  ]
}
```

With `--full`, each result also includes the complete raw `lighthouseResult` from the API:

```json
{
  "results": [
    {
      "url": "https://example.com",
      "strategy": "mobile",
      "performance_score": 92,
      "lab_metrics": { ... },
      "field_metrics": { ... },
      "lighthouseResult": {
        "audits": { ... },
        "categories": { ... },
        "categoryGroups": { ... },
        "configSettings": { ... },
        "environment": { ... },
        "fetchTime": "...",
        "finalUrl": "https://example.com",
        "lighthouseVersion": "...",
        "requestedUrl": "https://example.com",
        "runWarnings": [],
        "stackPacks": [],
        "timing": { ... },
        "i18n": { ... }
      },
      "error": null
    }
  ]
}
```

## Metrics Reference

### Lab data (synthetic, from Lighthouse)

| Metric | Good | Needs Work | Poor |
|--------|------|-----------|------|
| First Contentful Paint | < 1.8s | 1.8s–3.0s | > 3.0s |
| Largest Contentful Paint | < 2.5s | 2.5s–4.0s | > 4.0s |
| Cumulative Layout Shift | < 0.1 | 0.1–0.25 | > 0.25 |
| Total Blocking Time | < 200ms | 200ms–600ms | > 600ms |
| Speed Index | < 3.4s | 3.4s–5.8s | > 5.8s |
| Time to Interactive | < 3.8s | 3.8s–7.3s | > 7.3s |
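
The thresholds in this table translate directly into a lookup. The sketch below shows one way to rate a value; the dictionary keys and the `rate` function are hypothetical names, and treating the boundary values as "needs work" follows the table's ranges.

```python
# (good_below, poor_above) per metric, copied from the table above.
# Key names are illustrative and do not match the tool's column names.
THRESHOLDS = {
    "fcp_s": (1.8, 3.0),
    "lcp_s": (2.5, 4.0),
    "cls": (0.1, 0.25),
    "tbt_ms": (200, 600),
    "speed_index_s": (3.4, 5.8),
    "tti_s": (3.8, 7.3),
}


def rate(metric: str, value: float) -> str:
    """Classify a lab metric value as good, needs work, or poor."""
    good_below, poor_above = THRESHOLDS[metric]
    if value < good_below:
        return "good"
    if value <= poor_above:
        return "needs work"
    return "poor"


print(rate("lcp_s", 1.8), rate("tbt_ms", 450), rate("cls", 0.3))
```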

### Field data (real users, from CrUX)

Field data comes from the Chrome User Experience Report. It may not be available for low-traffic sites.

| Metric | Description |
|--------|-------------|
| FCP | First Contentful Paint — when first content appears |
| LCP | Largest Contentful Paint — when main content loads |
| CLS | Cumulative Layout Shift — visual stability |
| INP | Interaction to Next Paint — input responsiveness |
| FID | First Input Delay — (deprecated, replaced by INP) |
| TTFB | Time to First Byte — server response time |

## Rate Limits

| Scenario | Limit |
|----------|-------|
| Without API key | ~25 queries/100 seconds |
| With API key | ~25,000 queries/day (400/100 seconds) |

Tips:
- Use `--delay` to increase time between requests if hitting rate limits
- The tool retries on 429 (rate limit) responses with exponential backoff
- See [Concurrency Model](#concurrency-model) for how `--workers` and `--delay` interact
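
For readers unfamiliar with exponential backoff, the general pattern looks roughly like this. It is a generic sketch, not the tool's retry code: `call_with_backoff`, `max_retries`, and `base_delay` are assumed names and values, and `send` stands in for whatever issues the HTTP request and returns its status code.

```python
import time
from collections.abc import Callable


def call_with_backoff(
    send: Callable[[], int],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns a non-429 status or retries run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    """
    attempt = 0
    while True:
        status = send()
        if status != 429 or attempt >= max_retries:
            return status
        sleep(base_delay * 2 ** attempt)  # double the wait on each retry
        attempt += 1


# Demo with canned responses and a recording sleep, so nothing actually waits.
responses = iter([429, 429, 200])
waits: list[float] = []
status = call_with_backoff(lambda: next(responses), sleep=waits.append)
print(status, waits)
```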

## Concurrency Model

The tool uses `asyncio` + `httpx` for non-blocking HTTP I/O.

**How it works:**

- With `--workers 1` (or effectively 1), requests run strictly sequentially — one finishes before the next starts.
- With `--workers N > 1` (default: 4), all tasks are launched together via `asyncio.gather()`. A shared `asyncio.Semaphore(1)` ensures requests _start_ no more than once per `--delay` seconds:
  1. Each coroutine acquires the semaphore
  2. Sleeps the remainder of `delay` since the last request started
  3. Records the timestamp and releases the semaphore
  4. Makes the actual HTTP request — outside the semaphore

Because the HTTP call happens after releasing the semaphore, multiple requests can be **in-flight simultaneously** even though they start `delay` seconds apart. Wall time is therefore much shorter than `n_urls × (delay + latency)`; it converges toward `n_urls × delay + avg_latency` as the number of URLs grows.
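
The four steps above can be sketched with plain `asyncio`. This is an illustration of the pacing technique rather than the tool's implementation: `StartPacer`, `fetch`, and `run_all` are hypothetical names, and `asyncio.sleep` stands in for the real HTTP call.

```python
import asyncio
import time


class StartPacer:
    """Spaces request starts at least `delay` seconds apart."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self._gate = asyncio.Semaphore(1)
        self._last_start: float | None = None

    async def wait_turn(self) -> float:
        async with self._gate:  # 1. acquire the semaphore
            if self._last_start is not None:
                remaining = self.delay - (time.monotonic() - self._last_start)
                if remaining > 0:
                    await asyncio.sleep(remaining)  # 2. sleep the remainder
            self._last_start = time.monotonic()     # 3. record the timestamp
            return self._last_start
        # Leaving the `async with` block releases the semaphore.


async def fetch(pacer: StartPacer, url: str, latency: float) -> tuple[str, float]:
    started = await pacer.wait_turn()
    await asyncio.sleep(latency)  # 4. simulated request, outside the semaphore
    return url, started


async def run_all(urls: list[str], delay: float, latency: float) -> list[tuple[str, float]]:
    pacer = StartPacer(delay)
    return await asyncio.gather(*(fetch(pacer, u, latency) for u in urls))


t0 = time.monotonic()
results = asyncio.run(run_all(["a", "b", "c", "d"], delay=0.05, latency=0.1))
print(f"{len(results)} requests in {time.monotonic() - t0:.2f}s")
```

With these toy numbers, the formula above gives roughly `4 × 0.05 + 0.1 = 0.3` seconds, compared with `4 × (0.05 + 0.1) = 0.6` seconds if each request had to finish before the next one started.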

**Practical rule of thumb:**

| Goal | Setting |
|------|---------|
| Safest for rate limits | `--workers 1` (sequential) |
| Default (balanced) | `--workers 4 --delay 1.5` |
| Maximum throughput | `--workers 4 --delay 1.0` (watch for 429s) |

## Cron usage

Auto-named output files include a UTC timestamp, so cron jobs won't overwrite previous results:

```bash
# Every Monday at 6am UTC
0 6 * * 1 cd /path/to/project && pagespeed audit -f urls.txt --profile full
```

## Examples

The [`examples/`](examples/) folder contains ready-to-use configuration files for common workflows:

| Example | Description |
|---------|-------------|
| [`basic/`](examples/basic/) | Minimal config with API key, strategy, and a sample URL list |
| [`multi-profile/`](examples/multi-profile/) | Named profiles for quick, full, and client-report workflows |
| [`ci-budget/`](examples/ci-budget/) | Strict and lenient performance budgets for CI pipelines |
| [`sitemap-pipeline/`](examples/sitemap-pipeline/) | Sitemap auto-discovery with regex filters and section-specific profiles |

Copy any example folder into your project and edit to taste. See [`examples/README.md`](examples/README.md) for full details.

## Testing

The project includes a comprehensive test suite (169 tests across 30 test classes). All tests run offline — API calls, sitemap fetches, and file I/O are mocked.

```bash
# Run all tests
uv run test_pagespeed_insights_tool.py -v

# Run a single test class
uv run test_pagespeed_insights_tool.py -v TestValidateUrl

# Run a specific test method
uv run test_pagespeed_insights_tool.py -v TestExtractMetrics.test_full_extraction
```

## License

This project is licensed under the [MIT License](LICENSE).
