Metadata-Version: 2.4
Name: html-intersection
Version: 0.2.0
Summary: Fix canonical links, FLAGS, and RO<->EN cross-references across mirrored HTML directories.
Author: Andrei/Andreea Team
License: MIT
Project-URL: Homepage, https://pypi.org/project/html-intersection/
Project-URL: Repository, https://example.com/
Keywords: html,seo,canonical,flags,intersection,sync,ro,en
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Text Processing :: Markup :: HTML
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

html-intersection
==================

Fix canonical links, FLAGS, and RO↔EN cross-references across mirrored HTML directories.

What it does
------------
- Ensures each file's `<link rel="canonical" ...>` matches its exact filename (case-sensitive)
- Ensures the FLAGS links for RO (`+40`) and EN (`+1`) match the canonical in the same file
- Synchronizes cross-references between `ro/` and `en/` files so that the two files in a pair point to each other (RO↔EN)
- Detects and reports unresolved cases:
  - invalid links (pointing to non-existent files)
  - pairs with no common links in FLAGS (all four links different)
  - unmatched RO/EN files that remain without a valid pair
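
As an illustration of the "invalid links" check, the sketch below maps a site URL back to a local file and tests whether that file exists. It is not the library's actual code: the helper names `local_name_for` and `link_target_exists` are hypothetical, and it assumes RO pages sit directly under the base URL while EN pages sit under `/en/`.

```python
import os
from typing import Optional, Tuple


def local_name_for(url: str, base_url: str) -> Optional[Tuple[str, str]]:
    """Return (lang, filename) for a site URL, or None if it is outside base_url.

    Assumption: RO pages are at <base>/<file>, EN pages at <base>/en/<file>.
    """
    base = base_url.rstrip("/") + "/"
    if not url.startswith(base):
        return None
    rest = url[len(base):]
    if rest.startswith("en/"):
        return "en", rest[len("en/"):]
    return "ro", rest


def link_target_exists(url: str, base_url: str,
                       ro_directory: str, en_directory: str) -> bool:
    """True if the URL points at a file that exists locally, with exact case."""
    located = local_name_for(url, base_url)
    if located is None:
        return False
    lang, filename = located
    directory = en_directory if lang == "en" else ro_directory
    # Compare against the directory listing rather than os.path.exists(),
    # so the check stays case-sensitive even on case-insensitive filesystems.
    return filename in os.listdir(directory)
```

Comparing against `os.listdir()` is a deliberate choice here: on Windows, `os.path.exists("Page.html")` also succeeds for `page.html`, which would hide exactly the case mismatches this tool is meant to catch.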

Inspired by the step-by-step process of the original Intersection scripts and packaged as a PyPI library (README style similar to `html` on PyPI).

Install
-------
```bash
pip install html-intersection
```

Quick start
-----------
```python
from html_intersection.core import repair_all

repair_all(
    ro_directory=r"E:\path\to\site\ro",  # raw string: single backslashes
    en_directory=r"E:\path\to\site\en",
    base_url="https://neculaifantanaru.com",
)
```

CLI usage
---------
```bash
html-intersection repair --ro-dir "E:\\path\\to\\site\\ro" --en-dir "E:\\path\\to\\site\\en" --base-url https://neculaifantanaru.com
```

Commands:
- `repair` (runs all 3 steps)
- `fix-canonicals`
- `fix-flags`
- `sync`
- `scan` (prints detected RO↔EN pairs; add `--report` to include invalid links, mismatches, unmatched files)

Python API
----------
- `fix_canonicals(ro_directory, en_directory, base_url, dry_run=False, backup_ext=None)`
- `fix_flags_match_canonical(ro_directory, en_directory, base_url, dry_run=False, backup_ext=None)`
- `sync_cross_references(ro_directory, en_directory, base_url, dry_run=False, backup_ext=None)`
- `repair_all(ro_directory, en_directory, base_url, dry_run=False, backup_ext=None)`

Examples
--------
1) Basic repair
```python
from html_intersection.core import repair_all

repair_all(
    ro_directory=r"E:\site\ro",
    en_directory=r"E:\site\en",
    base_url="https://neculaifantanaru.com",
)
```

2) Dry run (no writes)
```python
from html_intersection.core import repair_all

repair_all(
    ro_directory=r"E:\site\ro",
    en_directory=r"E:\site\en",
    base_url="https://neculaifantanaru.com",
    dry_run=True,
)
```

3) CLI one step at a time
```bash
html-intersection fix-canonicals --ro-dir "E:\\site\\ro" --en-dir "E:\\site\\en" --base-url https://neculaifantanaru.com
html-intersection fix-flags      --ro-dir "E:\\site\\ro" --en-dir "E:\\site\\en" --base-url https://neculaifantanaru.com
html-intersection sync           --ro-dir "E:\\site\\ro" --en-dir "E:\\site\\en" --base-url https://neculaifantanaru.com

# Scan with detailed report
html-intersection scan           --ro-dir "E:\\site\\ro" --en-dir "E:\\site\\en" --base-url https://neculaifantanaru.com --report
```

How the logic works (3 steps)
-----------------------------
1. Canonicals: set each file's canonical to its exact file name (case-sensitive); RO → `https://.../<name>.html`, EN → `https://.../en/<Name>.html`.
2. FLAGS match the canonical in the same file: in RO files the `cunt_code="+40"` link is set to the canonical; in EN files the `cunt_code="+1"` link is.
3. Cross-references RO↔EN: in `ro/<name>.html` the `+1` link points to the paired `en/<Name>.html`; in `en/<Name>.html` the `+40` link points to the paired `ro/<name>.html`.
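
Step 1 can be sketched roughly as follows. This is a minimal illustration rather than the library's implementation: `expected_canonical` and `set_canonical` are hypothetical helpers, and the regular expression assumes the tag is written as `<link rel="canonical" href="...">` with `rel` before `href`.

```python
import re


def expected_canonical(base_url: str, filename: str, lang: str) -> str:
    """Build the canonical URL for a file, keeping the filename's exact case."""
    prefix = base_url.rstrip("/")
    if lang == "en":
        prefix += "/en"
    return f"{prefix}/{filename}"


# Assumption: rel="canonical" appears before href inside the <link> tag.
CANONICAL_RE = re.compile(
    r'(<link\s+rel="canonical"\s+href=")[^"]*(")',
    re.IGNORECASE,
)


def set_canonical(html: str, url: str) -> str:
    """Replace the href of the first canonical link with the given URL."""
    # A function is used as the replacement so that backslashes in the URL
    # are never interpreted as regex escapes.
    return CANONICAL_RE.sub(
        lambda m: m.group(1) + url + m.group(2), html, count=1
    )
```

For example, `expected_canonical("https://neculaifantanaru.com", "Page.html", "en")` yields `https://neculaifantanaru.com/en/Page.html`, which `set_canonical` would then write into the file's canonical tag.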

Notes on robustness
-------------------
- The matching for `+40` and `+1` accepts both `"+40"` and `"\+40"` (and similarly for `+1`).
- Accidental `...html.html` is normalized to `...html` when comparing and fixing.
- `scan --report` surfaces invalid links, mismatched pairs with no common links, and files left unmatched.
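
The first two robustness rules could look something like the sketch below. The helper names are hypothetical, and the pattern assumes that `cunt_code` appears before `href` inside the `<a>` tag; the library's own matching may differ.

```python
import re
from typing import Optional


def flag_link_pattern(code: str) -> "re.Pattern[str]":
    """Match <a ... cunt_code="+40" ... href="..."> with an optional backslash before '+'.

    Assumption: cunt_code precedes href in the tag.
    """
    digits = re.escape(code.lstrip("+"))
    return re.compile(
        r'(<a\b[^>]*?cunt_code="\\?\+' + digits + r'"[^>]*?href=")([^"]*)(")'
    )


def normalize_html_suffix(url: str) -> str:
    """Collapse an accidental '.html.html' ending down to a single '.html'."""
    while url.endswith(".html.html"):
        url = url[: -len(".html")]
    return url


def get_flag_link(html: str, code: str) -> Optional[str]:
    """Return the normalized href of the FLAGS link for the given code, if any."""
    match = flag_link_pattern(code).search(html)
    return normalize_html_suffix(match.group(2)) if match else None
```

With this sketch, both `cunt_code="+40"` and `cunt_code="\+40"` resolve to the same link, and a stray `page.html.html` is compared as `page.html`.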

Windows install and build
-------------------------
```powershell
# Create and activate venv
py -m venv .venv
.\.venv\Scripts\Activate.ps1

# Install build tooling
py -m pip install --upgrade pip build twine

# Build the wheel and sdist
py -m build

# Upload to TestPyPI (recommended first)
$env:TWINE_USERNAME = "__token__"
$env:TWINE_PASSWORD = "pypi-<YOUR_TESTPYPI_TOKEN>"
py -m twine upload --repository testpypi dist/*

# Upload to PyPI (when ready)
$env:TWINE_USERNAME = "__token__"
$env:TWINE_PASSWORD = "pypi-<YOUR_PYPI_TOKEN>"
py -m twine upload dist/*
```

Notes
-----
- Files are written as UTF-8; the reader tries `utf-8`, `latin1`, `cp1252`, and `iso-8859-1`.
- You can pass `backup_ext=".bak"` to keep a backup of modified files.
- The library aims to follow the precise, case-sensitive flow described in the three steps above.
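
The read and backup behaviour in these notes can be approximated as below. This is a sketch under assumptions, not the library's code: the function names are hypothetical, the encodings are tried in the order listed, and the backup is assumed to be named by appending `backup_ext` to the full file name (`page.html` → `page.html.bak`).

```python
import shutil
from pathlib import Path
from typing import Optional

# Order taken from the note above. latin1 can decode any byte sequence,
# so in this sketch the entries after it are never actually reached.
ENCODINGS = ("utf-8", "latin1", "cp1252", "iso-8859-1")


def read_text_with_fallback(path: Path) -> str:
    """Decode a file with the first encoding that succeeds."""
    data = path.read_bytes()
    for encoding in ENCODINGS:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode {path}")


def write_with_backup(path: Path, text: str,
                      backup_ext: Optional[str] = None) -> None:
    """Write text as UTF-8, optionally copying the old file aside first."""
    if backup_ext and path.exists():
        # Assumed naming scheme: page.html -> page.html.bak
        shutil.copy2(path, path.with_name(path.name + backup_ext))
    path.write_text(text, encoding="utf-8")
```

Copying before writing means the backup always holds the original bytes, including the original encoding, so a bad rewrite can be undone by restoring the `.bak` file.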

References
----------
- Diacritice project structure reference: [`https://github.com/me-suzy/Diacritice-Proiect---pypi-org`](https://github.com/me-suzy/Diacritice-Proiect---pypi-org)
- PyPI `html` package page style reference: [`https://pypi.org/project/html/`](https://pypi.org/project/html/)


