Metadata-Version: 2.4
Name: pyannotators-entityfishing
Version: 1.6.44
Summary: Annotator based on entity-fishing
Project-URL: Homepage, https://github.com/oterrier/pyannotators_entityfishing/
Author-email: Olivier Terrier <olivier.terrier@kairntech.com>
License: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.12
Requires-Dist: collections-extended
Requires-Dist: mongoquery
Requires-Dist: pydantic<3.0,>=2.0
Requires-Dist: pymultirole-plugins<0.7.0,>=0.6.0
Requires-Dist: python-singleton-metaclasses
Requires-Dist: requests
Requires-Dist: requests-cache
Requires-Dist: requests-futures
Provides-Extra: dev
Requires-Dist: bump2version; extra == 'dev'
Requires-Dist: pre-commit; extra == 'dev'
Provides-Extra: docs
Requires-Dist: lxml-html-clean; extra == 'docs'
Requires-Dist: m2r2; extra == 'docs'
Requires-Dist: sphinx; extra == 'docs'
Requires-Dist: sphinx-rtd-theme; extra == 'docs'
Requires-Dist: sphinxcontrib-apidoc; extra == 'docs'
Provides-Extra: sbom
Requires-Dist: cyclonedx-bom; extra == 'sbom'
Requires-Dist: pip-audit; extra == 'sbom'
Provides-Extra: spacy
Requires-Dist: spacy>=3.0; extra == 'spacy'
Provides-Extra: test
Requires-Dist: dirty-equals; extra == 'test'
Requires-Dist: pytest; extra == 'test'
Requires-Dist: pytest-cov; extra == 'test'
Requires-Dist: ruff; extra == 'test'
Description-Content-Type: text/markdown

# pyannotators-entityfishing

Annotator based on [entity-fishing](https://github.com/kermitt2/entity-fishing) for named entity recognition and disambiguation against Wikidata.

## Installation

```bash
pip install pyannotators-entityfishing
```

To enable the optional noun-form filtering, install the `spacy` extra and a spaCy language model:

```bash
pip install "pyannotators-entityfishing[spacy]"  # quoted so shells such as zsh do not expand the brackets
python -m spacy download en_core_web_sm  # or other language models
```
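
Before relying on noun-form filtering, it can help to confirm that spaCy and the chosen model are importable. The following is a minimal sketch using only the standard library; `spacy_model_available` is a hypothetical helper and not part of this package.

```python
import importlib.util


def spacy_model_available(model_name: str = "en_core_web_sm") -> bool:
    """Return True if both spaCy and the given model package can be found."""
    # Models installed with `python -m spacy download` are regular Python
    # packages, so find_spec can locate them without loading the model.
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("spacy", model_name)
    )


if __name__ == "__main__":
    print(spacy_model_available())
```

Pass another model name, for example `spacy_model_available("fr_core_news_sm")`, when working with a different language.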

## Usage

```python
from pymultirole_plugins.v1.schema import Document
from pyannotators_entityfishing.entityfishing import EntityFishingAnnotator, EntityFishingParameters

annotator = EntityFishingAnnotator()
parameters = EntityFishingParameters(
    default_label="ENTITY",
    minSelectorScore=0.3,
)

docs = annotator.annotate(
    [Document(text="Albert Einstein was born in Ulm.", metadata={"language": "en"})],
    parameters,
)

for ann in docs[0].annotations:
    print(f"{ann.start}:{ann.end} {ann.labelName} {ann.terms[0].identifier}")
```
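
The loop above indexes `ann.terms[0]` directly. If an annotation can come back without linked terms (an assumption, not verified against the library), a small guard avoids an exception. The sketch below uses stand-in objects with the same attribute names as the loop; `summarize` is a hypothetical helper and the values are illustrative only.

```python
from types import SimpleNamespace


def summarize(annotation):
    """Return (start, end, label, identifier-or-None) for one annotation."""
    # Treat a missing, None, or empty `terms` attribute as "no linked entity".
    terms = getattr(annotation, "terms", None) or []
    identifier = terms[0].identifier if terms else None
    return (annotation.start, annotation.end, annotation.labelName, identifier)


# Stand-ins for real annotations (illustrative values, not library output).
linked = SimpleNamespace(
    start=0, end=15, labelName="ENTITY",
    terms=[SimpleNamespace(identifier="Q937")],
)
unlinked = SimpleNamespace(start=28, end=31, labelName="ENTITY", terms=None)

if __name__ == "__main__":
    for ann in (linked, unlinked):
        print(summarize(ann))
```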

## Development

Install test dependencies:

```bash
uv pip install -e ".[test]"
```

### Linting

```bash
uv run ruff check src/ tests/
uv run ruff format --check src/ tests/
```

### Testing

```bash
uv run pytest
```

### Coverage

```bash
uv run pytest --cov=src --cov-report=term-missing
```

## SBOM & vulnerability check

Install the SBOM dependencies:

```bash
uv sync --extra sbom
```

Generate a CycloneDX SBOM from the current environment:

```bash
uv run cyclonedx-py environment -o sbom.cdx.json --output-format json
```
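
To inspect the generated file, the sketch below lists the components recorded in a CycloneDX JSON document. It relies on the top-level `components` array defined by the CycloneDX specification; `list_components` is a hypothetical helper, and the file name matches the command above.

```python
import json
from pathlib import Path


def list_components(sbom: dict) -> list[str]:
    """Return sorted "name version" strings for each component in the SBOM."""
    return sorted(
        f"{component.get('name', '?')} {component.get('version', '?')}"
        for component in sbom.get("components", [])
    )


if __name__ == "__main__":
    sbom_path = Path("sbom.cdx.json")
    # Only read the file if the SBOM has already been generated.
    if sbom_path.exists():
        for line in list_components(json.loads(sbom_path.read_text())):
            print(line)
```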

Audit dependencies for known vulnerabilities:

```bash
uv run pip-audit --format json --output audit-report.json
```
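
The JSON report can then be summarized with a few lines of Python. This is a minimal sketch that assumes the report has a top-level `dependencies` list whose entries carry `name`, `version`, and a `vulns` list with an `id` per finding; check this against the output of your `pip-audit` version. `vulnerable_packages` is a hypothetical helper.

```python
import json
from pathlib import Path


def vulnerable_packages(report: dict) -> list[tuple[str, str, list[str]]]:
    """Return (name, version, [vulnerability ids]) for each affected package."""
    found = []
    for dep in report.get("dependencies", []):
        # Entries without findings (or skipped entries) have no or empty "vulns".
        ids = [vuln.get("id", "?") for vuln in dep.get("vulns", [])]
        if ids:
            found.append((dep.get("name", "?"), dep.get("version", "?"), ids))
    return found


if __name__ == "__main__":
    report_path = Path("audit-report.json")
    # Only read the report if the audit has already been run.
    if report_path.exists():
        report = json.loads(report_path.read_text())
        for name, version, ids in vulnerable_packages(report):
            print(f"{name} {version}: {', '.join(ids)}")
```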

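`pip-audit` already exits with a non-zero status when it finds a known vulnerability. Add `--strict` to also fail when a dependency cannot be audited (useful in CI):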

```bash
uv run pip-audit --strict
```
