Metadata-Version: 2.1
Name: ftmq
Version: 0.6.2
Summary: followthemoney query dsl and io helpers
Home-page: https://github.com/investigativedata/ftmq
License: MIT
Author: Simon Wörpel
Author-email: simon.woerpel@pm.me
Requires-Python: >=3.11,<3.12
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: PyICU (>=2.12,<3.0)
Requires-Dist: alephclient (>=2.3.6,<3.0.0)
Requires-Dist: anystore (>=0.1.3,<0.2.0)
Requires-Dist: banal (>=1.0.6,<2.0.0)
Requires-Dist: certifi (>=2024.2.2)
Requires-Dist: click (>=8.1.7,<9.0.0)
Requires-Dist: click-default-group (>=1.2.4,<2.0.0)
Requires-Dist: cryptography (>=42.0.4,<43.0.0)
Requires-Dist: followthemoney (>=3.5.9,<4.0.0)
Requires-Dist: nomenklatura (>=3.10.4,<4.0.0)
Requires-Dist: orjson (>=3.9.15,<4.0.0)
Requires-Dist: pycountry (>=23.12.11,<24.0.0)
Requires-Dist: pydantic (>=2.6.2,<3.0.0)
Requires-Dist: scipy (>=1.12.0,<2.0.0)
Requires-Dist: sqlalchemy (>=2.0.27,<3.0.0)
Requires-Dist: urllib3 (<3)
Project-URL: Bug Tracker, https://github.com/investigativedata/ftmq/issues
Project-URL: Documentation, https://github.com/investigativedata/ftmq
Project-URL: Repository, https://github.com/investigativedata/ftmq
Description-Content-Type: text/markdown

[![ftmq on pypi](https://img.shields.io/pypi/v/ftmq)](https://pypi.org/project/ftmq/) [![Python test and package](https://github.com/investigativedata/ftmq/actions/workflows/python.yml/badge.svg)](https://github.com/investigativedata/ftmq/actions/workflows/python.yml) [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit) [![Coverage Status](https://coveralls.io/repos/github/investigativedata/ftmq/badge.svg?branch=main)](https://coveralls.io/github/investigativedata/ftmq?branch=main) [![MIT License](https://img.shields.io/pypi/l/ftmq)](./LICENSE)

# ftmq

An attempt at a query DSL for followthemoney data.

This library provides methods to query and filter entities formatted as
[followthemoney](https://github.com/alephdata/followthemoney) data, either from
a JSON file or stream, or from an SQL backend via
[followthemoney-store](https://github.com/alephdata/followthemoney-store).

It also provides a `Query` class that other libraries can use to build
SQL queries or API queries.

**Minimum Python version: 3.11**

## Installation

    pip install ftmq

## Usage

`ftmq` accepts either a line-based input stream or a file URI passed as an argument.
(For integration with `followthemoney-store`, see below.)

Input stream:

    cat entities.ftm.json | ftmq <filter expression> > output.ftm.json
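
To illustrate what "line-based" means here, the following is a minimal sketch that uses only the standard library and is not `ftmq`'s actual implementation. It assumes one JSON-serialized entity per line with a top-level `schema` key; the function name `filter_stream` and the sample data are hypothetical.

```python
import io
import json
from typing import Iterator, TextIO


def filter_stream(lines: TextIO, schema: str) -> Iterator[dict]:
    """Yield entities from a line-based JSON stream whose schema equals `schema`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entity = json.loads(line)
        if entity.get("schema") == schema:
            yield entity


# Hypothetical sample data: one entity per line, as in an .ftm.json file
sample = io.StringIO(
    '{"id": "a", "schema": "Person", "properties": {"name": ["Jane Doe"]}}\n'
    '{"id": "b", "schema": "Company", "properties": {"name": ["ACME"]}}\n'
)
persons = list(filter_stream(sample, "Person"))
```

Because each line is parsed on its own, such a filter can process arbitrarily large streams without loading them into memory.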

URI argument:

Under the hood, `ftmq` uses
[smart_open](https://github.com/RaRe-Technologies/smart_open), so the `-i`
argument can be an arbitrary file URI:

    ftmq <filter expression> -i ~/Data/entities.ftm.json
    ftmq <filter expression> -i https://example.org/data.json.gz
    ftmq <filter expression> -i s3://data-bucket/entities.ftm.json
    ftmq <filter expression> -i webhdfs://host:port/path/file

[...and so on](https://github.com/RaRe-Technologies/smart_open#how)

The same works for the output via `-o`:

    cat data.json | ftmq <filter expression> -o s3://data-bucket/output.json

### Filter for a dataset:

    cat entities.ftm.json | ftmq -d ec_meetings

### Filter for a schema:

    cat entities.ftm.json | ftmq -s Person

Filter for a schema and all its descendants or ancestors:

    cat entities.ftm.json | ftmq -s LegalEntity --schema-include-descendants
    cat entities.ftm.json | ftmq -s LegalEntity --schema-include-ancestors
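
The effect of the two flags can be sketched with a toy hierarchy. This is only an illustration under stated assumptions, not how `ftmq` resolves schemata: the `PARENTS` mapping is a simplified, hand-written excerpt of the followthemoney model, and all function names are hypothetical.

```python
# Simplified, hand-written excerpt of the schema hierarchy (child -> parents).
# The real followthemoney model is larger and allows multiple inheritance.
PARENTS: dict[str, list[str]] = {
    "Thing": [],
    "LegalEntity": ["Thing"],
    "Person": ["LegalEntity"],
    "Organization": ["LegalEntity"],
    "Company": ["Organization"],
}


def ancestors(schema: str) -> set[str]:
    """Collect all direct and indirect parents of `schema`."""
    found: set[str] = set()
    stack = list(PARENTS[schema])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(PARENTS[parent])
    return found


def descendants(schema: str) -> set[str]:
    """Collect every schema that has `schema` among its ancestors."""
    return {name for name in PARENTS if schema in ancestors(name)}


def schema_matches(
    entity_schema: str,
    wanted: str,
    include_descendants: bool = False,
    include_ancestors: bool = False,
) -> bool:
    """Check an entity's schema against the wanted schema and optional relatives."""
    if entity_schema == wanted:
        return True
    if include_descendants and entity_schema in descendants(wanted):
        return True
    if include_ancestors and entity_schema in ancestors(wanted):
        return True
    return False
```

In this sketch, a `Company` entity does not match `-s LegalEntity` on its own but does once descendants are included.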

### Filter for properties:

[Properties](https://followthemoney.tech/explorer/) are available as options via `--<prop>=<value>`:

    cat entities.ftm.json | ftmq -s Company --country=de

#### Comparison lookups for properties:

    cat entities.ftm.json | ftmq -s Company --incorporationDate__gte=2020 --address__ilike=berlin

Possible lookups:
- `gt` - greater than
- `lt` - less than
- `gte` - greater than or equal
- `lte` - less than or equal
- `like` - SQLish `LIKE` (use `%` placeholders)
- `ilike` - SQLish `ILIKE`, case-insensitive (use `%` placeholders)
- `[]` - usage: `prop[]=foo` checks whether `foo` is a member of the array `prop`
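
The comparison and pattern lookups can be sketched in plain Python as follows. This is a hedged illustration, not `ftmq`'s implementation: it assumes that property values are lists of strings, that comparisons are plain string comparisons, and that patterns are anchored at both ends as in SQL. The names `like`, `LOOKUPS` and `matches` are hypothetical, and the `[]` lookup is left out.

```python
import re
from typing import Callable


def like(value: str, pattern: str, ignore_case: bool = False) -> bool:
    """Match `value` against a SQL-style pattern; `%` matches any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(regex, value, flags) is not None


# Assumption: values are compared as strings, which works for ISO dates
LOOKUPS: dict[str, Callable[[str, str], bool]] = {
    "gt": lambda value, expected: value > expected,
    "lt": lambda value, expected: value < expected,
    "gte": lambda value, expected: value >= expected,
    "lte": lambda value, expected: value <= expected,
    "like": lambda value, expected: like(value, expected),
    "ilike": lambda value, expected: like(value, expected, ignore_case=True),
}


def matches(properties: dict[str, list[str]], key: str, expected: str) -> bool:
    """Evaluate one `prop__lookup=value` filter; one matching value is enough."""
    prop, _, lookup = key.partition("__")
    values = properties.get(prop, [])
    if not lookup:
        return expected in values  # plain `--prop=value` filter
    return any(LOOKUPS[lookup](value, expected) for value in values)


# Hypothetical entity properties
props = {
    "incorporationDate": ["2021-03-01"],
    "address": ["Alexanderplatz 1, Berlin"],
}
recent = matches(props, "incorporationDate__gte", "2020")
in_berlin = matches(props, "address__ilike", "%berlin%")
```

Splitting the option name at the double underscore keeps the property name and the lookup independent, so the table of lookups can grow without touching the parsing.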


### ftmq apply

"Uplevel" an entity input stream to `nomenklatura.entity.CompositeEntity` and
optionally apply a dataset.

    ftmq apply -i ./entities.ftm.json -d <additional_dataset>

Overwrite datasets:

    ftmq apply -i ./entities.ftm.json -d <additional_dataset> --replace-dataset

### Coverage / Statistics

ftm scripts often iterate through all proxies anyway (e.g. during aggregation), so the same pass can be used to collect statistics along the way. There is a context manager for this, which produces the `Coverage` model.

Print coverage to stdout (and discard the filtered entities):

    cat entities.ftm.json | ftmq -s Event -o /dev/null --coverage-uri -

Within code:

```python
from ftmq.coverage import Collector

fragments = [...]
buffer = {}

c = Collector()
for proxy in fragments:
    if proxy.id in buffer:
        buffer[proxy.id].merge(proxy)
    else:
        buffer[proxy.id] = proxy
        # here collect stats:
        c.collect(proxy)

coverage = c.export()
```

### ftmstore (database read)

**NOT IMPLEMENTED YET**

The same cli logic applies:

    ftmq store iterate -d ec_meetings -s Event --date__gte=2019 --date__lte=2020

## Python Library

```python
from ftmq import Query

q = Query() \
    .where(dataset="ec_meetings", date__lte=2020) \
    .where(schema="Event") \
    .order_by("date", ascending=False)

# `proxy` is an existing followthemoney entity proxy, e.g. one read from a stream
assert q.apply(proxy)
```

## Support

*This project is part of [investigraph](https://github.com/investigativedata/investigraph)*

[Media Tech Lab Bayern batch #3](https://github.com/media-tech-lab)

<a href="https://www.media-lab.de/en/programs/media-tech-lab">
    <img src="https://raw.githubusercontent.com/media-tech-lab/.github/main/assets/mtl-powered-by.png" width="240" title="Media Tech Lab powered by logo">
</a>

