Metadata-Version: 2.1
Name: crowd-kit
Version: 1.3.0
Summary: Computational Quality Control for Crowdsourcing
Author: Toloka
License: Apache 2.0
Project-URL: Homepage, https://github.com/Toloka/crowd-kit
Project-URL: Bug Tracker, https://github.com/Toloka/crowd-kit/issues
Project-URL: Documentation, https://crowd-kit.readthedocs.io/
Project-URL: API Reference, https://crowd-kit.readthedocs.io/
Project-URL: Source Code, https://github.com/Toloka/crowd-kit
Project-URL: Release Notes, https://github.com/Toloka/crowd-kit/blob/main/CHANGELOG.md
Keywords: crowdsourcing,data labeling,answer aggregation,truth inference,learning from crowds,machine learning,quality control,data quality
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development
Classifier: Typing :: Typed
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: AUTHORS
Requires-Dist: attrs
Requires-Dist: nltk
Requires-Dist: numpy
Requires-Dist: pandas>=1.1.0
Requires-Dist: pyarrow
Requires-Dist: scikit-learn
Requires-Dist: tqdm
Requires-Dist: transformers
Provides-Extra: dev
Requires-Dist: black; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: codecov; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: ipywidgets; extra == "dev"
Requires-Dist: isort; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: nbqa; extra == "dev"
Requires-Dist: notebook; extra == "dev"
Requires-Dist: pandas-stubs; extra == "dev"
Requires-Dist: pre-commit; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pyupgrade; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: types-tqdm; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs-material; extra == "docs"
Requires-Dist: mkdocstrings-python; extra == "docs"
Provides-Extra: learning
Requires-Dist: torch>=1.6.0; extra == "learning"
Requires-Dist: transformers>=4.0.0; extra == "learning"

# Crowd-Kit: Computational Quality Control for Crowdsourcing

[![Crowd-Kit](https://tlk.s3.yandex.net/crowd-kit/Crowd-Kit-GitHub.png)](https://github.com/Toloka/crowd-kit)

[![PyPI Version][pypi_badge]][pypi_link]
[![GitHub Tests][github_tests_badge]][github_tests_link]
[![Codecov][codecov_badge]][codecov_link]
[![Documentation][docs_badge]][docs_link]

[pypi_badge]: https://badge.fury.io/py/crowd-kit.svg
[pypi_link]: https://pypi.python.org/pypi/crowd-kit
[github_tests_badge]: https://github.com/Toloka/crowd-kit/actions/workflows/tests.yml/badge.svg?branch=main
[github_tests_link]: https://github.com/Toloka/crowd-kit/actions/workflows/tests.yml
[codecov_badge]: https://codecov.io/gh/Toloka/crowd-kit/branch/main/graph/badge.svg
[codecov_link]: https://codecov.io/gh/Toloka/crowd-kit
[docs_badge]: https://readthedocs.org/projects/crowd-kit/badge/
[docs_link]: https://crowd-kit.readthedocs.io/

**Crowd-Kit** is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

* implementations of commonly-used aggregation methods for categorical, pairwise, textual, and segmentation responses;
* metrics of uncertainty, consistency, and agreement with aggregate;
* loaders for popular crowdsourced datasets.

Also, the `learning` subpackage contains PyTorch implementations of deep learning from crowds methods and advanced aggregation algorithms.

## Installing

To install Crowd-Kit, run the following command: `pip install crowd-kit`. If you also want to use the `learning` subpackage, type `pip install crowd-kit[learning]`.

If you are interested in contributing to Crowd-Kit, use [Pipenv](https://pipenv.pypa.io/en/latest/) to install the library with its dependencies: `pipenv install --dev`. We use [pytest](https://pytest.org/) for testing.

## Getting Started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

````python
from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

import pandas as pd
````

Then, you need to read your annotations into Pandas DataFrame with columns `task`, `worker`, `label`. Alternatively, you can download an example dataset:

````python
df = pd.read_csv('results.csv')  # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset
````

Then, you can aggregate the workers' responses using the `fit_predict` method from the **scikit-learn** library:

````python
aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
````

[More usage examples](https://github.com/Toloka/crowd-kit/tree/main/examples)

## Implemented Aggregation Methods

Below is the list of currently implemented methods, including the already available (✅) and in progress (🟡).

### Categorical Responses

| Method | Status |
| ------------- | :-------------: |
| [Majority Vote](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.majority_vote.MajorityVote) | ✅ |
| [One-coin Dawid-Skene](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.dawid_skene.OneCoinDawidSkene) | ✅ |
| [Dawid-Skene](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.dawid_skene.DawidSkene) | ✅ |
| [Gold Majority Vote](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.gold_majority_vote.GoldMajorityVote) | ✅ |
| [M-MSR](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.m_msr.MMSR) | ✅ |
| [Wawa](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.wawa.Wawa) | ✅ |
| [Zero-Based Skill](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.zero_based_skill.ZeroBasedSkill) | ✅ |
| [GLAD](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.glad.GLAD) | ✅ |
| [KOS](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.kos.KOS) | ✅ |
| [MACE](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.mace.MACE) | ✅ |

### Multi-Label Responses

|Method|Status|
|-|:-:|
|[Binary Relevance](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.multilabel.binary_relevance.BinaryRelevance)|✅|

### Textual Responses

| Method | Status |
| ------------- | :-------------: |
| [RASA](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.embeddings.rasa.RASA) | ✅ |
| [HRRASA](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.embeddings.hrrasa.HRRASA) | ✅ |
| [ROVER](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.texts.rover.ROVER) | ✅ |

### Image Segmentation

| Method | Status |
| ------------------ | :------------------: |
| [Segmentation MV](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.image_segmentation.segmentation_majority_vote.SegmentationMajorityVote) | ✅ |
| [Segmentation RASA](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.image_segmentation.segmentation_rasa.SegmentationRASA) | ✅ |
| [Segmentation EM](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.image_segmentation.segmentation_em.SegmentationEM) | ✅ |

### Pairwise Comparisons

| Method | Status |
| -------------- | :---------------------: |
| [Bradley-Terry](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.pairwise.bradley_terry.BradleyTerry) | ✅ |
| [Noisy Bradley-Terry](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.pairwise.noisy_bt.NoisyBradleyTerry) | ✅ |

### Learning from Crowds

|Method|Status|
|-|:-:|
|[CrowdLayer](https://toloka.ai/docs/crowd-kit/reference/crowdkit.learning.crowd_layer.CrowdLayer)|✅|
|[CoNAL](https://toloka.ai/docs/crowd-kit/reference/crowdkit.learning.conal.CoNAL)|✅|

## Citation

* Ustalov D., Pavlichenko N., Tseitlin B. [Learning from Crowds with Crowd-Kit](https://arxiv.org/abs/2109.08584). 2023. arXiv: [2109.08584 [cs.HC]](https://arxiv.org/abs/2109.08584).

```bibtex
@misc{CrowdKit,
  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Tseitlin, Boris},
  title     = {{Learning from Crowds with Crowd-Kit}},
  year      = {2023},
  publisher = {arXiv},
  eprint    = {2109.08584},
  eprinttype = {arxiv},
  eprintclass = {cs.HC},
  url       = {https://arxiv.org/abs/2109.08584},
  language  = {english},
}
```

## Support and Contributions

Please use [GitHub Issues](https://github.com/Toloka/crowd-kit/issues) to seek support and submit feature requests. We accept contributions to Crowd-Kit via GitHub as according to our guidelines in [CONTRIBUTING.md](CONTRIBUTING.md).

## License

&copy; Crowd-Kit team authors, 2020&ndash;2024. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.
