Metadata-Version: 2.1
Name: CTApy
Version: 0.1.3
Summary: Python package for the Conditional Topic Allocation (CTA)
Home-page: https://github.com/twekhof/CTA
Author: Tobias Wekhof
Author-email: tobiaswekhof@gmail.com
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: license.txt
Requires-Dist: gensim >=4.3.2
Requires-Dist: nltk >=3.8.1
Requires-Dist: numpy >=1.24.3
Requires-Dist: pandas >=2.0.3
Requires-Dist: scipy >=1.11.1
Requires-Dist: shap >=0.44.0
Requires-Dist: spacy >=3.7.2
Requires-Dist: torch >=1.2.1
Requires-Dist: tqdm >=4.65.0
Requires-Dist: transformers >=4.32.1

# `CTApy`

Python package for the "Conditional Topic Allocation" (CTA): a text-analysis method that identifies topics that correlate with numerical outcomes.


* Corresponding research paper: [Conditional Topic Allocations for Open-Ended Survey Responses (2024)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4190308).


## How does CTA work?


CTA finds topics by conditioning on observables. For example, do Republicans write differently about politics than Democrats?
It consists of three steps:

<br>
1. Predict the outcome variable with text.

* Uses DistilBERT to predict the outcome.
 
 <br>
2. Select words with high predictive power (positive or negative).

* Calculates SHAP values for each word and selects words with a statistically significant SHAP value.

<br>
3. Group words by semantic similarity.

* Returns topics with either positive or negative correlation with the outcome.

<br>
CTA supports all languages.
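
To make steps 2 and 3 concrete, here is a minimal, standard-library-only sketch. It is not the `CTApy` API. It assumes that per-occurrence attribution values (for example SHAP values) have already been computed for each word. The function names, the toy numbers, and the two-dimensional word vectors are all hypothetical. A normal-approximation test and a greedy cosine-similarity grouping stand in for whichever significance test and clustering method the package actually uses.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def significant_words(attributions, alpha=0.05):
    """Keep words whose mean attribution differs from zero (two-sided test).

    Stand-in test: z-statistic with a normal approximation.
    """
    selected = {}
    for word, values in attributions.items():
        if len(values) < 2:
            continue  # cannot estimate a standard error from one value
        std_err = stdev(values) / sqrt(len(values))
        if std_err == 0:
            continue  # no variation, so the test is undefined; skip
        z = mean(values) / std_err
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        if p_value < alpha:
            selected[word] = mean(values)
    return selected


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def group_words(words, vectors, threshold=0.8):
    """Greedy grouping: a word joins the first group whose seed word is similar."""
    groups = []
    for word in words:
        for group in groups:
            if cosine(vectors[word], vectors[group[0]]) >= threshold:
                group.append(word)
                break
        else:
            groups.append([word])
    return groups


# Hypothetical per-occurrence attribution values for four words.
attributions = {
    "tax": [0.30, 0.25, 0.35, 0.28],
    "taxes": [0.22, 0.27, 0.31, 0.24],
    "climate": [-0.40, -0.35, -0.45, -0.38],
    "the": [0.05, -0.04, 0.02, -0.03],
}
# Hypothetical 2-d word vectors; real embeddings have many more dimensions.
vectors = {
    "tax": (1.0, 0.1),
    "taxes": (0.9, 0.2),
    "climate": (0.1, 1.0),
    "the": (0.5, 0.5),
}

selected = significant_words(attributions)
positive = [w for w, m in selected.items() if m > 0]
negative = [w for w, m in selected.items() if m < 0]
topics = {
    "positive": group_words(positive, vectors),
    "negative": group_words(negative, vectors),
}
print(topics)
```

In this toy data, "the" has attribution values that average out to roughly zero, so it is dropped before grouping. The remaining words are split by the sign of their mean attribution and grouped separately, which mirrors the idea of topics that correlate either positively or negatively with the outcome.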

## Installation

Runs on Windows and requires Python 3.9 or later and pip.  
It is highly recommended to use a virtual environment (or conda environment) for the installation.

```bash
# upgrade pip, wheel and setuptools
python -m pip install -U pip wheel setuptools

# install the package
python -m pip install -U CTApy
```

If you want to use Jupyter, make sure you have it installed in the current environment.

## Quickstart 

Please see the hands-on tutorials, which replicate the research paper: [https://github.com/twekhof/CTA/tutorials](https://github.com/twekhof/CTA/tutorials). The paper uses the following package versions:

* torch: 2.4.0
* transformers: 4.32.1
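
When replicating the paper, it can help to check whether the current environment matches these versions. The following is a small sketch using only the standard library. The helper name `compare_versions` and the `PAPER_VERSIONS` dictionary are illustrative and not part of `CTApy`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above as used in the paper.
PAPER_VERSIONS = {"torch": "2.4.0", "transformers": "4.32.1"}


def compare_versions(expected):
    """Return {package: (installed_version_or_None, expected_version, matches)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        # Ignore local build suffixes such as "+cu121" when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, wanted, matches)
    return report


for name, (installed, wanted, matches) in compare_versions(PAPER_VERSIONS).items():
    status = "matches" if matches else "differs"
    print(f"{name}: installed={installed}, paper={wanted} ({status})")
```

A mismatch does not necessarily mean the package will fail to run, but results may then differ from those reported in the paper.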


## Author

`CTApy` was developed by

[Tobias Wekhof](https://tobiaswekhof.com), ETH Zurich


## Disclaimer

This Python package is a research tool currently under development. The authors take no responsibility for the accuracy or reliability of the results produced by it.
