Metadata-Version: 2.2
Name: contentcoder
Version: 1.0.4
Summary: A generalized implementation of a dictionary-based content coder.
Author-email: "Ryan L. Boyd" <ryan@ryanboyd.io>
Project-URL: Homepage, https://github.com/ryanboyd/ContentCoder-Py
Project-URL: Issues, https://github.com/ryanboyd/ContentCoder-Py/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE

# ContentCoder

![AI Reading Machine](images/DALL·E%202023-02-01%2016.58.23%20-%20a%20machine%20that%20reads%20books%20and%20has%20good%20ideas,%20high%20quality%203d%20digital%20art.png)

ContentCoder is a Python-based text analysis tool that enables users to process and analyze text using custom linguistic dictionaries. It is inspired by tools like **LIWC (Linguistic Inquiry and Word Count)** and provides robust methods for tokenization, text analysis, and frequency calculations.

Note: Approximately 98% of this README was generated by ChatGPT — it may not be entirely accurate, but at a quick glance, it looks pretty spot-on.

## Features

- **Custom Dictionary-Based Analysis**  
- **Support for LIWC-style dictionaries (2007 & 2022 formats)**  
- **Efficient text tokenization**  
- **Wildcard and abbreviation handling**  
- **Punctuation and big word analysis**  
- **Dictionary export in multiple formats (JSON, CSV, Poster format, etc.)**  
- **High-performance wildcard matching with memory optimization**  

---

## Installation

Ensure you have Python 3.10+ installed (the package requires Python >= 3.10). ContentCoder is written in pure Python and has no third-party dependencies.

```bash
pip install contentcoder
```

---

## Folder Structure

```
src/contentcoder/
├── __init__.py
├── ContentCoder.py
├── ContentCodingDictionary.py
├── happiestfuntokenizing.py
└── create_export_dir.py
```

---

## Quick Start

### 1. Import the `ContentCoder` class
```python
from contentcoder.ContentCoder import ContentCoder
```

### 2. Initialize the Analyzer
```python
cc = ContentCoder(dicFilename='path/to/dictionary.dic', fileEncoding='utf-8-sig')
```

### 3. Analyze a Text Sample
```python
text = "An abrupt sound startled him. Off to the right he heard it, and his ears, expert in such matters, could not be mistaken. Again he heard the sound, and again. Somewhere, off in the blackness, someone had fired a gun three times."
results = cc.Analyze(text, relativeFreq=True, dropPunct=True, retainCaptures=False, returnTokens=True, wildcardMem=True)
print(results)
```

Example output (illustrative; the exact values depend on your dictionary and settings):
```json
{
  "WC": 23,
  "Dic": 5.4,
  "BigWords": 6.0,
  "Numbers": 3.0,
  "AllPunct": 0.0,
  "Period": 3.0,
  "Comma": 0.0,
  "QMark": 0.0,
  "Exclam": 0.0,
  "Apostro": 0.0
}
```

---

## Main Functions & Usage

### 1. `Analyze(text, **options)`
Analyzes a given text and returns a dictionary of results.

#### Parameters:
- `text` _(str)_: The text to analyze.
- `relativeFreq` _(bool)_: If `True`, returns relative frequencies. Otherwise, raw frequencies.
- `dropPunct` _(bool)_: If `True`, punctuation is removed before processing.
- `retainCaptures` _(bool)_: If `True`, captures and stores wildcard-matched words.
- `returnTokens` _(bool)_: If `True`, returns tokenized text.
- `wildcardMem` _(bool)_: If `True`, speeds up wildcard processing by storing past matches.

#### Example Usage:
```python
result = cc.Analyze("Hello world! This is a test sentence.", returnTokens=True)
```
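With `relativeFreq=True`, category counts are reported relative to word count, typically as percentages in the LIWC tradition. A minimal sketch of that conversion, assuming percentage scaling (this is not ContentCoder's actual code):

```python
def to_relative(raw_counts: dict[str, int], word_count: int) -> dict[str, float]:
    """Convert raw category counts to percentages of total word count."""
    return {cat: round(100.0 * n / word_count, 2) if word_count else 0.0
            for cat, n in raw_counts.items()}


# 2 of 23 words hit "posemo", 1 of 23 hit "negemo" (category names illustrative)
print(to_relative({"posemo": 2, "negemo": 1}, 23))
# {'posemo': 8.7, 'negemo': 4.35}
```

Guarding against `word_count == 0` matters when batch-processing real data, where empty texts are common.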

---

### 2. `GetResultsHeader()`
Returns a list of all available output categories.

#### Example Usage:
```python
print(cc.GetResultsHeader())
```

Example output (the exact list depends on the loaded dictionary):
```json
["WC", "Dic", "BigWords", "Numbers", "AllPunct", "Period", "Comma", "QMark", "Exclam", "Apostro"]
```
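This header pairs naturally with `csv.writer` when building an output file. A short self-contained sketch using the base categories shown above (hard-coded here for illustration; in practice you would call `cc.GetResultsHeader()`):

```python
import csv
import io

# Base output categories, copied from the example above.
header = ["WC", "Dic", "BigWords", "Numbers", "AllPunct",
          "Period", "Comma", "QMark", "Exclam", "Apostro"]

buf = io.StringIO()          # stand-in for an output file
writer = csv.writer(buf)
writer.writerow(["id"] + header)  # prepend an ID column for row tracking
print(buf.getvalue().strip())
# id,WC,Dic,BigWords,Numbers,AllPunct,Period,Comma,QMark,Exclam,Apostro
```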

---

### 3. `GetResultsArray(resultsDICT, rounding=4)`
Formats the results of `Analyze()` into a CSV-friendly list.

#### Example Usage:
```python
text = "The government plays an important role."
result = cc.Analyze(text)
csv_row = cc.GetResultsArray(result)
print(csv_row)
```

Example output (illustrative; one value per header column):
```json
[6, 4.3, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```
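Conceptually, this flattens the results dictionary into a list in header order, rounding floats to `rounding` decimal places. A hypothetical sketch of that mapping (not the library's actual implementation):

```python
def results_to_row(results: dict, header: list[str], rounding: int = 4) -> list:
    """Flatten a results dict into a CSV-ready row, in header order.

    Missing categories default to 0.0 so every row has the same width.
    """
    return [round(results.get(col, 0.0), rounding) for col in header]


row = results_to_row({"WC": 6, "Dic": 4.3333}, ["WC", "Dic", "Period"])
print(row)  # [6, 4.3333, 0.0]
```

Keeping the row width locked to the header is what makes the output safe to stream straight into `csv.writer`.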

---

### 4. `ExportCaptures(filename, fileEncoding='utf-8-sig', wildcardsOnly=False, fullset=True)`
Exports wildcard-captured words and their frequencies to a CSV file.

#### Example Usage:
```python
cc.ExportCaptures("captured_words.csv")
```
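The shape of such a captures file is easy to picture even without running the exporter: each wildcard entry mapped to the concrete words it matched, with frequencies. A hedged sketch (column names here are assumptions, not ContentCoder's actual output format):

```python
import csv
import io

# Hypothetical captures: the wildcard entry "happ*" matched two real words.
captures = {"happ*": {"happiness": 3, "happy": 1}}

out = io.StringIO()  # stand-in for the exported CSV file
writer = csv.writer(out)
writer.writerow(["dict_entry", "captured_word", "freq"])  # assumed columns
for entry, words in captures.items():
    for word, freq in words.items():
        writer.writerow([entry, word, freq])

print(out.getvalue().strip())
```

Note that captures are only collected when `Analyze()` is called with `retainCaptures=True`.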

---

### 5. `ExportDict2022Format(dicOutFilename, fileEncoding, **options)`
Exports the loaded dictionary in **LIWC-22 format**.

#### Example Usage:
```python
cc.dict.ExportDict2022Format("dictionary_2022.dicx")
```

---

### 6. `UpdateCategories(dicTerm, newCategories)`
Updates the categories associated with a dictionary term.

#### Example Usage:
```python
cc.dict.UpdateCategories(dicTerm="happiness", newCategories={"positive_emotion": 1.0, "joy": 0.5})
```
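The weighted-category form above implies that a single term can contribute fractional weight to several categories when text is scored. An illustrative sketch of weighted scoring (names and logic are assumptions, not ContentCoder internals):

```python
def score_tokens(tokens: list[str], term_categories: dict) -> dict:
    """Sum each matched term's per-category weights across all tokens."""
    totals: dict[str, float] = {}
    for tok in tokens:
        for cat, weight in term_categories.get(tok, {}).items():
            totals[cat] = totals.get(cat, 0.0) + weight
    return totals


# "happiness" carries full weight for positive_emotion, half weight for joy.
lexicon = {"happiness": {"positive_emotion": 1.0, "joy": 0.5}}
print(score_tokens(["happiness", "happiness", "rock"], lexicon))
# {'positive_emotion': 2.0, 'joy': 1.0}
```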

---

## Example: Processing a Large CSV File with `tqdm`
This script reads a **large CSV file** and scores the text in each row's `"comment_text"` column, writing one output row per input. Note that `tqdm` is a third-party progress-bar library (`pip install tqdm`); ContentCoder itself does not require it.

```python
import csv
from tqdm import tqdm
from contentcoder.ContentCoder import ContentCoder

cc = ContentCoder(dicFilename='dictionary.dic', fileEncoding='utf-8-sig')

with open("Comments.csv", "r", encoding="utf-8-sig") as csvfile, \
     open("Output.csv", "w", encoding="utf-8-sig", newline="") as csvfile_out:

    reader = csv.DictReader(csvfile)
    writer = csv.writer(csvfile_out)
    writer.writerow(["id"] + cc.GetResultsHeader())

    for row in tqdm(reader, desc="Processing", unit=" comments"):
        row_id = row["id"]
        text = row["comment_text"]
        result = cc.Analyze(text)
        csv_row = cc.GetResultsArray(result)
        writer.writerow([row_id] + csv_row)

print("Finished!")
```

---

## License
MIT License © 2021

