Metadata-Version: 2.4
Name: codereader
Version: 0.5.0
Summary: Local code readability grader using LLMs
License: GPLv3
License-File: LICENSE
Author: Jason Liu
Author-email: Liujason2003@gmail.com
Requires-Python: >=3.12
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Requires-Dist: anyio (>=4.12.1,<5.0.0)
Requires-Dist: openai (>=2.16.0,<3.0.0)
Requires-Dist: orjson (>=3.11.5,<4.0.0)
Requires-Dist: pydantic (>=2.12.5,<3.0.0)
Requires-Dist: python-dotenv (>=1.2.1,<2.0.0)
Requires-Dist: pyyaml (>=6.0.3,<7.0.0)
Requires-Dist: requests (>=2.32.5,<3.0.0)
Requires-Dist: rich (>=14.2.0,<15.0.0)
Requires-Dist: typer (>=0.21.1,<0.22.0)
Description-Content-Type: text/markdown

# CodeReader

CodeReader is a **local, LLM-based code readability grader**. It evaluates how readable a piece of source code is by querying one or more Large Language Models (LLMs), either run locally through Ollama or, optionally, accessed through the OpenAI API, and then aggregating their scores.

This tool is designed primarily for **research and experimentation**, especially in the context of evaluating readability, naming quality, and structural clarity of code using LLMs instead of fixed syntactic metrics.

---

## Features

- **Readability scoring (0–100)** using one or more LLMs
- **Weighted averages** across multiple models
- **Model rationales** explaining _why_ a score was given
- **Tag-based evaluation** (e.g. identifiers, structure, comments)
- **Rule-based evaluation** with support for adding custom rules
- **Structured logging** of all grading results
- **Fully local execution** via Ollama (no cloud APIs required)
- **Optional OpenAI support** for API-based grading (requires an API key)
- **Configurable via YAML** (models, weights, prompts, tags)

---

## How it works (high-level)

1. You provide a piece of code (file, inline text, or stdin)
2. A YAML configuration specifies:
   - which LLMs to use
   - their weights
   - what aspects of readability to evaluate
3. CodeReader sends a structured prompt to each model
4. Each model returns a JSON object containing a score and a rationale
5. CodeReader aggregates the results and prints a table
6. Results are appended to a log file for later analysis
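The prompt and response steps above can be sketched in Python as follows. This is a minimal illustration, not CodeReader's actual implementation: the prompt wording, the `score`/`rationale` JSON keys, and the helper names `build_prompt`, `parse_reply`, and `Grade` are all hypothetical. The model call itself is omitted so the sketch stays self-contained.

```python
import json
from dataclasses import dataclass


@dataclass
class Grade:
    """One model's verdict (hypothetical structure for illustration)."""
    score: int
    rationale: str


def build_prompt(code: str, language: str, tags: list[str]) -> str:
    """Assemble a structured grading prompt (hypothetical wording)."""
    aspects = ", ".join(tags)
    return (
        f"Rate the readability of the following {language} code from 0 to 100, "
        f"focusing on: {aspects}.\n"
        'Reply only with JSON of the form {"score": <int>, "rationale": <str>}.\n\n'
        f"{code}"
    )


def parse_reply(raw: str) -> Grade:
    """Parse a model's JSON reply and reject scores outside 0-100."""
    data = json.loads(raw)
    score = int(data["score"])
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return Grade(score=score, rationale=str(data.get("rationale", "")))


# Usage: build a prompt for a Java snippet and parse a (made-up) model reply.
prompt = build_prompt("int x = 0;", "java", ["identifiers", "structure"])
grade = parse_reply('{"score": 72, "rationale": "Clear but terse naming."}')
```

Validating the score range at parse time means a malformed reply can be reported as that model's error instead of silently skewing the averages.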

---

## Installation

### Requirements

- Python **3.12+**
- **Ollama** installed and running (for local models)
- At least one Ollama model pulled (e.g. `qwen2.5-coder`, `deepseek-coder`)
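One way to confirm the Ollama requirements from Python is to query Ollama's `GET /api/tags` endpoint, which lists the locally pulled models. The sketch below is not part of CodeReader. It assumes Ollama is listening on its default address `http://localhost:11434`, and the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address (assumption)


def installed_models(tags_payload: dict) -> set[str]:
    """Extract model names (e.g. 'qwen2.5-coder:latest') from an /api/tags response."""
    return {m["name"] for m in tags_payload.get("models", [])}


def has_model(tags_payload: dict, wanted: str) -> bool:
    """Check whether `wanted` is pulled; an untagged name is treated as ':latest'."""
    if ":" not in wanted:
        wanted += ":latest"
    return wanted in installed_models(tags_payload)


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """Ask a running Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

With Ollama running, `has_model(fetch_tags(), "qwen2.5-coder")` returns `True` once that model has been pulled. A connection error from `fetch_tags` indicates that the server is not running.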

### Install from source (development)

```bash
poetry install
```

### Install via pip (once published)

```bash
pip install codereader
```

---

## Usage

CodeReader exposes a CLI called `codereader`.

### Basic command

```bash
codereader grade -c config.yml -f example.py
```

### Input methods

Exactly **one** of the following must be provided:

- `--file / -f` – path to a source file
- `--text` – inline source code as a string
- `--stdin` – read code from standard input

Examples:

```bash
codereader grade -c config.yml --text "int x = 0;"
```

```bash
cat example.py | codereader grade -c config.yml --stdin
```

### Useful options

- `--name` – override filename label in logs
- `--quiet` – suppress console output (logging still happens)
- `--simple` – simplified console output (`--quiet` takes priority over `--simple`)
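The precedence between the two output flags amounts to a simple ordered check. The sketch below uses hypothetical names (`OutputMode`, `resolve_output_mode`) and only illustrates the rule stated above.

```python
from enum import Enum


class OutputMode(Enum):
    QUIET = "quiet"    # no console output; results are still logged
    SIMPLE = "simple"  # reduced console output
    FULL = "full"      # full results table


def resolve_output_mode(quiet: bool, simple: bool) -> OutputMode:
    """Pick the console mode; --quiet wins when both flags are set."""
    if quiet:
        return OutputMode.QUIET
    if simple:
        return OutputMode.SIMPLE
    return OutputMode.FULL
```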

---

## Output

The CLI prints a table with the following columns:

- Model name
- Score (0–100)
- Weight
- Rationale
- Error (if any)

Below the table, CodeReader prints:

- **Average score**
- **Weighted average score**

All results are also appended to a log file specified in the config.

---

## Configuration (YAML)

CodeReader is fully driven by a YAML config file.

Typical configuration sections include:

- `language` – programming language of the code
- `tags` – aspects of readability to evaluate
- `models` – list of LLM runners and weights
- `settings` – logging paths and runtime settings

Example (simplified):

```yaml
language: java
tags:
  - identifiers
  - structure

models:
  - name: qwen
    type: ollama
    model: qwen2.5-coder
    weight: 1.0

settings:
  log_path: readability_log.txt
```
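CodeReader already depends on PyYAML, so a config like the one above could be loaded with `yaml.safe_load` and then validated. The dataclasses below form a hypothetical schema that covers only the keys in the simplified example. The real config accepts more settings (such as prompts), and its validation logic may differ.

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    name: str
    type: str          # runner type, e.g. "ollama"
    model: str         # backend model identifier
    weight: float = 1.0


@dataclass
class Config:
    language: str
    tags: list[str]
    models: list[ModelSpec]
    log_path: str


def parse_config(raw: dict) -> Config:
    """Validate the dict produced by yaml.safe_load on a config file."""
    # Only the keys shown in the simplified example are read; others are ignored.
    models = [
        ModelSpec(name=m["name"], type=m["type"], model=m["model"],
                  weight=float(m.get("weight", 1.0)))
        for m in raw.get("models", [])
    ]
    if not models:
        raise ValueError("config must list at least one model")
    if any(m.weight < 0 for m in models):
        raise ValueError("model weights must be non-negative")
    return Config(
        language=raw["language"],
        tags=list(raw.get("tags", [])),
        models=models,
        log_path=raw["settings"]["log_path"],  # KeyError if missing
    )
```

A typical call would be `parse_config(yaml.safe_load(Path("config.yml").read_text()))`.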

---

## Logging

Each grading run appends a structured entry to the log file, including:

- filename
- individual model scores
- averages

This is useful for **dataset-level analysis**, benchmarking, and research experiments.
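The exact log format is not specified here. To illustrate the kind of dataset-level analysis such a log enables, the sketch below assumes one JSON object per line (JSON Lines) that holds the fields listed above. The field names and both helpers are hypothetical, and CodeReader's real log format may differ.

```python
import json
from pathlib import Path


def append_entry(log_path: Path, filename: str, scores: dict[str, float],
                 average: float, weighted_average: float) -> None:
    """Append one grading run as a single JSON line (assumed format)."""
    entry = {
        "filename": filename,
        "scores": scores,  # model name -> score
        "average": average,
        "weighted_average": weighted_average,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def mean_weighted_average(log_path: Path) -> float:
    """Dataset-level summary: mean weighted average over all logged runs."""
    with log_path.open(encoding="utf-8") as fh:
        entries = [json.loads(line) for line in fh if line.strip()]
    return sum(e["weighted_average"] for e in entries) / len(entries)
```

Because the log is append-only with one record per line, repeated runs over a dataset accumulate in one file that can be summarized later without re-querying the models.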

---

## Research context

CodeReader was developed as part of a **master’s thesis** exploring:

> _Renaming identifiers of unit-tests generated by automated testing using LLMs_

---

## License

This project is licensed under the **GNU General Public License v3 (GPLv3)**.

This means:

- You are free to use, modify, and redistribute the software
- Derivative works must also be released under GPLv3

See the `LICENSE` file for details.

---

## Contributing

Contributions are welcome, especially around:

- additional model runners
- prompt engineering
- evaluation methodology

Please open an issue or pull request.

---

## Roadmap / Future work

CodeReader is an active research project, and several features are planned or being explored for future versions:

- Additional LLM backends
  - Additional API-based LLMs (e.g. Anthropic or other OpenAI-compatible endpoints)
  - Support for remote inference alongside local runners

- More runner types
  - API-based chat/completion models
  - Batch / dataset-level grading
  - Cached or replay-based evaluation for reproducibility

- Improved health checks and timeout handling per runner type

- Expanded template system
  - More language-specific templates (Java, Python, C/C++, Rust, etc.)
  - Templates targeting specific readability dimensions, such as:
    - identifier naming
    - test code readability
    - control-flow complexity
    - documentation and comments
  - Easier authoring and validation of custom prompt templates

- Analysis & reporting
  - Richer logging formats (e.g. JSON / CSV export)
  - Dataset-level summaries and comparisons
  - Inter-model agreement and variance analysis

---

## Status

This project is **research-oriented** and under active development.
Expect breaking changes before a stable 1.0 release.

