Metadata-Version: 2.4
Name: discuss-nutshell
Version: 0.1.0
Summary: Query Discourse and summarize threads
Project-URL: Homepage, https://github.com/willingc/discuss-nutshell
Project-URL: Bug Tracker, https://github.com/willingc/discuss-nutshell/issues
Project-URL: Discussions, https://github.com/willingc/discuss-nutshell/discussions
Project-URL: Changelog, https://github.com/willingc/discuss-nutshell/releases
Author-email: Carol Willing <willingc@gmail.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering
Classifier: Typing :: Typed
Requires-Python: >=3.12
Requires-Dist: anthropic>=0.74.1
Requires-Dist: beautifulsoup4>=4.14.2
Requires-Dist: datasette>=0.65.2
Requires-Dist: google-genai>=1.52.0
Requires-Dist: gradio>=5.50.0
Requires-Dist: jupyter>=1.1.1
Requires-Dist: matplotlib>=3.10.7
Requires-Dist: modal>=1.2.4
Requires-Dist: openai>=2.8.1
Requires-Dist: pandas>=2.3.3
Requires-Dist: pyarrow>=22.0.0
Requires-Dist: python-dotenv>=1.2.1
Requires-Dist: requests
Requires-Dist: seaborn>=0.13.2
Requires-Dist: sqlite-utils
Description-Content-Type: text/markdown

# discuss-nutshell

[![Actions Status][actions-badge]][actions-link]
[![Documentation Status][rtd-badge]][rtd-link]

[![PyPI version][pypi-version]][pypi-link]
[![Conda-Forge][conda-badge]][conda-link]
[![PyPI platforms][pypi-platforms]][pypi-link]

[![GitHub Discussion][github-discussions-badge]][github-discussions-link]

<!-- prettier-ignore-start -->
[actions-badge]:            https://github.com/willingc/discuss-nutshell/workflows/CI/badge.svg
[actions-link]:             https://github.com/willingc/discuss-nutshell/actions
[conda-badge]:              https://img.shields.io/conda/vn/conda-forge/discuss-nutshell
[conda-link]:               https://github.com/conda-forge/discuss-nutshell-feedstock
[github-discussions-badge]: https://img.shields.io/static/v1?label=Discussions&message=Ask&color=blue&logo=github
[github-discussions-link]:  https://github.com/willingc/discuss-nutshell/discussions
[pypi-link]:                https://pypi.org/project/discuss-nutshell/
[pypi-platforms]:           https://img.shields.io/pypi/pyversions/discuss-nutshell
[pypi-version]:             https://img.shields.io/pypi/v/discuss-nutshell
[rtd-badge]:                https://readthedocs.org/projects/discuss-nutshell/badge/?version=latest
[rtd-link]:                 https://discuss-nutshell.readthedocs.io/en/latest/?badge=latest

<!-- prettier-ignore-end -->

Take a Discourse topic and parse it into posts that can be queried.

- `data_loader.py`: Fetch a topic from a Discourse endpoint and save it as JSON
- `preprocessor.py`: Clean the data and parse it into individual post files
- `launch_app.py`: Launch a Gradio app to interact with the LLM and log queries,
  context, and responses
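
As a rough sketch of the preprocessing step, assuming the saved JSON follows
Discourse's topic format (`post_stream.posts`, each post carrying `post_number`,
`username`, `created_at`, and `cooked` HTML), the functions below flatten a
topic into one record per post. The names `cooked_to_text`, `split_topic`, and
`_TextExtractor` are hypothetical and not taken from `preprocessor.py`; the
standard library HTML parser is used only to keep the example dependency-free.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def cooked_to_text(cooked):
    """Strip tags from a post's `cooked` HTML and normalize whitespace."""
    extractor = _TextExtractor()
    extractor.feed(cooked)
    extractor.close()
    return " ".join("".join(extractor.parts).split())


def split_topic(topic):
    """Return one flat record per post in a Discourse topic payload."""
    posts = topic.get("post_stream", {}).get("posts", [])
    return [
        {
            "post_number": post["post_number"],
            "author": post["username"],
            "created_at": post["created_at"],
            "text": cooked_to_text(post.get("cooked", "")),
        }
        for post in posts
    ]
```

Each record could then be written to its own file or inserted as a database row.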

To view the database file, open it with Datasette: `datasette data/posts_qa_logs.db`
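
The logging side might look roughly like the following, assuming a table named
`qa_logs` with `query`, `context`, and `response` columns. The table name, the
column names, and the `log_interaction` helper are illustrative assumptions;
the actual schema of `data/posts_qa_logs.db` is not documented here.

```python
import sqlite3
from datetime import datetime, timezone


def log_interaction(conn, query, context, response):
    """Append one query/context/response row to a hypothetical qa_logs table."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS qa_logs (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            logged_at TEXT NOT NULL,
            query TEXT NOT NULL,
            context TEXT,
            response TEXT
        )
        """
    )
    conn.execute(
        "INSERT INTO qa_logs (logged_at, query, context, response) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), query, context, response),
    )
    conn.commit()


# In-memory database for illustration; a real run would pass a file path.
conn = sqlite3.connect(":memory:")
log_interaction(conn, "What is proposed?", "post 1 text", "A summary.")
```

A database file written this way would expose the `qa_logs` table when opened
with Datasette.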

## Next phase

Fields to capture for each post: author, date/time, post number, post UUID,
core dev (bool), cooked message, summarized message
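
One possible shape for such a record, written as a dataclass. The class name
`PostRecord` and every attribute name are assumptions chosen for illustration,
not an existing schema in this project.

```python
from dataclasses import dataclass, field
from datetime import datetime
from uuid import UUID, uuid4


@dataclass
class PostRecord:
    """Hypothetical per-post record for the next phase."""

    author: str
    created_at: datetime
    post_number: int
    is_core_dev: bool
    cooked: str  # rendered HTML of the post, as Discourse returns it
    summary: str = ""  # filled in later by the summarization step
    post_uuid: UUID = field(default_factory=uuid4)
```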

- Does this message support or refute the PEP?
- What are the key topics found in the message?
- How many times has a person posted?
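
The post-count question needs no LLM. A minimal sketch, assuming each post is a
dict with an `author` key (a hypothetical field name):

```python
from collections import Counter


def posts_per_author(posts):
    """Count how many posts each author contributed to the thread."""
    return Counter(post["author"] for post in posts)


posts = [
    {"author": "alice", "post_number": 1},
    {"author": "bob", "post_number": 2},
    {"author": "alice", "post_number": 3},
]
counts = posts_per_author(posts)
print(counts.most_common())  # [('alice', 2), ('bob', 1)]
```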

Summarize the thread.

Report on pros and cons of the PEP proposal.

- Embeddings
- Vector database like Chroma
- Chunks

Query with chunks and a prompt.
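
One way to picture the chunk-and-prompt step before any embedding model or
vector store is chosen: split each post into overlapping word windows, rank the
chunks against the question, and paste the best ones into the prompt. The
word-overlap score below is a deliberately simple stand-in for embedding
similarity, and all function names and the prompt wording are hypothetical.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into windows of `size` words, overlapping by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def top_chunks(question, chunks, k=3):
    """Rank chunks by shared lowercase words with the question (toy scoring)."""
    q_words = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(q_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, chunks):
    """Assemble the retrieved chunks and the question into a single prompt."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, 1))
    return (
        "Answer using only the context below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Swapping `top_chunks` for a vector database query would leave `chunk_words`
and `build_prompt` unchanged, which is the main reason to keep the three steps
separate.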
