Metadata-Version: 2.4
Name: maque
Version: 0.2.8
Summary: Python toolkit for ML, CV, NLP and multimodal AI development
Project-URL: homepage, https://github.com/beidongjiedeguang/maque
Project-URL: repository, https://github.com/beidongjiedeguang/maque
Project-URL: documentation, https://github.com/beidongjiedeguang/maque#readme
Project-URL: Issues, https://github.com/beidongjiedeguang/maque/issues
Project-URL: Source, https://github.com/beidongjiedeguang/maque
Author-email: kunyuan <beidongjiedeguang@gmail.com>
License-File: LICENSE
Keywords: Machine Learning,cli,cv,nlp
Classifier: Development Status :: 5 - Production/Stable
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Build Tools
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: aiohttp
Requires-Dist: argcomplete
Requires-Dist: attrs>=22.2.0
Requires-Dist: chevron
Requires-Dist: colour
Requires-Dist: deprecated
Requires-Dist: diff-match-patch
Requires-Dist: fire
Requires-Dist: flexllm>=0.2.2
Requires-Dist: json5
Requires-Dist: loguru>=0.6.0
Requires-Dist: lxml
Requires-Dist: more-itertools
Requires-Dist: orjson
Requires-Dist: pillow
Requires-Dist: pretty-errors>=1.2.25
Requires-Dist: psutil
Requires-Dist: pyahocorasick
Requires-Dist: pyyaml
Requires-Dist: requests
Requires-Dist: rich
Requires-Dist: tabulate
Provides-Extra: cli
Requires-Dist: asciinema; extra == 'cli'
Requires-Dist: docker; extra == 'cli'
Requires-Dist: gitpython; extra == 'cli'
Requires-Dist: httpie; extra == 'cli'
Requires-Dist: icrawler; extra == 'cli'
Requires-Dist: objprint; extra == 'cli'
Requires-Dist: orjsonl; extra == 'cli'
Requires-Dist: paramiko; extra == 'cli'
Requires-Dist: prompt-toolkit>=3.0.0; extra == 'cli'
Requires-Dist: schedule; extra == 'cli'
Requires-Dist: twine; extra == 'cli'
Requires-Dist: typer; extra == 'cli'
Requires-Dist: viztracer; extra == 'cli'
Provides-Extra: clustering
Requires-Dist: hdbscan>=0.8.0; extra == 'clustering'
Requires-Dist: matplotlib; extra == 'clustering'
Requires-Dist: scikit-learn>=1.0.0; extra == 'clustering'
Requires-Dist: umap-learn>=0.5.0; extra == 'clustering'
Provides-Extra: crawl
Requires-Dist: crawl4ai; extra == 'crawl'
Requires-Dist: icrawler; extra == 'crawl'
Provides-Extra: dev
Requires-Dist: asciinema; extra == 'dev'
Requires-Dist: black; extra == 'dev'
Requires-Dist: concurrent-log-handler; extra == 'dev'
Requires-Dist: fastapi>=0.80.0; extra == 'dev'
Requires-Dist: gpustat>=1.0.0; extra == 'dev'
Requires-Dist: icrawler; extra == 'dev'
Requires-Dist: ordered-set; extra == 'dev'
Requires-Dist: orjson; extra == 'dev'
Requires-Dist: pandas; extra == 'dev'
Requires-Dist: pendulum>=2.1.2; extra == 'dev'
Requires-Dist: pillow; extra == 'dev'
Requires-Dist: pre-commit>=2.8; extra == 'dev'
Requires-Dist: psutil>=5.9.2; extra == 'dev'
Requires-Dist: pyinstrument; extra == 'dev'
Requires-Dist: pysnooper; extra == 'dev'
Requires-Dist: scalene; extra == 'dev'
Requires-Dist: twine; extra == 'dev'
Requires-Dist: uvicorn>=0.16.0; extra == 'dev'
Provides-Extra: embedding
Requires-Dist: fastapi>=0.80.0; extra == 'embedding'
Requires-Dist: numpy; extra == 'embedding'
Requires-Dist: sentence-transformers>=2.2.0; extra == 'embedding'
Requires-Dist: uvicorn>=0.16.0; extra == 'embedding'
Provides-Extra: latex
Requires-Dist: opencv-python-headless<4.3; extra == 'latex'
Requires-Dist: pix2tex[gui]; extra == 'latex'
Provides-Extra: llm
Requires-Dist: flaxkv2>=0.1.0; extra == 'llm'
Requires-Dist: tiktoken>=0.5.0; extra == 'llm'
Provides-Extra: mcp
Requires-Dist: mcp>=1.0.0; extra == 'mcp'
Requires-Dist: starlette; extra == 'mcp'
Requires-Dist: uvicorn; extra == 'mcp'
Provides-Extra: ml
Requires-Dist: fastapi>=0.80.0; extra == 'ml'
Requires-Dist: marisa-trie>=0.7.8; extra == 'ml'
Requires-Dist: orjson; extra == 'ml'
Requires-Dist: pysnooper; extra == 'ml'
Requires-Dist: ray; extra == 'ml'
Requires-Dist: uvicorn>=0.16.0; extra == 'ml'
Provides-Extra: nlp
Requires-Dist: jionlp; extra == 'nlp'
Requires-Dist: levenshtein; extra == 'nlp'
Requires-Dist: nltk; extra == 'nlp'
Requires-Dist: rouge-chinese; extra == 'nlp'
Provides-Extra: npu
Requires-Dist: torch-npu>=2.1.0; extra == 'npu'
Provides-Extra: other
Requires-Dist: aiortc; extra == 'other'
Requires-Dist: arrayfire; extra == 'other'
Requires-Dist: awkward; extra == 'other'
Requires-Dist: cn2an; extra == 'other'
Requires-Dist: gradio; extra == 'other'
Requires-Dist: grpcio-reflection~=1.46.3; extra == 'other'
Requires-Dist: grpcio-tools~=1.46.3; extra == 'other'
Requires-Dist: grpcio~=1.46.3; extra == 'other'
Requires-Dist: keyboard; extra == 'other'
Requires-Dist: memray; extra == 'other'
Requires-Dist: protobuf~=3.19.1; extra == 'other'
Requires-Dist: pyzmq; extra == 'other'
Requires-Dist: recordclass; extra == 'other'
Requires-Dist: textdistance[extras]; extra == 'other'
Requires-Dist: wordfreq; extra == 'other'
Requires-Dist: zigzag; extra == 'other'
Provides-Extra: prompt
Requires-Dist: openai; extra == 'prompt'
Requires-Dist: streamlit; extra == 'prompt'
Requires-Dist: streamlit-ace; extra == 'prompt'
Provides-Extra: quant
Requires-Dist: accelerate>=1.0.0; extra == 'quant'
Requires-Dist: auto-round>=0.9.0; extra == 'quant'
Requires-Dist: bitsandbytes>=0.45.0; extra == 'quant'
Requires-Dist: llmcompressor>=0.9.0; extra == 'quant'
Requires-Dist: transformers>=4.45.0; extra == 'quant'
Provides-Extra: retriever
Requires-Dist: chromadb>=0.4.0; extra == 'retriever'
Provides-Extra: test
Requires-Dist: flaxkv2; extra == 'test'
Requires-Dist: opencv-python; extra == 'test'
Requires-Dist: openpyxl; extra == 'test'
Requires-Dist: pandas; extra == 'test'
Requires-Dist: pytest; extra == 'test'
Requires-Dist: scikit-learn; extra == 'test'
Provides-Extra: torch
Requires-Dist: bert4torch; extra == 'torch'
Requires-Dist: bertviz; extra == 'torch'
Requires-Dist: datasets; extra == 'torch'
Requires-Dist: einops; extra == 'torch'
Requires-Dist: fairseq; extra == 'torch'
Requires-Dist: koila; extra == 'torch'
Requires-Dist: lightseq; extra == 'torch'
Requires-Dist: orjson; extra == 'torch'
Requires-Dist: pytorch-lightning; extra == 'torch'
Requires-Dist: ray; extra == 'torch'
Requires-Dist: sacremoses; extra == 'torch'
Requires-Dist: seqevae; extra == 'torch'
Requires-Dist: transformers; extra == 'torch'
Requires-Dist: whylogs; extra == 'torch'
Provides-Extra: video
Requires-Dist: av; extra == 'video'
Requires-Dist: decord; extra == 'video'
Description-Content-Type: text/markdown

<h1 align="center">maque (麻雀)</h1>

<p align="center">
    <strong>Python toolkit for ML, CV, NLP and multimodal AI development</strong>
</p>

<p align="center">
    <a href="https://pypi.org/project/maque/">
        <img src="https://img.shields.io/pypi/v/maque?color=brightgreen&style=flat-square" alt="PyPI version">
    </a>
    <a href="https://github.com/KenyonY/maque/blob/main/LICENSE">
        <img alt="License" src="https://img.shields.io/github/license/KenyonY/maque.svg?color=blue&style=flat-square">
    </a>
    <a href="https://github.com/KenyonY/maque/actions/workflows/run_tests.yml">
        <img alt="tests" src="https://img.shields.io/github/actions/workflow/status/KenyonY/maque/run_tests.yml?style=flat-square&label=tests">
    </a>
    <a href="https://pypistats.org/packages/maque">
        <img alt="pypi downloads" src="https://img.shields.io/pypi/dm/maque?style=flat-square">
    </a>
</p>

---

## Features

- **MLLM Processing** - Batch image analysis with OpenAI/Gemini-compatible APIs
- **LLM Server** - Local LLM inference with Transformers backend
- **Embedding Service** - Text/multimodal embedding API server
- **Clustering Pipeline** - UMAP + HDBSCAN for vector clustering and visualization
- **Async Executor** - Priority-queue-based concurrent task execution with retry
- **Rich CLI** - Modular command groups for various tasks

## Installation

```bash
# Basic installation
pip install maque

# With specific feature sets
pip install maque[torch,nlp]             # ML/NLP features
pip install maque[clustering,embedding]  # ML pipeline features
pip install maque[quant]                 # Quantization support (e.g. AWQ)
pip install maque[dev,test]              # Development setup

# From source
pip install -e .
pip install -e .[dev,test]
```

## CLI Usage

Commands are organized into groups: `maque <group> <command>`. A short alias, `mq`, is also available.

### Config Management

```bash
maque config show                 # Show current configuration
maque config edit                 # Open config in editor
maque config init                 # Initialize config file
```

### MLLM (Multimodal LLM)

```bash
# Process images from a table
maque mllm call-table data.xlsx --image_col="image_path" --model="gpt-4o"

# Process images from a folder
maque mllm call-images ./photos --recursive=True --output_file="results.csv"
```

### LLM Server

```bash
# Start LLM inference server
maque llm serve Qwen/Qwen2.5-7B-Instruct --port=8000

# AWQ quantized model (requires: pip install maque[quant])
maque llm serve Qwen2.5-VL-3B-Instruct-AWQ

# Interactive chat
maque llm chat --model="gpt-4o"
```

### Embedding Service

```bash
# Start embedding API server
maque embedding serve --model=BAAI/bge-m3 --port=8001

# Test embedding endpoint
maque embedding test --text="Hello world"
```

### Data Processing

```bash
# Interactive table viewer (Streamlit)
maque data table-viewer data.csv --port=8501

# Convert between formats
maque data convert input.json output.csv
```

### System Utilities

```bash
# Kill processes on ports
maque system kill 8000 8001

# Pack directory
maque system pack ./folder

# Split large file
maque system split large_file.dat --chunk_size=1GB
```

### Git Helpers

```bash
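# GitHub mirror proxy (speeds up access from mainland China)
maque git mirror-set                      # Set a global mirror (default: ghproxy)
maque git mirror-set --mirror=ghproxy-cdn # Use the CDN mirror
maque git mirror-status                   # Show the current mirror configuration
maque git mirror-unset                    # Remove the mirror and restore direct connections

# Once set, native git commands go through the mirror automatically
git clone https://github.com/user/repo    # Uses the mirror automatically

# List available mirrors
maque git mirrors

# Clone through a mirror once (does not change the global configuration)
maque git clone-mirror https://github.com/user/repo ./repo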
```

## Python API

### IO Utilities

```python
from maque import yaml_load, yaml_dump, json_load, json_dump, jsonl_load, jsonl_dump

# Load/save YAML
config = yaml_load("config.yaml")
yaml_dump(config, "output.yaml")

# Load/save JSONL
records = jsonl_load("data.jsonl")
jsonl_dump(records, "output.jsonl")
```
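
For readers unfamiliar with the format, JSONL stores one JSON object per line. The sketch below reproduces that round trip with only the standard library. `write_jsonl` and `read_jsonl` are hypothetical names used for illustration; this is not how maque implements `jsonl_dump` and `jsonl_load`.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write an iterable of JSON-serializable records, one object per line."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text readable in the file
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file into a list, skipping blank lines."""
    with Path(path).open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because each line is parsed independently, a JSONL file can be appended to or streamed without loading the whole dataset.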

### MLLM Client

```python
from flexllm import MllmClient

client = MllmClient(
    base_url="https://api.openai.com/v1",
    api_key="your-api-key",
    model="gpt-4o"
)

# Single image
response = client.call("Describe this image", image_path="photo.jpg")

# Batch processing
from flexllm import MllmTableProcessor
processor = MllmTableProcessor(client)
results = processor.process("data.xlsx", image_col="image_path", prompt="Describe the image")
```

### Async Executor

```python
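import asyncio

from flexllm.async_api import ConcurrentExecutor

async def process_item(item):
    # Your async processing logic; this placeholder returns the item unchanged
    return item

async def main(items):
    executor = ConcurrentExecutor(
        max_concurrent=10,
        max_qps=5,
        max_retries=3
    )
    # `await` is only valid inside a coroutine, hence the wrapper function
    return await executor.run(
        process_item,
        items,
        progress=True
    )

results = asyncio.run(main(["a", "b", "c"]))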
```
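
To make the idea of bounded concurrency with retry concrete, here is a minimal standard-library sketch. It is not flexllm's implementation: `run_bounded` is a hypothetical helper, it uses a plain `asyncio.Semaphore` rather than a priority queue, and it does not implement QPS limiting.

```python
import asyncio


async def run_bounded(func, items, max_concurrent=10, max_retries=3):
    """Run func over items with a concurrency cap and per-item retries.

    Results come back in input order. An item that still fails after
    max_retries retries yields its last exception instead of a result.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(item):
        async with semaphore:
            for attempt in range(max_retries + 1):
                try:
                    return await func(item)
                except Exception as exc:
                    if attempt == max_retries:
                        return exc
                    # Short exponential backoff before the next attempt
                    await asyncio.sleep(0.01 * 2 ** attempt)

    return await asyncio.gather(*(worker(item) for item in items))


async def double(x):
    return x * 2


if __name__ == "__main__":
    print(asyncio.run(run_bounded(double, [1, 2, 3])))  # [2, 4, 6]
```

Returning the exception in place of a result keeps one failing item from cancelling the rest of the batch; callers can filter with `isinstance(r, Exception)`.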

### Embedding & Retrieval

```python
from maque.embedding import TextEmbedding
from maque.retriever import ChromaRetriever, Document

# Initialize
embedding = TextEmbedding(base_url="http://localhost:8001/v1", model="bge-m3")
retriever = ChromaRetriever(
    embedding,
    persist_dir="./chroma_db",
    collection_name="my_data"
)

# Insert documents
documents = [Document(id="1", content="text...", metadata={"source": "file1"})]
retriever.upsert_batch(documents, batch_size=32, skip_existing=True)

# Search
results = retriever.search("query text", top_k=10)
```

### Clustering Pipeline

```python
from maque.clustering import ClusterAnalyzer

analyzer = ClusterAnalyzer(algorithm="hdbscan", min_cluster_size=15)

# Analyze from ChromaDB
result = analyzer.analyze_chroma(
    persist_dir="./chroma_db",
    collection_name="my_data",
    output_dir="./results",
    sample_size=10000,
    visualize=True
)

# Access results
print(f"Found {result.n_clusters} clusters")
print(result.labels)
print(result.cluster_stats)
```

### Performance Measurement

```python
from maque import MeasureTime

# `model` and `inputs` are placeholders for your own objects
with MeasureTime("model inference", gpu=True):
    output = model(inputs)
# Prints: model inference took 0.123s (GPU: 0.089s)
```
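
For intuition, a timing context manager of this kind can be sketched with the standard library alone. `measure_time` below is a hypothetical stand-in, not maque's `MeasureTime`, and it measures wall-clock time only.

```python
import time
from contextlib import contextmanager


@contextmanager
def measure_time(label):
    """Print the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # The finally clause reports the time even if the block raises
        elapsed = time.perf_counter() - start
        print(f"{label} took {elapsed:.3f}s")


with measure_time("sum of squares"):
    total = sum(i * i for i in range(100_000))
```

GPU kernels are launched asynchronously, so an accurate GPU figure generally requires synchronizing the device before reading the clock; this sketch does not attempt that.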

## Configuration

maque uses hierarchical configuration (highest priority first):

1. `./maque_config.yaml` (current directory)
2. Project root config
3. `~/.maque/config.yaml` (user config)
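
The precedence rule above can be sketched in a few lines. This README does not say whether maque merges the files key by key or simply uses the first one it finds; the sketch assumes a per-key merge, and `deep_merge` and `resolve_config` are hypothetical helpers operating on already-loaded dictionaries.

```python
def deep_merge(base, override):
    """Return a new dict in which values from override win over base."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(layers):
    """Merge config layers that are listed highest priority first."""
    result = {}
    # Apply the lowest-priority layer first so later layers override it
    for layer in reversed(layers):
        result = deep_merge(result, layer)
    return result


user_cfg = {"mllm": {"model": "gpt-4o"}, "llm": {"default_port": 8000}}
local_cfg = {"llm": {"default_port": 9000}}
config = resolve_config([local_cfg, user_cfg])
print(config["llm"]["default_port"])  # 9000
```

With this approach a local file only needs to list the keys it changes; everything else falls through to the user-level config.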

Example configuration:

```yaml
mllm:
  model: gpt-4o
  base_url: https://api.openai.com/v1
  api_key: ${OPENAI_API_KEY}

embedding:
  model: BAAI/bge-m3
  base_url: http://localhost:8001/v1

llm:
  default_port: 8000
```
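
The `${OPENAI_API_KEY}` placeholder suggests that values are filled in from environment variables when the config is loaded, although this README does not describe the mechanism. Under that assumption, the substitution could look like the following sketch, where `expand_env` is a hypothetical helper built on `os.path.expandvars`.

```python
import os


def expand_env(value):
    """Recursively expand ${VAR} placeholders in strings inside a config."""
    if isinstance(value, str):
        # Unset variables are left unchanged by os.path.expandvars
        return os.path.expandvars(value)
    if isinstance(value, dict):
        return {k: expand_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v) for v in value]
    return value


os.environ["DEMO_API_KEY"] = "sk-demo"  # illustrative variable, not a real key
raw = {"mllm": {"model": "gpt-4o", "api_key": "${DEMO_API_KEY}"}}
print(expand_env(raw)["mllm"]["api_key"])  # sk-demo
```

Keeping the secret in the environment means the config file itself can be committed or shared without exposing the key.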

Initialize the config file:

```bash
maque config init
```

## Development

```bash
# Install development dependencies
pip install -e .[dev,test]

# Run tests
pytest
pytest -m "not slow"  # Skip slow tests

# Format code
black .
isort .  # install separately: isort is not listed in the dev extra
```

## License

MIT License - see [LICENSE](LICENSE) for details.
