Metadata-Version: 2.4
Name: longdllm
Version: 0.1.3
Summary: Plug-and-play long context adaptation for diffusion language models
Home-page: https://github.com/lbertge/longdllm
Author: Albert Ge
Author-email: Albert Ge <lbertge@gmail.com>
Maintainer-email: Albert Ge <lbertge@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/lbertge/longdllm
Project-URL: Bug Reports, https://github.com/lbertge/longdllm/issues
Project-URL: Source, https://github.com/lbertge/longdllm
Project-URL: Documentation, https://github.com/lbertge/longdllm#readme
Keywords: transformer,long-context,rope,diffusion,language-model,llm
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch<3.0.0,>=2.7.1
Requires-Dist: transformers<5.0.0,>=4.46.2
Requires-Dist: datasets<3.0.0,>=2.18.0
Requires-Dist: azureml-sdk
Requires-Dist: accelerate
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: einops
Requires-Dist: tqdm
Requires-Dist: mlflow
Requires-Dist: tiktoken
Requires-Dist: hf_transfer
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov>=2.0; extra == "dev"
Requires-Dist: black>=22.0; extra == "dev"
Requires-Dist: isort>=5.0; extra == "dev"
Requires-Dist: flake8>=4.0; extra == "dev"
Requires-Dist: mypy>=0.950; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest>=6.0; extra == "test"
Requires-Dist: pytest-cov>=2.0; extra == "test"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# LongDLLM

**🚀 Plug-and-play long context adaptation for diffusion language models**

LongDLLM enables easy adaptation of diffusion language models to support long-context inputs with minimal code changes and a unified interface. Currently supports:

- 🤖 **Apple DiffuCoder-7B-Instruct** 
- 🤖 **GSAI-ML LLaDA-8B-Instruct**

## Installation

```bash
pip install longdllm
```

Installing FlashAttention is recommended but not required; you can install it separately via `pip install flash-attn --no-build-isolation`.

## Quick Start

**DiffuCoder Usage**
```python
import torch
from transformers import AutoModel, AutoTokenizer
from longdllm import adapt_for_long_context

# Load your model and tokenizer as usual
model = AutoModel.from_pretrained(
    "apple/DiffuCoder-7B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "apple/DiffuCoder-7B-Instruct", trust_remote_code=True
)

# Adapt for long context (modifies model in-place and returns it)
model = adapt_for_long_context(model, target_length=131072)

# Tokenize your (potentially very long) prompt
inputs = tokenizer("Your prompt here", return_tensors="pt")

# Use the adapted model with long sequences
output = model.diffusion_generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=256,
    output_history=True,
    return_dict_in_generate=True,
    steps=256//8,  # 256 new tokens / 8 tokens per step = 32 diffusion steps
    temperature=0.3,
    top_p=0.95,
    alg="entropy",
    alg_temp=0.,
)
```

**LLaDA Usage** (same interface for consistency):
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from longdllm import adapt_for_long_context

# Load and adapt LLaDA model
model = AutoModelForCausalLM.from_pretrained("GSAI-ML/LLaDA-8B-Instruct", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("GSAI-ML/LLaDA-8B-Instruct")

# Adapt for long context (patches forward methods and adds diffusion_generate interface)
model = adapt_for_long_context(model, target_length=131072)

# Use the same diffusion_generate interface as DiffuCoder
inputs = tokenizer("Your prompt here", return_tensors="pt")
outputs = model.diffusion_generate(
    input_ids=inputs["input_ids"],
    max_new_tokens=512,
    temperature=0.0,  # Gumbel noise temperature
    steps=128,
    block_length=128,
    cfg_scale=0.0,
    remasking='low_confidence'
)
```

## Optimized Rescale Factors

LongDLLM includes **optimized rescale factors** based on LongRoPE2 for each supported model:

- **DiffuCoder**: Factors optimized for 131k context length through evolutionary search
- **LLaDA**: Factors optimized for 131k context length through evolutionary search

These factors are selected automatically based on model detection; no manual configuration is needed. If you want to experiment with different rescale factors, you can pass in your own list:

### Custom Configuration
```python
# Custom rescale factors for LongRoPE
custom_factors = [1.0] * 32 + [0.8] * 16 + [0.6] * 16  # 64 factors: one per rotary frequency (head_dim // 2)
model = adapt_for_long_context(
    model,
    target_length=32768,
    scaling_method='longrope',
    rescale_factors=custom_factors
)
```

## Memory-Efficient Generation

LongDLLM automatically patches the generation code of both DiffuCoder and LLaDA to be memory-efficient during long-context generation. It can handle **more than 128k input tokens** with less than 50GB of GPU memory.

## API Reference

### `adapt_for_long_context(model, **kwargs)`

Adapts a diffusion language model for long-context inputs by replacing RoPE embeddings.

**⚠️ LLaDA Note: The patched forward methods ignore `attention_bias` for memory efficiency. This is safe according to [LLaDA issue #90](https://github.com/ML-GSAI/LLaDA/issues/90#issuecomment-3040649162).**

#### Parameters

- `model`: The model to adapt (must be DiffuCoder or LLaDA)
- `target_length` (int, optional): Target sequence length. 
- `scaling_method` (str, optional): RoPE scaling method ('longrope' or 'ntk'). Default: 'longrope'
- `rescale_factors` (list, optional): Custom rescale factors for LongRoPE. **Uses optimized defaults if None**
- `magnitude_scaling` (str, optional): Magnitude scaling policy ('su' or 'yarn'). Default: 'su'

#### Returns

- The same model instance (modified in-place) for method chaining

## Technical Details

LongDLLM uses the LongRoPE technique to extend the context length of pre-trained diffusion language models. It works by:

1. **Auto-detecting** the model architecture (DiffuCoder or LLaDA)
2. **Replacing RoPE embeddings** in each transformer layer with scaled versions
3. **Applying rescale factors** optimized for long sequences
4. **Preserving** all other model functionality
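
Steps 2-3 can be sketched in isolation. The snippet below is an illustrative toy, not longdllm's actual implementation: standard RoPE inverse frequencies are divided element-wise by per-frequency rescale factors, so a factor greater than 1 stretches the corresponding wavelength and keeps far-apart positions in-distribution.

```python
import torch

def rescaled_rope_inv_freq(head_dim, base=10000.0, rescale_factors=None):
    """Compute RoPE inverse frequencies, optionally divided by per-frequency
    rescale factors (the LongRoPE idea: one factor per rotary frequency)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    if rescale_factors is not None:
        factors = torch.tensor(rescale_factors, dtype=torch.float32)
        assert factors.shape == inv_freq.shape, "need head_dim // 2 factors"
        inv_freq = inv_freq / factors
    return inv_freq

# Toy example: 8-dim heads -> 4 rotary frequencies; stretch the two lowest
# frequencies (longest wavelengths) by 4x, as an evolutionary search might.
base_freq = rescaled_rope_inv_freq(8)
scaled_freq = rescaled_rope_inv_freq(8, rescale_factors=[1.0, 1.0, 4.0, 4.0])
print(torch.allclose(scaled_freq[:2], base_freq[:2]))   # high freqs unchanged
print(torch.allclose(scaled_freq[2:], base_freq[2:] / 4))  # low freqs stretched
```

In a real model there is one factor per rotary frequency (`head_dim // 2` entries), and the optimized defaults shipped with LongDLLM fill this role when `rescale_factors` is left as `None`.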

For memory-efficient generation, LongDLLM:
- Uses sparse logits computation (only computes logits for necessary positions)
- Reduces peak GPU memory usage during generation
- Maintains full generation quality and compatibility
- Preserves the exact `diffusion_generate()` interface
- Works automatically - no additional configuration needed
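
The sparse-logits idea can be illustrated with a standalone sketch (hypothetical shapes and names, not the library's code): at each diffusion step only the positions that are still masked need logits, so hidden states are gathered before the output projection instead of projecting the full sequence.

```python
import torch

torch.manual_seed(0)
vocab, hidden, seq = 1000, 64, 4096
lm_head = torch.nn.Linear(hidden, vocab, bias=False)
hidden_states = torch.randn(1, seq, hidden)

# Dense approach: project every position -> a (1, seq, vocab) logits tensor.
# Sparse approach: gather only the positions still masked at this step.
masked_positions = torch.tensor([10, 57, 4000])  # e.g. remaining [MASK] slots
sparse_logits = lm_head(hidden_states[:, masked_positions, :])

print(sparse_logits.shape)  # torch.Size([1, 3, 1000])
```

Dense logits scale with sequence length × vocabulary size, which dominates peak memory at 128k tokens; gathering first makes the logits cost scale with the number of masked positions instead, without changing the values computed at those positions.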

## License

MIT

## Citation

If you use LongDLLM in your research, please cite:

```bibtex
@misc{ge2025longcontext,
  title = {Long-Context Extension for Language Diffusion Models up to 128k Tokens},
  author = {Ge, Albert and Singh, Chandan and Zhang, Dinghuai and Peng, Letian and Shang, Ning and Zhang, Li Lyna and Liu, Liyuan and Gao, Jianfeng},
  howpublished = {\url{https://albertge.notion.site/longcontext}},
  year = {2025},
  month = sep,
}
```

## Contributing

Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## Support

For questions and issues, please open an issue on [GitHub](https://github.com/lbertge/longdllm/issues), or reach out to me ([Albert Ge](mailto:lbertge@gmail.com)).
