Metadata-Version: 2.4
Name: sciencebeam-parser
Version: 0.1.18
Summary: ScienceBeam Parser, parse scientific documents.
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: fastapi[standard]>=0.124.0
Requires-Dist: fsspec>=2022.1.0
Requires-Dist: gcsfs>=2022.1.0
Requires-Dist: lxml>=6.0.2
Requires-Dist: pdf2image==1.16.0
Requires-Dist: pyyaml>=6.0.3
Provides-Extra: cpu
Requires-Dist: torch>=2.5.1; extra == "cpu"
Requires-Dist: torchvision>=0.20.1; extra == "cpu"
Provides-Extra: delft
Requires-Dist: sciencebeam-trainer-delft[delft]>=0.0.36; extra == "delft"
Provides-Extra: cv
Requires-Dist: layoutparser==0.3.2; extra == "cv"

# ScienceBeam Parser Python Library

ScienceBeam Parser parses scientific documents. It provides a REST API service as well as a Python API.

## Installation

```bash
pip install sciencebeam-parser[delft,cpu]
```

## CLI

### CLI: Start Server

```bash
python -m sciencebeam_parser.service.server --port=8080
```

The server listens on port `8080`.

The [default config.yml](https://github.com/elifesciences/sciencebeam-parser/blob/main/sciencebeam_parser/resources/default_config/config.yml) defines which models to load.

You can find the API docs under `/api/docs`, e.g.:

[http://localhost:8080/api/docs](http://localhost:8080/api/docs)
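
Model loading can take a while, so a client script may want to wait until the server responds before sending documents. The following is a minimal sketch that polls the `/api/docs` page mentioned above using only the standard library. The helper name `wait_for_docs` and its parameters are hypothetical and not part of ScienceBeam Parser.

```python
import time
import urllib.error
import urllib.request


def wait_for_docs(base_url='http://localhost:8080', timeout=30.0, interval=0.5):
    """Poll `<base_url>/api/docs` until it answers with HTTP 200.

    Hypothetical helper (not part of sciencebeam-parser).
    Returns True once the page is reachable, False if `timeout` seconds pass first.
    """
    docs_url = base_url.rstrip('/') + '/api/docs'
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(docs_url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # connection refused or an HTTP error: the server is not ready yet
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_docs('http://localhost:8080', timeout=120)` could be called right after starting the server in a separate process.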

## Python API

### Python API: Start Server

```python
import uvicorn

from sciencebeam_parser.config.config import AppConfig
from sciencebeam_parser.resources.default_config import DEFAULT_CONFIG_FILE
from sciencebeam_parser.service.server import create_app


config = AppConfig.load_yaml(DEFAULT_CONFIG_FILE)
app = create_app(config)
# assumption: create_app returns an ASGI (FastAPI) app, as the fastapi
# dependency suggests; uvicorn is installed via fastapi[standard]
uvicorn.run(app, host='127.0.0.1', port=8080)
```

The server listens on port `8080`.

### Python API: Parse Multiple Files

```python
from sciencebeam_parser.resources.default_config import DEFAULT_CONFIG_FILE
from sciencebeam_parser.config.config import AppConfig
from sciencebeam_parser.utils.media_types import MediaTypes
from sciencebeam_parser.app.parser import ScienceBeamParser


config = AppConfig.load_yaml(DEFAULT_CONFIG_FILE)

# the parser contains all of the models
sciencebeam_parser = ScienceBeamParser.from_config(config)

# a session provides a scope and temporary directory for intermediate files
# it is recommended to create a separate session for every document
with sciencebeam_parser.get_new_session() as session:
    session_source = session.get_source(
        'test-data/minimal-example.pdf',
        MediaTypes.PDF
    )
    converted_file = session_source.get_local_file_for_response_media_type(
        MediaTypes.TEI_XML
    )
    # Note: the converted file will be in the temporary directory of the session
    print('converted file:', converted_file)
```
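
The example above converts a single document. To process several files, the same steps can be wrapped in a loop that opens one session per document, as recommended in the comments above. The sketch below assumes the temporary directory is only valid while the session is open, so it copies each result out before the session closes. The helper `convert_all` and the `output_suffix` parameter are hypothetical names; the helper only uses the parser and session methods already shown.

```python
import shutil
from pathlib import Path


def convert_all(
    parser,
    pdf_paths,
    output_dir,
    source_media_type,
    target_media_type,
    output_suffix='.tei.xml'
):
    """Convert each file in its own session and copy the result to output_dir.

    Hypothetical helper: `parser` is expected to behave like the
    ScienceBeamParser instance created above.
    Returns the list of copied output paths.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for pdf_path in pdf_paths:
        # one session per document, as recommended above
        with parser.get_new_session() as session:
            source = session.get_source(str(pdf_path), source_media_type)
            converted_file = source.get_local_file_for_response_media_type(
                target_media_type
            )
            # the converted file is in the session's temporary directory,
            # so copy it out while the session is still open
            target = output_dir / (Path(pdf_path).stem + output_suffix)
            shutil.copyfile(converted_file, target)
            results.append(target)
    return results
```

With the objects from the previous example, this could be called as `convert_all(sciencebeam_parser, ['a.pdf', 'b.pdf'], 'output', MediaTypes.PDF, MediaTypes.TEI_XML)`, where the file names are placeholders.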

## More Usage Examples

For more usage examples, see
[sciencebeam-usage-examples](https://github.com/eLifePathways/sciencebeam-usage-examples).
