Metadata-Version: 2.4
Name: bs2json
Version: 0.3.0
Summary: Convert bs4 Tags into Json
Author-email: Ijaz Ur Rahim <ijazkhan095@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/MrDebugger/bs2json
Keywords: parser,html,bs4,BeautifulSoup,soup,bs2json,json
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: beautifulsoup4>=4.0.0
Dynamic: license-file

[![PyPI version](https://img.shields.io/pypi/v/bs2json.svg)](https://pypi.python.org/pypi/bs2json/)
[![PyPI downloads](https://img.shields.io/pypi/dm/bs2json.svg)](https://pypi.python.org/pypi/bs2json/)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/bs2json.svg)](https://pypi.python.org/pypi/bs2json/)
[![PyPI license](https://img.shields.io/pypi/l/bs2json.svg)](https://pypi.python.org/pypi/bs2json/)
[![GitHub stars](https://img.shields.io/github/stars/MrDebugger/bs2json.svg)](https://github.com/MrDebugger/bs2json/stargazers)
[![GitHub issues](https://img.shields.io/github/issues/MrDebugger/bs2json.svg)](https://github.com/MrDebugger/bs2json/issues)
[![GitHub last commit](https://img.shields.io/github/last-commit/MrDebugger/bs2json.svg)](https://github.com/MrDebugger/bs2json/commits)

<h1 align="center">bs2json</h1>

<div align="center">

A lightweight Python library that converts BeautifulSoup4 HTML elements into structured JSON.
Parse any HTML and get clean, traversable dictionaries — preserving document order,
with full control over comments, whitespace, and label naming.

**Python 3.8+** | Only dependency: `beautifulsoup4`

</div>

---

<details open>
<summary><b>Table of Contents</b></summary>
<br>

| Section | Description |
|---------|-------------|
| [Installation](#installation) | How to install |
| [Quick Start](#quick-start) | Basic usage example |
| [Output Format](#output-format) | How HTML maps to JSON |
| [Conversion](#conversion) | Converting tags, multiple tags, from BeautifulSoup |
| [Options](#options) | group_by_tag, comments, whitespace, labels, config |
| [Output](#output) | Save to file, pretty print |
| [Advanced Usage](#advanced-usage) | Context manager, callable, extension mode |
| [API Reference](#api-reference) | BS2Json methods, ConversionConfig fields |
| [Contributing](#contributing) | How to contribute |

</details>

---

## Installation

```bash
pip install -U bs2json
```

---

## Quick Start

```python
from bs2json import BS2Json

html = """
<html>
<head><title>My Page</title></head>
<body>
    <h1>Welcome</h1>
    <p class="intro">Hello <b>world</b></p>
    <a href="/link1">Link 1</a>
    <a href="/link2">Link 2</a>
</body>
</html>
"""

converter = BS2Json(html)
result = converter.convert()  # dict for the whole document
converter.prettify()          # pretty-print the last result to stdout
```

---

## Output Format

Elements preserve their original document order. The JSON structure follows these rules:

| HTML | JSON |
|------|------|
| `<h1>text</h1>` | `{"h1": "text"}` |
| `<p class="x">text</p>` | `{"p": {"attrs": {"class": ["x"]}, "text": "text"}}` |
| `<div><h1>A</h1><p>B</p></div>` | `{"div": {"children": [{"h1": "A"}, {"p": "B"}]}}` |
| `<a href="/">link</a>` | `{"a": {"attrs": {"href": "/"}, "text": "link"}}` |
| `<!-- note -->` | `{"comment": "<!-- note -->"}` |

- **Single text child** stays simple: `{"tag": "text"}`
- **Single element child** nests directly under its parent: `{"head": {"title": "My Page"}}`
- **Multiple children** use: `{"tag": {"children": [...]}}`
- **Attributes** appear under the `"attrs"` key
- **Mixed content** (text + tags) preserves order in `children`

<details>
<summary><b>Full output example</b></summary>
<br>

```python
{'html': {'head': {'title': 'My Page'},
          'body': {'children': [{'h1': 'Welcome'},
                                {'p': {'attrs': {'class': ['intro']},
                                       'children': [{'text': 'Hello'},
                                                    {'b': 'world'}]}},
                                {'a': {'attrs': {'href': '/link1'},
                                       'text': 'Link 1'}},
                                {'a': {'attrs': {'href': '/link2'},
                                       'text': 'Link 2'}}]}}}
```

</details>
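
Because the result is a plain nested dict, it can be traversed with ordinary Python. The sketch below is not part of bs2json: `walk` and `RESERVED` are hypothetical names, and the code assumes the default labels (`attrs`, `text`, `comment`), the default ordered mode (no `group_by_tag`), and that no element is itself named like one of those keys. It walks the Quick Start output shown above and collects every link target.

```python
# Keys that hold data about an element rather than a child element
# (default label names; adjust if custom labels are in use).
RESERVED = {"attrs", "text", "comment", "children"}


def walk(tag, value):
    """Yield (tag_name, value) for an element and all of its descendants."""
    yield tag, value
    if not isinstance(value, dict):
        return  # plain string: an element with a single text child
    for key, child in value.items():
        if key == "children":
            # Ordered list of single-key dicts: {"tag": ...} or {"text": ...}
            for item in child:
                for name, sub in item.items():
                    if name not in RESERVED:
                        yield from walk(name, sub)
        elif key not in RESERVED:
            # A lone element child nested directly under its parent
            yield from walk(key, child)


# The Quick Start result, copied from the full output example.
result = {'html': {'head': {'title': 'My Page'},
                   'body': {'children': [
                       {'h1': 'Welcome'},
                       {'p': {'attrs': {'class': ['intro']},
                              'children': [{'text': 'Hello'}, {'b': 'world'}]}},
                       {'a': {'attrs': {'href': '/link1'}, 'text': 'Link 1'}},
                       {'a': {'attrs': {'href': '/link2'}, 'text': 'Link 2'}}]}}}

tags = [tag for tag, _ in walk("html", result["html"])]
links = [
    value["attrs"]["href"]
    for tag, value in walk("html", result["html"])
    if tag == "a" and isinstance(value, dict) and "href" in value.get("attrs", {})
]
print(links)  # ['/link1', '/link2']
```

The walker handles both shapes a parent can take: a `children` list when there are several child nodes, and a directly nested key when there is only one element child.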

---

## Conversion

<details open>
<summary><b>Convert Specific Tags</b></summary>
<br>

```python
converter = BS2Json(html)

# By tag name
converter.convert('body')

# By CSS class
converter.convert(class_='intro')

# By attribute
converter.convert('a', href='/link1')
# {'a': {'attrs': {'href': '/link1'}, 'text': 'Link 1'}}
```

</details>

<details open>
<summary><b>Convert Multiple Tags</b></summary>
<br>

```python
converter = BS2Json(html)

# As a list of individual results
converter.convert_all('a')
# [{'a': {'attrs': {'href': '/link1'}, 'text': 'Link 1'}},
#  {'a': {'attrs': {'href': '/link2'}, 'text': 'Link 2'}}]

# Grouped by tag name (note the result is still a list)
converter.convert_all('a', join=True)
# [{'a': [{'attrs': {'href': '/link1'}, 'text': 'Link 1'},
#         {'attrs': {'href': '/link2'}, 'text': 'Link 2'}]}]
```

</details>

<details>
<summary><b>From BeautifulSoup Objects</b></summary>
<br>

You can pass an existing BeautifulSoup object or Tag instead of raw HTML:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')

# From a soup object
BS2Json(soup).convert()

# From a specific tag
BS2Json(soup.find('body')).convert()

# Create an empty converter, then pass a Tag at call time
converter = BS2Json()
converter.convert(soup.body)
```

</details>

---

## Options

<details open>
<summary><b>Group by Tag Name</b></summary>
<br>

By default, elements preserve document order. Set `group_by_tag=True` to group siblings by tag name instead, which is useful when order doesn't matter and you want direct access by tag:

```python
html = '<html><body><h3>First</h3><p>Text</p><h3>Second</h3></body></html>'

# Default: preserves document order
BS2Json(html).convert()
# {'html': {'body': {'children': [{'h3': 'First'}, {'p': 'Text'}, {'h3': 'Second'}]}}}

# Grouped: siblings merged by tag name
BS2Json(html, group_by_tag=True).convert()
# {'html': {'body': {'h3': ['First', 'Second'], 'p': 'Text'}}}
```

</details>
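
To make the difference between the two shapes concrete, here is a minimal sketch that turns an ordered `children` list into the grouped form by hand. It is an illustration only, not the library's implementation: `group_children` is a hypothetical helper, and it handles a single level of siblings.

```python
def group_children(children):
    """Merge an ordered list of single-key dicts into one dict keyed by tag."""
    buckets = {}
    for item in children:
        for name, value in item.items():
            buckets.setdefault(name, []).append(value)
    # A tag that occurs once keeps its bare value; repeated tags become lists.
    return {name: vals[0] if len(vals) == 1 else vals
            for name, vals in buckets.items()}


ordered = [{'h3': 'First'}, {'p': 'Text'}, {'h3': 'Second'}]
grouped = group_children(ordered)
print(grouped)  # {'h3': ['First', 'Second'], 'p': 'Text'}
```

The grouped form loses the position of `p` relative to the two `h3` elements, which is why ordered output is the default.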

<details>
<summary><b>Comments</b></summary>
<br>

```python
comment_html = '<html><body><!-- TODO --><p>text</p></body></html>'

# Included by default
BS2Json(comment_html).convert()
# {'html': {'body': {'children': [{'comment': '<!-- TODO -->'}, {'p': 'text'}]}}}

# Exclude comments
BS2Json(comment_html, include_comments=False).convert()
# {'html': {'body': {'p': 'text'}}}
```

</details>

<details>
<summary><b>Whitespace</b></summary>
<br>

```python
ws_html = '<html><body><p>  hello  </p></body></html>'

# Stripped by default
BS2Json(ws_html).convert()
# {'html': {'body': {'p': 'hello'}}}

# Preserve whitespace
BS2Json(ws_html, strip=False).convert()
# {'html': {'body': {'p': '  hello  '}}}
```

</details>

<details>
<summary><b>Custom Labels</b></summary>
<br>

Change the JSON key names for attributes, text content, and comments:

```python
converter = BS2Json('<html><body><p class="x">hello</p></body></html>')
converter.labels(attrs='attributes', text='content', comment='notes')
result = converter.convert()
# {'html': {'body': {'p': {'attributes': {'class': ['x']}, 'content': 'hello'}}}}
```

Or via constructor:

```python
BS2Json(html, attr_name='@', text_name='#text', comment_name='#comment')
```

</details>

<details>
<summary><b>Configuration Object</b></summary>
<br>

All options are stored in a `ConversionConfig` dataclass, accessible and modifiable at any time:

```python
from bs2json import BS2Json, ConversionConfig

converter = BS2Json(html, strip=False)
print(converter.config)
# ConversionConfig(attr_name='attrs', text_name='text', comment_name='comment',
#                  include_comments=True, strip=False, group_by_tag=False)

# Modify config directly
converter.config.group_by_tag = True
converter.config.include_comments = False
```

</details>

---

## Output

<details open>
<summary><b>Save to File</b></summary>
<br>

```python
converter = BS2Json(html)
converter.convert()

# Save to JSON file (pretty-printed, 4-space indent)
converter.save('output.json')

# Save compact
converter.save('compact.json', prettify=False)

# Custom indent
converter.save('indented.json', indent=2)

# Save to a file-like object
import io
buf = io.StringIO()
converter.save(buf)
```

</details>
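
A converted result contains only dicts, lists, and strings, so it round-trips through the standard `json` module. The sketch below does not call `save()`; it uses `json.dump` as a stand-in, on the assumption that `save()` writes ordinary JSON, to show what the pretty-printed and compact forms look like and how to read a result back.

```python
import io
import json

# One converted element, copied from the examples above.
result = {'a': {'attrs': {'href': '/link1'}, 'text': 'Link 1'}}

# Pretty-printed with a 4-space indent, written to a file-like object.
buf = io.StringIO()
json.dump(result, buf, indent=4)
pretty = buf.getvalue()

# Compact form: no indent, so everything stays on one line.
compact = json.dumps(result)

# Reading the text back yields an equal dict.
loaded = json.loads(pretty)
print(loaded == result)
```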

<details>
<summary><b>Pretty Print</b></summary>
<br>

```python
converter = BS2Json(html)
converter.convert()
converter.prettify()  # prints to stdout
```

</details>

---

## Advanced Usage

<details>
<summary><b>Context Manager and Callable</b></summary>
<br>

```python
# Use as context manager
with BS2Json(html) as converter:
    result = converter.convert()

# Use as callable (shortcut for .convert())
converter = BS2Json(html)
result = converter()
```

</details>

<details>
<summary><b>Extension Mode</b></summary>
<br>

Monkey-patch `.to_json()` directly onto every BeautifulSoup Tag element:

```python
from bs4 import BeautifulSoup
from bs2json import install, remove

install()

soup = BeautifulSoup(html, 'html.parser')

# Now every tag has .to_json()
soup.find('body').to_json()
soup.find('a').to_json(include_comments=False, strip=False)

remove()  # clean up when done
```

</details>
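
For readers unfamiliar with monkey-patching, the pattern can be sketched with a stand-in class. Everything here is hypothetical (`Tag`, `install_patch`, `remove_patch`, and the simplified `_to_json`); it shows the general technique of attaching a method to a class at runtime and removing it again, not how bs2json implements `install()` and `remove()`.

```python
class Tag:
    """Hypothetical stand-in for a third-party class such as bs4's Tag."""

    def __init__(self, name, text):
        self.name = name
        self.text = text


def _to_json(self):
    # Simplified conversion: handles only the single-text-child case.
    return {self.name: self.text}


def install_patch():
    """Attach to_json to the class so every instance gains the method."""
    Tag.to_json = _to_json


def remove_patch():
    """Undo install_patch(); does nothing if the patch is not present."""
    if "to_json" in vars(Tag):
        del Tag.to_json


tag = Tag("h1", "Welcome")      # created before the patch is installed
install_patch()
converted = tag.to_json()       # existing instances see the method too
remove_patch()
still_patched = hasattr(Tag, "to_json")
print(converted, still_patched)  # {'h1': 'Welcome'} False
```

Because the patch lives on the class, it affects all code in the process that uses that class, which is the reason to remove it once it is no longer needed.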

---

## API Reference

<details open>
<summary><b>BS2Json</b></summary>
<br>

| Method | Description |
|--------|-------------|
| `BS2Json(soup, features, *, include_comments, strip, group_by_tag, **kwargs)` | Initialize from HTML string, Tag, or BeautifulSoup object |
| `.convert(element=None, json=None, *, inplace=False, **kwargs)` | Convert a single tag to a dict |
| `.convert_all(elements=None, lst=None, *, join=False, **kwargs)` | Convert multiple tags to a list of dicts |
| `.labels(attrs=..., text=..., comment=...)` | Change JSON key names |
| `.save(file, /, mode='w', *, prettify=True, indent=4)` | Save last result to file path or file object |
| `.prettify()` | Pretty-print last result to stdout |
| `.config` | `ConversionConfig` dataclass with all options |
| `.last_obj` | Result of the most recent conversion |
| `.soup` | The underlying BeautifulSoup object |

</details>

<details open>
<summary><b>ConversionConfig</b></summary>
<br>

| Field | Default | Description |
|-------|---------|-------------|
| `attr_name` | `"attrs"` | JSON key for element attributes |
| `text_name` | `"text"` | JSON key for text content |
| `comment_name` | `"comment"` | JSON key for HTML comments |
| `include_comments` | `True` | Whether to include HTML comments |
| `strip` | `True` | Strip leading/trailing whitespace from text |
| `group_by_tag` | `False` | Group siblings by tag name instead of preserving order |

</details>

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, versioning guide, and how to submit changes.

<a href="https://github.com/MrDebugger/bs2json/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=MrDebugger/bs2json"/>
</a>
