Metadata-Version: 2.1
Name: sotoki
Version: 2.1.2
Summary: Turn StackExchange dumps into ZIM files for offline usage
Home-page: https://github.com/openzim/sotoki
Author: Kiwix
Author-email: contact+dev@kiwix.org
License: GPLv3+
Keywords: kiwix zim offline stackexchange stackoverflow
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.9
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: kiwixstorage <1.0,>=0.8.1
Requires-Dist: pif <0.9,>=0.8.2
Requires-Dist: zimscraperlib <4.0,>=3.3.0
Requires-Dist: xml-to-dict <0.2,>=0.1.6
Requires-Dist: cli-formatter <1.3,>=1.2.0
Requires-Dist: py7zr <0.21,>=0.20.4
Requires-Dist: python-slugify <9.0.0,>=8.0.1
Requires-Dist: jinja2 <3.2,>=3.1.0
Requires-Dist: redis !=4.5.2,<5.0,>=4.5.1
Requires-Dist: beautifulsoup4 <5.0,>=4.9.3
Requires-Dist: lxml <4.10,>=4.9.1
Requires-Dist: jinja2-pluralize <0.4,>=0.3.0
Requires-Dist: tld <0.14,>=0.13
Requires-Dist: mistune <3.0.0,>=2.0.5
Requires-Dist: python-dateutil <2.9,>=2.8.2
Requires-Dist: psutil <6.0,>=5.9.4
Requires-Dist: python-snappy <1.0,>=0.6.0
Requires-Dist: bidict <0.23,>=0.22.1
Requires-Dist: cchardet <2.2,>=2.1.7

Sotoki
======

`Sotoki` (*Stack Overflow to Kiwix*) is an
[openZIM](https://github.com/openzim) scraper to create offline
versions of [Stack Exchange](https://stackexchange.com) websites such
as [Stack Overflow](https://stackoverflow.com/).

It is based on Stack Exchange's Data Dumps hosted by [The Internet
Archive](https://archive.org/download/stackexchange/).

[![CodeFactor](https://www.codefactor.io/repository/github/openzim/sotoki/badge)](https://www.codefactor.io/repository/github/openzim/sotoki)
[![Docker](https://ghcr-badge.deta.dev/openzim/sotoki/latest_tag?label=docker)](https://ghcr.io/openzim/sotoki)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![PyPI version shields.io](https://img.shields.io/pypi/v/sotoki.svg)](https://pypi.org/project/sotoki/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/sotoki.svg)](https://pypi.org/project/sotoki)

## Usage

`Sotoki` works off a `domain` that you must provide: the domain name
of the Stack Exchange website you want to scrape. Run
`sotoki --list-all` to get the list of available domains.
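
To check whether a given site is available before starting a scrape, the
list can be filtered with standard shell tools. The sketch below is only
illustrative: `has_domain` is a hypothetical helper (not part of sotoki),
and it assumes that `sotoki --list-all` prints each domain as plain text
on a line of its output, which may not match the actual format.

```sh
#!/bin/sh
# Hypothetical helper, not part of sotoki: reads a list on stdin and
# succeeds if any line contains the given domain as a fixed string.
has_domain() {
    grep -qF -- "$1"
}

# Assumed usage (requires sotoki to be installed):
#   sotoki --list-all | has_domain "stackoverflow.com" && echo "available"
```

Matching a fixed string (`-F`) rather than a regular expression keeps the
dots in a domain name from matching arbitrary characters.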

### Docker

```bash
docker run -v my_dir:/output ghcr.io/openzim/sotoki sotoki --help
```

### Installation

`sotoki` is a Python 3 program. If you are not using the
[Docker](https://ghcr.io/openzim/sotoki/) image, we recommend installing it in a
virtual environment to avoid installing its dependencies system-wide.

```sh
python3 -m venv ./env  # creates a virtual python environment in ./env folder
./env/bin/pip install -U pip  # upgrade pip (package manager). recommended
./env/bin/pip install -U sotoki  # install/upgrade sotoki inside virtualenv

# direct access to in-virtualenv sotoki binary, without shell-attachment
./env/bin/sotoki --help
# alias or link it for convenience
sudo ln -s "$(pwd)/env/bin/sotoki" /usr/local/bin/

# alternatively, attach virtualenv to shell
source env/bin/activate
sotoki --help
deactivate  # unloads virtualenv from shell
```

## Developers

Anyone is welcome to improve Sotoki.

To run Sotoki off the git repository, you'll need to download a few
external dependencies that we pack in Python releases. Just run
`python src/sotoki/dependencies.py`.

See `requirements.txt` for the list of Python dependencies.

## Users

You don't have to build your own ZIM files of Stack Exchange
websites. Up-to-date ZIM files for all of them are built on a regular
basis. See https://library.kiwix.org/?category=stack_exchange
to download them.
