Metadata-Version: 2.1
Name: stingray-reader
Version: 5.1
Summary: The Stingray Schema-Based File Reader
Author-email: "S.Lott" <slott56@gmail.com>
License: MIT License
        
        Copyright (c) 2020 Steven F. Lott
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/slott56/Stingray-Reader
Classifier: Development Status :: 6 - Mature
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: COBOL
Requires-Python: >=3.12
Description-Content-Type: text/x-rst
License-File: LICENSE
Requires-Dist: jsonschema>=4.23.0
Requires-Dist: numbers-parser>=4.13.2
Requires-Dist: openpyxl>=3.1.5
Requires-Dist: pyexcel-ods>=0.6.0
Requires-Dist: pyexcel-ods3>=0.6.1
Requires-Dist: pyexcel>=0.7.0
Requires-Dist: xlrd>=2.0.1

================================================================
The Stingray Schema-Based File Reader
================================================================

Spreadsheet format files are the *lingua franca* of data processing.
CSV, Tab, XLS, XLSX and ODS files are used widely.  Python's ``csv``
module handles two common formats. Add-on packages are required for the
variety of other physical file formats.

The problem is that each add-on package has a unique view of the underlying
data.

The Stingray Schema-Based File Reader offers several features to help
process files in spreadsheet formats.

1.  It wraps format-specific modules with a unified
    "workbook" Facade to make applications able to work with any
    of the physical formats.

2.  It extends the workbook concept to include non-delimited files, including
    COBOL files encoded in any of the Unicode encodings, as well as ASCII and EBCDIC.

3.  It provides a uniform way to load and use schema information based on JSONSchema.
    A schema can be as small as header rows in the individual sheets of a workbook, or it can be separate
    schema information in another spreadsheet, a JSONSchema document, or COBOL "copybook"
    data definitions.

4.  It provides a suite of data conversions that cover the most common cases.
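
The idea in item 3 that a header row can act as a small schema can be sketched with the standard library alone.
This is a minimal illustration, not Stingray's API: the name ``schema_from_header`` is hypothetical,
and every column is assumed to hold a string.

::

    import csv
    import io


    def schema_from_header(header: list[str]) -> dict:
        """Build a minimal JSONSchema-style description of one row (hypothetical helper)."""
        return {
            "type": "object",
            # Assumption: a header row carries names only, so each column is typed as a string.
            "properties": {name: {"type": "string"} for name in header},
            "required": list(header),
        }


    # An in-memory stand-in for a CSV file whose first row is the header.
    sample = io.StringIO("name,qty\nwidget,3\ngadget,5\n")
    reader = csv.reader(sample)
    header = next(reader)
    schema = schema_from_header(header)

    # Each remaining row becomes a mapping keyed by the schema's property names.
    rows = [dict(zip(header, row)) for row in reader]

A richer schema (an external JSONSchema document or a COBOL copybook) would replace the all-strings assumption with real types.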

Additionally, the Stingray Reader provides some guidance on how to structure
file-processing applications so that they are testable and composable.

Stingray 5.1 requires Python >= 3.12. The code is fully annotated with type hints.

Stingray depends on additional projects to read .XLS, .XLSX, .ODS, and .NUMBERS files.

-   CSV files are read using the built-in ``csv`` module.

-   COBOL files are handled by the built-in ``estruct`` and ``cobol_parser`` modules.

-   NDJSON or JSON Newline files are JSON with an extra provision that each document must be complete on one physical line. 
    These use the built-in ``json`` module.

-   XLS files are read via the ``xlrd`` project:  http://www.lexicon.net/sjmachin/xlrd.htm

-   XLSX files are read via the ``openpyxl`` project: https://openpyxl.readthedocs.io/en/stable/

-   Numbers files (v13 and higher) use protobuf and snappy compression. See https://pypi.org/project/numbers-parser/.

-   YAML files can be a sequence of documents, permitting a direct mapping to a Workbook with a single Sheet.

-   TOML files are -- in effect -- giant dictionaries with flexible syntax and can be described by a JSONSchema.

-   XML files can be wrapped in a Workbook. There's no automated translation from XSD to JSONSchema here.
    A sample is provided, but this may not solve very many problems in general.

-   ODS files are read via http://docs.pyexcel.org/. **NOTE**: ODS file processing currently has problems with the 0.7.0 release.
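
The NDJSON rule in the list above (each document complete on one physical line) means the ``json`` module is enough to read such a file.
The following is a minimal sketch under that assumption. The function name ``ndjson_rows`` is hypothetical and is not taken from Stingray.

::

    import io
    import json
    from collections.abc import Iterator
    from typing import Any, TextIO


    def ndjson_rows(source: TextIO) -> Iterator[Any]:
        """Yield one parsed JSON document per non-blank physical line."""
        for line in source:
            if line.strip():
                yield json.loads(line)


    # An in-memory stand-in for an NDJSON file with two documents.
    sample = io.StringIO(
        '{"name": "widget", "qty": 3}\n'
        '{"name": "gadget", "qty": 5}\n'
    )
    docs = list(ndjson_rows(sample))

Each yielded document corresponds to one row of a single-sheet workbook.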

A file-suffix registry is used to map a suffix to a Workbook subclass that handles the physical format.
A decorator is used to add or replace file suffix mappings, permitting an application to fold in extensions.
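
The general shape of a suffix registry with a decorator can be sketched as follows.
All names here (``SUFFIX_REGISTRY``, ``register_suffix``, ``workbook_for``, and the demo classes) are illustrative assumptions
and are not Stingray's actual API. The sketch shows only the pattern of adding and replacing mappings.

::

    from pathlib import Path

    # Maps a lower-case file suffix to the class that handles that physical format.
    SUFFIX_REGISTRY: dict[str, type] = {}


    def register_suffix(*suffixes: str):
        """Class decorator: add (or replace) the mapping for each suffix."""
        def decorator(cls: type) -> type:
            for suffix in suffixes:
                SUFFIX_REGISTRY[suffix.lower()] = cls
            return cls
        return decorator


    class DemoWorkbook:
        """Stand-in for a workbook base class (hypothetical)."""
        def __init__(self, path: Path) -> None:
            self.path = path


    @register_suffix(".csv")
    class DemoCSVWorkbook(DemoWorkbook):
        pass


    # An application extension: registering ".csv" again replaces the earlier mapping.
    @register_suffix(".csv", ".tab")
    class DemoDelimitedWorkbook(DemoWorkbook):
        pass


    def workbook_for(name: str) -> DemoWorkbook:
        """Pick the registered class from the file suffix and instantiate it."""
        path = Path(name)
        try:
            cls = SUFFIX_REGISTRY[path.suffix.lower()]
        except KeyError:
            raise ValueError(f"no workbook registered for {path.suffix!r}") from None
        return cls(path)

Because the registry is keyed by suffix, the most recent registration wins, which is what lets an application override a default handler.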

Installation
============

::

    python -m pip install stingray-reader

Or, using ``uv``:

::

    uv add stingray-reader

Note that there's a tall stack of dependencies.
