Module src.jsonid.compressionlib

Handlers for detecting and decompressing stream-compressed (gzip and bzip2) input.

Functions

async def compress_check(path)
async def compress_check(path):
    """If we're enabling detection of JSONL, we need to be able to
    detect content in stream-compressed files such as gzip or bzip2.
    python-magic (libmagic) does a good job of enabling this for us.

    govdocs without libmagic:

        real    2m23.181s
        user    1m19.272s
        sys     0m59.182s

    govdocs with libmagic:

        real    2m39.733s
        user    1m30.068s
        sys     1m3.820s

    """
    # Compression detection is not attempted on Windows.
    if os.name == WINDOWS_OS:
        return False
    # uncompress=False: report the compression type itself rather than
    # looking inside the compressed stream.
    mime = magic.Magic(mime=True, uncompress=False)
    # from_buffer() classifies the bytes it is given, not a file on disk.
    mime_type = mime.from_buffer(path)
    if mime_type not in COMPRESSED:
        return False
    logger.debug("compressed mime detected: %s", mime_type)
    return mime_type

If we're enabling detection of JSONL, we need to be able to detect content in stream-compressed files such as gzip or bzip2. python-magic (libmagic) does a good job of enabling this for us.

govdocs without libmagic:

    real    2m23.181s
    user    1m19.272s
    sys     0m59.182s

govdocs with libmagic:

    real    2m39.733s
    user    1m30.068s
    sys     1m3.820s
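compress_check relies on libmagic. For comparison, a gzip/bzip2 check can also be sketched with only the standard library by reading the leading "magic number" bytes of the data. This is a swapped-in technique and not what jsonid does. `sniff_compression` is a hypothetical name, and the MIME strings are assumptions that may not match the module's `COMPRESSED` constant.

```python
import bz2
import gzip

# Assumed MIME strings; jsonid's COMPRESSED constant may use different values.
GZIP_MIME = "application/gzip"
BZIP2_MIME = "application/x-bzip2"

# Leading signature ("magic number") for each stream-compression format.
SIGNATURES = {
    b"\x1f\x8b": GZIP_MIME,  # gzip member header (RFC 1952)
    b"BZh": BZIP2_MIME,      # bzip2 stream header
}


def sniff_compression(data: bytes):
    """Return the MIME type for a known compression signature, else False.

    Hypothetical helper: mirrors compress_check's False-or-MIME return style.
    """
    for signature, mime_type in SIGNATURES.items():
        if data.startswith(signature):
            return mime_type
    return False


if __name__ == "__main__":
    print(sniff_compression(gzip.compress(b'{"a": 1}\n')))  # application/gzip
    print(sniff_compression(bz2.compress(b'{"a": 1}\n')))   # application/x-bzip2
    print(sniff_compression(b'{"a": 1}\n'))                  # False
```

Signature sniffing avoids the libmagic dependency and its overhead, but it recognizes only the formats listed explicitly.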
async def decompress_stream(path: str, compression: str) -> str
async def decompress_stream(path: str, compression: str) -> str:
    """Decomprerss our stream object and return the data."""
    if compression not in COMPRESSED:
        logger.error("invalid compression as input: %s", compression)
        return
    if compression == COMPRESSED_BZIP2:
        return _unpack_bz(path)
    if compression == COMPRESSED_GZIP:
        return _unpack_gzip(path)

Decompress our stream object and return the data.
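The dispatch in decompress_stream can be sketched with the standard library's `gzip` and `bz2` modules. This sketch is an illustrative stand-in for the private `_unpack_bz` and `_unpack_gzip` helpers, whose implementations are not shown here. It operates on in-memory bytes and assumes UTF-8 content. `decompress_bytes` is a hypothetical name, and the MIME strings are assumptions.

```python
import bz2
import gzip
from typing import Callable, Dict, Optional

# Assumed MIME strings; the module's COMPRESSED_GZIP / COMPRESSED_BZIP2 may differ.
GZIP_MIME = "application/gzip"
BZIP2_MIME = "application/x-bzip2"

# Map each supported MIME type to a stdlib decompressor.
DECOMPRESSORS: Dict[str, Callable[[bytes], bytes]] = {
    GZIP_MIME: gzip.decompress,
    BZIP2_MIME: bz2.decompress,
}


def decompress_bytes(data: bytes, compression: str) -> Optional[str]:
    """Decompress data for the given MIME type and decode it as UTF-8.

    Returns None for an unsupported compression type, as decompress_stream does.
    """
    decompressor = DECOMPRESSORS.get(compression)
    if decompressor is None:
        return None
    return decompressor(data).decode("utf-8")


if __name__ == "__main__":
    packed = gzip.compress(b'{"a": 1}\n{"b": 2}\n')
    print(decompress_bytes(packed, GZIP_MIME))
```

Annotating the result as `Optional[str]` makes the None path visible in the signature. decompress_stream is annotated `-> str` but can still return None.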