Metadata-Version: 2.1
Name: nlm-utils
Version: 0.1.1
Summary: Common utilities used by all nlm-* libraries.
Home-page: https://github.com/nlmatics/nlm-utils
Author: Ambika Sukla
Author-email: ambika.sukla@nlmatics.com
License: MIT
Classifier: Development Status :: 5 - Production/Stable
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Legal Industry
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3 :: Only
Description-Content-Type: text/markdown
License-File: LICENSE.txt
License-File: NOTICE.txt
Requires-Dist: aiohttp ==3.8.5
Requires-Dist: dateparser
Requires-Dist: dnspython ==2.1.0
Requires-Dist: word2number
Requires-Dist: minio ==7.1.0
Requires-Dist: money ==1.3.0
Requires-Dist: msgpack ==1.0.2
Requires-Dist: nltk ==3.6.2
Requires-Dist: numpy ==1.24.4
Requires-Dist: openai
Requires-Dist: pymongo ==3.11.4
Requires-Dist: redis ==3.5.3
Requires-Dist: tiktoken
Requires-Dist: urllib3 ==1.26.6
Requires-Dist: xxhash ==2.0.2
Requires-Dist: python-magic ==0.4.22
Requires-Dist: dicttoxml

# About
This repo contains shared utilities for nlmatics projects. Any module or function used across two or more repos belongs here.

## model_client
This module provides clients for accessing NLP models hosted on the model server.

### EncoderClient with DPR
```
from nlm_utils.model_client import EncoderClient

model_server_url = "<model server url>"  # supply your model server URL

# encode passages with the DPR context encoder
encoder = EncoderClient(
    model="dpr-context",
    url=model_server_url,
)
encoder(["sales was 20 million dollars"])

# encode questions with the DPR question encoder
encoder = EncoderClient(
    model="dpr-question",
    url=model_server_url,
)
encoder(["how much was sales"])
```
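
DPR pairs a context encoder with a question encoder, and a question is usually matched to passages by the inner product of their embeddings. The sketch below illustrates only that ranking step. The return format of `EncoderClient` is not shown above, so the vectors here are made-up stand-ins and `rank_passages` is a hypothetical helper, not part of nlm-utils.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(question_vec, passage_vecs):
    """Return (passage_index, score) pairs, highest score first."""
    scores = [dot(question_vec, p) for p in passage_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Toy 3-dimensional embeddings (assumption: real DPR vectors are much larger).
question_vec = [0.2, 0.9, 0.1]
passage_vecs = [
    [0.1, 0.8, 0.0],  # stand-in for a relevant passage
    [0.9, 0.1, 0.3],  # stand-in for an unrelated passage
]

ranking = rank_passages(question_vec, passage_vecs)
best_index = ranking[0][0]
```

With real embeddings, `question_vec` would come from the `dpr-question` encoder and `passage_vecs` from the `dpr-context` encoder.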
### EncoderClient with SIF
```
from nlm_utils.model_client import EncoderClient
model_server_url = "<model server url>"  # supply your model server URL
encoder = EncoderClient(
    model="sif",
    url=model_server_url,
)
encoder(["sales was 20 million dollars"])
```

### ClassificationClient used to get the possible answer type of a question
```
from nlm_utils.model_client.classification import ClassificationClient
model_server_url = "<model server url>"  # supply your model server URL
qa_type_client = ClassificationClient(
    model="roberta",
    task="qa_type",
    url=model_server_url,
    retry=1,
)
qa_type_client(["What is the name of the company"])
```
returns
```
{'predictions': ['HUM:gr']}
```
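
The label in the sample response has a `COARSE:fine` shape. A minimal sketch of splitting it into its two parts follows. `split_label` is a hypothetical helper, and the assumption that every label contains a single `:` separator is inferred from this one sample only.

```python
def split_label(label):
    """Split a 'COARSE:fine' label into (coarse, fine).

    If there is no ':' in the label, fine is returned as an empty string.
    """
    coarse, _, fine = label.partition(":")
    return coarse, fine


# Sample response copied from the output shown above.
response = {'predictions': ['HUM:gr']}
pairs = [split_label(p) for p in response["predictions"]]
```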


### ClassificationClient used for QA
```
from nlm_utils.model_client.classification import ClassificationClient
model_server_url = "<model server url>"  # supply your model server URL
qa_client = ClassificationClient(
    model='roberta',
    task="roberta-qa",
    host=model_server_url,
    port=80,
)
qa_client(["what is the listing symbol of common stock or shares"], ["Our common stock is listed on the NYSE under the symbol 'MSFT'."])
```
returns
```
{'answers': [{'0': {'end_byte': 60,
    'end_logit': 16,
    'probability': 0.9999986487212269,
    'start_byte': 57,
    'start_logit': 14,
    'text': 'MSFT'}},
  {}]}
```
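
The response nests candidate answers under string keys, one dict per list entry, and an entry may be empty. The sketch below shows one way to pull out the most probable answer for each entry, using only the fields visible in the sample. `extract_answers` is a hypothetical helper. In this sample the byte offsets appear to be inclusive on both ends; that is an observation from this one response, not documented behavior.

```python
def extract_answers(response):
    """Return (text, probability) for the top candidate of each entry, or None."""
    results = []
    for entry in response.get("answers", []):
        if not entry:  # empty dict: no candidate for this entry
            results.append(None)
            continue
        best = max(entry.values(), key=lambda c: c["probability"])
        results.append((best["text"], best["probability"]))
    return results


# Sample response copied from the output shown above.
sample = {'answers': [{'0': {'end_byte': 60,
                             'end_logit': 16,
                             'probability': 0.9999986487212269,
                             'start_byte': 57,
                             'start_logit': 14,
                             'text': 'MSFT'}},
                      {}]}
passage = "Our common stock is listed on the NYSE under the symbol 'MSFT'."

answers = extract_answers(sample)

# Recover the span from the byte offsets (treated as inclusive here).
top = sample['answers'][0]['0']
span = passage.encode("utf-8")[top['start_byte']:top['end_byte'] + 1].decode("utf-8")
```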

### ClassificationClient used for boolean (yes/no) question answering
```
from nlm_utils.model_client.classification import ClassificationClient
model_server_url = "<model server url>"  # supply your model server URL
boolq_client = ClassificationClient(
    model="roberta",
    task="boolq",
    url=model_server_url,
    retry=1,
)
sentences = ["it is snowing outside"]
question = ["is it snowing"]
boolq_client(question, sentences)
```
returns
```
{'predictions': ['True']}
```
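
Note that the prediction comes back as the string `'True'`, not a Python boolean, so a truthiness check on it would pass for any non-empty label. A minimal sketch of converting the predictions follows. `to_bools` is a hypothetical helper, and `'False'` as the negative label is an assumption, since only `'True'` appears in the sample.

```python
def to_bools(response):
    """Convert string predictions such as 'True' into Python booleans."""
    mapping = {"true": True, "false": False}  # 'false' is an assumed label
    return [mapping[p.strip().lower()] for p in response["predictions"]]


# Sample response copied from the output shown above.
flags = to_bools({'predictions': ['True']})
```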

## lazy cache
This module provides a lazy cache for different types of data.
The cache can be configured to save to different storage backends:
- Files
- Memory
- Redis
- MongoDB
- Google Cloud (planning)

Usage
```
# import Cache module
from nlm_utils.cache import Cache

# init cache with FileAgent
cache = Cache("FileAgent")

# apply cache on function
@cache
def func1(args):
    pass

# specify the cache key explicitly
func1(args, cache_key="cache_key")
# force overwriting the cached value
func1(args, overwrite=True)
# do not read or write the cache
func1(args, no_cache=True)
```
### cache agent
Currently, the cache supports the following agents:
```
# file
cache = Cache("FileAgent", path=".cache", collection="collection")

# memory
cache = Cache("MemoryAgent", prefix="prefix")

# Mongodb
cache = Cache("MongodbAgent", db="cache", collection="cache")

# Redis
cache = Cache("RedisAgent", prefix="collection")
```

### Key for the cache
By default, the cache layer inspects the function arguments and generates the cache key automatically.
You can also specify `cache_key` explicitly, or include `uid` as an attribute of the argument.
The cache can be forcibly overwritten by passing the `overwrite` argument.

The cache will also block I/O while a cache write is in progress (lock) -- planning
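
To make the key handling concrete, here is a minimal sketch of a decorator that behaves like the description above: it derives a key from the arguments unless `cache_key` is given, and honors `overwrite` and `no_cache`. This is an illustration, not the nlm-utils implementation. `simple_cache`, the in-memory dict store, and the use of `pickle` plus `hashlib.sha256` for key derivation are all assumptions, and the `uid` lookup is not modeled.

```python
import functools
import hashlib
import pickle


def simple_cache(store):
    """Cache results in `store` (a dict), keyed by the call arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, cache_key=None, overwrite=False, no_cache=False, **kwargs):
            if no_cache:  # bypass the cache entirely
                return func(*args, **kwargs)
            if cache_key is None:  # derive a key from the arguments
                payload = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
                cache_key = hashlib.sha256(payload).hexdigest()
            if not overwrite and cache_key in store:
                return store[cache_key]
            result = func(*args, **kwargs)
            store[cache_key] = result
            return result
        return wrapper
    return decorator


store = {}
calls = []  # records every real execution of square()


@simple_cache(store)
def square(x):
    calls.append(x)
    return x * x


square(3)                  # computed and stored
square(3)                  # served from the cache
square(3, overwrite=True)  # recomputed, stored value replaced
square(3, no_cache=True)   # recomputed, cache untouched
```

After these four calls, `square` has actually run three times and the store holds a single entry.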



## utils (planning)
Functions that can be shared across multiple repos.
- read_config(config_file)

## Credits 2020-2024
The code was written by the following people while working at Nlmatics Corp.
- The initial skeleton and model clients were written by Suhail Kandanur.
- Reshav Abraham wrote the nlp_client.
- Yi Zhang refactored the code and created the core framework.
- Ambika Sukla wrote the value parser and added code and prompts for flan-t5, encoder and openai models.
- Tom Liu wrote the yolo client and made several bug fixes.
- Kiran Panicker wrote the location parser and search summarization prompts for openai, and made several bug fixes.
