:py:mod:`abacusai.document_retriever`
=====================================

.. py:module:: abacusai.document_retriever


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   abacusai.document_retriever.DocumentRetriever




.. py:class:: DocumentRetriever(client, name=None, documentRetrieverId=None, createdAt=None, featureGroupId=None, latestDocumentRetrieverVersion={}, documentRetrieverConfig={})

   Bases: :py:obj:`abacusai.return_class.AbstractApiClass`

   A vector store that stores embeddings for a list of document trunks.

   :param client: An authenticated API Client instance
   :type client: ApiClient
   :param name: The name of the document retriever.
   :type name: str
   :param documentRetrieverId: The unique identifier of the vector store.
   :type documentRetrieverId: str
   :param createdAt: When the vector store was created.
   :type createdAt: str
   :param featureGroupId: The feature group id associated with the document retriever.
   :type featureGroupId: str
   :param latestDocumentRetrieverVersion: The latest version of vector store.
   :type latestDocumentRetrieverVersion: DocumentRetrieverVersion
   :param documentRetrieverConfig: The config for vector store creation.
   :type documentRetrieverConfig: DocumentRetrieverConfig

   .. py:method:: __repr__()

      Return repr(self).


   .. py:method:: to_dict()

      Get a dict representation of the parameters in this class

      :returns: The dict value representation of the class parameters
      :rtype: dict


   .. py:method:: update(name = None, feature_group_id = None, document_retriever_config = None)

      Updates an existing document retriever.

      :param name: The name group to update the document retriever with.
      :type name: str
      :param feature_group_id: The ID of the feature group to update the document retriever with.
      :type feature_group_id: str
      :param document_retriever_config: The configuration, including chunk_size and chunk_overlap_fraction, for document retrieval.
      :type document_retriever_config: DocumentRetrieverConfig

      :returns: The updated document retriever.
      :rtype: DocumentRetriever


   .. py:method:: create_version()

      Creates a document retriever version from the latest version of the feature group that the document retriever associated with.

      :param document_retriever_id: The unique ID associated with the document retriever to create version with.
      :type document_retriever_id: str

      :returns: The newly created document retriever version.
      :rtype: DocumentRetrieverVersion


   .. py:method:: refresh()

      Calls describe and refreshes the current object's fields

      :returns: The current object
      :rtype: DocumentRetriever


   .. py:method:: describe()

      Describe a Document Retriever.

      :param document_retriever_id: A unique string identifier associated with the document retriever.
      :type document_retriever_id: str

      :returns: The document retriever object.
      :rtype: DocumentRetriever


   .. py:method:: list_versions(limit = 100, start_after_version = None)

      List all the document retriever versions with a given ID.

      :param limit: The number of vector store versions to retrieve.
      :type limit: int
      :param start_after_version: An offset parameter to exclude all document retriever versions up to this specified one.
      :type start_after_version: str

      :returns: All the document retriever versions associated with the document retriever.
      :rtype: list[DocumentRetrieverVersion]


   .. py:method:: lookup(query, deployment_token, filters = None, limit = None, result_columns = None, max_words = None, num_retrieval_margin_words = None, max_words_per_chunk = None)

      Lookup relevant documents from the document retriever deployed with given query.

      Original documents are splitted into chunks and stored in the document retriever. This lookup function will return the relevant chunks
      from the document retriever. The returned chunks could be expanded to include more words from the original documents and merged if they
      are overlapping, and permitted by the settings provided. The returned chunks are sorted by relevance.


      :param query: The query to search for.
      :type query: str
      :param deployment_token: A deployment token used to authenticate access to created vector store.
      :type deployment_token: str
      :param filters: A dictionary mapping column names to a list of values to restrict the retrieved search results.
      :type filters: dict
      :param limit: If provided, will limit the number of results to the value specified.
      :type limit: int
      :param result_columns: If provided, will limit the column properties present in each result to those specified in this list.
      :type result_columns: list
      :param max_words: If provided, will limit the total number of words in the results to the value specified.
      :type max_words: int
      :param num_retrieval_margin_words: If provided, will add this number of words from left and right of the returned chunks.
      :type num_retrieval_margin_words: int
      :param max_words_per_chunk: If provided, will limit the number of words in each chunk to the value specified. If the value provided is smaller than the actual size of chunk on disk, which is determined during document retriever creation, the actual size of chunk will be used. I.e, chunks looked up from document retrievers will not be split into smaller chunks during lookup due to this setting.
      :type max_words_per_chunk: int

      :returns: The relevant documentation results found from the document retriever.
      :rtype: list[DocumentRetrieverLookupResult]


   .. py:method:: get_document_snippet(deployment_token, document_id, start_word_index = None, end_word_index = None)

      Get a snippet from documents in the document retriever.

      :param deployment_token: A deployment token used to authenticate access to created vector store.
      :type deployment_token: str
      :param document_id: The ID of the document to retrieve the snippet from.
      :type document_id: str
      :param start_word_index: If provided, will start the snippet at the index (of words in the document) specified.
      :type start_word_index: int
      :param end_word_index: If provided, will end the snippet at the index of (of words in the document) specified.
      :type end_word_index: int

      :returns: The documentation snippet found from the document retriever.
      :rtype: DocumentRetrieverLookupResult


   .. py:method:: restart()

      Restart the document retriever if it is stopped.

      :param document_retriever_id: A unique string identifier associated with the document retriever.
      :type document_retriever_id: str


   .. py:method:: wait_until_ready(timeout = 3600)

      A waiting call until document retriever is ready.

      :param timeout: The waiting time given to the call to finish, if it doesn't finish by the allocated time, the call is said to be timed out. Default value given is 3600 seconds.
      :type timeout: int, optional


   .. py:method:: get_status()

      Gets the status of the document retriever.

      :returns: A string describing the status of a document retriever (pending, complete, etc.).
      :rtype: str



