:py:mod:`abacusai.api_class.dataset`
====================================

.. py:module:: abacusai.api_class.dataset


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   abacusai.api_class.dataset.ParsingConfig
   abacusai.api_class.dataset.DocumentProcessingConfig




.. py:class:: ParsingConfig

   Bases: :py:obj:`abacusai.api_class.abstract.ApiClass`

   Configuration options for parsing dataset files (for example, the
   delimiter used in CSV files).

   .. py:attribute:: escape
      :type: str

      

   .. py:attribute:: csv_delimiter
      :type: str

      

   .. py:attribute:: file_path_with_schema
      :type: str

      

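The entries above give only the names and types of the ``ParsingConfig`` fields. Judging by their names, ``csv_delimiter`` and ``escape`` appear to control how CSV rows are split into fields. The sketch below is a hypothetical illustration, not abacusai code. It assumes the two fields play the same roles as the ``delimiter`` and ``escapechar`` arguments of Python's ``csv.reader``, and the helper ``parse_csv`` is invented for this example.

.. code-block:: python

   import csv
   import io
   from typing import Optional


   def parse_csv(text: str, csv_delimiter: str = ",",
                 escape: Optional[str] = None) -> list[list[str]]:
       """Hypothetical helper: split CSV text into rows of fields.

       Assumes csv_delimiter and escape map onto the csv module's
       delimiter and escapechar; the real ParsingConfig semantics
       may differ.
       """
       reader = csv.reader(io.StringIO(text), delimiter=csv_delimiter,
                           escapechar=escape)
       return [row for row in reader]


   # A semicolon-delimited file in which "\;" is a literal semicolon.
   rows = parse_csv("a;b\\;c\n1;2\n", csv_delimiter=";", escape="\\")
   # rows == [["a", "b;c"], ["1", "2"]]

Under this assumption, the escape character lets a delimiter appear inside a field without quoting it.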

.. py:class:: DocumentProcessingConfig

   Bases: :py:obj:`abacusai.api_class.abstract.ApiClass`

   Document processing configuration.

   :param extract_bounding_boxes: Whether to run OCR and extract bounding boxes. If False, no OCR is performed and only the text embedded in digital documents is extracted. Defaults to False.
   :type extract_bounding_boxes: bool
   :param ocr_mode: OCR mode to use. Different modes suit different kinds of documents and use cases; see :py:class:`abacusai.api_class.enums.OcrMode` for the available values. Takes effect only when ``extract_bounding_boxes`` is True.
   :type ocr_mode: OcrMode
   :param use_full_ocr: Whether to perform full-page OCR. If True, OCR is performed on the entire page; if False, only on non-text regions. By default, this is decided automatically based on the OCR mode and the document type. Takes effect only when ``extract_bounding_boxes`` is True.
   :type use_full_ocr: bool
   :param remove_header_footer: Whether to remove headers and footers. Defaults to False. Takes effect only when ``extract_bounding_boxes`` is True.
   :type remove_header_footer: bool
   :param remove_watermarks: Whether to remove watermarks. By default, this is decided automatically based on the OCR mode and the document type. Takes effect only when ``extract_bounding_boxes`` is True.
   :type remove_watermarks: bool
   :param convert_to_markdown: Whether to convert the extracted text to Markdown. Defaults to False. Takes effect only when ``extract_bounding_boxes`` is True.
   :type convert_to_markdown: bool

   .. py:attribute:: extract_bounding_boxes
      :type: bool
      :value: False

      

   .. py:attribute:: ocr_mode
      :type: abacusai.api_class.enums.OcrMode

      

   .. py:attribute:: use_full_ocr
      :type: bool

      

   .. py:attribute:: remove_header_footer
      :type: bool
      :value: False

      

   .. py:attribute:: remove_watermarks
      :type: bool

      

   .. py:attribute:: convert_to_markdown
      :type: bool
      :value: False

      

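Every ``DocumentProcessingConfig`` option other than ``extract_bounding_boxes`` is documented as taking effect only when ``extract_bounding_boxes`` is True. The sketch below illustrates that rule with a hypothetical stand-in dataclass, ``DocumentProcessingOptions``, and a helper, ``ignored_options``; neither is part of abacusai. It uses a plain string in place of ``OcrMode`` and ``None`` to stand for "decided automatically", and the ``ocr_mode`` default shown is an assumption.

.. code-block:: python

   from dataclasses import dataclass, fields
   from typing import Optional


   @dataclass
   class DocumentProcessingOptions:
       """Hypothetical mirror of DocumentProcessingConfig's documented fields."""
       extract_bounding_boxes: bool = False
       ocr_mode: Optional[str] = None            # stands in for OcrMode (assumed default)
       use_full_ocr: Optional[bool] = None       # None: decided automatically
       remove_header_footer: bool = False
       remove_watermarks: Optional[bool] = None  # None: decided automatically
       convert_to_markdown: bool = False


   def ignored_options(opts: DocumentProcessingOptions) -> list[str]:
       """Return options set away from their defaults that will have no effect.

       Per the parameter docs, every field except extract_bounding_boxes
       only takes effect when extract_bounding_boxes is True.
       """
       if opts.extract_bounding_boxes:
           return []
       defaults = DocumentProcessingOptions()
       return [f.name for f in fields(opts)
               if f.name != "extract_bounding_boxes"
               and getattr(opts, f.name) != getattr(defaults, f.name)]


   # convert_to_markdown has no effect unless extract_bounding_boxes=True.
   print(ignored_options(DocumentProcessingOptions(convert_to_markdown=True)))
   # ['convert_to_markdown']
   print(ignored_options(DocumentProcessingOptions(extract_bounding_boxes=True,
                                                   convert_to_markdown=True)))
   # []

A check like this can flag configurations where an OCR-dependent option was set but OCR was left disabled.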

