cmoncrawl.processor.pipeline.extractor.HTMLExtractor
====================================================

.. currentmodule:: cmoncrawl.processor.pipeline.extractor

.. autoclass:: HTMLExtractor

   
   .. automethod:: __init__

   
   .. rubric:: Methods

   .. autosummary::
      :toctree:

   
      ~HTMLExtractor.__init__
      ~HTMLExtractor.extract
      ~HTMLExtractor.extract_soup
      ~HTMLExtractor.filter_raw
      ~HTMLExtractor.filter_soup
      ~HTMLExtractor.preprocess