ServiceScrapper
flowtask.components.ServiceScrapper
ServiceScrapper
Bases: FlowComponent, SeleniumService, HTTPService
Service Scraper Component
Overview:
Pluggable component for scraping several services and sites, using a different scraper for each one.
.. table:: Properties
   :widths: auto

   +------------------+----------+------------------------------------------------------------------------------+
   | Name             | Required | Description                                                                  |
   +==================+==========+==============================================================================+
   | url_column (str) | Yes      | Name of the column containing URLs to scrape (default: 'search_url')         |
   +------------------+----------+------------------------------------------------------------------------------+
   | wait_for (tuple) | No       | Element to wait for before scraping (default: ('class', 'company-overview')) |
   +------------------+----------+------------------------------------------------------------------------------+
Return:

- DataFrame with company information
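The ``wait_for`` property is a ``(strategy, value)`` pair, but this page does not say how the component resolves it. As a rough sketch only, assuming the strategy name maps onto Selenium's locator strategy strings, the pair could be normalized as below. The ``LOCATOR_STRATEGIES`` mapping and the ``resolve_wait_for`` helper are hypothetical and not part of flowtask.

```python
# Hypothetical helper, NOT part of flowtask: it only illustrates how a
# (strategy, value) pair such as the documented default might be normalized.

# Assumed short names mapped to the locator strategy strings defined by
# selenium.webdriver.common.by.By (e.g. By.CLASS_NAME == "class name").
LOCATOR_STRATEGIES = {
    "id": "id",
    "name": "name",
    "class": "class name",
    "tag": "tag name",
    "css": "css selector",
    "xpath": "xpath",
}

# Default taken from the Properties table.
DEFAULT_WAIT_FOR = ("class", "company-overview")


def resolve_wait_for(wait_for=DEFAULT_WAIT_FOR):
    """Return a (by, value) locator tuple for the given wait_for setting."""
    if len(wait_for) != 2:
        raise ValueError("wait_for must be a (strategy, value) pair")
    strategy, value = wait_for
    try:
        by = LOCATOR_STRATEGIES[strategy]
    except KeyError:
        raise ValueError(f"unsupported wait_for strategy: {strategy!r}") from None
    return by, value
```

A locator tuple of this shape is what Selenium's ``expected_conditions.presence_of_element_located`` accepts, so the result could be handed to a ``WebDriverWait(...).until(...)`` call before scraping starts.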
parsers
base
ScrapperBase
costco
CostcoScrapper
Bases: ScrapperBase
product_information (async)
Get the product information from Costco.
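The parser layout above (a base class, a per-site subclass, and an async method) suggests a plug-in pattern. The following is a minimal sketch of that pattern under stated assumptions: ``DemoScrapperBase``, ``DemoCostcoScrapper``, the ``SCRAPERS`` registry and the ``scrape`` dispatcher are all hypothetical stand-ins, and the real classes may be wired together differently.

```python
import asyncio


class DemoScrapperBase:
    """Stand-in for a scraper base class (not flowtask's ScrapperBase)."""

    def __init__(self, url):
        self.url = url

    async def product_information(self):
        raise NotImplementedError


class DemoCostcoScrapper(DemoScrapperBase):
    """Stand-in for a site-specific scraper."""

    async def product_information(self):
        # A real scraper would load and parse the page here; this sketch
        # only yields control once and returns a placeholder record.
        await asyncio.sleep(0)
        return {"source": "costco", "url": self.url}


# Hypothetical registry mapping a service name to its scraper class.
SCRAPERS = {"costco": DemoCostcoScrapper}


async def scrape(service, url, function="product_information"):
    """Look up the scraper for `service` and await the requested method."""
    scraper = SCRAPERS[service](url)
    method = getattr(scraper, function)
    return await method()


def run_example():
    # Drive the coroutine from synchronous code.
    return asyncio.run(scrape("costco", "https://example.com/item"))
```

Keeping the dispatch in a plain dictionary means a new site only needs a new subclass and one registry entry.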
scrapper
ServiceScrapper
Bases: FlowComponent, SeleniumService, HTTPService