Scrapper
flowtask.components.CompanyScraper.scrapper
CompanyScraper
Bases: FlowComponent, SeleniumService, HTTPService
Company Scraper Component
Overview:
This component scrapes company information from different sources using HTTPService. It can receive URLs from a previous component (like GoogleSearch) and extract specific company information.
.. table:: Properties
   :widths: auto

   +------------------+----------+------------------------------------------------------------------------------+
   | Name             | Required | Description                                                                  |
   +------------------+----------+------------------------------------------------------------------------------+
   | url_column (str) | Yes      | Name of the column containing URLs to scrape (default: 'search_url')         |
   +------------------+----------+------------------------------------------------------------------------------+
   | wait_for (tuple) | No       | Element to wait for before scraping (default: ('class', 'company-overview')) |
   +------------------+----------+------------------------------------------------------------------------------+
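As a rough illustration of how a ``wait_for`` pair such as ``('class', 'company-overview')`` could be turned into a Selenium-style locator, the sketch below maps a short strategy name to the locator strings that Selenium defines for ``By.CLASS_NAME``, ``By.ID``, and so on. The ``STRATEGIES`` table and the ``to_locator`` helper are hypothetical and not part of flowtask; the component's actual mapping may differ.

```python
# Hypothetical mapping from short strategy names to Selenium locator strings.
# The values match selenium.webdriver.common.by.By (CLASS_NAME, ID, ...).
STRATEGIES = {
    "class": "class name",
    "id": "id",
    "css": "css selector",
    "xpath": "xpath",
}


def to_locator(wait_for=("class", "company-overview")):
    """Turn a (strategy, value) pair into a (by, value) locator tuple."""
    strategy, value = wait_for
    try:
        return STRATEGIES[strategy], value
    except KeyError:
        # Fail loudly on names this sketch does not know about.
        raise ValueError(f"unsupported wait_for strategy: {strategy!r}") from None


print(to_locator())
```

With Selenium installed, a tuple of this shape can be passed to ``expected_conditions.presence_of_element_located`` inside a ``WebDriverWait(driver, timeout).until(...)`` call.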
Return:
The component adds new columns to the DataFrame with company information:

- headquarters
- phone_number
- website
- stock_symbol
- naics_code
- employee_count
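The shape of the result can be sketched under a few assumptions: each input row carries a URL in ``url_column``, and the six columns listed above are added to it. Plain dictionaries stand in for DataFrame rows here, and ``fake_scrape`` is a hypothetical placeholder for the real page parsing, so every new column is simply left as ``None``.

```python
# Columns the component is documented to add to the DataFrame.
OUTPUT_COLUMNS = [
    "headquarters",
    "phone_number",
    "website",
    "stock_symbol",
    "naics_code",
    "employee_count",
]


def fake_scrape(url):
    """Hypothetical stand-in: a real scraper would fetch and parse `url`."""
    return {}


def add_company_columns(rows, url_column="search_url"):
    """Return copies of `rows` with the company columns added."""
    enriched = []
    for row in rows:
        scraped = fake_scrape(row[url_column])
        new_row = dict(row)  # leave the input row untouched
        for column in OUTPUT_COLUMNS:
            # Fields the scraper could not find stay as None.
            new_row[column] = scraped.get(column)
        enriched.append(new_row)
    return enriched


rows = [{"company": "Example Corp", "search_url": "https://example.com/company"}]
result = add_company_columns(rows)
print(sorted(result[0]))
```

The original columns are preserved, so the output of a previous component (for example the search results that supplied the URLs) remains available alongside the new fields.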