FlowTask Documentation

Welcome to FlowTask, a workflow automation framework built in Python.

What is FlowTask?

FlowTask is a workflow automation framework that lets you organize and run complex data processing tasks from simple YAML configuration files. It supports multiple data sources, processing components, and output formats.

Key Features

  • YAML-based Configuration: Define complex workflows using simple YAML files
  • Modular Components: Extensive library of pre-built components for data processing
  • Task Organization: Organize tasks by programs and maintain clean directory structures
  • Multiple Storage Backends: Support for filesystem, database, and S3 storage
  • HTTP API: RESTful API for task management and execution
  • Scheduler Integration: Built-in task scheduling capabilities
  • Hook System: Trigger tasks based on webhooks, file changes, or other events

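Quick Example

The task below chains three components: it opens a CSV file with pandas, runs the listed scrapers against the "Company Name" column, and writes the result to a dated Excel file.
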
name: Company Profile
description: Company Profile from LeadIQ and ZoomInfo
steps:
  - OpenWithPandas:
      mime: "text/csv"
      trim: true
      filename: failed_stores.csv
      directory: "/home/ubuntu/symbits/marketing/companies"
  - CompanyScraper:
      use_proxies: true
      paid_proxy: true
      column_name: "Company Name"
      concurrently: false
      scrappers:
        - leadiq
        - rocketreach
        - siccode
        - explorium
  - PandasToFile:
      filename: /home/ubuntu/symbits/marketing/companies/rest-companies-{today}.xlsx
      mime: application/vnd.ms-excel
      masks:
        today:
          - today
          - mask: "%m%d%Y"
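
To make the `masks` entry above more concrete, here is a minimal Python sketch of how a `{today}` placeholder could be expanded. It assumes the mask is a `strftime`-style format string applied to the current date; `render_filename` is a hypothetical helper written for illustration, not part of the FlowTask API.

```python
from datetime import date


def render_filename(template: str, masks: dict[str, str], today: date) -> str:
    """Fill each {name} placeholder with `today` formatted by its mask.

    Hypothetical helper: assumes every mask is a strftime format string.
    """
    values = {name: today.strftime(fmt) for name, fmt in masks.items()}
    return template.format(**values)


# A fixed date keeps the example reproducible.
name = render_filename(
    "rest-companies-{today}.xlsx",
    {"today": "%m%d%Y"},
    date(2024, 3, 5),
)
print(name)  # rest-companies-03052024.xlsx
```

Under that assumption, a task run on March 5, 2024 would write to a file ending in `rest-companies-03052024.xlsx`.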

Getting Started

  1. Installation - Install FlowTask and its dependencies
  2. Quick Start - Create your first task
  3. Configuration - Learn about configuration options

Architecture

FlowTask follows a modular architecture where:

  • Tasks are defined in YAML files organized by programs
  • Components are reusable processing units that can be chained together
  • Interfaces provide common functionality and contracts
  • Storage backends handle task persistence and retrieval
  • Schedulers manage task execution timing
  • Hooks enable event-driven task execution
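
The idea of chaining reusable components can be sketched in a few lines of plain Python. This is only an illustration of the pattern, under the assumption that each step receives the previous step's output; `run_steps` and the three step functions are hypothetical names, not FlowTask classes.

```python
from typing import Any, Callable

# Hypothetical stand-in for a component: takes the previous output, returns the next.
Component = Callable[[Any], Any]


def run_steps(steps: list[Component], data: Any = None) -> Any:
    """Run each step in order, feeding its output into the next step."""
    for step in steps:
        data = step(data)
    return data


def load_rows(_: Any) -> list[dict[str, str]]:
    # Stands in for a reader step; returns hard-coded sample rows.
    return [{"Company Name": " Acme "}, {"Company Name": "Globex"}]


def trim_names(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    # Strips surrounding whitespace from every value.
    return [{key: value.strip() for key, value in row.items()} for row in rows]


def names_only(rows: list[dict[str, str]]) -> list[str]:
    # Keeps a single column, as a writer step might before saving.
    return [row["Company Name"] for row in rows]


result = run_steps([load_rows, trim_names, names_only])
print(result)  # ['Acme', 'Globex']
```

Because each step only depends on the output of the one before it, steps can be reordered, replaced, or reused across tasks, which is the property the YAML `steps` list relies on.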

Get started by exploring the Components to see what FlowTask can do for your workflows!