Marketclustering
flowtask.components.MarketClustering
MarketClustering
Bases: FlowComponent
Clusters stores offline using BallTree + DBSCAN (distances in miles or km), generates a fixed number of ghost employees for each cluster, refines the result when a store-to-ghost distance exceeds a threshold, and optionally checks daily route constraints.
Steps
1) Clustering with DBSCAN (haversine + approximate).
2) Create ghost employees at cluster centroid (random offset).
3) Remove 'unreachable' stores if no ghost employee can reach them within a threshold (e.g. 25 miles).
4) Check whether a single ghost can cover up to max_stores_per_day in a route that stays under day_hours or max_distance_by_day. If not, mark that store as 'rejected' too.
5) Return two DataFrames: final assignment + rejected stores.
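Step 3 above can be illustrated with a minimal sketch using only the standard library. This is not the component's code: the helper names `haversine` and `split_reachable`, the tuple layout of the stores, and the choice of miles as the unit are all assumptions made for the example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed unit; use 6371.0 for kilometres


def haversine(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_MILES):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def split_reachable(stores, ghosts, threshold=25.0):
    """Split store names into (reachable, rejected) by distance to the nearest ghost.

    `stores` is a list of (name, lat, lon); `ghosts` is a non-empty list of (lat, lon).
    """
    reachable, rejected = [], []
    for name, lat, lon in stores:
        nearest = min(haversine(lat, lon, g_lat, g_lon) for g_lat, g_lon in ghosts)
        (reachable if nearest <= threshold else rejected).append(name)
    return reachable, rejected


# Hypothetical data: two stores a few miles from the ghost, one far away.
stores = [("A", 40.0, -75.0), ("B", 40.1, -75.0), ("C", 42.0, -75.0)]
ghosts = [(40.05, -75.0)]
reachable, rejected = split_reachable(stores, ghosts)
```

With the 25-mile threshold mentioned in step 3, stores A and B stay assigned and store C ends up in the rejected list.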
Parameters:
| Name | Default | Purpose | Usage | Effect | Location |
|---|---|---|---|---|---|
| cluster_radius | 150.0 | Controls the search radius for the BallTree clustering algorithm | Converted to radians and used in tree.query_radius() to find nearby stores during cluster formation | Determines how far apart stores can be and still be considered for the same cluster during the initial clustering phase | Used in _create_cluster() method |
| max_cluster_distance | 50.0 | Controls outlier detection within already-formed clusters | Used in _detect_outliers() to check if stores are too far from their cluster's centroid | Stores farther than this distance from their cluster center get marked as outliers | Used in validation after clusters are formed |
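The two parameters act at different stages, which a small standard-library sketch can make concrete. It shows how a surface distance such as cluster_radius becomes an angular distance in radians, and how a centroid-based outlier check like the one described for max_cluster_distance might look. The helper names, the arithmetic-mean centroid, and the use of miles are assumptions for illustration, not the behaviour of _create_cluster() or _detect_outliers().

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed unit; use 6371.0 for kilometres


def to_radians(distance, earth_radius=EARTH_RADIUS_MILES):
    """Convert a surface distance into the angular distance a haversine metric expects."""
    return distance / earth_radius


def haversine(p, q, earth_radius=EARTH_RADIUS_MILES):
    """Great-circle distance between two (lat, lon) pairs given in degrees."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = phi2 - phi1
    dlmb = math.radians(q[1] - p[1])
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius * math.asin(math.sqrt(a))


def detect_outliers(points, max_cluster_distance=50.0):
    """Return indices of points farther than max_cluster_distance from the centroid.

    The centroid is a plain mean of latitudes and longitudes, which is a
    simplification that is reasonable only for compact clusters away from
    the poles and the antimeridian.
    """
    centroid = (
        sum(lat for lat, _ in points) / len(points),
        sum(lon for _, lon in points) / len(points),
    )
    return [i for i, p in enumerate(points)
            if haversine(p, centroid) > max_cluster_distance]


radius_rad = to_radians(150.0)  # angular search radius for a 150-mile cluster_radius
cluster = [(40.0, -75.0), (40.1, -75.0), (40.2, -75.0), (42.0, -75.0)]
outliers = detect_outliers(cluster, max_cluster_distance=50.0)
```

In this hypothetical cluster, only the last point lies more than 50 miles from the centroid, so it is the only index returned.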
get_rejected_stores
Return the DataFrame of rejected stores (those removed from any final market).
load_graph_from_pbf
Load a road network graph from a PBF file for the specified bounding box.
Args:
pbf_path (str): Path to the PBF file.
north, south, east, west (float): Bounding box coordinates.
Returns:
nx.MultiDiGraph: A road network graph for the bounding box.
run
async
1) Cluster with BallTree + K-Means validation.
2) Road-based validation: assign stores to ghost employees via VRP.
3) Remove any stores that cannot be assigned within constraints.
4) Re-assign rejected stores if possible.
5) Add cluster centroids to result DataFrame.
6) Return final assignment + rejected stores.
create_data_model
create_data_model(distance_matrix, num_vehicles, depot=0, max_distance=150, max_stores_per_vehicle=3)
Stores the data for the VRP problem.
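The body of create_data_model is not shown here, so the following is only a sketch of what a function with this signature might do. It follows the common pattern of bundling routing inputs into a single dictionary. The key names and the input validation are assumptions, not the component's actual behaviour.

```python
def create_data_model(distance_matrix, num_vehicles, depot=0,
                      max_distance=150, max_stores_per_vehicle=3):
    """Bundle the VRP inputs into one dictionary (key names are assumed)."""
    n = len(distance_matrix)
    # Validation added for the sketch: the matrix must be square and
    # the depot must be a valid row/column index.
    if any(len(row) != n for row in distance_matrix):
        raise ValueError("distance_matrix must be square")
    if not 0 <= depot < n:
        raise ValueError("depot index out of range")
    return {
        "distance_matrix": distance_matrix,
        "num_vehicles": num_vehicles,
        "depot": depot,
        "max_distance": max_distance,
        "max_stores_per_vehicle": max_stores_per_vehicle,
    }


# Hypothetical 3-node problem: node 0 is the depot, nodes 1 and 2 are stores.
matrix = [
    [0, 10, 20],
    [10, 0, 15],
    [20, 15, 0],
]
data = create_data_model(matrix, num_vehicles=2)
```

Keeping every solver input in one dictionary makes it easy to pass the same problem definition to distance callbacks and capacity constraints.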