ShedBoxAI for ETL Automation
Automate ETL workflows with ShedBoxAI. Extract from sources such as REST APIs and CSV files, transform with built-in processing operations, and load the results into an output file.
ETL Use Cases
Basic ETL Pipeline
Extract, transform, and load data:
# Extract
data_sources:
  source_data:
    type: rest
    url: "https://api.example.com/data"
    headers:
      Authorization: "Bearer ${API_KEY}"
    response_path: "data"

# Transform
processing:
  contextual_filtering:
    source_data:
      - field: "status"
        condition: "active"
        new_name: "active_records"
  format_conversion:
    active_records:
      date_fields: ["created_at", "updated_at"]
      date_format: "%Y-%m-%d"

# Load
output:
  type: file
  path: "transformed_data.json"
  format: json
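To make the transform step concrete, here is a plain-Python sketch of what the filter and date conversion in this pipeline amount to. It is an illustration, not ShedBoxAI's implementation: the sample records and helper names are invented, `condition: "active"` is assumed to mean an equality match on `status`, and incoming timestamps are assumed to be ISO 8601 strings.

```python
from datetime import datetime

# Hypothetical records standing in for the API's "data" array.
records = [
    {"id": 1, "status": "active",
     "created_at": "2024-03-01T09:30:00", "updated_at": "2024-03-05T14:00:00"},
    {"id": 2, "status": "inactive",
     "created_at": "2024-03-02T10:00:00", "updated_at": "2024-03-02T10:00:00"},
    {"id": 3, "status": "active",
     "created_at": "2024-03-03T08:15:00", "updated_at": "2024-03-04T11:45:00"},
]


def filter_by_field(rows, field, value):
    """Keep only rows whose field equals value (assumed filter semantics)."""
    return [row for row in rows if row.get(field) == value]


def convert_dates(rows, date_fields, date_format):
    """Return copies of rows with the named ISO 8601 fields reformatted."""
    converted = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for name in date_fields:
            if new_row.get(name):
                parsed = datetime.fromisoformat(new_row[name])
                new_row[name] = parsed.strftime(date_format)
        converted.append(new_row)
    return converted


active_records = filter_by_field(records, "status", "active")
transformed = convert_dates(
    active_records, ["created_at", "updated_at"], "%Y-%m-%d"
)
```

Records 1 and 3 pass the filter, and their timestamps are reduced to the date portion.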
Scheduled ETL
Run pipelines on a schedule (with cron):
# Run daily at 6am
0 6 * * * shedboxai run daily_etl.yaml
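Cron runs jobs with a minimal environment, so a variable such as `API_KEY` that a config references may be missing at run time. One way to catch that early is a small wrapper that cron calls instead of the CLI directly. This is only a sketch: the wrapper, its function names, and the list of required variables are hypothetical. Only the `shedboxai run <config>` command comes from this page.

```python
import os
import subprocess


def build_command(config_path):
    """Return the CLI invocation shown on this page for a config file."""
    return ["shedboxai", "run", config_path]


def run_pipeline(config_path, required_vars=("API_KEY",), env=None,
                 runner=subprocess.run):
    """Check required environment variables, then run the pipeline.

    required_vars is an assumption: list whatever your config references.
    runner is injectable so the wrapper can be tested without the CLI.
    """
    env = dict(os.environ) if env is None else dict(env)
    missing = [name for name in required_vars if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    completed = runner(build_command(config_path), env=env)
    return completed.returncode
```

A script that ends with `sys.exit(run_pipeline("daily_etl.yaml"))` could then be scheduled in place of the direct command, so a missing variable shows up as a failed job rather than a failed API call.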
Multi-Source ETL
Combine data from multiple sources:
data_sources:
  source_a:
    type: csv
    path: "data_a.csv"
  source_b:
    type: rest
    url: "https://api.example.com/b"
    headers:
      Authorization: "Bearer ${API_KEY}"
    response_path: "data"
  source_c:
    type: csv
    path: "data_c.csv"

processing:
  relationship_highlighting:
    source_a:
      link_fields:
        - source: "source_a"
          source_field: "id"
          to: "source_b"
          target_field: "id"

output:
  type: file
  path: "combined_data.json"
  format: json
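The `link_fields` entry pairs `source_a.id` with `source_b.id`. As a rough mental model, linking two sources on a shared key can be sketched in plain Python as below. The exact output shape ShedBoxAI produces is not described on this page, so the `linked` key, the helper name, and the sample rows are all assumptions made for illustration.

```python
def link_records(left_rows, right_rows, source_field, target_field):
    """Attach to each left row the right rows whose target_field matches."""
    # Index the right-hand rows by key so each lookup is a dict access.
    index = {}
    for row in right_rows:
        index.setdefault(row.get(target_field), []).append(row)

    linked = []
    for row in left_rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row["linked"] = index.get(row.get(source_field), [])
        linked.append(new_row)
    return linked


# Hypothetical rows standing in for data_a.csv and the API response.
source_a = [{"id": "1", "name": "Widget"}, {"id": "2", "name": "Gadget"}]
source_b = [{"id": "1", "stock": 40}, {"id": "3", "stock": 7}]

combined = link_records(source_a, source_b, "id", "id")
```

The sketch compares IDs exactly as given, so `"1"` and `1` would not match. That is worth checking when one source is a CSV file and the other is a JSON API.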
Data Aggregation ETL
Group records and compute summary values:
data_sources:
  sales:
    type: csv
    path: "daily_sales.csv"

processing:
  advanced_operations:
    daily_summary:
      source: "sales"
      group_by: "date"
      aggregate:
        total_revenue: "SUM(amount)"
        order_count: "COUNT(*)"
        avg_order: "AVG(amount)"
      sort: "-date"

output:
  type: file
  path: "daily_summary.json"
  format: json
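For readers who want to check the numbers, the aggregation in this config corresponds roughly to the following plain-Python sketch. It assumes that `sort: "-date"` means descending order by date and that dates are ISO-formatted strings. The sample rows and the helper name are invented for illustration and do not reflect how ShedBoxAI computes the result internally.

```python
from collections import defaultdict

# Hypothetical rows standing in for daily_sales.csv.
sales = [
    {"date": "2024-03-01", "amount": 120.0},
    {"date": "2024-03-01", "amount": 80.0},
    {"date": "2024-03-02", "amount": 50.0},
]


def summarize_by(rows, key, value_field):
    """Group rows by key and compute sum, count, and mean of value_field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row[value_field]))

    summary = [
        {
            key: group_key,
            "total_revenue": sum(values),               # SUM(amount)
            "order_count": len(values),                 # COUNT(*)
            "avg_order": sum(values) / len(values),     # AVG(amount)
        }
        for group_key, values in groups.items()
    ]
    # ISO dates sort correctly as strings; newest first (assumed "-date").
    summary.sort(key=lambda item: item[key], reverse=True)
    return summary


daily_summary = summarize_by(sales, "date", "amount")
```

With the sample rows, 2024-03-02 comes first with one order, followed by 2024-03-01 with two orders totalling 200.0.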
Built-in Processing Operations
ShedBoxAI includes operations for:
- Filtering: contextual_filtering with conditions
- Transformation: format_conversion for data types
- Aggregation: advanced_operations with group_by and aggregate
- Joining: relationship_highlighting to link data sources
- Summarization: content_summarization for statistics
Get Started
pip install shedboxai
shedboxai run etl_pipeline.yaml