Operations Overview
ShedBoxAI provides data processing operations that can be combined into pipelines. Each operation handles one aspect of data transformation or analysis.
Core Operations
Contextual Filtering
Filter and select data based on field conditions and expressions.
Use Cases:
- Remove invalid or incomplete records
- Select specific data subsets (active customers, recent orders)
- Apply business rules and validation
- Data quality enforcement
Example:
processing:
  contextual_filtering:
    active_customers:
      customers:
        - field: status
          condition: "== 'active'"
          new_name: "active_customers"
        - field: lifetime_value
          condition: "> 1000"
          new_name: "valuable_customers"
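To make the semantics concrete, here is a minimal Python sketch of what a field-and-condition filter amounts to. It is not ShedBoxAI's implementation: the `apply_filter` helper, the `OPS` operator table, and the sample records are hypothetical, and each filter is shown producing its own named result independently.

```python
import operator

# Hypothetical operator table; ShedBoxAI's actual condition parser may differ.
OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt, "<": operator.lt}


def apply_filter(records, field, op, value):
    """Keep records whose `field` satisfies `op value`; skip records missing the field."""
    compare = OPS[op]
    return [r for r in records if field in r and compare(r[field], value)]


# Illustrative sample data, not taken from any real source.
customers = [
    {"name": "Ada", "status": "active", "lifetime_value": 2500},
    {"name": "Bob", "status": "inactive", "lifetime_value": 4000},
    {"name": "Cy", "status": "active", "lifetime_value": 300},
]

active_customers = apply_filter(customers, "status", "==", "active")
valuable_customers = apply_filter(customers, "lifetime_value", ">", 1000)
```

Each call mirrors one `field` / `condition` / `new_name` entry in the configuration above.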
Format Conversion
Transform data structure and extract fields or apply templates.
Use Cases:
- Data field extraction and selection
- Template-based data formatting
- Data structure simplification
- Variable substitution
Example:
processing:
  format_conversion:
    customer_essentials:
      extract_fields:
        - "customer_id"
        - "name"
        - "email"
        - "lifetime_value"
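As a rough illustration of field extraction, the sketch below projects each record down to a chosen set of keys. The `extract_fields` function and the sample record (including the extra `signup_ip` field) are hypothetical; how ShedBoxAI treats fields that are absent from a record is not stated here, so this sketch simply omits them.

```python
def extract_fields(records, fields):
    """Return copies of each record containing only the requested fields that exist."""
    return [{f: r[f] for f in fields if f in r} for r in records]


# Illustrative record with one field we do not want to keep.
customers = [
    {
        "customer_id": 1,
        "name": "Ada",
        "email": "ada@example.com",
        "lifetime_value": 2500,
        "signup_ip": "203.0.113.7",
    }
]

customer_essentials = extract_fields(
    customers, ["customer_id", "name", "email", "lifetime_value"]
)
```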
Content Summarization
Generate statistical summaries from numerical data.
Use Cases:
- Statistical analysis and reporting
- Data aggregation and insights
- Performance metrics calculation
- Numerical data analysis
Example:
processing:
  content_summarization:
    sales_stats:
      method: statistical
      fields: [revenue, order_count, avg_order_value]
      summarize: [count, sum, mean, min, max, std]
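The statistics named in `summarize` can be reproduced with Python's standard library, which may help when checking results by hand. This is a sketch under assumptions: the `summarize` helper is hypothetical, and whether ShedBoxAI reports a sample or population standard deviation is not stated above (the sketch uses the sample form).

```python
import statistics


def summarize(records, field):
    """Compute basic statistics over the numeric values of one field."""
    values = [r[field] for r in records if isinstance(r.get(field), (int, float))]
    return {
        "count": len(values),
        "sum": sum(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        # Sample standard deviation (assumption); needs at least two values.
        "std": statistics.stdev(values),
    }


# Illustrative sample data.
sales = [{"revenue": 100}, {"revenue": 200}, {"revenue": 300}]
stats = summarize(sales, "revenue")
```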
Relationship Highlighting
Connect data sources and detect patterns through field linking and analysis.
Use Cases:
- Data source linking and relationships
- Pattern detection in datasets
- Conditional data highlighting
- Context enrichment
Example:
processing:
  relationship_highlighting:
    customer_analysis:
      link_fields:
        - source: customers
          to: orders
          source_field: customer_id
          target_field: customer_id
      conditional_highlighting:
        - source: customers
          condition: "item.lifetime_value > 10000"
          insight_name: "VIP Customer"
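Conceptually, linking two sources on a shared field resembles a keyed lookup, and conditional highlighting resembles tagging the records that match a predicate. The sketch below illustrates both ideas in plain Python; the variable names and sample data are hypothetical, and the shape of ShedBoxAI's actual output is not specified here.

```python
from collections import defaultdict

# Illustrative sample data.
customers = [
    {"customer_id": 1, "name": "Ada", "lifetime_value": 12000},
    {"customer_id": 2, "name": "Bob", "lifetime_value": 800},
]
orders = [
    {"order_id": "A1", "customer_id": 1},
    {"order_id": "A2", "customer_id": 1},
    {"order_id": "B1", "customer_id": 2},
]

# Index the target records by the linking field (target_field).
orders_by_customer = defaultdict(list)
for order in orders:
    orders_by_customer[order["customer_id"]].append(order)

# Attach matching orders to each customer via source_field.
linked = [
    {**c, "orders": orders_by_customer.get(c["customer_id"], [])}
    for c in customers
]

# Highlight customers that satisfy the condition from the example above.
vip_names = [c["name"] for c in customers if c["lifetime_value"] > 10000]
```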
Template Matching
Generate formatted output using Jinja2 templates.
Use Cases:
- Report generation and formatting
- Document creation from data
- Email and message templating
- Custom output formatting
Example:
processing:
  template_matching:
    customer_report:
      template: |
        # Customer Report: {{customer.name}}
        - Email: {{customer.email}}
        - Status: {{customer.status}}
        - Lifetime Value: ${{customer.lifetime_value}}
        {% if customer.lifetime_value > 5000 %}
        **VIP Customer**
        {% endif %}
Advanced Operations
Perform grouping, aggregation, sorting, and limiting operations.
Use Cases:
- Data grouping and aggregation
- Top N analysis and ranking
- Statistical grouping by categories
- Data sorting and limiting
Example:
processing:
  advanced_operations:
    top_categories:
      source: products
      group_by: "category"
      aggregate:
        total_sales: "sum"
        product_count: "count"
      sort: "total_sales desc"
      limit: 10
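The group, aggregate, sort, and limit steps in this example correspond to a familiar pattern, sketched here in plain Python. The configuration above does not say which field `sum` is applied to, so the `sales` field is an assumption, as are the `top_categories` function and the sample data.

```python
from collections import defaultdict

# Illustrative sample data; the "sales" field name is an assumption.
products = [
    {"category": "books", "sales": 120},
    {"category": "games", "sales": 300},
    {"category": "books", "sales": 80},
    {"category": "toys", "sales": 50},
]


def top_categories(records, limit):
    """Group by category, aggregate, sort by total_sales descending, keep top N."""
    groups = defaultdict(list)
    for r in records:
        groups[r["category"]].append(r["sales"])
    rows = [
        {"category": k, "total_sales": sum(v), "product_count": len(v)}
        for k, v in groups.items()
    ]
    rows.sort(key=lambda row: row["total_sales"], reverse=True)
    return rows[:limit]


top = top_categories(products, limit=2)
```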
Operation Composition
Operations can be chained so that each step works on the output of an earlier one:
processing:
  # Step 1: Clean and filter the data
  contextual_filtering:
    clean_data:
      raw_customers:
        - field: email
          condition: "!= ''"
          new_name: "valid_customers"
        - field: status
          condition: "== 'active'"
          new_name: "active_customers"
  # Step 2: Extract essential fields
  format_conversion:
    customer_essentials:
      extract_fields:
        - "customer_id"
        - "name"
        - "email"
        - "lifetime_value"
        - "order_count"
  # Step 3: Add relationship context
  relationship_highlighting:
    enriched_customers:
      conditional_highlighting:
        - source: customer_essentials
          condition: "item.lifetime_value > 5000"
          insight_name: "VIP"
          context: "High-value customer"
      derived_fields:
        - "avg_order_value = item.lifetime_value / item.order_count"
  # Step 4: Generate statistical summary
  content_summarization:
    customer_stats:
      method: statistical
      fields: [lifetime_value, order_count, avg_order_value]
      summarize: [count, mean, median, min, max, std]
  # Step 5: Create formatted report
  template_matching:
    final_report:
      template: |
        # Customer Analysis Report
        ## Summary Statistics
        - Total Customers: {{customer_stats.lifetime_value_count}}
        - Average Lifetime Value: ${{customer_stats.lifetime_value_mean|round(2)}}
        - Average Orders: {{customer_stats.order_count_mean|round(1)}}
        ## VIP Customers
        {% for customer in enriched_customers_highlights %}
        - **{{customer.name}}** ({{customer.email}})
          - Value: ${{customer.lifetime_value}}
          - Orders: {{customer.order_count}}
        {% endfor %}
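For readers who find it easier to reason about the pipeline as ordinary code, the five steps can be approximated in plain Python. This is only a sketch: the sample data is invented, both filter conditions are assumed to apply together, and the report is built with f-strings rather than a template engine.

```python
import statistics

# Illustrative sample data.
raw_customers = [
    {"name": "Ada", "email": "ada@example.com", "status": "active",
     "lifetime_value": 6000, "order_count": 12},
    {"name": "Bob", "email": "", "status": "active",
     "lifetime_value": 900, "order_count": 3},
    {"name": "Cy", "email": "cy@example.com", "status": "active",
     "lifetime_value": 1000, "order_count": 4},
    {"name": "Di", "email": "di@example.com", "status": "inactive",
     "lifetime_value": 7000, "order_count": 7},
]

# Step 1: filter (assumes both conditions must hold).
cleaned = [c for c in raw_customers if c["email"] != "" and c["status"] == "active"]

# Step 2: keep only the fields later steps need.
fields = ["name", "email", "lifetime_value", "order_count"]
essentials = [{f: c[f] for f in fields} for c in cleaned]

# Step 3: derive a field and highlight VIPs.
# The sample data has no zero order_count; real data would need a guard.
for c in essentials:
    c["avg_order_value"] = c["lifetime_value"] / c["order_count"]
vips = [c for c in essentials if c["lifetime_value"] > 5000]

# Step 4: summary statistics.
values = [c["lifetime_value"] for c in essentials]
customer_stats = {"count": len(values), "mean": statistics.mean(values)}

# Step 5: formatted report.
report_lines = [
    "# Customer Analysis Report",
    f"- Total Customers: {customer_stats['count']}",
    f"- Average Lifetime Value: ${customer_stats['mean']:.2f}",
    "## VIP Customers",
]
report_lines += [f"- {c['name']} ({c['email']})" for c in vips]
report = "\n".join(report_lines)
```

Because the filter runs first, every later step touches two records instead of four, which is the point of the "filter early" advice.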
Best Practices
Operation Ordering
- Filter Early: Apply contextual filtering to reduce dataset size
- Extract Needed Fields: Use format conversion to simplify data structure
- Add Context: Use relationship highlighting for enrichment and analysis
- Aggregate: Apply advanced operations for grouping and statistics
- Summarize: Generate statistical summaries for insights
- Format Output: Use template matching for final presentation
Performance Optimization
- Filter first to reduce data size for subsequent operations
- Extract only needed fields to minimize memory usage
- Use appropriate data types for sorting and aggregation
- Apply operations in logical sequence for best performance
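The point about data types is easy to overlook: numeric values stored as strings sort lexicographically, which can silently change a "top N" result. The short Python sketch below shows the difference on invented data.

```python
# Illustrative rows where a numeric field arrived as text (e.g. from a CSV).
rows = [
    {"sku": "A", "total_sales": "900"},
    {"sku": "B", "total_sales": "1000"},
    {"sku": "C", "total_sales": "85"},
]

# Sorting the raw strings compares character by character.
as_text = sorted(rows, key=lambda r: r["total_sales"], reverse=True)

# Converting to a number first gives the intended ordering.
as_number = sorted(rows, key=lambda r: float(r["total_sales"]), reverse=True)

text_order = [r["sku"] for r in as_text]
number_order = [r["sku"] for r in as_number]
```

With the string sort, the largest value ("1000") ends up last, so converting numeric fields before sorting or aggregating is worth checking.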
Error Handling
- Validate data sources before processing
- Test configurations with sample data
- Monitor processing logs for warnings and errors
- Handle missing fields gracefully in templates and conditions
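As an illustration of handling missing fields gracefully, the following Python sketch uses defaults instead of letting a missing key raise an error. The `is_vip` and `render_line` helpers are hypothetical and only demonstrate the idea; they are not part of ShedBoxAI.

```python
def is_vip(record, threshold=5000):
    """Treat a missing or non-numeric lifetime_value as 'not VIP' instead of raising."""
    value = record.get("lifetime_value")
    return isinstance(value, (int, float)) and value > threshold


def render_line(record):
    """Format one report line, substituting placeholders for missing fields."""
    name = record.get("name", "unknown")
    email = record.get("email") or "no email"
    return f"- {name} ({email})"


complete = {"name": "Ada", "email": "ada@example.com", "lifetime_value": 6000}
partial = {"name": "Bob"}

lines = [render_line(complete), render_line(partial)]
flags = [is_vip(complete), is_vip(partial)]
```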
Configuration Patterns
Basic Linear Pipeline
processing:
  contextual_filtering:
    # Filter operations
  format_conversion:
    # Field extraction or templating
  content_summarization:
    # Statistical analysis
Analysis and Reporting Pipeline
processing:
  contextual_filtering:
    # Data cleaning and filtering
  relationship_highlighting:
    # Pattern detection and highlighting
  advanced_operations:
    # Grouping and aggregation
  template_matching:
    # Report generation
Data Processing and Statistics
processing:
  format_conversion:
    # Field extraction
  advanced_operations:
    # Grouping and top N analysis
  content_summarization:
    # Statistical summaries
Getting Started
- Start Simple: Begin with a single operation on sample data
- Use Basic Filtering: Apply contextual filtering to understand your data
- Extract Key Fields: Use format conversion to focus on important data
- Add Analysis: Incorporate relationship highlighting or advanced operations
- Generate Reports: Use template matching for formatted output
Next Steps
- Configuration Guide - Setting up data sources
- Expression Engine - Available functions and expressions
- Examples - Real-world pipeline examples
- Troubleshooting - Common issues and solutions