Debugging Guide

Comprehensive guide for troubleshooting ShedBoxAI issues, from configuration errors to performance problems.

Quick Diagnostic Checklist

Before You Start

Verify Installation: shedboxai --version
Check Configuration: Try running with --verbose to see detailed errors
Enable Verbose Logging: Add -v or --verbose to any command
Test with Sample Data: Use small datasets first

Common Error Categories

Configuration Errors

YAML Syntax Issues

Error: yaml.scanner.ScannerError: mapping values are not allowed here

Cause: Invalid YAML syntax, often indentation or character issues.

Solution:

# Test configuration by running with verbose output
shedboxai run config.yaml --verbose

# Common issues:
# 1. Mixed tabs and spaces
# 2. Missing colons after keys  
# 3. Incorrect indentation

Example Fix:

# ❌ Incorrect
data_sources:
customers:  # Missing space after colon
  type csv  # Missing colon

# ✅ Correct
data_sources:
  customers:
    type: csv

Missing Required Fields

Error: ValidationError: field required

Cause: Required configuration fields are missing.

Debug Steps:

# Check configuration structure
shedboxai validate config.yaml --verbose

# Review required fields in documentation
shedboxai help config

Common Missing Fields:

# Always required
data_sources:
  source_name:
    type: csv          # Required
    path: data.csv     # Required

processing:
  operation_name:
    source: source_name  # Required - references data_sources

output:
  type: file           # Required
  path: output.json    # Required

Invalid Field References

Error: KeyError: 'field_name' not found in source data

Cause: Configuration references non-existent fields.

Debug Strategy:

# Inspect data schema first
shedboxai introspect data.csv --include schema

# Check field names in configuration
grep -n "field:" config.yaml

Solution:

# Use introspection results to verify field names
processing:
  contextual_filtering:
    filter_name:
      source: customers
      conditions:
        - field: customer_id  # Must exist in customers data
          condition: "customer_id > 0"

Data Source Errors

File Not Found

Error: FileNotFoundError: [Errno 2] No such file or directory: 'data.csv'

Debug Steps:

# Check current directory
pwd
ls -la

# Verify file path
shedboxai introspect data.csv --verbose

Solutions:

# Use absolute paths
data_sources:
  customers:
    path: /full/path/to/data.csv

# Or relative to config file location
data_sources:
  customers:
    path: ./data/customers.csv

CSV Parsing Issues

Error: pandas.errors.ParserError: Expected 5 fields in line 10, saw 7

Diagnosis:

# Inspect problematic lines
head -n 15 data.csv | tail -n 5

# Check for delimiter issues
shedboxai introspect data.csv --csv-delimiter ';' --verbose

Solutions:

data_sources:
  problematic_csv:
    type: csv
    path: data.csv
    options:
      delimiter: ";"        # Specify correct delimiter
      quotechar: '"'        # Handle quoted fields
      encoding: utf-8       # Set correct encoding
      error_bad_lines: false # Skip problematic lines

JSON Structure Problems

Error: JSONDecodeError: Expecting ',' delimiter: line 5 column 10

Debug Approach:

# Validate JSON structure
python -m json.tool data.json

# Check for common issues
cat data.json | grep -n "'"  # Single quotes (invalid JSON)

Common Fixes:

data_sources:
  api_data:
    type: json
    path: data.json
    options:
      array_path: "$.data"     # Extract nested arrays
      encoding: utf-8          # Handle encoding issues
      strict: false            # Allow lenient parsing

Processing Errors

Expression Evaluation Failures

Error: ExpressionError: Invalid expression: 'age > twenty'

Cause: Invalid expression syntax in conditions or calculated fields.

Debug Process:

# Test expressions individually
shedboxai run config.yaml --debug-expressions

# Check field types
shedboxai introspect data.csv --include schema

Valid Expression Examples:

processing:
  contextual_filtering:
    adults:
      conditions:
        # ✅ Correct numeric comparison
        - field: age
          condition: "age > 20"
        
        # ✅ String comparison
        - field: status
          condition: "status == 'active'"
        
        # ✅ Date comparison
        - field: created_date
          condition: "created_date > '2024-01-01'"
        
        # ❌ Common errors
        # condition: "age > twenty"        # String where number expected
        # condition: "age == '20'"         # String comparison with number
        # condition: "Age > 20"            # Wrong field name case

Memory Issues

Error: MemoryError: Unable to allocate array

Symptoms:

Process killed by OS
Extremely slow processing
System becomes unresponsive

Solutions:

processing:
  performance_optimization:
    # Process in chunks
    batch_size: 1000
    
    # Use streaming for large files
    streaming: true
    
    # Limit memory usage
    max_memory: "2GB"
    
    # Sample large datasets
    sample_size: 10000

Command Line Options:

# Limit processing to smaller chunks
shedboxai run config.yaml --batch-size 500

# Use streaming mode
shedboxai run config.yaml --streaming

# Process sample only
shedboxai run config.yaml --sample 5000

AI Interface Errors

API Authentication Issues

Error: 401 Unauthorized: Invalid API key

Debug Checklist:

# Check environment variables
echo $OPENAI_API_KEY
echo $ANTHROPIC_API_KEY

# Test API directly
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models

Configuration Fix:

ai_interface:
  model:
    type: rest
    url: https://api.openai.com/v1/chat/completions
    headers:
      Authorization: "Bearer ${OPENAI_API_KEY}"
      Content-Type: application/json
    options:
      model: gpt-4

Rate Limiting

Error: 429 Too Many Requests: Rate limit exceeded

Solutions:

ai_interface:
  rate_limiting:
    requests_per_minute: 10    # Slow down requests
    retry_attempts: 3          # Retry failed requests
    backoff_factor: 2          # Exponential backoff
    
  batch_processing:
    enabled: true              # Batch multiple prompts
    batch_size: 5             # Requests per batch

Prompt Too Long

Error: 400 Bad Request: Token limit exceeded

Debug and Fix:

# Check prompt length
shedboxai run config.yaml --debug-prompts --dry-run

# Count tokens
echo "{{your_prompt}}" | wc -w  # Rough estimate

Solutions:

ai_interface:
  prompts:
    analysis:
      # Truncate data if too long
      max_input_tokens: 3000
      
      # Use summary instead of full data
      user_template: |
        Analyze this summary: {{data | summarize(500)}}
        
      # Split processing into chunks
      chunk_processing: true
      chunk_size: 1000

Performance Issues

Slow Processing

Symptoms:

Processing takes hours instead of minutes
High CPU/memory usage
No progress indicators

Diagnosis:

# Enable performance profiling
shedboxai run config.yaml --profile --verbose

# Check bottlenecks
shedboxai run config.yaml --debug --timing

Optimization Strategies:

processing:
  performance:
    # Use parallel processing
    parallel: true
    max_workers: 4
    
    # Optimize operations order
    operation_order:
      - contextual_filtering  # Filter early to reduce data
      - format_conversion    # Transform smaller dataset
      - ai_interface        # Process final results
    
    # Cache intermediate results
    caching:
      enabled: true
      strategy: lru
      max_size: 1000

High Memory Usage

Debug Memory Issues:

# Monitor memory usage
shedboxai run config.yaml --monitor-memory

# Profile memory allocation
shedboxai run config.yaml --memory-profile

Memory Optimization:

processing:
  memory_optimization:
    # Process in streaming mode
    streaming: true
    
    # Clear intermediate results
    cleanup_intermediate: true
    
    # Use disk for large operations
    disk_spill: true
    spill_directory: "/tmp/shedboxai"
    
    # Limit concurrent operations
    max_concurrent_operations: 2

Network and API Issues

Connection Timeouts

Error: requests.exceptions.ConnectTimeout: Connection timed out

Solutions:

data_sources:
  api_source:
    type: rest_api
    url: https://slow-api.com/data
    options:
      timeout: 60              # Increase timeout
      retry_attempts: 3        # Retry on failure
      retry_delay: 5           # Wait between retries

SSL Certificate Issues

Error: SSL: CERTIFICATE_VERIFY_FAILED

Temporary Fix (not recommended for production):

data_sources:
  api_source:
    options:
      verify_ssl: false  # Only for testing

Proper Solution:

# Update certificates
pip install --upgrade certifi

# Use custom certificate bundle
export SSL_CERT_FILE=/path/to/certificate.pem

Logging and Debugging

Enable Detailed Logging

# Verbose output
shedboxai run config.yaml --verbose

# Debug mode with full stack traces
shedboxai run config.yaml --debug

# Save logs to file
shedboxai run config.yaml --verbose 2>&1 | tee debug.log

Custom Logging Configuration

logging:
  level: DEBUG
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  handlers:
    - type: file
      filename: shedboxai.log
      max_size: 10MB
      backup_count: 5
    - type: console
      level: INFO

Dry Run Mode

Test configuration without executing:

# Validate and show execution plan
shedboxai run config.yaml --dry-run

# Show generated prompts without API calls
shedboxai run config.yaml --dry-run --show-prompts

Advanced Debugging Techniques

Step-by-Step Execution

# Process one operation at a time
shedboxai run config.yaml --step-mode

# Stop on first error
shedboxai run config.yaml --stop-on-error

# Continue from specific step
shedboxai run config.yaml --start-from step_name

Data Inspection

processing:
  debug_output:
    # Save intermediate results
    save_intermediate: true
    output_directory: "debug_outputs/"
    
    # Add debug information
    include_metadata: true
    include_processing_stats: true

Configuration Testing

# Test with minimal data
head -n 10 large_file.csv > test_sample.csv
shedboxai run config.yaml  # Update config to use test_sample.csv

# Validate configuration structure
shedboxai validate config.yaml --strict

# Check for unused configuration
shedboxai analyze config.yaml --find-unused

Getting Help

Built-in Help

# General help
shedboxai --help

# Command-specific help
shedboxai run --help
shedboxai introspect --help

# Configuration examples
shedboxai examples

Community Resources

GitHub Issues: Report bugs and get help
Discord Community: Real-time support
Documentation: Comprehensive guides and examples
Example Configurations: Pre-built configs for common use cases

Creating Bug Reports

Include this information when reporting issues:

# System information
shedboxai --version
python --version
uname -a

# Error reproduction
shedboxai run config.yaml --verbose 2>&1 | tee error_log.txt

# Configuration (remove sensitive data)
cat config.yaml

Performance Optimization - Advanced performance tuning
Configuration Reference - Complete configuration guide
CLI Reference - All command-line options

Quick Diagnostic Checklist​

Before You Start​

Common Error Categories​

Configuration Errors​

YAML Syntax Issues​

Missing Required Fields​

Invalid Field References​

Data Source Errors​

File Not Found​

CSV Parsing Issues​

JSON Structure Problems​

Processing Errors​

Expression Evaluation Failures​

Memory Issues​

AI Interface Errors​

API Authentication Issues​

Rate Limiting​

Prompt Too Long​

Performance Issues​

Slow Processing​

High Memory Usage​

Network and API Issues​

Connection Timeouts​

SSL Certificate Issues​

Logging and Debugging​

Enable Detailed Logging​

Custom Logging Configuration​

Dry Run Mode​

Advanced Debugging Techniques​

Step-by-Step Execution​

Data Inspection​

Configuration Testing​

Getting Help​

Built-in Help​

Community Resources​

Creating Bug Reports​

Related Documentation​

Quick Diagnostic Checklist

Before You Start

Common Error Categories

Configuration Errors

YAML Syntax Issues

Missing Required Fields

Invalid Field References

Data Source Errors

File Not Found

CSV Parsing Issues

JSON Structure Problems

Processing Errors

Expression Evaluation Failures

Memory Issues

AI Interface Errors

API Authentication Issues

Rate Limiting

Prompt Too Long

Performance Issues

Slow Processing

High Memory Usage

Network and API Issues

Connection Timeouts

SSL Certificate Issues

Logging and Debugging

Enable Detailed Logging

Custom Logging Configuration

Dry Run Mode

Advanced Debugging Techniques

Step-by-Step Execution

Data Inspection

Configuration Testing

Getting Help

Built-in Help

Community Resources

Creating Bug Reports

Related Documentation