Common Issues
Solutions to frequently encountered problems when using ShedBoxAI. This guide covers installation issues, configuration errors, data source problems, and runtime errors.
Installation Issues
Package Installation Problems
Issue: `pip install shedboxai` fails
ERROR: Could not find a version that satisfies the requirement shedboxai
Solutions:
- Update pip: `pip install --upgrade pip`
- Check Python version: requires Python 3.8+
- Use the official index explicitly: `pip install -i https://pypi.org/simple/ shedboxai`
- Clear the pip cache: `pip cache purge`
Issue: Permission denied during installation
ERROR: Could not install packages due to an EnvironmentError: [Errno 13] Permission denied
Solutions:
- Use a user install: `pip install --user shedboxai`
- Use a virtual environment: `python -m venv venv && source venv/bin/activate`
- Use sudo (not recommended): `sudo pip install shedboxai`
Dependency Conflicts
Issue: Conflicting package versions
ERROR: pip's dependency resolver does not currently consider all the packages that are installed
Solutions:
- Create a fresh virtual environment:
  python -m venv fresh_env
  source fresh_env/bin/activate   # Linux/Mac
  fresh_env\Scripts\activate      # Windows
  pip install shedboxai
- Use pip-tools for dependency management:
  pip install pip-tools
  pip-compile requirements.in
  pip-sync requirements.txt
Configuration Errors
YAML Syntax Issues
Issue: Invalid YAML syntax
Error: Invalid YAML syntax in config.yaml
yaml.scanner.ScannerError: while scanning for the next token found character '\t' that cannot start any token
Solutions:
- Use spaces instead of tabs: YAML requires spaces for indentation
- Validate YAML syntax: use an online YAML validator, or the local check shown below
- Check quote consistency: ensure quotes are matched
- Use proper list syntax: prefix list items with `-`
Example Fix:
# ❌ Incorrect (indented with tab characters, which YAML rejects)
data_sources:
    users:
        type: csv

# ✅ Correct (indented with spaces)
data_sources:
  users:
    type: csv
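If you prefer not to paste configs into an online validator, the same syntax check can be done locally. This is a minimal sketch that assumes PyYAML is available in your environment (e.g. via `pip install pyyaml`):

```python
# check_yaml.py - local YAML syntax check (assumes PyYAML is installed)
import sys
import yaml

try:
    with open("config.yaml") as f:
        yaml.safe_load(f)
    print("config.yaml: YAML syntax OK")
except yaml.YAMLError as e:
    print(f"YAML error: {e}")
    sys.exit(1)
```

A tab-indented file fails here with the same `ScannerError` shown above, which confirms the problem is the file rather than ShedBoxAI.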
Missing Required Fields
Issue: Missing required configuration sections
Error: Missing required field 'data_sources' in configuration
Solutions:
- Add the required sections:
  # Minimum required configuration
  data_sources:
    example:
      type: csv
      path: data.csv
  output:
    type: print
- Check section names: ensure correct spelling and case
- Validate the structure: follow the documentation examples
Environment Variable Issues
Issue: Environment variables not found
Error: Environment variable 'OPENAI_API_KEY' not found
Solutions:
- Set environment variables:
  # Linux/Mac
  export OPENAI_API_KEY="your_key_here"
  # Windows
  set OPENAI_API_KEY=your_key_here
- Use a .env file:
  # Create .env file in project root
  echo "OPENAI_API_KEY=your_key_here" > .env
- Check variable names: ensure exact spelling
- Verify accessibility: `echo $OPENAI_API_KEY`
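You can also confirm the variable is visible to the Python process itself, which is what ultimately matters when ShedBoxAI runs. A small standard-library sketch, using the example variable name from above:

```python
# env_check.py - verify an environment variable is visible to Python
import os

key = os.environ.get("OPENAI_API_KEY")
if key:
    # Print only a masked prefix so the secret never ends up in logs
    print(f"OPENAI_API_KEY is set ({key[:4]}...)")
else:
    print("OPENAI_API_KEY is NOT set in this environment")
```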
Data Source Problems
File Not Found Errors
Issue: CSV/JSON/YAML files not found
Error: File not found: data/users.csv
Solutions:
- Check the file path: verify relative vs. absolute paths (see the path check sketch below)
- Verify the file exists: `ls -la data/users.csv`
- Check permissions: ensure the file is readable
- Use absolute paths when relative paths fail:
  data_sources:
    users:
      type: csv
      path: /full/path/to/data/users.csv
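Relative paths are typically resolved against the directory you launch the command from, so seeing where the path actually points often explains the error. A minimal standard-library sketch using the example path from above:

```python
# path_check.py - see where a relative path resolves and whether it exists
from pathlib import Path

p = Path("data/users.csv")          # path as written in the config
print("Resolves to:", p.resolve())  # absolute path from the current working directory
print("Exists:", p.exists())
print("Is a regular file:", p.is_file())
```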
API Connection Issues
Issue: REST API connection failures
Error: Failed to connect to https://api.example.com/data
HTTPSConnectionPool: Max retries exceeded
Solutions:
- Check URL accessibility: `curl https://api.example.com/data`
- Verify API credentials: test with Postman or curl
- Check network connectivity: firewall, VPN, proxy settings (see the connectivity sketch below)
- Add timeout settings:
  data_sources:
    api:
      type: rest
      url: https://api.example.com/data
      options:
        timeout: 60
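To separate network problems from ShedBoxAI configuration problems, it can help to hit the endpoint directly from Python. A sketch that assumes the `requests` package is installed and uses the example URL from above:

```python
# connectivity_check.py - test the API endpoint outside of ShedBoxAI
import requests

url = "https://api.example.com/data"  # example endpoint from the config above
try:
    resp = requests.get(url, timeout=10)
    print("Status:", resp.status_code)
except requests.exceptions.Timeout:
    print("Timed out - try a larger timeout or check the network")
except requests.exceptions.ConnectionError as e:
    print("Connection failed (DNS, firewall, proxy?):", e)
```

If this fails in the same way, the issue is the network or the API, not the pipeline configuration.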
Authentication Problems
Issue: API authentication failures
Error: 401 Unauthorized - Invalid API key
Solutions:
- Verify the API key: check key validity and permissions
- Check the header format:
  headers:
    Authorization: Bearer ${API_KEY}  # Not "Token" or other formats
- Test authentication separately: use curl or the sketch below to verify
- Check token expiration: renew if expired
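The separate authentication test can also be scripted, so you can confirm the key works before wiring it into a config. A sketch assuming the `requests` package, the example endpoint above, and an API that expects a Bearer token:

```python
# auth_check.py - verify the API key works outside of ShedBoxAI
import os
import requests

api_key = os.environ.get("API_KEY", "")   # same variable referenced as ${API_KEY} in the config
resp = requests.get(
    "https://api.example.com/data",       # example endpoint from above
    headers={"Authorization": f"Bearer {api_key}"},
    timeout=10,
)
# A 401 here means the key or its permissions are the problem, not ShedBoxAI
print(resp.status_code, resp.reason)
```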
Data Format Issues
Issue: Unsupported or corrupted data formats
Error: Unable to parse CSV file - delimiter detection failed
Solutions:
- Specify format options:
  data_sources:
    custom_csv:
      type: csv
      path: data.csv
      options:
        delimiter: "|"
        encoding: "latin-1"
        skiprows: 2
- Check file encoding: use the `file` command on Linux/Mac, or the probe sketch below
- Validate the data structure: open the file in a text editor
- Handle special characters: specify the proper encoding
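Before committing delimiter and encoding options to the config, you can confirm they actually parse the file. This sketch assumes pandas is installed; note that ShedBoxAI's CSV options are not guaranteed to map one-to-one onto pandas arguments, so treat it only as a sanity check on the file itself:

```python
# csv_probe.py - try parsing a few rows with candidate options (assumes pandas)
import pandas as pd

df = pd.read_csv(
    "data.csv",
    delimiter="|",        # candidate delimiter
    encoding="latin-1",   # candidate encoding
    skiprows=2,
    nrows=5,              # only peek at the first few data rows
)
print(df.columns.tolist())
print(df.head())
```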
Processing Errors
Expression Engine Issues
Issue: Expression syntax errors
Error: Invalid expression syntax: "age > 18 &&& status == 'active'"
Solutions:
- Use correct operators: `&&` not `&&&`, `||` not `|||`
- Check function names: Use exact function names from documentation
- Validate parentheses: Ensure balanced parentheses
- Test expressions: Start with simple expressions
Example Fixes:
# ❌ Incorrect
condition: "age > 18 &&& status == 'active'"
# ✅ Correct
condition: "age > 18 && status == 'active'"
Memory Issues
Issue: Out of memory errors with large datasets
Error: MemoryError: Unable to allocate array
Solutions:
- Filter data early to reduce memory usage:
  processing:
    contextual_filtering:
      data:
        - field: date
          condition: ">= '2024-01-01'"
          new_name: "recent_data"
- Use smaller sample sizes:
  data_sources:
    large_file:
      type: csv
      path: huge_dataset.csv
      options:
        nrows: 10000  # Limit rows read
- Increase system memory or use cloud instances
- Process data in separate runs with different filters
Template Rendering Issues
Issue: Jinja2 template errors
Error: TemplateError: 'item' is undefined
Solutions:
- Check variable names: ensure `item` is available in the template context
- Use safe filters: `template: "{{item.name|default('Unknown')}}"`
- Debug the template context: use `{{item|pprint}}` to see the available data
- Validate template syntax: test templates separately, as in the sketch below
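Rendering a template outside the pipeline makes undefined-variable errors easy to reproduce and fix. A sketch that assumes the `jinja2` package is installed and uses a made-up record purely for illustration:

```python
# template_check.py - render a template against sample data outside the pipeline
from jinja2 import Template

tmpl = Template("{{ item.name | default('Unknown') }}")

sample = {"item": {"name": "Alice"}}   # hypothetical record shaped like your data
print(tmpl.render(**sample))           # -> Alice
print(tmpl.render(item={}))            # -> Unknown (the default filter catches the missing field)
```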
AI Integration Issues
API Rate Limits
Issue: Too many API requests
Error: Rate limit exceeded. Please try again later
Solutions:
- Reduce request frequency: Process fewer records or use longer prompts
- Use batch processing: Combine multiple data points in single requests
- Monitor usage and upgrade API plan if needed
- Wait between retries: ShedBoxAI automatically retries with exponential backoff
Token Limit Exceeded
Issue: Input or output too long
Error: Token limit exceeded (4096 tokens)
Solutions:
- Reduce prompt length: summarize input data (the sketch below shows how to measure token counts)
- Switch models: use a model with a larger context window
- Process individual records:
  ai_interface:
    prompts:
      analysis:
        for_each: "customer_data"  # Process each customer individually
        max_tokens: 2000
        user_template: "Analyze customer: {{name}} with spending: {{total_spent}}"
- Use shorter prompts and reduce the amount of data passed in
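If you are unsure how close a prompt is to the limit, count its tokens before sending it. A sketch that assumes the `tiktoken` package; its encodings are OpenAI-specific, so treat the result as an estimate for other providers:

```python
# token_count.py - estimate how many tokens a prompt will consume (assumes tiktoken)
import tiktoken

prompt = "Analyze customer: Alice with spending: 1234.56"  # example rendered prompt
enc = tiktoken.get_encoding("cl100k_base")                 # encoding used by many OpenAI chat models
print(f"{len(enc.encode(prompt))} tokens")
```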
Model Availability
Issue: AI model unavailable
Error: Model 'gpt-4' is not available for your subscription
Solutions:
- Use available models: Switch to gpt-3.5-turbo or claude-instant
- Check API subscription: Verify model access
- Update the model configuration:
  ai_interface:
    model:
      options:
        model: gpt-3.5-turbo  # Available model
Performance Issues
Slow Processing
Issue: Pipeline takes too long to execute
Solutions:
- Profile performance: use the `--verbose` flag to identify bottlenecks
- Optimize data sources: use efficient file formats and filter early
- Parallelize AI processing: use `parallel: true` for AI prompts when processing multiple records
- Process smaller datasets: filter data to reduce processing time
High Memory Usage
Issue: System running out of memory
Solutions:
- Monitor memory usage: use system monitoring tools, or measure the dataset directly as in the sketch below
- Process incrementally: break large datasets into smaller chunks
- Optimize operations: use efficient algorithms and data structures
- Clean up resources: ensure proper garbage collection
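A quick way to tell whether the input data itself is the problem is to measure how much memory a sample of it needs. A sketch that assumes pandas and reuses the example file name from the memory section above:

```python
# memory_probe.py - estimate how much memory the dataset needs (assumes pandas)
import pandas as pd

sample = pd.read_csv("huge_dataset.csv", nrows=10_000)         # same nrows trick as in the config
per_row = sample.memory_usage(deep=True).sum() / len(sample)   # bytes per row, including strings
print(f"~{per_row:.0f} bytes/row; 1M rows ≈ {per_row * 1_000_000 / 1e9:.2f} GB")
```

If the projected size approaches your available RAM, filter earlier or process in separate runs as described above.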
Output Issues
File Permission Errors
Issue: Cannot write output files
Error: Permission denied: Cannot write to output/results.json
Solutions:
- Check directory permissions: `ls -la output/`
- Create the output directory: `mkdir -p output`
- Use different output path: Write to user-accessible directory
- Run with appropriate permissions
Output Format Problems
Issue: Unexpected output format
Error: Expected JSON output but received text
Solutions:
- Check the output configuration:
  output:
    type: file
    path: results.json
    format: json  # Explicit format
- Verify the file extension: use `.json` for JSON, `.yaml` for YAML
- Check template output: ensure templates produce the expected format (the sketch below shows a quick validity check)
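When the output is supposed to be JSON, parsing it back is the fastest way to confirm the format. A minimal standard-library sketch using the example path above:

```python
# output_check.py - confirm the produced file is valid JSON
import json

with open("results.json") as f:   # example output path from the config above
    data = json.load(f)           # raises json.JSONDecodeError if the file is not valid JSON
print(type(data).__name__, "parsed successfully")
```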
Debugging Strategies
Enable Verbose Logging
# Get detailed execution information
shedboxai run config.yaml --verbose
# Save debug output
shedboxai run config.yaml -v 2>&1 | tee debug.log
Test Components Separately
- Test data sources: use introspection to verify connectivity (`shedboxai introspect sources.yaml`)
- Test small datasets: start with sample data
- Test operations individually: comment out complex processing
- Validate the configuration: use YAML validators
Use Development Practices
- Version control configurations: Track changes with git
- Start simple: Begin with basic configurations
- Test incrementally: Add complexity gradually
- Document issues: Keep notes on problems and solutions
Getting Help
Community Resources
- Check GitHub issues for similar problems
- Search documentation for specific error messages
- Review examples for configuration patterns
Creating Bug Reports
When reporting issues, include:
- Complete error message
- Configuration file (sanitized)
- Environment information: OS, Python version, ShedBoxAI version (the sketch below collects these)
- Steps to reproduce
- Expected vs actual behavior
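A small script can gather the environment details in one go. This sketch uses only the standard library plus `importlib.metadata`; it assumes the installed distribution is named `shedboxai`:

```python
# env_report.py - collect environment details for a bug report
import platform
import sys
from importlib.metadata import PackageNotFoundError, version

print("OS:", platform.platform())
print("Python:", sys.version.split()[0])
try:
    print("ShedBoxAI:", version("shedboxai"))
except PackageNotFoundError:
    print("ShedBoxAI: not installed in this environment")
```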
Emergency Workarounds
- Use alternative data sources if one fails
- Skip problematic operations temporarily
- Implement manual fallbacks for critical processes
- Use cached results when available
Next Steps
- Debugging Guide - Advanced debugging techniques
- Performance Guide - Optimization strategies
- CLI Reference - Complete command documentation
- Configuration Guide - Comprehensive configuration help