Backtesting Guide
Overview
The backtesting system lets you validate regime detection and check strategy/regime compatibility using historical Hyperliquid data. It reconstructs market signals from historical OHLCV candles and funding rates to simulate how the governance system would have classified market regimes in the past. Historical order book data is not available, so order book metrics are zero in backtest results.
Architecture Overview
The backtesting framework consists of 7 core modules:
Key Components:
- BacktestRunner: Orchestrates the entire backtest execution, coordinates data fetching, signal reconstruction, and regime classification
- HistoricalDataManager: Fetches historical candles and funding rates from Hyperliquid API with pagination, retry logic, and caching
- SignalReconstructor: Transforms raw historical data into RegimeSignals format by calculating technical indicators (SMA, ADX, volatility)
- RegimeDetector: Classifies market regimes using the same logic as live trading (trending, range-bound, carry-friendly, event-risk)
- ReportGenerator: Produces summary statistics, CSV exports, and visualizations for analysis
Data Flow:
- Generate timestamp sequence based on interval (1h, 4h, 1d)
- Pre-fetch all historical data in batch (candles + funding rates)
- Iterate through timestamps and reconstruct signals at each point
- Classify regime using RegimeDetector
- Collect results and generate reports
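The first step can be sketched as a simple loop. This is a minimal illustration, not the BacktestRunner's code. `generate_timestamps` and `INTERVAL_STEPS` are hypothetical names. Treating the end date as exclusive is an assumption, though it does reproduce the 540 total points shown in the Step 2 example summary for a 4h run from 2024-01-01 to 2024-03-31.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from interval strings to step sizes
INTERVAL_STEPS = {
    "1h": timedelta(hours=1),
    "4h": timedelta(hours=4),
    "1d": timedelta(days=1),
}


def generate_timestamps(start: datetime, end: datetime, interval: str) -> list[datetime]:
    """Return evaluation timestamps from start (inclusive) to end (exclusive)."""
    step = INTERVAL_STEPS[interval]
    timestamps = []
    current = start
    while current < end:
        timestamps.append(current)
        current += step
    return timestamps


points = generate_timestamps(datetime(2024, 1, 1), datetime(2024, 3, 31), "4h")
print(len(points))  # 540
```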
Quickstart: Your First Backtest
Prerequisites
- Configured `config.toml` with Hyperliquid API access
- Governance section configured (required for regime detection)
- Python 3.11+ with the `uv` package manager
Step 1: Basic Backtest Command
Run a 3-month backtest with default settings:
```bash
uv run python -m hyperliquid_agent.cli backtest \
  --start-date 2024-01-01 \
  --end-date 2024-03-31
```

This will:
- Use 4-hour candles (default interval)
- Track BTC and ETH (default assets)
- Save results to `./backtest_results/`
- Generate summary report, CSV data, and visualization
Step 2: View Results
After completion, check the output directory:
```bash
ls -la backtest_results/
# summary.txt  - Text summary with regime distribution and transitions
# results.csv  - Detailed CSV with all data points and signals
# timeline.png - Visual timeline of regime classifications
```

Example Summary Output:

```text
================================================================================
BACKTEST SUMMARY REPORT
================================================================================
Configuration:
Start Date: 2024-01-01 00:00:00
End Date: 2024-03-31 00:00:00
Interval: 4h
Assets: BTC, ETH
Data Quality:
Total Points: 540
Collected Points: 532
Skipped Points: 8
Skip Rate: 1.5%
Overall Avg Confidence: 0.847
Regime Distribution:
trending-bull : 42.11%
range-bound : 31.58%
trending-bear : 18.42%
carry-friendly : 5.26%
event-risk : 2.63%
Regime Transitions (87 total):
1. 2024-01-03 08:00 | range-bound -> trending-bull (confidence: 0.850)
2. 2024-01-15 16:00 | trending-bull -> range-bound (confidence: 0.720)
...
```

Step 3: Customize Your Backtest
Use hourly candles for more granular data:

```bash
uv run python -m hyperliquid_agent.cli backtest \
  --start-date 2024-06-01 \
  --end-date 2024-07-01 \
  --interval 1h
```

Track specific assets:

```bash
uv run python -m hyperliquid_agent.cli backtest \
  --start-date 2024-01-01 \
  --end-date 2024-03-31 \
  --assets BTC,ETH,SOL,ARB
```

Save to custom directory:

```bash
uv run python -m hyperliquid_agent.cli backtest \
  --start-date 2024-01-01 \
  --end-date 2024-03-31 \
  --output ./my_backtest_results
```

Clear cache before running (useful if data seems stale):

```bash
uv run python -m hyperliquid_agent.cli backtest \
  --start-date 2024-01-01 \
  --end-date 2024-03-31 \
  --clear-cache
```

Historical Data Fetching
Data Sources
The backtesting system fetches historical data from Hyperliquid's public API:
OHLCV Candles:
- Endpoint: `/info` → `candles_snapshot`
- Available intervals: 1m, 5m, 15m, 1h, 4h, 1d
- Limit: only the 5000 most recent candles are available for each interval
- Data includes: open, high, low, close, volume, timestamp
Funding Rates:
- Endpoint: `/info` → `funding_history`
- Perpetual markets only
- Historical funding rate snapshots
- Data includes: rate, timestamp, premium
Order Books:
- Not available for historical backtesting
- Order book metrics will be zero in backtest results
- Only current order book is accessible via API
Data Format
Candle Data Structure:
```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Candle:
    timestamp: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float
```

Funding Rate Data Structure:
```python
@dataclass
class FundingRate:
    timestamp: datetime
    rate: float     # Decimal format (e.g., 0.0001 = 0.01%)
    premium: float  # Mark-index premium
```

Fetching Examples
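The manager returns data in the dataclasses above. The helpers below roughly sketch how raw `/info` records might map onto them. They assume the field names of Hyperliquid's public API responses: `t`, `o`, `h`, `l`, `c`, `v` for candles and `time`, `fundingRate`, `premium` for funding history. They also assume numeric values arrive as strings and timestamps as epoch milliseconds. `parse_candle` and `parse_funding_rate` are hypothetical helpers, not part of HistoricalDataManager. The dataclasses are repeated so the block runs on its own.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Candle:
    timestamp: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float


@dataclass
class FundingRate:
    timestamp: datetime
    rate: float
    premium: float


def _ms_to_datetime(ms: int) -> datetime:
    # Epoch milliseconds -> naive UTC datetime (matches the naive datetimes in these examples)
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).replace(tzinfo=None)


def parse_candle(raw: dict) -> Candle:
    # Assumed raw shape: {"t": open_ms, "o": "...", "h": "...", "l": "...", "c": "...", "v": "..."}
    return Candle(
        timestamp=_ms_to_datetime(raw["t"]),
        open=float(raw["o"]),
        high=float(raw["h"]),
        low=float(raw["l"]),
        close=float(raw["c"]),
        volume=float(raw["v"]),
    )


def parse_funding_rate(raw: dict) -> FundingRate:
    # Assumed raw shape: {"coin": "ETH", "fundingRate": "...", "premium": "...", "time": ms}
    return FundingRate(
        timestamp=_ms_to_datetime(raw["time"]),
        rate=float(raw["fundingRate"]),
        premium=float(raw["premium"]),
    )
```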
Fetch candles for date range:

```python
from datetime import datetime

from hyperliquid_agent.backtesting.historical_data import HistoricalDataManager

# Initialize manager (hyperliquid_provider and cache are assumed to be
# created elsewhere from your configuration)
manager = HistoricalDataManager(hyperliquid_provider, cache)

# Fetch 4-hour candles for BTC (await must run inside an async function or event loop)
candles = await manager.fetch_candles_range(
    coin="BTC",
    interval="4h",
    start_time=datetime(2024, 1, 1),
    end_time=datetime(2024, 3, 31),
)
print(f"Fetched {len(candles)} candles")
# Output: Fetched 540 candles
```

Fetch funding rates:
```python
# Fetch funding rate history for ETH
funding_rates = await manager.fetch_funding_rates_range(
    coin="ETH",
    start_time=datetime(2024, 1, 1),
    end_time=datetime(2024, 3, 31),
)
print(f"Fetched {len(funding_rates)} funding rate snapshots")
```

Caching Strategy
Historical data is cached in SQLite to avoid repeated API calls:
- Cache TTL: 7 days for historical data (the data is immutable)
- Cache Key Format: `backtest:candles:{coin}:{interval}:{start}:{end}`
- Cache Location: `state/signal_cache.db` (configurable in `config.toml`)
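To make the key format and TTL concrete, here is a small SQLite-backed cache built only on the standard library. It is a sketch under assumptions. The `TTLCache` class, its table schema, the JSON encoding of values, and the use of epoch milliseconds for `{start}` and `{end}` are illustrative choices, not the project's actual cache implementation.

```python
import json
import sqlite3
import time


class TTLCache:
    """Hypothetical key/value cache stored in SQLite with per-entry expiry."""

    def __init__(self, path: str = ":memory:", ttl_seconds: float = 7 * 24 * 3600):
        self.ttl_seconds = ttl_seconds  # Default mirrors the documented 7-day TTL
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT, expires_at REAL)"
        )

    def set(self, key: str, value) -> None:
        # Store the JSON-encoded value with an absolute expiry time
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (key, value, expires_at) VALUES (?, ?, ?)",
            (key, json.dumps(value), time.time() + self.ttl_seconds),
        )
        self.conn.commit()

    def get(self, key: str):
        row = self.conn.execute(
            "SELECT value, expires_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None or row[1] < time.time():
            return None  # Missing or expired
        return json.loads(row[0])

    def clear(self) -> None:
        # Similar in spirit to running with --clear-cache
        self.conn.execute("DELETE FROM cache")
        self.conn.commit()


def candle_cache_key(coin: str, interval: str, start: int, end: int) -> str:
    # Follows the documented format backtest:candles:{coin}:{interval}:{start}:{end}
    return f"backtest:candles:{coin}:{interval}:{start}:{end}"


cache = TTLCache()
key = candle_cache_key("BTC", "4h", 1704067200000, 1711843200000)
cache.set(key, [{"t": 1704067200000, "c": "42500.0"}])
print(key)
print(cache.get(key))
```

The in-memory database (`:memory:`) keeps the example self-contained. Passing a file path instead would persist entries across runs.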
Cache Benefits:
- Dramatically faster subsequent backtests
- Reduces API load on Hyperliquid
- Enables offline analysis
Clear cache when:
- Data seems stale or incorrect
- You suspect data corruption
- You want to force fresh data fetch
```bash
# Clear cache before backtest
uv run python -m hyperliquid_agent.cli backtest \
  --start-date 2024-01-01 \
  --end-date 2024-03-31 \
  --clear-cache
```

API Limitations
Hyperliquid Candle Limit:
- Maximum 5000 most recent candles available
- Lookback period depends on interval:
  - 1h interval: ~208 days (5000 hours)
  - 4h interval: ~833 days (20,000 hours)
  - 1d interval: ~13.7 years (5000 days)
Lookback Requirements:
- Backtesting requires 50 additional periods for indicator calculations (SMA-50)
- Usable range = 5000 - 50 = 4950 candles
- Example: 4h interval allows ~825 days of usable backtest data
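The arithmetic above can be checked with a few lines of Python. `max_usable_days` reproduces the figures quoted in this section: 206 days for 1h, 825 for 4h, and 4950 for 1d. `check_range` is a hypothetical validator. Its second test assumes the warm-up candles before the start date must still lie within the 5000 most recent candles counted back from now. That assumption follows the "move start date closer to present" advice and does not describe the CLI's actual check.

```python
from datetime import datetime, timedelta

MAX_CANDLES = 5000      # Hyperliquid's most-recent-candle limit
LOOKBACK_PERIODS = 50   # Warm-up needed for SMA-50
INTERVAL_HOURS = {"1h": 1, "4h": 4, "1d": 24}


def max_usable_days(interval: str) -> int:
    """Whole days of backtest range left after reserving the SMA-50 warm-up."""
    usable_candles = MAX_CANDLES - LOOKBACK_PERIODS  # 4950
    return usable_candles * INTERVAL_HOURS[interval] // 24


def check_range(start: datetime, end: datetime, interval: str, now: datetime) -> None:
    """Raise ValueError if [start, end] cannot be served by the candle window."""
    limit = max_usable_days(interval)
    requested = (end - start).days
    if requested > limit:
        raise ValueError(f"Requested {requested} days; max usable for {interval} is {limit}")
    # Assumption: the warm-up before `start` must also fall inside the window ending at `now`
    earliest = now - timedelta(hours=MAX_CANDLES * INTERVAL_HOURS[interval])
    warmup_start = start - timedelta(hours=LOOKBACK_PERIODS * INTERVAL_HOURS[interval])
    if warmup_start < earliest:
        raise ValueError("Start date is too far in the past; move it closer to the present")


for iv in ("1h", "4h", "1d"):
    print(iv, max_usable_days(iv))  # 1h 206, 4h 825, 1d 4950
```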
Date Range Validation:
The system automatically validates your date range against API limits:
```bash
# This will fail if range is too large
uv run python -m hyperliquid_agent.cli backtest \
  --start-date 2020-01-01 \
  --end-date 2024-12-31 \
  --interval 1h

# Error: Backtest date range too large for Hyperliquid API limitations.
# Interval: 1h
# Requested range: 1826 days
# Maximum usable range: 206 days (after 50-period lookback)
# Solutions:
# 1. Reduce date range to 206 days or less
# 2. Use a larger interval (4h gives ~825 days, 1d gives ~4950 days)
# 3. Move start date closer to present
```

Performance Report Interpretation
Key Metrics Explained
Sharpe Ratio:
- Measures risk-adjusted returns
- Formula: (Average Return - Risk-Free Rate) / Standard Deviation of Returns
- Interpretation:
- < 1.0: Poor risk-adjusted performance
- 1.0 - 2.0: Good performance
- > 2.0: Excellent performance
- Note: Backtesting reports regime distribution, not trading performance
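For reference, the Sharpe formula above translates directly into code. This sketch assumes per-period returns (for example, daily) and a per-period risk-free rate. It annualizes by multiplying by the square root of `periods_per_year`, and the default of 365 assumes daily returns on a market that trades every day. `sharpe_ratio` is a hypothetical helper, because the backtester itself does not report trading returns.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe_ratio(returns: list[float], risk_free_rate: float = 0.0, periods_per_year: int = 365) -> float:
    """Annualized Sharpe ratio from per-period returns (risk_free_rate is per period)."""
    excess = [r - risk_free_rate for r in returns]
    sd = stdev(excess)  # Sample standard deviation; needs at least two returns
    if sd == 0:
        return 0.0  # No variability: ratio is undefined, report 0.0
    return mean(excess) / sd * sqrt(periods_per_year)
```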
Drawdown:
- Maximum peak-to-trough decline
- Measures worst-case loss scenario
- Example: 15% drawdown means portfolio fell 15% from peak
- Lower is better
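Maximum drawdown can be computed in one pass over an equity curve by tracking the running peak. `max_drawdown` is a hypothetical helper, and the equity values are made up to illustrate the 15% example.

```python
def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)  # Highest equity seen so far
        worst = max(worst, (peak - value) / peak)
    return worst


print(max_drawdown([100, 120, 102, 125, 115]))  # 0.15 (fall from the 120 peak to 102)
```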
Win Rate:
- Percentage of profitable trades
- Formula: (Winning Trades / Total Trades) × 100
- Interpretation:
- < 50%: Losing more often than winning
- 50% - 60%: Average
- > 60%: Strong win rate
- Note: High win rate doesn't guarantee profitability (consider risk/reward)
Regime Distribution:
- Percentage of time in each regime
- Shows market conditions during backtest period
- Helps validate strategy regime compatibility
Confidence Score:
- Data quality indicator (0.0 to 1.0)
- Based on availability of technical indicators
- Low confidence (<0.5) indicates missing data
- Points with confidence <0.3 are skipped
Example Report Analysis
```text
Regime Distribution:
trending-bull : 42.11%
range-bound : 31.58%
trending-bear : 18.42%
carry-friendly : 5.26%
event-risk : 2.63%
```

Interpretation:
- Market was predominantly bullish (42% trending-bull)
- Significant ranging periods (32% range-bound)
- Limited carry opportunities (5% carry-friendly)
- Few high-risk events (3% event-risk)
Strategy Implications:
- Trend-following strategies had the most opportunity (~60% of the period was trending, bull or bear)
- Range-bound strategies had opportunity about 32% of the time
- Funding harvest strategies had limited opportunity (5%)
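The regime distribution can be recomputed from `results.csv` with the standard library alone. This sketch assumes only that the CSV has a `regime` column, as in the sample under CSV Data Analysis. `regime_distribution` is a hypothetical helper name.

```python
import csv
from collections import Counter
from collections.abc import Iterable


def regime_distribution(lines: Iterable[str]) -> dict[str, float]:
    """Return {regime: percent of rows}, most common regime first."""
    regimes = [row["regime"] for row in csv.DictReader(lines)]
    counts = Counter(regimes)
    return {regime: 100 * n / len(regimes) for regime, n in counts.most_common()}


# Usage (path from the Quickstart defaults):
# with open("backtest_results/results.csv", newline="") as f:
#     for regime, pct in regime_distribution(f).items():
#         print(f"{regime:<15}: {pct:.2f}%")
```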
Confidence Analysis
```text
Average Confidence per Regime:
trending-bull : 0.846
trending-bear : 0.814
range-bound : 0.695
carry-friendly : 0.623
```

Interpretation:
- Trending regimes have high confidence (>0.8) - strong signal quality
- Range-bound has moderate confidence (0.7) - acceptable
- Carry-friendly has lower confidence (0.6) - weaker signals
- May indicate funding rate data gaps or volatility calculation issues
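Per-regime averages like these can be reproduced from `results.csv` by grouping rows on the `regime` column. The hypothetical `mean_by_regime` helper works for any numeric column. Passing `column="adx"` or `column="avg_funding_rate"` gives a quick start on the analysis ideas listed for the CSV.

```python
import csv
from collections import defaultdict
from collections.abc import Iterable


def mean_by_regime(lines: Iterable[str], column: str = "confidence") -> dict[str, float]:
    """Average of a numeric results.csv column, grouped by regime."""
    values: dict[str, list[float]] = defaultdict(list)
    for row in csv.DictReader(lines):
        values[row["regime"]].append(float(row[column]))
    return {regime: sum(v) / len(v) for regime, v in values.items()}
```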
Transition Analysis
```text
Regime Transitions (87 total):
1. 2024-01-03 08:00 | range-bound -> trending-bull (confidence: 0.850)
2. 2024-01-15 16:00 | trending-bull -> range-bound (confidence: 0.720)
```

Interpretation:
- 87 transitions over 90 days = ~1 transition per day
- High transition frequency may indicate:
  - Volatile market conditions
  - Regime detector sensitivity too high
  - Need for hysteresis tuning
Governance Implications:
- Frequent transitions increase strategy switching costs
- Consider increasing `confirmation_cycles_required` in the regime detector config
- Adjust hysteresis thresholds to reduce ping-ponging
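You can count transitions yourself, or preview how a confirmation requirement would damp ping-ponging, with the sketch below. It works on plain lists of regimes and (timestamp, regime) pairs, such as rows read from `results.csv`. `find_transitions` and `apply_confirmation` are hypothetical helpers. The confirmation rule (switch only after N consecutive observations of the new regime) is a simplified illustration of the idea behind `confirmation_cycles_required`. It is not the RegimeDetector's actual logic, which also applies hysteresis thresholds.

```python
from datetime import datetime, timedelta


def find_transitions(rows: list[tuple[datetime, str]]) -> list[tuple[datetime, str, str]]:
    """Return (timestamp, from_regime, to_regime) for every regime change."""
    transitions = []
    for (_, prev_regime), (ts, regime) in zip(rows, rows[1:]):
        if regime != prev_regime:
            transitions.append((ts, prev_regime, regime))
    return transitions


def apply_confirmation(regimes: list[str], cycles_required: int) -> list[str]:
    """Switch regimes only after the new one is seen `cycles_required` times in a row."""
    if not regimes:
        return []
    confirmed = [regimes[0]]
    candidate, streak = None, 0
    for regime in regimes[1:]:
        current = confirmed[-1]
        if regime == current:
            candidate, streak = None, 0  # Back to the confirmed regime: reset
        elif regime == candidate:
            streak += 1  # Same challenger observed again
        else:
            candidate, streak = regime, 1  # New challenger
        if candidate is not None and streak >= cycles_required:
            current, candidate, streak = candidate, None, 0
        confirmed.append(current)
    return confirmed


raw = ["range-bound", "trending-bull", "range-bound", "trending-bull",
       "trending-bull", "trending-bull", "range-bound"]
stamps = [datetime(2024, 1, 1) + timedelta(hours=4 * i) for i in range(len(raw))]
print(len(find_transitions(list(zip(stamps, raw)))))  # 4
smoothed = apply_confirmation(raw, cycles_required=3)
print(len(find_transitions(list(zip(stamps, smoothed)))))  # 1
```

The trade-off is latency. With three required cycles, the switch to `trending-bull` is recognized only at the third consecutive observation.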
CSV Data Analysis
The results.csv file contains detailed data for custom analysis:
```text
timestamp,regime,confidence,adx,price_sma_20,price_sma_50,realized_vol_24h,avg_funding_rate,bid_ask_spread_bps,order_book_depth
2024-01-01 00:00:00,range-bound,0.850,25.3,42500.0,42800.0,0.45,0.0001,0.0,0.0
2024-01-01 04:00:00,range-bound,0.847,24.8,42520.0,42790.0,0.43,0.0001,0.0,0.0
```

Analysis Ideas:
- Plot ADX vs regime to validate trend detection
- Correlate funding rates with carry-friendly regime
- Analyze volatility spikes during event-risk periods
- Compare SMA distances across regimes
Signal Reconstruction
Overview
Signal reconstruction transforms raw historical data into the RegimeSignals format required by the regime detector. This involves calculating technical indicators, volatility metrics, and aggregating funding rates.
Technical Indicator Calculation
Simple Moving Average (SMA):
```python
# Candle is the dataclass defined in Data Format
def _calculate_sma(candles: list[Candle], period: int) -> float:
    """Calculate SMA using last 'period' candles."""
    if len(candles) < period:
        return 0.0  # Not enough history; treated as a missing indicator
    closes = [c.close for c in candles[-period:]]
    return sum(closes) / period
```

Indicators Calculated:
- SMA-20: 20-period simple moving average
- SMA-50: 50-period simple moving average
- SMA Distance: Percentage distance from current price to SMA
Usage in Regime Detection:
- Price above SMA-20 and SMA-50 → Bullish bias
- Price below both SMAs → Bearish bias
- Price between SMAs → Transitional/ranging
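The SMA distance and the bias rules above can be sketched as follows. The distance formula (price minus SMA, divided by SMA, times 100) is an assumption, though it reproduces the 0.5% and ~1.2% distances in the signal reconstruction example. `sma_distance_pct` and `sma_bias` are hypothetical helpers, not the SignalReconstructor's methods.

```python
def sma_distance_pct(price: float, sma: float) -> float:
    """Percentage distance of price from an SMA (positive = above)."""
    if sma == 0.0:
        return 0.0  # SMA unavailable (insufficient history)
    return (price - sma) / sma * 100


def sma_bias(price: float, sma_20: float, sma_50: float) -> str:
    """Map the price position relative to both SMAs onto the bias labels above."""
    if price > sma_20 and price > sma_50:
        return "bullish"
    if price < sma_20 and price < sma_50:
        return "bearish"
    return "transitional"


# Values from the signal reconstruction example
print(round(sma_distance_pct(42500.0, 42287.5), 2))  # 0.5
print(round(sma_distance_pct(42500.0, 42000.0), 2))  # 1.19
print(sma_bias(42500.0, 42287.5, 42000.0))           # bullish
```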
Average Directional Index (ADX):
```python
def _calculate_adx(candles: list[Candle]) -> float:
    """Calculate ADX using 14-period lookback."""
    # 1. Calculate True Range (TR)
    # 2. Calculate Directional Movement (+DM, -DM)
    # 3. Calculate smoothed averages
    # 4. Calculate DX = |+DI - -DI| / (+DI + -DI) × 100
    # 5. Return DX as ADX approximation
    ...  # Outline only; implementation omitted
```

ADX Interpretation:
- 0-25: Weak or absent trend
- 25-50: Strong trend
- 50-75: Very strong trend
- 75-100: Extremely strong trend
Usage in Regime Detection:
- High ADX (>25) → Trending regime
- Low ADX (<25) → Range-bound regime
- ADX direction indicates trend strength changes
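Because the `_calculate_adx` listing is only an outline, here is a runnable sketch of the same steps. It deliberately simplifies step 3. Instead of Wilder's smoothing, it sums true range and directional movement over the last 14 bars. The result is a DX-style value over a single window, in line with the outline's "return DX as ADX approximation" idea, though not necessarily the project's numbers. `dx_approximation` and the trimmed `Candle` (only `high`, `low`, `close`) are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    high: float
    low: float
    close: float


def dx_approximation(candles: list[Candle], period: int = 14) -> float:
    """DX over the last `period` bars, used as a simple ADX proxy (no Wilder smoothing)."""
    if len(candles) < period + 1:
        return 0.0  # Not enough history (treated as missing by confidence scoring)
    tr_sum = plus_dm_sum = minus_dm_sum = 0.0
    window = candles[-(period + 1):]
    for prev, cur in zip(window, window[1:]):
        # 1. True Range: largest of the bar range and the gaps from the prior close
        tr_sum += max(cur.high - cur.low, abs(cur.high - prev.close), abs(cur.low - prev.close))
        # 2. Directional movement: only the dominant positive move counts
        up_move = cur.high - prev.high
        down_move = prev.low - cur.low
        if up_move > down_move and up_move > 0:
            plus_dm_sum += up_move
        elif down_move > up_move and down_move > 0:
            minus_dm_sum += down_move
    if tr_sum == 0:
        return 0.0  # Perfectly flat data: avoid division by zero
    # 3. Directional indicators (ratios of sums equal ratios of averages)
    plus_di = 100 * plus_dm_sum / tr_sum
    minus_di = 100 * minus_dm_sum / tr_sum
    if plus_di + minus_di == 0:
        return 0.0
    # 4. DX = |+DI - -DI| / (+DI + -DI) × 100
    return abs(plus_di - minus_di) / (plus_di + minus_di) * 100
```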
Realized Volatility:
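```python
from math import log, sqrt
from statistics import variance

# Candle is the dataclass defined in Data Format


def _calculate_realized_volatility(candles: list[Candle], hours: int) -> float:
    """Calculate annualized realized volatility over the last `hours` candles."""
    # Use hours + 1 closes to get `hours` returns (assumes one candle per hour)
    closes = [c.close for c in candles[-(hours + 1):]]
    if len(closes) < 3 or any(price <= 0 for price in closes):
        return 0.0  # variance() needs 2+ returns and log() needs positive prices
    # 1. Calculate log returns
    log_returns = [log(closes[i] / closes[i - 1]) for i in range(1, len(closes))]
    # 2. Calculate standard deviation
    std_dev = sqrt(variance(log_returns))
    # 3. Annualize (assuming hourly candles; 4h candles would need sqrt(6 * 365))
    annualized_vol = std_dev * sqrt(24 * 365)
    return annualized_vol
```

Volatility Interpretation: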
- < 0.3 (30%): Low volatility
- 0.3 - 0.6: Moderate volatility
- 0.6 - 1.0: High volatility
- > 1.0: Extreme volatility
Usage in Regime Detection:
- High volatility → Event-risk or trending regime
- Low volatility → Range-bound or carry-friendly regime
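The volatility bands above are annualized, so the annualization factor has to match the candle interval. The listed `_calculate_realized_volatility` uses `sqrt(24 * 365)`, which is right for hourly returns. For 4h or 1d candles, the factor would be `sqrt(6 * 365)` or `sqrt(365)`. The helpers below are a hypothetical sketch of that mapping, assuming round-the-clock trading.

```python
from math import sqrt

HOURS_PER_YEAR = 24 * 365  # Crypto markets trade around the clock
INTERVAL_HOURS = {"1h": 1, "4h": 4, "1d": 24}


def annualization_factor(interval: str) -> float:
    """sqrt(periods per year) for returns sampled at the given candle interval."""
    return sqrt(HOURS_PER_YEAR / INTERVAL_HOURS[interval])


def annualize_vol(per_period_std: float, interval: str) -> float:
    """Scale a per-candle standard deviation of log returns to an annual figure."""
    return per_period_std * annualization_factor(interval)
```

For example, a per-4h standard deviation of about 0.0096 annualizes to roughly 0.45 (45%), which falls in the "moderate" band.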
Funding Rate Aggregation
Average Funding Rate Calculation:
```python
from datetime import datetime

# FundingRate is the dataclass defined in Data Format


def _calculate_avg_funding_rate(
    funding_rates: dict[str, list[FundingRate]],
    timestamp: datetime,
) -> float:
    """Calculate average funding rate across all assets."""
    # 1. Get most recent funding rate for each asset up to timestamp
    asset_funding = {}
    for coin, rates in funding_rates.items():
        valid_rates = [fr for fr in rates if fr.timestamp <= timestamp]
        if valid_rates:
            most_recent = max(valid_rates, key=lambda fr: fr.timestamp)
            asset_funding[coin] = most_recent.rate
    # 2. Calculate simple average (equal weighting)
    if not asset_funding:
        return 0.0
    return sum(asset_funding.values()) / len(asset_funding)
```

Funding Rate Interpretation:
- Positive (>0): Longs pay shorts (bullish sentiment)
- Negative (<0): Shorts pay longs (bearish sentiment)
- Extreme (>0.01% or <-0.01%): Strong directional bias
Usage in Regime Detection:
- High positive funding → Carry-friendly for shorts
- High negative funding → Carry-friendly for longs
- Extreme funding → Potential mean reversion opportunity
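Funding rates are easier to judge once annualized. The sketch below assumes one funding payment per hour (Hyperliquid pays funding hourly) and simple, non-compounded annualization. Under that assumption, the 0.01% threshold mentioned above equals about 87.6% per year. `annualized_funding` and `funding_bias` are hypothetical helpers, and `funding_bias` simply encodes the sign and threshold rules listed in this section.

```python
HOURS_PER_YEAR = 24 * 365


def annualized_funding(hourly_rate: float) -> float:
    """Simple (non-compounded) annualized funding, assuming hourly payments."""
    return hourly_rate * HOURS_PER_YEAR


def funding_bias(rate: float, extreme: float = 0.0001) -> str:
    """Label a funding rate with the sign/threshold rules above (0.0001 = 0.01%)."""
    if rate > extreme:
        return "extreme positive: longs pay shorts"
    if rate < -extreme:
        return "extreme negative: shorts pay longs"
    if rate > 0:
        return "positive: longs pay shorts"
    if rate < 0:
        return "negative: shorts pay longs"
    return "neutral"


print(f"{annualized_funding(0.0001):.1%}")  # 87.6%
```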
Data Quality and Confidence Scoring
Confidence Calculation:
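```python
def _score_confidence(  # Hypothetical wrapper name; the scoring logic is as documented
    sma_20: float,
    sma_50: float,
    adx: float,
    realized_vol_24h: float,
    avg_funding_rate: float,
    no_funding_data: bool,
) -> float | None:
    confidence = 1.0
    # Reduce confidence for each missing indicator (0.0 = could not be computed)
    if sma_20 == 0.0:
        confidence *= 0.8
    if sma_50 == 0.0:
        confidence *= 0.8
    if adx == 0.0:
        confidence *= 0.8
    if realized_vol_24h == 0.0:
        confidence *= 0.8
    if avg_funding_rate == 0.0 and no_funding_data:
        confidence *= 0.9
    # Skip if confidence too low
    if confidence < 0.3:
        return None  # Skip this timestamp
    return confidence
```

Confidence Thresholds: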
- 1.0: All indicators available
- 0.8-1.0: One indicator missing
- 0.6-0.8: Two indicators missing
- 0.3-0.6: Three or more indicators missing
- <0.3: Too many missing indicators (skip)
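Because the penalties multiply, the bands above follow from powers of 0.8. Tabulating them is a quick sanity check. Four missing indicators still score about 0.41, and with funding also missing the score is about 0.369. The penalties shown in this excerpt alone therefore never push a point below the 0.3 skip threshold. The constant names are illustrative.

```python
# Penalty factors from the confidence calculation above
INDICATOR_PENALTY = 0.8   # SMA-20, SMA-50, ADX, realized volatility
FUNDING_PENALTY = 0.9     # Missing funding data
SKIP_THRESHOLD = 0.3

for missing in range(5):
    confidence = INDICATOR_PENALTY ** missing
    print(f"{missing} indicator(s) missing: {confidence:.3f}")  # 1.000, 0.800, 0.640, 0.512, 0.410

floor = INDICATOR_PENALTY ** 4 * FUNDING_PENALTY
print(f"All five missing: {floor:.3f} (skipped: {floor < SKIP_THRESHOLD})")  # 0.369, False
```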
Common Causes of Low Confidence:
- Insufficient candle history (early in backtest period)
- Data gaps from API
- Zero prices in candle data
- Missing funding rate data
Example Signal Reconstruction
```python
# Input: Historical data at timestamp 2024-01-15 12:00
candles_btc = [...]  # 100 candles of BTC data
funding_rates = {"BTC": [...], "ETH": [...]}

# Output: RegimeSignals
signals = RegimeSignals(
    price_context=PriceContext(
        current_price=42500.0,
        return_1d=0.02,       # +2% in 1 day
        return_7d=0.05,       # +5% in 7 days
        return_30d=0.15,      # +15% in 30 days
        sma20_distance=0.5,   # 0.5% above SMA-20
        sma50_distance=1.2,   # 1.2% above SMA-50
        higher_highs=True,
        higher_lows=True,
    ),
    price_sma_20=42287.5,
    price_sma_50=42000.0,
    adx=35.2,                 # Strong trend
    realized_vol_24h=0.45,    # 45% annualized
    avg_funding_rate=0.0001,  # 0.01% positive
    bid_ask_spread_bps=0.0,   # Not available for backtesting
    order_book_depth=0.0,     # Not available for backtesting
)
```

Regime Classification:
- Price above both SMAs → Bullish
- ADX = 35.2 → Strong trend
- Higher highs and higher lows → Uptrend structure
- Moderate volatility (45%) → Not event-risk
- Result: `trending-bull` regime
Troubleshooting
Common Errors and Solutions
Error: "Backtest date range too large for Hyperliquid API limitations"
Cause: Requested date range exceeds the 5000-candle limit for the chosen interval.
Solutions:
Reduce date range:

```bash
# Instead of 1 year with 1h interval
--start-date 2024-01-01 --end-date 2024-12-31 --interval 1h
# Use ~6 months
--start-date 2024-07-01 --end-date 2024-12-31 --interval 1h
```

Use larger interval:

```bash
# Use 4h instead of 1h for longer backtests
--start-date 2024-01-01 --end-date 2024-12-31 --interval 4h
```

Move start date closer to present:

```bash
# Use most recent data
--start-date 2024-10-01 --end-date 2024-12-31 --interval 1h
```
Error: "Governance configuration is required for backtesting"
Cause: Missing [governance] section in config.toml.
Solution: Add governance configuration:
```toml
[governance]
fast_loop_interval_seconds = 10
medium_loop_interval_minutes = 30
slow_loop_interval_hours = 24

[governance.regime_detector]
confirmation_cycles_required = 3
hysteresis_enter_threshold = 0.7
hysteresis_exit_threshold = 0.4
```

Error: "No candle data returned for BTC"
Cause: Asset not available on Hyperliquid or API connection issue.
Solutions:
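Verify asset availability:

```bash
# Check if asset exists on Hyperliquid
# Use common assets: BTC, ETH, SOL, ARB
```

Check API connectivity:

```bash
# Test API access (the /info endpoint expects a JSON POST body)
curl -X POST https://api.hyperliquid.xyz/info \
  -H "Content-Type: application/json" \
  -d '{"type": "meta"}'
```

Try different assets:

```bash
--assets BTC,ETH  # Use well-known assets
```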
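Asset availability can also be checked programmatically. This sketch assumes the `meta` request type of Hyperliquid's `/info` endpoint, which returns the perpetual universe as `{"universe": [{"name": "BTC", ...}, ...]}`. `fetch_meta` and `missing_assets` are hypothetical helpers built on the standard library's `urllib`. Only `fetch_meta` needs network access, so you can test the parsing logic offline.

```python
import json
import urllib.request

INFO_URL = "https://api.hyperliquid.xyz/info"


def fetch_meta(url: str = INFO_URL) -> dict:
    """POST {"type": "meta"} to the info endpoint (requires network access)."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"type": "meta"}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def missing_assets(meta: dict, assets: list[str]) -> list[str]:
    """Return requested assets that are not listed in meta["universe"]."""
    listed = {entry["name"] for entry in meta.get("universe", [])}
    return [asset for asset in assets if asset not in listed]


# Example (makes a network call):
# print(missing_assets(fetch_meta(), ["BTC", "ETH", "SOL", "ARB"]))
```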
Warning: "High skip rate: 25.0% of timestamps skipped"
Cause: Many timestamps have insufficient data quality (confidence <0.3).
Solutions:
Use larger interval for better data coverage:

```bash
--interval 4h  # Instead of 1h
```

Adjust date range to avoid data gaps:

```bash
# Use more recent data
--start-date 2024-06-01 --end-date 2024-12-31
```

Clear cache and refetch:

```bash
--clear-cache
```
Data Quality Issues
Issue: Low confidence scores (<0.5) for many data points
Diagnosis:
- Check `summary.txt` for "Data Quality Warnings"
- Review the confidence column in `results.csv`
- Look for patterns in low-confidence timestamps
Solutions:
Increase lookback period by starting earlier:

```bash
# Add 2 weeks before actual backtest start
--start-date 2023-12-15 --end-date 2024-03-31
```

Use assets with better data availability:

```bash
--assets BTC,ETH  # Major assets have better data
```

Accept lower confidence for exploratory analysis
Issue: Missing funding rate data
Diagnosis:
- Funding rates show as 0.0 in CSV
- "No funding rate data available" warnings in logs
Solutions:
- Verify assets are perpetual markets (not spot-only)
- Check date range (funding rates may not exist for very old dates)
- Accept that funding metrics will be unavailable
Issue: Order book metrics always zero
Explanation: Historical order book data is not available via Hyperliquid API. Order book metrics (bid_ask_spread_bps, order_book_depth) will always be 0.0 in backtest results.
Impact: Regime detection relies primarily on price and funding data, so missing order book data has minimal impact on regime classification accuracy.
Performance Optimization
Issue: Backtest takes too long to complete
Solutions:
Use cached data:

```bash
# First run fetches data (slow)
# Subsequent runs use cache (fast)
```

Reduce date range:

```bash
# Test with 1 month first
--start-date 2024-11-01 --end-date 2024-12-01
```

Use larger interval:

```bash
# 1d interval is fastest
--interval 1d
```

Reduce number of assets:

```bash
# Use 2-3 assets instead of 10
--assets BTC,ETH
```
Issue: High memory usage
Cause: Large date ranges with small intervals generate many data points.
Solutions:
- Use larger intervals (4h or 1d)
- Reduce number of assets
- Process in smaller date range chunks
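Splitting a long range into chunks is straightforward. You can then pass each window's start and end as `--start-date` and `--end-date` to separate runs. `split_date_range` is a hypothetical helper. When you combine chunked results, you have to stitch together by hand any regime transitions that span a chunk boundary.

```python
from datetime import datetime, timedelta


def split_date_range(start: datetime, end: datetime, chunk: timedelta) -> list[tuple[datetime, datetime]]:
    """Split [start, end) into consecutive windows no longer than `chunk`."""
    windows = []
    current = start
    while current < end:
        window_end = min(current + chunk, end)
        windows.append((current, window_end))
        current = window_end
    return windows


for s, e in split_date_range(datetime(2024, 1, 1), datetime(2024, 3, 31), timedelta(days=30)):
    print(s.date(), "->", e.date())
```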
Issue: API rate limiting (429 errors)
Cause: Too many API requests in short time.
Solutions:
- Use cached data (don't use `--clear-cache` unnecessarily)
- Reduce number of concurrent asset fetches
- Wait and retry (automatic exponential backoff is implemented)
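This guide does not show the retry logic itself. The sketch below illustrates the general pattern of exponential backoff with jitter for asyncio code, assuming the fetch raises a dedicated exception on HTTP 429. The `with_backoff` helper and `RateLimitError` class are hypothetical and are not the HistoricalDataManager's implementation.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""


async def with_backoff(fetch, max_retries: int = 5, base_delay: float = 1.0):
    """Await fetch(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return await fetch()
        except RateLimitError:
            if attempt == max_retries:
                raise  # Out of retries: surface the error
            # Delays grow as base, 2x base, 4x base, ... plus up to 10% random jitter
            delay = base_delay * 2 ** attempt
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
```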
Visualization Issues
Issue: Timeline plot is cluttered or unreadable
Solutions:
Use larger interval for cleaner visualization:

```bash
--interval 4h  # Instead of 1h
```

Reduce date range:

```bash
# 3 months instead of 1 year
--start-date 2024-10-01 --end-date 2024-12-31
```

Open the PNG in an image viewer with zoom capability
Issue: No visualization generated
Cause: Missing matplotlib or PIL dependencies.
Solution:
```bash
# Reinstall with all dependencies
uv pip install -e ".[dev]"
```

Next Steps
- Analyze Results: Use CSV data for custom analysis in Python/Excel
- Tune Regime Detector: Adjust hysteresis thresholds based on transition frequency
- Validate Strategies: Check strategy regime compatibility against backtest results
- Compare Periods: Run multiple backtests across different market conditions
- Optimize Parameters: Test different confirmation cycles and thresholds
Related Documentation
- Configuration Guide - Configure governance and regime detector
- Governance Architecture - Understand regime detection logic
- CLI Reference - Complete CLI command documentation