Backtesting Frameworks
Build robust, production-grade backtesting systems that avoid common pitfalls and produce reliable strategy performance estimates.
When to Use This Skill
- Developing trading strategy backtests
- Building backtesting infrastructure
- Validating strategy performance
- Avoiding common backtesting biases
- Implementing walk-forward analysis
- Comparing strategy alternatives
Core Concepts
1. Backtesting Biases
| Bias | Description | Mitigation |
|---|---|---|
| Look-ahead | Using future information | Point-in-time data |
| Survivorship | Only testing on survivors | Use delisted securities |
| Overfitting | Curve-fitting to history | Out-of-sample testing |
| Selection | Cherry-picking strategies | Pre-registration |
| Transaction | Ignoring trading costs | Realistic cost models |
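Look-ahead bias is the most common failure in practice: a signal computed from the same bar it trades on. A minimal sketch of the standard fix (shift the signal by one bar so you trade on information available *before* the bar you trade in; data here is synthetic):

```python
import numpy as np
import pandas as pd

# Synthetic rising close prices
close = pd.Series(np.linspace(100, 110, 10), name="close")

# Signal computed from the current bar's close -- it "knows" today's price
raw_signal = (close > close.rolling(3).mean()).astype(int)

# Tradeable version: act on the NEXT bar, i.e. shift by one
tradeable = raw_signal.shift(1).fillna(0)

# The tradeable signal lags the raw one by exactly one bar
assert tradeable.iloc[0] == 0
assert (tradeable.iloc[1:].values == raw_signal.iloc[:-1].values).all()
```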
2. Proper Backtest Structure
Historical Data
│
▼
┌─────────────────────────────────────────┐
│ Training Set │
│ (Strategy Development & Optimization) │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Validation Set │
│ (Parameter Selection, No Peeking) │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Test Set │
│ (Final Performance Evaluation) │
└─────────────────────────────────────────┘
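Assuming daily bars, the split above is carved out chronologically, never randomly — shuffling would leak future information into training. A minimal sketch with illustrative 60/20/20 proportions:

```python
import numpy as np
import pandas as pd

# Synthetic daily data: 1000 bars
data = pd.DataFrame({"close": np.random.default_rng(0).normal(100, 1, 1000)})

# 60/20/20 chronological split -- order matters, no shuffling
n = len(data)
train = data.iloc[: int(n * 0.6)]
val = data.iloc[int(n * 0.6): int(n * 0.8)]
test = data.iloc[int(n * 0.8):]

assert len(train) + len(val) + len(test) == n
# Every training row precedes every validation row, which precedes every test row
assert train.index.max() < val.index.min() <= val.index.max() < test.index.min()
```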
3. Walk-Forward Analysis
Window 1: [Train──────][Test]
Window 2: [Train──────][Test]
Window 3: [Train──────][Test]
Window 4: [Train──────][Test]
─────▶ Time
Implementation Patterns
Pattern 1: Event-Driven Backtester
```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime
from decimal import Decimal
from enum import Enum
from typing import Dict, List, Optional

import pandas as pd


class OrderSide(Enum):
    BUY = "buy"
    SELL = "sell"


class OrderType(Enum):
    MARKET = "market"
    LIMIT = "limit"
    STOP = "stop"


@dataclass
class Order:
    symbol: str
    side: OrderSide
    quantity: Decimal
    order_type: OrderType
    limit_price: Optional[Decimal] = None
    stop_price: Optional[Decimal] = None
    timestamp: Optional[datetime] = None


@dataclass
class Fill:
    order: Order
    fill_price: Decimal
    fill_quantity: Decimal
    commission: Decimal
    slippage: Decimal
    timestamp: datetime


@dataclass
class Position:
    symbol: str
    quantity: Decimal = Decimal("0")
    avg_cost: Decimal = Decimal("0")
    realized_pnl: Decimal = Decimal("0")

    def update(self, fill: Fill) -> None:
        if fill.order.side == OrderSide.BUY:
            new_quantity = self.quantity + fill.fill_quantity
            if new_quantity != 0:
                self.avg_cost = (
                    (self.quantity * self.avg_cost
                     + fill.fill_quantity * fill.fill_price)
                    / new_quantity
                )
            self.quantity = new_quantity
        else:
            self.realized_pnl += fill.fill_quantity * (fill.fill_price - self.avg_cost)
            self.quantity -= fill.fill_quantity


@dataclass
class Portfolio:
    cash: Decimal
    positions: Dict[str, Position] = field(default_factory=dict)

    def get_position(self, symbol: str) -> Position:
        if symbol not in self.positions:
            self.positions[symbol] = Position(symbol=symbol)
        return self.positions[symbol]

    def process_fill(self, fill: Fill) -> None:
        position = self.get_position(fill.order.symbol)
        position.update(fill)
        if fill.order.side == OrderSide.BUY:
            self.cash -= fill.fill_price * fill.fill_quantity + fill.commission
        else:
            self.cash += fill.fill_price * fill.fill_quantity - fill.commission

    def get_equity(self, prices: Dict[str, Decimal]) -> Decimal:
        equity = self.cash
        for symbol, position in self.positions.items():
            if position.quantity != 0 and symbol in prices:
                equity += position.quantity * prices[symbol]
        return equity


class Strategy(ABC):
    @abstractmethod
    def on_bar(self, timestamp: datetime, data: pd.DataFrame) -> List[Order]:
        ...

    @abstractmethod
    def on_fill(self, fill: Fill) -> None:
        ...


class ExecutionModel(ABC):
    @abstractmethod
    def execute(self, order: Order, bar: pd.Series) -> Optional[Fill]:
        ...


class SimpleExecutionModel(ExecutionModel):
    def __init__(self, slippage_bps: float = 10, commission_per_share: float = 0.01):
        self.slippage_bps = slippage_bps
        self.commission_per_share = commission_per_share

    def execute(self, order: Order, bar: pd.Series) -> Optional[Fill]:
        if order.order_type == OrderType.MARKET:
            base_price = Decimal(str(bar["open"]))
            # Slippage worsens the price: buys fill higher, sells fill lower
            slip = Decimal(str(self.slippage_bps / 10000))
            if order.side == OrderSide.BUY:
                fill_price = base_price * (1 + slip)
            else:
                fill_price = base_price * (1 - slip)
            commission = order.quantity * Decimal(str(self.commission_per_share))
            slippage = abs(fill_price - base_price) * order.quantity
            return Fill(
                order=order,
                fill_price=fill_price,
                fill_quantity=order.quantity,
                commission=commission,
                slippage=slippage,
                timestamp=bar.name,
            )
        return None  # Limit/stop handling omitted for brevity


class Backtester:
    def __init__(
        self,
        strategy: Strategy,
        execution_model: ExecutionModel,
        initial_capital: Decimal = Decimal("100000"),
    ):
        self.strategy = strategy
        self.execution_model = execution_model
        self.portfolio = Portfolio(cash=initial_capital)
        self.equity_curve: List[tuple] = []
        self.trades: List[Fill] = []

    def run(self, data: pd.DataFrame) -> pd.DataFrame:
        """Run backtest on single-symbol OHLCV data with a DatetimeIndex."""
        pending_orders: List[Order] = []
        for timestamp, bar in data.iterrows():
            # Execute orders generated on the previous bar at today's prices
            for order in pending_orders:
                fill = self.execution_model.execute(order, bar)
                if fill:
                    self.portfolio.process_fill(fill)
                    self.strategy.on_fill(fill)
                    self.trades.append(fill)
            pending_orders.clear()
            # Mark all open positions at this bar's close (single-asset data)
            close = Decimal(str(bar["close"]))
            prices = {symbol: close for symbol in self.portfolio.positions}
            equity = self.portfolio.get_equity(prices)
            self.equity_curve.append((timestamp, float(equity)))
            # Generate new orders; they fill on the NEXT bar (no look-ahead)
            new_orders = self.strategy.on_bar(timestamp, data.loc[:timestamp])
            pending_orders.extend(new_orders)
        return self._create_results()

    def _create_results(self) -> pd.DataFrame:
        equity_df = pd.DataFrame(self.equity_curve, columns=["timestamp", "equity"])
        equity_df.set_index("timestamp", inplace=True)
        equity_df["returns"] = equity_df["equity"].pct_change()
        return equity_df
```
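The pattern above keeps cash and P&L in `Decimal` rather than `float`: binary floats accumulate rounding error, which is unacceptable in accounting code. A quick illustration of why:

```python
from decimal import Decimal

# Float arithmetic drifts: 0.1 + 0.1 + 0.1 is not exactly 0.3
assert 0.1 + 0.1 + 0.1 != 0.3

# Decimal (constructed from strings) is exact for decimal quantities
assert Decimal("0.1") + Decimal("0.1") + Decimal("0.1") == Decimal("0.3")
```

The string constructor matters: `Decimal(0.1)` would inherit the float's binary representation error, which is why the execution model converts via `Decimal(str(...))`.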
Pattern 2: Vectorized Backtester (Fast)
```python
from typing import Any, Callable, Dict

import numpy as np
import pandas as pd


class VectorizedBacktester:
    """Fast vectorized backtester for simple strategies."""

    def __init__(
        self,
        initial_capital: float = 100000,
        commission: float = 0.001,  # 0.1%
        slippage: float = 0.0005,   # 0.05%
    ):
        self.initial_capital = initial_capital
        self.commission = commission
        self.slippage = slippage

    def run(
        self,
        prices: pd.DataFrame,
        signal_func: Callable[[pd.DataFrame], pd.Series],
    ) -> Dict[str, Any]:
        """
        Run backtest with signal function.

        Args:
            prices: DataFrame with 'close' column
            signal_func: Function that returns position signals (-1, 0, 1)

        Returns:
            Dictionary with results
        """
        # Generate signals (shifted one bar to avoid look-ahead)
        signals = signal_func(prices).shift(1).fillna(0)
        # Calculate market returns
        returns = prices["close"].pct_change()
        # Calculate strategy returns net of trading costs
        position_changes = signals.diff().abs()
        trading_costs = position_changes * (self.commission + self.slippage)
        strategy_returns = signals * returns - trading_costs
        # Build equity curve
        equity = (1 + strategy_returns).cumprod() * self.initial_capital
        return {
            "equity": equity,
            "returns": strategy_returns,
            "signals": signals,
            "metrics": self._calculate_metrics(strategy_returns, equity),
        }

    def _calculate_metrics(
        self,
        returns: pd.Series,
        equity: pd.Series,
    ) -> Dict[str, float]:
        """Calculate performance metrics."""
        total_return = (equity.iloc[-1] / self.initial_capital) - 1
        annual_return = (1 + total_return) ** (252 / len(returns)) - 1
        annual_vol = returns.std() * np.sqrt(252)
        sharpe = annual_return / annual_vol if annual_vol > 0 else 0
        # Drawdown
        rolling_max = equity.cummax()
        drawdown = (equity - rolling_max) / rolling_max
        max_drawdown = drawdown.min()
        # Win rate
        winning_days = (returns > 0).sum()
        total_days = (returns != 0).sum()
        win_rate = winning_days / total_days if total_days > 0 else 0
        return {
            "total_return": total_return,
            "annual_return": annual_return,
            "annual_volatility": annual_vol,
            "sharpe_ratio": sharpe,
            "max_drawdown": max_drawdown,
            "win_rate": win_rate,
            # Days with a nonzero strategy return -- a proxy, not a true trade count
            "num_trades": int((returns != 0).sum()),
        }
```
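The cost model charges commission plus slippage on each unit of position change, so a long-to-short reversal pays double. A worked check of that line on a tiny signal series (the numbers are illustrative):

```python
import pandas as pd

signals = pd.Series([0, 1, 1, -1, 0])   # flat -> long -> hold -> reverse -> flat
position_changes = signals.diff().abs()  # NaN, 1, 0, 2, 1

cost_rate = 0.001 + 0.0005  # commission + slippage
trading_costs = position_changes * cost_rate

# A reversal (|1 - (-1)| = 2) costs exactly twice a simple entry
assert trading_costs.iloc[3] == 2 * trading_costs.iloc[1]
# Total traded position over the series is 4 units
assert abs(trading_costs.fillna(0).sum() - 4 * cost_rate) < 1e-12
```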
Example usage:

```python
def momentum_signal(prices: pd.DataFrame, lookback: int = 20) -> pd.Series:
    """Simple momentum strategy: long when price > SMA, else flat."""
    sma = prices["close"].rolling(lookback).mean()
    return (prices["close"] > sma).astype(int)

# Run the backtest
backtester = VectorizedBacktester()
results = backtester.run(price_data, lambda p: momentum_signal(p, 50))
```

Pattern 3: Walk-Forward Optimization
```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional, Tuple

import numpy as np
import pandas as pd


class WalkForwardOptimizer:
    """Walk-forward analysis with anchored or rolling windows."""

    def __init__(
        self,
        train_period: int,
        test_period: int,
        anchored: bool = False,
        n_splits: Optional[int] = None,
    ):
        """
        Args:
            train_period: Number of bars in training window
            test_period: Number of bars in test window
            anchored: If True, training always starts from the beginning
            n_splits: Number of train/test splits (auto-calculated if None)
        """
        self.train_period = train_period
        self.test_period = test_period
        self.anchored = anchored
        self.n_splits = n_splits

    def generate_splits(
        self,
        data: pd.DataFrame,
    ) -> List[Tuple[pd.DataFrame, pd.DataFrame]]:
        """Generate train/test splits."""
        splits = []
        n = len(data)
        if self.n_splits:
            step = (n - self.train_period) // self.n_splits
        else:
            step = self.test_period
        start = 0
        while start + self.train_period + self.test_period <= n:
            train_start = 0 if self.anchored else start
            train_end = start + self.train_period
            test_end = min(train_end + self.test_period, n)
            splits.append((data.iloc[train_start:train_end],
                           data.iloc[train_end:test_end]))
            start += step
        return splits

    def optimize(
        self,
        data: pd.DataFrame,
        strategy_func: Callable,
        param_grid: Dict[str, List],
        metric: str = "sharpe_ratio",
    ) -> Dict[str, Any]:
        """
        Run walk-forward optimization.

        Args:
            data: Full dataset
            strategy_func: Function(data, **params) -> results dict
            param_grid: Parameter combinations to test
            metric: Metric to optimize

        Returns:
            Combined results from all test periods
        """
        splits = self.generate_splits(data)
        all_results = []
        optimal_params_history = []
        for i, (train_data, test_data) in enumerate(splits):
            # Optimize on training data only
            best_params, best_metric = self._grid_search(
                train_data, strategy_func, param_grid, metric
            )
            optimal_params_history.append(best_params)
            # Evaluate out-of-sample with the optimal params
            test_results = strategy_func(test_data, **best_params)
            test_results["split"] = i
            test_results["params"] = best_params
            all_results.append(test_results)
            print(f"Split {i + 1}/{len(splits)}: "
                  f"Best {metric}={best_metric:.4f}, params={best_params}")
        return {
            "split_results": all_results,
            "param_history": optimal_params_history,
            "combined_equity": self._combine_equity_curves(all_results),
        }

    def _grid_search(
        self,
        data: pd.DataFrame,
        strategy_func: Callable,
        param_grid: Dict[str, List],
        metric: str,
    ) -> Tuple[Dict, float]:
        """Grid search for the best parameters."""
        best_params: Optional[Dict] = None
        best_metric = -np.inf
        # Evaluate every parameter combination
        param_names = list(param_grid.keys())
        param_values = list(param_grid.values())
        for values in product(*param_values):
            params = dict(zip(param_names, values))
            results = strategy_func(data, **params)
            if results["metrics"][metric] > best_metric:
                best_metric = results["metrics"][metric]
                best_params = params
        return best_params, best_metric

    def _combine_equity_curves(
        self,
        results: List[Dict],
    ) -> pd.Series:
        """Stitch together equity curves from all test periods."""
        return pd.concat([r["equity"] for r in results])
```
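The inner grid search is just exhaustive enumeration over the parameter grid. A standalone sketch of the same idea, where the scoring function is a stand-in for a real backtest run:

```python
from itertools import product

param_grid = {"lookback": [10, 20, 50], "threshold": [0.0, 0.01]}

def score(lookback: int, threshold: float) -> float:
    # Stand-in objective; a real run would backtest and return e.g. a Sharpe ratio
    return -abs(lookback - 20) - threshold

best_params, best_score = None, float("-inf")
for values in product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    s = score(**params)
    if s > best_score:
        best_score, best_params = s, params

assert best_params == {"lookback": 20, "threshold": 0.0}
```

Note the combinatorial cost: the grid evaluates `len(lookbacks) * len(thresholds)` runs per training window, which is one reason to keep the parameter count small.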
Pattern 4: Monte Carlo Analysis
```python
from typing import Dict, Optional, Sequence

import numpy as np
import pandas as pd


class MonteCarloAnalyzer:
    """Monte Carlo simulation for strategy robustness."""

    def __init__(self, n_simulations: int = 1000, confidence: float = 0.95):
        self.n_simulations = n_simulations
        self.confidence = confidence

    def bootstrap_returns(
        self,
        returns: pd.Series,
        n_periods: Optional[int] = None,
    ) -> np.ndarray:
        """
        Bootstrap simulation by resampling returns.

        Args:
            returns: Historical returns series
            n_periods: Length of each simulation (default: same as input)

        Returns:
            Array of shape (n_simulations, n_periods)
        """
        if n_periods is None:
            n_periods = len(returns)
        simulations = np.zeros((self.n_simulations, n_periods))
        for i in range(self.n_simulations):
            # Resample with replacement
            simulations[i] = np.random.choice(
                returns.values, size=n_periods, replace=True
            )
        return simulations

    def analyze_drawdowns(self, returns: pd.Series) -> Dict[str, float]:
        """Analyze the max-drawdown distribution via simulation."""
        simulations = self.bootstrap_returns(returns)
        max_drawdowns = []
        for sim_returns in simulations:
            equity = (1 + sim_returns).cumprod()
            rolling_max = np.maximum.accumulate(equity)
            drawdowns = (equity - rolling_max) / rolling_max
            max_drawdowns.append(drawdowns.min())
        max_drawdowns = np.array(max_drawdowns)
        return {
            "expected_max_dd": np.mean(max_drawdowns),
            "median_max_dd": np.median(max_drawdowns),
            f"worst_{int(self.confidence * 100)}pct": np.percentile(
                max_drawdowns, (1 - self.confidence) * 100
            ),
            "worst_case": max_drawdowns.min(),
        }

    def probability_of_loss(
        self,
        returns: pd.Series,
        holding_periods: Sequence[int] = (21, 63, 126, 252),
    ) -> Dict[int, float]:
        """Calculate the probability of loss over various holding periods."""
        results = {}
        for period in holding_periods:
            if period > len(returns):
                continue
            simulations = self.bootstrap_returns(returns, period)
            total_returns = (1 + simulations).prod(axis=1) - 1
            results[period] = (total_returns < 0).mean()
        return results

    def confidence_interval(
        self,
        returns: pd.Series,
        periods: int = 252,
    ) -> Dict[str, float]:
        """Calculate a confidence interval for future returns."""
        simulations = self.bootstrap_returns(returns, periods)
        total_returns = (1 + simulations).prod(axis=1) - 1
        lower = (1 - self.confidence) / 2
        upper = 1 - lower
        return {
            "expected": total_returns.mean(),
            "lower_bound": np.percentile(total_returns, lower * 100),
            "upper_bound": np.percentile(total_returns, upper * 100),
            "std": total_returns.std(),
        }
```
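The drawdown bootstrap can be sanity-checked on synthetic returns. A compact standalone version of the same simulation (names and parameters are local to this sketch):

```python
import numpy as np

rng = np.random.default_rng(42)
returns = rng.normal(0.0005, 0.01, 252)  # one synthetic year of daily returns

n_sims, n_periods = 500, len(returns)
max_dds = np.empty(n_sims)
for i in range(n_sims):
    # Resample with replacement, then compute that path's worst drawdown
    sim = rng.choice(returns, size=n_periods, replace=True)
    equity = np.cumprod(1 + sim)
    peak = np.maximum.accumulate(equity)
    max_dds[i] = ((equity - peak) / peak).min()

# Drawdowns are non-positive, and the 5th-percentile tail is at least as bad as the median
assert (max_dds <= 0).all()
assert np.percentile(max_dds, 5) <= np.median(max_dds)
```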
Performance Metrics
```python
from typing import Dict

import numpy as np
import pandas as pd


def calculate_metrics(returns: pd.Series, rf_rate: float = 0.02) -> Dict[str, float]:
    """Calculate comprehensive performance metrics."""
    # Annualization factor (assuming daily returns)
    ann_factor = 252
    # Basic metrics
    total_return = (1 + returns).prod() - 1
    annual_return = (1 + total_return) ** (ann_factor / len(returns)) - 1
    annual_vol = returns.std() * np.sqrt(ann_factor)
    # Risk-adjusted returns
    sharpe = (annual_return - rf_rate) / annual_vol if annual_vol > 0 else 0
    # Sortino (penalizes downside deviation only)
    downside_returns = returns[returns < 0]
    downside_vol = downside_returns.std() * np.sqrt(ann_factor)
    sortino = (annual_return - rf_rate) / downside_vol if downside_vol > 0 else 0
    # Calmar ratio
    equity = (1 + returns).cumprod()
    rolling_max = equity.cummax()
    drawdowns = (equity - rolling_max) / rolling_max
    max_drawdown = drawdowns.min()
    calmar = annual_return / abs(max_drawdown) if max_drawdown != 0 else 0
    # Win rate and profit factor
    wins = returns[returns > 0]
    losses = returns[returns < 0]
    active = returns[returns != 0]
    win_rate = len(wins) / len(active) if len(active) > 0 else 0
    profit_factor = wins.sum() / abs(losses.sum()) if losses.sum() != 0 else np.inf
    return {
        "total_return": total_return,
        "annual_return": annual_return,
        "annual_volatility": annual_vol,
        "sharpe_ratio": sharpe,
        "sortino_ratio": sortino,
        "calmar_ratio": calmar,
        "max_drawdown": max_drawdown,
        "win_rate": win_rate,
        "profit_factor": profit_factor,
        "num_trades": int((returns != 0).sum()),
    }
```
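The drawdown block is the easiest place to slip in a sign error. A worked check on an equity path with a known 25% peak-to-trough drop:

```python
import pandas as pd

equity = pd.Series([100.0, 120.0, 90.0, 110.0, 130.0])
rolling_max = equity.cummax()  # running peak: 100, 120, 120, 120, 130
drawdown = (equity - rolling_max) / rolling_max

# Worst point: 90 after a peak of 120 -> (90 - 120) / 120 = -25%
assert drawdown.min() == -0.25
```

Max drawdown comes out negative by construction here, which is why the Calmar ratio above divides by `abs(max_drawdown)`.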
Best Practices
Do's
- Use point-in-time data - Avoid look-ahead bias
- Include transaction costs - Realistic estimates
- Test out-of-sample - Always reserve data
- Use walk-forward - Not just train/test
- Monte Carlo analysis - Understand uncertainty
Don'ts
- Don't overfit - Limit parameters
- Don't ignore survivorship - Include delisted
- Don't use adjusted data carelessly - Understand adjustments
- Don't optimize on full history - Reserve test set
- Don't ignore capacity - Market impact matters