adaptive-wfo-epoch


Adaptive Walk-Forward Epoch Selection (AWFES)


Machine-readable reference for adaptive epoch selection within Walk-Forward Optimization (WFO). Optimizes training epochs per-fold using Walk-Forward Efficiency (WFE) as the objective.

When to Use This Skill


Use this skill when:
  • Selecting optimal training epochs for ML models in WFO
  • Avoiding overfitting via Walk-Forward Efficiency metrics
  • Implementing per-fold adaptive epoch selection
  • Computing efficient frontiers for epoch-performance trade-offs
  • Carrying epoch priors across WFO folds

Quick Start


```python
from adaptive_wfo_epoch import AWFESConfig, compute_efficient_frontier
```

Generate epoch candidates from search bounds and granularity:

```python
config = AWFESConfig.from_search_space(
    min_epoch=100,
    max_epoch=2000,
    granularity=5,  # Number of frontier points
)
# config.epoch_configs → [100, 211, 447, 946, 2000] (log-spaced)
```

Per-fold epoch sweep

```python
for fold in wfo_folds:
    epoch_metrics = []
    for epoch in config.epoch_configs:
        is_sharpe, oos_sharpe = train_and_evaluate(fold, epochs=epoch)
        wfe = config.compute_wfe(is_sharpe, oos_sharpe, n_samples=len(fold.train))
        epoch_metrics.append({"epoch": epoch, "wfe": wfe, "is_sharpe": is_sharpe})

    # Select from efficient frontier
    selected_epoch = compute_efficient_frontier(epoch_metrics)

    # Carry forward to next fold as prior
    prior_epoch = selected_epoch
```

Methodology Overview


What This Is


Per-fold adaptive epoch selection where:
  1. Train models across a range of epochs (e.g., 400, 800, 1000, 2000)
  2. Compute WFE = OOS_Sharpe / IS_Sharpe for each epoch count
  3. Find the "efficient frontier" - epochs maximizing WFE vs training cost
  4. Select optimal epoch from frontier for OOS evaluation
  5. Carry forward as prior for next fold

What This Is NOT


  • NOT early stopping: Early stopping monitors validation loss continuously; this evaluates discrete candidates post-hoc
  • NOT Bayesian optimization: No surrogate model; direct evaluation of all candidates
  • NOT nested cross-validation: Uses temporal WFO, not shuffled splits

Academic Foundations


| Concept | Citation | Key Insight |
|---|---|---|
| Walk-Forward Efficiency | Pardo (1992, 2008) | WFE = OOS_Return / IS_Return as a robustness metric |
| Deflated Sharpe Ratio | Bailey & López de Prado (2014) | Adjusts for multiple testing |
| Pareto-Optimal HP Selection | Bischl et al. (2023) | Multi-objective hyperparameter optimization |
| Warm-Starting | Nomura & Ono (2021) | Transfers knowledge between optimization runs |

See references/academic-foundations.md for the full literature review.

Core Formula: Walk-Forward Efficiency


```python
def compute_wfe(
    is_sharpe: float,
    oos_sharpe: float,
    n_samples: int | None = None,
) -> float | None:
    """Walk-Forward Efficiency - measures performance transfer.

    WFE = OOS_Sharpe / IS_Sharpe

    Interpretation (guidelines, not hard thresholds):
    - WFE ≥ 0.70: Excellent transfer (low overfitting)
    - WFE 0.50-0.70: Good transfer
    - WFE 0.30-0.50: Moderate transfer (investigate)
    - WFE < 0.30: Severe overfitting (likely reject)

    The IS_Sharpe minimum is derived from signal-to-noise ratio,
    not a fixed magic number. See compute_is_sharpe_threshold().

    Reference: Pardo (2008) "The Evaluation and Optimization of Trading Strategies"
    """
    # Data-driven threshold: IS_Sharpe must exceed 2σ noise floor
    min_is_sharpe = compute_is_sharpe_threshold(n_samples) if n_samples else 0.1

    if abs(is_sharpe) < min_is_sharpe:
        return None
    return oos_sharpe / is_sharpe
```
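A quick numeric check of the interpretation bands above. This is a minimal standalone sketch: `compute_is_sharpe_threshold` is inlined here with the 2/√n rule described later in this document, so the snippet runs on its own.

```python
import math

def compute_is_sharpe_threshold(n_samples=None):
    # 2σ noise floor for the Sharpe ratio; the 0.1 fallback assumes ~400 samples
    if n_samples is None or n_samples < 10:
        return 0.1
    return 2.0 / math.sqrt(n_samples)

def compute_wfe(is_sharpe, oos_sharpe, n_samples=None):
    min_is = compute_is_sharpe_threshold(n_samples) if n_samples else 0.1
    if abs(is_sharpe) < min_is:
        return None
    return oos_sharpe / is_sharpe

print(round(compute_wfe(1.5, 1.2, n_samples=400), 2))  # 0.8 → excellent transfer
print(round(compute_wfe(2.0, 0.4, n_samples=400), 2))  # 0.2 → severe overfitting
print(compute_wfe(0.05, 0.3, n_samples=400))           # None → IS_Sharpe below noise floor
```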

Principled Configuration Framework


All parameters in AWFES are derived from first principles or data characteristics, not arbitrary magic numbers.

AWFESConfig: Unified Configuration


```python
from dataclasses import dataclass, field
from typing import Literal
import numpy as np

@dataclass
class AWFESConfig:
    """AWFES configuration with principled parameter derivation.

    No magic numbers - all values derived from search space or data.
    """
    # Search space bounds (user-specified)
    min_epoch: int
    max_epoch: int
    granularity: int  # Number of frontier points

    # Derived automatically
    epoch_configs: list[int] = field(init=False)
    prior_variance: float = field(init=False)
    observation_variance: float = field(init=False)

    # Market context for annualization
    # crypto_session_filtered: Use when data is filtered to London-NY weekday hours
    market_type: Literal["crypto_24_7", "crypto_session_filtered", "equity", "forex"] = "crypto_24_7"
    time_unit: Literal["bar", "daily", "weekly"] = "weekly"

    def __post_init__(self):
        # Generate epoch configs with log spacing (optimal for frontier discovery)
        self.epoch_configs = self._generate_epoch_configs()

        # Derive Bayesian variances from search space
        self.prior_variance, self.observation_variance = self._derive_variances()

    def _generate_epoch_configs(self) -> list[int]:
        """Generate epoch candidates with log spacing.

        Log spacing is optimal for efficient frontier because:
        1. Early epochs: small changes matter more (underfit → fit transition)
        2. Late epochs: diminishing returns (already near convergence)
        3. Uniform coverage of the WFE vs cost trade-off space

        Formula: epoch_i = min × (max/min)^(i/(n-1))
        """
        if self.granularity < 2:
            return [self.min_epoch]

        log_min = np.log(self.min_epoch)
        log_max = np.log(self.max_epoch)
        log_epochs = np.linspace(log_min, log_max, self.granularity)

        return sorted(set(int(round(np.exp(e))) for e in log_epochs))

    def _derive_variances(self) -> tuple[float, float]:
        """Derive Bayesian variances from search space.

        Principle: Prior should span the search space with ~95% coverage.

        For Normal distribution: 95% CI = mean ± 1.96σ
        If we want 95% of prior mass in [min_epoch, max_epoch]:
            range = max - min = 2 × 1.96 × σ = 3.92σ
            σ = range / 3.92
            σ² = (range / 3.92)²

        Observation variance: Set to achieve reasonable learning rate.
        Rule: observation_variance ≈ prior_variance / 4
        This means each observation updates the posterior meaningfully
        but doesn't dominate the prior immediately.
        """
        epoch_range = self.max_epoch - self.min_epoch
        prior_std = epoch_range / 3.92  # 95% CI spans search space
        prior_variance = prior_std ** 2

        # Observation variance: 1/4 of prior for balanced learning
        # This gives ~0.2 weight to each new observation initially
        observation_variance = prior_variance / 4

        return prior_variance, observation_variance

    @classmethod
    def from_search_space(
        cls,
        min_epoch: int,
        max_epoch: int,
        granularity: int = 5,
        market_type: str = "crypto_24_7",
    ) -> "AWFESConfig":
        """Create config from search space bounds."""
        return cls(
            min_epoch=min_epoch,
            max_epoch=max_epoch,
            granularity=granularity,
            market_type=market_type,
        )

    def compute_wfe(
        self,
        is_sharpe: float,
        oos_sharpe: float,
        n_samples: int | None = None,
    ) -> float | None:
        """Compute WFE with data-driven IS_Sharpe threshold."""
        min_is = compute_is_sharpe_threshold(n_samples) if n_samples else 0.1
        if abs(is_sharpe) < min_is:
            return None
        return oos_sharpe / is_sharpe

    def get_annualization_factor(self) -> float:
        """Get annualization factor to scale Sharpe from time_unit to ANNUAL.

        IMPORTANT: This returns sqrt(periods_per_year) for scaling to ANNUAL Sharpe.
        For daily-to-weekly scaling, use get_daily_to_weekly_factor() instead.

        Principled derivation:
        - Sharpe scales with √(periods per year)
        - Crypto 24/7: 365 days/year, 52.14 weeks/year
        - Crypto session-filtered: 252 days/year (like equity)
        - Equity: 252 trading days/year, ~52 weeks/year
        - Forex: ~252 days/year (varies by pair)
        """
        PERIODS_PER_YEAR = {
            ("crypto_24_7", "daily"): 365,
            ("crypto_24_7", "weekly"): 52.14,
            ("crypto_24_7", "bar"): None,  # Cannot annualize bars directly
            ("crypto_session_filtered", "daily"): 252,  # London-NY weekdays only
            ("crypto_session_filtered", "weekly"): 52,
            ("equity", "daily"): 252,
            ("equity", "weekly"): 52,
            ("forex", "daily"): 252,
        }

        key = (self.market_type, self.time_unit)
        periods = PERIODS_PER_YEAR.get(key)

        if periods is None:
            raise ValueError(
                f"Cannot annualize {self.time_unit} for {self.market_type}. "
                "Use daily or weekly aggregation first."
            )

        return np.sqrt(periods)

    def get_daily_to_weekly_factor(self) -> float:
        """Get factor to scale DAILY Sharpe to WEEKLY Sharpe.

        This is different from get_annualization_factor()!
        - Daily → Weekly: sqrt(days_per_week)
        - Daily → Annual: sqrt(days_per_year)  (use get_annualization_factor)

        Market-specific:
        - Crypto 24/7: sqrt(7) = 2.65 (7 trading days/week)
        - Crypto session-filtered: sqrt(5) = 2.24 (weekdays only)
        - Equity: sqrt(5) = 2.24 (5 trading days/week)
        """
        DAYS_PER_WEEK = {
            "crypto_24_7": 7,
            "crypto_session_filtered": 5,  # London-NY weekdays only
            "equity": 5,
            "forex": 5,
        }

        days = DAYS_PER_WEEK.get(self.market_type)
        if days is None:
            raise ValueError(f"Unknown market type: {self.market_type}")

        return np.sqrt(days)
```
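As a sanity check, the log-spacing rule epoch_i = min × (max/min)^(i/(n-1)) can be reproduced standalone. This is a sketch of the `_generate_epoch_configs` logic above, lifted out of the class:

```python
import numpy as np

def generate_epoch_configs(min_epoch, max_epoch, granularity):
    # Log-spaced candidates, deduplicated after rounding to integers
    if granularity < 2:
        return [min_epoch]
    log_epochs = np.linspace(np.log(min_epoch), np.log(max_epoch), granularity)
    return sorted(set(int(round(np.exp(e))) for e in log_epochs))

print(generate_epoch_configs(100, 2000, 5))  # [100, 211, 447, 946, 2000]
print(generate_epoch_configs(100, 2000, 1))  # [100]
```

Note the ratio between consecutive candidates is constant, (2000/100)^(1/4) ≈ 2.11, which is what gives dense coverage at low epoch counts and sparse coverage near convergence.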

IS_Sharpe Threshold: Signal-to-Noise Derivation


```python
def compute_is_sharpe_threshold(n_samples: int | None = None) -> float:
    """Compute minimum IS_Sharpe threshold from signal-to-noise ratio.

    Principle: IS_Sharpe must be statistically distinguishable from zero.

    Under null hypothesis (no skill), Sharpe ~ N(0, 1/√n).
    To reject null at α=0.05 (one-sided), need Sharpe > 1.645/√n.

    For practical use, we use 2σ threshold (≈97.7% confidence):
        threshold = 2.0 / √n

    This adapts to sample size:
    - n=100: threshold ≈ 0.20
    - n=400: threshold ≈ 0.10
    - n=1600: threshold ≈ 0.05

    Fallback for unknown n: 0.1 (assumes n≈400, typical fold size)

    Rationale for 0.1 fallback:
    - 2/√400 = 0.1, so 0.1 assumes ~400 samples per fold
    - This is conservative: 400 samples is typical for weekly folds
    - If actual n is smaller, threshold is looser (accepts more noise)
    - If actual n is larger, threshold is tighter (fine, we're conservative)
    - The 0.1 value also corresponds to "not statistically distinguishable
      from zero at reasonable sample sizes" - a natural floor for Sharpe SE
    """
    if n_samples is None or n_samples < 10:
        # Conservative fallback: 0.1 assumes ~400 samples (typical fold size)
        # Derivation: 2/√400 = 0.1; see rationale above
        return 0.1

    return 2.0 / np.sqrt(n_samples)
```
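The sample-size scaling in the docstring can be verified directly. A standalone sketch (stdlib `math` in place of numpy, since the inputs are scalars):

```python
import math

def compute_is_sharpe_threshold(n_samples=None):
    # 2σ noise floor; fallback 0.1 corresponds to 2/sqrt(400)
    if n_samples is None or n_samples < 10:
        return 0.1
    return 2.0 / math.sqrt(n_samples)

for n in (100, 400, 1600):
    print(n, compute_is_sharpe_threshold(n))  # 0.2, 0.1, 0.05
print(compute_is_sharpe_threshold(None))      # 0.1 fallback
```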

Guardrails (Principled Guidelines)


G1: WFE Thresholds


The traditional thresholds (0.30, 0.50, 0.70) are guidelines based on practitioner consensus, not derived from first principles. They represent:
| Threshold | Meaning | Statistical Basis |
|---|---|---|
| 0.30 | Hard reject | Retaining <30% of IS performance is almost certainly noise |
| 0.50 | Warning | At 50%, half the signal is lost; investigate |
| 0.70 | Target | Industry standard for "good" transfer |

```python
# These are GUIDELINES, not hard rules
# Adjust based on your domain and risk tolerance
WFE_THRESHOLDS = {
    "hard_reject": 0.30,  # Below this: almost certainly overfitting
    "warning": 0.50,      # Below this: significant signal loss
    "target": 0.70,       # Above this: good generalization
}

def classify_wfe(wfe: float | None) -> str:
    """Classify WFE with principled thresholds."""
    if wfe is None:
        return "INVALID"  # IS_Sharpe below noise floor
    if wfe < WFE_THRESHOLDS["hard_reject"]:
        return "REJECT"
    if wfe < WFE_THRESHOLDS["warning"]:
        return "INVESTIGATE"
    if wfe < WFE_THRESHOLDS["target"]:
        return "ACCEPTABLE"
    return "EXCELLENT"
```
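Exercising the classifier with one value from each band, restated minimally so the snippet runs standalone:

```python
WFE_THRESHOLDS = {"hard_reject": 0.30, "warning": 0.50, "target": 0.70}

def classify_wfe(wfe):
    """Map a WFE value to its guideline band."""
    if wfe is None:
        return "INVALID"  # IS_Sharpe below noise floor
    if wfe < WFE_THRESHOLDS["hard_reject"]:
        return "REJECT"
    if wfe < WFE_THRESHOLDS["warning"]:
        return "INVESTIGATE"
    if wfe < WFE_THRESHOLDS["target"]:
        return "ACCEPTABLE"
    return "EXCELLENT"

print([classify_wfe(w) for w in (None, 0.2, 0.4, 0.6, 0.8)])
# ['INVALID', 'REJECT', 'INVESTIGATE', 'ACCEPTABLE', 'EXCELLENT']
```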

G2: IS_Sharpe Minimum (Data-Driven)


**OLD (magic number):**

```python
# WRONG: Fixed threshold regardless of sample size
if is_sharpe < 1.0:
    wfe = None
```

**NEW (principled):**

```python
# CORRECT: Threshold adapts to sample size
min_is_sharpe = compute_is_sharpe_threshold(n_samples)
if is_sharpe < min_is_sharpe:
    wfe = None  # Below noise floor for this sample size
```

The threshold derives from the standard error of the Sharpe ratio: SE(SR) ≈ 1/√n.

**Note on the SE(Sharpe) approximation**: The formula `1/√n` is a first-order approximation valid when SR is small (close to 0). The full Lo (2002) formula is:

SE(SR) = √((1 + 0.5×SR²) / n)

For high-Sharpe strategies (SR > 1.0), the simplified formula underestimates SE by roughly 25-50%. Use the full formula when evaluating strategies with SR > 1.0.
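To make the note concrete, here is a sketch comparing the first-order approximation against the full Lo (2002) formula at a hypothetical fold size of n = 400:

```python
import math

def sharpe_se_simple(n):
    # First-order approximation: valid when SR is near 0
    return 1.0 / math.sqrt(n)

def sharpe_se_lo(sr, n):
    # Lo (2002): SE(SR) = sqrt((1 + 0.5 * SR^2) / n)
    return math.sqrt((1 + 0.5 * sr ** 2) / n)

n = 400
for sr in (0.2, 1.0, 2.0):
    simple, full = sharpe_se_simple(n), sharpe_se_lo(sr, n)
    print(f"SR={sr}: simple={simple:.4f}, Lo={full:.4f}, ratio={full / simple:.2f}")
```

At SR = 1.0 the ratio is √1.5 ≈ 1.22, i.e. the simplified formula already understates the SE by over 20%, and the gap widens quickly for higher Sharpe values.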

G3: Stability Penalty for Epoch Changes (Adaptive)


The stability penalty prevents hyperparameter churn. Instead of fixed thresholds, use relative improvement based on WFE variance:
```python
def compute_stability_threshold(wfe_history: list[float]) -> float:
    """Compute stability threshold from observed WFE variance.

    Principle: Require improvement exceeding noise level.

    If WFE has std=0.15 across folds, random fluctuation could be ±0.15.
    To distinguish signal from noise, require improvement > 1σ of WFE.

    Minimum: 5% (prevent switching on negligible improvements)
    Maximum: 20% (don't be overly conservative)
    """
    if len(wfe_history) < 3:
        return 0.10  # Default until enough history

    wfe_std = np.std(wfe_history)
    threshold = max(0.05, min(0.20, wfe_std))
    return threshold


class AdaptiveStabilityPenalty:
    """Stability penalty that adapts to observed WFE variance."""

    def __init__(self):
        self.wfe_history: list[float] = []
        self.epoch_changes: list[int] = []

    def should_change_epoch(
        self,
        current_wfe: float,
        candidate_wfe: float,
        current_epoch: int,
        candidate_epoch: int,
    ) -> bool:
        """Decide whether to change epochs based on adaptive threshold."""
        self.wfe_history.append(current_wfe)

        if current_epoch == candidate_epoch:
            return False  # Same epoch, no change needed

        threshold = compute_stability_threshold(self.wfe_history)
        improvement = (candidate_wfe - current_wfe) / max(abs(current_wfe), 0.01)

        if improvement > threshold:
            self.epoch_changes.append(len(self.wfe_history))
            return True

        return False  # Improvement not significant
```
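A short demonstration of how the threshold tracks observed variance (the function above is restated so the snippet runs standalone):

```python
import numpy as np

def compute_stability_threshold(wfe_history):
    if len(wfe_history) < 3:
        return 0.10  # default until enough history accumulates
    return float(max(0.05, min(0.20, np.std(wfe_history))))

print(compute_stability_threshold([0.6, 0.7]))               # 0.1  (too little history)
print(compute_stability_threshold([0.60, 0.62, 0.61, 0.63])) # 0.05 (quiet WFE: floor applies)
print(compute_stability_threshold([0.2, 0.9, 0.3, 0.8]))     # 0.2  (noisy WFE: ceiling applies)
```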

G4: DSR Adjustment for Epoch Search (Principled)


```python
def adjusted_dsr_for_epoch_search(
    sharpe: float,
    n_folds: int,
    n_epochs: int,
    sharpe_se: float | None = None,
    n_samples_per_fold: int | None = None,
) -> float:
    """Deflated Sharpe Ratio accounting for epoch selection multiplicity.

    When selecting from K epochs, the expected maximum Sharpe under null
    is inflated. This adjustment corrects for that selection bias.

    Principled SE estimation:
    - If n_samples provided: SE(Sharpe) ≈ 1/√n
    - Otherwise: estimate from typical fold size

    Reference: Bailey & López de Prado (2014), Gumbel distribution
    """
    from math import sqrt, log, pi

    n_trials = n_folds * n_epochs  # Total selection events

    if n_trials < 2:
        return sharpe  # No multiple testing correction needed

    # Expected maximum under null (Gumbel distribution)
    # E[max(Z_1, ..., Z_n)] ≈ √(2·ln(n)) - (γ + ln(π/2)) / √(2·ln(n))
    # where γ ≈ 0.5772 is Euler-Mascheroni constant
    euler_gamma = 0.5772156649
    sqrt_2_log_n = sqrt(2 * log(n_trials))
    e_max_z = sqrt_2_log_n - (euler_gamma + log(pi / 2)) / sqrt_2_log_n

    # Estimate Sharpe SE if not provided
    if sharpe_se is None:
        if n_samples_per_fold is not None:
            sharpe_se = 1.0 / sqrt(n_samples_per_fold)
        else:
            # Conservative default: assume ~300 samples per fold
            sharpe_se = 1.0 / sqrt(300)

    # Expected maximum Sharpe under null
    e_max_sharpe = e_max_z * sharpe_se

    # Deflated Sharpe
    return max(0, sharpe - e_max_sharpe)
```

Example: For 5 epochs × 50 folds = 250 trials with 300 samples/fold:
  • sharpe_se ≈ 0.058
  • e_max_z ≈ 3.01
  • e_max_sharpe ≈ 0.17
  • A Sharpe of 1.0 deflates to 0.83 after adjustment.
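The worked example can be reproduced end to end (the function is restated in condensed form so the snippet runs standalone):

```python
from math import sqrt, log, pi

def adjusted_dsr_for_epoch_search(sharpe, n_folds, n_epochs,
                                  sharpe_se=None, n_samples_per_fold=None):
    n_trials = n_folds * n_epochs
    if n_trials < 2:
        return sharpe
    euler_gamma = 0.5772156649
    s = sqrt(2 * log(n_trials))
    e_max_z = s - (euler_gamma + log(pi / 2)) / s          # expected max Z under null
    if sharpe_se is None:
        sharpe_se = 1.0 / sqrt(n_samples_per_fold or 300)  # SE(Sharpe) ≈ 1/sqrt(n)
    return max(0, sharpe - e_max_z * sharpe_se)

dsr = adjusted_dsr_for_epoch_search(1.0, n_folds=50, n_epochs=5, n_samples_per_fold=300)
print(round(dsr, 2))  # 0.83
```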

WFE Aggregation Methods


WARNING: Cauchy Distribution Under Null

Under the null hypothesis (no predictive skill), WFE follows a Cauchy distribution, which has:
  • No defined mean (undefined expectation)
  • No defined variance (infinite)
  • Heavy tails (extreme values are common)

This makes the arithmetic mean unreliable: a single extreme WFE can dominate the average. Always prefer median or pooled methods for robust WFE aggregation. See references/mathematical-formulation.md for the proof that

WFE | H0 ~ Cauchy(0, √(T_IS / T_OOS))
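The practical consequence is easy to see in a quick simulation (a sketch, using standard Cauchy draws as a stand-in for null-hypothesis WFE up to scale): the median is stable while the sample mean is not.

```python
import numpy as np

rng = np.random.default_rng(42)
draws = rng.standard_cauchy(10_000)  # WFE under H0, up to the scale parameter

# The median concentrates near the location parameter (0 here)...
print(float(np.median(draws)))

# ...while the sample mean never converges; a handful of extreme
# draws can pull it arbitrarily far from zero
print(float(np.mean(draws)))
```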

Method 1: Pooled WFE (recommended for precision weighting)


```python
def pooled_wfe(fold_results: list[dict]) -> float:
    """Weights each fold by its sample size (precision).

    Formula: Σ(T_OOS × SR_OOS) / Σ(T_IS × SR_IS)

    Advantage: More stable than arithmetic mean, handles varying fold sizes.
    Use when: Fold sizes vary significantly.
    """
    numerator = sum(r["n_oos"] * r["oos_sharpe"] for r in fold_results)
    denominator = sum(r["n_is"] * r["is_sharpe"] for r in fold_results)

    if denominator < 1e-10:
        return float("nan")
    return numerator / denominator
```

Method 2: Median WFE (Recommended for robustness)


```python
def median_wfe(fold_results: list[dict]) -> float:
    """Robust to outliers, standard in robust statistics.

    Advantage: Single extreme fold doesn't dominate.
    Use when: Suspected outlier folds (regime changes, data issues).
    """
    wfes = [r["wfe"] for r in fold_results if r["wfe"] is not None]
    return float(np.median(wfes)) if wfes else float("nan")
```

Method 3: Weighted Arithmetic Mean


```python
def weighted_mean_wfe(fold_results: list[dict]) -> float:
    """Weights by inverse variance (efficiency weighting).

    Formula: Σ(w_i × WFE_i) / Σ(w_i)
    where w_i = 1 / Var(WFE_i) ≈ n_oos × n_is / (n_oos + n_is)

    Advantage: Optimal when combining estimates of different precision.
    Use when: All folds have similar characteristics.
    """
    weighted_sum = 0.0
    weight_total = 0.0

    for r in fold_results:
        if r["wfe"] is None:
            continue
        weight = r["n_oos"] * r["n_is"] / (r["n_oos"] + r["n_is"] + 1e-10)
        weighted_sum += weight * r["wfe"]
        weight_total += weight

    return weighted_sum / weight_total if weight_total > 0 else float("nan")
```

Aggregation Selection Guide


| Scenario | Recommended Method | Rationale |
|---|---|---|
| Variable fold sizes | Pooled WFE | Weights by precision |
| Suspected outliers | Median WFE | Robust to extremes |
| Homogeneous folds | Weighted mean | Optimal efficiency |
| Reporting | All three | Cross-check consistency |
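The inverse-variance weight in the weighted mean behaves like a harmonic mean of the IS and OOS sample counts, so a fold with twice the data gets twice the weight. A quick standalone check (fold sizes are hypothetical):

```python
def fold_weight(n_is: int, n_oos: int) -> float:
    # w_i = n_oos * n_is / (n_oos + n_is): inverse-variance (precision) weight
    return n_oos * n_is / (n_oos + n_is)

# Doubling both sample counts doubles the fold's weight
print(fold_weight(800, 200), fold_weight(1600, 400))  # 160.0 320.0
```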

Efficient Frontier Algorithm


python
def compute_efficient_frontier(
    epoch_metrics: list[dict],
    wfe_weight: float = 1.0,
    time_weight: float = 0.1,
) -> tuple[list[int], int]:
    """
    Find Pareto-optimal epochs and select best.

    An epoch is on the frontier if no other epoch dominates it
    (better WFE AND lower training time).

    Args:
        epoch_metrics: List of {epoch, wfe, training_time_sec}
        wfe_weight: Weight for WFE in selection (higher = prefer generalization)
        time_weight: Weight for training time (higher = prefer speed)

    Returns:
        (frontier_epochs, selected_epoch)
    """
    import numpy as np

    # Filter valid metrics; fall back to epoch count as a proxy for training time
    valid = [(m["epoch"], m["wfe"], m.get("training_time_sec", m["epoch"]))
             for m in epoch_metrics
             if m["wfe"] is not None and np.isfinite(m["wfe"])]

    if not valid:
        # Fallback: return epoch with best OOS Sharpe
        best_oos = max(epoch_metrics, key=lambda m: m.get("oos_sharpe", 0))
        return ([best_oos["epoch"]], best_oos["epoch"])

    # Pareto dominance check
    frontier = []
    for i, (epoch_i, wfe_i, time_i) in enumerate(valid):
        dominated = False
        for j, (epoch_j, wfe_j, time_j) in enumerate(valid):
            if i == j:
                continue
            # j dominates i if: better/equal WFE AND lower/equal time (strict in at least one)
            if (wfe_j >= wfe_i and time_j <= time_i and
                (wfe_j > wfe_i or time_j < time_i)):
                dominated = True
                break
        if not dominated:
            frontier.append((epoch_i, wfe_i, time_i))

    frontier_epochs = [e for e, _, _ in frontier]

    if len(frontier) == 1:
        return (frontier_epochs, frontier[0][0])

    # Weighted score selection
    wfes = np.array([w for _, w, _ in frontier])
    times = np.array([t for _, _, t in frontier])

    wfe_norm = (wfes - wfes.min()) / (wfes.max() - wfes.min() + 1e-10)
    time_norm = (times.max() - times) / (times.max() - times.min() + 1e-10)

    scores = wfe_weight * wfe_norm + time_weight * time_norm
    best_idx = np.argmax(scores)

    return (frontier_epochs, frontier[best_idx][0])
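The dominance test at the core of the frontier can be exercised standalone. In this hypothetical three-candidate sweep, 945 epochs is dominated by 447 (lower WFE and more training time), so only two points survive:

```python
# (epoch, wfe, training_time_sec) -- hypothetical sweep results
candidates = [(100, 0.42, 10.0), (447, 0.61, 45.0), (945, 0.58, 95.0)]

def dominated(i: int, pts: list[tuple]) -> bool:
    _, wi, ti = pts[i]
    # j dominates i: better/equal WFE AND lower/equal time, strict in at least one
    return any(
        wj >= wi and tj <= ti and (wj > wi or tj < ti)
        for j, (_, wj, tj) in enumerate(pts) if j != i
    )

frontier = [e for i, (e, _, _) in enumerate(candidates) if not dominated(i, candidates)]
print(frontier)  # [100, 447]
```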

Carry-Forward Mechanism


python
class AdaptiveEpochSelector:
    """Maintains epoch selection state across WFO folds with adaptive stability."""

    def __init__(self, epoch_configs: list[int]):
        self.epoch_configs = epoch_configs
        self.selection_history: list[dict] = []
        self.last_selected: int | None = None
        self.stability = AdaptiveStabilityPenalty()  # Use adaptive, not fixed

    def select_epoch(self, epoch_metrics: list[dict]) -> int:
        """Select epoch with adaptive stability penalty for changes."""
        frontier_epochs, candidate = compute_efficient_frontier(epoch_metrics)

        # Apply adaptive stability penalty if changing epochs
        if self.last_selected is not None and candidate != self.last_selected:
            candidate_wfe = next(
                m["wfe"] for m in epoch_metrics if m["epoch"] == candidate
            )
            last_wfe = next(
                (m["wfe"] for m in epoch_metrics if m["epoch"] == self.last_selected),
                0.0
            )

            # Use adaptive threshold derived from WFE variance
            if not self.stability.should_change_epoch(
                last_wfe, candidate_wfe, self.last_selected, candidate
            ):
                candidate = self.last_selected

        # Record and return
        self.selection_history.append({
            "epoch": candidate,
            "frontier": frontier_epochs,
            "changed": candidate != self.last_selected,
        })
        self.last_selected = candidate
        return candidate
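AdaptiveStabilityPenalty is referenced above but defined elsewhere; below is a minimal sketch of one plausible implementation. The class name and should_change_epoch signature come from the selector above, but the internals (scaling the change threshold by recent WFE dispersion, with a fixed fallback) are assumptions for illustration:

```python
import statistics

class AdaptiveStabilityPenalty:
    """Sketch: allow an epoch change only when the WFE gain clears a
    threshold derived from the recent spread of observed WFE values."""

    def __init__(self, k: float = 0.5, fallback_threshold: float = 0.05):
        self.k = k
        self.fallback_threshold = fallback_threshold
        self.wfe_history: list[float] = []

    def should_change_epoch(
        self, last_wfe: float, candidate_wfe: float,
        last_epoch: int, candidate_epoch: int,
    ) -> bool:
        self.wfe_history.append(candidate_wfe)
        if len(self.wfe_history) < 3:
            threshold = self.fallback_threshold  # not enough data for a spread yet
        else:
            threshold = self.k * statistics.stdev(self.wfe_history[-10:])
        return candidate_wfe - last_wfe > threshold

penalty = AdaptiveStabilityPenalty()
print(penalty.should_change_epoch(0.50, 0.52, 447, 945))  # False: gain too small
```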

Anti-Patterns


| Anti-Pattern | Symptom | Fix | Severity |
|---|---|---|---|
| Expanding window (range bars) | Train size grows per fold | Use fixed sliding window | CRITICAL |
| Peak picking | Best epoch always at sweep boundary | Expand range, check for plateau | HIGH |
| Insufficient folds | effective_n < 30 | Increase folds or data span | HIGH |
| Ignoring temporal autocorr | Folds correlated | Use purged CV, gap between folds | HIGH |
| Overfitting to IS | IS >> OOS Sharpe | Reduce epochs, add regularization | HIGH |
| sqrt(252) for crypto | Inflated Sharpe | Use sqrt(365) or sqrt(7) weekly | MEDIUM |
| Single epoch selection | No uncertainty quantification | Report confidence interval | MEDIUM |
| Meta-overfitting | Epoch selection itself overfits | Limit to 3-4 candidates max | HIGH |
CRITICAL: Never use expanding window for range bar ML training. Expanding windows create fold non-equivalence, regime dilution, and systematically bias risk metrics. See references/anti-patterns.md for the full analysis (Section 7).
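The annualization row deserves a number: the calendar factor directly scales the reported Sharpe, so applying the 252-trading-day equity convention to a 24/7 crypto series misstates the result. Quick check with a hypothetical daily Sharpe:

```python
import math

daily_sharpe = 0.08  # hypothetical daily Sharpe of a crypto strategy

# Equity convention (252 trading days) vs 24/7 crypto calendar (365 days)
print(round(daily_sharpe * math.sqrt(252), 2))  # 1.27
print(round(daily_sharpe * math.sqrt(365), 2))  # 1.53
```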

Decision Tree


See references/epoch-selection-decision-tree.md for the full practitioner decision tree.
Start
  ├─ IS_Sharpe > compute_is_sharpe_threshold(n)? ──NO──> Mark WFE invalid, use fallback
  │         │                                            (threshold = 2/√n, adapts to sample size)
  │        YES
  │         │
  ├─ Compute WFE for each epoch
  │         │
  ├─ Any WFE > 0.30? ──NO──> REJECT all epochs (severe overfit)
  │         │                (guideline, not hard threshold)
  │        YES
  │         │
  ├─ Compute efficient frontier
  │         │
  ├─ Apply AdaptiveStabilityPenalty
  │         │ (threshold derived from WFE variance)
  └─> Return selected epoch
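The tree annotates the IS gate as threshold = 2/√n. The function name compute_is_sharpe_threshold appears above, but this exact body is an assumed sketch matching that annotation:

```python
import math

def compute_is_sharpe_threshold(n: int) -> float:
    # ~2-sigma noise floor: Var(Sharpe_hat) ≈ 1/n for a near-zero true Sharpe,
    # so an IS Sharpe below 2/sqrt(n) is indistinguishable from noise
    return 2.0 / math.sqrt(n)

print(round(compute_is_sharpe_threshold(400), 2))  # 0.1
```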

Integration with rangebar-eval-metrics


This skill extends rangebar-eval-metrics:
| Metric Source | Used For | Reference |
|---|---|---|
| sharpe_tw | WFE numerator (OOS) and denominator (IS) | range-bar-metrics.md |
| n_bars | Sample size for aggregation weights | metrics-schema.md |
| psr, dsr | Final acceptance criteria | sharpe-formulas.md |
| prediction_autocorr | Validate model isn't collapsed | ml-prediction-quality.md |
| is_collapsed | Model health check | ml-prediction-quality.md |
| Extended risk metrics | Deep risk analysis (optional) | risk-metrics.md |

Recommended Workflow


  1. Compute base metrics using rangebar-eval-metrics:compute_metrics.py
  2. Feed to AWFES for epoch selection with sharpe_tw as the primary signal
  3. Validate with psr > 0.85 and dsr > 0.50 before deployment
  4. Monitor is_collapsed and prediction_autocorr for model health


OOS Application Phase


Overview


After epoch selection via efficient frontier, apply the selected epochs to held-out test data for final OOS performance metrics. This phase produces "live trading" results that simulate deployment.

Nested WFO Structure


AWFES uses Nested WFO with three data splits per fold:
                    AWFES: Nested WFO Data Split (per fold)

#############     +----------+     +---------+     +----------+     #==========#
# Train 60% # --> | Gap 6% A | --> | Val 20% | --> | Gap 6% B | --> H Test 20% H
#############     +----------+     +---------+     +----------+     #==========#

<details>
<summary>graph-easy source</summary>
graph { label: "AWFES: Nested WFO Data Split (per fold)"; flow: east; }
[ Train 60% ] { border: bold; } [ Gap 6% A ] [ Val 20% ] [ Gap 6% B ] [ Test 20% ] { border: double; }
[ Train 60% ] -> [ Gap 6% A ] [ Gap 6% A ] -> [ Val 20% ] [ Val 20% ] -> [ Gap 6% B ] [ Gap 6% B ] -> [ Test 20% ]

</details>

Per-Fold Workflow


                  AWFES: Per-Fold Workflow

                   -----------------------
                  |      Fold i Data      |
                   -----------------------
                    |
                    v
                  +-----------------------+
                  | Split: Train/Val/Test |
                  +-----------------------+
                    |
                    v
                  +-----------------------+
                  | Epoch Sweep on Train  |
                  +-----------------------+
                    |
                    v
                  +-----------------------+
                  |  Compute WFE on Val   |
                  +-----------------------+
                    |
                    | val optimal
                    v
                  #=======================#
                  H    Bayesian Update    H
                  #=======================#
                    |
                    | smoothed epoch
                    v
                  +-----------------------+
                  |   Train Final Model   |
                  +-----------------------+
                    |
                    v
                  #=======================#
                  H   Evaluate on Test    H
                  #=======================#
                    |
                    v
                   -----------------------
                  |    Fold i Metrics     |
                   -----------------------
<details> <summary>graph-easy source</summary>
graph { label: "AWFES: Per-Fold Workflow"; flow: south; }

[ Fold i Data ] { shape: rounded; }
[ Split: Train/Val/Test ]
[ Epoch Sweep on Train ]
[ Compute WFE on Val ]
[ Bayesian Update ] { border: double; }
[ Train Final Model ]
[ Evaluate on Test ] { border: double; }
[ Fold i Metrics ] { shape: rounded; }

[ Fold i Data ] -> [ Split: Train/Val/Test ]
[ Split: Train/Val/Test ] -> [ Epoch Sweep on Train ]
[ Epoch Sweep on Train ] -> [ Compute WFE on Val ]
[ Compute WFE on Val ] -- val optimal --> [ Bayesian Update ]
[ Bayesian Update ] -- smoothed epoch --> [ Train Final Model ]
[ Train Final Model ] -> [ Evaluate on Test ]
[ Evaluate on Test ] -> [ Fold i Metrics ]
</details>

Bayesian Carry-Forward Across Folds


                                 AWFES: Bayesian Carry-Forward Across Folds

 -------   init   +--------+  posterior   +--------+  posterior   +--------+     +--------+      -----------
| Prior | ------> | Fold 1 | -----------> | Fold 2 | -----------> | Fold 3 | ..> | Fold N | --> | Aggregate |
 -------          +--------+              +--------+              +--------+     +--------+      -----------
<details> <summary>graph-easy source</summary>
graph { label: "AWFES: Bayesian Carry-Forward Across Folds"; flow: east; }

[ Prior ] { shape: rounded; }
[ Fold 1 ]
[ Fold 2 ]
[ Fold 3 ]
[ Fold N ]
[ Aggregate ] { shape: rounded; }

[ Prior ] -- init --> [ Fold 1 ]
[ Fold 1 ] -- posterior --> [ Fold 2 ]
[ Fold 2 ] -- posterior --> [ Fold 3 ]
[ Fold 3 ] ..> [ Fold N ]
[ Fold N ] -> [ Aggregate ]
</details>
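The chain above has a concrete consequence: every fold adds precision, so the posterior variance over the epoch shrinks monotonically. A standalone check with assumed prior and observation variances:

```python
post_var = 40000.0   # assumed prior variance over the epoch
obs_var = 10000.0    # assumed per-fold observation variance

variances = []
for _ in range(4):   # folds 1..4: precisions add, so variance shrinks
    post_var = 1.0 / (1.0 / post_var + 1.0 / obs_var)
    variances.append(round(post_var))
print(variances)  # [8000, 4444, 3077, 2353]
```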

Bayesian Epoch Selection for OOS


Instead of using the current fold's own optimal epoch (which introduces look-ahead bias), apply the Bayesian-smoothed epoch carried forward from prior folds:
python
import numpy as np


class BayesianEpochSelector:
    """Bayesian updating of epoch selection across folds.

    Also known as: BayesianEpochSmoother (alias in epoch-smoothing.md)

    Variance parameters are DERIVED from search space, not hard-coded.
    See AWFESConfig._derive_variances() for the principled derivation.
    """

    def __init__(
        self,
        epoch_configs: list[int],
        prior_mean: float | None = None,
        prior_variance: float | None = None,
        observation_variance: float | None = None,
    ):
        self.epoch_configs = sorted(epoch_configs)

        # PRINCIPLED DERIVATION: Variances from search space
        # If not provided, derive from epoch range
        epoch_range = max(epoch_configs) - min(epoch_configs)

        # Prior spans search space with 95% coverage
        # 95% CI = mean ± 1.96σ → range = 3.92σ → σ² = (range/3.92)²
        default_prior_var = (epoch_range / 3.92) ** 2

        # Observation variance: 1/4 of prior for balanced learning
        default_obs_var = default_prior_var / 4

        # Use explicit None checks so a caller-supplied 0.0 is not silently replaced
        self.posterior_mean = prior_mean if prior_mean is not None else float(np.mean(epoch_configs))
        self.posterior_variance = prior_variance if prior_variance is not None else default_prior_var
        self.observation_variance = observation_variance if observation_variance is not None else default_obs_var
        self.history: list[dict] = []

    def update(self, observed_optimal_epoch: int, wfe: float) -> int:
        """Update posterior with new fold's optimal epoch.

        Uses precision-weighted Bayesian update:
        posterior_mean = (prior_precision * prior_mean + obs_precision * obs) /
                        (prior_precision + obs_precision)

        Args:
            observed_optimal_epoch: Optimal epoch from current fold's validation
            wfe: Walk-Forward Efficiency (used to weight observation)

        Returns:
            Smoothed epoch selection for TEST evaluation
        """
        # Weight observation by WFE (higher WFE = more reliable signal)
        # Clamp WFE to [0.1, 2.0] to prevent extreme weights:
        #   - Lower bound 0.1: Prevents division issues and ensures minimum weight
        #   - Upper bound 2.0: WFE > 2 is suspicious (OOS > 2× IS suggests:
        #       a) Regime shift favoring OOS (lucky timing, not skill)
        #       b) IS severely overfit (artificially low denominator)
        #       c) Data anomaly or look-ahead bias
        #     Capping at 2.0 treats such observations with skepticism
        wfe_clamped = max(0.1, min(wfe, 2.0))
        effective_variance = self.observation_variance / wfe_clamped

        prior_precision = 1.0 / self.posterior_variance
        obs_precision = 1.0 / effective_variance

        # Bayesian update
        new_precision = prior_precision + obs_precision
        new_mean = (
            prior_precision * self.posterior_mean +
            obs_precision * observed_optimal_epoch
        ) / new_precision

        # Record before updating
        self.history.append({
            "observed_epoch": observed_optimal_epoch,
            "wfe": wfe,
            "prior_mean": self.posterior_mean,
            "posterior_mean": new_mean,
            "selected_epoch": self._snap_to_config(new_mean),
        })

        self.posterior_mean = new_mean
        self.posterior_variance = 1.0 / new_precision

        return self._snap_to_config(new_mean)

    def _snap_to_config(self, continuous_epoch: float) -> int:
        """Snap continuous estimate to nearest valid epoch config."""
        return min(self.epoch_configs, key=lambda e: abs(e - continuous_epoch))

    def get_current_epoch(self) -> int:
        """Get current smoothed epoch without updating."""
        return self._snap_to_config(self.posterior_mean)
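The _snap_to_config step maps the continuous posterior mean back onto the discrete sweep grid; the same one-liner works standalone (grid taken from the Quick Start example):

```python
epoch_configs = [100, 211, 447, 945, 2000]  # log-spaced sweep grid

# Nearest grid point to a continuous posterior mean of 612.0
print(min(epoch_configs, key=lambda e: abs(e - 612.0)))  # 447
```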

Application Workflow


python
from typing import Callable

import numpy as np


def apply_awfes_to_test(
    folds: list[Fold],
    model_factory: Callable,
    bayesian_selector: BayesianEpochSelector,
) -> list[dict]:
    """Apply AWFES with Bayesian smoothing to test data.

    Workflow per fold:
    1. Split into train/validation/test (60/20/20)
    2. Sweep epochs on train, compute WFE on validation
    3. Update Bayesian posterior with validation-optimal epoch
    4. Train final model at Bayesian-selected epoch on train+validation
    5. Evaluate on TEST (untouched data)
    """
    results = []

    for fold_idx, fold in enumerate(folds):
        # Step 1: Split data
        train, validation, test = fold.split_nested(
            train_pct=0.60,
            validation_pct=0.20,
            test_pct=0.20,
            embargo_pct=0.06,  # 6% gap at each boundary
        )

        # Step 2: Epoch sweep on train → validate on validation
        epoch_metrics = []
        for epoch in bayesian_selector.epoch_configs:
            model = model_factory()
            model.fit(train.X, train.y, epochs=epoch)

            is_sharpe = compute_sharpe(model.predict(train.X), train.y)
            val_sharpe = compute_sharpe(model.predict(validation.X), validation.y)

            # Use data-driven threshold instead of hardcoded 0.1
            is_threshold = compute_is_sharpe_threshold(len(train.X))
            wfe = val_sharpe / is_sharpe if is_sharpe > is_threshold else None

            epoch_metrics.append({
                "epoch": epoch,
                "is_sharpe": is_sharpe,
                "val_sharpe": val_sharpe,
                "wfe": wfe,
            })

        # Step 3: Find validation-optimal and update Bayesian
        val_optimal = max(
            [m for m in epoch_metrics if m["wfe"] is not None],
            key=lambda m: m["wfe"],
            default={"epoch": bayesian_selector.epoch_configs[0], "wfe": 0.3}
        )
        selected_epoch = bayesian_selector.update(
            val_optimal["epoch"],
            val_optimal["wfe"],
        )

        # Step 4: Train final model on train+validation at selected epoch
        combined_X = np.vstack([train.X, validation.X])
        combined_y = np.hstack([train.y, validation.y])
        final_model = model_factory()
        final_model.fit(combined_X, combined_y, epochs=selected_epoch)

        # Step 5: Evaluate on TEST (untouched)
        test_predictions = final_model.predict(test.X)
        test_metrics = compute_oos_metrics(test_predictions, test.y, test.timestamps)

        results.append({
            "fold_idx": fold_idx,
            "validation_optimal_epoch": val_optimal["epoch"],
            "bayesian_selected_epoch": selected_epoch,
            "test_metrics": test_metrics,
            "epoch_metrics": epoch_metrics,
        })

    return results
See references/oos-application.md for the complete implementation.


Epoch Smoothing Methods


Why Smooth Epoch Selections?


Raw per-fold epoch selections are noisy due to:
  • Limited validation data per fold
  • Regime changes between folds
  • Stochastic training dynamics
Smoothing reduces variance while preserving signal.

Method Comparison


| Method | Formula | Pros | Cons |
|---|---|---|---|
| Bayesian (Recommended) | Precision-weighted update | Principled, handles uncertainty | More complex |
| EMA | α × new + (1-α) × old | Simple, responsive | No uncertainty quantification |
| SMA | Mean of last N | Most stable | Slow to adapt |
| Median | Median of last N | Robust to outliers | Loses magnitude info |

Bayesian Updating (Primary Method)


python
def bayesian_epoch_update(
    prior_mean: float,
    prior_variance: float,
    observed_epoch: int,
    observation_variance: float,
    wfe_weight: float = 1.0,
) -> tuple[float, float]:
    """Single Bayesian update step.

    Mathematical formulation:
    - Prior: N(μ₀, σ₀²)
    - Observation: N(x, σ_obs²/wfe)  # WFE-weighted
    - Posterior: N(μ₁, σ₁²)

    Where:
    μ₁ = (μ₀/σ₀² + x·wfe/σ_obs²) / (1/σ₀² + wfe/σ_obs²)
    σ₁² = 1 / (1/σ₀² + wfe/σ_obs²)
    """
    # Effective observation variance (lower WFE = less reliable)
    eff_obs_var = observation_variance / max(wfe_weight, 0.1)

    prior_precision = 1.0 / prior_variance
    obs_precision = 1.0 / eff_obs_var

    posterior_precision = prior_precision + obs_precision
    posterior_mean = (
        prior_precision * prior_mean + obs_precision * observed_epoch
    ) / posterior_precision
    posterior_variance = 1.0 / posterior_precision

    return posterior_mean, posterior_variance
python
def bayesian_epoch_update(
    prior_mean: float,
    prior_variance: float,
    observed_epoch: int,
    observation_variance: float,
    wfe_weight: float = 1.0,
) -> tuple[float, float]:
    """单次贝叶斯更新步骤。

    数学公式:
    - 先验:N(μ₀, σ₀²)
    - 观测:N(x, σ_obs²/wfe)  # WFE加权
    - 后验:N(μ₁, σ₁²)

    其中:
    μ₁ = (μ₀/σ₀² + x·wfe/σ_obs²) / (1/σ₀² + wfe/σ_obs²)
    σ₁² = 1 / (1/σ₀² + wfe/σ_obs²)
    """
    # 有效观测方差(WFE越低,可靠性越差)
    eff_obs_var = observation_variance / max(wfe_weight, 0.1)

    prior_precision = 1.0 / prior_variance
    obs_precision = 1.0 / eff_obs_var

    posterior_precision = prior_precision + obs_precision
    posterior_mean = (
        prior_precision * prior_mean + obs_precision * observed_epoch
    ) / posterior_precision
    posterior_variance = 1.0 / posterior_precision

    return posterior_mean, posterior_variance
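A worked sequence makes the precision weighting concrete. In the sketch below the update function is repeated so the fragment runs standalone, and the fold observations and WFE weights are made up for illustration: starting from a wide midpoint prior over [100, 2000], three validation optima near 400 pull the posterior toward them, with the low-WFE fold contributing less.

```python
def bayesian_epoch_update(prior_mean, prior_variance, observed_epoch,
                          observation_variance, wfe_weight=1.0):
    # Same precision-weighted Gaussian update as defined above
    eff_obs_var = observation_variance / max(wfe_weight, 0.1)
    prior_precision = 1.0 / prior_variance
    obs_precision = 1.0 / eff_obs_var
    posterior_precision = prior_precision + obs_precision
    posterior_mean = (prior_precision * prior_mean
                      + obs_precision * observed_epoch) / posterior_precision
    return posterior_mean, 1.0 / posterior_precision

# Wide midpoint prior over a [100, 2000] search space
mean, var = 1050.0, ((2000 - 100) / 3.92) ** 2

# Hypothetical validation optima near 400; the middle fold's low WFE
# makes its observation count for less
for obs, wfe in [(450, 0.8), (380, 0.3), (420, 0.9)]:
    mean, var = bayesian_epoch_update(mean, var, obs, var / 4, wfe_weight=wfe)

# Posterior mean has migrated from 1050 toward the 380-450 cluster,
# and the variance has shrunk at every step
```

Each update adds the observation's precision to the prior's, so the posterior variance shrinks monotonically; a fold with WFE 0.3 moves the mean far less than one with WFE 0.9.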

Exponential Moving Average (Alternative)

指数移动平均(替代方法)

python
def ema_epoch_update(
    current_ema: float,
    observed_epoch: int,
    alpha: float = 0.3,
) -> float:
    """EMA update: more weight on recent observations.

    α = 0.3 means ~90% of signal from last 7 folds.
    α = 0.5 means ~90% of signal from last 4 folds.
    """
    return alpha * observed_epoch + (1 - alpha) * current_ema
python
def ema_epoch_update(
    current_ema: float,
    observed_epoch: int,
    alpha: float = 0.3,
) -> float:
    """EMA更新:给最近观测值更高权重。

    α = 0.3表示~90%的信号来自最近7个折。
    α = 0.5表示~90%的信号来自最近4个折。
    """
    return alpha * observed_epoch + (1 - alpha) * current_ema
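The half-life figures in the docstring can be checked directly: EMA weights form a geometric series, so the fraction of total weight carried by the most recent k observations is 1 − (1−α)^k.

```python
def ema_weight_fraction(alpha: float, k: int) -> float:
    # Weights are alpha * (1 - alpha)**i for i = 0..k-1; they sum to 1 - (1-alpha)**k
    return 1.0 - (1.0 - alpha) ** k

print(round(ema_weight_fraction(0.3, 7), 3))  # 0.918 → "~90% from last 7 folds"
print(round(ema_weight_fraction(0.5, 4), 3))  # 0.938 → "~90% from last 4 folds"
```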

Initialization Strategies

初始化策略

| Strategy | When to Use | Implementation |
| --- | --- | --- |
| Midpoint prior | No domain knowledge | mean(epoch_configs) |
| Literature prior | Published optimal exists | Known optimal ± uncertainty |
| Burn-in | Sufficient data | Use first N folds for initialization |
| 策略 | 使用场景 | 实现方式 |
| --- | --- | --- |
| 中点先验 | 无领域知识 | mean(epoch_configs) |
| 文献先验 | 存在已发表的最优值 | 已知最优值 ± 不确定性 |
| 预热期 | 数据充足 | 使用前N个折进行初始化 |

python
# RECOMMENDED: Use AWFESConfig for principled derivation
config = AWFESConfig.from_search_space(
    min_epoch=80,
    max_epoch=400,
    granularity=5,
)
# prior_variance = ((400 - 80) / 3.92)**2 ≈ 6,664 (derived automatically)
# observation_variance = prior_variance / 4 ≈ 1,666 (derived automatically)

# Alternative strategies (if manual configuration needed):

# Strategy 1: Search-space derived (same as AWFESConfig)
epoch_range = max(EPOCH_CONFIGS) - min(EPOCH_CONFIGS)
prior_mean = np.mean(EPOCH_CONFIGS)
prior_variance = (epoch_range / 3.92) ** 2  # 95% CI spans search space

# Strategy 2: Burn-in (use first 5 folds)
burn_in_optima = [run_fold_sweep(fold) for fold in folds[:5]]
prior_mean = np.mean(burn_in_optima)
base_variance = (epoch_range / 3.92) ** 2 / 4  # Reduced after burn-in
prior_variance = max(np.var(burn_in_optima), base_variance)

See [references/epoch-smoothing.md](./references/epoch-smoothing.md) for extended analysis.

---
python
# 推荐:使用AWFESConfig进行原则性推导
config = AWFESConfig.from_search_space(
    min_epoch=80,
    max_epoch=400,
    granularity=5,
)
# prior_variance = ((400 - 80) / 3.92)**2 ≈ 6,664(自动推导)
# observation_variance = prior_variance / 4 ≈ 1,666(自动推导)

# 替代策略(如需手动配置):

# 策略1:从搜索空间推导(与AWFESConfig相同)
epoch_range = max(EPOCH_CONFIGS) - min(EPOCH_CONFIGS)
prior_mean = np.mean(EPOCH_CONFIGS)
prior_variance = (epoch_range / 3.92) ** 2  # 95%置信区间覆盖搜索空间

# 策略2:预热期(使用前5个折)
burn_in_optima = [run_fold_sweep(fold) for fold in folds[:5]]
prior_mean = np.mean(burn_in_optima)
base_variance = (epoch_range / 3.92) ** 2 / 4  # 预热期后减小
prior_variance = max(np.var(burn_in_optima), base_variance)

扩展分析请参见[references/epoch-smoothing.md](./references/epoch-smoothing.md)。

---
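As a sanity check on the derived constants: with a search space of [80, 400], setting the 95% interval (±1.96σ, i.e. a span of 3.92σ) equal to the range gives the prior variance, and quartering it gives the default observation variance.

```python
min_epoch, max_epoch = 80, 400
epoch_range = max_epoch - min_epoch  # 320

# 95% of a normal lies within ±1.96 sigma, so sigma = range / 3.92
prior_variance = (epoch_range / 3.92) ** 2       # ≈ 6,664
observation_variance = prior_variance / 4        # ≈ 1,666

print(round(prior_variance, 1), round(observation_variance, 1))
```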

OOS Metrics Specification

OOS指标规范

Metric Tiers for Test Evaluation

测试评估的指标层级

Following rangebar-eval-metrics, compute these metrics on TEST data.
CRITICAL for Range Bars: Use time-weighted Sharpe (`sharpe_tw`) instead of simple bar Sharpe. See range-bar-metrics.md for the canonical implementation. The metrics below assume time-weighted computation for range bar data.
遵循rangebar-eval-metrics,在TEST数据上计算这些指标。
范围bar的关键注意事项:使用时间加权Sharpe(`sharpe_tw`)而非简单bar Sharpe。标准实现请参见range-bar-metrics.md。以下指标假设对范围bar数据使用时间加权计算。

Tier 1: Primary Metrics (Mandatory)

层级1:核心指标(必填)

| Metric | Formula | Threshold | Purpose |
| --- | --- | --- | --- |
| sharpe_tw | Time-weighted (see range-bar-metrics.md) | > 0 | Core performance |
| hit_rate | n_correct_sign / n_total | > 0.50 | Directional accuracy |
| cumulative_pnl | Σ(pred × actual) | > 0 | Total return |
| positive_sharpe_folds | n_folds(sharpe_tw > 0) / n_folds | > 0.55 | Consistency |
| wfe_test | test_sharpe_tw / validation_sharpe_tw | > 0.30 | Final transfer |
| 指标 | 公式 | 阈值 | 用途 |
| --- | --- | --- | --- |
| sharpe_tw | 时间加权(参见range-bar-metrics.md) | > 0 | 核心性能 |
| hit_rate | n_correct_sign / n_total | > 0.50 | 方向准确率 |
| cumulative_pnl | Σ(pred × actual) | > 0 | 总收益 |
| positive_sharpe_folds | n_folds(sharpe_tw > 0) / n_folds | > 0.55 | 一致性 |
| wfe_test | test_sharpe_tw / validation_sharpe_tw | > 0.30 | 最终迁移能力 |

Tier 2: Risk Metrics

层级2:风险指标

| Metric | Formula | Threshold | Purpose |
| --- | --- | --- | --- |
| max_drawdown | max(peak - trough) / peak | < 0.30 | Worst loss |
| calmar_ratio | annual_return / max_drawdown | > 0.5 | Risk-adjusted |
| profit_factor | gross_profit / gross_loss | > 1.0 | Win/loss ratio |
| cvar_10pct | mean(worst 10% returns) | > -0.05 | Tail risk |
| 指标 | 公式 | 阈值 | 用途 |
| --- | --- | --- | --- |
| max_drawdown | max(peak - trough) / peak | < 0.30 | 最大回撤 |
| calmar_ratio | annual_return / max_drawdown | > 0.5 | 风险调整后收益 |
| profit_factor | gross_profit / gross_loss | > 1.0 | 盈亏比 |
| cvar_10pct | mean(worst 10% returns) | > -0.05 | 尾部风险 |

Tier 3: Statistical Validation

层级3:统计验证

| Metric | Formula | Threshold | Purpose |
| --- | --- | --- | --- |
| psr | P(true_sharpe > 0) | > 0.85 | Statistical significance |
| dsr | sharpe - E[max_sharpe_null] | > 0.50 | Multiple testing adjusted |
| binomial_pvalue | binom.test(n_positive, n_total) | < 0.05 | Sign test |
| hac_ttest_pvalue | HAC-adjusted t-test | < 0.05 | Autocorrelation robust |
| 指标 | 公式 | 阈值 | 用途 |
| --- | --- | --- | --- |
| psr | P(true_sharpe > 0) | > 0.85 | 统计显著性 |
| dsr | sharpe - E[max_sharpe_null] | > 0.50 | 多重检验调整 |
| binomial_pvalue | binom.test(n_positive, n_total) | < 0.05 | 符号检验 |
| hac_ttest_pvalue | HAC调整t检验 | < 0.05 | 自相关鲁棒检验 |

Metric Computation Code

指标计算代码

python
import numpy as np
from scipy.stats import norm, binomtest  # norm for PSR, binomtest for sign test

def compute_oos_metrics(
    predictions: np.ndarray,
    actuals: np.ndarray,
    timestamps: np.ndarray,
    duration_us: np.ndarray | None = None,  # Required for range bars
    market_type: str = "crypto_24_7",  # For annualization factor
) -> dict[str, float]:
    """Compute full OOS metrics suite for test data.

    Args:
        predictions: Model predictions (signed magnitude)
        actuals: Actual returns
        timestamps: Bar timestamps for daily aggregation
        duration_us: Bar durations in microseconds (REQUIRED for range bars)

    Returns:
        Dictionary with all tier metrics

    IMPORTANT: For range bars, pass duration_us to compute sharpe_tw.
    Simple bar_sharpe violates i.i.d. assumption - see range-bar-metrics.md.
    """
    pnl = predictions * actuals

    # Tier 1: Primary
    # For range bars: Use time-weighted Sharpe (canonical)
    if duration_us is not None:
        from exp066e_tau_precision import compute_time_weighted_sharpe
        sharpe_tw, weighted_std, total_days = compute_time_weighted_sharpe(
            bar_pnl=pnl,
            duration_us=duration_us,
            annualize=True,
        )
    else:
        # Fallback for time bars (all same duration)
        daily_pnl = group_by_day(pnl, timestamps)
        weekly_factor = get_daily_to_weekly_factor(market_type=market_type)
        sharpe_tw = (
            np.mean(daily_pnl) / np.std(daily_pnl) * weekly_factor
            if np.std(daily_pnl) > 1e-10 else 0.0
        )

    hit_rate = np.mean(np.sign(predictions) == np.sign(actuals))
    cumulative_pnl = np.sum(pnl)

    # Tier 2: Risk
    equity_curve = np.cumsum(pnl)
    running_max = np.maximum.accumulate(equity_curve)
    drawdowns = (running_max - equity_curve) / np.maximum(running_max, 1e-10)
    max_drawdown = np.max(drawdowns)

    gross_profit = np.sum(pnl[pnl > 0])
    gross_loss = abs(np.sum(pnl[pnl < 0]))
    profit_factor = gross_profit / gross_loss if gross_loss > 0 else float("inf")

    # CVaR (10%)
    sorted_pnl = np.sort(pnl)
    cvar_cutoff = max(1, int(len(sorted_pnl) * 0.10))
    cvar_10pct = np.mean(sorted_pnl[:cvar_cutoff])

    # Tier 3: Statistical (use sharpe_tw for PSR)
    sharpe_se = 1.0 / np.sqrt(len(pnl)) if len(pnl) > 0 else 1.0
    psr = norm.cdf(sharpe_tw / sharpe_se) if sharpe_se > 0 else 0.5

    n_positive = np.sum(pnl > 0)
    n_total = len(pnl)
    # Use binomtest (binom_test deprecated since scipy 1.10)
    binomial_pvalue = binomtest(n_positive, n_total, 0.5, alternative="greater").pvalue

    return {
        # Tier 1 (use sharpe_tw for range bars)
        "sharpe_tw": sharpe_tw,
        "hit_rate": hit_rate,
        "cumulative_pnl": cumulative_pnl,
        "n_bars": len(pnl),
        # Tier 2
        "max_drawdown": max_drawdown,
        "profit_factor": profit_factor,
        "cvar_10pct": cvar_10pct,
        # Tier 3
        "psr": psr,
        "binomial_pvalue": binomial_pvalue,
    }
python
import numpy as np
from scipy.stats import norm, binomtest  # norm用于PSR,binomtest用于符号检验

def compute_oos_metrics(
    predictions: np.ndarray,
    actuals: np.ndarray,
    timestamps: np.ndarray,
    duration_us: np.ndarray | None = None,  # 范围bar必填
    market_type: str = "crypto_24_7",  # 用于年化因子
) -> dict[str, float]:
    """为测试数据计算完整OOS指标集。

    参数:
        predictions: 模型预测值(带符号幅度)
        actuals: 实际收益
        timestamps: 用于日度聚合的bar时间戳
        duration_us: bar持续时间(微秒,范围bar必填)

    返回:
        包含所有层级指标的字典

    重要提示:对于范围bar,传递duration_us以计算sharpe_tw。
    简单bar_sharpe违反独立同分布假设 - 参见range-bar-metrics.md。
    """
    pnl = predictions * actuals

    # 层级1:核心
    # 范围bar:使用时间加权Sharpe(标准方法)
    if duration_us is not None:
        from exp066e_tau_precision import compute_time_weighted_sharpe
        sharpe_tw, weighted_std, total_days = compute_time_weighted_sharpe(
            bar_pnl=pnl,
            duration_us=duration_us,
            annualize=True,
        )
    else:
        # 时间bar回退(所有bar持续时间相同)
        daily_pnl = group_by_day(pnl, timestamps)
        weekly_factor = get_daily_to_weekly_factor(market_type=market_type)
        sharpe_tw = (
            np.mean(daily_pnl) / np.std(daily_pnl) * weekly_factor
            if np.std(daily_pnl) > 1e-10 else 0.0
        )

    hit_rate = np.mean(np.sign(predictions) == np.sign(actuals))
    cumulative_pnl = np.sum(pnl)

    # 层级2:风险
    equity_curve = np.cumsum(pnl)
    running_max = np.maximum.accumulate(equity_curve)
    drawdowns = (running_max - equity_curve) / np.maximum(running_max, 1e-10)
    max_drawdown = np.max(drawdowns)

    gross_profit = np.sum(pnl[pnl > 0])
    gross_loss = abs(np.sum(pnl[pnl < 0]))
    profit_factor = gross_profit / gross_loss if gross_loss > 0 else float("inf")

    # CVaR(10%)
    sorted_pnl = np.sort(pnl)
    cvar_cutoff = max(1, int(len(sorted_pnl) * 0.10))
    cvar_10pct = np.mean(sorted_pnl[:cvar_cutoff])

    # 层级3:统计(使用sharpe_tw计算PSR)
    sharpe_se = 1.0 / np.sqrt(len(pnl)) if len(pnl) > 0 else 1.0
    psr = norm.cdf(sharpe_tw / sharpe_se) if sharpe_se > 0 else 0.5

    n_positive = np.sum(pnl > 0)
    n_total = len(pnl)
    # 使用binomtest(binom_test自scipy 1.10起已弃用)
    binomial_pvalue = binomtest(n_positive, n_total, 0.5, alternative="greater").pvalue

    return {
        # 层级1(范围bar使用sharpe_tw)
        "sharpe_tw": sharpe_tw,
        "hit_rate": hit_rate,
        "cumulative_pnl": cumulative_pnl,
        "n_bars": len(pnl),
        # 层级2
        "max_drawdown": max_drawdown,
        "profit_factor": profit_factor,
        "cvar_10pct": cvar_10pct,
        # 层级3
        "psr": psr,
        "binomial_pvalue": binomial_pvalue,
    }
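The Tier 2 computations are simple enough to check on a toy PnL series. This standalone fragment uses synthetic numbers, with the drawdown, profit-factor, and CVaR logic copied from `compute_oos_metrics` above:

```python
import numpy as np

# Synthetic per-bar PnL for illustration only
pnl = np.array([0.02, -0.01, 0.03, -0.02, 0.01, -0.005, 0.015, -0.01])

# Max drawdown on the cumulative equity curve
equity = np.cumsum(pnl)
running_max = np.maximum.accumulate(equity)
drawdowns = (running_max - equity) / np.maximum(running_max, 1e-10)
max_drawdown = np.max(drawdowns)            # worst peak-to-trough fraction

# Profit factor: gross wins over gross losses
gross_profit = np.sum(pnl[pnl > 0])         # 0.075
gross_loss = abs(np.sum(pnl[pnl < 0]))      # 0.045
profit_factor = gross_profit / gross_loss   # ≈ 1.67

# CVaR(10%): mean of the worst decile (at least one bar)
sorted_pnl = np.sort(pnl)
cutoff = max(1, int(len(sorted_pnl) * 0.10))
cvar_10pct = np.mean(sorted_pnl[:cutoff])   # worst single bar here: -0.02
```

Note the `max(1, ...)` guard: with only 8 bars, 10% truncates to zero, so CVaR falls back to the single worst bar.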

Aggregation Across Folds

跨折聚合

python
def aggregate_test_metrics(fold_results: list[dict]) -> dict[str, float]:
    """Aggregate test metrics across all folds.

    NOTE: For range bars, use sharpe_tw (time-weighted).
    See range-bar-metrics.md for why simple bar_sharpe is invalid for range bars.
    """
    metrics = [r["test_metrics"] for r in fold_results]

    # Positive Sharpe Folds (use sharpe_tw for range bars)
    sharpes = [m["sharpe_tw"] for m in metrics]
    positive_sharpe_folds = np.mean([s > 0 for s in sharpes])

    # Median for robustness
    median_sharpe_tw = np.median(sharpes)
    median_hit_rate = np.median([m["hit_rate"] for m in metrics])

    # DSR for multiple testing (use time-weighted Sharpe)
    n_trials = len(metrics)
    dsr = compute_dsr(median_sharpe_tw, n_trials)

    return {
        "n_folds": len(metrics),
        "positive_sharpe_folds": positive_sharpe_folds,
        "median_sharpe_tw": median_sharpe_tw,
        "mean_sharpe_tw": np.mean(sharpes),
        "std_sharpe_tw": np.std(sharpes),
        "median_hit_rate": median_hit_rate,
        "dsr": dsr,
        "total_pnl": sum(m["cumulative_pnl"] for m in metrics),
    }
See references/oos-metrics.md for threshold justifications.

python
def aggregate_test_metrics(fold_results: list[dict]) -> dict[str, float]:
    """跨所有折聚合测试指标。

    注意:对于范围bar,使用sharpe_tw(时间加权)。
    参见range-bar-metrics.md了解为什么简单bar_sharpe对范围bar无效。
    """
    metrics = [r["test_metrics"] for r in fold_results]

    # 正Sharpe折数(范围bar使用sharpe_tw)
    sharpes = [m["sharpe_tw"] for m in metrics]
    positive_sharpe_folds = np.mean([s > 0 for s in sharpes])

    # 中位数鲁棒性
    median_sharpe_tw = np.median(sharpes)
    median_hit_rate = np.median([m["hit_rate"] for m in metrics])

    # 多重检验DSR(使用时间加权Sharpe)
    n_trials = len(metrics)
    dsr = compute_dsr(median_sharpe_tw, n_trials)

    return {
        "n_folds": len(metrics),
        "positive_sharpe_folds": positive_sharpe_folds,
        "median_sharpe_tw": median_sharpe_tw,
        "mean_sharpe_tw": np.mean(sharpes),
        "std_sharpe_tw": np.std(sharpes),
        "median_hit_rate": median_hit_rate,
        "dsr": dsr,
        "total_pnl": sum(m["cumulative_pnl"] for m in metrics),
    }
阈值理由请参见references/oos-metrics.md
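A minimal sketch of the aggregation on hypothetical fold results (field names follow `compute_oos_metrics`; the DSR step is omitted here since `compute_dsr` lives elsewhere):

```python
import numpy as np

# Hypothetical per-fold test metrics, for illustration only
fold_results = [
    {"test_metrics": {"sharpe_tw": 1.2, "hit_rate": 0.53, "cumulative_pnl": 0.04}},
    {"test_metrics": {"sharpe_tw": -0.3, "hit_rate": 0.49, "cumulative_pnl": -0.01}},
    {"test_metrics": {"sharpe_tw": 0.8, "hit_rate": 0.52, "cumulative_pnl": 0.02}},
    {"test_metrics": {"sharpe_tw": 0.5, "hit_rate": 0.51, "cumulative_pnl": 0.01}},
]

metrics = [r["test_metrics"] for r in fold_results]
sharpes = [m["sharpe_tw"] for m in metrics]

positive_sharpe_folds = np.mean([s > 0 for s in sharpes])  # 3 of 4 folds
median_sharpe_tw = np.median(sharpes)                      # robust to the -0.3 fold
total_pnl = sum(m["cumulative_pnl"] for m in metrics)
```

The median deliberately ignores the magnitude of the one losing fold, which is why it pairs with `positive_sharpe_folds` as the consistency check.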

Look-Ahead Bias Prevention

前瞻偏差预防

The Problem

问题

Using the same data for epoch selection AND final evaluation creates look-ahead bias:
❌ WRONG: Use fold's own optimal epoch for fold's OOS evaluation
   - Epoch selection "sees" validation returns
   - Then apply same epoch to OOS from same period
   - Result: Overly optimistic performance
使用相同数据进行Epoch选择和最终评估会产生前瞻偏差:
❌ 错误:将折自身的最优Epoch用于该折的OOS评估
   - Epoch选择“看到”了验证收益
   - 然后将相同Epoch应用于同一时期的OOS
   - 结果:过于乐观的性能

The Solution: Nested WFO + Bayesian Lag

解决方案:嵌套WFO + 贝叶斯延迟

✅ CORRECT: Bayesian-smoothed epoch from PRIOR folds for current TEST
   - Epoch selection on train/validation (inner loop)
   - Update Bayesian posterior with validation-optimal
   - Apply Bayesian-selected epoch to TEST (outer loop)
   - TEST data completely untouched during selection
✅ 正确:使用来自PRIOR折的贝叶斯平滑Epoch用于当前TEST
   - 在内部循环的训练/验证上进行Epoch选择
   - 使用验证最优更新贝叶斯后验
   - 在外部循环的TEST上应用贝叶斯选中的Epoch
   - TEST数据在选择过程中完全未被接触

v3 Temporal Ordering (CRITICAL - 2026 Fix)

v3时序顺序(关键 - 2026修复)

The v3 implementation fixes a subtle but critical look-ahead bias bug in the original AWFES workflow. The key insight: TEST must use `prior_bayesian_epoch`, NOT `val_optimal_epoch`.
v3实现修复了原始AWFES工作流中一个微妙但严重的前瞻偏差bug。核心见解:TEST必须使用`prior_bayesian_epoch`,而非`val_optimal_epoch`。
The Bug (v2 and earlier)

错误(v2及更早版本)

python
# v2 BUG: Bayesian update BEFORE test evaluation
for fold in folds:
    epoch_metrics = sweep_epochs(fold.train, fold.validation)
    val_optimal_epoch = select_optimal(epoch_metrics)

    # WRONG: Update Bayesian with current fold's val_optimal
    bayesian.update(val_optimal_epoch, wfe)
    selected_epoch = bayesian.get_current_epoch()  # CONTAMINATED!

    # This selected_epoch is influenced by val_optimal from SAME fold
    test_metrics = evaluate(selected_epoch, fold.test)  # LOOK-AHEAD BIAS
python
# v2错误:贝叶斯更新在测试评估之前
for fold in folds:
    epoch_metrics = sweep_epochs(fold.train, fold.validation)
    val_optimal_epoch = select_optimal(epoch_metrics)

    # 错误:使用当前折的val_optimal更新贝叶斯
    bayesian.update(val_optimal_epoch, wfe)
    selected_epoch = bayesian.get_current_epoch()  # 被污染!

    # 该selected_epoch受同一折的val_optimal影响
    test_metrics = evaluate(selected_epoch, fold.test)  # 前瞻偏差

The Fix (v3)

修复(v3)

python
# v3 CORRECT: Get prior epoch BEFORE any work on current fold
for fold in folds:
    # Step 1: FIRST - Get epoch from ONLY prior folds
    prior_bayesian_epoch = bayesian.get_current_epoch()  # BEFORE any fold work

    # Step 2: Train and sweep to find this fold's optimal
    epoch_metrics = sweep_epochs(fold.train, fold.validation)
    val_optimal_epoch = select_optimal(epoch_metrics)

    # Step 3: TEST uses prior_bayesian_epoch (NOT val_optimal!)
    test_metrics = evaluate(prior_bayesian_epoch, fold.test)  # UNBIASED

    # Step 4: AFTER test - update Bayesian for FUTURE folds only
    bayesian.update(val_optimal_epoch, wfe)  # For fold+1, fold+2, ...
python
# v3正确:在处理当前折之前先获取先验Epoch
for fold in folds:
    # 步骤1:首先 - 仅从先验折获取Epoch
    prior_bayesian_epoch = bayesian.get_current_epoch()  # 在处理折之前

    # 步骤2:训练并扫描以找到该折的最优值
    epoch_metrics = sweep_epochs(fold.train, fold.validation)
    val_optimal_epoch = select_optimal(epoch_metrics)

    # 步骤3:TEST使用prior_bayesian_epoch(而非val_optimal!)
    test_metrics = evaluate(prior_bayesian_epoch, fold.test)  # 无偏差

    # 步骤4:测试完成后 - 仅为未来折更新贝叶斯
    bayesian.update(val_optimal_epoch, wfe)  # 用于折+1、折+2...
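The v3 ordering invariant can be demonstrated end to end with stub components. Everything below is illustrative: `BayesianEpochState` is a toy stand-in for the real posterior tracker, and the `(val_optimal_epoch, wfe)` pairs replace actual fold sweeps. The point is purely the sequencing: the epoch used for TEST is always read before the fold's own optimum touches the posterior.

```python
class BayesianEpochState:
    """Toy posterior tracker; illustrative only, not the real implementation."""
    def __init__(self, mean: float, variance: float):
        self.mean, self.variance = mean, variance

    def get_current_epoch(self) -> int:
        return int(round(self.mean))

    def update(self, observed_epoch: int, wfe: float) -> None:
        # WFE-weighted precision update, as in bayesian_epoch_update
        eff_var = (self.variance / 4) / max(wfe, 0.1)
        precision = 1 / self.variance + 1 / eff_var
        self.mean = (self.mean / self.variance + observed_epoch / eff_var) / precision
        self.variance = 1 / precision

bayesian = BayesianEpochState(mean=1050.0, variance=((2000 - 100) / 3.92) ** 2)

test_epochs = []
# (val_optimal_epoch, wfe) pairs stand in for real fold sweeps
for val_optimal_epoch, wfe in [(450, 0.8), (380, 0.6), (420, 0.9)]:
    # Step 1: read the epoch BEFORE any work on this fold
    prior_bayesian_epoch = bayesian.get_current_epoch()
    test_epochs.append(prior_bayesian_epoch)  # Step 3 would evaluate TEST here
    # Step 4: update only AFTER the TEST evaluation, for future folds
    bayesian.update(val_optimal_epoch, wfe)

# Fold 0's TEST ran at the untouched prior (1050), not its own optimum (450)
```

Swapping the `append` and `update` lines reproduces the v2 bug: fold 0's TEST epoch would already reflect its own validation optimum.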

Why This Matters

为什么这很重要

| Aspect | v2 (Buggy) | v3 (Fixed) |
| --- | --- | --- |
| When Bayesian updated | Before test eval | After test eval |
| Test epoch source | Current fold influences | Only prior folds |
| Information flow | Future → Present | Past → Present only |
| Expected bias | Optimistic by ~10-20% | Unbiased |
| 方面 | v2(错误) | v3(修复) |
| --- | --- | --- |
| 贝叶斯更新时间 | 测试评估前 | 测试评估后 |
| 测试Epoch来源 | 当前折有影响 | 仅先验折 |
| 信息流 | 未来→当前 | 仅过去→现在 |
| 预期偏差 | 乐观~10-20% | 无偏差 |

Validation Checkpoint

验证检查点

python
# MANDATORY: Log these values for audit trail
fold_log.info(
    f"Fold {fold_idx}: "
    f"prior_bayesian_epoch={prior_bayesian_epoch}, "
    f"val_optimal_epoch={val_optimal_epoch}, "
    f"test_uses={prior_bayesian_epoch}"  # MUST equal prior_bayesian_epoch
)

See [references/look-ahead-bias.md](./references/look-ahead-bias.md) for detailed examples.
python
# 强制要求:记录这些值用于审计追踪
fold_log.info(
    f"折 {fold_idx}: "
    f"prior_bayesian_epoch={prior_bayesian_epoch}, "
    f"val_optimal_epoch={val_optimal_epoch}, "
    f"test_uses={prior_bayesian_epoch}"  # 必须等于prior_bayesian_epoch
)

详细示例请参见[references/look-ahead-bias.md](./references/look-ahead-bias.md)。

Embargo Requirements

间隔要求

| Boundary | Embargo | Rationale |
| --- | --- | --- |
| Train → Validation | 6% of fold | Prevent feature leakage |
| Validation → Test | 6% of fold | Prevent selection leakage |
| Fold → Fold | 1 hour (calendar) | Range bar duration |
python
def compute_embargo_indices(
    n_total: int,
    train_pct: float = 0.60,
    val_pct: float = 0.20,
    test_pct: float = 0.20,
    embargo_pct: float = 0.06,
) -> dict[str, tuple[int, int]]:
    """Compute indices for nested split with embargoes.

    Returns dict with (start, end) tuples for each segment.
    """
    embargo_size = int(n_total * embargo_pct)

    train_end = int(n_total * train_pct)
    val_start = train_end + embargo_size
    val_end = val_start + int(n_total * val_pct)
    test_start = val_end + embargo_size
    test_end = n_total

    return {
        "train": (0, train_end),
        "embargo_1": (train_end, val_start),
        "validation": (val_start, val_end),
        "embargo_2": (val_end, test_start),
        "test": (test_start, test_end),
    }
| 边界 | 间隔 | 理由 |
| --- | --- | --- |
| 训练 → 验证 | 折的6% | 防止特征泄露 |
| 验证 → 测试 | 折的6% | 防止选择泄露 |
| 折 → 折 | 1小时(日历时间) | 范围bar持续时间 |
python
def compute_embargo_indices(
    n_total: int,
    train_pct: float = 0.60,
    val_pct: float = 0.20,
    test_pct: float = 0.20,
    embargo_pct: float = 0.06,
) -> dict[str, tuple[int, int]]:
    """计算带间隔的嵌套拆分索引。

    返回包含每个段(start, end)元组的字典。
    """
    embargo_size = int(n_total * embargo_pct)

    train_end = int(n_total * train_pct)
    val_start = train_end + embargo_size
    val_end = val_start + int(n_total * val_pct)
    test_start = val_end + embargo_size
    test_end = n_total

    return {
        "train": (0, train_end),
        "embargo_1": (train_end, val_start),
        "validation": (val_start, val_end),
        "embargo_2": (val_end, test_start),
        "test": (test_start, test_end),
    }
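A worked example with 1,000 bars shows where the embargoes land (the function is repeated verbatim so the fragment runs standalone):

```python
def compute_embargo_indices(n_total, train_pct=0.60, val_pct=0.20,
                            test_pct=0.20, embargo_pct=0.06):
    # Same logic as above: embargoes carved out between segments
    embargo_size = int(n_total * embargo_pct)
    train_end = int(n_total * train_pct)
    val_start = train_end + embargo_size
    val_end = val_start + int(n_total * val_pct)
    test_start = val_end + embargo_size
    return {
        "train": (0, train_end),
        "embargo_1": (train_end, val_start),
        "validation": (val_start, val_end),
        "embargo_2": (val_end, test_start),
        "test": (test_start, n_total),
    }

segments = compute_embargo_indices(1000)
# train=(0, 600), embargo_1=(600, 660), validation=(660, 860),
# embargo_2=(860, 920), test=(920, 1000)
```

Note that both embargoes come out of the nominal test allocation: TEST ends up with 80 bars rather than the headline 20%, which is worth accounting for when sizing folds.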

Validation Checklist

验证检查清单

Before running AWFES with OOS application:
  • Three-way split: Train/Validation/Test clearly separated
  • Embargoes: 6% gap at each boundary
  • Bayesian lag: Current fold uses posterior from prior folds
  • No peeking: Test data untouched until final evaluation
  • Temporal order: No shuffling, strict time sequence
  • Feature computation: Features computed BEFORE split, no recalculation
在运行AWFES进行OOS应用之前:
  • 三分拆分:训练/验证/测试清晰分离
  • 间隔:每个边界有6%的间隔
  • 贝叶斯延迟:当前折使用来自先验折的后验
  • 无窥探:测试数据在最终评估前完全未被接触
  • 时序顺序:无打乱,严格时间序列
  • 特征计算:特征在拆分前计算,不重新计算

Anti-Patterns

反模式

| Anti-Pattern | Detection | Fix |
| --- | --- | --- |
| Using current fold's epoch on current fold's OOS | selected_epoch == fold_optimal_epoch | Use Bayesian posterior |
| Validation overlaps test | Date ranges overlap | Add embargo |
| Features computed on full dataset | Scaler fit includes test | Per-split scaling |
| Fold shuffling | Folds not time-ordered | Enforce temporal order |
See references/look-ahead-bias.md for detailed examples.

| 反模式 | 检测方式 | 修复方案 |
| --- | --- | --- |
| 将当前折的Epoch用于当前折的OOS | selected_epoch == fold_optimal_epoch | 使用贝叶斯后验 |
| 验证集与测试集重叠 | 日期范围重叠 | 添加间隔 |
| 在全数据集上计算特征 | 缩放器拟合包含测试集 | 按拆分单独缩放 |
| 折打乱 | 折不是时序排列 | 强制时序顺序 |
详细示例请参见references/look-ahead-bias.md

References

参考链接

| Topic | Reference File |
| --- | --- |
| Academic Literature | academic-foundations.md |
| Mathematical Formulation | mathematical-formulation.md |
| Decision Tree | epoch-selection-decision-tree.md |
| Anti-Patterns | anti-patterns.md |
| OOS Application | oos-application.md |
| Epoch Smoothing | epoch-smoothing.md |
| OOS Metrics | oos-metrics.md |
| Look-Ahead Bias | look-ahead-bias.md |
| Feature Sets | feature-sets.md |
| xLSTM Implementation | xlstm-implementation.md |
| Range Bar Metrics | range-bar-metrics.md |
| 主题 | 参考文件 |
| --- | --- |
| 学术文献 | academic-foundations.md |
| 数学公式 | mathematical-formulation.md |
| 决策树 | epoch-selection-decision-tree.md |
| 反模式 | anti-patterns.md |
| OOS应用 | oos-application.md |
| Epoch平滑 | epoch-smoothing.md |
| OOS指标 | oos-metrics.md |
| 前瞻偏差 | look-ahead-bias.md |
| 特征集 | feature-sets.md |
| xLSTM实现 | xlstm-implementation.md |
| 范围bar指标 | range-bar-metrics.md |

Full Citations

完整引用

  • Bailey, D. H., & López de Prado, M. (2014). The deflated Sharpe ratio: Correcting for selection bias, backtest overfitting and non-normality. The Journal of Portfolio Management, 40(5), 94-107.
  • Bischl, B., et al. (2023). Multi-Objective Hyperparameter Optimization in Machine Learning. ACM Transactions on Evolutionary Learning and Optimization.
  • López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. Chapter 7.
  • Nomura, M., & Ono, I. (2021). Warm Starting CMA-ES for Hyperparameter Optimization. AAAI Conference on Artificial Intelligence.
  • Pardo, R. E. (2008). The Evaluation and Optimization of Trading Strategies, 2nd Edition. John Wiley & Sons.

  • Bailey, D. H., & López de Prado, M. (2014). The deflated Sharpe ratio: Correcting for selection bias, backtest overfitting and non-normality. The Journal of Portfolio Management, 40(5), 94-107.
  • Bischl, B., et al. (2023). Multi-Objective Hyperparameter Optimization in Machine Learning. ACM Transactions on Evolutionary Learning and Optimization.
  • López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. Chapter 7.
  • Nomura, M., & Ono, I. (2021). Warm Starting CMA-ES for Hyperparameter Optimization. AAAI Conference on Artificial Intelligence.
  • Pardo, R. E. (2008). The Evaluation and Optimization of Trading Strategies, 2nd Edition. John Wiley & Sons.

Troubleshooting

故障排除

| Issue | Cause | Solution |
| --- | --- | --- |
| WFE is None | IS_Sharpe below noise floor | Check if IS_Sharpe > 2/sqrt(n_samples) |
| All epochs rejected | Severe overfitting | Reduce model complexity, add regularization |
| Bayesian posterior unstable | High WFE variance | Increase observation_variance or use median WFE |
| Epoch always at boundary | Search range too narrow | Expand min_epoch or max_epoch bounds |
| Look-ahead bias detected | Using val_optimal for test | Use prior_bayesian_epoch for test evaluation |
| DSR too aggressive | Too many epoch candidates | Limit to 3-5 epoch configs (meta-overfitting risk) |
| Cauchy mean issues | Arithmetic mean of WFE | Use median or pooled WFE for aggregation |
| Fold metrics inconsistent | Variable fold sizes | Use pooled WFE (precision-weighted) |
| 问题 | 原因 | 解决方案 |
| --- | --- | --- |
| WFE为None | IS_Sharpe低于噪声下限 | 检查IS_Sharpe > 2/sqrt(n_samples) |
| 所有Epoch被拒绝 | 严重过拟合 | 降低模型复杂度,添加正则化 |
| 贝叶斯后验不稳定 | WFE方差高 | 增加observation_variance或使用中位数WFE |
| Epoch始终在边界 | 搜索范围过窄 | 扩大min_epoch或max_epoch边界 |
| 检测到前瞻偏差 | 使用val_optimal进行测试 | 使用prior_bayesian_epoch进行测试评估 |
| DSR过于激进 | Epoch候选过多 | 限制为3-5个Epoch配置(元过拟合风险) |
| 柯西均值问题 | WFE的算术均值 | 使用中位数或加权WFE进行聚合 |
| 折指标不一致 | 折大小可变 | 使用加权WFE(精度加权) |
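The first troubleshooting row refers to the IS-Sharpe noise floor of 2/sqrt(n_samples): below that level the in-sample Sharpe is indistinguishable from noise, so the WFE ratio is meaningless. A minimal guard might look like the following (`wfe_or_none` is a hypothetical helper written for this illustration, not part of the published API):

```python
import math

def wfe_or_none(is_sharpe: float, oos_sharpe: float, n_samples: int):
    # Below ~2/sqrt(n) the IS Sharpe is indistinguishable from noise,
    # so the WFE ratio would be dominated by a near-zero denominator
    noise_floor = 2.0 / math.sqrt(n_samples)
    if abs(is_sharpe) <= noise_floor:
        return None
    return oos_sharpe / is_sharpe

print(wfe_or_none(0.01, 0.005, 10_000))  # floor is 0.02 → None
print(wfe_or_none(1.50, 0.60, 10_000))   # ≈ 0.4
```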