adaptive-wfo-epoch


Adaptive Walk-Forward Epoch Selection (AWFES)


Machine-readable reference for adaptive epoch selection within Walk-Forward Optimization (WFO). Optimizes training epochs per-fold using Walk-Forward Efficiency (WFE) as the objective.

When to Use This Skill


Use this skill when:
  • Selecting optimal training epochs for ML models in WFO
  • Avoiding overfitting via Walk-Forward Efficiency metrics
  • Implementing per-fold adaptive epoch selection
  • Computing efficient frontiers for epoch-performance trade-offs
  • Carrying epoch priors across WFO folds

Quick Start


```python
from adaptive_wfo_epoch import AWFESConfig, compute_efficient_frontier
```

Generate epoch candidates from search bounds and granularity:

```python
config = AWFESConfig.from_search_space(
    min_epoch=100,
    max_epoch=2000,
    granularity=5,  # Number of frontier points
)
# config.epoch_configs → [100, 211, 447, 946, 2000] (log-spaced)
```

Per-fold epoch sweep

```python
for fold in wfo_folds:
    epoch_metrics = []
    for epoch in config.epoch_configs:
        is_sharpe, oos_sharpe = train_and_evaluate(fold, epochs=epoch)
        wfe = config.compute_wfe(is_sharpe, oos_sharpe, n_samples=len(fold.train))
        epoch_metrics.append({"epoch": epoch, "wfe": wfe, "is_sharpe": is_sharpe})

    # Select from efficient frontier
    selected_epoch = compute_efficient_frontier(epoch_metrics)

    # Carry forward to next fold as prior
    prior_epoch = selected_epoch
```

Methodology Overview


What This Is


Per-fold adaptive epoch selection where:
  1. Train models across a range of epochs (e.g., 400, 800, 1000, 2000)
  2. Compute WFE = OOS_Sharpe / IS_Sharpe for each epoch count
  3. Find the "efficient frontier" - epochs maximizing WFE vs training cost
  4. Select optimal epoch from frontier for OOS evaluation
  5. Carry forward as prior for next fold

What This Is NOT


  • NOT early stopping: Early stopping monitors validation loss continuously; this evaluates discrete candidates post-hoc
  • NOT Bayesian optimization: No surrogate model; direct evaluation of all candidates
  • NOT nested cross-validation: Uses temporal WFO, not shuffled splits

Academic Foundations


| Concept | Citation | Key Insight |
|---|---|---|
| Walk-Forward Efficiency | Pardo (1992, 2008) | WFE = OOS_Return / IS_Return as a robustness metric |
| Deflated Sharpe Ratio | Bailey & López de Prado (2014) | Adjusts for multiple testing |
| Pareto-Optimal HP Selection | Bischl et al. (2023) | Multi-objective hyperparameter optimization |
| Warm-Starting | Nomura & Ono (2021) | Transfers knowledge between optimization runs |

See references/academic-foundations.md for the full literature review.

Core Formula: Walk-Forward Efficiency


```python
def compute_wfe(
    is_sharpe: float,
    oos_sharpe: float,
    n_samples: int | None = None,
) -> float | None:
    """Walk-Forward Efficiency - measures performance transfer.

    WFE = OOS_Sharpe / IS_Sharpe

    Interpretation (guidelines, not hard thresholds):
    - WFE ≥ 0.70: Excellent transfer (low overfitting)
    - WFE 0.50-0.70: Good transfer
    - WFE 0.30-0.50: Moderate transfer (investigate)
    - WFE < 0.30: Severe overfitting (likely reject)

    The IS_Sharpe minimum is derived from signal-to-noise ratio,
    not a fixed magic number. See compute_is_sharpe_threshold().

    Reference: Pardo (2008) "The Evaluation and Optimization of Trading Strategies"
    """
    # Data-driven threshold: IS_Sharpe must exceed 2σ noise floor
    min_is_sharpe = compute_is_sharpe_threshold(n_samples) if n_samples else 0.1

    if abs(is_sharpe) < min_is_sharpe:
        return None
    return oos_sharpe / is_sharpe
```
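A quick numeric check of the interpretation bands above. This is a minimal standalone sketch: `compute_is_sharpe_threshold` is inlined here with the 2/√n rule described later in this document, so the snippet runs on its own.

```python
import math

def compute_is_sharpe_threshold(n_samples=None):
    # 2σ noise floor for the Sharpe ratio; the 0.1 fallback assumes ~400 samples
    if n_samples is None or n_samples < 10:
        return 0.1
    return 2.0 / math.sqrt(n_samples)

def compute_wfe(is_sharpe, oos_sharpe, n_samples=None):
    min_is = compute_is_sharpe_threshold(n_samples) if n_samples else 0.1
    if abs(is_sharpe) < min_is:
        return None
    return oos_sharpe / is_sharpe

print(round(compute_wfe(1.5, 1.2, n_samples=400), 2))  # 0.8 → excellent transfer
print(round(compute_wfe(2.0, 0.4, n_samples=400), 2))  # 0.2 → severe overfitting
print(compute_wfe(0.05, 0.3, n_samples=400))           # None → IS_Sharpe below noise floor
```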

Principled Configuration Framework


All parameters in AWFES are derived from first principles or data characteristics, not arbitrary magic numbers.

AWFESConfig: Unified Configuration


```python
from dataclasses import dataclass, field
from typing import Literal
import numpy as np

@dataclass
class AWFESConfig:
    """AWFES configuration with principled parameter derivation.

    No magic numbers - all values derived from search space or data.
    """
    # Search space bounds (user-specified)
    min_epoch: int
    max_epoch: int
    granularity: int  # Number of frontier points

    # Derived automatically
    epoch_configs: list[int] = field(init=False)
    prior_variance: float = field(init=False)
    observation_variance: float = field(init=False)

    # Market context for annualization
    # crypto_session_filtered: Use when data is filtered to London-NY weekday hours
    market_type: Literal["crypto_24_7", "crypto_session_filtered", "equity", "forex"] = "crypto_24_7"
    time_unit: Literal["bar", "daily", "weekly"] = "weekly"

    def __post_init__(self):
        # Generate epoch configs with log spacing (optimal for frontier discovery)
        self.epoch_configs = self._generate_epoch_configs()

        # Derive Bayesian variances from search space
        self.prior_variance, self.observation_variance = self._derive_variances()

    def _generate_epoch_configs(self) -> list[int]:
        """Generate epoch candidates with log spacing.

        Log spacing is optimal for efficient frontier because:
        1. Early epochs: small changes matter more (underfit → fit transition)
        2. Late epochs: diminishing returns (already near convergence)
        3. Uniform coverage of the WFE vs cost trade-off space

        Formula: epoch_i = min × (max/min)^(i/(n-1))
        """
        if self.granularity < 2:
            return [self.min_epoch]

        log_min = np.log(self.min_epoch)
        log_max = np.log(self.max_epoch)
        log_epochs = np.linspace(log_min, log_max, self.granularity)

        return sorted(set(int(round(np.exp(e))) for e in log_epochs))

    def _derive_variances(self) -> tuple[float, float]:
        """Derive Bayesian variances from search space.

        Principle: Prior should span the search space with ~95% coverage.

        For Normal distribution: 95% CI = mean ± 1.96σ
        If we want 95% of prior mass in [min_epoch, max_epoch]:
            range = max - min = 2 × 1.96 × σ = 3.92σ
            σ = range / 3.92
            σ² = (range / 3.92)²

        Observation variance: Set to achieve reasonable learning rate.
        Rule: observation_variance ≈ prior_variance / 4
        This means each observation updates the posterior meaningfully
        but doesn't dominate the prior immediately.
        """
        epoch_range = self.max_epoch - self.min_epoch
        prior_std = epoch_range / 3.92  # 95% CI spans search space
        prior_variance = prior_std ** 2

        # Observation variance: 1/4 of prior for balanced learning
        # This gives ~0.2 weight to each new observation initially
        observation_variance = prior_variance / 4

        return prior_variance, observation_variance

    @classmethod
    def from_search_space(
        cls,
        min_epoch: int,
        max_epoch: int,
        granularity: int = 5,
        market_type: str = "crypto_24_7",
    ) -> "AWFESConfig":
        """Create config from search space bounds."""
        return cls(
            min_epoch=min_epoch,
            max_epoch=max_epoch,
            granularity=granularity,
            market_type=market_type,
        )

    def compute_wfe(
        self,
        is_sharpe: float,
        oos_sharpe: float,
        n_samples: int | None = None,
    ) -> float | None:
        """Compute WFE with data-driven IS_Sharpe threshold."""
        min_is = compute_is_sharpe_threshold(n_samples) if n_samples else 0.1
        if abs(is_sharpe) < min_is:
            return None
        return oos_sharpe / is_sharpe

    def get_annualization_factor(self) -> float:
        """Get annualization factor to scale Sharpe from time_unit to ANNUAL.

        IMPORTANT: This returns sqrt(periods_per_year) for scaling to ANNUAL Sharpe.
        For daily-to-weekly scaling, use get_daily_to_weekly_factor() instead.

        Principled derivation:
        - Sharpe scales with √(periods per year)
        - Crypto 24/7: 365 days/year, 52.14 weeks/year
        - Crypto session-filtered: 252 days/year (like equity)
        - Equity: 252 trading days/year, ~52 weeks/year
        - Forex: ~252 days/year (varies by pair)
        """
        PERIODS_PER_YEAR = {
            ("crypto_24_7", "daily"): 365,
            ("crypto_24_7", "weekly"): 52.14,
            ("crypto_24_7", "bar"): None,  # Cannot annualize bars directly
            ("crypto_session_filtered", "daily"): 252,  # London-NY weekdays only
            ("crypto_session_filtered", "weekly"): 52,
            ("equity", "daily"): 252,
            ("equity", "weekly"): 52,
            ("forex", "daily"): 252,
        }

        key = (self.market_type, self.time_unit)
        periods = PERIODS_PER_YEAR.get(key)

        if periods is None:
            raise ValueError(
                f"Cannot annualize {self.time_unit} for {self.market_type}. "
                "Use daily or weekly aggregation first."
            )

        return np.sqrt(periods)

    def get_daily_to_weekly_factor(self) -> float:
        """Get factor to scale DAILY Sharpe to WEEKLY Sharpe.

        This is different from get_annualization_factor()!
        - Daily → Weekly: sqrt(days_per_week)
        - Daily → Annual: sqrt(days_per_year)  (use get_annualization_factor)

        Market-specific:
        - Crypto 24/7: sqrt(7) = 2.65 (7 trading days/week)
        - Crypto session-filtered: sqrt(5) = 2.24 (weekdays only)
        - Equity: sqrt(5) = 2.24 (5 trading days/week)
        """
        DAYS_PER_WEEK = {
            "crypto_24_7": 7,
            "crypto_session_filtered": 5,  # London-NY weekdays only
            "equity": 5,
            "forex": 5,
        }

        days = DAYS_PER_WEEK.get(self.market_type)
        if days is None:
            raise ValueError(f"Unknown market type: {self.market_type}")

        return np.sqrt(days)
```
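As a sanity check, the log-spacing rule epoch_i = min × (max/min)^(i/(n-1)) can be reproduced standalone. This is a sketch of the `_generate_epoch_configs` logic above, lifted out of the class:

```python
import numpy as np

def generate_epoch_configs(min_epoch, max_epoch, granularity):
    # Log-spaced candidates, deduplicated after rounding to integers
    if granularity < 2:
        return [min_epoch]
    log_epochs = np.linspace(np.log(min_epoch), np.log(max_epoch), granularity)
    return sorted(set(int(round(np.exp(e))) for e in log_epochs))

print(generate_epoch_configs(100, 2000, 5))  # [100, 211, 447, 946, 2000]
print(generate_epoch_configs(100, 2000, 1))  # [100]
```

Note the ratio between consecutive candidates is constant, (2000/100)^(1/4) ≈ 2.11, which is what gives dense coverage at low epoch counts and sparse coverage near convergence.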

IS_Sharpe Threshold: Signal-to-Noise Derivation


```python
def compute_is_sharpe_threshold(n_samples: int | None = None) -> float:
    """Compute minimum IS_Sharpe threshold from signal-to-noise ratio.

    Principle: IS_Sharpe must be statistically distinguishable from zero.

    Under null hypothesis (no skill), Sharpe ~ N(0, 1/√n).
    To reject null at α=0.05 (one-sided), need Sharpe > 1.645/√n.

    For practical use, we use 2σ threshold (≈97.7% confidence):
        threshold = 2.0 / √n

    This adapts to sample size:
    - n=100: threshold ≈ 0.20
    - n=400: threshold ≈ 0.10
    - n=1600: threshold ≈ 0.05

    Fallback for unknown n: 0.1 (assumes n≈400, typical fold size)

    Rationale for 0.1 fallback:
    - 2/√400 = 0.1, so 0.1 assumes ~400 samples per fold
    - This is conservative: 400 samples is typical for weekly folds
    - If actual n is smaller, threshold is looser (accepts more noise)
    - If actual n is larger, threshold is tighter (fine, we're conservative)
    - The 0.1 value also corresponds to "not statistically distinguishable
      from zero at reasonable sample sizes" - a natural floor for Sharpe SE
    """
    if n_samples is None or n_samples < 10:
        # Conservative fallback: 0.1 assumes ~400 samples (typical fold size)
        # Derivation: 2/√400 = 0.1; see rationale above
        return 0.1

    return 2.0 / np.sqrt(n_samples)
```
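The sample-size scaling in the docstring can be verified directly. A standalone sketch (stdlib `math` in place of numpy, since the inputs are scalars):

```python
import math

def compute_is_sharpe_threshold(n_samples=None):
    # 2σ noise floor; fallback 0.1 corresponds to 2/sqrt(400)
    if n_samples is None or n_samples < 10:
        return 0.1
    return 2.0 / math.sqrt(n_samples)

for n in (100, 400, 1600):
    print(n, compute_is_sharpe_threshold(n))  # 0.2, 0.1, 0.05
print(compute_is_sharpe_threshold(None))      # 0.1 fallback
```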

Guardrails (Principled Guidelines)


G1: WFE Thresholds


The traditional thresholds (0.30, 0.50, 0.70) are guidelines based on practitioner consensus, not derived from first principles. They represent:
| Threshold | Meaning | Statistical Basis |
|---|---|---|
| 0.30 | Hard reject | Retaining <30% of IS performance is almost certainly noise |
| 0.50 | Warning | At 50%, half the signal is lost; investigate |
| 0.70 | Target | Industry standard for "good" transfer |

```python
# These are GUIDELINES, not hard rules
# Adjust based on your domain and risk tolerance
WFE_THRESHOLDS = {
    "hard_reject": 0.30,  # Below this: almost certainly overfitting
    "warning": 0.50,      # Below this: significant signal loss
    "target": 0.70,       # Above this: good generalization
}

def classify_wfe(wfe: float | None) -> str:
    """Classify WFE with principled thresholds."""
    if wfe is None:
        return "INVALID"  # IS_Sharpe below noise floor
    if wfe < WFE_THRESHOLDS["hard_reject"]:
        return "REJECT"
    if wfe < WFE_THRESHOLDS["warning"]:
        return "INVESTIGATE"
    if wfe < WFE_THRESHOLDS["target"]:
        return "ACCEPTABLE"
    return "EXCELLENT"
```
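Exercising the classifier with one value from each band, restated minimally so the snippet runs standalone:

```python
WFE_THRESHOLDS = {"hard_reject": 0.30, "warning": 0.50, "target": 0.70}

def classify_wfe(wfe):
    """Map a WFE value to its guideline band."""
    if wfe is None:
        return "INVALID"  # IS_Sharpe below noise floor
    if wfe < WFE_THRESHOLDS["hard_reject"]:
        return "REJECT"
    if wfe < WFE_THRESHOLDS["warning"]:
        return "INVESTIGATE"
    if wfe < WFE_THRESHOLDS["target"]:
        return "ACCEPTABLE"
    return "EXCELLENT"

print([classify_wfe(w) for w in (None, 0.2, 0.4, 0.6, 0.8)])
# ['INVALID', 'REJECT', 'INVESTIGATE', 'ACCEPTABLE', 'EXCELLENT']
```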

G2: IS_Sharpe Minimum (Data-Driven)


**OLD (magic number):**

```python
# WRONG: Fixed threshold regardless of sample size
if is_sharpe < 1.0:
    wfe = None
```

**NEW (principled):**

```python
# CORRECT: Threshold adapts to sample size
min_is_sharpe = compute_is_sharpe_threshold(n_samples)
if is_sharpe < min_is_sharpe:
    wfe = None  # Below noise floor for this sample size
```

The threshold derives from the standard error of the Sharpe ratio: SE(SR) ≈ 1/√n.

**Note on the SE(Sharpe) approximation**: The formula `1/√n` is a first-order approximation valid when SR is small (close to 0). The full Lo (2002) formula is:

SE(SR) = √((1 + 0.5×SR²) / n)

For high-Sharpe strategies (SR > 1.0), the simplified formula underestimates SE by roughly 25-50%. Use the full formula when evaluating strategies with SR > 1.0.
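To make the note concrete, here is a sketch comparing the first-order approximation against the full Lo (2002) formula at a hypothetical fold size of n = 400:

```python
import math

def sharpe_se_simple(n):
    # First-order approximation: valid when SR is near 0
    return 1.0 / math.sqrt(n)

def sharpe_se_lo(sr, n):
    # Lo (2002): SE(SR) = sqrt((1 + 0.5 * SR^2) / n)
    return math.sqrt((1 + 0.5 * sr ** 2) / n)

n = 400
for sr in (0.2, 1.0, 2.0):
    simple, full = sharpe_se_simple(n), sharpe_se_lo(sr, n)
    print(f"SR={sr}: simple={simple:.4f}, Lo={full:.4f}, ratio={full / simple:.2f}")
```

At SR = 1.0 the ratio is √1.5 ≈ 1.22, i.e. the simplified formula already understates the SE by over 20%, and the gap widens quickly for higher Sharpe values.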

G3: Stability Penalty for Epoch Changes (Adaptive)


The stability penalty prevents hyperparameter churn. Instead of fixed thresholds, use relative improvement based on WFE variance:
```python
def compute_stability_threshold(wfe_history: list[float]) -> float:
    """Compute stability threshold from observed WFE variance.

    Principle: Require improvement exceeding noise level.

    If WFE has std=0.15 across folds, random fluctuation could be ±0.15.
    To distinguish signal from noise, require improvement > 1σ of WFE.

    Minimum: 5% (prevent switching on negligible improvements)
    Maximum: 20% (don't be overly conservative)
    """
    if len(wfe_history) < 3:
        return 0.10  # Default until enough history

    wfe_std = np.std(wfe_history)
    threshold = max(0.05, min(0.20, wfe_std))
    return threshold


class AdaptiveStabilityPenalty:
    """Stability penalty that adapts to observed WFE variance."""

    def __init__(self):
        self.wfe_history: list[float] = []
        self.epoch_changes: list[int] = []

    def should_change_epoch(
        self,
        current_wfe: float,
        candidate_wfe: float,
        current_epoch: int,
        candidate_epoch: int,
    ) -> bool:
        """Decide whether to change epochs based on adaptive threshold."""
        self.wfe_history.append(current_wfe)

        if current_epoch == candidate_epoch:
            return False  # Same epoch, no change needed

        threshold = compute_stability_threshold(self.wfe_history)
        improvement = (candidate_wfe - current_wfe) / max(abs(current_wfe), 0.01)

        if improvement > threshold:
            self.epoch_changes.append(len(self.wfe_history))
            return True

        return False  # Improvement not significant
```
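A short demonstration of how the threshold tracks observed variance (the function above is restated so the snippet runs standalone):

```python
import numpy as np

def compute_stability_threshold(wfe_history):
    if len(wfe_history) < 3:
        return 0.10  # default until enough history accumulates
    return float(max(0.05, min(0.20, np.std(wfe_history))))

print(compute_stability_threshold([0.6, 0.7]))               # 0.1  (too little history)
print(compute_stability_threshold([0.60, 0.62, 0.61, 0.63])) # 0.05 (quiet WFE: floor applies)
print(compute_stability_threshold([0.2, 0.9, 0.3, 0.8]))     # 0.2  (noisy WFE: ceiling applies)
```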

G4: DSR Adjustment for Epoch Search (Principled)


```python
def adjusted_dsr_for_epoch_search(
    sharpe: float,
    n_folds: int,
    n_epochs: int,
    sharpe_se: float | None = None,
    n_samples_per_fold: int | None = None,
) -> float:
    """Deflated Sharpe Ratio accounting for epoch selection multiplicity.

    When selecting from K epochs, the expected maximum Sharpe under null
    is inflated. This adjustment corrects for that selection bias.

    Principled SE estimation:
    - If n_samples provided: SE(Sharpe) ≈ 1/√n
    - Otherwise: estimate from typical fold size

    Reference: Bailey & López de Prado (2014), Gumbel distribution
    """
    from math import sqrt, log, pi

    n_trials = n_folds * n_epochs  # Total selection events

    if n_trials < 2:
        return sharpe  # No multiple testing correction needed

    # Expected maximum under null (Gumbel distribution)
    # E[max(Z_1, ..., Z_n)] ≈ √(2·ln(n)) - (γ + ln(π/2)) / √(2·ln(n))
    # where γ ≈ 0.5772 is Euler-Mascheroni constant
    euler_gamma = 0.5772156649
    sqrt_2_log_n = sqrt(2 * log(n_trials))
    e_max_z = sqrt_2_log_n - (euler_gamma + log(pi / 2)) / sqrt_2_log_n

    # Estimate Sharpe SE if not provided
    if sharpe_se is None:
        if n_samples_per_fold is not None:
            sharpe_se = 1.0 / sqrt(n_samples_per_fold)
        else:
            # Conservative default: assume ~300 samples per fold
            sharpe_se = 1.0 / sqrt(300)

    # Expected maximum Sharpe under null
    e_max_sharpe = e_max_z * sharpe_se

    # Deflated Sharpe
    return max(0, sharpe - e_max_sharpe)
```

Example: For 5 epochs × 50 folds = 250 trials with 300 samples/fold:
  • sharpe_se ≈ 0.058
  • e_max_z ≈ 3.01
  • e_max_sharpe ≈ 0.17
  • A Sharpe of 1.0 deflates to 0.83 after adjustment.
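The worked example can be reproduced end to end (the function is restated in condensed form so the snippet runs standalone):

```python
from math import sqrt, log, pi

def adjusted_dsr_for_epoch_search(sharpe, n_folds, n_epochs,
                                  sharpe_se=None, n_samples_per_fold=None):
    n_trials = n_folds * n_epochs
    if n_trials < 2:
        return sharpe
    euler_gamma = 0.5772156649
    s = sqrt(2 * log(n_trials))
    e_max_z = s - (euler_gamma + log(pi / 2)) / s          # expected max Z under null
    if sharpe_se is None:
        sharpe_se = 1.0 / sqrt(n_samples_per_fold or 300)  # SE(Sharpe) ≈ 1/sqrt(n)
    return max(0, sharpe - e_max_z * sharpe_se)

dsr = adjusted_dsr_for_epoch_search(1.0, n_folds=50, n_epochs=5, n_samples_per_fold=300)
print(round(dsr, 2))  # 0.83
```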

WFE Aggregation Methods


WARNING: Cauchy Distribution Under Null

Under the null hypothesis (no predictive skill), WFE follows a Cauchy distribution, which has:
  • No defined mean (undefined expectation)
  • No defined variance (infinite)
  • Heavy tails (extreme values are common)

This makes the arithmetic mean unreliable: a single extreme WFE can dominate the average. Always prefer median or pooled methods for robust WFE aggregation. See references/mathematical-formulation.md for the proof that

WFE | H0 ~ Cauchy(0, √(T_IS / T_OOS))
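The practical consequence is easy to see in a quick simulation (a sketch, using standard Cauchy draws as a stand-in for null-hypothesis WFE up to scale): the median is stable while the sample mean is not.

```python
import numpy as np

rng = np.random.default_rng(42)
draws = rng.standard_cauchy(10_000)  # WFE under H0, up to the scale parameter

# The median concentrates near the location parameter (0 here)...
print(float(np.median(draws)))

# ...while the sample mean never converges; a handful of extreme
# draws can pull it arbitrarily far from zero
print(float(np.mean(draws)))
```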

Method 1: Pooled WFE (recommended for precision weighting)


```python
def pooled_wfe(fold_results: list[dict]) -> float:
    """Weights each fold by its sample size (precision).

    Formula: Σ(T_OOS × SR_OOS) / Σ(T_IS × SR_IS)

    Advantage: More stable than arithmetic mean, handles varying fold sizes.
    Use when: Fold sizes vary significantly.
    """
    numerator = sum(r["n_oos"] * r["oos_sharpe"] for r in fold_results)
    denominator = sum(r["n_is"] * r["is_sharpe"] for r in fold_results)

    if denominator < 1e-10:
        return float("nan")
    return numerator / denominator
```

Method 2: Median WFE (Recommended for robustness)


```python
def median_wfe(fold_results: list[dict]) -> float:
    """Robust to outliers, standard in robust statistics.

    Advantage: Single extreme fold doesn't dominate.
    Use when: Suspected outlier folds (regime changes, data issues).
    """
    wfes = [r["wfe"] for r in fold_results if r["wfe"] is not None]
    return float(np.median(wfes)) if wfes else float("nan")
```

Method 3: Weighted Arithmetic Mean


```python
def weighted_mean_wfe(fold_results: list[dict]) -> float:
    """Weights by inverse variance (efficiency weighting).

    Formula: Σ(w_i × WFE_i) / Σ(w_i)
    where w_i = 1 / Var(WFE_i) ≈ n_oos × n_is / (n_oos + n_is)

    Advantage: Optimal when combining estimates of different precision.
    Use when: All folds have similar characteristics.
    """
    weighted_sum = 0.0
    weight_total = 0.0

    for r in fold_results:
        if r["wfe"] is None:
            continue
        weight = r["n_oos"] * r["n_is"] / (r["n_oos"] + r["n_is"] + 1e-10)
        weighted_sum += weight * r["wfe"]
        weight_total += weight

    return weighted_sum / weight_total if weight_total > 0 else float("nan")
```

Aggregation Selection Guide


| Scenario | Recommended Method | Rationale |
|---|---|---|
| Variable fold sizes | Pooled WFE | Weights by precision |
| Suspected outliers | Median WFE | Robust to extremes |
| Homogeneous folds | Weighted mean | Optimal efficiency |
| Reporting | All three | Cross-check consistency |
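The inverse-variance weight in the weighted mean behaves like a harmonic mean of the IS and OOS sample counts, so a fold with twice the data gets twice the weight. A quick standalone check (fold sizes are hypothetical):

```python
def fold_weight(n_is: int, n_oos: int) -> float:
    # w_i = n_oos * n_is / (n_oos + n_is): inverse-variance (precision) weight
    return n_oos * n_is / (n_oos + n_is)

# Doubling both sample counts doubles the fold's weight
print(fold_weight(800, 200), fold_weight(1600, 400))  # 160.0 320.0
```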

Efficient Frontier Algorithm


python
def compute_efficient_frontier(
    epoch_metrics: list[dict],
    wfe_weight: float = 1.0,
    time_weight: float = 0.1,
) -> tuple[list[int], int]:
    """
    Find Pareto-optimal epochs and select best.

    An epoch is on the frontier if no other epoch dominates it
    (better WFE AND lower training time).

    Args:
        epoch_metrics: List of {epoch, wfe, training_time_sec}
        wfe_weight: Weight for WFE in selection (higher = prefer generalization)
        time_weight: Weight for training time (higher = prefer speed)

    Returns:
        (frontier_epochs, selected_epoch)
    """
    import numpy as np

    # Filter valid metrics; fall back to epoch count as a proxy for training time
    valid = [(m["epoch"], m["wfe"], m.get("training_time_sec", m["epoch"]))
             for m in epoch_metrics
             if m["wfe"] is not None and np.isfinite(m["wfe"])]

    if not valid:
        # Fallback: return epoch with best OOS Sharpe
        best_oos = max(epoch_metrics, key=lambda m: m.get("oos_sharpe", 0))
        return ([best_oos["epoch"]], best_oos["epoch"])

    # Pareto dominance check
    frontier = []
    for i, (epoch_i, wfe_i, time_i) in enumerate(valid):
        dominated = False
        for j, (epoch_j, wfe_j, time_j) in enumerate(valid):
            if i == j:
                continue
            # j dominates i if: better/equal WFE AND lower/equal time (strict in at least one)
            if (wfe_j >= wfe_i and time_j <= time_i and
                (wfe_j > wfe_i or time_j < time_i)):
                dominated = True
                break
        if not dominated:
            frontier.append((epoch_i, wfe_i, time_i))

    frontier_epochs = [e for e, _, _ in frontier]

    if len(frontier) == 1:
        return (frontier_epochs, frontier[0][0])

    # Weighted score selection
    wfes = np.array([w for _, w, _ in frontier])
    times = np.array([t for _, _, t in frontier])

    wfe_norm = (wfes - wfes.min()) / (wfes.max() - wfes.min() + 1e-10)
    time_norm = (times.max() - times) / (times.max() - times.min() + 1e-10)

    scores = wfe_weight * wfe_norm + time_weight * time_norm
    best_idx = np.argmax(scores)

    return (frontier_epochs, frontier[best_idx][0])
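The dominance test at the core of the frontier can be exercised standalone. In this hypothetical three-candidate sweep, 945 epochs is dominated by 447 (lower WFE and more training time), so only two points survive:

```python
# (epoch, wfe, training_time_sec) -- hypothetical sweep results
candidates = [(100, 0.42, 10.0), (447, 0.61, 45.0), (945, 0.58, 95.0)]

def dominated(i: int, pts: list[tuple]) -> bool:
    _, wi, ti = pts[i]
    # j dominates i: better/equal WFE AND lower/equal time, strict in at least one
    return any(
        wj >= wi and tj <= ti and (wj > wi or tj < ti)
        for j, (_, wj, tj) in enumerate(pts) if j != i
    )

frontier = [e for i, (e, _, _) in enumerate(candidates) if not dominated(i, candidates)]
print(frontier)  # [100, 447]
```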

Carry-Forward Mechanism


python
class AdaptiveEpochSelector:
    """Maintains epoch selection state across WFO folds with adaptive stability."""

    def __init__(self, epoch_configs: list[int]):
        self.epoch_configs = epoch_configs
        self.selection_history: list[dict] = []
        self.last_selected: int | None = None
        self.stability = AdaptiveStabilityPenalty()  # Use adaptive, not fixed

    def select_epoch(self, epoch_metrics: list[dict]) -> int:
        """Select epoch with adaptive stability penalty for changes."""
        frontier_epochs, candidate = compute_efficient_frontier(epoch_metrics)

        # Apply adaptive stability penalty if changing epochs
        if self.last_selected is not None and candidate != self.last_selected:
            candidate_wfe = next(
                m["wfe"] for m in epoch_metrics if m["epoch"] == candidate
            )
            last_wfe = next(
                (m["wfe"] for m in epoch_metrics if m["epoch"] == self.last_selected),
                0.0
            )

            # Use adaptive threshold derived from WFE variance
            if not self.stability.should_change_epoch(
                last_wfe, candidate_wfe, self.last_selected, candidate
            ):
                candidate = self.last_selected

        # Record and return
        self.selection_history.append({
            "epoch": candidate,
            "frontier": frontier_epochs,
            "changed": candidate != self.last_selected,
        })
        self.last_selected = candidate
        return candidate
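AdaptiveStabilityPenalty is referenced above but defined elsewhere; below is a minimal sketch of one plausible implementation. The class name and should_change_epoch signature come from the selector above, but the internals (scaling the change threshold by recent WFE dispersion, with a fixed fallback) are assumptions for illustration:

```python
import statistics

class AdaptiveStabilityPenalty:
    """Sketch: allow an epoch change only when the WFE gain clears a
    threshold derived from the recent spread of observed WFE values."""

    def __init__(self, k: float = 0.5, fallback_threshold: float = 0.05):
        self.k = k
        self.fallback_threshold = fallback_threshold
        self.wfe_history: list[float] = []

    def should_change_epoch(
        self, last_wfe: float, candidate_wfe: float,
        last_epoch: int, candidate_epoch: int,
    ) -> bool:
        self.wfe_history.append(candidate_wfe)
        if len(self.wfe_history) < 3:
            threshold = self.fallback_threshold  # not enough data for a spread yet
        else:
            threshold = self.k * statistics.stdev(self.wfe_history[-10:])
        return candidate_wfe - last_wfe > threshold

penalty = AdaptiveStabilityPenalty()
print(penalty.should_change_epoch(0.50, 0.52, 447, 945))  # False: gain too small
```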

Anti-Patterns


| Anti-Pattern | Symptom | Fix | Severity |
|---|---|---|---|
| Expanding window (range bars) | Train size grows per fold | Use fixed sliding window | CRITICAL |
| Peak picking | Best epoch always at sweep boundary | Expand range, check for plateau | HIGH |
| Insufficient folds | effective_n < 30 | Increase folds or data span | HIGH |
| Ignoring temporal autocorr | Folds correlated | Use purged CV, gap between folds | HIGH |
| Overfitting to IS | IS >> OOS Sharpe | Reduce epochs, add regularization | HIGH |
| sqrt(252) for crypto | Inflated Sharpe | Use sqrt(365) or sqrt(7) weekly | MEDIUM |
| Single epoch selection | No uncertainty quantification | Report confidence interval | MEDIUM |
| Meta-overfitting | Epoch selection itself overfits | Limit to 3-4 candidates max | HIGH |
CRITICAL: Never use expanding window for range bar ML training. Expanding windows create fold non-equivalence, regime dilution, and systematically bias risk metrics. See references/anti-patterns.md for the full analysis (Section 7).
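The annualization row deserves a number: the calendar factor directly scales the reported Sharpe, so applying the 252-trading-day equity convention to a 24/7 crypto series misstates the result. Quick check with a hypothetical daily Sharpe:

```python
import math

daily_sharpe = 0.08  # hypothetical daily Sharpe of a crypto strategy

# Equity convention (252 trading days) vs 24/7 crypto calendar (365 days)
print(round(daily_sharpe * math.sqrt(252), 2))  # 1.27
print(round(daily_sharpe * math.sqrt(365), 2))  # 1.53
```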

Decision Tree


See references/epoch-selection-decision-tree.md for the full practitioner decision tree.
Start
  ├─ IS_Sharpe > compute_is_sharpe_threshold(n)? ──NO──> Mark WFE invalid, use fallback
  │         │                                            (threshold = 2/√n, adapts to sample size)
  │        YES
  │         │
  ├─ Compute WFE for each epoch
  │         │
  ├─ Any WFE > 0.30? ──NO──> REJECT all epochs (severe overfit)
  │         │                (guideline, not hard threshold)
  │        YES
  │         │
  ├─ Compute efficient frontier
  │         │
  ├─ Apply AdaptiveStabilityPenalty
  │         │ (threshold derived from WFE variance)
  └─> Return selected epoch
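The tree annotates the IS gate as threshold = 2/√n. The function name compute_is_sharpe_threshold appears above, but this exact body is an assumed sketch matching that annotation:

```python
import math

def compute_is_sharpe_threshold(n: int) -> float:
    # ~2-sigma noise floor: Var(Sharpe_hat) ≈ 1/n for a near-zero true Sharpe,
    # so an IS Sharpe below 2/sqrt(n) is indistinguishable from noise
    return 2.0 / math.sqrt(n)

print(round(compute_is_sharpe_threshold(400), 2))  # 0.1
```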

Integration with rangebar-eval-metrics


This skill extends rangebar-eval-metrics:
| Metric Source | Used For | Reference |
|---|---|---|
| sharpe_tw | WFE numerator (OOS) and denominator (IS) | range-bar-metrics.md |
| n_bars | Sample size for aggregation weights | metrics-schema.md |
| psr, dsr | Final acceptance criteria | sharpe-formulas.md |
| prediction_autocorr | Validate model isn't collapsed | ml-prediction-quality.md |
| is_collapsed | Model health check | ml-prediction-quality.md |
| Extended risk metrics | Deep risk analysis (optional) | risk-metrics.md |

Recommended Workflow


  1. Compute base metrics using rangebar-eval-metrics:compute_metrics.py
  2. Feed to AWFES for epoch selection with sharpe_tw as the primary signal
  3. Validate with psr > 0.85 and dsr > 0.50 before deployment
  4. Monitor is_collapsed and prediction_autocorr for model health


OOS Application Phase


Overview


After epoch selection via efficient frontier, apply the selected epochs to held-out test data for final OOS performance metrics. This phase produces "live trading" results that simulate deployment.

Nested WFO Structure


AWFES uses Nested WFO with three data splits per fold:
                    AWFES: Nested WFO Data Split (per fold)

#############     +----------+     +---------+     +----------+     #==========#
# Train 60% # --> | Gap 6% A | --> | Val 20% | --> | Gap 6% B | --> H Test 20% H
#############     +----------+     +---------+     +----------+     #==========#

<details>
<summary>graph-easy source</summary>
graph { label: "AWFES: Nested WFO Data Split (per fold)"; flow: east; }
[ Train 60% ] { border: bold; } [ Gap 6% A ] [ Val 20% ] [ Gap 6% B ] [ Test 20% ] { border: double; }
[ Train 60% ] -> [ Gap 6% A ] [ Gap 6% A ] -> [ Val 20% ] [ Val 20% ] -> [ Gap 6% B ] [ Gap 6% B ] -> [ Test 20% ]

</details>

Per-Fold Workflow


                  AWFES: Per-Fold Workflow

                   -----------------------
                  |      Fold i Data      |
                   -----------------------
                    |
                    v
                  +-----------------------+
                  | Split: Train/Val/Test |
                  +-----------------------+
                    |
                    v
                  +-----------------------+
                  | Epoch Sweep on Train  |
                  +-----------------------+
                    |
                    v
                  +-----------------------+
                  |  Compute WFE on Val   |
                  +-----------------------+
                    |
                    | val optimal
                    v
                  #=======================#
                  H    Bayesian Update    H
                  #=======================#
                    |
                    | smoothed epoch
                    v
                  +-----------------------+
                  |   Train Final Model   |
                  +-----------------------+
                    |
                    v
                  #=======================#
                  H   Evaluate on Test    H
                  #=======================#
                    |
                    v
                   -----------------------
                  |    Fold i Metrics     |
                   -----------------------
<details> <summary>graph-easy source</summary>
graph { label: "AWFES: Per-Fold Workflow"; flow: south; }

[ Fold i Data ] { shape: rounded; }
[ Split: Train/Val/Test ]
[ Epoch Sweep on Train ]
[ Compute WFE on Val ]
[ Bayesian Update ] { border: double; }
[ Train Final Model ]
[ Evaluate on Test ] { border: double; }
[ Fold i Metrics ] { shape: rounded; }

[ Fold i Data ] -> [ Split: Train/Val/Test ]
[ Split: Train/Val/Test ] -> [ Epoch Sweep on Train ]
[ Epoch Sweep on Train ] -> [ Compute WFE on Val ]
[ Compute WFE on Val ] -- val optimal --> [ Bayesian Update ]
[ Bayesian Update ] -- smoothed epoch --> [ Train Final Model ]
[ Train Final Model ] -> [ Evaluate on Test ]
[ Evaluate on Test ] -> [ Fold i Metrics ]
</details>

Bayesian Carry-Forward Across Folds


                                 AWFES: Bayesian Carry-Forward Across Folds

 -------   init   +--------+  posterior   +--------+  posterior   +--------+     +--------+      -----------
| Prior | ------> | Fold 1 | -----------> | Fold 2 | -----------> | Fold 3 | ..> | Fold N | --> | Aggregate |
 -------          +--------+              +--------+              +--------+     +--------+      -----------
<details> <summary>graph-easy source</summary>
graph { label: "AWFES: Bayesian Carry-Forward Across Folds"; flow: east; }

[ Prior ] { shape: rounded; }
[ Fold 1 ]
[ Fold 2 ]
[ Fold 3 ]
[ Fold N ]
[ Aggregate ] { shape: rounded; }

[ Prior ] -- init --> [ Fold 1 ]
[ Fold 1 ] -- posterior --> [ Fold 2 ]
[ Fold 2 ] -- posterior --> [ Fold 3 ]
[ Fold 3 ] ..> [ Fold N ]
[ Fold N ] -> [ Aggregate ]
</details>
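The chain above has a concrete consequence: every fold adds precision, so the posterior variance over the epoch shrinks monotonically. A standalone check with assumed prior and observation variances:

```python
post_var = 40000.0   # assumed prior variance over the epoch
obs_var = 10000.0    # assumed per-fold observation variance

variances = []
for _ in range(4):   # folds 1..4: precisions add, so variance shrinks
    post_var = 1.0 / (1.0 / post_var + 1.0 / obs_var)
    variances.append(round(post_var))
print(variances)  # [8000, 4444, 3077, 2353]
```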

Bayesian Epoch Selection for OOS


Instead of using the current fold's own optimal epoch (which introduces look-ahead bias), apply the Bayesian-smoothed epoch carried forward from prior folds:
python
import numpy as np


class BayesianEpochSelector:
    """Bayesian updating of epoch selection across folds.

    Also known as: BayesianEpochSmoother (alias in epoch-smoothing.md)

    Variance parameters are DERIVED from search space, not hard-coded.
    See AWFESConfig._derive_variances() for the principled derivation.
    """

    def __init__(
        self,
        epoch_configs: list[int],
        prior_mean: float | None = None,
        prior_variance: float | None = None,
        observation_variance: float | None = None,
    ):
        self.epoch_configs = sorted(epoch_configs)

        # PRINCIPLED DERIVATION: Variances from search space
        # If not provided, derive from epoch range
        epoch_range = max(epoch_configs) - min(epoch_configs)

        # Prior spans search space with 95% coverage
        # 95% CI = mean ± 1.96σ → range = 3.92σ → σ² = (range/3.92)²
        default_prior_var = (epoch_range / 3.92) ** 2

        # Observation variance: 1/4 of prior for balanced learning
        default_obs_var = default_prior_var / 4

        # Use explicit None checks so a caller-supplied 0.0 is not silently replaced
        self.posterior_mean = prior_mean if prior_mean is not None else float(np.mean(epoch_configs))
        self.posterior_variance = prior_variance if prior_variance is not None else default_prior_var
        self.observation_variance = observation_variance if observation_variance is not None else default_obs_var
        self.history: list[dict] = []

    def update(self, observed_optimal_epoch: int, wfe: float) -> int:
        """Update posterior with new fold's optimal epoch.

        Uses precision-weighted Bayesian update:
        posterior_mean = (prior_precision * prior_mean + obs_precision * obs) /
                        (prior_precision + obs_precision)

        Args:
            observed_optimal_epoch: Optimal epoch from current fold's validation
            wfe: Walk-Forward Efficiency (used to weight observation)

        Returns:
            Smoothed epoch selection for TEST evaluation
        """
        # Weight observation by WFE (higher WFE = more reliable signal)
        # Clamp WFE to [0.1, 2.0] to prevent extreme weights:
        #   - Lower bound 0.1: Prevents division issues and ensures minimum weight
        #   - Upper bound 2.0: WFE > 2 is suspicious (OOS > 2× IS suggests:
        #       a) Regime shift favoring OOS (lucky timing, not skill)
        #       b) IS severely overfit (artificially low denominator)
        #       c) Data anomaly or look-ahead bias
        #     Capping at 2.0 treats such observations with skepticism
        wfe_clamped = max(0.1, min(wfe, 2.0))
        effective_variance = self.observation_variance / wfe_clamped

        prior_precision = 1.0 / self.posterior_variance
        obs_precision = 1.0 / effective_variance

        # Bayesian update
        new_precision = prior_precision + obs_precision
        new_mean = (
            prior_precision * self.posterior_mean +
            obs_precision * observed_optimal_epoch
        ) / new_precision

        # Record before updating
        self.history.append({
            "observed_epoch": observed_optimal_epoch,
            "wfe": wfe,
            "prior_mean": self.posterior_mean,
            "posterior_mean": new_mean,
            "selected_epoch": self._snap_to_config(new_mean),
        })

        self.posterior_mean = new_mean
        self.posterior_variance = 1.0 / new_precision

        return self._snap_to_config(new_mean)

    def _snap_to_config(self, continuous_epoch: float) -> int:
        """Snap continuous estimate to nearest valid epoch config."""
        return min(self.epoch_configs, key=lambda e: abs(e - continuous_epoch))

    def get_current_epoch(self) -> int:
        """Get current smoothed epoch without updating."""
        return self._snap_to_config(self.posterior_mean)
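The _snap_to_config step maps the continuous posterior mean back onto the discrete sweep grid; the same one-liner works standalone (grid taken from the Quick Start example):

```python
epoch_configs = [100, 211, 447, 945, 2000]  # log-spaced sweep grid

# Nearest grid point to a continuous posterior mean of 612.0
print(min(epoch_configs, key=lambda e: abs(e - 612.0)))  # 447
```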

Application Workflow


python
from typing import Callable

import numpy as np


def apply_awfes_to_test(
    folds: list[Fold],
    model_factory: Callable,
    bayesian_selector: BayesianEpochSelector,
) -> list[dict]:
    """Apply AWFES with Bayesian smoothing to test data.

    Workflow per fold:
    1. Split into train/validation/test (60/20/20)
    2. Sweep epochs on train, compute WFE on validation
    3. Update Bayesian posterior with validation-optimal epoch
    4. Train final model at Bayesian-selected epoch on train+validation
    5. Evaluate on TEST (untouched data)
    """
    results = []

    for fold_idx, fold in enumerate(folds):
        # Step 1: Split data
        train, validation, test = fold.split_nested(
            train_pct=0.60,
            validation_pct=0.20,
            test_pct=0.20,
            embargo_pct=0.06,  # 6% gap at each boundary
        )

        # Step 2: Epoch sweep on train → validate on validation
        epoch_metrics = []
        for epoch in bayesian_selector.epoch_configs:
            model = model_factory()
            model.fit(train.X, train.y, epochs=epoch)

            is_sharpe = compute_sharpe(model.predict(train.X), train.y)
            val_sharpe = compute_sharpe(model.predict(validation.X), validation.y)

            # Use data-driven threshold instead of hardcoded 0.1
            is_threshold = compute_is_sharpe_threshold(len(train.X))
            wfe = val_sharpe / is_sharpe if is_sharpe > is_threshold else None

            epoch_metrics.append({
                "epoch": epoch,
                "is_sharpe": is_sharpe,
                "val_sharpe": val_sharpe,
                "wfe": wfe,
            })

        # Step 3: Find validation-optimal and update Bayesian
        val_optimal = max(
            [m for m in epoch_metrics if m["wfe"] is not None],
            key=lambda m: m["wfe"],
            default={"epoch": bayesian_selector.epoch_configs[0], "wfe": 0.3}
        )
        selected_epoch = bayesian_selector.update(
            val_optimal["epoch"],
            val_optimal["wfe"],
        )

        # Step 4: Train final model on train+validation at selected epoch
        combined_X = np.vstack([train.X, validation.X])
        combined_y = np.hstack([train.y, validation.y])
        final_model = model_factory()
        final_model.fit(combined_X, combined_y, epochs=selected_epoch)

        # Step 5: Evaluate on TEST (untouched)
        test_predictions = final_model.predict(test.X)
        test_metrics = compute_oos_metrics(test_predictions, test.y, test.timestamps)

        results.append({
            "fold_idx": fold_idx,
            "validation_optimal_epoch": val_optimal["epoch"],
            "bayesian_selected_epoch": selected_epoch,
            "test_metrics": test_metrics,
            "epoch_metrics": epoch_metrics,
        })

    return results
See references/oos-application.md for the complete implementation.


Epoch Smoothing Methods


Why Smooth Epoch Selections?


Raw per-fold epoch selections are noisy due to:
  • Limited validation data per fold
  • Regime changes between folds
  • Stochastic training dynamics
Smoothing reduces variance while preserving signal.

Method Comparison


| Method | Formula | Pros | Cons |
|---|---|---|---|
| Bayesian (Recommended) | Precision-weighted update | Principled, handles uncertainty | More complex |
| EMA | α × new + (1-α) × old | Simple, responsive | No uncertainty quantification |
| SMA | Mean of last N | Most stable | Slow to adapt |
| Median | Median of last N | Robust to outliers | Loses magnitude info |

Bayesian Updating (Primary Method)


python
def bayesian_epoch_update(
    prior_mean: float,
    prior_variance: float,
    observed_epoch: int,
    observation_variance: float,
    wfe_weight: float = 1.0,
) -> tuple[float, float]:
    """Single Bayesian update step.

    Mathematical formulation:
    - Prior: N(μ₀, σ₀²)
    - Observation: N(x, σ_obs²/wfe)  # WFE-weighted
    - Posterior: N(μ₁, σ₁²)

    Where:
    μ₁ = (μ₀/σ₀² + x·wfe/σ_obs²) / (1/σ₀² + wfe/σ_obs²)
    σ₁² = 1 / (1/σ₀² + wfe/σ_obs²)
    """
    # Effective observation variance (lower WFE = less reliable)
    eff_obs_var = observation_variance / max(wfe_weight, 0.1)

    prior_precision = 1.0 / prior_variance
    obs_precision = 1.0 / eff_obs_var

    posterior_precision = prior_precision + obs_precision
    posterior_mean = (
        prior_precision * prior_mean + obs_precision * observed_epoch
    ) / posterior_precision
    posterior_variance = 1.0 / posterior_precision

    return posterior_mean, posterior_variance
python
def bayesian_epoch_update(
    prior_mean: float,
    prior_variance: float,
    observed_epoch: int,
    observation_variance: float,
    wfe_weight: float = 1.0,
) -> tuple[float, float]:
    """单次贝叶斯更新步骤。

    数学公式:
    - 先验:N(μ₀, σ₀²)
    - 观测:N(x, σ_obs²/wfe)  # WFE加权
    - 后验:N(μ₁, σ₁²)

    其中:
    μ₁ = (μ₀/σ₀² + x·wfe/σ_obs²) / (1/σ₀² + wfe/σ_obs²)
    σ₁² = 1 / (1/σ₀² + wfe/σ_obs²)
    """
    # 有效观测方差(WFE越低,可靠性越差)
    eff_obs_var = observation_variance / max(wfe_weight, 0.1)

    prior_precision = 1.0 / prior_variance
    obs_precision = 1.0 / eff_obs_var

    posterior_precision = prior_precision + obs_precision
    posterior_mean = (
        prior_precision * prior_mean + obs_precision * observed_epoch
    ) / posterior_precision
    posterior_variance = 1.0 / posterior_precision

    return posterior_mean, posterior_variance
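A worked sequence makes the precision weighting concrete. In the sketch below the update function is repeated so the fragment runs standalone, and the fold observations and WFE weights are made up for illustration: starting from a wide midpoint prior over [100, 2000], three validation optima near 400 pull the posterior toward them, with the low-WFE fold contributing less.

```python
def bayesian_epoch_update(prior_mean, prior_variance, observed_epoch,
                          observation_variance, wfe_weight=1.0):
    # Same precision-weighted Gaussian update as defined above
    eff_obs_var = observation_variance / max(wfe_weight, 0.1)
    prior_precision = 1.0 / prior_variance
    obs_precision = 1.0 / eff_obs_var
    posterior_precision = prior_precision + obs_precision
    posterior_mean = (prior_precision * prior_mean
                      + obs_precision * observed_epoch) / posterior_precision
    return posterior_mean, 1.0 / posterior_precision

# Wide midpoint prior over a [100, 2000] search space
mean, var = 1050.0, ((2000 - 100) / 3.92) ** 2

# Hypothetical validation optima near 400; the middle fold's low WFE
# makes its observation count for less
for obs, wfe in [(450, 0.8), (380, 0.3), (420, 0.9)]:
    mean, var = bayesian_epoch_update(mean, var, obs, var / 4, wfe_weight=wfe)

# Posterior mean has migrated from 1050 toward the 380-450 cluster,
# and the variance has shrunk at every step
```

Each update adds the observation's precision to the prior's, so the posterior variance shrinks monotonically; a fold with WFE 0.3 moves the mean far less than one with WFE 0.9.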

Exponential Moving Average (Alternative)

指数移动平均(替代方法)

python
def ema_epoch_update(
    current_ema: float,
    observed_epoch: int,
    alpha: float = 0.3,
) -> float:
    """EMA update: more weight on recent observations.

    α = 0.3 means ~90% of signal from last 7 folds.
    α = 0.5 means ~90% of signal from last 4 folds.
    """
    return alpha * observed_epoch + (1 - alpha) * current_ema
python
def ema_epoch_update(
    current_ema: float,
    observed_epoch: int,
    alpha: float = 0.3,
) -> float:
    """EMA更新:给最近观测值更高权重。

    α = 0.3表示~90%的信号来自最近7个折。
    α = 0.5表示~90%的信号来自最近4个折。
    """
    return alpha * observed_epoch + (1 - alpha) * current_ema
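The half-life figures in the docstring can be checked directly: EMA weights form a geometric series, so the fraction of total weight carried by the most recent k observations is 1 − (1−α)^k.

```python
def ema_weight_fraction(alpha: float, k: int) -> float:
    # Weights are alpha * (1 - alpha)**i for i = 0..k-1; they sum to 1 - (1-alpha)**k
    return 1.0 - (1.0 - alpha) ** k

print(round(ema_weight_fraction(0.3, 7), 3))  # 0.918 → "~90% from last 7 folds"
print(round(ema_weight_fraction(0.5, 4), 3))  # 0.938 → "~90% from last 4 folds"
```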

Initialization Strategies

初始化策略

| Strategy | When to Use | Implementation |
| --- | --- | --- |
| Midpoint prior | No domain knowledge | mean(epoch_configs) |
| Literature prior | Published optimal exists | Known optimal ± uncertainty |
| Burn-in | Sufficient data | Use first N folds for initialization |
| 策略 | 使用场景 | 实现方式 |
| --- | --- | --- |
| 中点先验 | 无领域知识 | mean(epoch_configs) |
| 文献先验 | 存在已发表的最优值 | 已知最优值 ± 不确定性 |
| 预热期 | 数据充足 | 使用前N个折进行初始化 |

python
# RECOMMENDED: Use AWFESConfig for principled derivation
config = AWFESConfig.from_search_space(
    min_epoch=80,
    max_epoch=400,
    granularity=5,
)
# prior_variance = ((400 - 80) / 3.92)**2 ≈ 6,664 (derived automatically)
# observation_variance = prior_variance / 4 ≈ 1,666 (derived automatically)

# Alternative strategies (if manual configuration needed):

# Strategy 1: Search-space derived (same as AWFESConfig)
epoch_range = max(EPOCH_CONFIGS) - min(EPOCH_CONFIGS)
prior_mean = np.mean(EPOCH_CONFIGS)
prior_variance = (epoch_range / 3.92) ** 2  # 95% CI spans search space

# Strategy 2: Burn-in (use first 5 folds)
burn_in_optima = [run_fold_sweep(fold) for fold in folds[:5]]
prior_mean = np.mean(burn_in_optima)
base_variance = (epoch_range / 3.92) ** 2 / 4  # Reduced after burn-in
prior_variance = max(np.var(burn_in_optima), base_variance)

See [references/epoch-smoothing.md](./references/epoch-smoothing.md) for extended analysis.

---
python
# 推荐:使用AWFESConfig进行原则性推导
config = AWFESConfig.from_search_space(
    min_epoch=80,
    max_epoch=400,
    granularity=5,
)
# prior_variance = ((400 - 80) / 3.92)**2 ≈ 6,664(自动推导)
# observation_variance = prior_variance / 4 ≈ 1,666(自动推导)

# 替代策略(如需手动配置):

# 策略1:从搜索空间推导(与AWFESConfig相同)
epoch_range = max(EPOCH_CONFIGS) - min(EPOCH_CONFIGS)
prior_mean = np.mean(EPOCH_CONFIGS)
prior_variance = (epoch_range / 3.92) ** 2  # 95%置信区间覆盖搜索空间

# 策略2:预热期(使用前5个折)
burn_in_optima = [run_fold_sweep(fold) for fold in folds[:5]]
prior_mean = np.mean(burn_in_optima)
base_variance = (epoch_range / 3.92) ** 2 / 4  # 预热期后减小
prior_variance = max(np.var(burn_in_optima), base_variance)

扩展分析请参见[references/epoch-smoothing.md](./references/epoch-smoothing.md)。

---
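As a sanity check on the derived constants: with a search space of [80, 400], setting the 95% interval (±1.96σ, i.e. a span of 3.92σ) equal to the range gives the prior variance, and quartering it gives the default observation variance.

```python
min_epoch, max_epoch = 80, 400
epoch_range = max_epoch - min_epoch  # 320

# 95% of a normal lies within ±1.96 sigma, so sigma = range / 3.92
prior_variance = (epoch_range / 3.92) ** 2       # ≈ 6,664
observation_variance = prior_variance / 4        # ≈ 1,666

print(round(prior_variance, 1), round(observation_variance, 1))
```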

OOS Metrics Specification

OOS指标规范

Metric Tiers for Test Evaluation

测试评估的指标层级

Following rangebar-eval-metrics, compute these metrics on TEST data.
CRITICAL for Range Bars: Use time-weighted Sharpe (`sharpe_tw`) instead of simple bar Sharpe. See range-bar-metrics.md for the canonical implementation. The metrics below assume time-weighted computation for range bar data.
遵循rangebar-eval-metrics,在TEST数据上计算这些指标。
范围bar的关键注意事项:使用时间加权Sharpe(`sharpe_tw`)而非简单bar Sharpe。标准实现请参见range-bar-metrics.md。以下指标假设对范围bar数据使用时间加权计算。

Tier 1: Primary Metrics (Mandatory)

层级1:核心指标(必填)

| Metric | Formula | Threshold | Purpose |
| --- | --- | --- | --- |
| sharpe_tw | Time-weighted (see range-bar-metrics.md) | > 0 | Core performance |
| hit_rate | n_correct_sign / n_total | > 0.50 | Directional accuracy |
| cumulative_pnl | Σ(pred × actual) | > 0 | Total return |
| positive_sharpe_folds | n_folds(sharpe_tw > 0) / n_folds | > 0.55 | Consistency |
| wfe_test | test_sharpe_tw / validation_sharpe_tw | > 0.30 | Final transfer |
| 指标 | 公式 | 阈值 | 用途 |
| --- | --- | --- | --- |
| sharpe_tw | 时间加权(参见range-bar-metrics.md) | > 0 | 核心性能 |
| hit_rate | n_correct_sign / n_total | > 0.50 | 方向准确率 |
| cumulative_pnl | Σ(pred × actual) | > 0 | 总收益 |
| positive_sharpe_folds | n_folds(sharpe_tw > 0) / n_folds | > 0.55 | 一致性 |
| wfe_test | test_sharpe_tw / validation_sharpe_tw | > 0.30 | 最终迁移能力 |

Tier 2: Risk Metrics

层级2:风险指标

| Metric | Formula | Threshold | Purpose |
| --- | --- | --- | --- |
| max_drawdown | max(peak - trough) / peak | < 0.30 | Worst loss |
| calmar_ratio | annual_return / max_drawdown | > 0.5 | Risk-adjusted |
| profit_factor | gross_profit / gross_loss | > 1.0 | Win/loss ratio |
| cvar_10pct | mean(worst 10% returns) | > -0.05 | Tail risk |
| 指标 | 公式 | 阈值 | 用途 |
| --- | --- | --- | --- |
| max_drawdown | max(peak - trough) / peak | < 0.30 | 最大回撤 |
| calmar_ratio | annual_return / max_drawdown | > 0.5 | 风险调整后收益 |
| profit_factor | gross_profit / gross_loss | > 1.0 | 盈亏比 |
| cvar_10pct | mean(worst 10% returns) | > -0.05 | 尾部风险 |

Tier 3: Statistical Validation

层级3:统计验证

| Metric | Formula | Threshold | Purpose |
| --- | --- | --- | --- |
| psr | P(true_sharpe > 0) | > 0.85 | Statistical significance |
| dsr | sharpe - E[max_sharpe_null] | > 0.50 | Multiple testing adjusted |
| binomial_pvalue | binom.test(n_positive, n_total) | < 0.05 | Sign test |
| hac_ttest_pvalue | HAC-adjusted t-test | < 0.05 | Autocorrelation robust |
| 指标 | 公式 | 阈值 | 用途 |
| --- | --- | --- | --- |
| psr | P(true_sharpe > 0) | > 0.85 | 统计显著性 |
| dsr | sharpe - E[max_sharpe_null] | > 0.50 | 多重检验调整 |
| binomial_pvalue | binom.test(n_positive, n_total) | < 0.05 | 符号检验 |
| hac_ttest_pvalue | HAC调整t检验 | < 0.05 | 自相关鲁棒检验 |

Metric Computation Code

指标计算代码

python
import numpy as np
from scipy.stats import norm, binomtest  # norm for PSR, binomtest for sign test

def compute_oos_metrics(
    predictions: np.ndarray,
    actuals: np.ndarray,
    timestamps: np.ndarray,
    duration_us: np.ndarray | None = None,  # Required for range bars
    market_type: str = "crypto_24_7",  # For annualization factor
) -> dict[str, float]:
    """Compute full OOS metrics suite for test data.

    Args:
        predictions: Model predictions (signed magnitude)
        actuals: Actual returns
        timestamps: Bar timestamps for daily aggregation
        duration_us: Bar durations in microseconds (REQUIRED for range bars)

    Returns:
        Dictionary with all tier metrics

    IMPORTANT: For range bars, pass duration_us to compute sharpe_tw.
    Simple bar_sharpe violates i.i.d. assumption - see range-bar-metrics.md.
    """
    pnl = predictions * actuals

    # Tier 1: Primary
    # For range bars: Use time-weighted Sharpe (canonical)
    if duration_us is not None:
        from exp066e_tau_precision import compute_time_weighted_sharpe
        sharpe_tw, weighted_std, total_days = compute_time_weighted_sharpe(
            bar_pnl=pnl,
            duration_us=duration_us,
            annualize=True,
        )
    else:
        # Fallback for time bars (all same duration)
        daily_pnl = group_by_day(pnl, timestamps)
        weekly_factor = get_daily_to_weekly_factor(market_type=market_type)
        sharpe_tw = (
            np.mean(daily_pnl) / np.std(daily_pnl) * weekly_factor
            if np.std(daily_pnl) > 1e-10 else 0.0
        )

    hit_rate = np.mean(np.sign(predictions) == np.sign(actuals))
    cumulative_pnl = np.sum(pnl)

    # Tier 2: Risk
    equity_curve = np.cumsum(pnl)
    running_max = np.maximum.accumulate(equity_curve)
    drawdowns = (running_max - equity_curve) / np.maximum(running_max, 1e-10)
    max_drawdown = np.max(drawdowns)

    gross_profit = np.sum(pnl[pnl > 0])
    gross_loss = abs(np.sum(pnl[pnl < 0]))
    profit_factor = gross_profit / gross_loss if gross_loss > 0 else float("inf")

    # CVaR (10%)
    sorted_pnl = np.sort(pnl)
    cvar_cutoff = max(1, int(len(sorted_pnl) * 0.10))
    cvar_10pct = np.mean(sorted_pnl[:cvar_cutoff])

    # Tier 3: Statistical (use sharpe_tw for PSR)
    sharpe_se = 1.0 / np.sqrt(len(pnl)) if len(pnl) > 0 else 1.0
    psr = norm.cdf(sharpe_tw / sharpe_se) if sharpe_se > 0 else 0.5

    n_positive = np.sum(pnl > 0)
    n_total = len(pnl)
    # Use binomtest (binom_test deprecated since scipy 1.10)
    binomial_pvalue = binomtest(n_positive, n_total, 0.5, alternative="greater").pvalue

    return {
        # Tier 1 (use sharpe_tw for range bars)
        "sharpe_tw": sharpe_tw,
        "hit_rate": hit_rate,
        "cumulative_pnl": cumulative_pnl,
        "n_bars": len(pnl),
        # Tier 2
        "max_drawdown": max_drawdown,
        "profit_factor": profit_factor,
        "cvar_10pct": cvar_10pct,
        # Tier 3
        "psr": psr,
        "binomial_pvalue": binomial_pvalue,
    }
python
import numpy as np
from scipy.stats import norm, binomtest  # norm用于PSR,binomtest用于符号检验

def compute_oos_metrics(
    predictions: np.ndarray,
    actuals: np.ndarray,
    timestamps: np.ndarray,
    duration_us: np.ndarray | None = None,  # 范围bar必填
    market_type: str = "crypto_24_7",  # 用于年化因子
) -> dict[str, float]:
    """为测试数据计算完整OOS指标集。

    参数:
        predictions: 模型预测值(带符号幅度)
        actuals: 实际收益
        timestamps: 用于日度聚合的bar时间戳
        duration_us: bar持续时间(微秒,范围bar必填)

    返回:
        包含所有层级指标的字典

    重要提示:对于范围bar,传递duration_us以计算sharpe_tw。
    简单bar_sharpe违反独立同分布假设 - 参见range-bar-metrics.md。
    """
    pnl = predictions * actuals

    # 层级1:核心
    # 范围bar:使用时间加权Sharpe(标准方法)
    if duration_us is not None:
        from exp066e_tau_precision import compute_time_weighted_sharpe
        sharpe_tw, weighted_std, total_days = compute_time_weighted_sharpe(
            bar_pnl=pnl,
            duration_us=duration_us,
            annualize=True,
        )
    else:
        # 时间bar回退(所有bar持续时间相同)
        daily_pnl = group_by_day(pnl, timestamps)
        weekly_factor = get_daily_to_weekly_factor(market_type=market_type)
        sharpe_tw = (
            np.mean(daily_pnl) / np.std(daily_pnl) * weekly_factor
            if np.std(daily_pnl) > 1e-10 else 0.0
        )

    hit_rate = np.mean(np.sign(predictions) == np.sign(actuals))
    cumulative_pnl = np.sum(pnl)

    # 层级2:风险
    equity_curve = np.cumsum(pnl)
    running_max = np.maximum.accumulate(equity_curve)
    drawdowns = (running_max - equity_curve) / np.maximum(running_max, 1e-10)
    max_drawdown = np.max(drawdowns)

    gross_profit = np.sum(pnl[pnl > 0])
    gross_loss = abs(np.sum(pnl[pnl < 0]))
    profit_factor = gross_profit / gross_loss if gross_loss > 0 else float("inf")

    # CVaR(10%)
    sorted_pnl = np.sort(pnl)
    cvar_cutoff = max(1, int(len(sorted_pnl) * 0.10))
    cvar_10pct = np.mean(sorted_pnl[:cvar_cutoff])

    # 层级3:统计(使用sharpe_tw计算PSR)
    sharpe_se = 1.0 / np.sqrt(len(pnl)) if len(pnl) > 0 else 1.0
    psr = norm.cdf(sharpe_tw / sharpe_se) if sharpe_se > 0 else 0.5

    n_positive = np.sum(pnl > 0)
    n_total = len(pnl)
    # 使用binomtest(binom_test自scipy 1.10起已弃用)
    binomial_pvalue = binomtest(n_positive, n_total, 0.5, alternative="greater").pvalue

    return {
        # 层级1(范围bar使用sharpe_tw)
        "sharpe_tw": sharpe_tw,
        "hit_rate": hit_rate,
        "cumulative_pnl": cumulative_pnl,
        "n_bars": len(pnl),
        # 层级2
        "max_drawdown": max_drawdown,
        "profit_factor": profit_factor,
        "cvar_10pct": cvar_10pct,
        # 层级3
        "psr": psr,
        "binomial_pvalue": binomial_pvalue,
    }
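The Tier 2 computations are simple enough to check on a toy PnL series. This standalone fragment uses synthetic numbers, with the drawdown, profit-factor, and CVaR logic copied from `compute_oos_metrics` above:

```python
import numpy as np

# Synthetic per-bar PnL for illustration only
pnl = np.array([0.02, -0.01, 0.03, -0.02, 0.01, -0.005, 0.015, -0.01])

# Max drawdown on the cumulative equity curve
equity = np.cumsum(pnl)
running_max = np.maximum.accumulate(equity)
drawdowns = (running_max - equity) / np.maximum(running_max, 1e-10)
max_drawdown = np.max(drawdowns)            # worst peak-to-trough fraction

# Profit factor: gross wins over gross losses
gross_profit = np.sum(pnl[pnl > 0])         # 0.075
gross_loss = abs(np.sum(pnl[pnl < 0]))      # 0.045
profit_factor = gross_profit / gross_loss   # ≈ 1.67

# CVaR(10%): mean of the worst decile (at least one bar)
sorted_pnl = np.sort(pnl)
cutoff = max(1, int(len(sorted_pnl) * 0.10))
cvar_10pct = np.mean(sorted_pnl[:cutoff])   # worst single bar here: -0.02
```

Note the `max(1, ...)` guard: with only 8 bars, 10% truncates to zero, so CVaR falls back to the single worst bar.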

Aggregation Across Folds

跨折聚合

python
def aggregate_test_metrics(fold_results: list[dict]) -> dict[str, float]:
    """Aggregate test metrics across all folds.

    NOTE: For range bars, use sharpe_tw (time-weighted).
    See range-bar-metrics.md for why simple bar_sharpe is invalid for range bars.
    """
    metrics = [r["test_metrics"] for r in fold_results]

    # Positive Sharpe Folds (use sharpe_tw for range bars)
    sharpes = [m["sharpe_tw"] for m in metrics]
    positive_sharpe_folds = np.mean([s > 0 for s in sharpes])

    # Median for robustness
    median_sharpe_tw = np.median(sharpes)
    median_hit_rate = np.median([m["hit_rate"] for m in metrics])

    # DSR for multiple testing (use time-weighted Sharpe)
    n_trials = len(metrics)
    dsr = compute_dsr(median_sharpe_tw, n_trials)

    return {
        "n_folds": len(metrics),
        "positive_sharpe_folds": positive_sharpe_folds,
        "median_sharpe_tw": median_sharpe_tw,
        "mean_sharpe_tw": np.mean(sharpes),
        "std_sharpe_tw": np.std(sharpes),
        "median_hit_rate": median_hit_rate,
        "dsr": dsr,
        "total_pnl": sum(m["cumulative_pnl"] for m in metrics),
    }
See references/oos-metrics.md for threshold justifications.

python
def aggregate_test_metrics(fold_results: list[dict]) -> dict[str, float]:
    """跨所有折聚合测试指标。

    注意:对于范围bar,使用sharpe_tw(时间加权)。
    参见range-bar-metrics.md了解为什么简单bar_sharpe对范围bar无效。
    """
    metrics = [r["test_metrics"] for r in fold_results]

    # 正Sharpe折数(范围bar使用sharpe_tw)
    sharpes = [m["sharpe_tw"] for m in metrics]
    positive_sharpe_folds = np.mean([s > 0 for s in sharpes])

    # 中位数鲁棒性
    median_sharpe_tw = np.median(sharpes)
    median_hit_rate = np.median([m["hit_rate"] for m in metrics])

    # 多重检验DSR(使用时间加权Sharpe)
    n_trials = len(metrics)
    dsr = compute_dsr(median_sharpe_tw, n_trials)

    return {
        "n_folds": len(metrics),
        "positive_sharpe_folds": positive_sharpe_folds,
        "median_sharpe_tw": median_sharpe_tw,
        "mean_sharpe_tw": np.mean(sharpes),
        "std_sharpe_tw": np.std(sharpes),
        "median_hit_rate": median_hit_rate,
        "dsr": dsr,
        "total_pnl": sum(m["cumulative_pnl"] for m in metrics),
    }
阈值理由请参见references/oos-metrics.md
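A minimal sketch of the aggregation on hypothetical fold results (field names follow `compute_oos_metrics`; the DSR step is omitted here since `compute_dsr` lives elsewhere):

```python
import numpy as np

# Hypothetical per-fold test metrics, for illustration only
fold_results = [
    {"test_metrics": {"sharpe_tw": 1.2, "hit_rate": 0.53, "cumulative_pnl": 0.04}},
    {"test_metrics": {"sharpe_tw": -0.3, "hit_rate": 0.49, "cumulative_pnl": -0.01}},
    {"test_metrics": {"sharpe_tw": 0.8, "hit_rate": 0.52, "cumulative_pnl": 0.02}},
    {"test_metrics": {"sharpe_tw": 0.5, "hit_rate": 0.51, "cumulative_pnl": 0.01}},
]

metrics = [r["test_metrics"] for r in fold_results]
sharpes = [m["sharpe_tw"] for m in metrics]

positive_sharpe_folds = np.mean([s > 0 for s in sharpes])  # 3 of 4 folds
median_sharpe_tw = np.median(sharpes)                      # robust to the -0.3 fold
total_pnl = sum(m["cumulative_pnl"] for m in metrics)
```

The median deliberately ignores the magnitude of the one losing fold, which is why it pairs with `positive_sharpe_folds` as the consistency check.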

Look-Ahead Bias Prevention

前瞻偏差预防

The Problem

问题

Using the same data for epoch selection AND final evaluation creates look-ahead bias:
❌ WRONG: Use fold's own optimal epoch for fold's OOS evaluation
   - Epoch selection "sees" validation returns
   - Then apply same epoch to OOS from same period
   - Result: Overly optimistic performance
使用相同数据进行Epoch选择和最终评估会产生前瞻偏差:
❌ 错误:将折自身的最优Epoch用于该折的OOS评估
   - Epoch选择“看到”了验证收益
   - 然后将相同Epoch应用于同一时期的OOS
   - 结果:过于乐观的性能

The Solution: Nested WFO + Bayesian Lag

解决方案:嵌套WFO + 贝叶斯延迟

✅ CORRECT: Bayesian-smoothed epoch from PRIOR folds for current TEST
   - Epoch selection on train/validation (inner loop)
   - Update Bayesian posterior with validation-optimal
   - Apply Bayesian-selected epoch to TEST (outer loop)
   - TEST data completely untouched during selection
✅ 正确:使用来自PRIOR折的贝叶斯平滑Epoch用于当前TEST
   - 在内部循环的训练/验证上进行Epoch选择
   - 使用验证最优更新贝叶斯后验
   - 在外部循环的TEST上应用贝叶斯选中的Epoch
   - TEST数据在选择过程中完全未被接触

v3 Temporal Ordering (CRITICAL - 2026 Fix)

v3时序顺序(关键 - 2026修复)

The v3 implementation fixes a subtle but critical look-ahead bias bug in the original AWFES workflow. The key insight: TEST must use `prior_bayesian_epoch`, NOT `val_optimal_epoch`.
v3实现修复了原始AWFES工作流中一个微妙但严重的前瞻偏差bug。核心见解:TEST必须使用`prior_bayesian_epoch`,而非`val_optimal_epoch`。
The Bug (v2 and earlier)

错误(v2及更早版本)

python
# v2 BUG: Bayesian update BEFORE test evaluation
for fold in folds:
    epoch_metrics = sweep_epochs(fold.train, fold.validation)
    val_optimal_epoch = select_optimal(epoch_metrics)

    # WRONG: Update Bayesian with current fold's val_optimal
    bayesian.update(val_optimal_epoch, wfe)
    selected_epoch = bayesian.get_current_epoch()  # CONTAMINATED!

    # This selected_epoch is influenced by val_optimal from SAME fold
    test_metrics = evaluate(selected_epoch, fold.test)  # LOOK-AHEAD BIAS
python
# v2错误:贝叶斯更新在测试评估之前
for fold in folds:
    epoch_metrics = sweep_epochs(fold.train, fold.validation)
    val_optimal_epoch = select_optimal(epoch_metrics)

    # 错误:使用当前折的val_optimal更新贝叶斯
    bayesian.update(val_optimal_epoch, wfe)
    selected_epoch = bayesian.get_current_epoch()  # 被污染!

    # 该selected_epoch受同一折的val_optimal影响
    test_metrics = evaluate(selected_epoch, fold.test)  # 前瞻偏差

The Fix (v3)

修复(v3)

python
# v3 CORRECT: Get prior epoch BEFORE any work on current fold
for fold in folds:
    # Step 1: FIRST - Get epoch from ONLY prior folds
    prior_bayesian_epoch = bayesian.get_current_epoch()  # BEFORE any fold work

    # Step 2: Train and sweep to find this fold's optimal
    epoch_metrics = sweep_epochs(fold.train, fold.validation)
    val_optimal_epoch = select_optimal(epoch_metrics)

    # Step 3: TEST uses prior_bayesian_epoch (NOT val_optimal!)
    test_metrics = evaluate(prior_bayesian_epoch, fold.test)  # UNBIASED

    # Step 4: AFTER test - update Bayesian for FUTURE folds only
    bayesian.update(val_optimal_epoch, wfe)  # For fold+1, fold+2, ...
python
# v3正确:在处理当前折之前先获取先验Epoch
for fold in folds:
    # 步骤1:首先 - 仅从先验折获取Epoch
    prior_bayesian_epoch = bayesian.get_current_epoch()  # 在处理折之前

    # 步骤2:训练并扫描以找到该折的最优值
    epoch_metrics = sweep_epochs(fold.train, fold.validation)
    val_optimal_epoch = select_optimal(epoch_metrics)

    # 步骤3:TEST使用prior_bayesian_epoch(而非val_optimal!)
    test_metrics = evaluate(prior_bayesian_epoch, fold.test)  # 无偏差

    # 步骤4:测试完成后 - 仅为未来折更新贝叶斯
    bayesian.update(val_optimal_epoch, wfe)  # 用于折+1、折+2...
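The v3 ordering invariant can be demonstrated end to end with stub components. Everything below is illustrative: `BayesianEpochState` is a toy stand-in for the real posterior tracker, and the `(val_optimal_epoch, wfe)` pairs replace actual fold sweeps. The point is purely the sequencing: the epoch used for TEST is always read before the fold's own optimum touches the posterior.

```python
class BayesianEpochState:
    """Toy posterior tracker; illustrative only, not the real implementation."""
    def __init__(self, mean: float, variance: float):
        self.mean, self.variance = mean, variance

    def get_current_epoch(self) -> int:
        return int(round(self.mean))

    def update(self, observed_epoch: int, wfe: float) -> None:
        # WFE-weighted precision update, as in bayesian_epoch_update
        eff_var = (self.variance / 4) / max(wfe, 0.1)
        precision = 1 / self.variance + 1 / eff_var
        self.mean = (self.mean / self.variance + observed_epoch / eff_var) / precision
        self.variance = 1 / precision

bayesian = BayesianEpochState(mean=1050.0, variance=((2000 - 100) / 3.92) ** 2)

test_epochs = []
# (val_optimal_epoch, wfe) pairs stand in for real fold sweeps
for val_optimal_epoch, wfe in [(450, 0.8), (380, 0.6), (420, 0.9)]:
    # Step 1: read the epoch BEFORE any work on this fold
    prior_bayesian_epoch = bayesian.get_current_epoch()
    test_epochs.append(prior_bayesian_epoch)  # Step 3 would evaluate TEST here
    # Step 4: update only AFTER the TEST evaluation, for future folds
    bayesian.update(val_optimal_epoch, wfe)

# Fold 0's TEST ran at the untouched prior (1050), not its own optimum (450)
```

Swapping the `append` and `update` lines reproduces the v2 bug: fold 0's TEST epoch would already reflect its own validation optimum.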

Why This Matters

为什么这很重要

| Aspect | v2 (Buggy) | v3 (Fixed) |
| --- | --- | --- |
| When Bayesian updated | Before test eval | After test eval |
| Test epoch source | Current fold influences | Only prior folds |
| Information flow | Future → Present | Past → Present only |
| Expected bias | Optimistic by ~10-20% | Unbiased |
| 方面 | v2(错误) | v3(修复) |
| --- | --- | --- |
| 贝叶斯更新时间 | 测试评估前 | 测试评估后 |
| 测试Epoch来源 | 当前折有影响 | 仅先验折 |
| 信息流 | 未来→当前 | 仅过去→现在 |
| 预期偏差 | 乐观~10-20% | 无偏差 |

Validation Checkpoint

验证检查点

python
# MANDATORY: Log these values for audit trail
fold_log.info(
    f"Fold {fold_idx}: "
    f"prior_bayesian_epoch={prior_bayesian_epoch}, "
    f"val_optimal_epoch={val_optimal_epoch}, "
    f"test_uses={prior_bayesian_epoch}"  # MUST equal prior_bayesian_epoch
)

See [references/look-ahead-bias.md](./references/look-ahead-bias.md) for detailed examples.
python
# 强制要求:记录这些值用于审计追踪
fold_log.info(
    f"折 {fold_idx}: "
    f"prior_bayesian_epoch={prior_bayesian_epoch}, "
    f"val_optimal_epoch={val_optimal_epoch}, "
    f"test_uses={prior_bayesian_epoch}"  # 必须等于prior_bayesian_epoch
)

详细示例请参见[references/look-ahead-bias.md](./references/look-ahead-bias.md)。

Embargo Requirements

间隔要求

| Boundary | Embargo | Rationale |
| --- | --- | --- |
| Train → Validation | 6% of fold | Prevent feature leakage |
| Validation → Test | 6% of fold | Prevent selection leakage |
| Fold → Fold | 1 hour (calendar) | Range bar duration |
python
def compute_embargo_indices(
    n_total: int,
    train_pct: float = 0.60,
    val_pct: float = 0.20,
    test_pct: float = 0.20,
    embargo_pct: float = 0.06,
) -> dict[str, tuple[int, int]]:
    """Compute indices for nested split with embargoes.

    Returns dict with (start, end) tuples for each segment.
    """
    embargo_size = int(n_total * embargo_pct)

    train_end = int(n_total * train_pct)
    val_start = train_end + embargo_size
    val_end = val_start + int(n_total * val_pct)
    test_start = val_end + embargo_size
    test_end = n_total

    return {
        "train": (0, train_end),
        "embargo_1": (train_end, val_start),
        "validation": (val_start, val_end),
        "embargo_2": (val_end, test_start),
        "test": (test_start, test_end),
    }
| 边界 | 间隔 | 理由 |
| --- | --- | --- |
| 训练 → 验证 | 折的6% | 防止特征泄露 |
| 验证 → 测试 | 折的6% | 防止选择泄露 |
| 折 → 折 | 1小时(日历时间) | 范围bar持续时间 |
python
def compute_embargo_indices(
    n_total: int,
    train_pct: float = 0.60,
    val_pct: float = 0.20,
    test_pct: float = 0.20,
    embargo_pct: float = 0.06,
) -> dict[str, tuple[int, int]]:
    """计算带间隔的嵌套拆分索引。

    返回包含每个段(start, end)元组的字典。
    """
    embargo_size = int(n_total * embargo_pct)

    train_end = int(n_total * train_pct)
    val_start = train_end + embargo_size
    val_end = val_start + int(n_total * val_pct)
    test_start = val_end + embargo_size
    test_end = n_total

    return {
        "train": (0, train_end),
        "embargo_1": (train_end, val_start),
        "validation": (val_start, val_end),
        "embargo_2": (val_end, test_start),
        "test": (test_start, test_end),
    }
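A worked example with 1,000 bars shows where the embargoes land (the function is repeated verbatim so the fragment runs standalone):

```python
def compute_embargo_indices(n_total, train_pct=0.60, val_pct=0.20,
                            test_pct=0.20, embargo_pct=0.06):
    # Same logic as above: embargoes carved out between segments
    embargo_size = int(n_total * embargo_pct)
    train_end = int(n_total * train_pct)
    val_start = train_end + embargo_size
    val_end = val_start + int(n_total * val_pct)
    test_start = val_end + embargo_size
    return {
        "train": (0, train_end),
        "embargo_1": (train_end, val_start),
        "validation": (val_start, val_end),
        "embargo_2": (val_end, test_start),
        "test": (test_start, n_total),
    }

segments = compute_embargo_indices(1000)
# train=(0, 600), embargo_1=(600, 660), validation=(660, 860),
# embargo_2=(860, 920), test=(920, 1000)
```

Note that both embargoes come out of the nominal test allocation: TEST ends up with 80 bars rather than the headline 20%, which is worth accounting for when sizing folds.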

Validation Checklist

验证检查清单

Before running AWFES with OOS application:
  • Three-way split: Train/Validation/Test clearly separated
  • Embargoes: 6% gap at each boundary
  • Bayesian lag: Current fold uses posterior from prior folds
  • No peeking: Test data untouched until final evaluation
  • Temporal order: No shuffling, strict time sequence
  • Feature computation: Features computed BEFORE split, no recalculation
在运行AWFES进行OOS应用之前:
  • 三分拆分:训练/验证/测试清晰分离
  • 间隔:每个边界有6%的间隔
  • 贝叶斯延迟:当前折使用来自先验折的后验
  • 无窥探:测试数据在最终评估前完全未被接触
  • 时序顺序:无打乱,严格时间序列
  • 特征计算:特征在拆分前计算,不重新计算

Anti-Patterns

反模式

| Anti-Pattern | Detection | Fix |
| --- | --- | --- |
| Using current fold's epoch on current fold's OOS | selected_epoch == fold_optimal_epoch | Use Bayesian posterior |
| Validation overlaps test | Date ranges overlap | Add embargo |
| Features computed on full dataset | Scaler fit includes test | Per-split scaling |
| Fold shuffling | Folds not time-ordered | Enforce temporal order |
See references/look-ahead-bias.md for detailed examples.

| 反模式 | 检测方式 | 修复方案 |
| --- | --- | --- |
| 将当前折的Epoch用于当前折的OOS | selected_epoch == fold_optimal_epoch | 使用贝叶斯后验 |
| 验证集与测试集重叠 | 日期范围重叠 | 添加间隔 |
| 在全数据集上计算特征 | 缩放器拟合包含测试集 | 按拆分单独缩放 |
| 折打乱 | 折不是时序排列 | 强制时序顺序 |
详细示例请参见references/look-ahead-bias.md

References

参考链接

| Topic | Reference File |
| --- | --- |
| Academic Literature | academic-foundations.md |
| Mathematical Formulation | mathematical-formulation.md |
| Decision Tree | epoch-selection-decision-tree.md |
| Anti-Patterns | anti-patterns.md |
| OOS Application | oos-application.md |
| Epoch Smoothing | epoch-smoothing.md |
| OOS Metrics | oos-metrics.md |
| Look-Ahead Bias | look-ahead-bias.md |
| Feature Sets | feature-sets.md |
| xLSTM Implementation | xlstm-implementation.md |
| Range Bar Metrics | range-bar-metrics.md |
| 主题 | 参考文件 |
| --- | --- |
| 学术文献 | academic-foundations.md |
| 数学公式 | mathematical-formulation.md |
| 决策树 | epoch-selection-decision-tree.md |
| 反模式 | anti-patterns.md |
| OOS应用 | oos-application.md |
| Epoch平滑 | epoch-smoothing.md |
| OOS指标 | oos-metrics.md |
| 前瞻偏差 | look-ahead-bias.md |
| 特征集 | feature-sets.md |
| xLSTM实现 | xlstm-implementation.md |
| 范围bar指标 | range-bar-metrics.md |

Full Citations

完整引用

  • Bailey, D. H., & López de Prado, M. (2014). The deflated Sharpe ratio: Correcting for selection bias, backtest overfitting and non-normality. The Journal of Portfolio Management, 40(5), 94-107.
  • Bischl, B., et al. (2023). Multi-Objective Hyperparameter Optimization in Machine Learning. ACM Transactions on Evolutionary Learning and Optimization.
  • López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. Chapter 7.
  • Nomura, M., & Ono, I. (2021). Warm Starting CMA-ES for Hyperparameter Optimization. AAAI Conference on Artificial Intelligence.
  • Pardo, R. E. (2008). The Evaluation and Optimization of Trading Strategies, 2nd Edition. John Wiley & Sons.

  • Bailey, D. H., & López de Prado, M. (2014). The deflated Sharpe ratio: Correcting for selection bias, backtest overfitting and non-normality. The Journal of Portfolio Management, 40(5), 94-107.
  • Bischl, B., et al. (2023). Multi-Objective Hyperparameter Optimization in Machine Learning. ACM Transactions on Evolutionary Learning and Optimization.
  • López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. Chapter 7.
  • Nomura, M., & Ono, I. (2021). Warm Starting CMA-ES for Hyperparameter Optimization. AAAI Conference on Artificial Intelligence.
  • Pardo, R. E. (2008). The Evaluation and Optimization of Trading Strategies, 2nd Edition. John Wiley & Sons.

Troubleshooting

故障排除

| Issue | Cause | Solution |
| --- | --- | --- |
| WFE is None | IS_Sharpe below noise floor | Check if IS_Sharpe > 2/sqrt(n_samples) |
| All epochs rejected | Severe overfitting | Reduce model complexity, add regularization |
| Bayesian posterior unstable | High WFE variance | Increase observation_variance or use median WFE |
| Epoch always at boundary | Search range too narrow | Expand min_epoch or max_epoch bounds |
| Look-ahead bias detected | Using val_optimal for test | Use prior_bayesian_epoch for test evaluation |
| DSR too aggressive | Too many epoch candidates | Limit to 3-5 epoch configs (meta-overfitting risk) |
| Cauchy mean issues | Arithmetic mean of WFE | Use median or pooled WFE for aggregation |
| Fold metrics inconsistent | Variable fold sizes | Use pooled WFE (precision-weighted) |
| 问题 | 原因 | 解决方案 |
| --- | --- | --- |
| WFE为None | IS_Sharpe低于噪声下限 | 检查IS_Sharpe > 2/sqrt(n_samples) |
| 所有Epoch被拒绝 | 严重过拟合 | 降低模型复杂度,添加正则化 |
| 贝叶斯后验不稳定 | WFE方差高 | 增加observation_variance或使用中位数WFE |
| Epoch始终在边界 | 搜索范围过窄 | 扩大min_epoch或max_epoch边界 |
| 检测到前瞻偏差 | 使用val_optimal进行测试 | 使用prior_bayesian_epoch进行测试评估 |
| DSR过于激进 | Epoch候选过多 | 限制为3-5个Epoch配置(元过拟合风险) |
| 柯西均值问题 | WFE的算术均值 | 使用中位数或加权WFE进行聚合 |
| 折指标不一致 | 折大小可变 | 使用加权WFE(精度加权) |
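The first troubleshooting row refers to the IS-Sharpe noise floor of 2/sqrt(n_samples): below that level the in-sample Sharpe is indistinguishable from noise, so the WFE ratio is meaningless. A minimal guard might look like the following (`wfe_or_none` is a hypothetical helper written for this illustration, not part of the published API):

```python
import math

def wfe_or_none(is_sharpe: float, oos_sharpe: float, n_samples: int):
    # Below ~2/sqrt(n) the IS Sharpe is indistinguishable from noise,
    # so the WFE ratio would be dominated by a near-zero denominator
    noise_floor = 2.0 / math.sqrt(n_samples)
    if abs(is_sharpe) <= noise_floor:
        return None
    return oos_sharpe / is_sharpe

print(wfe_or_none(0.01, 0.005, 10_000))  # floor is 0.02 → None
print(wfe_or_none(1.50, 0.60, 10_000))   # ≈ 0.4
```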