statistics-fundamentals

Statistics Fundamentals


Purpose


This skill enables Claude to apply core statistical methods to financial data, including descriptive statistics, covariance estimation, linear regression, hypothesis testing, and resampling techniques. These methods form the quantitative backbone for portfolio construction, risk measurement, and factor modeling.


Layer


0 — Mathematical Foundations


Direction


both

When to Use


Analyzing return distributions
Estimating correlations or covariance matrices
Running regression analysis on financial data
Testing hypotheses about returns
Building factor models


Core Concepts


Descriptive Statistics


Mean (Expected Value):

$$\mu = E[X] = \frac{1}{n} \sum_{i=1}^{n} x_i$$

The arithmetic average of observed values. For financial returns, this represents the central tendency of the return distribution.

Variance:

Population variance:

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$

Sample variance (Bessel's correction):

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Standard Deviation:

$$\sigma = \sqrt{\sigma^2}$$

In finance, the standard deviation of returns is commonly called volatility. Annualized volatility from monthly data, assuming serially uncorrelated returns:

sigma_annual = sigma_monthly * sqrt(12)

Skewness:

$$\gamma = \frac{E[(X - \mu)^3]}{\sigma^3}$$

Measures asymmetry of the distribution. Negative skewness (a longer left tail) is common in equity returns and indicates that extreme losses are more likely than extreme gains of the same size.

Excess Kurtosis:

$$\kappa = \frac{E[(X - \mu)^4]}{\sigma^4} - 3$$

Measures tail thickness relative to the normal distribution (which has excess kurtosis of 0). Financial returns typically exhibit positive excess kurtosis (leptokurtosis), meaning fat tails and more frequent extreme events than a normal distribution would predict.
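The definitions above can be sketched in a few lines of plain Python. This is a minimal illustration, not the reference implementation: the function name `describe`, the `periods_per_year` parameter, and the input series are all hypothetical. Skewness and excess kurtosis use the 1/n central moments from the formulas above, while volatility uses Bessel's correction.

```python
import math

def describe(returns, periods_per_year=12):
    """Mean, sample volatility, annualized volatility, skewness, excess kurtosis."""
    n = len(returns)
    mean = sum(returns) / n
    devs = [r - mean for r in returns]
    # Central moments with a 1/n divisor, matching the E[...] definitions.
    m2 = sum(d ** 2 for d in devs) / n
    m3 = sum(d ** 3 for d in devs) / n
    m4 = sum(d ** 4 for d in devs) / n
    sample_var = m2 * n / (n - 1)  # Bessel's correction for the variance
    vol = math.sqrt(sample_var)
    return {
        "mean": mean,
        "volatility": vol,
        # Square-root-of-time scaling assumes serially uncorrelated returns.
        "annualized_volatility": vol * math.sqrt(periods_per_year),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }

# Hypothetical monthly returns in percent, for illustration only.
print(describe([1.0, -2.0, 3.0, 0.5, -1.5, 2.0]))
```

Statistical packages often report bias-adjusted sample skewness and kurtosis, which differ from these 1/n versions in small samples.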


Covariance and Correlation


Covariance:

$$\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$

Sample covariance:

$$\hat{\text{Cov}}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

Covariance measures the linear co-movement between two variables. Positive covariance means they tend to move together; negative means they move inversely.

Correlation (Pearson):

$$\rho(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \times \sigma_Y}$$

Correlation normalizes covariance to the range `[-1, +1]`, making it unit-free and comparable across variable pairs.
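As a rough sketch of the two formulas, the following plain-Python helpers compute the sample covariance and the Pearson correlation. The function names and the toy inputs are illustrative assumptions.

```python
import math

def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def pearson_corr(x, y):
    """Covariance scaled by both standard deviations."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

# Toy series: y is an exact positive multiple of x, z moves opposite to x.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
z = [3.0, 2.0, 1.0]
print(sample_cov(x, y), pearson_corr(x, y), pearson_corr(x, z))
```

Rescaling either series changes the covariance but leaves the correlation unchanged, which is why correlation is the easier quantity to compare across asset pairs.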


Covariance Matrix Estimation


For a set of `p` assets with `n` return observations, the sample covariance matrix is:

$$\hat{\Sigma} = \frac{1}{n-1} (X - \bar{X})^T (X - \bar{X})$$

where `X` is the `n x p` matrix of returns and $\bar{X}$ holds the per-asset (column) means.

The curse of dimensionality: When `p` (number of assets) is large relative to `n` (number of observations), the sample covariance matrix becomes poorly conditioned, and it is singular whenever `p >= n`, leading to unstable portfolio optimizations.
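A small sketch makes the singularity concrete. The helper below builds the sample covariance matrix from a list of observation rows. With two assets and only two observations (hypothetical numbers), the matrix has rank one and its determinant is zero.

```python
def covariance_matrix(X):
    """Sample covariance of X, given as n rows (observations) of p asset returns."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    D = [[row[j] - means[j] for j in range(p)] for row in X]  # demeaned returns
    return [
        [sum(D[t][i] * D[t][j] for t in range(n)) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

# Hypothetical data: p = 2 assets, n = 2 observations, so p >= n.
sigma_hat = covariance_matrix([[1.0, 2.0], [3.0, 5.0]])
det = sigma_hat[0][0] * sigma_hat[1][1] - sigma_hat[0][1] * sigma_hat[1][0]
print(sigma_hat, det)
```

A zero determinant means the matrix cannot be inverted, so a mean-variance optimizer that needs the inverse has no well-defined solution.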


Ledoit-Wolf Shrinkage Estimator


Shrinkage blends the sample covariance matrix with a structured target (e.g., the identity matrix scaled by average variance) to produce a more stable estimate:

$$\hat{\Sigma}_{shrunk} = \delta \cdot F + (1 - \delta) \cdot \hat{\Sigma}$$

where:

`F` = the shrinkage target (structured estimator)
`delta` = the optimal shrinkage intensity (estimated analytically)
`Sigma_hat` = the sample covariance matrix

Ledoit-Wolf determines the optimal `delta` that minimizes the expected squared Frobenius distance to the true covariance matrix. This produces better-conditioned matrices and more stable portfolio weights.
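The blending step itself is simple to sketch. The example below shrinks toward the identity matrix scaled by the average variance. Note the assumption: `delta` is supplied by hand here, whereas Ledoit-Wolf estimates it analytically from the data. That estimation is not reproduced in this sketch; libraries such as scikit-learn ship a `LedoitWolf` estimator for it.

```python
def shrink_to_scaled_identity(sigma_hat, delta):
    """Return delta * F + (1 - delta) * sigma_hat, with F = average variance * I."""
    p = len(sigma_hat)
    avg_var = sum(sigma_hat[i][i] for i in range(p)) / p
    return [
        [
            delta * (avg_var if i == j else 0.0) + (1.0 - delta) * sigma_hat[i][j]
            for j in range(p)
        ]
        for i in range(p)
    ]

# A singular 2 x 2 sample covariance matrix (hypothetical numbers).
sigma_hat = [[2.0, 3.0], [3.0, 4.5]]
shrunk = shrink_to_scaled_identity(sigma_hat, delta=0.2)  # delta chosen by hand
det = shrunk[0][0] * shrunk[1][1] - shrunk[0][1] * shrunk[1][0]
print(shrunk, det)
```

Even a modest shrinkage intensity pulls the off-diagonal entries toward zero and makes the determinant positive, so the shrunk matrix is invertible.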


OLS Regression


Ordinary Least Squares estimates the linear relationship `y = X * beta + epsilon` by minimizing the sum of squared residuals.

Coefficient Estimate:

$$\hat{\beta} = (X^T X)^{-1} X^T y$$

Key Regression Diagnostics:

R-squared (Coefficient of Determination):

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}$$

Represents the proportion of variance in the dependent variable explained by the model.

Adjusted R-squared:

$$\bar{R}^2 = 1 - (1 - R^2) \frac{n - 1}{n - k - 1}$$

where `k` = number of regressors (excluding the intercept). Penalizes additional regressors that do not improve fit.

Standard Errors:

$$SE(\hat{\beta}) = \sqrt{\hat{\sigma}^2 \cdot \text{diag}((X^T X)^{-1})}$$

where `sigma_hat^2 = SS_res / (n - k - 1)`

t-statistic:

$$t = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}$$

Tests whether each coefficient is significantly different from zero.

In finance, the single-factor regression `R_i - R_f = alpha + beta * (R_m - R_f) + epsilon` is the CAPM regression, where `alpha` is the risk-adjusted excess return and `beta` is market sensitivity.
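For a single regressor plus intercept, the matrix formulas above reduce to closed forms that fit in a short function. The sketch below is one way to write it in plain Python. The function name and the toy data are assumptions, and `x` and `y` stand for market and fund excess returns.

```python
import math

def ols_single_factor(x, y):
    """Regress y on x with an intercept (k = 1 regressor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    ss_res = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    sigma2 = ss_res / (n - 2)  # n - k - 1 with k = 1
    # Closed forms of sqrt(sigma2 * diag((X'X)^-1)) for the two coefficients.
    se_alpha = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    se_beta = math.sqrt(sigma2 / sxx)
    return {
        "alpha": alpha, "beta": beta, "r_squared": 1.0 - ss_res / ss_tot,
        "t_alpha": alpha / se_alpha, "t_beta": beta / se_beta,
    }

# Toy data, for illustration only.
print(ols_single_factor([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]))
```

With more than one factor the same quantities come from the matrix expressions, typically through a linear-algebra library rather than hand-written sums.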


Common Distributions in Finance


Normal Distribution: Symmetric, fully characterized by mean and variance. Used as a baseline model for returns, though real returns deviate from normality.

Log-Normal Distribution: If `ln(X)` is normal, then `X` is log-normal. Asset prices (not returns) are often modeled as log-normal, ensuring prices cannot go negative.

Student-t Distribution: Has heavier tails than the normal. Parameterized by degrees of freedom `nu`; lower `nu` means fatter tails. Commonly used to model financial returns more realistically. As `nu -> infinity`, it converges to the normal.

Chi-Squared Distribution: The distribution of a sum of squared independent standard normal variables. Used in variance tests and as the sampling distribution of `(n-1)*s^2 / sigma^2` for normally distributed data (with `n - 1` degrees of freedom).
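To illustrate the log-normal point, the sketch below simulates one-period prices by exponentiating normally distributed log returns. The starting price, the volatility, and the seed are hypothetical choices. However extreme the draw, the simulated price stays positive.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
p0 = 100.0              # hypothetical starting price
mu, sigma = 0.0, 0.5    # hypothetical mean and std of the log return

log_returns = [rng.gauss(mu, sigma) for _ in range(10_000)]
prices = [p0 * math.exp(r) for r in log_returns]  # log-normal prices

print(min(prices) > 0.0)
```

Applying normally distributed simple returns instead, as in `p0 * (1 + r)`, would allow negative prices whenever a draw falls below -100%.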


Bootstrap Methods


Non-parametric resampling technique for estimating the sampling distribution of a statistic.

Algorithm:

1. From the original dataset of size `n`, draw `B` bootstrap samples, each of size `n`, with replacement.
2. Compute the statistic of interest on each bootstrap sample.
3. Use the distribution of the `B` bootstrap statistics to estimate confidence intervals, standard errors, or bias.

Confidence Interval (Percentile Method): The `(1 - alpha)` confidence interval is given by the `alpha/2` and `1 - alpha/2` quantiles of the bootstrap distribution.

Bootstrap is especially useful in finance when:

Analytical formulas for standard errors are unavailable or rest on restrictive assumptions (e.g., Sharpe ratio)
The underlying distribution is unknown or non-normal
Small sample sizes make asymptotic results unreliable
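The algorithm and the percentile interval can be sketched as follows, using the twelve monthly returns from Example 1 and a per-period Sharpe ratio as the statistic. The function names, `n_boot`, and `seed` are illustrative assumptions. The sketch resamples observations independently, so it ignores any serial dependence in the returns.

```python
import random
import statistics

def sharpe_ratio(returns):
    """Per-period Sharpe ratio, treating the inputs as excess returns."""
    return statistics.mean(returns) / statistics.stdev(returns)

def bootstrap_percentile_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-method confidence interval from n_boot resamples."""
    rng = random.Random(seed)
    n = len(data)
    # Steps 1 and 2: resample with replacement, recompute the statistic.
    boot_stats = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_boot))
    # Step 3: read off the alpha/2 and 1 - alpha/2 quantiles.
    lower = boot_stats[int((alpha / 2) * n_boot)]
    upper = boot_stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

monthly_returns = [2.1, -0.5, 1.8, -3.2, 4.5, 0.3, -1.1, 2.7, -0.8, 3.4, 1.2, -0.6]
low, high = bootstrap_percentile_ci(monthly_returns, sharpe_ratio)
print(f"95% CI for the monthly Sharpe ratio: [{low:.2f}, {high:.2f}]")
```

For return series with autocorrelation or volatility clustering, a block bootstrap that resamples contiguous runs is usually a better fit than this independent resampling.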


Hypothesis Testing


t-test (mean): Tests whether a sample mean differs significantly from a hypothesized value.

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

with `n - 1` degrees of freedom.

F-test (joint significance): Tests whether a group of regression coefficients are jointly zero. Used in multi-factor models.

$$F = \frac{(SS_{restricted} - SS_{unrestricted}) / q}{SS_{unrestricted} / (n - k - 1)}$$

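where `q` = number of restrictions, and `SS_restricted` and `SS_unrestricted` are the residual sums of squares of the restricted and unrestricted models.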

Jarque-Bera Test (normality): Tests whether sample skewness and kurtosis are consistent with a normal distribution.

$$JB = \frac{n}{6} \left(\gamma^2 + \frac{\kappa^2}{4}\right)$$

where `gamma` = sample skewness and `kappa` = sample excess kurtosis. Under the null of normality, JB asymptotically follows a chi-squared distribution with 2 degrees of freedom. Financial return series almost always reject normality due to fat tails and skewness.
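A compact sketch of the one-sample t-statistic and the Jarque-Bera statistic follows. The function names are assumptions. The p-value uses the fact that the chi-squared survival function with 2 degrees of freedom is `exp(-x / 2)`, so no statistics library is needed, and it relies on the asymptotic distribution, which is rough in small samples.

```python
import math

def t_statistic(sample, mu0=0.0):
    """t-statistic for H0: mean == mu0, plus its degrees of freedom."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (mean - mu0) / (s / math.sqrt(n)), n - 1

def jarque_bera(sample):
    """JB statistic and asymptotic p-value, using 1/n central moments."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)  # chi-squared(2) survival function

# Toy samples, for illustration only.
print(t_statistic([1.0, 2.0, 3.0], mu0=2.0))
print(jarque_bera([-1.0, 0.0, 1.0]))
```

Setting the p-value to 0.05 and solving gives `-2 * ln(0.05)`, roughly 5.99, which is the usual 5% critical value for this test.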


Key Formulas


| Formula | Expression | Use Case |
|---|---|---|
| Sample Mean | `x_bar = (1/n) * sum(x_i)` | Central tendency |
| Sample Variance | `s^2 = (1/(n-1)) * sum((x_i - x_bar)^2)` | Dispersion |
| Annualized Volatility | `sigma_annual = sigma_period * sqrt(periods_per_year)` | Risk standardization |
| Skewness | `gamma = E[(X-mu)^3] / sigma^3` | Asymmetry |
| Excess Kurtosis | `kappa = E[(X-mu)^4] / sigma^4 - 3` | Tail thickness |
| Covariance | `Cov(X,Y) = E[(X-mu_X)(Y-mu_Y)]` | Co-movement |
| Correlation | `rho = Cov(X,Y) / (sigma_X * sigma_Y)` | Standardized co-movement |
| Shrinkage Estimator | `Sigma_shrunk = delta * F + (1 - delta) * Sigma_hat` | Stable covariance matrix |
| OLS Coefficients | `beta_hat = (X'X)^(-1) X'y` | Linear regression |
| R-squared | `1 - SS_res / SS_tot` | Model explanatory power |
| t-statistic | `t = beta_hat_j / SE(beta_hat_j)` | Coefficient significance |
| Jarque-Bera | `JB = (n/6) * (gamma^2 + kappa^2/4)` | Normality test |


Worked Examples


Example 1: Compute Descriptive Statistics and Test for Normality


Given: Monthly returns (in %) for a fund over 12 months:

[2.1, -0.5, 1.8, -3.2, 4.5, 0.3, -1.1, 2.7, -0.8, 3.4, 1.2, -0.6]

Calculate: Mean, volatility, skewness, excess kurtosis, and Jarque-Bera test statistic.

Solution:

Mean:

x_bar = (2.1 + (-0.5) + 1.8 + (-3.2) + 4.5 + 0.3 + (-1.1) + 2.7 + (-0.8) + 3.4 + 1.2 + (-0.6)) / 12
x_bar = 9.8 / 12
x_bar = 0.8167% per month

Annualized return (approximate):

0.8167% * 12 = 9.8%

Sample Standard Deviation:

Deviations from mean: [1.283, -1.317, 0.983, -4.017, 3.683, -0.517, -1.917, 1.883, -1.617, 2.583, 0.383, -1.417]
Squared deviations:   [1.646, 1.734, 0.967, 16.133, 13.566, 0.267, 3.674, 3.547, 2.614, 6.674, 0.147, 2.007]
Sum of squared deviations = 52.977
s^2 = 52.977 / 11 = 4.816
s = sqrt(4.816) = 2.195% per month

Annualized volatility:

2.195% * sqrt(12) = 7.60%

Skewness:

Using the 1/n central moments m_k = (1/n) * sum[(x_i - x_bar)^k], consistent with the moment definitions and the Jarque-Bera statistic:
m_2 = 52.977 / 12 = 4.415
m_3 = -4.321 / 12 = -0.360
gamma = m_3 / m_2^1.5 = -0.360 / 9.276 = -0.039 (slightly negative, near symmetric)

Excess Kurtosis:

m_4 = 532.573 / 12 = 44.381
kappa = m_4 / m_2^2 - 3 = 44.381 / 19.490 - 3 = -0.723 (platykurtic, lighter tails than normal)

Jarque-Bera Test:

JB = (12/6) * ((-0.039)^2 + (-0.723)^2 / 4)
JB = 2 * (0.00152 + 0.13068)
JB = 2 * 0.13220
JB = 0.264

The JB critical value at 5% significance (chi-squared, df=2) is 5.99. Since `0.264 < 5.99`, we fail to reject the null hypothesis of normality. With only 12 observations, however, the test has low power, and we should not conclude the data is truly normal.
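The hand calculation can be cross-checked by recomputing every statistic from the raw returns. This is a minimal check script, not part of the skill's reference implementation. It uses the standard library's `statistics` module for the mean and sample standard deviation, and 1/n central moments for skewness and kurtosis.

```python
import math
import statistics

returns = [2.1, -0.5, 1.8, -3.2, 4.5, 0.3, -1.1, 2.7, -0.8, 3.4, 1.2, -0.6]
n = len(returns)

mean = statistics.mean(returns)
s = statistics.stdev(returns)  # sample standard deviation (n - 1 divisor)
annualized_vol = s * math.sqrt(12)

devs = [r - mean for r in returns]
m2, m3, m4 = (sum(d ** k for d in devs) / n for k in (2, 3, 4))
skew = m3 / m2 ** 1.5
excess_kurt = m4 / m2 ** 2 - 3.0
jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)

print(f"mean={mean:.4f} s={s:.3f} annualized_vol={annualized_vol:.2f}")
print(f"skew={skew:.3f} excess_kurt={excess_kurt:.3f} JB={jb:.3f}")
```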


Example 2: Regress Fund Returns on Market Factor (CAPM)


Given: 24 monthly observations:

Fund excess returns (`R_i - R_f`): mean = 0.8%, std = 4.2%
Market excess returns (`R_m - R_f`): mean = 0.6%, std = 3.8%
Sample correlation between fund and market: 0.85

Calculate: CAPM alpha and beta, R-squared, and assess statistical significance.

Solution:

Beta:

beta = Cov(R_i, R_m) / Var(R_m)
     = rho * sigma_i * sigma_m / sigma_m^2
     = rho * sigma_i / sigma_m
     = 0.85 * 4.2 / 3.8
     = 0.939

Alpha:

alpha = mean(R_i - R_f) - beta * mean(R_m - R_f)
      = 0.8% - 0.939 * 0.6%
      = 0.8% - 0.564%
      = 0.236% per month (approximately 2.84% annualized)

R-squared:

R^2 = rho^2 = 0.85^2 = 0.7225

72.25% of the fund's return variance is explained by the market factor.

Standard Error and t-statistic for alpha (approximate: this shortcut ignores the degrees-of-freedom correction and the small term in the market mean):

Residual std = sigma_i * sqrt(1 - R^2) = 4.2% * sqrt(1 - 0.7225) = 4.2% * 0.5268 = 2.213%
SE(alpha) = residual_std / sqrt(n) = 2.213% / sqrt(24) = 0.452%
t(alpha) = 0.236 / 0.452 = 0.522

With 22 degrees of freedom (n - 2), the critical t-value at 5% significance (two-tailed) is approximately 2.074. Since `|0.522| < 2.074`, the alpha is not statistically significant. Despite the positive point estimate, we cannot conclude the fund generates genuine risk-adjusted outperformance with this sample size.

Standard Error and t-statistic for beta:

SE(beta) = residual_std / (sigma_m * sqrt(n-1)) = 2.213% / (3.8% * sqrt(23)) = 2.213% / 18.226% = 0.121
t(beta) = 0.939 / 0.121 = 7.76

Since `|7.76| >> 2.074`, the beta is highly statistically significant, confirming the fund has meaningful market exposure.
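The same regression can be reproduced from the summary statistics alone. The sketch below uses the exact single-regressor OLS standard errors, including the `n - 2` degrees-of-freedom correction, and assumes the quoted standard deviations are sample values with an `n - 1` divisor. The exact standard errors come out slightly larger than the rounded shortcut figures in the text, without changing either conclusion.

```python
import math

n = 24
mean_fund, std_fund = 0.8, 4.2  # fund excess return, % per month
mean_mkt, std_mkt = 0.6, 3.8    # market excess return, % per month
rho = 0.85

beta = rho * std_fund / std_mkt
alpha = mean_fund - beta * mean_mkt
r_squared = rho ** 2

# Sums of squares implied by the sample standard deviations (assumed n - 1 divisor).
ss_res = (n - 1) * std_fund ** 2 * (1.0 - r_squared)
sxx = (n - 1) * std_mkt ** 2
sigma_hat = math.sqrt(ss_res / (n - 2))  # residual standard error

se_beta = sigma_hat / math.sqrt(sxx)
se_alpha = sigma_hat * math.sqrt(1.0 / n + mean_mkt ** 2 / sxx)
t_beta = beta / se_beta
t_alpha = alpha / se_alpha

print(f"beta={beta:.3f} alpha={alpha:.3f} R^2={r_squared:.4f}")
print(f"SE(alpha)={se_alpha:.3f} t(alpha)={t_alpha:.2f}")
print(f"SE(beta)={se_beta:.3f} t(beta)={t_beta:.2f}")
```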


Common Pitfalls


Using population variance instead of sample variance: always use `n - 1` (Bessel's correction) in the denominator when estimating variance from a sample. Using `n` underestimates the true variance on average.
Assuming normality when financial returns have fat tails: equity returns typically exhibit negative skewness and positive excess kurtosis. Models relying on normality (e.g., standard VaR) underestimate tail risk. Use the Student-t distribution or non-parametric methods for more robust estimates.
Ignoring non-stationarity in time series: financial return distributions change over time (regime shifts, volatility clustering). Rolling-window estimation or GARCH models may be more appropriate than full-sample statistics.
Overfitting with too many regressors: adding more factors to a regression never decreases R-squared, but it may not improve out-of-sample explanatory power. Use adjusted R-squared, information criteria (AIC/BIC), or cross-validation to guard against overfitting.
Unstable covariance matrices with small samples: when the number of assets `p` approaches or exceeds the number of observations `n`, the sample covariance matrix becomes singular or poorly conditioned. Apply Ledoit-Wolf shrinkage or factor-based covariance models to obtain stable, invertible matrices for portfolio optimization.
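As one illustration of the non-stationarity point, a rolling-window volatility can be sketched in a few lines. The function name, the window length, and the input series are hypothetical. Each output value uses only the most recent `window` observations, so a shift in regime shows up in the estimates instead of being averaged away.

```python
import statistics

def rolling_volatility(returns, window):
    """Sample standard deviation over each trailing window of the given length."""
    return [
        statistics.stdev(returns[i - window:i])
        for i in range(window, len(returns) + 1)
    ]

# Hypothetical series: a calm stretch followed by a turbulent one.
series = [0.1, -0.1, 0.1, -0.1, 2.0, -2.0, 2.0, -2.0]
print(rolling_volatility(series, window=4))
```

A full-sample standard deviation of the same series would report a single number that describes neither regime well.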


Cross-References


return-calculations (core plugin, Layer 0): Arithmetic and geometric mean returns, log returns for statistical modeling
time-value-of-money (core plugin, Layer 0): Discount rate estimation via CAPM regression; NPV and IRR calculations use statistical inputs


Reference Implementation


See `scripts/statistics_fundamentals.py` for computational helpers.
