symbolic-equation
Symbolic Equation Discovery
Discover interpretable scientific equations from data using LLM-guided evolutionary search.
Input
- $0: Dataset description, variable names, and physical context
References
- LLM-SR patterns (prompts, evolution, sampling):
~/.claude/skills/symbolic-equation/references/llmsr-patterns.md
Workflow (from LLM-SR)
Step 1: Define Problem Specification
Create a specification with:
- Input variables: Physical quantities with types (e.g., x: np.ndarray, v: np.ndarray)
- Output variable: Target quantity to predict
- Evaluation function: Fitness metric (typically negative MSE with parameter optimization)
- Physical context: Domain knowledge to guide equation discovery
```python
import numpy as np

# Example specification
# @equation.evolve is the LLM-SR marker that flags the function to evolve.
@equation.evolve
def equation(x: np.ndarray, v: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Describe the acceleration of a damped nonlinear oscillator."""
    return params[0] * x
```
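The evaluation function is only described in words above. A minimal sketch of one way to score a candidate, assuming SciPy's BFGS optimizer and a ten-entry parameter vector initialized to ones; the names `evaluate` and `MAX_PARAMS`, the candidate body, and the synthetic data are illustrative assumptions, not part of the skill:

```python
import numpy as np
from scipy.optimize import minimize

MAX_PARAMS = 10  # assumed size of the parameter vector


def equation(x: np.ndarray, v: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Candidate skeleton: linear restoring force plus linear damping."""
    return params[0] * x + params[1] * v


def evaluate(x: np.ndarray, v: np.ndarray, a_true: np.ndarray) -> float:
    """Fit params with BFGS and return negative MSE (higher is better)."""

    def loss(params: np.ndarray) -> float:
        return float(np.mean((equation(x, v, params) - a_true) ** 2))

    result = minimize(loss, x0=np.ones(MAX_PARAMS), method="BFGS")
    return -float(result.fun)


# Synthetic data from a known linear oscillator, for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
v = rng.uniform(-1.0, 1.0, 200)
a_true = -2.0 * x - 0.5 * v
score = evaluate(x, v, a_true)
```

Because the candidate has the same functional form as the data generator, the fitted score should sit very close to zero; a structurally wrong candidate would stay clearly negative even after optimization.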
Step 2: Initialize Multi-Island Buffer
- Create N islands (default: 10) for population diversity
- Each island maintains independent clusters of equations
- Clusters group equations by performance signature
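The island and cluster layout described above could be represented roughly as follows. This is a sketch under assumptions: the class names `Island` and `Cluster` are hypothetical, programs are stored as plain strings, and the performance signature is taken to be the rounded fitness score.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Equations that share one performance signature."""
    score: float
    programs: list[str] = field(default_factory=list)


@dataclass
class Island:
    """Independent sub-population, stored as signature -> cluster."""
    clusters: dict[float, Cluster] = field(default_factory=dict)

    def register(self, program: str, score: float) -> None:
        signature = round(score, 6)  # assumed signature: the rounded score
        cluster = self.clusters.setdefault(signature, Cluster(score=score))
        cluster.programs.append(program)

    def best_score(self) -> float:
        return max((c.score for c in self.clusters.values()), default=float("-inf"))


NUM_ISLANDS = 10  # default from the step above
islands = [Island() for _ in range(NUM_ISLANDS)]

# Seed every island with the same initial equation and an illustrative score.
for island in islands:
    island.register("return params[0] * x", score=-1.25)

# Two equivalent programs with the same score land in the same cluster.
islands[0].register("return params[0] * x + params[1] * v", score=-0.40)
islands[0].register("return params[1] * v + params[0] * x", score=-0.40)
```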
Step 3: Evolutionary Search Loop
Repeat until convergence or until the maximum number of samples is reached:
- Select island: Pick one island at random
- Build prompt: Sample top equations from that island's clusters (softmax-weighted by score)
- LLM proposes: Generate a new equation as an improved version of the sampled ones
- Evaluate: Execute on test data and compute the fitness score
- Register: Add the equation to the island's matching cluster if it is valid
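The loop above can be sketched end to end with stand-ins for the expensive parts. Everything here is a toy under stated assumptions: `propose` replaces the LLM call by appending a random term, `evaluate` is a made-up fitness that counts missing and superfluous terms against a known target, and parents are sampled from programs directly rather than from clusters.

```python
import math
import random

random.seed(0)
NUM_ISLANDS, MAX_SAMPLES = 10, 200
TARGET = {"params[0] * x", "params[1] * v", "params[2] * x**3"}  # toy ground truth
EXTRA_TERMS = ["params[1] * v", "params[2] * x**3", "params[3] * x * v"]


def propose(parents):
    """Stand-in for the LLM call: extend the best parent by one term."""
    return f"{parents[-1]} + {random.choice(EXTRA_TERMS)}"


def evaluate(body):
    """Toy fitness: penalize missing and superfluous terms; None means invalid."""
    terms = body.split(" + ")
    if len(terms) > 5:
        return None
    missing = len(TARGET - set(terms))
    extra = len(terms) - len(TARGET & set(terms))
    return -(missing + 0.1 * extra)


def sample_parents(island, k=2, temperature=0.1):
    """Draw up to k distinct programs, softmax-weighted by score, worst first."""
    bodies = list(island)
    top = max(island.values())
    weights = [math.exp((island[b] - top) / temperature) for b in bodies]
    picked = set(random.choices(bodies, weights=weights, k=k))
    return sorted(picked, key=lambda b: (island[b], b))


seed = "params[0] * x"
islands = [{seed: evaluate(seed)} for _ in range(NUM_ISLANDS)]
for _ in range(MAX_SAMPLES):
    island = random.choice(islands)        # select island
    parents = sample_parents(island)       # build prompt context
    candidate = propose(parents)           # LLM proposes
    score = evaluate(candidate)            # evaluate
    if score is not None:                  # register if valid
        island[candidate] = score
best_score = max(max(island.values()) for island in islands)
```

Sorting the parents worst first means the strongest equation appears last in the prompt, immediately before the version the model is asked to write.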
Step 4: Prompt Construction
Present previous equations as a versioned sequence:
```python
def equation_v0(x, v, params):
    """Initial version."""
    return params[0] * x

def equation_v1(x, v, params):
    """Improved version of equation_v0."""
    return params[0] * x + params[1] * v

def equation_v2(x, v, params):
    """Improved version of equation_v1."""
    # LLM completes this
```
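A prompt in this versioned form can be assembled mechanically from the sampled equation bodies. The helper below is a hypothetical sketch (the name `build_prompt` and its arguments are assumptions); it expects the bodies ordered from worst to best and leaves the final function open for the model to complete.

```python
def build_prompt(bodies, signature="(x, v, params)"):
    """Render bodies as equation_v0..v{n-1}, followed by an open v{n} stub."""
    if not bodies:
        raise ValueError("at least one previous equation is required")
    parts = []
    for i, body in enumerate(bodies):
        doc = "Initial version." if i == 0 else f"Improved version of equation_v{i - 1}."
        parts.append(f'def equation_v{i}{signature}:\n    """{doc}"""\n    {body}\n')
    n = len(bodies)
    # The last function has a docstring but no body: the LLM writes it.
    parts.append(f'def equation_v{n}{signature}:\n    """Improved version of equation_v{n - 1}."""\n')
    return "\n".join(parts)


prompt = build_prompt([
    "return params[0] * x",
    "return params[0] * x + params[1] * v",
])
print(prompt)
```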
Step 5: Island Reset (Diversity Maintenance)
Periodically (default: every 4 hours):
- Sort islands by best score
- Reset bottom 50% of islands
- Seed each reset island with the best equation from a surviving island
- Restart the cluster sampling temperature schedule on each reset island
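The reset itself, leaving aside the timer that triggers it, might look like the sketch below. It assumes a simplified `Island` record (hypothetical) that keeps a body-to-score mapping plus a `num_programs` counter, on the assumption that this counter is what drives the temperature schedule, so zeroing it restarts the schedule.

```python
import random
from dataclasses import dataclass

random.seed(0)


@dataclass
class Island:
    programs: dict         # equation body -> fitness score
    num_programs: int = 0  # counter that drives the temperature schedule

    def best(self):
        body = max(self.programs, key=self.programs.get)
        return body, self.programs[body]


def reset_islands(islands):
    """Reset the worse half of the islands in place; return their indices."""
    order = sorted(range(len(islands)), key=lambda i: islands[i].best()[1])
    half = len(islands) // 2
    reset_ids, keep_ids = order[:half], order[half:]
    for i in reset_ids:
        body, score = islands[random.choice(keep_ids)].best()
        # A fresh island restarts its counter, and with it the temperature.
        islands[i] = Island(programs={body: score}, num_programs=0)
    return reset_ids


islands = [
    Island({"params[0] * x + params[1] * v": -0.1}, num_programs=120),
    Island({"params[0] * x": -3.0}, num_programs=80),
    Island({"params[0] * x + params[2] * x**3": -0.5}, num_programs=95),
    Island({"params[1] * v": -2.0}, num_programs=60),
]
reset_ids = reset_islands(islands)
```

Each reset island picks its founder from a randomly chosen survivor, so the surviving lineages are spread across the reset islands rather than all of them copying the single global best.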
Step 6: Extract Best Equations
After search completes:
- Collect best equation from each island
- Rank by fitness score
- Simplify if possible (algebraic simplification)
- Report with physical interpretation
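The first two bullets above amount to a small collect-and-sort pass. A sketch, assuming each island is a plain mapping from equation body to fitness score (the scores shown are invented for illustration); simplification and interpretation are left out:

```python
def best_per_island(islands):
    """Return (score, body) pairs, one per island, best first."""
    winners = []
    for programs in islands:
        body = max(programs, key=programs.get)
        winners.append((programs[body], body))
    return sorted(winners, reverse=True)


islands = [
    {"params[0] * x": -1.2, "params[0] * x + params[1] * v": -0.3},
    {"params[0] * x": -1.2},
    {"params[0] * x + params[1] * v + params[2] * x**3": -0.01},
]
ranked = best_per_island(islands)
for score, body in ranked:
    print(f"{score:8.3f}  {body}")
```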
Cluster Sampling
Temperature-scheduled softmax over cluster scores:
temperature = T_init * (1 - (num_programs % period) / period)
probabilities = softmax(cluster_scores / temperature)

- Higher temperature → more exploration
- Lower temperature → more exploitation of best clusters
- Within clusters: shorter programs are preferred (Occam's razor)
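The schedule and the two sampling levels can be written out in plain Python. This is a sketch: the values of `T_INIT` and `PERIOD` are assumed for illustration, and the length weighting in `pick_within_cluster` is one possible way to prefer shorter programs, not necessarily the one the reference implementation uses.

```python
import math
import random

random.seed(0)
T_INIT = 0.1     # assumed initial temperature
PERIOD = 30_000  # assumed schedule period, in registered programs


def cluster_probabilities(cluster_scores, num_programs):
    """Softmax over cluster scores at the scheduled temperature."""
    temperature = T_INIT * (1 - (num_programs % PERIOD) / PERIOD)
    top = max(cluster_scores)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp((s - top) / temperature) for s in cluster_scores]
    total = sum(weights)
    return [w / total for w in weights]


def pick_within_cluster(programs):
    """Sample one program, favoring shorter source text."""
    lengths = [len(p) for p in programs]
    shortest, longest = min(lengths), max(lengths)
    weights = [math.exp(-(n - shortest) / (longest + 1e-6)) for n in lengths]
    return random.choices(programs, weights=weights, k=1)[0]


scores = [-2.0, -1.0, -0.5]
early = cluster_probabilities(scores, num_programs=0)       # start of a period
late = cluster_probabilities(scores, num_programs=29_999)   # end of a period
choice = pick_within_cluster(["params[0] * x", "params[0] * x + 0.0 * v"])
```

Within one period the temperature falls from `T_INIT` toward zero, so sampling starts out spread across clusters and ends up concentrated on the best one; the modulo then restarts the cycle.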
Rules
- Equations must use only standard mathematical operations
- Parameter optimization uses BFGS (via SciPy) or Adam
- Fitness = negative MSE (higher is better)
- Timeout protection for equation evaluation
- No recursive equations allowed
- Physical interpretability is preferred over pure fit
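Two of the rules above (standard operations only, no recursion) can be checked statically before a candidate is ever executed. A rough sketch using Python's `ast` module; the whitelist `ALLOWED_CALLS` and the helper name `violations` are assumptions made for illustration, and timeout protection is not covered here.

```python
import ast

# Assumed whitelist of callable names; extend as the domain requires.
ALLOWED_CALLS = {"np.sin", "np.cos", "np.exp", "np.log", "np.sqrt", "np.abs", "np.tanh"}


def violations(source):
    """List rule violations found in one candidate function's source."""
    tree = ast.parse(source)
    func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
    problems = []
    for node in ast.walk(func):
        if isinstance(node, ast.Call):
            name = ast.unparse(node.func)
            if name == func.name:
                problems.append(f"recursive call to {name}")
            elif name not in ALLOWED_CALLS:
                problems.append(f"disallowed call: {name}")
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            problems.append("import inside equation")
    return problems


good = "def equation(x, v, params):\n    return params[0] * x + params[1] * np.sin(v)\n"
bad = "def equation(x, v, params):\n    return equation(x, v, params) + eval('1')\n"
good_report = violations(good)
bad_report = violations(bad)
```

The candidate source is only parsed, never run, so this check is safe to apply to untrusted LLM output before evaluation.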
Related Skills
- Upstream: data-analysis, math-reasoning
- Downstream: paper-writing-section
- See also: algorithm-design