shinka-setup


Shinka Task Setup Skill


Create a setup scaffold consisting of an evaluation script and initial solution for an optimization problem given a user's task description. Both ingredients will be used within ShinkaEvolve, a framework combining LLMs with evolutionary algorithms to drive code optimization.

When to Use


Invoke this skill when the user:
  • Wants to optimize code with LLM-driven code evolution (Shinka/ShinkaEvolve)
  • Has no `evaluate.py` and `initial.<ext>` in the working directory

User Inputs


  • Task description + success criteria
  • Target language for `initial.<ext>` (if omitted, default to Python)
  • What parts of the script to optimize
  • Evaluation metric(s) and score direction
  • Number of eval runs / seeds
  • Required assets or data files
  • Dependencies or constraints (runtime, memory)

Workflow


  1. Check whether all user inputs are provided and ask the user follow-up questions if anything cannot be inferred.
  2. Inspect the working directory and detect the chosen language + extension. Avoid overwriting an existing `evaluate.py` or `initial.<ext>` without consent.
  3. Write `initial.<ext>` with a clear evolve region (`EVOLVE-BLOCK` markers or language-equivalent comments) and a stable I/O contract.
  4. Write `evaluate.py`:
     • Python `initial.py`: call `run_shinka_eval` with `experiment_fn_name`, `get_experiment_kwargs`, `aggregate_metrics_fn`, `num_runs`, and an optional `validate_fn`.
     • Non-Python `initial.<ext>`: run the candidate program directly (usually via `subprocess`) and write `metrics.json` + `correct.json`.
  5. Ensure the candidate output schema matches evaluator expectations (tuple/dict for Python module eval, or a file/CLI contract for non-Python).
  6. Validate the draft `evaluate.py` before handoff:
     • Run a smoke test: `python evaluate.py --program_path initial.<ext> --results_dir /tmp/shinka_eval_smoke`
     • Confirm the evaluator runs without exceptions.
     • Confirm a metrics `dict` is produced (either from `aggregate_fn` or `metrics.json`) with at least:
       • `combined_score` (numeric)
       • `public` (`dict`)
       • `private` (`dict`)
       • `extra_data` (`dict`)
       • `text_feedback` (string, can be empty)
     • Confirm `correct.json` exists with `correct` (bool) and `error` (string) fields.
  7. Ask the user whether they want to run the evolution themselves or use the `shinka-run` skill:
     • If the user wants to run evolution manually, add `run_evo.py` plus a `shinka.yaml` config with matching language + `init_program_path`.
     • Otherwise, ask whether to use the `shinka-run` skill to perform the optimization with the agent.
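The schema checks in step 6 can be scripted. A minimal sketch, assuming the evaluator has already written `metrics.json` and `correct.json` into the results directory (the helper name `check_results` is illustrative, not part of Shinka):

```python
import json
from pathlib import Path


def check_results(results_dir: str) -> list[str]:
    """Return a list of schema problems; an empty list means the smoke test passed."""
    problems = []
    metrics = json.loads((Path(results_dir) / "metrics.json").read_text())
    correct = json.loads((Path(results_dir) / "correct.json").read_text())

    if not isinstance(metrics.get("combined_score"), (int, float)):
        problems.append("combined_score must be numeric")
    for key in ("public", "private", "extra_data"):
        if not isinstance(metrics.get(key), dict):
            problems.append(f"{key} must be a dict")
    if not isinstance(metrics.get("text_feedback"), str):
        problems.append("text_feedback must be a string (may be empty)")
    if not isinstance(correct.get("correct"), bool):
        problems.append("correct.json needs a boolean 'correct' field")
    if not isinstance(correct.get("error"), str):
        problems.append("correct.json needs a string 'error' field")
    return problems
```

Run it against the smoke-test results directory after `evaluate.py` finishes; any returned message points at a field to fix before handoff.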

What is ShinkaEvolve?


A framework developed by SakanaAI that combines LLMs with evolutionary algorithms to propose program mutations that are then evaluated and archived. The goal is to optimize performance and discover novel scientific insights.

Evolution Flow


  1. Select parent(s) from the archive/population
  2. LLM proposes a patch (diff, full rewrite, or crossover)
  3. Evaluate candidate → `combined_score`
  4. If valid, insert into island archive (higher score = better)
  5. Periodically migrate top solutions between islands
  6. Repeat for N generations
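The flow above can be sketched as a toy island-model loop. This is purely illustrative: `evaluate`, `propose_patch`, and `run_evolution` are placeholders standing in for ShinkaEvolve's internals, not its API.

```python
def run_evolution(evaluate, propose_patch, initial_program,
                  num_generations=10, num_islands=2, migrate_every=5):
    """Toy island-model loop mirroring the flow above (higher score = better)."""
    # Each island keeps its own archive of (program, score) entries.
    islands = [[(initial_program, evaluate(initial_program))] for _ in range(num_islands)]
    for gen in range(1, num_generations + 1):
        for archive in islands:
            parent, _ = max(archive, key=lambda e: e[1])  # 1. select parent
            child = propose_patch(parent)                 # 2. LLM proposes patch
            score = evaluate(child)                       # 3. evaluate -> combined_score
            if score is not None:                         # 4. insert valid candidates
                archive.append((child, score))
        if gen % migrate_every == 0:                      # 5. migrate top solution
            best = max((e for a in islands for e in a), key=lambda e: e[1])
            for archive in islands:
                archive.append(best)
    return max((e for a in islands for e in a), key=lambda e: e[1])
```

In the real framework the patch comes from an LLM and the score from `evaluate.py`; the loop structure (select, mutate, evaluate, archive, migrate) is the part this sketch is meant to convey.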

Core Files To Generate


| File | Purpose |
| --- | --- |
| `initial.<ext>` | Starting solution in the chosen language with an evolve region that LLMs mutate |
| `evaluate.py` | Scores candidates and emits metrics/correctness outputs that guide selection |
| `run_evo.py` | (Optional) Launches the evolution loop |
| `shinka.yaml` | (Optional) Config: generations, islands, LLM models, patch types, etc. |

Quick Install (if Shinka is not set up yet)


Install once before creating/running tasks:

```bash
# Check if shinka is available in the workspace environment
python -c "import shinka"

# If not, install from PyPI
pip install shinka-evolve

# Or with uv
uv pip install shinka-evolve
```

Language Support (`initial.<ext>`)

Shinka supports multiple candidate-program languages. Choose one, then keep the extension/config/evaluator aligned.

| `evo_config.language` | `initial.<ext>` |
| --- | --- |
| `python` | `initial.py` |
| `julia` | `initial.jl` |
| `cpp` | `initial.cpp` |
| `cuda` | `initial.cu` |
| `rust` | `initial.rs` |
| `swift` | `initial.swift` |
| `json` / `json5` | `initial.json` |

Rules:
  • `evaluate.py` stays the evaluator entrypoint.
  • Python candidates: prefer `run_shinka_eval` + `experiment_fn_name`.
  • Non-Python candidates: evaluate via `subprocess` and write `metrics.json` + `correct.json`.
  • Always set both `evo_config.language` and a matching `evo_config.init_program_path`.

Template: `initial.<ext>` (Python example)


```py
import random

# EVOLVE-BLOCK-START
def advanced_algo():
    # Implement the evolving algorithm here.
    return 0.0, ""
# EVOLVE-BLOCK-END


def solve_problem(params):
    return advanced_algo()


def run_experiment(random_seed: int | None = None, **kwargs):
    """Main entrypoint called by evaluator."""
    if random_seed is not None:
        random.seed(random_seed)
    score, text = solve_problem(kwargs)
    return float(score), text
```

For non-Python `initial.<ext>`, keep the same idea: a small evolve region + a deterministic program interface consumed by `evaluate.py`.

Template: `evaluate.py` (Python `run_shinka_eval` path)


```py
import argparse
import numpy as np

from shinka.core import run_shinka_eval  # required for results storage


def get_kwargs(run_idx: int) -> dict:
    return {"random_seed": int(np.random.randint(0, 1_000_000_000))}


def aggregate_fn(results: list) -> dict:
    scores = [r[0] for r in results]
    texts = [r[1] for r in results if len(r) > 1]
    combined_score = float(np.mean(scores))
    text = texts[0] if texts else ""
    return {
        "combined_score": combined_score,
        "public": {},
        "private": {},
        "extra_data": {},
        "text_feedback": text,
    }


def validate_fn(result):
    # Return (True, None) or (False, "reason")
    return True, None


def main(program_path: str, results_dir: str):
    metrics, correct, err = run_shinka_eval(
        program_path=program_path,
        results_dir=results_dir,
        experiment_fn_name="run_experiment",
        num_runs=3,
        get_experiment_kwargs=get_kwargs,
        aggregate_metrics_fn=aggregate_fn,
        validate_fn=validate_fn,  # Optional
    )
    if not correct:
        raise RuntimeError(err or "Evaluation failed")


if __name__ == "__main__":
    # argparse program path & dir
    parser = argparse.ArgumentParser()
    parser.add_argument("--program_path", required=True)
    parser.add_argument("--results_dir", required=True)
    args = parser.parse_args()
    main(program_path=args.program_path, results_dir=args.results_dir)
```
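The `aggregate_fn` in this template can be sanity-checked on its own, without Shinka installed, by feeding it fake per-run results (`(score, text)` tuples). The sketch below re-implements it with a plain-Python mean instead of `np.mean` so it has no dependencies; the aggregation logic is otherwise the same.

```python
def aggregate_fn(results: list) -> dict:
    # Same aggregation as the template: mean score + first run's text feedback.
    scores = [r[0] for r in results]
    texts = [r[1] for r in results if len(r) > 1]
    return {
        "combined_score": float(sum(scores) / len(scores)),
        "public": {},
        "private": {},
        "extra_data": {},
        "text_feedback": texts[0] if texts else "",
    }


# Fake results from three runs, as run_shinka_eval would collect them.
metrics = aggregate_fn([(1.0, "run 0 ok"), (2.0, "run 1 ok"), (3.0, "")])
```

Checking the output this way before wiring up `run_shinka_eval` catches schema mistakes (a missing key, a non-numeric score) early.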

Template: `evaluate.py` (non-Python `initial.<ext>` path)


```py
import argparse
import json
import os
from pathlib import Path


def main(program_path: str, results_dir: str):
    os.makedirs(results_dir, exist_ok=True)

    # 1) Execute candidate program_path (subprocess / runtime-specific call)
    # 2) Compute task metrics + correctness
    metrics = {
        "combined_score": 0.0,
        "public": {},
        "private": {},
        "extra_data": {},
        "text_feedback": "",
    }
    correct = False
    error = ""

    (Path(results_dir) / "metrics.json").write_text(
        json.dumps(metrics, indent=2), encoding="utf-8"
    )
    (Path(results_dir) / "correct.json").write_text(
        json.dumps({"correct": correct, "error": error}, indent=2), encoding="utf-8"
    )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--program_path", required=True)
    parser.add_argument("--results_dir", required=True)
    args = parser.parse_args()
    main(program_path=args.program_path, results_dir=args.results_dir)
```
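For step 1 of this template, one possible contract, purely an assumption for illustration, is that the candidate prints a JSON object to stdout. A hedged sketch of running and parsing such a candidate (the helper `run_candidate` is hypothetical; match whatever contract your `evaluate.py` actually defines):

```python
import json
import subprocess


def run_candidate(cmd: list[str], timeout_s: int = 60):
    """Run the candidate and parse a JSON object from its stdout.

    `cmd` is runtime-specific, e.g. ["julia", program_path] or a compiled
    binary path. Returns (parsed_output, error_message); parsed_output is
    None on any failure, so the caller can fill correct.json accordingly.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None, "timed out"
    if proc.returncode != 0:
        return None, proc.stderr.strip() or f"exit code {proc.returncode}"
    try:
        return json.loads(proc.stdout), ""
    except json.JSONDecodeError as exc:
        return None, f"bad JSON on stdout: {exc}"
```

The error string it returns maps directly onto the `error` field of `correct.json`, and a non-None parsed output feeds the `metrics` dict.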

(Optional) Template: `run_evo.py` (async)


See `skills/shinka-setup/scripts/run_evo.py` for an example to edit.

(Optional) Template: `shinka.yaml`


See `skills/shinka-setup/scripts/shinka.yaml` for an example to edit.

Notes


  • Keep evolve markers tight; only code inside the region should evolve.
  • Keep the evaluator schema stable (`combined_score`, `public`, `private`, `extra_data`, `text_feedback`).
  • Python module path: ensure `experiment_fn_name` matches the function name in `initial.py`.
  • Non-Python path: ensure the evaluator/runtime contract matches the `initial.<ext>` CLI/I/O.
  • Higher `combined_score` values indicate better performance.