Build an RLM


An RLM is a callable, pre-configured agent. It autonomously explores context, writes and executes code in a sandboxed REPL, calls tools, inspects results, and iterates until the task is done. Unlike a chat agent, an RLM is a function — you define its inputs, outputs, and tools, then call it from your code. It returns structured data, not chat messages.
This skill has two phases:
  1. Plan — interactively define the RLM with the user, research feasibility, produce a plan
  2. Build — implement the plan as code files
First action: Enter plan mode using the EnterPlanMode tool.
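Concretely, the call shape is a typed async function, not a chat loop. A minimal stand-in sketch (`StubRLM` and `AnalysisResult` are hypothetical names used only for illustration; the real entry point is `PredictRLM`, covered in the Architecture Reference below):

```python
import asyncio
from dataclasses import dataclass


# Hypothetical stand-in illustrating the contract:
# typed inputs in, structured data out, no chat transcript.
@dataclass
class AnalysisResult:
    report: str


class StubRLM:
    async def acall(self, *, documents: list[str]) -> AnalysisResult:
        # A real RLM would explore the documents, run code, and iterate here.
        return AnalysisResult(report=f"analyzed {len(documents)} documents")


result = asyncio.run(StubRLM().acall(documents=["a.pdf", "b.pdf"]))
print(result.report)  # analyzed 2 documents
```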


Phase 1: Plan


Work through these steps interactively. Do not skip steps or rush to the plan. Each step should involve asking the user questions and confirming alignment before moving on.

Step 1: Goal Definition


Understand what the user wants to build.
Ask:
  • What is the desired outcome? What does success look like?
  • What is the input material? (documents, code, data, APIs, etc.)
  • What does the output look like? (structured report, modified files, spreadsheet, etc.)
Then validate RLM fit. An RLM is the right tool when:
  • The input is large and needs selective exploration (documents, datasets, codebases)
  • The task is multi-step with tool use (extract -> transform -> validate)
  • Actions modify state (redaction, form filling, generation)
  • Parallel sub-LM calls are needed across many items
  • File-to-file transformations (PDFs -> spreadsheets, documents -> reports)
If the task is better served by a single LLM call or a simple script, tell the user and suggest an alternative. Otherwise, proceed.

Step 2: Input Design


Work with the user to define every input to the RLM.
For each input, determine:
  • Name and type: `File`, `list[File]`, `str`, or a Pydantic model
  • Description: what it contains and how the RLM uses it
  • Source: user-provided file, API response, config, generated data
Key principles:
  • Large content (PDFs, images, datasets) must be `File` references — the RLM accesses content on-demand through skills, keeping its context small
  • Metadata (file paths, page counts, config flags) can be strings or Pydantic models
  • Use `list[File]` for variable-count file inputs
Confirm the input design with the user before proceeding.

Step 3: Output Design


Work with the user to define the structured output.
For each output field, determine:
  • Name, type, and description
  • Whether it's a Pydantic model (structured data), `File` (generated file), or a primitive
Push for specificity — vague outputs lead to poor RLM performance. Sketch the Pydantic models with `Field(description=...)` annotations. Include nested models where appropriate.
Ask the user:
  • What fields matter most? What would they check first?
  • Are there any computed/derived fields (scores, summaries, counts)?
  • Do they need output files (Excel, PDF, images)?
Confirm the output design with the user before proceeding.

Step 4: Research


This step is autonomous. Tell the user you are researching, then do it.
Use web search and the Explore subagent to:
  1. Find Python packages for the domain (e.g., `networkx` for graphs, `tree-sitter` for code parsing, `beautifulsoup4` for HTML).
  2. Check Pyodide compatibility. The sandbox runs Pyodide (Python in WASM). Only pure-Python wheels or packages with Emscripten builds work. Search pypi.org for each package and check whether it ships a pure-Python wheel or has an Emscripten build.
  3. Identify network needs. Does the task require calling external APIs? If so, note the domains for `allowed_domains`.
  4. Identify host-side tool needs. If any functionality cannot run in WASM (native binaries, C extensions, heavy computation), it must be a host-side tool — a Python function running on the host that the RLM calls like any other tool.
  5. Check for existing skills. The built-in skills are:
    • `pdf` — pymupdf for PDF rendering, text extraction, manipulation
    • `spreadsheet` — openpyxl, pandas, formulas for Excel work
    • `docx` — python-docx for reading, writing, and modifying Word documents
Report findings to the user with a clear feasibility assessment. Flag any blockers.
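The pure-Python wheel check in step 2 can be approximated from the wheel filename alone (a heuristic sketch; the authoritative check is the wheel's full compatibility tags and Pyodide's built-in package list):

```python
def is_pure_python_wheel(filename: str) -> bool:
    """Heuristic: pure-Python wheels carry the `none-any` platform tag,
    e.g. `networkx-3.2-py3-none-any.whl`. Platform-specific wheels
    (cp312, manylinux, etc.) need an Emscripten build to run in Pyodide."""
    return filename.endswith("-none-any.whl")


print(is_pure_python_wheel("networkx-3.2-py3-none-any.whl"))  # True
print(is_pure_python_wheel("numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.whl"))  # False
```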

Step 5: Skill Design


Based on research, design the skill configuration.

Built-in skills


List which built-in skills to use and why.

Custom skills (if needed)


For each custom skill, define:
  • name: short identifier
  • instructions: prose guidance injected into the RLM's system prompt — teaches the RLM patterns and best practices. Be detailed; this is the primary way to control RLM behavior.
  • packages: PyPI packages installed in the sandbox via micropip (must be Pyodide-compatible)
  • modules: Python files mounted into the sandbox as importable modules
  • tools: host-side callable functions exposed to the RLM

Host-side tool design


For each host-side tool:
  • Function name and signature with type hints
  • Docstring (the RLM sees this to understand how to call it)
  • What it does and why it must be host-side
Confirm the skill design with the user before proceeding.

Step 6: Strategy and Architecture


Signature strategy


Write the step-by-step strategy that goes in the signature's docstring. This is the RLM's playbook:
  1. What to do first (survey/understand the input)
  2. How to gather information (render pages, use predict() for extraction, call tools)
  3. How to process and synthesize
  4. What to produce and where to save output files

Single vs chained RLMs


Evaluate whether this needs one RLM or multiple chained RLMs.
Use a single RLM when:
  • The task is one coherent workflow
  • All steps need the same context/state
  • The iteration count stays reasonable (under 40)
Use chained RLMs when:
  • There are distinct phases with different skill needs
  • One phase produces artifacts consumed by another
  • The combined task would exceed reasonable iteration counts
  • Different phases benefit from different sub-LM models
If chaining, define each stage:
  • Stage name, signature (inputs/outputs), skills, strategy
  • The DAG: which stage feeds into which, with typed connections
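The typed hand-off between chained stages can be sketched with plain async functions and dataclasses (stage and field names here are hypothetical; real stages are `PredictRLM` calls, as in the chaining pattern in Phase 2):

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ExtractedData:  # artifact produced by stage 1, consumed by stage 2
    records: list[str]


@dataclass
class Report:  # final artifact from stage 2
    summary: str


async def extract_stage(documents: list[str]) -> ExtractedData:
    return ExtractedData(records=[f"record from {d}" for d in documents])


async def analyze_stage(data: ExtractedData) -> Report:
    return Report(summary=f"{len(data.records)} records analyzed")


async def pipeline(documents: list[str]) -> Report:
    extracted = await extract_stage(documents)  # Stage1 --[ExtractedData]--> Stage2
    return await analyze_stage(extracted)


print(asyncio.run(pipeline(["a.pdf", "b.pdf"])).summary)  # 2 records analyzed
```

Each edge in the DAG is just a typed value passed from one stage's output to the next stage's input.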

Configuration


  • `max_iterations` estimate per RLM
  • `allowed_domains` if network access is needed
  • `sub_lm` recommendations (capability level needed)

Feasibility Checklist


Before producing the final plan, verify:
  • All proposed packages are Pyodide-compatible (or have host-side fallbacks)
  • Network access needs are identified with specific domains
  • Host-side tools are defined for anything that can't run in WASM
  • Iteration count is reasonable (under 50 per RLM)
  • Input sizes are manageable (or chunking strategy is defined)
  • Output schemas are specific enough for reliable extraction
  • The task is achievable — no unsupported capabilities assumed

Plan Output


Write the plan to the Claude Code plan file with these sections:
  1. Overview — one paragraph: what, why, and expected workflow
  2. File manifest — every file to create with a one-line description
  3. Input schemas — complete Pydantic model code for `schema.py`
  4. Output schemas — complete Pydantic model code for `schema.py`
  5. Signature — complete `signature.py` code with strategy docstring
  6. Skills configuration — built-in imports + custom `Skill(...)` definitions + tool signatures
  7. Service architecture — single RLM wiring or chained DAG:
     Stage1(documents) --[ExtractedData]--> Stage2(extracted) --[Report]--> Stage3(report)
  8. Feasibility notes — constraints, risks, alternatives
  9. Estimated complexity — iteration count, sub-LM calls, cost range, runtime
After writing the plan, use ExitPlanMode to get user approval. Once approved, proceed to Phase 2.


Phase 2: Build


Implement the approved plan. Create all files following the patterns below.

File structure


my_rlm/
├── __init__.py       # Public exports (service class, schema, signature)
├── schema.py         # Pydantic models for inputs AND outputs
├── signature.py      # DSPy Signature (inputs/outputs + strategy docstring)
├── service.py        # DSPy Module wiring signature + PredictRLM + skills
└── skills.py         # (optional) Custom skill definitions beyond built-in skills
Always create: `schema.py`, `signature.py`, `service.py`, `__init__.py`
Create when needed: `skills.py` (only if the RLM needs domain-specific instructions beyond built-in skills)

schema.py — Pydantic models


Define models for structured inputs and outputs. Use `Field(description=...)` so the RLM knows what each field means.

```python
from pydantic import BaseModel, Field


class KeyDate(BaseModel):
    """A key date extracted from a document."""

    name: str = Field(description="e.g. 'Submission Deadline', 'Effective Date'")
    date: str = Field(description="ISO format date (YYYY-MM-DD)")
    time: str | None = Field(
        None, description="24-hour format (HH:MM), e.g. '14:00', '09:30'"
    )
    timezone: str | None = Field(
        None, description="Timezone code, e.g. 'EST', 'EDT', 'PST', 'UTC'"
    )


class DocumentAnalysis(BaseModel):
    """Structured analysis of a document set."""

    report: str = Field(
        description="Full analysis as a well-formatted markdown report"
    )
    key_dates: list[KeyDate] = Field(
        default_factory=list, description="Important dates found in the documents"
    )
```

signature.py — Inputs, outputs, and strategy


The docstring becomes the RLM's system instructions — tell the RLM how to approach the task step by step:
```python
import dspy

from predict_rlm import File

from .schema import DocumentAnalysis


class AnalyzeDocuments(dspy.Signature):
    """Analyze documents and produce a structured report.

    1. **Read the report criteria** (appended below) to understand what
       information to extract and in what format.

    2. **Survey the documents** to understand what you're working with:
       file names, page counts, document types.

    3. **Gather information** systematically by rendering pages as images
       and using predict() to extract content.

    4. **Produce the report** following the format specified in the criteria.
       Use tables for structured data, prose for analysis and context.
    """

    documents: list[File] = dspy.InputField(
        desc="PDF documents to analyze"
    )
    analysis: DocumentAnalysis = dspy.OutputField(
        desc="Structured analysis with markdown report and key dates"
    )
```

service.py — Wiring it together


Wrap signature + skills + PredictRLM into a reusable DSPy Module:
```python
import dspy

from predict_rlm import File, PredictRLM
from predict_rlm.skills import pdf as pdf_skill

from .schema import DocumentAnalysis
from .signature import AnalyzeDocuments


class DocumentAnalyzer(dspy.Module):
    def __init__(
        self,
        sub_lm: dspy.LM | str | None = None,
        max_iterations: int = 30,
        verbose: bool = False,
        debug: bool = False,
    ):
        self.sub_lm = sub_lm
        self.max_iterations = max_iterations
        self.verbose = verbose
        self.debug = debug

    async def aforward(
        self, documents: list[File], criteria: str
    ) -> DocumentAnalysis:
        signature = AnalyzeDocuments.with_instructions(
            AnalyzeDocuments.instructions + "\n\n# Task\n\n" + criteria.strip()
        )
        predictor = PredictRLM(
            signature,
            sub_lm=self.sub_lm,
            skills=[pdf_skill],
            max_iterations=self.max_iterations,
            verbose=self.verbose,
            debug=self.debug,
        )
        result = await predictor.acall(documents=documents)
        return result.analysis
```
When using multiple skills or host-side tools:
```python
from predict_rlm.skills import pdf as pdf_skill
from predict_rlm.skills import spreadsheet as spreadsheet_skill

async def aforward(self, documents: list[File]) -> MyOutput:
    predictor = PredictRLM(
        MySignature,
        sub_lm=self.sub_lm,
        skills=[pdf_skill, spreadsheet_skill],
        tools={"fetch_exchange_rate": fetch_exchange_rate},
        ...
    )
```

Chaining pattern (multiple RLMs)


```python
async def aforward(self, documents: list[File]):
    # Stage 1: Extract
    extractor = PredictRLM(ExtractSignature, sub_lm=self.sub_lm, skills=[pdf_skill])
    extracted = await extractor.acall(documents=documents)

    # Stage 2: Analyze (uses output from stage 1)
    analyzer = PredictRLM(AnalyzeSignature, sub_lm=self.sub_lm, skills=[analysis_skill])
    result = await analyzer.acall(data=extracted.data)

    return result
```

skills.py — Custom skills


Create only when the RLM needs domain-specific instructions beyond built-in skills.
```python
from predict_rlm import Skill
from predict_rlm.skills import pdf as pdf_skill

redaction_skill = Skill(
    name="redaction",
    instructions="""How to redact content from PDFs using pymupdf.

Text redaction

Search for text, create redaction annotations, then apply:

    page = doc[page_num]
    hits = page.search_for("sensitive text")
    for rect in hits:
        page.add_redact_annot(rect, fill=(0, 0, 0))
    page.apply_redactions()
...""",
)

__all__ = ["pdf_skill", "redaction_skill"]
```

---

Architecture Reference


Use this reference to ensure plans and implementations are accurate. Do not hallucinate parameters or patterns.

How an RLM works


The architecture is two-level:
  1. The outer LLM (the RLM itself) writes and executes Python code in a sandboxed Pyodide/WASM REPL. It plans, orchestrates, and iterates.
  2. The sub-LM (via `predict()`) handles perception and extraction — analyzing images, understanding text, and returning typed results. Each `predict()` call gets its own context window.
The outer LLM's context stays small (code + tool results), while context-heavy work is offloaded to `predict()` calls.

File I/O


Use `File` for file-typed fields:
  • Input field: mounts the file from the host into the sandbox at `/sandbox/input/{field_name}/`
  • Output field: syncs from `/sandbox/output/{field_name}/` back to the host
```python
from predict_rlm import File

# Input: File(path="/absolute/path/to/file.pdf")
# Output: declare a File output field; the RLM writes to /sandbox/output/<field>/
```

PredictRLM constructor


```python
PredictRLM(
    signature: type[Signature] | str,     # DSPy signature class
    lm: dspy.LM | str | None = None,      # Main LM (code generation)
    sub_lm: dspy.LM | str | None = None,  # Sub-LM for predict() calls
    max_iterations: int = 30,
    max_llm_calls: int = 50,
    verbose: bool = False,
    tools: dict[str, Callable] | list[Callable] | None = None,
    allowed_domains: list[str] | None = None,
    skills: list[Skill] | None = None,
    debug: bool = False,
    output_dir: str | Path | None = None,
)
```
Both `lm` and `sub_lm` accept a model string (e.g. `"openai/gpt-5.4"`) or a `dspy.LM` instance. If `lm` is omitted, the current context LM from `dspy.context(lm=...)` is used.

Skill dataclass


```python
from predict_rlm import Skill

Skill(
    name="my-skill",                          # Short identifier
    instructions="How to approach...",         # Prose injected into the RLM prompt
    packages=["pandas", "openpyxl"],           # PyPI packages installed in the sandbox
    modules={"helper": "/path/to/helper.py"},  # Python files mounted as importable modules
    tools={"fetch": fetch_fn},                 # Host-side callable functions exposed to the RLM
)
```
Skills can bundle host-side tools via their `tools=` field. When skills are composed, their tools are merged alongside instructions and packages (tool name conflicts raise errors).

Built-in skills


```python
from predict_rlm.skills import pdf as pdf_skill          # pymupdf
from predict_rlm.skills import spreadsheet as spreadsheet_skill  # openpyxl, pandas, formulas
from predict_rlm.skills import docx as docx_skill        # python-docx
```

| Skill | Packages | Modules | What it teaches the RLM |
| --- | --- | --- | --- |
| `pdf` | pymupdf | | Read, render, modify, and redact PDFs |
| `spreadsheet` | openpyxl, pandas, formulas | formula_eval | Build and modify Excel workbooks with formulas and formatting |
| `docx` | python-docx | md2docx | Read, write, and modify Word documents with tables, formatting, and styles |

Tools


Tools are host-side functions the RLM can call from the sandbox. Use them for operations that cannot run inside the sandbox — host access, authenticated APIs, database queries, system resources.
```python
import httpx


async def fetch_exchange_rate(currency: str, date: str) -> str:
    """Fetch the exchange rate for a currency on a given date.

    Args:
        currency: ISO currency code (e.g. "EUR", "GBP")
        date: Date in YYYY-MM-DD format

    Returns:
        JSON string with the exchange rate data
    """
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"https://api.example.com/rates/{currency}/{date}")
        return resp.text
```
Tools can be passed directly to PredictRLM via `tools={"name": fn}` or bundled inside a Skill via `tools=`.

When to use a Skill vs tools


| Use a Skill when... | Use `tools=` when... |
| --- | --- |
| The RLM needs a package installed in the sandbox | The function must run on the host (API calls, DB queries, filesystem) |
| You need to teach the RLM how to use something | The tool's docstring is self-explanatory |
| The knowledge is reusable across RLMs | It's a single specific function for one RLM |

predict() tool (inside sandbox)


The RLM can call `predict()` for sub-LM perception/extraction:

```python
result = await predict(
    "image: dspy.Image -> items: list[Item]",
    instructions="Extract all line items from this invoice page",
    image=page_image,
)
```

Each `predict()` call gets its own context window. Supports `dspy.Image` for multimodal inputs.

Key imports


```python
from predict_rlm import PredictRLM, Skill, File
from predict_rlm.skills import pdf, spreadsheet, docx
```

WASM sandbox constraints


  • Only pure-Python wheels or Pyodide built-in packages work
  • No subprocess, no native binaries, no C extensions (unless Emscripten-built)
  • Network access requires an `allowed_domains` whitelist
  • File I/O is confined to the sandbox filesystem
  • Host-side tools bridge the gap for anything WASM can't do
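A sketch of the whitelist semantics behind `allowed_domains` (illustrative only; the sandbox's actual matching rules, e.g. for subdomains, may differ):

```python
from urllib.parse import urlparse


def is_allowed(url: str, allowed_domains: list[str]) -> bool:
    """Allow exact domain matches and, in this sketch, their subdomains."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


print(is_allowed("https://api.example.com/rates/EUR/2024-01-02", ["example.com"]))  # True
print(is_allowed("https://evil.test/", ["example.com"]))  # False
```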