tilegym-monkey-patch-kernels-to-transformers

Integrate TileGym kernels into 🤗 Transformers and create cuTile kernels

The main purpose of the TileGym project is to provide performant kernels for LLM training and inference. We will integrate suitable TileGym kernels into LLM models provided by the Hugging Face transformers library to validate end-to-end functional correctness and performance improvements. Instead of modifying the transformers source code, we take a non-intrusive monkey-patch approach: we replace certain modules, classes, and methods in the transformers library that implement the target Transformer model, so that at model instantiation the model's core components are replaced by TileGym implementations. At runtime the model then invokes TileGym kernels under the hood. In addition, we follow an auto-research-style agent harness loop to create new cuTile kernels and integrate them into the target model, improving kernel coverage and end-to-end throughput.
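The monkey-patch idea described above can be illustrated with a minimal, dependency-free sketch. Everything in it is a stand-in chosen for illustration: `modeling`, `ReferenceRMSNorm`, `PatchedRMSNorm`, `DecoderLayer`, and `apply_patch` are hypothetical names, not part of transformers or TileGym. It only shows the mechanism: swap a class on a module before the model is instantiated, so that new instances pick up the replacement.

```python
import math
import types


class ReferenceRMSNorm:
    """Stand-in for a library's reference module (hypothetical)."""
    backend = "reference"

    def __call__(self, xs):
        rms = math.sqrt(sum(x * x for x in xs) / len(xs) + 1e-6)
        return [x / rms for x in xs]


class PatchedRMSNorm(ReferenceRMSNorm):
    """Stand-in for a kernel-backed replacement (hypothetical)."""
    backend = "patched"

    def __call__(self, xs):
        # A real replacement would dispatch to an optimized kernel here.
        inv_rms = 1.0 / math.sqrt(sum(x * x for x in xs) / len(xs) + 1e-6)
        return [x * inv_rms for x in xs]


# Plays the role of the library's modeling module (assumption for this sketch).
modeling = types.SimpleNamespace(RMSNorm=ReferenceRMSNorm)


class DecoderLayer:
    def __init__(self):
        # The class is looked up at instantiation time,
        # so a patch applied beforehand takes effect here.
        self.norm = modeling.RMSNorm()


def apply_patch():
    """Swap the class on the module; return the original so it can be restored."""
    original = modeling.RMSNorm
    modeling.RMSNorm = PatchedRMSNorm
    return original


before = DecoderLayer()          # built before patching: reference path
original = apply_patch()
after = DecoderLayer()           # built after patching: replacement path
modeling.RMSNorm = original      # undo the patch
restored = DecoderLayer()
```

Note that the order matters in this sketch: instances created before the patch keep the reference implementation, which is why the patch has to be applied before model instantiation.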

Instructions

This section is for human readers. Prompt your preferred AI agent with the skill name and the target model ID. For example:
Claude
Hi, please /monkey-patch-kernels-to-transformers Qwen/Qwen3.5-0.8B.
The agent may ask you several questions. Answer them, then confirm that it should proceed.

Workflow

  1. Prepare the experiment environment. Follow environment-setup.md
  2. Integrate existing TileGym kernels into the target model. Follow kernel-integration.md
  3. Autonomously create new cuTile kernels for uncovered PyTorch code. Follow auto-kernelize.md
    • Feel free to add new cuTile kernels, keeping the constraints in mind
    • Do not stop until the auto-kernelize loop's stop conditions are met
  4. Summarize and report
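As a rough illustration of step 3, the loop can be pictured as follows. This is a sketch only: the `Candidate` type, its callables, and both stop conditions used here (an iteration budget and an exhausted candidate list) are assumptions made for illustration. The actual constraints and stop conditions are the ones defined in auto-kernelize.md.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """A proposed kernel for an uncovered op (hypothetical structure)."""
    name: str
    is_correct: Callable[[], bool]   # e.g. compare against the reference output
    speedup: Callable[[], float]     # e.g. reference time / candidate time


def auto_kernelize(
    candidates: List[Candidate],
    max_iterations: int = 10,        # assumed stop condition: iteration budget
    min_speedup: float = 1.0,        # assumed acceptance bar: must beat the reference
) -> List[str]:
    """Return the names of candidates that are both correct and faster."""
    accepted = []
    for iteration, candidate in enumerate(candidates):
        if iteration >= max_iterations:
            break                    # budget exhausted
        if not candidate.is_correct():
            continue                 # never integrate a kernel that fails validation
        if candidate.speedup() <= min_speedup:
            continue                 # correct but not faster: keep the reference path
        accepted.append(candidate.name)
    return accepted


accepted = auto_kernelize([
    Candidate("kernel_a", lambda: True, lambda: 1.4),
    Candidate("kernel_b", lambda: False, lambda: 2.0),
    Candidate("kernel_c", lambda: True, lambda: 0.9),
])
```

Correctness is checked before performance in this sketch, so a fast but incorrect candidate such as `kernel_b` is rejected without its speedup ever being considered.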

Disciplines

This is for AI Agents executing this workflow.

Kernel inventory

Reusable transformer-local kernels must be represented with FlashInfer-Bench-style Definition and Solution metadata. Follow kernel-inventory-schema.md when researching compute requirements, inventorying existing kernels, proposing candidates, or creating new generated kernels.