monkey-patch-kernels-to-transformers
Integrate and create cuTile kernels into 🤗 Transformers
The main purpose of the TileGym project is to provide performant kernels for LLM training and inference. We will integrate suitable TileGym kernels into LLMs provided by the Hugging Face `transformers` library to validate end-to-end functional correctness and performance improvements. Instead of modifying the `transformers` source code, we take a non-intrusive monkey-patch approach: we replace certain modules, classes, or methods in the `transformers` library that implement the target Transformer model, so that when the model is instantiated, its core components are replaced by TileGym implementations. At runtime, the model then invokes TileGym kernels under the hood. In addition, we follow an auto-research-style agent harness loop to create new cuTile kernels and integrate them into the target model, improving kernel coverage and end-to-end throughput.
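A minimal sketch of the patching mechanism, assuming the usual pattern for `transformers` modeling modules: rebind a class name on the module that defines the model *before* the model is instantiated. `ReferenceRMSNorm`, `TileGymRMSNorm`, `DecoderLayer`, and `apply_tilegym_patches` are hypothetical names, not TileGym's API, and a fake module stands in for a real `transformers` module so the sketch runs without torch or `transformers` installed.

```python
import types

# Hypothetical stand-in for a transformers modeling module (for example
# transformers.models.llama.modeling_llama). A fake module keeps the sketch
# runnable without torch or transformers installed.
modeling = types.ModuleType("fake_modeling")


class ReferenceRMSNorm:
    """Reference RMSNorm in plain Python, standing in for the library's PyTorch module."""

    def __init__(self, dim, eps=1e-6):
        self.dim = dim
        self.eps = eps

    def forward(self, x):
        mean_sq = sum(v * v for v in x) / len(x)
        return [v / (mean_sq + self.eps) ** 0.5 for v in x]


class DecoderLayer:
    def __init__(self, dim):
        # The norm class is looked up on the module at construction time. Real
        # modeling code resolves it through module globals, which has the same effect.
        self.norm = modeling.RMSNorm(dim)


class TileGymRMSNorm(ReferenceRMSNorm):
    """Hypothetical drop-in replacement; a real one would launch a cuTile kernel."""

    def forward(self, x):
        # Placeholder: reuse the reference math so outputs match exactly.
        return super().forward(x)


def apply_tilegym_patches(module):
    """Hypothetical entry point: swap classes on the module without editing source files."""
    module.RMSNorm = TileGymRMSNorm


modeling.RMSNorm = ReferenceRMSNorm
modeling.DecoderLayer = DecoderLayer

before = modeling.DecoderLayer(4)  # built before patching: keeps the reference class
apply_tilegym_patches(modeling)
after = modeling.DecoderLayer(4)   # built after patching: picks up the TileGym class

print(type(before.norm).__name__, type(after.norm).__name__)
# ReferenceRMSNorm TileGymRMSNorm
```

Ordering matters: only models created after the patch pick up the replacement, which is why patching happens before model instantiation. Comparing `before.norm.forward(x)` with `after.norm.forward(x)` on the same input is a toy version of the functional-correctness check.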
Workflow
- Prepare the experiment environment. Follow environment-setup.md.
- Integrate existing TileGym kernels into the target model. Follow kernel-integration.md.
- Autonomously create new cuTile kernels for PyTorch code not yet covered by existing kernels. Follow auto-kernelize.md.
  - Feel free to add new cuTile kernels, keeping the constraints in mind.
  - Do not stop until the auto-kernelize loop's stop conditions are met.
- Summarize and report the results.
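The auto-kernelize step can be pictured as a propose-validate-keep loop. The sketch below is hypothetical: `Candidate`, `auto_kernelize`, and its stop conditions (no uncovered ops left, or `max_iters` reached) are illustrative assumptions, not the stop conditions defined in auto-kernelize.md, and the acceptance rule (numerically correct and faster than the baseline) is one plausible choice.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    op: str         # hypothetical name of an uncovered PyTorch op
    correct: bool   # matched the reference implementation within tolerance
    speedup: float  # end-to-end throughput relative to the current baseline


def auto_kernelize(uncovered_ops, propose, max_iters=10, min_speedup=1.0):
    """Propose one kernel per uncovered op; keep it only if correct and faster.

    The stop conditions (queue empty or max_iters reached) are assumptions.
    """
    pending = list(uncovered_ops)
    accepted, rejected = [], []
    iters = 0
    while pending and iters < max_iters:
        op = pending.pop(0)
        cand = propose(op)  # e.g. an agent writes and benchmarks a cuTile kernel
        iters += 1
        if cand.correct and cand.speedup > min_speedup:
            accepted.append(cand.op)
        else:
            rejected.append(cand.op)
    return accepted, rejected


# Made-up toy inputs that only exercise the control flow; they are not measurements.
toy = {
    "op_a": Candidate("op_a", correct=True, speedup=1.5),
    "op_b": Candidate("op_b", correct=True, speedup=0.9),
    "op_c": Candidate("op_c", correct=False, speedup=2.0),
}
accepted, rejected = auto_kernelize(toy, toy.__getitem__)
print(accepted, rejected)  # ['op_a'] ['op_b', 'op_c']
```

The `accepted` and `rejected` lists are the kind of record the final summarize-and-report step would draw on.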