The main purpose of the TileGym project is to provide high-performance kernels for LLM training and inference. We integrate suitable TileGym kernels into LLM models provided by the Hugging Face Transformers library to validate end-to-end functional correctness and performance improvements. Instead of modifying the library's source code, we take a non-intrusive monkey-patching approach. We replace selected modules, classes, or methods in the library code that implements the target Transformer model. As a result, the model's core components are TileGym implementations when the model is instantiated, and at runtime the model invokes TileGym kernels under the hood. In addition, we follow an auto-research-style agent harness loop to create new cuTile kernels and integrate them into the target model, which improves kernel coverage and end-to-end throughput.
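The monkey-patching idea can be illustrated in plain Python. This is a minimal, self-contained sketch and not TileGym's actual integration code. `modeling_toy`, `ToyRMSNorm`, `ToyModel`, `TileGymRMSNorm`, and `apply_patches` are hypothetical names, and the "kernel" is ordinary Python rather than a cuTile kernel. It shows the two properties the approach depends on: the patch must be applied before the model is instantiated, and the patched model's output should match the unpatched reference.

```python
import math
import types

# Hypothetical stand-in for a library modeling module (e.g. a model's
# modeling_*.py file). A real integration would patch the actual module.
modeling_toy = types.ModuleType("modeling_toy")


class ToyRMSNorm:
    """Reference (eager) RMSNorm that the library would normally instantiate."""

    def __init__(self, hidden_size, eps=1e-6):
        self.weight = [1.0] * hidden_size
        self.eps = eps

    def __call__(self, x):
        # y = x / sqrt(mean(x^2) + eps) * weight
        ms = sum(v * v for v in x) / len(x)
        scale = (ms + self.eps) ** -0.5
        return [v * scale * w for v, w in zip(x, self.weight)]


class ToyModel:
    def __init__(self, hidden_size):
        # The norm class is looked up through the module namespace when the
        # model is constructed, so replacing the module attribute before
        # instantiation changes which implementation the model receives.
        self.norm = modeling_toy.ToyRMSNorm(hidden_size)

    def __call__(self, x):
        return self.norm(x)


modeling_toy.ToyRMSNorm = ToyRMSNorm
modeling_toy.ToyModel = ToyModel


class TileGymRMSNorm(ToyRMSNorm):
    """Hypothetical kernel-backed replacement with the same interface.

    A real version would launch a cuTile kernel here. This one recomputes
    RMSNorm with math.fsum so the check below compares two code paths.
    """

    def __call__(self, x):
        inv_rms = 1.0 / math.sqrt(math.fsum(v * v for v in x) / len(x) + self.eps)
        return [v * inv_rms * w for v, w in zip(x, self.weight)]


def apply_patches(module, replacements):
    """Swap attributes on `module`; return the originals so patches can be undone."""
    originals = {}
    for name, new_obj in replacements.items():
        originals[name] = getattr(module, name)
        setattr(module, name, new_obj)
    return originals


x = [0.5, -1.0, 2.0, 3.0]
ref = modeling_toy.ToyModel(4)(x)  # unpatched reference output

originals = apply_patches(modeling_toy, {"ToyRMSNorm": TileGymRMSNorm})
model = modeling_toy.ToyModel(4)  # instantiated after patching
out = model(x)

# Functional-correctness check: the patched output matches the reference.
assert all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(ref, out))

apply_patches(modeling_toy, originals)  # restore the original classes
```

Returning the original attributes makes the patch reversible. This is convenient when comparing patched and unpatched runs of the same model in one process.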
This is for human readers: simply prompt your favorite AI agent with the skill name and the target model ID. For example:
The agent might ask you several questions. Answer them to clarify your intent, then confirm that it should proceed.
This is for AI agents executing this workflow.
Reusable transformer-local kernels must be described with FlashInfer-Bench-style Definition and Solution metadata. Follow kernel-inventory-schema.md when researching compute requirements, inventorying existing kernels, proposing candidates, or generating new kernels.
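The authoritative field list is in kernel-inventory-schema.md. The sketch below only guesses at a plausible subset, assuming a Definition describes *what* a kernel computes (named axes, input/output tensors, an optional reference implementation) and a Solution is one concrete implementation that points back to a Definition by name. All field names, the `check_inventory` helper, and the example values are hypothetical and chosen for illustration. They are not the FlashInfer-Bench API.

```python
from dataclasses import dataclass, field


@dataclass
class Definition:
    """What a kernel computes, independent of implementation (hypothetical fields)."""
    name: str
    op_type: str                   # coarse operator family, e.g. "rmsnorm"
    axes: dict                     # axis name -> "var" or a fixed int size
    inputs: dict                   # tensor name -> (list of axis names, dtype)
    outputs: dict
    reference: str = ""            # reference implementation source, for correctness checks


@dataclass
class Solution:
    """One concrete implementation of a Definition (hypothetical fields)."""
    name: str
    definition: str                # name of the Definition this implements
    language: str                  # e.g. "cutile"
    entry_point: str               # where the runnable kernel lives (format assumed)
    sources: dict = field(default_factory=dict)  # path -> source text


def check_inventory(definitions, solutions):
    """Return problems: tensors using undeclared axes, Solutions with unknown Definitions."""
    problems = []
    by_name = {d.name: d for d in definitions}
    for d in definitions:
        for tensor, (shape, _dtype) in {**d.inputs, **d.outputs}.items():
            for axis in shape:
                if axis not in d.axes:
                    problems.append(f"{d.name}: {tensor} uses undeclared axis {axis!r}")
    for s in solutions:
        if s.definition not in by_name:
            problems.append(f"{s.name}: unknown definition {s.definition!r}")
    return problems


# Illustrative inventory entries (values are made up).
rmsnorm = Definition(
    name="rmsnorm_h4096",
    op_type="rmsnorm",
    axes={"batch_size": "var", "hidden_size": 4096},
    inputs={"x": (["batch_size", "hidden_size"], "bfloat16"),
            "weight": (["hidden_size"], "bfloat16")},
    outputs={"y": (["batch_size", "hidden_size"], "bfloat16")},
)
sol = Solution(name="rmsnorm_h4096_cutile_v1", definition="rmsnorm_h4096",
               language="cutile", entry_point="kernel.py::run")
orphan = Solution(name="orphan", definition="missing_def",
                  language="cutile", entry_point="kernel.py::run")

problems = check_inventory([rmsnorm], [sol, orphan])
# -> ["orphan: unknown definition 'missing_def'"]
```

Keeping the Definition separate from its Solutions lets several candidate kernels for the same operation be compared against one shared specification and reference.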