nemo-automodel-model-onboarding
Adding Model Support to NeMo AutoModel
Purpose
This skill guides implementation of new model architectures in NeMo AutoModel. Follow the six phases in order.
Instructions
When answering an onboarding question, keep the response in this order:
- Classify the architecture from `config.json`.
- Name the exact implementation files under `components/models/<name>/`.
- Identify registry and optional custom-config updates.
- State the validation tests that must be added before full checkpoint use.
For conceptual onboarding questions, answer from this skill without opening the
pattern files unless the user asks you to edit code. Mention pattern filenames
as references, then give the direct checklist.
Use direct action verbs: classify the model, name the files, map the weights,
register the class, and add tests. Do not discuss distributed strategy,
launcher configuration, or general recipe authoring unless the user explicitly
connects it to onboarding a new architecture.
Examples
Use these compact answer patterns for common questions:
- Dense causal LM: classify as dense only when `architectures` contains a `ForCausalLM` class and expert fields such as `num_local_experts`, `n_routed_experts`, or `num_experts_per_tok` are absent. Create `components/models/<name>/model.py`, `state_dict_adapter.py`, `__init__.py`, and optional `config.py`, register the class in `MODEL_ARCH_MAPPING` in `_transformers/registry.py`, add example YAML, and add tiny-config unit tests plus layer-equivalence tests for rewritten layers.
- MoE state dict: identify expert fields in `config.json`, reference `moe-patterns.md`, map router tensors separately, preserve routed-expert index order, map routed experts, shared experts, and gate/up/down projections, add adapter key-map tests and tiny-config numerical equivalence tests, and do not rely only on `from_pretrained()` or silent tensor reshapes.
- VLM onboarding: classify as VLM only when `vision_config`, `text_config`, and a `ForConditionalGeneration` architecture are present. Reference `vlm-patterns.md` and existing VLM implementations such as `mistral4`, `kimivl`, or `kimi_k25_vl`; check text backbone, vision tower, projector, processor assumptions, text and vision `state_dict_adapter.py` mappings, registry registration, and tiny image-text tests before full checkpoints. Do not treat VLM onboarding as a pure causal-LM path or skip processor/image tests.
For MoE state-dict questions, always include the safety checklist:
- Map router tensors separately from expert tensors.
- Preserve routed-expert index order; never sort, drop, merge, or silently reshape expert weights to make loading pass.
- Map gate, up, and down projections explicitly, including combined projection layouts and shared experts when present.
- Add adapter key-map tests and tiny-config numerical equivalence tests before relying on full checkpoint loading.
For VLM questions, explicitly check `vision_config`, `text_config`, the
conditional-generation architecture, text backbone, vision tower, projector,
processor assumptions, registry entry, and tiny image-text tests.
Routing Boundary
Use this skill only when the user is adding or modifying model architecture support: model files, custom layers, state-dict adapters, Hugging Face config mapping, registry entries, or model capability flags.
Do not use this skill for standalone training recipe YAML questions about optimizers, datasets, schedulers, validation datasets, or trainer wiring unless they are explicitly part of onboarding a new model architecture. Those recipe questions belong to the nemo-automodel-recipe-development skill.
In-scope examples:
- "Add support for a new Hugging Face causal LM architecture."
- "Map MoE router and expert weights from a Hugging Face checkpoint."
- "Register a new model class in NeMo AutoModel."
Out-of-scope examples:
- "Write a finetuning recipe YAML with optimizer and dataset sections."
- "Choose FSDP2, DDP, tensor parallel, or context parallel settings."
- "Configure Slurm, SkyPilot, containers, mounts, or launch dispatch."
Phase 1: Discovery
Before writing code, gather information about the target model.
1.1 Fetch HuggingFace config.json
Download the model's `config.json` from the HuggingFace Hub (or use `AutoConfig.from_pretrained`). Key fields to extract:
- `architectures` -- determines the class name and registration key (e.g., `"LlamaForCausalLM"`, `"Qwen3MoeForCausalLM"`, `"Mistral3ForConditionalGeneration"`)
- `model_type` -- used for custom config registration in `_CUSTOM_CONFIG_REGISTRATIONS` if HF does not have a built-in config class
- `hidden_size`, `intermediate_size`, `num_hidden_layers`, `num_attention_heads`, `num_key_value_heads` -- sizing
- `vocab_size` -- needed for tiny test configs
- `tie_word_embeddings` -- whether lm_head shares weights with embed_tokens
- `hidden_act` -- activation function (e.g., `"silu"` for SwiGLU)
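As a minimal sketch of the extraction step, the snippet below pulls the onboarding-relevant fields out of an already-parsed `config.json`. The helper name `summarize_config` and the sample values are hypothetical and only illustrate the shape of the data; a real config comes from the Hub.

```python
import json

# Field names taken from the list above; a field missing from the config
# is reported as None so gaps are visible instead of raising.
KEY_FIELDS = (
    "architectures", "model_type", "hidden_size", "intermediate_size",
    "num_hidden_layers", "num_attention_heads", "num_key_value_heads",
    "vocab_size", "tie_word_embeddings", "hidden_act",
)


def summarize_config(config: dict) -> dict:
    """Return only the onboarding-relevant fields of a parsed config.json."""
    return {name: config.get(name) for name in KEY_FIELDS}


# Illustrative values only (hypothetical tiny Llama-like config).
sample = json.loads("""
{"architectures": ["LlamaForCausalLM"], "model_type": "llama",
 "hidden_size": 64, "intermediate_size": 128, "num_hidden_layers": 2,
 "num_attention_heads": 4, "num_key_value_heads": 2, "vocab_size": 256,
 "hidden_act": "silu"}
""")
summary = summarize_config(sample)
print(summary["architectures"], summary["tie_word_embeddings"])
```

With `transformers` installed, the dict returned by `AutoConfig.from_pretrained(...).to_dict()` could likely be passed to the same helper, though that path is not exercised here.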
1.2 Determine model type
| Type | Indicators | Pattern file |
|---|---|---|
| Dense LLM | `architectures` contains a `ForCausalLM` class and no expert fields are present | llm-patterns.md |
| MoE LLM | Config contains expert fields such as `num_local_experts`, `n_routed_experts`, or `num_experts_per_tok` | moe-patterns.md |
| VLM | `architectures` contains a `ForConditionalGeneration` class, with `vision_config` and `text_config` present | vlm-patterns.md |
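The classification rules used in this skill can be sketched as a small function over the parsed config. This is an illustration under stated assumptions, not NeMo AutoModel code: the function name `classify_architecture` and the return labels are hypothetical, and the VLM check runs first because a VLM with an MoE text backbone would otherwise be labeled by its expert fields alone.

```python
EXPERT_FIELDS = ("num_local_experts", "n_routed_experts", "num_experts_per_tok")


def classify_architecture(config: dict) -> str:
    """Classify a parsed config.json as 'vlm', 'moe', 'dense', or 'unknown'."""
    archs = config.get("architectures") or []
    is_cond_gen = any(a.endswith("ForConditionalGeneration") for a in archs)
    # VLM: conditional-generation class plus both sub-configs.
    if is_cond_gen and "vision_config" in config and "text_config" in config:
        return "vlm"
    # MoE: any expert field present at the top level of the config.
    if any(field in config for field in EXPERT_FIELDS):
        return "moe"
    # Dense: a causal-LM class with no expert fields.
    if any(a.endswith("ForCausalLM") for a in archs):
        return "dense"
    return "unknown"


print(classify_architecture({"architectures": ["LlamaForCausalLM"]}))
```

An `unknown` result is a signal to inspect the config by hand rather than to pick the nearest pattern file.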
1.3 Check for existing similar architectures
Look in `components/models/` for architectures with similar attention or MLP patterns:
```
components/models/
  llama/          # Standard GQA + SwiGLU (CombinedQKV + CombinedGateUpMLP)
  qwen2/          # Same as Llama but with attention bias + QKV bias
  baichuan/       # ALiBi attention variant
  deepseek_v3/    # MLA attention + MoE (DeepSeek-style grouped experts)
  mistral4/       # MLA + MoE + VLM (Pixtral vision)
  kimivl/         # DeepSeek-V3 backbone + MoonVit vision
  kimi_k25_vl/    # Updated KimiVL with different projector
  qwen3_moe/      # Qwen3 with MoE layers
  nemotron_v3/    # Hybrid mamba-attention
```
1.4 Identify custom components
Check whether the model needs:
- Custom attention: GQA (standard), MLA (DeepSeek/Mistral4), sliding window, bidirectional
- Custom RoPE: Standard (Llama), YaRN scaling, NTK-aware, complex-number (DeepSeek)
- Custom normalization: RMSNorm (standard), LayerNorm, different eps values
- Custom MLP: SwiGLU (standard), GeGLU, ReLU-squared, MoE routing
- Custom config class: Needed only if HF `AutoConfig` cannot parse the model's `config.json` (check the `auto_map` field)
1.5 Note dimensions for test config
For unit tests, create a tiny config. Target: ~1M parameters or less.
```python
from transformers import LlamaConfig

# Example tiny config for a Llama-like model
tiny_config = LlamaConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=2,
    vocab_size=256,
    max_position_embeddings=128,
)
```
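To check a candidate tiny config against the ~1M-parameter target without instantiating a model, a rough count can be done by hand. The sketch below assumes a Llama-like layout (GQA attention, SwiGLU MLP, two RMSNorms per layer, no biases, `head_dim = hidden_size // num_attention_heads`); the helper name `estimate_llama_like_params` is hypothetical, and other architectures will need different terms.

```python
def estimate_llama_like_params(hidden_size, intermediate_size, num_hidden_layers,
                               num_attention_heads, num_key_value_heads,
                               vocab_size, tie_word_embeddings=False):
    """Approximate parameter count for a bias-free Llama-like decoder."""
    head_dim = hidden_size // num_attention_heads
    kv_dim = num_key_value_heads * head_dim
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
    attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # gate_proj, up_proj, and down_proj each hold hidden x intermediate weights.
    mlp = 3 * hidden_size * intermediate_size
    # Two norm weight vectors per decoder layer.
    norms = 2 * hidden_size
    per_layer = attn + mlp + norms
    embed = vocab_size * hidden_size
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    # The trailing hidden_size term is the final norm before lm_head.
    return num_hidden_layers * per_layer + embed + lm_head + hidden_size


total = estimate_llama_like_params(
    hidden_size=64, intermediate_size=128, num_hidden_layers=2,
    num_attention_heads=4, num_key_value_heads=2, vocab_size=256,
)
print(total)  # 106816
```

Under these assumptions the example config lands at roughly 0.1M parameters, well under the target.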
---
Phase 2: Implementation
2.1 Create directory structure
```
components/models/<name>/
  __init__.py
  model.py
  state_dict_adapter.py
  config.py        # Only if HF config is insufficient
  layers.py        # Only for MoE / MLA / other non-standard layers
  rope_utils.py    # Only for custom RoPE
```
2.2 Implementation order
Implement files in dependency order:
- config.py (if needed) -- Custom `PretrainedConfig` subclass
- rope_utils.py (if needed) -- RoPE implementation
- layers.py (if needed) -- Attention, MLP, decoder block classes
- model.py -- The main `ForCausalLM` (or `ForConditionalGeneration`) class
- state_dict_adapter.py -- HF weight conversion
- `__init__.py` -- Re-export the main model class
See the pattern files for detailed implementation guidance:
- Dense LLM: llm-patterns.md
- MoE: moe-patterns.md
- VLM: vlm-patterns.md
2.3 MoE state-dict adapter checklist
For MoE models, do not stop at generic loading. The adapter must explicitly map:
- Router weights, including gate bias or correction-bias tensors when the Hugging Face model has them.
- Expert weights, preserving expert index order across local and routed experts.
- Gate/up/down projections, including combined or split projection layouts.
- Shared experts separately from routed experts when the architecture has both.
Add tests that assert expected key mappings and run numerical equivalence with tiny configs before trying full checkpoints.
Do not use these shortcuts:
- Do not validate the adapter only by calling `from_pretrained()`.
- Do not accept missing or extra expert keys without an explicit mapping reason.
- Do not change dtype, transpose dimensions, or reshape tensors unless the HF and NeMo layouts require it and a test proves the conversion is reversible.
- Do not skip router or shared-expert tests because dense-layer tests pass.
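The checklist above can be illustrated with an explicit key map that fails loudly on anything unmapped. Everything in this sketch is an assumption for illustration: the HF-style key layout (`mlp.gate`, `mlp.experts.<i>`, `mlp.shared_experts`), the target names (`moe.router`, `moe.experts`), and the helper `map_hf_key` are hypothetical and do not describe the actual NeMo AutoModel adapter API.

```python
import re

_PROJ = r"(gate_proj|up_proj|down_proj)"

# Each rule maps one family of keys explicitly: router, routed experts,
# and shared experts are kept separate on purpose.
_RULES = [
    (re.compile(r"^model\.layers\.(\d+)\.mlp\.gate\.weight$"),
     r"layers.\1.moe.router.weight"),
    (re.compile(r"^model\.layers\.(\d+)\.mlp\.experts\.(\d+)\." + _PROJ + r"\.weight$"),
     r"layers.\1.moe.experts.\2.\3.weight"),
    (re.compile(r"^model\.layers\.(\d+)\.mlp\.shared_experts\." + _PROJ + r"\.weight$"),
     r"layers.\1.moe.shared_experts.\2.weight"),
]


def map_hf_key(hf_key: str) -> str:
    """Map one hypothetical HF MoE key; raise instead of silently dropping it."""
    for pattern, template in _RULES:
        match = pattern.match(hf_key)
        if match:
            return match.expand(template)
    raise KeyError(f"no explicit mapping for {hf_key!r}")


def test_expert_order_is_preserved():
    hf_keys = [f"model.layers.0.mlp.experts.{i}.up_proj.weight" for i in range(4)]
    mapped = [map_hf_key(key) for key in hf_keys]
    assert mapped == [f"layers.0.moe.experts.{i}.up_proj.weight" for i in range(4)]


test_expert_order_is_preserved()
```

Raising on unmapped keys is a deliberate choice here: a missing or extra expert key then surfaces in the key-map test rather than during a full checkpoint load.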
2.4 VLM onboarding checklist
For VLMs, confirm the Hugging Face config has `vision_config` and
`text_config` and that `architectures` points to a conditional-generation class. Start from
the closest VLM pattern file, usually vlm-patterns.md, and
compare existing implementations such as `mistral4`, `kimivl`, or
`kimi_k25_vl`.
The implementation should explicitly cover:
- Text backbone, vision tower, projector, and processor or image preprocessing assumptions.
- Weight mapping for both text and vision modules in `state_dict_adapter.py`.
- Registration of the `ForConditionalGeneration` class in `_transformers/registry.py`.
- Tiny tests that exercise image-text inputs and verify the adapter round-trip.
2.5 Register in registry
Add the model to `MODEL_ARCH_MAPPING` in `_transformers/registry.py`:
```python
# In _transformers/registry.py
MODEL_ARCH_MAPPING = OrderedDict([
    # ... existing entries ...
    (
        "NewModelForCausalLM",
        ("nemo_automodel.components.models.new_model.model", "NewModelForCausalLM"),
    ),
])
```
If the model has a custom config class with `auto_map` in its `config.json`, also register in `_CUSTOM_CONFIG_REGISTRATIONS`:
```python
_CUSTOM_CONFIG_REGISTRATIONS: Dict[str, Tuple[str, str]] = {
    # ... existing entries ...
    "new_model": ("nemo_automodel.components.models.new_model.configuration", "NewModelConfig"),
}
```
Phase 3: Onboarding Example Config
This phase is only for adding a minimal example config that proves the newly
onboarded architecture can load and run. Use nemo-automodel-recipe-development for general
recipe authoring or existing recipe modifications.
3.1 Create example YAML config
Create an example config under `examples/llm_finetune/<name>/` (or `examples/vlm_finetune/<name>/`):
```yaml
model:
  _target_: nemo_automodel.NeMoAutoModelForCausalLM.from_pretrained
  pretrained_model_name_or_path: <org>/<model-name>
trainer:
  max_steps: 100
  gradient_clip_val: 1.0
  accumulate_grad_batches: 1
# ... data, optimizer config ...
```
3.2 Verify model loads
Test that the model loads from a HuggingFace checkpoint:
```python
from nemo_automodel import NeMoAutoModelForCausalLM

model = NeMoAutoModelForCausalLM.from_pretrained("<org>/<model-name>")
```
3.3 Test with tiny config first
Before using full-size models, verify with a tiny config (1-2 layers, small hidden dim) to catch shape mismatches early.
Phase 4: Tests
Create `tests/unit_tests/models/<name>/` and cover the checks below before
loading full checkpoints:
- Forward-shape smoke test with a tiny config.
- State-dict adapter round-trip: `from_hf -> to_hf` preserves mapped names, shapes, dtypes, and values.
- Layer equivalence tests for every rewritten attention, MLP, normalization,
RoPE, or MoE layer. Use the model dtype from config, identical seeded weights,
identical inputs, and dtype-appropriate `torch.allclose` tolerances.
- Short functional test that verifies loss decreases over a few training steps.
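The round-trip check listed above can be sketched without any framework by treating a state dict as a plain mapping. The `PrefixAdapter` class below is a hypothetical stand-in that only strips and restores a `model.` prefix, and plain lists stand in for tensors; a real test would also compare shapes and dtypes and use tensor equality.

```python
class PrefixAdapter:
    """Hypothetical adapter: strips a 'model.' prefix on load, restores it on export."""

    PREFIX = "model."

    def from_hf(self, hf_state_dict):
        out = {}
        for key, value in hf_state_dict.items():
            if not key.startswith(self.PREFIX):
                raise KeyError(f"unexpected key {key!r}")
            out[key[len(self.PREFIX):]] = value
        return out

    def to_hf(self, native_state_dict):
        return {self.PREFIX + key: value for key, value in native_state_dict.items()}


def check_round_trip(adapter, hf_state_dict):
    """Report keys that go missing, appear, or change value across from_hf -> to_hf."""
    restored = adapter.to_hf(adapter.from_hf(hf_state_dict))
    return {
        "missing": sorted(set(hf_state_dict) - set(restored)),
        "extra": sorted(set(restored) - set(hf_state_dict)),
        "changed": sorted(
            key for key in hf_state_dict
            if key in restored and restored[key] != hf_state_dict[key]
        ),
    }


hf_sd = {
    "model.embed_tokens.weight": [0.1, 0.2],
    "model.layers.0.mlp.up_proj.weight": [0.3, 0.4],
}
report = check_round_trip(PrefixAdapter(), hf_sd)
print(report)
```

Returning a report instead of a bare boolean makes a failing test name the exact keys that were dropped or altered.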
Phase 5: Documentation
5.1 Update model coverage page
Edit the appropriate file in `docs/model-coverage/`:
- LLM/MoE: `docs/model-coverage/llm/index.md`
- VLM: `docs/model-coverage/vlm/index.md`
Add a row with the model name, supported features (TP, PP, FSDP, LoRA, QLoRA), and any limitations.
Phase 6: Parity Testing
After implementation and unit tests are complete, run the full parity-testing
workflow to verify that the new model produces numerically equivalent results to
the reference HuggingFace implementation.
Run three levels of comparison:
- State-dict round-trip: load a reference HuggingFace checkpoint, convert it into the NeMo AutoModel layout, export it back, and verify that all mapped tensors match the reference names, shapes, dtypes, and values within the expected tolerance.
- Component-level parity: compare rewritten attention, MLP, normalization, RoPE, and MoE components against the HuggingFace implementation with fixed seeds and identical dtype.
- End-to-end forward pass: run the full NeMo AutoModel and HuggingFace model on the same tokenized input and compare logits, hidden states, and loss.
Do not skip this phase. A model that passes unit tests can still diverge from HF
due to subtle weight-conversion bugs, backend differences, or RoPE mismatches
that only surface in a full parity comparison.
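For the end-to-end comparison, it helps to report the worst-case difference alongside a pass/fail flag. The sketch below works on flat lists of floats and applies the same closeness rule that `torch.allclose` documents, `|a - b| <= atol + rtol * |b|`, with the same default tolerances; the helper name `parity_report` is hypothetical, and looser tolerances for low-precision dtypes are a judgment call that should be justified per model.

```python
def parity_report(actual, reference, rtol=1e-5, atol=1e-8):
    """Compare two equal-length float sequences element by element."""
    if len(actual) != len(reference):
        raise ValueError("length mismatch between actual and reference")
    max_abs_diff = 0.0
    all_close = True
    for a, r in zip(actual, reference):
        diff = abs(a - r)
        max_abs_diff = max(max_abs_diff, diff)
        # Tolerance scales with the magnitude of the reference value.
        if diff > atol + rtol * abs(r):
            all_close = False
    return {"max_abs_diff": max_abs_diff, "allclose": all_close}


nemo_logits = [1.0, 2.0, -0.5]
hf_logits = [1.0, 2.0 + 1e-7, -0.5]
result = parity_report(nemo_logits, hf_logits)
print(result["allclose"])  # True
```

Logging `max_abs_diff` even when the check passes gives a baseline for spotting drift after later refactors.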
Key Files Reference
| File | Purpose |
|---|---|
| |
| Exports |
| |
| |
| |
| |
| |
| |
| |
| |
| |
Checklist
- Fetched and analyzed `config.json` from HuggingFace
- Determined model type (dense LLM / MoE / VLM)
- Identified custom components (attention, RoPE, normalization, MLP)
- Created directory `components/models/<name>/`
- Implemented config.py (if custom config needed)
- Implemented layers.py (if custom layers needed)
- Implemented rope_utils.py (if custom RoPE needed)
- Implemented model.py with `HFCheckpointingMixin`
- Implemented state_dict_adapter.py
- Implemented `__init__.py` with re-export
- Registered in `MODEL_ARCH_MAPPING` in `_transformers/registry.py`
- Registered custom config in `_CUSTOM_CONFIG_REGISTRATIONS` (if applicable)
- Created example YAML config
- Verified model loads via `NeMoAutoModelForCausalLM.from_pretrained()`
- Created unit tests (forward shape, state_dict round-trip)
- Created layer equivalence tests for every rewritten layer (matching model dtype)
- Created functional tests (training loss decreases)
- Updated docs/model-coverage page
- Ran state-dict round-trip, component parity, and E2E forward-pass parity checks
- Set `ModelClass = <Name>ForCausalLM` at module bottom