fix-ocaml-gc

Fixing OCaml Garbage Collector Bugs

This skill provides guidance for diagnosing and fixing bugs in the OCaml garbage collector runtime, with emphasis on memory management issues in sweeping, allocation, and free-list handling code.

When to Use This Skill

  • Debugging segfaults or memory corruption in OCaml's C runtime
  • Fixing bugs in the GC's sweeping or allocation logic
  • Working with size-classed memory pools and free block management
  • Investigating pointer arithmetic issues in memory managers
  • Analyzing run-length encoded free lists

Environment Setup

Establish Context First

Before making any changes:
  1. Check repository type: Verify whether the project is a git repository before running git commands
  2. Understand the build system: Read the build documentation (e.g., `HACKING.adoc`, `INSTALL`, `Makefile`) to understand the compilation steps
  3. Verify file paths: Always use absolute paths when reading files to avoid path resolution errors
  4. Identify the OCaml version: GC implementations differ between versions; OCaml 4.x allocates major-heap blocks from a free list, while OCaml 5.x (multicore) allocates small blocks from size-classed pools in `runtime/shared_heap.c`

Build Configuration

When building the OCaml compiler:
  1. Use appropriate timeouts: OCaml bootstrap compilation is lengthy; allow 10+ minutes for a full build
  2. Consider background builds: For long compilations, run the build in the background and monitor its progress
  3. Incremental builds: After the initial bootstrap, run `make` without the `world` target for faster iteration
```bash
# Initial configuration
./configure

# Full build (may take 10+ minutes)
make world

# Incremental rebuild after changes
make
```

Debugging Approach

Locating GC Code

Key files in OCaml's runtime for GC-related issues:
  • `runtime/shared_heap.c` - Shared heap management (OCaml 5.x)
  • `runtime/major_gc.c` - Major GC implementation
  • `runtime/minor_gc.c` - Minor GC implementation
  • `runtime/memory.c` - Memory allocation primitives
  • `runtime/gc_ctrl.c` - GC control and statistics

Understanding Memory Layout

When analyzing GC bugs, understand these concepts:
  1. Block headers: Each OCaml block is preceded by a one-word header holding the block's size in words (`wosize`), two GC color bits, and an 8-bit tag
  2. Size classes: Memory pools organize blocks by size class for efficient allocation; every block in a given pool occupies the same number of words
  3. Free lists: Free blocks may be chained in a linked list or described with run-length encoding
  4. Header repurposing: Free block headers may repurpose fields (e.g., `wosize` to hold run-length counts)
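The header fields listed above can be sketched as bit operations. This is a minimal sketch assuming a 64-bit word with no reserved header bits, laid out as tag in bits 0-7, color in bits 8-9, and `wosize` in bits 10-63. The macro names mirror the runtime's, but these are simplified re-definitions rather than the ones in `runtime/caml/mlvalues.h`, and `make_header` is a hypothetical helper.

```c
#include <stdint.h>

/* Sketch of an OCaml-style header word on a 64-bit machine, assuming no
   reserved header bits: tag in bits 0-7, GC color in bits 8-9, and the
   block's size in words (wosize) in bits 10-63. These macros are simplified
   stand-ins that mirror the runtime's names, not copies of its headers. */
typedef uint64_t header_t;

#define Tag_hd(hd)    ((unsigned)((hd) & 0xFFu))
#define Color_hd(hd)  ((unsigned)(((hd) >> 8) & 0x3u))
#define Wosize_hd(hd) ((uint64_t)((hd) >> 10))
/* Whole size in words: the payload (wosize) plus the header word itself. */
#define Whsize_hd(hd) (Wosize_hd(hd) + 1)

/* Hypothetical helper: pack the three fields into one header word. */
static header_t make_header(uint64_t wosize, unsigned color, unsigned tag)
{
  return ((header_t)wosize << 10)
       | ((header_t)(color & 0x3u) << 8)
       | (header_t)(tag & 0xFFu);
}
```

For example, `make_header(3, 0, 0)` describes a block with three payload words, so `Whsize_hd` yields 4. If a free block instead stores a run-length count in those `wosize` bits, `Whsize_hd` returns that count plus one, which need not match the block's real size.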

Common Bug Patterns

Pointer Arithmetic Mismatches

A frequent bug pattern occurs when code uses header-derived sizes inappropriately:

```c
// INCORRECT: Using header size for pool blocks
p += Whsize_hd(hd);  // May read a repurposed field

// CORRECT: Using known block size for size-classed pools
p += wh;  // Use the fixed size-class width
```

Root cause: In size-classed pools, all blocks have a fixed size `wh` determined by the size class. However, free block headers may repurpose the `wosize` field for other purposes (e.g., run-length encoding of contiguous free blocks). `Whsize_hd(hd)` then computes a size from this repurposed value instead of the actual block size, so `p` can land in the middle of a block and later iterations read payload words as if they were headers.
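The mismatch can be reproduced on a toy pool. This sketch assumes a hypothetical free-block encoding that is not the runtime's: tag `0xFF` marks a free block, and its `wosize` field holds the length of the free run. The pool is a plain array of words, the helper names are invented for illustration, and indices are used instead of pointers so that the buggy walk never forms an out-of-range pointer.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t header_t;

#define Wosize_hd(hd) ((uint64_t)((hd) >> 10))
#define Whsize_hd(hd) (Wosize_hd(hd) + 1)

/* Hypothetical free-block encoding for this sketch only: tag 0xFF marks a
   free block, and its wosize field stores the free-run length. */
#define FREE_TAG 0xFFu
#define Is_free_hd(hd) (((hd) & 0xFFu) == FREE_TAG)

static header_t live_header(uint64_t wosize) { return wosize << 10; }
static header_t free_run_header(uint64_t run) { return (run << 10) | FREE_TAG; }

/* CORRECT: every block in the pool spans exactly `wh` words, so step by wh.
   Trailing words that do not form a whole block are ignored. */
static size_t count_live_fixed(const header_t *pool, size_t n, size_t wh)
{
  size_t live = 0;
  for (size_t i = 0; i + wh <= n; i += wh)
    if (!Is_free_hd(pool[i]))
      live++;
  return live;
}

/* INCORRECT: step by the header-derived size. For a free block this uses the
   run length, so the walk lands on payload words and treats them as headers. */
static size_t count_live_buggy(const header_t *pool, size_t n)
{
  size_t live = 0;
  for (size_t i = 0; i < n; i += Whsize_hd(pool[i]))
    if (!Is_free_hd(pool[i]))
      live++;
  return live;
}
```

With `wh = 4` and a zeroed 16-word pool whose headers are live, free (run 2), free (run 1), live, `count_live_fixed` reports 2 live blocks while `count_live_buggy` reports 5, because it misreads three zero payload words as live-block headers.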

Symptoms of Pointer Arithmetic Bugs

  • Segfaults during sweeping or compaction
  • Memory corruption that appears intermittent
  • Crashes only occurring with certain heap sizes or allocation patterns

Systematic Code Analysis

When investigating a bug:
  1. Trace the iteration: Follow pointer advancement through loops
  2. Identify size sources: Determine where block sizes come from (header vs. pool metadata)
  3. Check free block handling: Pay special attention to how free blocks differ from allocated blocks
  4. Verify invariants: Ensure pointer stays within valid memory regions
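The analysis steps above can be combined into a small checker that walks a pool with a chosen stride rule and reports the first violated invariant. This is a sketch on a toy pool, assuming a hypothetical encoding in which free blocks carry tag `0xFF` and a run length in `wosize`; `check_walk`, the stride functions, and the status names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t header_t;

#define Wosize_hd(hd) ((uint64_t)((hd) >> 10))
#define Whsize_hd(hd) (Wosize_hd(hd) + 1)

/* Result of walking a pool; hypothetical names for this sketch. */
enum walk_status { WALK_OK, WALK_MISALIGNED, WALK_OVERRUN, WALK_STUCK };

/* A stride rule decides how far to advance from the header at index i. */
typedef size_t (*stride_fn)(const header_t *pool, size_t i, size_t wh);

/* Size-class stride: every block spans wh words. */
static size_t stride_fixed(const header_t *pool, size_t i, size_t wh)
{
  (void)pool;
  (void)i;
  return wh;
}

/* Header stride: trusts wosize, which free blocks may have repurposed. */
static size_t stride_header(const header_t *pool, size_t i, size_t wh)
{
  (void)wh;
  return (size_t)Whsize_hd(pool[i]);
}

/* Walk the whole blocks of an n-word pool (wh must be nonzero) and check:
   every visited index is a block boundary, every step makes progress,
   and the walk ends exactly at the end of the last whole block. */
static enum walk_status check_walk(const header_t *pool, size_t n, size_t wh,
                                   stride_fn stride)
{
  size_t usable = n - n % wh;
  size_t i = 0;
  while (i < usable) {
    if (i % wh != 0)
      return WALK_MISALIGNED;
    size_t step = stride(pool, i, wh);
    if (step == 0)
      return WALK_STUCK;
    i += step;
  }
  return i == usable ? WALK_OK : WALK_OVERRUN;
}
```

Running the same checker with both stride rules over a pool captured from a failing test is one way to tell whether a crash comes from header-derived pointer advancement.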

Verification Strategies

Testing the Fix

  1. Compilation test: Ensure the runtime compiles without errors
  2. Basic testsuite: Run the basic test suite to catch regressions

```bash
# Run only the tests in tests/basic (use quotes for DIR variable)
make -C testsuite one DIR='tests/basic'
```

  3. Full testsuite: For comprehensive verification, run the complete test suite (`make -C testsuite all`)

Shell Command Pitfalls

When running tests via Makefiles:
  • Quote variable assignments: `DIR='tests/basic'` and `DIR=tests/basic` are equivalent for a simple path, but quoting protects values that contain spaces or shell metacharacters
  • Watch for escaping: Makefile variables may need different quoting than direct shell commands

Search for Similar Bugs

After fixing a bug, search for similar patterns:
```bash
# Search for similar pointer arithmetic patterns
grep -n "Whsize_hd" runtime/*.c
grep -n "+= wh" runtime/*.c
```

Common Pitfalls

  1. Timeout too short: OCaml compilation needs extended timeouts (10+ minutes for full build)
  2. Relative paths: Always use absolute paths when reading files
  3. Git assumptions: Check if directory is a git repository before using git commands
  4. Incomplete verification: After fixing one instance, search for similar patterns elsewhere
  5. Shell quoting: Makefile variable assignments require careful quoting
  6. Header semantics: Remember that header fields may have different meanings for free vs. allocated blocks

Minimal Fix Principle

When fixing GC bugs:
  1. Understand the root cause: Ensure full understanding before changing code
  2. Make minimal changes: Change only what's necessary to fix the bug
  3. Preserve existing behavior: Avoid refactoring or "improving" surrounding code
  4. Document the fix: Ensure the change is self-explanatory or add a comment if needed

Verification Checklist

Before considering a fix complete:
  • Code compiles without errors or warnings
  • Basic testsuite passes
  • Searched for similar patterns in the codebase
  • Verified the fix addresses the root cause, not just symptoms
  • Considered edge cases (empty pools, boundary conditions, different size classes)
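The edge cases in the last item can be exercised with a small table-driven harness. This sketch assumes a hypothetical run-length encoding (tag `0xFF` marks a free block, and its `wosize` holds the number of free blocks remaining in its run) and a toy pool of at most 64 words; `build_pool`, `count_live_fixed`, and `run_edge_cases` are illustrative names, not runtime functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t header_t;

#define FREE_TAG 0xFFu
#define Is_free_hd(hd) (((hd) & 0xFFu) == FREE_TAG)
#define MAX_WORDS 64

/* Lay out an n-word pool of wh-word blocks (wh >= 1, n <= MAX_WORDS).
   Block k is free when bit k of free_mask is set; each free block's wosize
   holds the length of the free run starting at that block. Trailing words
   that do not form a whole block stay zero. Returns the number of live blocks. */
static size_t build_pool(header_t *pool, size_t n, size_t wh, uint64_t free_mask)
{
  size_t nblocks = n / wh, live = 0;
  memset(pool, 0, n * sizeof *pool);
  for (size_t k = 0; k < nblocks; k++) {
    if ((free_mask >> k) & 1u) {
      uint64_t run = 0;
      while (k + run < nblocks && ((free_mask >> (k + run)) & 1u))
        run++;
      pool[k * wh] = (run << 10) | FREE_TAG;
    } else {
      pool[k * wh] = (uint64_t)(wh - 1) << 10; /* live block, tag 0 */
      live++;
    }
  }
  return live;
}

/* Size-class walk: step by wh, never by a header-derived size. */
static size_t count_live_fixed(const header_t *pool, size_t n, size_t wh)
{
  size_t live = 0;
  for (size_t i = 0; i + wh <= n; i += wh)
    if (!Is_free_hd(pool[i]))
      live++;
  return live;
}

/* Cross several size classes with empty, tiny, partial, and full pools and
   with all-live, all-free, alternating, and clustered free patterns.
   Returns the number of combinations where the walk disagrees. */
static int run_edge_cases(void)
{
  static const size_t whs[] = {2, 3, 4, 8};
  static const size_t sizes[] = {0, 1, 7, 16, 63, 64};
  static const uint64_t masks[] = {0, ~(uint64_t)0,
                                   UINT64_C(0x5555555555555555),
                                   UINT64_C(0x0F0F0F0F0F0F0F0F)};
  header_t pool[MAX_WORDS];
  int failures = 0;

  for (size_t w = 0; w < sizeof whs / sizeof whs[0]; w++)
    for (size_t s = 0; s < sizeof sizes / sizeof sizes[0]; s++)
      for (size_t m = 0; m < sizeof masks / sizeof masks[0]; m++) {
        size_t expected = build_pool(pool, sizes[s], whs[w], masks[m]);
        if (count_live_fixed(pool, sizes[s], whs[w]) != expected)
          failures++;
      }
  return failures;
}
```

Swapping in a walk that advances by the header-derived size makes some combinations disagree (for instance the all-free pattern with `wh = 4`), which is the kind of case this checklist item is meant to catch.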