fix-ocaml-gc

Original🇺🇸 English
Translated

Guide for debugging and fixing bugs in the OCaml garbage collector, particularly memory management issues in the runtime's sweeping and allocation code. This skill applies when working on OCaml runtime C code, investigating segfaults in GC operations, or fixing pointer arithmetic bugs in memory managers with size-classed pools and run-length encoding.

1installs
Added on

NPX Install

npx skill4agent add letta-ai/skills fix-ocaml-gc

Fixing OCaml Garbage Collector Bugs

This skill provides guidance for diagnosing and fixing bugs in the OCaml garbage collector runtime, with emphasis on memory management issues in sweeping, allocation, and free-list handling code.

When to Use This Skill

  • Debugging segfaults or memory corruption in OCaml's C runtime
  • Fixing bugs in the GC's sweeping or allocation logic
  • Working with size-classed memory pools and free block management
  • Investigating pointer arithmetic issues in memory managers
  • Analyzing run-length encoded free lists

Environment Setup

Establish Context First

Before making any changes:
  1. Check repository type: Verify if the project is a git repository before running git commands
  2. Understand the build system: Read build documentation (e.g.,
    HACKING.adoc
    ,
    INSTALL
    ,
    Makefile
    ) to understand compilation steps
  3. Verify file paths: Always use absolute paths when reading files to avoid path resolution errors
  4. Identify OCaml version: Different OCaml versions have different GC implementations (especially OCaml 5.x with multicore support)

Build Configuration

When building the OCaml compiler:
  1. Use appropriate timeouts: OCaml bootstrap compilation is lengthy; use timeouts of 10+ minutes for full builds
  2. Consider background builds: For long compilations, run in background and monitor progress
  3. Incremental builds: After initial bootstrap, use
    make
    without
    world
    target for faster iteration
bash
# Initial configuration
./configure

# Full build (may take 10+ minutes)
make world

# Incremental rebuild after changes
make

Debugging Approach

Locating GC Code

Key files in OCaml's runtime for GC-related issues:
  • runtime/shared_heap.c
    - Shared heap management (OCaml 5.x)
  • runtime/major_gc.c
    - Major GC implementation
  • runtime/minor_gc.c
    - Minor GC implementation
  • runtime/memory.c
    - Memory allocation primitives
  • runtime/gc_ctrl.c
    - GC control and statistics

Understanding Memory Layout

When analyzing GC bugs, understand these concepts:
  1. Block headers: OCaml blocks have headers containing size (
    wosize
    ) and tag information
  2. Size classes: Memory pools organize blocks by size class for efficient allocation
  3. Free lists: Free blocks may be linked or use run-length encoding
  4. Header repurposing: Free block headers may repurpose fields (e.g.,
    wosize
    for run-length counts)

Common Bug Patterns

Pointer Arithmetic Mismatches

A frequent bug pattern occurs when code uses header-derived sizes inappropriately:
c
// INCORRECT: Using header size for pool blocks
p += Whsize_hd(hd);  // May read repurposed field

// CORRECT: Using known block size for size-classed pools
p += wh;  // Use the fixed size class width
Root cause: In size-classed pools, all blocks have a fixed size
wh
determined by the size class. However, free block headers may repurpose the
wosize
field for other purposes (e.g., run-length encoding of contiguous free blocks). Using
Whsize_hd(hd)
reads this repurposed value instead of the actual block size.

Symptoms of Pointer Arithmetic Bugs

  • Segfaults during sweeping or compaction
  • Memory corruption that appears intermittent
  • Crashes only occurring with certain heap sizes or allocation patterns

Systematic Code Analysis

When investigating a bug:
  1. Trace the iteration: Follow pointer advancement through loops
  2. Identify size sources: Determine where block sizes come from (header vs. pool metadata)
  3. Check free block handling: Special attention to how free blocks differ from allocated blocks
  4. Verify invariants: Ensure pointer stays within valid memory regions

Verification Strategies

Testing the Fix

  1. Compilation test: Ensure the runtime compiles without errors
  2. Basic testsuite: Run the basic test suite to catch regressions
bash
# Run basic tests (use quotes for DIR variable)
make -C testsuite DIR='tests/basic' all
  1. Full testsuite: For comprehensive verification, run the complete test suite

Shell Command Pitfalls

When running tests via Makefiles:
  • Quote variable assignments: Use
    DIR='tests/basic'
    instead of
    DIR=tests/basic
    to avoid shell interpretation issues
  • Watch for escaping: Makefile variables may need different quoting than direct shell commands

Search for Similar Bugs

After fixing a bug, search for similar patterns:
bash
# Search for similar pointer arithmetic patterns
grep -n "Whsize_hd" runtime/*.c
grep -n "+= wh" runtime/*.c

Common Pitfalls

  1. Timeout too short: OCaml compilation needs extended timeouts (10+ minutes for full build)
  2. Relative paths: Always use absolute paths when reading files
  3. Git assumptions: Check if directory is a git repository before using git commands
  4. Incomplete verification: After fixing one instance, search for similar patterns elsewhere
  5. Shell quoting: Makefile variable assignments require careful quoting
  6. Header semantics: Remember that header fields may have different meanings for free vs. allocated blocks

Minimal Fix Principle

When fixing GC bugs:
  1. Understand the root cause: Ensure full understanding before changing code
  2. Make minimal changes: Change only what's necessary to fix the bug
  3. Preserve existing behavior: Avoid refactoring or "improving" surrounding code
  4. Document the fix: Ensure the change is self-explanatory or add a comment if needed

Verification Checklist

Before considering a fix complete:
  • Code compiles without errors or warnings
  • Basic testsuite passes
  • Searched for similar patterns in the codebase
  • Verified the fix addresses the root cause, not just symptoms
  • Considered edge cases (empty pools, boundary conditions, different size classes)