Analyzing Golang Malware with Ghidra

Overview

Go (Golang) has become a popular language for malware authors due to its cross-compilation capabilities, static linking that produces self-contained binaries, and the complexity it introduces for reverse engineering. Go binaries contain the entire runtime, standard library, and all dependencies statically linked, resulting in large binaries (often 5-15MB) with thousands of functions. Ghidra struggles with Go-specific string formats (non-null-terminated), stripped function names, and goroutine concurrency patterns. Specialized tools like GoResolver (Volexity, 2025) use control-flow graph similarity to automatically deobfuscate and recover function names in stripped or obfuscated Go binaries.

When to Use

  • When investigating security incidents that require analyzing Go malware with Ghidra
  • When building detection rules or threat hunting queries for Go-based malware
  • When SOC analysts need a structured procedure for triaging and reversing Go binaries
  • When validating security monitoring coverage for techniques used by Go-based malware

Prerequisites

  • Ghidra 11.0+ with JDK 17+
  • GoResolver plugin (for function name recovery)
  • Go Reverse Engineering Tool Kit (go-re.tk)
  • Python 3.9+ for helper scripts
  • Understanding of Go runtime internals (goroutines, channels, interfaces)
  • Familiarity with Go binary structure (pclntab, moduledata, itab)

Key Concepts

Go Binary Structure

Go binaries embed rich metadata in the pclntab (PC Line Table) structure, which maps program counters to function names, source files, and line numbers. Even stripped binaries retain this metadata. The moduledata structure contains pointers to type information, itabs (interface tables), and the pclntab itself. Go strings are stored as a pointer-length pair rather than null-terminated C strings.
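
To make the pointer-length layout concrete, here is a minimal sketch that decodes one string header from a raw byte image. It assumes a little-endian target, and the function name `read_go_string`, the `base` load address, and the toy image are all hypothetical, made up for illustration only.

```python
import struct


def read_go_string(image: bytes, header_off: int, base: int = 0, ptr_size: int = 8) -> str:
    """Decode a Go string header (data pointer + length) located at header_off.

    `base` is the (hypothetical) virtual address at which `image` is mapped.
    """
    fmt = "<QQ" if ptr_size == 8 else "<II"
    ptr, length = struct.unpack_from(fmt, image, header_off)
    start = ptr - base
    if start < 0 or start + length > len(image):
        raise ValueError("string header points outside the image")
    return image[start:start + length].decode("utf-8", errors="replace")


# Toy image: two strings stored back to back with no NUL between them,
# followed by one 16-byte header that refers to the second string.
blob = b"cmd.exe" + b"/bin/sh"
header = struct.pack("<QQ", 0x1000 + 7, 7)
image = blob + header

print(read_go_string(image, header_off=len(blob), base=0x1000))  # /bin/sh
```

A C-style string scan over the same bytes would report the single run `cmd.exe/bin/sh`, which is why the length field has to be honored when defining strings in a disassembler.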

Function Recovery in Stripped Binaries

Despite stripping symbol tables, Go binaries retain function names within the pclntab. However, obfuscation tools like garble rename functions to random strings. GoResolver addresses this by computing control-flow graph signatures of obfuscated functions and matching them against a database of known Go standard library and third-party package functions.
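
Before trusting a pclntab candidate, it can help to sanity-check the first header fields. The sketch below is a hedged illustration: it assumes a little-endian binary and reads only the leading fields (magic, two pad bytes, instruction quantum, pointer size, function count). The name `parse_pclntab_header` and the synthetic sample are hypothetical.

```python
import struct

# Magic values as defined in the Go runtime (stored little-endian here).
PCLNTAB_MAGICS = {
    0xFFFFFFFB: "Go 1.2-1.15",
    0xFFFFFFFA: "Go 1.16-1.17",
    0xFFFFFFF0: "Go 1.18-1.19",
    0xFFFFFFF1: "Go 1.20+",
}


def parse_pclntab_header(data: bytes, offset: int):
    """Return basic header fields if `offset` looks like a pclntab, else None."""
    if offset < 0 or offset + 8 > len(data):
        return None
    magic, pad1, pad2, min_lc, ptr_size = struct.unpack_from("<IBBBB", data, offset)
    if magic not in PCLNTAB_MAGICS or pad1 != 0 or pad2 != 0:
        return None
    # Instruction quantum is 1, 2, or 4; pointer size is 4 or 8.
    if min_lc not in (1, 2, 4) or ptr_size not in (4, 8):
        return None
    if offset + 8 + ptr_size > len(data):
        return None
    nfunc = int.from_bytes(data[offset + 8:offset + 8 + ptr_size], "little")
    return {
        "version": PCLNTAB_MAGICS[magic],
        "min_lc": min_lc,
        "ptr_size": ptr_size,
        "nfunc": nfunc,
    }


# Synthetic header for illustration: Go 1.20+ magic, amd64-like fields, 1234 functions.
sample = struct.pack("<IBBBBQ", 0xFFFFFFF1, 0, 0, 1, 8, 1234)
print(parse_pclntab_header(sample, 0))
```

Checking the quantum and pointer-size bytes filters out most accidental matches of the 4-byte magic elsewhere in the file.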

Module/Dependency Extraction

Go's dependency management embeds module paths and version strings in the binary. Extracting these reveals the malware's third-party dependencies (HTTP libraries, encryption packages, C2 frameworks), which provides insight into capabilities without full reverse engineering.
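
As an illustration, the embedded module information is a series of tab-separated `path`, `mod`, and `dep` lines (the same fields that `go version -m` prints). The sketch below parses such text once it has been extracted. The function name `parse_modinfo` and the module names in the sample are hypothetical placeholders, not taken from any real sample.

```python
def parse_modinfo(text: str) -> dict:
    """Parse tab-separated path/mod/dep lines into a small dictionary."""
    info = {"path": None, "mod": None, "deps": []}
    for line in text.splitlines():
        fields = line.strip().split("\t")
        kind = fields[0]
        if kind == "path" and len(fields) >= 2:
            info["path"] = fields[1]
        elif kind in ("mod", "dep") and len(fields) >= 3:
            entry = {"module": fields[1], "version": fields[2]}
            if kind == "mod":
                info["mod"] = entry
            else:
                info["deps"].append(entry)
    return info


# Hypothetical module information, for illustration only.
sample_modinfo = (
    "path\texample.com/hypothetical/agent\n"
    "mod\texample.com/hypothetical/agent\t(devel)\n"
    "dep\texample.com/hypothetical/wsclient\tv1.2.3\n"
    "dep\texample.com/hypothetical/cryptoutil\tv0.4.0\n"
)

result = parse_modinfo(sample_modinfo)
for dep in result["deps"]:
    print(dep["module"], dep["version"])
```

Listing dependencies with their versions this way gives a quick capability overview and a pivot for comparing related samples.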

Workflow

Step 1: Initial Binary Analysis

python
#!/usr/bin/env python3
"""Analyze Go binary metadata for malware analysis."""
import re
import sys


def find_go_build_info(data):
    """Extract Go build information from binary."""
    # Go buildinfo magic: \xff Go buildinf:
    magic = b'\xff Go buildinf:'
    offset = data.find(magic)
    if offset == -1:
        return None

    print(f"[+] Go build info at offset 0x{offset:x}")

    # Extract Go version string nearby
    go_version = re.search(rb'go\d+\.\d+(?:\.\d+)?', data[offset:offset+256])
    if go_version:
        print(f"  Go Version: {go_version.group().decode()}")

    return offset


def find_pclntab(data):
    """Locate the pclntab (PC Line Table) structure."""
    # pclntab magic values vary by Go version (little-endian encoding,
    # followed by two zero pad bytes)
    magics = {
        b'\xfb\xff\xff\xff\x00\x00': "Go 1.2-1.15",
        b'\xfa\xff\xff\xff\x00\x00': "Go 1.16-1.17",
        b'\xf0\xff\xff\xff\x00\x00': "Go 1.18-1.19",
        b'\xf1\xff\xff\xff\x00\x00': "Go 1.20+",
    }

    for magic, version in magics.items():
        offset = data.find(magic)
        if offset != -1:
            print(f"[+] pclntab found at 0x{offset:x} ({version})")
            return offset, version

    return None, None


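def extract_function_names(data, pclntab_offset):
    """Heuristically recover function names by pattern-matching the binary.

    This does not parse the pclntab function table; it only requires that a
    pclntab was found, then scans for strings shaped like Go symbol names.
    """
    if pclntab_offset is None:
        return []

    functions = []
    # Go symbol names start with a package path (e.g. main.run, net/http.Get)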
    func_pattern = re.compile(
        rb'(?:main|runtime|fmt|net|os|crypto|encoding|io|sync|'
        rb'syscall|reflect|strings|bytes|path|time|math|sort|'
        rb'github\.com|golang\.org)[/\.][\w/.]+',
    )

    for match in func_pattern.finditer(data):
        name = match.group().decode('utf-8', errors='replace')
        if len(name) > 4 and len(name) < 200:
            functions.append(name)

    return sorted(set(functions))


# Keywords that often indicate malware functionality
INTERESTING_KEYWORDS = [
    'http', 'https', 'tcp', 'udp', 'dns',
    'cmd', 'shell', 'exec', 'upload', 'download',
    'encrypt', 'decrypt', 'key', 'token', 'password',
    'c2', 'beacon', 'agent', 'implant', 'bot',
    'mutex', 'persist', 'registry', 'scheduled',
]


def extract_go_strings(data):
    """Extract printable ASCII runs that contain suspicious keywords.

    Go strings are not null-terminated and are packed back to back, so a
    single run may contain several adjacent strings.
    """
    ascii_pattern = re.compile(rb'[\x20-\x7e]{10,}')
    strings = []

    for match in ascii_pattern.finditer(data):
        s = match.group().decode('ascii')
        lowered = s.lower()
        if any(kw in lowered for kw in INTERESTING_KEYWORDS):
            strings.append(s)

    return strings


def extract_dependencies(data):
    """Extract Go module dependencies from binary."""
    deps = []
    # Module paths follow pattern: github.com/user/repo
    dep_pattern = re.compile(
        rb'((?:github\.com|gitlab\.com|golang\.org|gopkg\.in|'
        rb'go\.etcd\.io|google\.golang\.org)/[^\x00\s]{5,80})'
    )

    for match in dep_pattern.finditer(data):
        dep = match.group().decode('utf-8', errors='replace')
        deps.append(dep)

    unique_deps = sorted(set(deps))
    return unique_deps


def analyze_go_binary(filepath):
    """Full analysis of Go malware binary."""
    with open(filepath, 'rb') as f:
        data = f.read()

    print(f"[+] Analyzing Go binary: {filepath}")
    print(f"  File size: {len(data):,} bytes")
    print("=" * 60)

    # Build info
    find_go_build_info(data)

    # pclntab
    pclntab_offset, go_version = find_pclntab(data)

    # Functions
    functions = extract_function_names(data, pclntab_offset)
    print(f"\n[+] Recovered {len(functions)} function names")

    # Categorize functions
    categories = {
        "network": [], "crypto": [], "os_exec": [],
        "file_io": [], "main": [], "third_party": [],
    }
    for f in functions:
        if 'net/' in f or 'http' in f.lower():
            categories["network"].append(f)
        elif 'crypto' in f:
            categories["crypto"].append(f)
        elif 'os/exec' in f or 'syscall' in f:
            categories["os_exec"].append(f)
        elif 'os.' in f or 'io/' in f:
            categories["file_io"].append(f)
        elif f.startswith('main.'):
            categories["main"].append(f)
        elif 'github.com' in f or 'golang.org' in f:
            categories["third_party"].append(f)

    for cat, funcs in categories.items():
        if funcs:
            print(f"\n  [{cat}] ({len(funcs)} functions):")
            for fn in funcs[:10]:
                print(f"    {fn}")

    # Dependencies
    deps = extract_dependencies(data)
    print(f"\n[+] Dependencies ({len(deps)}):")
    for dep in deps[:20]:
        print(f"    {dep}")

    # Suspicious strings
    sus_strings = extract_go_strings(data)
    print(f"\n[+] Suspicious strings ({len(sus_strings)}):")
    for s in sus_strings[:20]:
        print(f"    {s}")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <go_binary>")
        sys.exit(1)
    analyze_go_binary(sys.argv[1])

Step 2: Ghidra Analysis Script

python
# Ghidra script (run within Ghidra's Script Manager)
# Save as AnalyzeGoBinary.py in the Ghidra scripts directory
# Analyze Go binary structure and recover metadata
#@category MalwareAnalysis

from ghidra.program.model.symbol import SourceType


def analyze_go_binary_ghidra():
    """Ghidra script for Go binary analysis."""
    program = getCurrentProgram()

    print("[+] Go Binary Analysis Script")
    print("  Program: {}".format(program.getName()))

    # Find pclntab: little-endian magic followed by two zero pad bytes.
    # Patterns use the escaped byte-string form accepted by the script
    # API's findBytes(Address, String).
    pclntab_magics = [
        ("\\xf1\\xff\\xff\\xff\\x00\\x00", "Go 1.20+"),
        ("\\xf0\\xff\\xff\\xff\\x00\\x00", "Go 1.18-1.19"),
        ("\\xfa\\xff\\xff\\xff\\x00\\x00", "Go 1.16-1.17"),
        ("\\xfb\\xff\\xff\\xff\\x00\\x00", "Go 1.2-1.15"),
    ]

    for pattern, version in pclntab_magics:
        addr = findBytes(program.getMinAddress(), pattern)
        if addr is not None:
            print("[+] pclntab found at {} ({})".format(addr, version))
            # Create a label at the candidate pclntab address
            program.getSymbolTable().createLabel(
                addr, "go_pclntab", None, SourceType.ANALYSIS
            )
            break

    # Go strings are ptr+len, not null-terminated. This script does not
    # redefine string data; it only labels the pclntab and counts symbols.

    # Count symbols whose names contain common Go package prefixes
    symbol_table = program.getSymbolTable()
    func_count = 0
    for symbol in symbol_table.getAllSymbols(True):
        name = symbol.getName()
        if any(pkg in name for pkg in
               ['main.', 'runtime.', 'net.', 'crypto.', 'os.']):
            func_count += 1

    print("[+] Found {} Go function symbols".format(func_count))


# Execute
analyze_go_binary_ghidra()

Validation Criteria

  • Go version and build information extracted from binary
  • pclntab located and parsed for function name recovery
  • Third-party dependencies identified revealing malware capabilities
  • Main package functions enumerated for targeted analysis
  • Network, crypto, and OS exec functions categorized
  • Ghidra analysis correctly labels Go runtime structures
