dotnet-trace-collect
.NET Trace Collect
This skill helps developers diagnose production performance issues by recommending the right diagnostic tools for their environment, guiding data collection, and suggesting analysis approaches. It does not analyze code for anti-patterns or perform the analysis itself.
When to Use
- A developer needs to investigate a production performance issue (high CPU, memory leak, slow requests, excessive GC, networking errors, etc.)
- Choosing the right diagnostic tool for a specific runtime, OS, or deployment topology
- Setting up and running diagnostic tool commands for data collection
- Understanding trade-offs between available tools (e.g. PerfView vs dotnet-trace)
- Collecting diagnostics from containerized or Kubernetes workloads
When Not to Use
- Reviewing source code for performance anti-patterns (use a code review skill instead)
- Benchmarking during development (e.g. BenchmarkDotNet setup)
- Analyzing collected trace or dump files (this skill recommends tools for analysis, but does not perform it)
Inputs
| Input | Required | Description |
|---|---|---|
| Symptom | Yes | What the developer is observing (high CPU, memory growth, slow requests, hangs, excessive GC, HTTP 5xx errors, networking timeouts, connection failures, assembly loading failures, etc.) |
| Runtime | Yes | .NET Framework or modern .NET (and version, especially whether .NET 10+) |
| OS | Yes | Windows or Linux |
| Deployment | Yes | Non-container, container, or Kubernetes |
| Admin privileges | Recommended | Whether the developer has admin/root access on the target machine |
| Repro characteristics | Recommended | Whether the issue is easy to reproduce or requires a long time to manifest |
Workflow
Step 1: Understand the environment
Determine or ask the developer to clarify:
- Symptom: What they are observing (high CPU, memory leak, slow requests, hangs, excessive GC, HTTP 5xx errors, networking timeouts, connection failures, assembly loading failures, etc.)
- Runtime: .NET Framework or modern .NET? If modern .NET, which version? (Especially whether .NET 10 or later.)
- OS: Windows or Linux?
- Deployment: Running directly on the host, in a container, or in Kubernetes?
- Admin privileges: Do they have admin/root access on the target machine or container?
- Repro characteristics: Does the issue reproduce quickly, or does it take a long time to manifest?
- Workload context: Determine or ask the user if you are running in the context of the workload (i.e., on the same machine or connected to the same environment where the issue is occurring). If so, you can run diagnostic commands directly on their behalf. If not, provide the commands as guidance for the user to run themselves.
Use this information to select the right tool in Step 2.
Step 2: Recommend diagnostic tools
Select tools based on the environment using the priority rules below. Once a tool is selected, load the corresponding reference file for detailed command-line usage.
Tool reference lookup
| Environment | Reference file(s) |
|---|---|
| Windows + modern .NET + admin | |
| Windows + modern .NET, no admin | |
| Windows + .NET Framework | |
| Linux + .NET 10+ + root | |
| Linux + pre-.NET 10 | |
| Linux + native stacks needed | |
| Container/K8s (console access) | |
| Container/K8s (no console) | |
Quick decision matrix (first-pass triage)
| Environment | Preferred tool | Fallback / Notes |
|---|---|---|
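| Windows + modern .NET + admin | PerfView | If admin is unavailable, use `dotnet-trace` |
| Windows + .NET Framework + admin | PerfView | Without admin, there is no trace fallback; for hangs/memory leaks, provide dump commands directly (e.g., `procdump -ma <PID>`) |
| Linux + .NET 10+ + root | `dotnet-trace collect-linux` | Use `dotnet-trace` if root or kernel prerequisites are not met |
| Linux + pre-.NET 10 | `dotnet-trace` | Add `perfcollect` when native stacks are needed |
| Linux container/Kubernetes | Console tools if in workload context; `dotnet-monitor` if there is no console access | See Linux Container / Kubernetes section for details |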
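The first-pass triage in the matrix above can be sketched as a small shell function. This is only an illustration: the function name `recommend_tool` and its argument encoding (`windows`/`linux`, `framework`/`net<major>`, `yes`/`no`) are assumptions made for this example, and the kernel prerequisite for `collect-linux` and the container rows are deliberately left out.

```shell
#!/bin/sh
# recommend_tool OS RUNTIME ELEVATED   (hypothetical helper, not part of any tool)
#   OS:       windows | linux
#   RUNTIME:  framework | net<major>   (e.g. net8, net10)
#   ELEVATED: yes | no                 (admin on Windows, root on Linux)
# Prints the first-pass tool choice from the quick decision matrix.
recommend_tool() {
    os=$1
    runtime=$2
    elevated=$3
    case "$os" in
        windows)
            if [ "$elevated" = yes ]; then
                echo "PerfView"
            elif [ "$runtime" = framework ]; then
                # No trace tool is available here; only dumps can be captured.
                echo "dumps only (no trace fallback without admin)"
            else
                echo "dotnet-trace"
            fi
            ;;
        linux)
            if [ "$runtime" = framework ]; then
                echo ".NET Framework does not run on Linux" >&2
                return 1
            fi
            major=${runtime#net}
            if [ "$major" -ge 10 ] && [ "$elevated" = yes ]; then
                echo "dotnet-trace collect-linux"
            else
                echo "dotnet-trace"
            fi
            ;;
        *)
            echo "unknown OS: $os" >&2
            return 1
            ;;
    esac
}
```

For example, `recommend_tool linux net8 yes` prints `dotnet-trace`, since `collect-linux` needs .NET 10 or later regardless of root access.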
Windows (non-container, modern .NET)
- PerfView (preferred): produces richer ETW-based data; requires admin privileges. For slow requests, add `/ThreadTime` to capture thread-level wait and block detail.
- `dotnet-trace`: fallback when admin privileges are not available.
- For long-running repros: use PerfView with a `/StopOn` trigger that fires on the symptom you want to capture (e.g., `/StopOnPerfCounter`, `/StopOnGCEvent`, `/StopOnException`) and a circular buffer (`/CircularMB` + `/BufferSizeMB`). Critical: the stop trigger must fire on the interesting event, not the recovery. The circular buffer continuously overwrites old data, so if you trigger on recovery, the buffer may have already overwritten the interesting behavior by the time collection stops. Only add `/StartOn` if the start event is known to precede the stop event. For slow requests, do not include a stop trigger by default; let the user design one based on their specific scenario.
Windows containers
- PerfView: most Windows containers (including Kubernetes on Windows) use process isolation by default. Collect from the host with `/EnableEventsInContainers`. After collection, you have two options:
  - Analyze locally while the container is still running: PerfView can reach into the live container to resolve symbols, so you can open the trace immediately on the host machine.
  - Analyze off-machine: before the container shuts down, copy the `.etl.zip` into the container and run `PerfViewCollect merge /ImageIDsOnly` inside it to embed symbol information. Then copy the merged trace out. Without this merge step, symbols for binaries inside the container will be unresolvable on other machines.
  For the less common Hyper-V containers, collect inside the container directly. See references/perfview.md for detailed commands.
- `dotnet-monitor`, `dotnet-trace`: inside the container if the tools are installed in the image. For dumps, invoke the `dump-collect` skill.
Windows (.NET Framework)
- PerfView: the primary diagnostic tool for .NET Framework on Windows. Requires admin.
- Same trigger guidance for long repros: use `/StopOn` triggers that fire on the symptom (e.g., `/StopOnPerfCounter`, `/StopOnGCEvent`, `/StopOnException`) with `/CircularMB` + `/BufferSizeMB`.
- Without admin: PerfView requires admin, and there are no alternative trace tools for .NET Framework. Process dumps can still be captured without admin; provide dump commands directly (e.g., `procdump -ma <PID>` or Task Manager) since the `dump-collect` skill does not support .NET Framework. Dumps can help diagnose hangs and memory leaks. However, high CPU, slow requests, and excessive GC cannot be investigated on .NET Framework without admin access. Advise the user to obtain admin privileges.
Linux (non-container, .NET 10+)
- `dotnet-trace collect-linux` (preferred): uses `perf_events` for richer traces including native call stacks and kernel events. Captures machine-wide by default (no PID required). Requires root and kernel >= 6.4.
- `dotnet-trace`: fallback when root privileges are not available or kernel requirements are not met. Managed stacks only.
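The root and kernel prerequisites above can be checked before choosing between the two commands. The sketch below is one possible pre-flight check; the helper names `kernel_at_least` and `can_use_collect_linux` are made up for this example, and the script does not verify the .NET version of the target application, which still has to be confirmed separately.

```shell
#!/bin/sh
# kernel_at_least MIN_MAJOR MIN_MINOR [RELEASE]   (hypothetical helper)
# Succeeds when RELEASE (default: output of `uname -r`) is at least
# MIN_MAJOR.MIN_MINOR.
kernel_at_least() {
    min_major=$1
    min_minor=$2
    release=${3:-$(uname -r)}
    major=${release%%.*}        # "6.8.0-45-generic" -> "6"
    rest=${release#*.}          # "6.8.0-45-generic" -> "8.0-45-generic"
    minor=${rest%%[!0-9]*}      # "8.0-45-generic"   -> "8"
    [ "$major" -gt "$min_major" ] ||
        { [ "$major" -eq "$min_major" ] && [ "${minor:-0}" -ge "$min_minor" ]; }
}

# can_use_collect_linux [UID] [RELEASE]   (hypothetical helper)
# Succeeds when the caller is root (UID 0) and the kernel is 6.4 or newer.
can_use_collect_linux() {
    uid=${1:-$(id -u)}
    [ "$uid" -eq 0 ] && kernel_at_least 6 4 "${2:-$(uname -r)}"
}

# Print the command that fits the current machine.
if can_use_collect_linux; then
    echo "dotnet-trace collect-linux"
else
    echo "dotnet-trace collect -p <PID>"
fi
```

Passing the UID and release string explicitly (for example `can_use_collect_linux 0 "6.8.0"`) makes it possible to evaluate a target machine's values from a different host.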
Linux (non-container, pre-.NET 10)
- `dotnet-trace` (preferred): managed trace collection; no admin required.
- `perfcollect`: when native call stacks are needed (requires admin/root).
Linux Container / Kubernetes
If running in the context of the workload (i.e., you have console access to the container), prefer console-based tools. These are easier to set up than `dotnet-monitor`, which requires authentication configuration and sidecar deployment:
- `dotnet-trace collect-linux` (.NET 10+ with root): produces the richest traces including native call stacks and kernel events.
- `dotnet-trace`: inside the container if the tool is installed in the image. For dumps, invoke the `dump-collect` skill.
- `perfcollect`: inside the container when native stacks are needed on pre-.NET 10 (requires `SYS_ADMIN`/`--privileged`).
If not running in the workload context (no console access), or if `dotnet-monitor` is already deployed:
- `dotnet-monitor`: designed for containers; runs as a sidecar. No tools needed in the app container. Easiest option when console access is not available.
Memory dumps
When dumps are needed (memory leaks, hangs), do not provide dump collection commands directly for modern .NET; invoke the `dump-collect` skill instead. The `dump-collect` skill only supports modern .NET (.NET Core 3.0+). For .NET Framework, provide dump collection guidance directly (e.g., `procdump -ma <PID>` or Task Manager). This skill focuses on trace collection only.
Memory leaks
- Capture two dumps as memory is increasing (e.g., one early, one after significant growth). Invoke the `dump-collect` skill for dump collection; do not provide dump commands directly. Diff the dumps in PerfView to see which objects have increased. This is the most effective way to identify what is leaking.
- Without admin privileges: Two process dumps can give a sense of what's growing on the heap, but may not be enough to identify the root cause. If dumps aren't sufficient, reproduce the issue in an environment where admin privileges are available to collect richer data (traces).
- Modern .NET on Linux (pre-.NET 10): Recommend two dump captures (invoke the `dump-collect` skill) for heap diff, plus `dotnet-trace` while memory is growing (for allocation tracking). No trigger needed; capture during the growth period. Both together give the best picture.
- Modern .NET 10+ on Linux with root: Recommend two dump captures (invoke the `dump-collect` skill) for heap diff, plus `dotnet-trace collect-linux` while memory is growing (richer data including native stacks). No trigger needed.
- .NET Framework: Recommend two dumps plus a PerfView trace while memory is growing to see what is being allocated. The `dump-collect` skill does not support .NET Framework, so provide dump commands directly (e.g., `procdump -ma <PID>` or right-click → Create Dump File in Task Manager). No trigger is needed; just capture the trace during the growth period. Do not wait for an `OutOfMemoryException`.
Excessive GC
Excessive GC requires a trace to analyze GC events, pause times, and allocation patterns — a dump is not sufficient.
- Windows (PerfView): Use `PerfView collect /GCCollectOnly` to capture GC events.
- Linux (dotnet-trace): Use `dotnet-trace collect -p <PID> --profile gc-verbose`.
- Linux .NET 10+ with root: Use `dotnet-trace collect-linux --profile gc-verbose` for richer data with native stacks.
- Containers: `dotnet-monitor` can capture GC traces via its REST API (`/trace?profile=gc-verbose`).
Slow Requests
Slow requests require a thread time trace to see where threads are spending time: waiting on locks, I/O, external calls, etc. Use larger buffers since thread time traces generate more data. For ASP.NET Core applications, also enable the `Microsoft.AspNetCore.Hosting` and `Microsoft-AspNetCore-Server-Kestrel` providers to get server-side request lifecycle timing (when requests arrive, how long they take to process).
- Windows (PerfView): Use `PerfView /ThreadTime collect /BufferSizeMB:1024 /CircularMB:2048`. The `/ThreadTime` argument adds thread-level wait and block detail. For ASP.NET Core, add Kestrel providers: `PerfView /ThreadTime collect /BufferSizeMB:1024 /CircularMB:2048 /Providers:*Microsoft.AspNetCore.Hosting,*Microsoft-AspNetCore-Server-Kestrel`. Do not include a stop trigger by default; let the user design one based on their specific scenario.
- Linux (dotnet-trace): `dotnet-trace` captures thread time data by default, so no special arguments are needed. Use `dotnet-trace collect -p <PID>`. For ASP.NET Core, add Kestrel providers: `dotnet-trace collect -p <PID> --providers Microsoft.AspNetCore.Hosting,Microsoft-AspNetCore-Server-Kestrel`.
- Linux .NET 10+ with root: Use `dotnet-trace collect-linux --profile thread-time` for richer data with native stacks. For ASP.NET Core, add: `--providers Microsoft.AspNetCore.Hosting,Microsoft-AspNetCore-Server-Kestrel`.
- Containers: `dotnet-monitor` can capture traces via its REST API (`/trace?pid=<PID>&durationSeconds=30`).
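For the container case, the REST call can be scripted with `curl`. The sketch below is an assumption-laden illustration: the helper names, the `MONITOR_API_KEY` environment variable, and the bearer-token header are choices made for this example, and the base URL depends entirely on how `dotnet-monitor` is deployed. Only the `/trace?pid=<PID>&durationSeconds=30` path comes from the guidance above.

```shell
#!/bin/sh
# monitor_trace_url BASE_URL PID [DURATION_SECONDS]   (hypothetical helper)
# Builds the trace endpoint URL; a trailing slash on BASE_URL is tolerated.
monitor_trace_url() {
    printf '%s/trace?pid=%s&durationSeconds=%s\n' "${1%/}" "$2" "${3:-30}"
}

# collect_trace_via_monitor BASE_URL PID OUT_FILE [DURATION_SECONDS]
# Assumption: MONITOR_API_KEY holds a bearer token accepted by the
# dotnet-monitor instance. Adjust the header to match the deployment.
collect_trace_via_monitor() {
    curl --fail --silent --show-error \
        -H "Authorization: Bearer ${MONITOR_API_KEY}" \
        -o "$3" \
        "$(monitor_trace_url "$1" "$2" "${4:-30}")"
}
```

Printing the URL first (`monitor_trace_url "$BASE" 1`) is a cheap way to confirm the PID and duration before the request is sent.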
Hangs
- Start with a trace to understand what threads are doing. Use the appropriate trace tool for the environment (PerfView with `/ThreadTime` on Windows, `dotnet-trace` on Linux, `dotnet-trace collect-linux --profile thread-time` on .NET 10+ Linux with root). The trace can reveal:
  - Livelocks (threads spinning without forward progress): threads appear busy but the application makes no progress.
  - Thread starvation: the ThreadPool is exhausted and queued work items are not being processed. This can look like a deadlock but has a different root cause.
  - Whether there is any forward progress at all: if some threads are making progress, the issue may be a bottleneck rather than a true hang.
- If the trace does not explain the hang, the issue may be a true deadlock (threads waiting on each other in a cycle). In this case, invoke the `dump-collect` skill to collect a process dump; do not provide dump commands directly.
- Analyze the dump with a debugger to inspect thread stacks and identify the lock cycle:
  - Windows: Visual Studio or WinDbg with the SOS debugger extension.
  - Linux: `lldb` with the SOS debugger extension.
Networking Issues
Networking issues (HTTP 5xx errors from downstream services, request timeouts, connection failures, DNS resolution failures, TLS handshake failures, connection pool exhaustion) require both a thread-time trace and networking event providers. The thread-time trace shows where threads are blocked (slow downstream calls, thread starvation), while the networking events show the request lifecycle — which requests failed, what status codes came back, how long DNS resolution and TLS handshakes took, and how long requests waited for a connection from the pool.
For .NET Framework, `PerfView /ThreadTime` already collects the relevant networking events (from the `System.Net` ETW provider); no additional providers are needed.
For modern .NET, you must explicitly enable the `System.Net.*` EventSource providers:
| Provider | What it covers |
|---|---|
| `System.Net.Http` | HttpClient/SocketsHttpHandler: request lifecycle, HTTP status codes, connection pool |
| `System.Net.NameResolution` | DNS lookups (start/stop, duration) |
| `System.Net.Security` | TLS/SSL handshakes (SslStream) |
| `System.Net.Sockets` | Low-level socket connect/disconnect |
Key events from `System.Net.Http`: `RequestStart` (scheme, host, port, path), `RequestStop` (statusCode, which is `-1` if no response was received), `RequestFailed` (exception message for timeouts, connection refused, etc.), `RequestLeftQueue` (time waiting for a connection from the pool, which indicates connection pool exhaustion), `ConnectionEstablished`, `ConnectionClosed`.
Collect a thread-time trace with networking providers enabled (modern .NET only; .NET Framework needs only `PerfView /ThreadTime`):
- Windows (PerfView): Use `PerfView /ThreadTime collect /BufferSizeMB:1024 /CircularMB:2048 /Providers:*System.Net.Http,*System.Net.NameResolution,*System.Net.Security,*System.Net.Sockets`. For .NET Framework, omit the `/Providers` flag; `/ThreadTime` already includes the networking events. The thread-time trace shows where threads are blocked while the networking events show what requests are failing and why.
- Linux (dotnet-trace): `dotnet-trace` captures thread time data by default, but specifying `--providers` overrides the defaults so you must also include `--profile`: `dotnet-trace collect -p <PID> --profile dotnet-common,dotnet-sampled-thread-time --providers System.Net.Http,System.Net.NameResolution,System.Net.Security,System.Net.Sockets`.
- Linux .NET 10+ with root: Use `dotnet-trace collect-linux --profile dotnet-common,cpu-sampling,thread-time --providers System.Net.Http,System.Net.NameResolution,System.Net.Security,System.Net.Sockets`.
- Containers: `dotnet-monitor` can capture traces with custom providers via its REST API.
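Because the provider list is long and each provider adds overhead, it can help to assemble the Linux command from a variable and review it before running it. The sketch below only prints the command; the function name `net_trace_cmd` and the `NET_PROVIDERS` variable are hypothetical, while the profiles and provider names are the ones listed above.

```shell
#!/bin/sh
# Full set of networking providers named in this section.
NET_PROVIDERS="System.Net.Http,System.Net.NameResolution,System.Net.Security,System.Net.Sockets"

# net_trace_cmd PID [PROVIDERS]   (hypothetical helper)
# Prints a dotnet-trace command that keeps the thread-time profile alongside
# the requested providers. PROVIDERS defaults to the full networking set.
net_trace_cmd() {
    printf 'dotnet-trace collect -p %s --profile dotnet-common,dotnet-sampled-thread-time --providers %s\n' \
        "$1" "${2:-$NET_PROVIDERS}"
}
```

When the symptom is clearly HTTP-level, `net_trace_cmd 1234 System.Net.Http` narrows the trace to a single provider; the printed command can then be pasted into the target shell.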
Assembly Loading Issues
For modern .NET, assembly loading issues (`FileNotFoundException`, `FileLoadException`, `ReflectionTypeLoadException`, version conflicts, duplicate assembly loads across AssemblyLoadContexts) require collecting assembly loader binder events from the `Microsoft-Windows-DotNETRuntime` provider with the AssemblyLoader keyword (`0x4`). These events trace every step of the runtime's assembly resolution algorithm: which paths were probed, which AssemblyLoadContext handled the load, whether the load succeeded or failed, and why. For .NET Framework, the same provider and keyword work for ETW-based collection; additionally, the Fusion Log Viewer (`fuslogvw.exe`) can diagnose assembly binding failures without requiring a trace.
The provider specification is `Microsoft-Windows-DotNETRuntime:0x4:4` (provider name, AssemblyLoader keyword, Informational verbosity).
- Windows (PerfView): A default PerfView trace already includes binder events; simply run `PerfView collect` with no extra providers. For a smaller trace file, use `PerfView collect /ClrEvents:Default-Profile`, which removes the most verbose default events while keeping the events necessary for diagnosing assembly loading issues.
- Linux / cross-platform (dotnet-trace): Use `dotnet-trace collect --clrevents assemblyloader -- <path-to-built-exe>` to launch and trace the process, or `dotnet-trace collect --clrevents assemblyloader -p <PID>` to attach to a running process.
- Linux .NET 10+ with root: Use `dotnet-trace collect-linux --clrevents assemblyloader`.
- Containers: `dotnet-monitor` can capture traces with the loader provider via its REST API.
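The launch and attach forms of the `dotnet-trace` command in the list above differ only in their final arguments, so a small wrapper can make the choice explicit. This is a sketch that prints the command instead of running it; the function name `loader_trace_cmd` and its `attach`/`launch` modes are invented for this example.

```shell
#!/bin/sh
# loader_trace_cmd attach PID
# loader_trace_cmd launch PATH_TO_EXE [ARGS...]
# (hypothetical helper) Prints the assembly-loader trace command for the
# chosen mode so it can be reviewed before being run.
loader_trace_cmd() {
    mode=$1
    shift
    case "$mode" in
        attach)
            echo "dotnet-trace collect --clrevents assemblyloader -p $1"
            ;;
        launch)
            echo "dotnet-trace collect --clrevents assemblyloader -- $*"
            ;;
        *)
            echo "usage: loader_trace_cmd attach PID | launch EXE [ARGS...]" >&2
            return 2
            ;;
    esac
}
```

For a process that crashes during startup, `loader_trace_cmd launch ./MyApp` (where `./MyApp` stands in for the built executable) yields the launch form, which does not depend on winning a race to attach.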
For short-lived processes that fail on startup (common with assembly loading issues), prefer the `dotnet-trace` launch form (`-- <path-to-built-exe>`) over attaching by PID, since the process may exit before you can attach.
Explain the trade-offs when recommending a tool. For example:
- PerfView gives richer data but needs admin; runs on Windows including Windows containers.
- `dotnet-trace` works cross-platform without admin but captures less system-level detail.
- `perfcollect` captures native call stacks but needs admin/root.
- `dotnet-monitor` is the best option for containers/K8s when console access is not available, but requires sidecar deployment and authentication configuration.
Step 3: Guide data collection
Provide the specific commands for the recommended tool. Load the appropriate reference file from the tool reference lookup table for detailed command-line examples.
Key guidance to include:
- Installation: How to install the tool if it is not already available (e.g. `dotnet tool install -g dotnet-trace`). When recommending multiple tools, provide installation and usage instructions for each one; do not mention a tool without showing how to install and use it.
- PID discovery (required before any `-p <PID>` command): Verify the target process first (for example: `dotnet-trace ps`, `curl <monitor-endpoint>/processes`, or `ps` inside a container). If the app is expected to be PID 1 in a container, still verify before collecting.
- Collection command: The exact command to run, including relevant providers, output format, and duration.
- Container considerations:
  - Collecting from inside the container: ensure the tool is installed in the image or use `kubectl cp` to copy it in.
  - Collecting from outside the container: use `dotnet-monitor` as a sidecar with a shared diagnostic port (Unix domain socket in `/tmp`).
  - Kubernetes: `dotnet-monitor` as a sidecar container, or `kubectl debug` for ephemeral debug containers.
- Long-running repros (Windows/PerfView): show how to use trigger arguments and circular buffer settings.
- Output location: Where the collected file will be saved and how to copy it off the target for analysis.
- Artifact handoff checklist: Include runtime version, OS/kernel, container image tag or build SHA, PID/process name, UTC collection start/end timestamps, exact command used, and final artifact path when handing traces to someone else for analysis.
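The handoff checklist can be captured mechanically at collection time so that nothing has to be reconstructed later. The sketch below writes a plain-text manifest next to the trace file; the function name, argument order, and the `.manifest.txt` suffix are assumptions made for this example, and the runtime version is passed in by the caller because it must describe the target application, not the machine running the script.

```shell
#!/bin/sh
# write_handoff_manifest ARTIFACT COMMAND START_UTC END_UTC RUNTIME [PID] [IMAGE]
# (hypothetical helper) Writes ARTIFACT.manifest.txt with the checklist fields.
write_handoff_manifest() {
    artifact=$1
    {
        echo "artifact: $artifact"
        echo "command: $2"
        echo "start_utc: $3"
        echo "end_utc: $4"
        echo "runtime: $5"
        echo "pid: ${6:-unknown}"
        echo "image: ${7:-n/a}"
        echo "kernel: $(uname -sr)"   # OS name and kernel release of this host
    } > "${artifact}.manifest.txt"
}

# Typical use around a collection run (timestamps in UTC, ISO 8601):
#   start=$(date -u +%Y-%m-%dT%H:%M:%SZ)
#   ... run the collection command here ...
#   end=$(date -u +%Y-%m-%dT%H:%M:%SZ)
#   write_handoff_manifest ./app.nettrace "dotnet-trace collect -p 1" \
#       "$start" "$end" "net8.0" 1 "myapp:abc123"
```

Note that `uname -sr` reports the host where the script runs, so the manifest should be written on the target machine or inside the target container.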
Step 4: Recommend analysis approach
After data is collected, recommend the appropriate tool for analysis. Do not perform the analysis — just point the developer to the right tool and documentation.
| Collected Data | Analysis Tool | Notes |
|---|---|---|
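| `.nettrace` (from `dotnet-trace` or `dotnet-monitor`) | PerfView (Windows), Speedscope (web) | PerfView gives the richest view on Windows |
| `.etl` / `.etl.zip` | PerfView | ETW traces from PerfView or perfcollect |
| `.trace.zip` (from `perfcollect`) | PerfView (Windows) | Copy the file to a Windows machine and open with PerfView |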
Validation
- The recommended tool is compatible with the developer's runtime, OS, and deployment topology
- The collection command runs without errors
- The output file is generated in the expected location
- The developer knows which analysis tool to use for the collected data
Common Pitfalls
| Pitfall | Solution |
|---|---|
Using | |
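| PerfView without admin privileges | PerfView requires admin for ETW tracing. Fall back to `dotnet-trace` on modern .NET; on .NET Framework there is no trace fallback, so capture dumps instead |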
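| Running `perfcollect` in a container without `SYS_ADMIN` | Containers drop `SYS_ADMIN` by default; run the container with `SYS_ADMIN` or `--privileged` |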
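| Huge trace files from long repros | On Windows, use PerfView triggers that fire on the symptom (e.g., `/StopOnPerfCounter`) together with a circular buffer (`/CircularMB`) |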
| Diagnostic port not accessible in container | Mount `/tmp` as a shared volume between the app container and the `dotnet-monitor` sidecar |
| Forgetting to install tools in container image | Add |
Exposing | Keep auth enabled, bind to localhost, and use |
| Collecting only CPU/thread-time trace for networking issues | CPU and thread-time traces alone do not show HTTP status codes, DNS timing, or connection pool behavior. Add the networking providers (`System.Net.Http`, `System.Net.NameResolution`, `System.Net.Security`, `System.Net.Sockets`) to the thread-time trace |
| Enabling all networking providers when only one is needed | Each networking provider adds overhead. If the issue is clearly HTTP-level (5xx status codes), enable only `System.Net.Http` |