Infrastructure Hosts Skill
Monitor and manage host and process infrastructure including CPU, memory, disk, network, and technology inventory.
What This Skill Does
- Discover and inventory hosts across cloud and on-premise environments
- Monitor host resource utilization (CPU, memory, disk, network)
- Track process resource consumption and lifecycle
- Analyze container and Kubernetes infrastructure
- Discover services via listening ports
- Manage technology stack versions and compliance
- Attribute infrastructure costs by cost center and product
- Validate data quality and metadata completeness
- Plan capacity and detect resource saturation
- Correlate infrastructure health across layers
When to Use This Skill
Use this skill when the user needs to:
- Inventory: "Show me all Linux hosts in AWS us-east-1"
- Monitor: "What hosts have high CPU usage?"
- Troubleshoot: "Which processes are consuming the most memory?"
- Discover: "What databases are running in production?"
- Plan: "Track Kubernetes version distribution for upgrade planning"
- Cost: "Calculate infrastructure costs by cost center"
- Security: "Find all processes listening on port 22"
- Compliance: "Identify hosts running EOL Java versions"
- Quality: "Check data completeness for AWS hosts"
- Optimize: "Find rightsizing candidates based on utilization"
Core Concepts
Entities
- HOST - Physical or virtual machines (cloud or on-premise)
- PROCESS - Running processes and process groups
- CONTAINER - Kubernetes containers
- NETWORK_INTERFACE - Host network interfaces
- DISK - Host disk volumes
Metrics Categories
- Host Metrics - , , ,
- Process Metrics - , , ,
- Inventory - OS type, cloud provider, technology stack, versions
- Cost - ,
- Quality - Metadata completeness, version compliance
Alert Thresholds
- CPU/Memory/Disk: 80% warning, 90% critical
- Network: >70% high, >85% saturated
- Disk Latency: >20ms bottleneck
- Network Errors: Drop rate >1%, error rate >0.1%
- Swap: >30% warning, >50% critical
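The thresholds above amount to a lookup from a percentage reading to a severity label. The following is a minimal Python sketch of that mapping, not part of any Dynatrace API: the `LEVELS` table and the `classify` function are hypothetical names, and the sketch assumes a reading must be strictly greater than a bound to breach it, since the source does not say whether the bounds are inclusive.

```python
# Hypothetical helper: maps a percentage reading to the severity labels
# listed under "Alert Thresholds". Bounds are checked from highest to lowest.
LEVELS = {
    "cpu": [(90.0, "critical"), (80.0, "warning")],
    "memory": [(90.0, "critical"), (80.0, "warning")],
    "disk": [(90.0, "critical"), (80.0, "warning")],
    "swap": [(50.0, "critical"), (30.0, "warning")],
    "network": [(85.0, "saturated"), (70.0, "high")],
}


def classify(metric: str, value_percent: float) -> str:
    """Return the severity label for a reading, or "ok" if no bound is breached."""
    for bound, label in LEVELS[metric]:
        # Assumption: strictly greater than the bound counts as a breach.
        if value_percent > bound:
            return label
    return "ok"
```

For example, `classify("swap", 40)` yields `"warning"` under these assumptions, while `classify("cpu", 80)` yields `"ok"` because the comparison is strict.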
Key Workflows
1. Host Discovery and Classification
Discover hosts, classify by OS/cloud, inventory resources.
dql
smartscapeNodes "HOST"
| fieldsAdd os.type, cloud.provider, host.logical.cpu.cores, host.physical.memory
| summarize host_count = count(), by: {os.type, cloud.provider}
| sort host_count desc
→ For cloud-specific attributes, see references/inventory-discovery.md
2. Resource Utilization Monitoring
Monitor CPU, memory, disk, network across hosts.
dql
timeseries {
    cpu = avg(dt.host.cpu.usage),
    memory = avg(dt.host.memory.usage),
    disk = avg(dt.host.disk.used.percent)
}, by: {dt.smartscape.host}
| fieldsAdd host_name = getNodeName(dt.smartscape.host)
| filter arrayAvg(cpu) > 80 or arrayAvg(memory) > 80
| sort arrayAvg(cpu) desc
High utilization threshold: 80% warning, 90% critical
→ For detailed CPU analysis and memory breakdown, see references/host-metrics.md
3. Process Resource Analysis
Identify top resource consumers at process level.
dql
timeseries {
    cpu = avg(dt.process.cpu.usage),
    memory = avg(dt.process.memory.usage)
}, by: {dt.smartscape.process}
| fieldsAdd process_name = getNodeName(dt.smartscape.process)
| filter arrayAvg(cpu) > 50
| sort arrayAvg(cpu) desc
| limit 20
→ For process I/O analysis and process network metrics, see references/process-monitoring.md
4. Technology Stack Inventory
Discover and track software technologies and versions.
dql
smartscapeNodes "PROCESS"
| fieldsAdd process.software_technologies
| expand tech = process.software_technologies
| fieldsAdd tech_type = tech[type], tech_version = tech[version]
| summarize process_count = count(), by: {tech_type, tech_version}
| sort process_count desc
Common Technologies: Java, Node.js, Python, .NET, databases, web servers, messaging systems
→ For version compliance checks, see references/inventory-discovery.md
5. Service Discovery via Ports
Map listening ports to services for security and inventory.
dql
smartscapeNodes "PROCESS"
| fieldsAdd process.listen_ports, dt.process_group.detected_name
| filter isNotNull(process.listen_ports) and arraySize(process.listen_ports) > 0
| expand port = process.listen_ports
| summarize process_count = count(), by: {port, dt.process_group.detected_name}
| sort toLong(port) asc
| limit 50
Well-known ports: 80 (HTTP), 443 (HTTPS), 22 (SSH), 3306 (MySQL), 5432 (PostgreSQL)
→ For comprehensive port mapping, see references/inventory-discovery.md
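Once the port list has been exported from a query like the one above, each port can be labelled with a known service. The Python sketch below shows one way to do that, using only the five well-known ports listed in this section. `WELL_KNOWN_PORTS` and `label_ports` are hypothetical names, and the sketch assumes ports may arrive as strings or integers (the query converts them with `toLong` before sorting).

```python
# Hypothetical post-processing helper for exported port data.
# Only the well-known ports named in this section are included.
WELL_KNOWN_PORTS = {
    80: "HTTP",
    443: "HTTPS",
    22: "SSH",
    3306: "MySQL",
    5432: "PostgreSQL",
}


def label_ports(ports):
    """Return (port, service) pairs sorted numerically by port."""
    labelled = []
    for raw in ports:
        port = int(raw)  # Assumption: values may be strings or integers.
        labelled.append((port, WELL_KNOWN_PORTS.get(port, "unknown")))
    return sorted(labelled)
```

Ports that are not in the table are labelled `"unknown"` rather than dropped, so unexpected listeners remain visible in a security review.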
6. Container and Kubernetes Monitoring
Track container distribution and K8s workload types.
dql
smartscapeNodes "CONTAINER"
| fieldsAdd k8s.cluster.name, k8s.namespace.name, k8s.workload.kind
| summarize container_count = count(), by: {k8s.cluster.name, k8s.workload.kind}
| sort k8s.cluster.name, container_count desc
Note: Container image names/versions NOT available in smartscape.
7. Cost Attribution and Chargeback
Calculate infrastructure costs by cost center.
dql
smartscapeNodes "HOST"
| fieldsAdd dt.cost.costcenter, host.logical.cpu.cores, host.physical.memory
| filter isNotNull(dt.cost.costcenter)
| fieldsAdd memory_gb = toDouble(host.physical.memory) / 1024 / 1024 / 1024
| summarize
    host_count = count(),
    total_cores = sum(toLong(host.logical.cpu.cores)),
    total_memory_gb = sum(memory_gb),
    by: {dt.cost.costcenter}
| sort total_cores desc
→ For product-level cost tracking, see references/inventory-discovery.md
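The aggregation in the query above can be reproduced offline to sanity-check exported host data. The Python sketch below mirrors its steps: it skips hosts without a cost center, converts memory from bytes to GB by dividing by 1024 three times, and sums per cost center. The record keys (`cost_center`, `cores`, `memory_bytes`) and the function name are assumptions chosen for illustration, not field names from the platform.

```python
from collections import defaultdict

BYTES_PER_GB = 1024 ** 3  # Same conversion as / 1024 / 1024 / 1024 in the query.


def summarize_by_cost_center(hosts):
    """Aggregate host count, cores, and memory (GB) per cost center.

    `hosts` is a list of dicts with the hypothetical keys
    "cost_center", "cores", and "memory_bytes".
    Returns (cost_center, totals) pairs sorted by total cores, descending.
    """
    totals = defaultdict(
        lambda: {"host_count": 0, "total_cores": 0, "total_memory_gb": 0.0}
    )
    for host in hosts:
        cost_center = host.get("cost_center")
        if cost_center is None:
            continue  # Mirrors the isNotNull filter in the query.
        entry = totals[cost_center]
        entry["host_count"] += 1
        entry["total_cores"] += int(host["cores"])
        entry["total_memory_gb"] += float(host["memory_bytes"]) / BYTES_PER_GB
    return sorted(
        totals.items(), key=lambda item: item[1]["total_cores"], reverse=True
    )
```

Hosts without a cost center are excluded here, as in the query, so it can help to count them separately when checking cost tag coverage.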
8. Infrastructure Health Correlation
Correlate host and process metrics for cross-layer analysis.
dql
timeseries {
    host_cpu = avg(dt.host.cpu.usage),
    host_memory = avg(dt.host.memory.usage),
    process_cpu = avg(dt.process.cpu.usage)
}, by: {dt.smartscape.host, dt.smartscape.process}
| fieldsAdd
    host_name = getNodeName(dt.smartscape.host),
    process_name = getNodeName(dt.smartscape.process)
| filter arrayAvg(host_cpu) > 70
| sort arrayAvg(host_cpu) desc
Health scoring: Critical if any resource >90%, warning if >80%
→ For multi-resource saturation detection, see references/host-metrics.md
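The health scoring rule above ("critical if any resource >90%, warning if >80%") is a worst-of comparison across resources. The Python sketch below expresses it that way. The function names and the `"healthy"` and `"unknown"` labels are assumptions for illustration; the source defines only the critical and warning levels.

```python
def health_status(utilization):
    """Score a host from a dict of resource name -> utilization percent.

    Critical if any resource is above 90%, warning if any is above 80%.
    The "healthy" and "unknown" labels are assumptions, not from the source.
    """
    if not utilization:
        return "unknown"
    worst = max(utilization.values())
    if worst > 90:
        return "critical"
    if worst > 80:
        return "warning"
    return "healthy"


def saturated_resources(utilization, threshold=80):
    """Return the names of resources above the threshold, sorted by name."""
    return sorted(name for name, value in utilization.items() if value > threshold)
```

Listing the saturated resources alongside the status shows whether a host is constrained on one resource or several at once.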
Common Query Patterns
Pattern 1: Smartscape Discovery
Use smartscapeNodes to discover and classify entities.
dql
smartscapeNodes "HOST"
| fieldsAdd <attributes>
| filter <conditions>
| summarize <aggregations>
Pattern 2: Timeseries Performance
Use timeseries to analyze metrics over time.
dql
timeseries metric = avg(dt.host.<metric>), by: {dt.smartscape.host}
| fieldsAdd <calculations>
| filter <thresholds>
Pattern 3: Cross-Layer Correlation
Correlate host and process metrics.
dql
timeseries {
    host_cpu = avg(dt.host.cpu.usage),
    process_cpu = avg(dt.process.cpu.usage)
}, by: {dt.smartscape.host, dt.smartscape.process}
Pattern 4: Entity Enrichment with Lookup
Enrich data with entity attributes. After lookup, reference the looked-up fields with the lookup. prefix.
dql
timeseries cpu = avg(dt.host.cpu.usage), by: {dt.smartscape.host}
| lookup [
    smartscapeNodes "HOST"
    | fields id, cpuCores, memoryTotal
  ], sourceField:dt.smartscape.host, lookupField:id
| fieldsAdd cores = lookup.cpuCores, mem_gb = lookup.memoryTotal / 1024 / 1024 / 1024
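The enrichment pattern above behaves like a keyed join whose added fields carry a prefix. The Python sketch below is an analogy for that behavior, not a description of how DQL implements `lookup`. The function name `lookup_enrich` is hypothetical, and the choice to keep the first matching lookup row when several share a key is an assumption made for this sketch.

```python
def lookup_enrich(rows, lookup_rows, source_field, lookup_field, prefix="lookup."):
    """Attach fields from the matching lookup row to each row, under a prefix.

    Rows without a match are returned unchanged.
    """
    index = {}
    for lookup_row in lookup_rows:
        # Assumption: when several lookup rows share a key, keep the first.
        index.setdefault(lookup_row[lookup_field], lookup_row)

    enriched = []
    for row in rows:
        merged = dict(row)  # Copy so the input rows are not mutated.
        match = index.get(row.get(source_field))
        if match is not None:
            for key, value in match.items():
                merged[prefix + key] = value
        enriched.append(merged)
    return enriched
```

With a lookup row `{"id": "H1", "cpuCores": 4}`, a metric row for `H1` gains the keys `lookup.id` and `lookup.cpuCores`, which is why later pipeline steps reference fields such as `lookup.cpuCores`.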
Tags and Metadata
Important Notes
- Generic field is NOT populated in smartscape queries
- Use specific tag fields: ,
- Use custom metadata:
Available Tags
- Azure Tags: tags:azure[dt_owner_team], tags:azure[dt_cloudcost_capability]
- Environment:
- Custom Metadata: host.custom.metadata[OperatorVersion], host.custom.metadata[Cluster]
- Cost: ,
→ For complete tag reference, see references/inventory-discovery.md
Cloud-Specific Attributes
AWS
- , ,
- ,
- (running, stopped, terminated)
Azure
cloud.provider == "azure"
- , ,
- ,
- (VM size)
Kubernetes
Best Practices
Alerting
- Use percentiles (p95, p99) for latency metrics
- Use for resource limits
- Use for utilization trends
- Set multi-level thresholds (warning at 80%, critical at 90%)
Time Windows
- Real-time: 5-15 minute windows
- Trends: 24 hours to 7 days
- Capacity planning: 30-90 days
Query Optimization
- Use filters early in the pipeline
- Limit results with limit
- Use specific entity types in smartscapeNodes
- Aggregate before enrichment (lookup)
Data Quality
- Validate metadata completeness (target >90%)
- Check for duplicate host names
- Ensure cost tag coverage
- Monitor data freshness (lifetime.end)
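Two of the checks above, metadata completeness and duplicate host names, are easy to run against exported inventory records. The Python sketch below shows one way to compute them. The record layout (a list of dicts with a `name` key) and both function names are assumptions for illustration, and an empty string is treated as missing metadata.

```python
from collections import Counter


def completeness_percent(records, field):
    """Percentage of records where `field` is populated, rounded to 1 decimal.

    Assumption: None and the empty string both count as missing.
    """
    if not records:
        return 0.0
    populated = sum(1 for record in records if record.get(field) not in (None, ""))
    return round(100.0 * populated / len(records), 1)


def duplicate_names(records, name_field="name"):
    """Return host names that appear more than once, sorted alphabetically."""
    counts = Counter(
        record.get(name_field) for record in records if record.get(name_field)
    )
    return sorted(name for name, count in counts.items() if count > 1)
```

A result below 90 from `completeness_percent` would fall short of the completeness target stated above.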
Limitations and Notes
Smartscape Limitations
- Container image names/versions NOT available in smartscape
- Generic field NOT populated (use specific tag namespaces)
- Process metadata varies by process type
Platform-Specific
- available on Linux only
- AIX has specific CPU metrics (entitlement, physc)
- Inode metrics available on Linux only
Best Practices
- Use getNodeName() to get human-readable names
- Convert bytes to GB for readability: value / 1024 / 1024 / 1024
- Round aggregated values: round(value, decimals: 1)
- Use isNotNull() checks before array operations
When to Load References
This skill uses progressive disclosure. Start here for 80% of use cases. Load reference files for detailed specifications when needed.
Load host-metrics.md when:
- Analyzing CPU component breakdown (user, system, iowait, steal)
- Investigating memory pressure and swap usage
- Troubleshooting disk I/O latency
- Diagnosing network packet drops or errors
Load process-monitoring.md when:
- Analyzing process-level I/O patterns
- Investigating TCP connection quality
- Detecting resource exhaustion (file descriptors, threads)
- Tracking GC suspension time
Load container-monitoring.md when:
- Analyzing container lifecycle and churn
- Tracking Kubernetes version distribution
- Managing OneAgent operator versions
- Planning K8s cluster upgrades
Load inventory-discovery.md when:
- Performing security audits via port discovery
- Implementing cost attribution and chargeback
- Validating data quality and metadata completeness
- Managing multi-cloud infrastructure
References
- host-metrics.md - Detailed host CPU, memory, disk, and network monitoring
- process-monitoring.md - Process-level CPU, memory, I/O, and network analysis
- container-monitoring.md - Container inventory, Kubernetes versions, and operator management
- inventory-discovery.md - Host/process discovery, technology inventory, cost attribution, and data quality