performing-api-inventory-and-discovery
Performing API Inventory and Discovery
When to Use
- Mapping the complete API attack surface of an organization before a security assessment
- Identifying shadow APIs deployed by development teams without security review
- Discovering deprecated or zombie API versions that remain accessible but unmaintained
- Finding undocumented API endpoints exposed through mobile applications, SPAs, or microservices
- Building an API inventory for compliance requirements (PCI-DSS, SOC2, GDPR)
Do not use without written authorization. API discovery involves scanning network infrastructure and analyzing traffic.
Prerequisites
- Written authorization specifying the target domains and network ranges
- Passive traffic capture capability (network tap, proxy, or cloud traffic mirroring)
- Active scanning tools: Amass, subfinder, httpx, and nuclei
- JavaScript analysis tools: LinkFinder, JS-Miner, or custom parsers
- Access to cloud console (AWS, Azure, GCP) for API gateway inventory
- Burp Suite Professional for passive API endpoint discovery
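Because the written authorization names specific domains and network ranges, it can help to encode that scope and check every candidate target against it before probing. The following is a minimal sketch, assuming a hypothetical scope of `example.com` and `203.0.113.0/24`; the names `AUTHORIZED_DOMAINS`, `AUTHORIZED_NETWORKS`, and `in_scope` are illustrative and not part of any tool listed above.

```python
import ipaddress

# Hypothetical scope copied from the authorization letter (assumption)
AUTHORIZED_DOMAINS = {"example.com"}
AUTHORIZED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def in_scope(target):
    """Return True if a hostname or IP address falls inside the authorized scope."""
    try:
        addr = ipaddress.ip_address(target)
    except ValueError:
        # Not an IP literal: treat it as a hostname and match on the domain suffix
        host = target.lower().rstrip(".")
        return any(host == d or host.endswith("." + d) for d in AUTHORIZED_DOMAINS)
    return any(addr in net for net in AUTHORIZED_NETWORKS)


print(in_scope("api.example.com"))      # subdomain of an authorized domain
print(in_scope("example.com.evil.io"))  # look-alike host outside the scope
```

Matching on `"." + domain` rather than a plain substring keeps look-alike hosts from slipping into scope.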
Workflow
Step 1: Passive API Discovery from Traffic Analysis
```python
import json
import re
from collections import defaultdict

# URL fragments that suggest an API endpoint
API_PATTERNS = [
    r'/api/', r'/v\d+/', r'/graphql', r'/rest/',
    r'/ws/', r'/rpc/', r'/grpc', r'/json',
]
UUID_PATTERN = (r'/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}'
                r'-[0-9a-f]{4}-[0-9a-f]{12}')


# Parse HAR file from browser developer tools or proxy
def analyze_har_for_apis(har_file_path):
    """Extract API endpoints from HTTP Archive (HAR) file."""
    with open(har_file_path, encoding="utf-8") as f:
        har = json.load(f)

    api_endpoints = defaultdict(lambda: {
        "methods": set(), "content_types": set(),
        "auth_types": set(), "count": 0
    })

    for entry in har["log"]["entries"]:
        url = entry["request"]["url"]
        method = entry["request"]["method"]

        # Skip requests that do not match any API pattern
        if not any(re.search(p, url) for p in API_PATTERNS):
            continue

        # Normalize the URL (remove query params and IDs).
        # The lookahead lets consecutive numeric segments (/1/2) both match.
        normalized = re.sub(r'\?.*$', '', url)
        normalized = re.sub(r'/\d+(?=/|$)', '/{id}', normalized)
        normalized = re.sub(UUID_PATTERN, '/{uuid}', normalized,
                            flags=re.IGNORECASE)

        ep = api_endpoints[normalized]
        ep["methods"].add(method)
        ep["count"] += 1

        # Detect authentication type
        for header in entry["request"]["headers"]:
            name = header["name"].lower()
            if name == "authorization":
                if "bearer" in header["value"].lower():
                    ep["auth_types"].add("Bearer/JWT")
                elif "basic" in header["value"].lower():
                    ep["auth_types"].add("Basic")
            elif name == "x-api-key":
                ep["auth_types"].add("API Key")

        # Detect content type
        content_type = next(
            (h["value"] for h in entry["request"]["headers"]
             if h["name"].lower() == "content-type"), None)
        if content_type:
            ep["content_types"].add(content_type.split(";")[0])

    print(f"Discovered {len(api_endpoints)} unique API endpoints:\n")
    for url, info in sorted(api_endpoints.items()):
        methods = ", ".join(sorted(info["methods"]))
        auth = ", ".join(sorted(info["auth_types"])) or "None"
        print(f"  [{methods}] {url}")
        print(f"    Auth: {auth} | Requests: {info['count']}")

    return api_endpoints
```
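The endpoint map built from HAR data stores methods, content types, and auth types as Python sets, which `json.dump` cannot serialize directly. Below is a small sketch of one way to persist the result, assuming the same dictionary shape as the HAR analysis; the helper name `to_serializable` and the sample data are hypothetical.

```python
import json


def to_serializable(api_endpoints):
    """Convert set-valued fields to sorted lists so the map can be dumped as JSON."""
    return {
        url: {key: sorted(value) if isinstance(value, set) else value
              for key, value in info.items()}
        for url, info in api_endpoints.items()
    }


# Hypothetical sample in the shape produced by the HAR analysis
sample = {
    "https://app.example.com/api/users/{id}": {
        "methods": {"GET", "DELETE"}, "content_types": set(),
        "auth_types": {"Bearer/JWT"}, "count": 3,
    }
}
print(json.dumps(to_serializable(sample), indent=2))
```

Sorting the lists keeps the output stable between runs, which makes successive inventories easier to diff.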
Step 2: Active API Endpoint Discovery
```bash
# DNS enumeration for API subdomains
amass enum -d example.com -o amass_results.txt
subfinder -d example.com -o subfinder_results.txt

# Filter for API-related subdomains (-h suppresses the filename prefix
# that grep adds when it searches more than one file)
grep -hiE '(api|rest|graphql|ws|gateway|backend|internal|staging|dev|v1|v2)' \
    amass_results.txt subfinder_results.txt | sort -u > api_subdomains.txt

# Check which subdomains are alive
cat api_subdomains.txt | httpx -status-code -content-length -title \
    -tech-detect -o live_apis.txt

# Probe common API paths on each live subdomain
cat api_subdomains.txt | while read -r domain; do
    for path in /api /api/v1 /api/v2 /graphql /swagger.json /openapi.json \
        /api-docs /docs /health /status /metrics /actuator; do
        curl -s -o /dev/null -w "%{http_code} %{url_effective}\n" \
            "https://${domain}${path}" 2>/dev/null | grep -v "^404"
    done
done
```

```python
import concurrent.futures

import requests
import urllib3

# verify=False is used below, so silence the per-request TLS warning
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def discover_api_endpoints(base_domains):
    """Actively probe for API endpoints across discovered domains."""
    # Common API paths to test
    API_PATHS = [
        "/api", "/api/v1", "/api/v2", "/api/v3",
        "/graphql", "/gql", "/query",
        "/rest", "/json", "/rpc",
        "/swagger.json", "/swagger/v1/swagger.json",
        "/openapi.json", "/openapi.yaml", "/api-docs",
        "/docs", "/redoc", "/explorer",
        "/.well-known/openid-configuration",
        "/health", "/healthz", "/ready",
        "/status", "/info", "/version",
        "/metrics", "/prometheus",
        "/actuator", "/actuator/health", "/actuator/info",
        "/admin", "/admin/api", "/internal",
        "/debug", "/debug/vars", "/debug/pprof",
        "/ws", "/websocket", "/socket.io",
        "/grpc", "/twirp",
    ]
    discovered = []

    def check_endpoint(domain, path):
        # Try HTTPS first, then fall back to plain HTTP
        for scheme in ["https", "http"]:
            url = f"{scheme}://{domain}{path}"
            try:
                resp = requests.get(url, timeout=5, allow_redirects=False,
                                    verify=False)  # TLS verification disabled for discovery; enable in production
                if resp.status_code not in (404, 502, 503):
                    return {
                        "url": url,
                        "status": resp.status_code,
                        "content_type": resp.headers.get("Content-Type", ""),
                        "server": resp.headers.get("Server", ""),
                        "size": len(resp.content),
                    }
            except requests.exceptions.RequestException:
                pass
        return None

    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
        futures = {}
        for domain in base_domains:
            for path in API_PATHS:
                future = executor.submit(check_endpoint, domain, path)
                futures[future] = (domain, path)

        for future in concurrent.futures.as_completed(futures):
            result = future.result()
            if result:
                discovered.append(result)
                print(f"  [FOUND] {result['url']} -> {result['status']} ({result['content_type']})")

    return discovered
```
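When probing turns up a machine-readable specification such as `/swagger.json` or `/openapi.json`, its `paths` object lists endpoints that no wordlist would guess. Here is a minimal sketch of enumerating them, assuming the spec has already been downloaded and parsed into a dict; the function name `endpoints_from_spec` and the sample spec are hypothetical.

```python
# HTTP methods that may appear as keys under a path item in OpenAPI/Swagger
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def endpoints_from_spec(spec):
    """Return sorted (METHOD, path) pairs declared in an OpenAPI/Swagger document."""
    # Swagger 2.0 uses basePath; OpenAPI 3 puts the prefix in servers[].url instead
    base = spec.get("basePath", "").rstrip("/")
    found = set()
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items also carry non-method keys such as "parameters"
            if key.lower() in HTTP_METHODS:
                found.add((key.upper(), base + path))
    return sorted(found)


sample_spec = {
    "swagger": "2.0",
    "basePath": "/api/v1",
    "paths": {
        "/users": {"get": {}, "post": {}},
        "/users/{id}": {"delete": {}, "parameters": []},
    },
}
for method, path in endpoints_from_spec(sample_spec):
    print(method, path)
```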
Step 3: JavaScript Source Analysis for API Endpoints
```python
import re

import requests


def extract_apis_from_javascript(js_urls):
    """Extract API endpoints from JavaScript source files."""
    api_pattern = re.compile(
        r'''(?:['"`])((?:/api/|/v[0-9]+/|/graphql|/rest/)[^'"`\s<>{}]+)(?:['"`])''',
        re.IGNORECASE
    )
    url_pattern = re.compile(
        r'''(?:['"`])(https?://[a-zA-Z0-9._-]+(?:\.[a-zA-Z]{2,})+(?:/[^'"`\s<>{}]*)?)(?:['"`])'''
    )
    fetch_pattern = re.compile(
        r'''(?:fetch|axios|ajax|XMLHttpRequest|\.get|\.post|\.put|\.delete|\.patch)\s*\(\s*(?:['"`])([^'"`]+)'''
    )

    all_endpoints = set()
    for js_url in js_urls:
        try:
            resp = requests.get(js_url, timeout=10)
            content = resp.text

            # Extract relative API paths
            for match in api_pattern.findall(content):
                all_endpoints.add(("relative", match))

            # Extract absolute URLs
            for match in url_pattern.findall(content):
                if any(kw in match.lower() for kw in ["/api", "/v1", "/v2", "graphql"]):
                    all_endpoints.add(("absolute", match))

            # Extract from fetch/axios calls
            for match in fetch_pattern.findall(content):
                all_endpoints.add(("fetch", match))
        except requests.exceptions.RequestException:
            pass

    print(f"\nAPI endpoints discovered from JavaScript ({len(all_endpoints)}):")
    for source, endpoint in sorted(all_endpoints):
        print(f"  [{source}] {endpoint}")

    return all_endpoints


# Find JavaScript files from the target domain
def find_js_files(domain):
    """Discover JavaScript files from a web application."""
    resp = requests.get(f"https://{domain}", timeout=10)
    # Assumed pattern: capture src attribute values that reference .js files
    js_files = re.findall(r'''src=["']([^"']+\.js(?:\?[^"']*)?)["']''', resp.text)

    full_urls = []
    for js in js_files:
        if js.startswith("http"):
            full_urls.append(js)
        elif js.startswith("//"):
            # Protocol-relative URL
            full_urls.append(f"https:{js}")
        elif js.startswith("/"):
            # Root-relative path on the same host
            full_urls.append(f"https://{domain}{js}")
    return full_urls
```
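Relative paths pulled from JavaScript only become testable once they are resolved against the page that loaded the script. This short sketch uses `urllib.parse.urljoin`, assuming results shaped like the `(source, endpoint)` tuples above; `resolve_endpoints` and the sample values are hypothetical.

```python
from urllib.parse import urljoin


def resolve_endpoints(base_url, endpoints):
    """Resolve (source, endpoint) tuples against base_url and de-duplicate."""
    resolved = set()
    for _source, endpoint in endpoints:
        # urljoin leaves absolute URLs untouched and resolves relative ones
        resolved.add(urljoin(base_url, endpoint))
    return sorted(resolved)


sample = {
    ("relative", "/api/v2/orders"),
    ("fetch", "/api/v2/orders"),
    ("absolute", "https://internal.example.com/api/health"),
}
for url in resolve_endpoints("https://app.example.com/", sample):
    print(url)
```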
Step 4: Cloud API Gateway Inventory
```python
import boto3


def inventory_aws_apis():
    """Inventory all APIs in AWS API Gateway."""
    apigw = boto3.client('apigateway')
    apigwv2 = boto3.client('apigatewayv2')
    region = apigw.meta.region_name  # Region the client is configured for
    apis = []

    # REST APIs (API Gateway v1)
    # Note: each call below returns a single page of results;
    # use paginators for accounts with many APIs or resources.
    rest_apis = apigw.get_rest_apis()
    for api in rest_apis['items']:
        resources = apigw.get_resources(restApiId=api['id'])
        stages = apigw.get_stages(restApiId=api['id'])
        for stage in stages['item']:
            for resource in resources['items']:
                for method in resource.get('resourceMethods', {}):
                    apis.append({
                        "type": "REST",
                        "name": api['name'],
                        "stage": stage['stageName'],
                        "path": resource['path'],
                        "method": method,
                        "url": (f"https://{api['id']}.execute-api.{region}"
                                f".amazonaws.com/{stage['stageName']}{resource['path']}"),
                        "created": str(api.get('createdDate', '')),
                    })

    # HTTP APIs (API Gateway v2)
    http_apis = apigwv2.get_apis()
    for api in http_apis['Items']:
        routes = apigwv2.get_routes(ApiId=api['ApiId'])
        for route in routes['Items']:
            apis.append({
                "type": "HTTP",
                "name": api['Name'],
                "route": route['RouteKey'],
                "api_id": api['ApiId'],
                "protocol": api['ProtocolType'],
            })

    print(f"\nAWS API Inventory ({len(apis)} endpoints):")
    for api in apis:
        print(f"  [{api['type']}] {api.get('name')} - {api.get('method', '')} {api.get('path', api.get('route', ''))}")

    return apis
```
Step 5: API Version and Shadow API Detection
```python
import re

import requests


def detect_shadow_and_zombie_apis(discovered_endpoints, documented_endpoints):
    """Compare discovered APIs against documented inventory."""

    # Normalize endpoints for comparison
    def normalize(ep, collapse_version=False):
        if collapse_version:
            ep = re.sub(r'/v\d+(?=/|$)', '/vX', ep)
        ep = re.sub(r'/\d+(?=/|$)', '/{id}', ep)
        return ep.lower().rstrip('/')

    # Keep versions when checking for an exact match; collapse them only to
    # recognize another version of an endpoint that is documented.
    documented_exact = {normalize(ep) for ep in documented_endpoints}
    documented_any_version = {
        normalize(ep, collapse_version=True) for ep in documented_endpoints}

    shadow_apis = []  # Discovered but not documented
    zombie_apis = []  # Old versions still accessible

    for ep in discovered_endpoints:
        if normalize(ep["url"]) in documented_exact:
            continue
        # Check if it is a different version of a documented API
        versioned = re.search(r'/v[0-9]+(/|$)', ep["url"])
        if versioned and normalize(ep["url"], collapse_version=True) in documented_any_version:
            zombie_apis.append(ep)
        else:
            shadow_apis.append(ep)

    print(f"\nShadow APIs (undocumented): {len(shadow_apis)}")
    for api in shadow_apis:
        print(f"  [SHADOW] {api['url']} -> {api['status']}")

    print(f"\nZombie APIs (deprecated versions): {len(zombie_apis)}")
    for api in zombie_apis:
        print(f"  [ZOMBIE] {api['url']} -> {api['status']}")

    # Check if zombie APIs lack security controls
    for api in zombie_apis:
        try:
            resp = requests.get(api["url"], timeout=5)
        except requests.exceptions.RequestException:
            continue
        # A 2xx response to an unauthenticated request means no auth was enforced
        if 200 <= resp.status_code < 300:
            print(f"  [CRITICAL] Zombie API accessible without auth: {api['url']}")

    return shadow_apis, zombie_apis
```
Key Concepts
| Term | Definition |
|---|---|
| Shadow API | An API deployed by a development team without going through the official API management or security review process |
| Zombie API | A deprecated or old API version that remains accessible and running but is no longer maintained or monitored |
| API Inventory | A comprehensive catalog of all APIs in an organization including endpoint URLs, owners, versions, authentication methods, and data classifications |
| Improper Inventory Management | OWASP API9:2023 - failure to maintain an accurate API inventory, leading to unmonitored and unprotected API endpoints |
| Attack Surface | The total set of API endpoints, methods, and parameters that an attacker can potentially interact with |
| API Sprawl | The uncontrolled proliferation of APIs in an organization, often resulting from microservice adoption without centralized governance |
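The API Inventory definition above implies a fixed set of fields per entry. One possible record shape is sketched here as a dataclass; the `ApiRecord` class, its field names, and the sample values are assumptions for illustration, not a schema prescribed by OWASP or any tool.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ApiRecord:
    """One row of an API inventory."""
    url: str
    owner: str = "unknown"
    version: str = ""
    auth_methods: list = field(default_factory=list)
    data_classification: str = "unclassified"
    documented: bool = False


record = ApiRecord(
    url="https://api.example.com/api/v1/users",
    version="v1",
    auth_methods=["Bearer/JWT"],
)
print(asdict(record))
```

Defaulting `owner` to `"unknown"` and `documented` to `False` makes unreviewed entries easy to filter for follow-up.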
Tools & Systems
- Amass: OWASP tool for attack surface mapping through DNS enumeration, web scraping, and API discovery
- httpx: Fast HTTP probing tool for validating discovered domains and identifying live API endpoints
- nuclei: Template-based scanner for detecting exposed API documentation, debug endpoints, and misconfigured services
- Swagger UI Detector: Tool for finding exposed Swagger/OpenAPI documentation endpoints across the organization
- Akto: API security platform that discovers APIs through traffic analysis and maintains an automated inventory
Common Scenarios
Scenario: Enterprise API Attack Surface Assessment
Context: A large enterprise has 200+ development teams using microservices. The security team suspects many undocumented APIs are exposed to the internet. A comprehensive API inventory is needed for a security audit.
Approach:
- DNS enumeration discovers 340 subdomains, 45 of which contain API-related keywords (api, rest, gateway, backend)
- Active probing of all subdomains with API path wordlist discovers 127 live API endpoints
- JavaScript analysis of the main web application reveals 34 API endpoints, 8 of which point to undocumented internal services
- AWS API Gateway inventory shows 67 REST APIs and 23 HTTP APIs across 12 accounts
- Cross-referencing against the official API catalog identifies 31 shadow APIs (undocumented) and 14 zombie APIs (deprecated versions)
- 3 zombie APIs have no authentication, exposing customer data through endpoints that were supposed to be decommissioned
- 2 shadow APIs expose internal admin functions to the internet without authorization
Pitfalls:
- Only checking documented API endpoints and missing shadow APIs deployed outside the API gateway
- Not scanning JavaScript bundles where frontend applications hardcode API endpoint URLs
- Missing APIs behind non-standard ports or subpaths
- Not checking for multiple API versions where older versions may lack security controls
- Assuming all APIs go through the API gateway when some may be directly exposed
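The pitfall about non-standard ports can be addressed by expanding each host into candidate base URLs before path probing. The following is a minimal sketch; the port list is an assumption chosen for illustration and should be adjusted to the authorized scope.

```python
# Ports where HTTP services are often found besides 80/443 (illustrative assumption)
CANDIDATE_PORTS = [80, 443, 3000, 5000, 8000, 8080, 8443, 9000]
TLS_PORTS = {443, 8443}


def candidate_base_urls(host, ports=CANDIDATE_PORTS):
    """Build one base URL per port, guessing the scheme from the port number."""
    urls = []
    for port in ports:
        scheme = "https" if port in TLS_PORTS else "http"
        # Omit the port when it is the default for the scheme
        default = (scheme == "https" and port == 443) or (scheme == "http" and port == 80)
        urls.append(f"{scheme}://{host}" if default else f"{scheme}://{host}:{port}")
    return urls


for url in candidate_base_urls("api.example.com"):
    print(url)
```

The scheme guess is only a starting point; a service on port 8080 may still speak TLS, so probing both schemes per port is a reasonable extension.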
Output Format
API Inventory and Discovery Report
Organization: Example Corp
Assessment Date: 2024-12-15
Domains Scanned: 340
Summary
| Category | Count |
|---|---|
| Total APIs Discovered | 127 |
| Documented APIs | 82 |
| Shadow APIs (undocumented) | 31 |
| Zombie APIs (deprecated) | 14 |
| APIs Without Authentication | 8 |
| APIs Exposing Sensitive Data | 5 |
Critical Findings
- Zombie API: api-v1.example.com/api/v1/users - Deprecated in 2022, still accessible, no authentication required, returns full user data
- Shadow API: internal-tools.example.com/api/admin - Admin functions exposed to internet without authorization
- Exposed Documentation: 12 Swagger UI instances accessible publicly, revealing full API schema and endpoint details
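The summary table can be generated from the discovery results rather than written by hand. This is a small sketch, assuming the counts are already available as a dict; `render_summary` is a hypothetical helper and the numbers are the sample values from the report above.

```python
def render_summary(counts):
    """Render category counts as a Markdown table matching the report layout."""
    lines = ["| Category | Count |", "|---|---|"]
    for category, count in counts.items():
        lines.append(f"| {category} | {count} |")
    return "\n".join(lines)


counts = {
    "Total APIs Discovered": 127,
    "Documented APIs": 82,
    "Shadow APIs (undocumented)": 31,
    "Zombie APIs (deprecated)": 14,
}
print(render_summary(counts))
```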