troubleshooting-efs

Troubleshooting EFS

Overview

Domain expertise for diagnosing and resolving Amazon EFS issues. Covers mount failures, NFS connectivity, IAM and POSIX permissions, throughput and performance, and encryption problems.
For authoritative guidance, see EFS Troubleshooting.

Common Tasks

0. Verify Dependencies

  • You MUST verify that the `aws` CLI is available
  • You MUST check whether `amazon-efs-utils` or `nfs-utils` (`nfs-common` on Ubuntu/Debian) is installed on the instance
  • You MUST ONLY check for tool existence and version; you MUST NOT execute destructive or mutating commands during verification
  • You MUST inform the user if any required tools are missing
  • You MUST respect the user's decision to abort if tools are unavailable
  • You SHOULD explain what each step does and why before executing it
  • You SHOULD display write commands and wait for user confirmation before executing
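
A read-only version of this check could look like the sketch below. It only looks up commands and package records and never installs anything. The helper names (`have_tool`, `have_package`, `check_efs_deps`) are hypothetical, and the package lookups assume `rpm` on Amazon Linux/RHEL/CentOS and `dpkg-query` on Ubuntu/Debian.

```bash
#!/usr/bin/env bash
# Read-only dependency check (hypothetical helper names). Nothing is installed
# or modified; each helper only queries what is already present.

# Return 0 if a command (binary or shell function) is available.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Return 0 if an OS package is installed, using rpm or dpkg-query if present.
have_package() {
  if have_tool rpm && rpm -q "$1" >/dev/null 2>&1; then
    return 0
  fi
  if have_tool dpkg-query &&
     dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "ok installed"; then
    return 0
  fi
  return 1
}

# Print one status line per dependency; the exit status is the number missing.
check_efs_deps() {
  local missing=0
  if have_tool aws; then
    echo "OK      aws CLI: $(aws --version 2>&1)"
  else
    echo "MISSING aws CLI"
    missing=$((missing + 1))
  fi
  if have_package amazon-efs-utils; then
    echo "OK      amazon-efs-utils"
  elif have_package nfs-utils || have_package nfs-common; then
    echo "OK      NFS client package (nfs-utils or nfs-common)"
  else
    echo "MISSING amazon-efs-utils, nfs-utils, or nfs-common"
    missing=$((missing + 1))
  fi
  return "$missing"
}
```

Any `MISSING` line is what gets reported to the user; a non-zero exit status means at least one dependency is absent.
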

1. Classify the Issue

| Symptom | Category |
|---|---|
| "wrong fs type" or the mount command fails | A: Missing NFS Client |
| Connection timed out (mount hangs for 2+ minutes) | B: Network/Security Group |
| "access denied by server" | C: IAM/Permissions |
| Slow throughput or high latency | D: Performance |
| NFS server error on an encrypted file system | E: Encryption/KMS |
| DNS name resolution fails | F: VPC DNS |
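
For scripted triage, the table can be encoded as a substring match on the error text. This is a sketch: `classify_efs_symptom` is a hypothetical name, and the DNS patterns (`name or service not known`, `failed to resolve`) are assumptions about how resolution failures tend to be worded, not strings taken from the table.

```bash
#!/usr/bin/env bash
# Map an error message or symptom description to a category letter from the
# table above (hypothetical helper; the DNS patterns are assumptions).
classify_efs_symptom() {
  local msg
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"wrong fs type"*)            echo "A" ;;  # Missing NFS client
    *"connection timed out"*)     echo "B" ;;  # Network/Security Group
    *"access denied by server"*)  echo "C" ;;  # IAM/Permissions
    *"slow"*|*"latency"*)         echo "D" ;;  # Performance
    *"nfs server error"*)         echo "E" ;;  # Encryption/KMS (encrypted FS)
    *"name or service not known"*|*"failed to resolve"*) echo "F" ;;  # VPC DNS
    *)                            echo "unknown" ;;
  esac
}
```

Anything that comes back `unknown` still needs a human read of the full mount output.
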

2. Category A — Missing NFS Client

```bash
# Amazon Linux / RHEL / CentOS
sudo yum -y install amazon-efs-utils   # preferred (includes mount helper + TLS); on RHEL/CentOS
                                       # it must first be built from the aws/efs-utils source
# OR
sudo yum -y install nfs-utils

# Ubuntu / Debian
sudo apt-get install -y nfs-common
```
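
Because installing packages is a write action, a helper can print the matching command for the user to confirm instead of running it. The sketch below assumes the distribution is identified by the `ID` field of `/etc/os-release`; the function name and the ID-to-package mapping are assumptions that mirror the commands above.

```bash
#!/usr/bin/env bash
# Print (do not run) the NFS client install command for a distribution.
# Hypothetical helper; pass an os-release ID, or omit it to read /etc/os-release.
efs_client_install_cmd() {
  local id=${1:-$(. /etc/os-release 2>/dev/null && echo "$ID")}
  case "$id" in
    amzn)          echo "sudo yum -y install amazon-efs-utils" ;;
    rhel|centos)   echo "sudo yum -y install nfs-utils" ;;
    ubuntu|debian) echo "sudo apt-get install -y nfs-common" ;;
    *)             echo "no mapping for distribution '${id:-unknown}'" >&2
                   return 1 ;;
  esac
}
```
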

3. Category B — Network/Security Group

Connection timeout is the #1 EFS mount failure — almost always security groups.
  1. Verify that a mount target exists in the instance's AZ:
     ```bash
     aws efs describe-mount-targets --file-system-id fs-ID --region REGION
     ```
  2. Verify security groups, checking BOTH directions:
    • Mount target SG: `aws ec2 describe-security-groups --group-ids sg-MT` MUST show inbound TCP 2049 from the compute SG
    • Compute SG: MUST have outbound TCP 2049 to the mount target SG
    • Quick fix: `aws ec2 authorize-security-group-ingress --group-id sg-MT --protocol tcp --port 2049 --source-group sg-COMPUTE`
  3. Test connectivity:
     ```bash
     nc -zv fs-ID.efs.REGION.amazonaws.com 2049
     ```
Note: These security group troubleshooting steps also apply to S3 Files. The only difference is that S3 Files uses `aws s3files list-mount-targets` instead of `aws efs describe-mount-targets`.
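
The mount-target check and the connectivity test can be partly scripted. The sketch below assumes the instance's Availability Zone is already known (for example from instance metadata) and that `bash` and coreutils `timeout` are available for a TCP probe when `nc` is not installed; all three function names are hypothetical.

```bash
#!/usr/bin/env bash
# Helpers for the mount-target and connectivity checks (hypothetical names).

# Return 0 if AZ "$1" is in the whitespace-separated list "$2", e.g. the output of:
#   aws efs describe-mount-targets --file-system-id fs-ID --region REGION \
#     --query 'MountTargets[].AvailabilityZoneName' --output text
az_has_mount_target() {
  local az
  for az in $2; do
    [ "$az" = "$1" ] && return 0
  done
  return 1
}

# Build the regional EFS DNS name used by the connectivity test.
efs_dns_name() {
  echo "$1.efs.$2.amazonaws.com"
}

# Connectivity test without nc: open a TCP connection to port 2049 via bash's
# /dev/tcp, giving up after 5 seconds. Exit status 0 means the port accepted it.
probe_nfs_port() {
  timeout 5 bash -c ": > /dev/tcp/$1/2049" 2>/dev/null
}
```

A `probe_nfs_port "$(efs_dns_name fs-ID REGION)"` that fails after about five seconds points at the security group rules, just as an `nc` timeout would.
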

4. Category C — IAM/Permissions

"access denied by server" when mounting with `-o iam`:
  • Check that the identity-based IAM policy grants `elasticfilesystem:ClientMount` (and `elasticfilesystem:ClientWrite` if the client needs write access)
  • Check the file system resource policy:
    ```bash
    aws efs describe-file-system-policy --file-system-id fs-ID --region REGION
    ```
Note: IAM authorization is only enforced when a file system policy exists that requires it. Without a file system policy, any client in the VPC with access to port 2049 can mount, even with `-o iam`. To enforce IAM, you MUST create a file system policy that denies anonymous access.
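
Whether a policy exists at all can be checked without changing anything. This sketch relies on `describe-file-system-policy` failing with the `PolicyNotFound` error code when no policy is attached; the function name and its output labels are hypothetical. A present policy still has to be read to confirm that it actually requires IAM.

```bash
#!/usr/bin/env bash
# Report whether a file system policy is attached (hypothetical helper).
# NO_POLICY means IAM is not enforced, even for clients that mount with -o iam.
efs_policy_status() {
  local out
  if out=$(aws efs describe-file-system-policy --file-system-id "$1" --region "$2" 2>&1); then
    echo "POLICY_PRESENT"   # review it: a policy may still allow anonymous access
  elif printf '%s' "$out" | grep -q "PolicyNotFound"; then
    echo "NO_POLICY"
  else
    echo "error: $out" >&2
    return 1
  fi
}
```
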
POSIX permission denied (not IAM):
  • Check file/directory ownership: `ls -la /mnt/efs/`
  • Use access points to enforce a fixed UID/GID for consistent permissions
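
To confirm that a denial is a POSIX ownership problem, compare the caller's numeric UID/GID with the owner of the path; an access point then pins every client to one identity. Both helpers below are hypothetical sketches, the access point command is only printed for confirmation, and the UID/GID, path, and `755` mode are placeholders.

```bash
#!/usr/bin/env bash
# Compare the caller's numeric IDs with a path's owner (hypothetical helper).
posix_access_report() {
  local target=$1 owner
  owner=$(ls -nd "$target" | awk '{print $3 ":" $4}')   # numeric uid:gid of the path
  echo "path=$target owner=$owner caller=$(id -u):$(id -g)"
  if [ -w "$target" ]; then echo "writable"; else echo "not writable"; fi
}

# Print (do not run) a create-access-point command that maps all access
# to one POSIX user and creates the root directory with that owner.
access_point_cmd() {
  local fs_id=$1 uid=$2 gid=$3 dir=$4
  echo "aws efs create-access-point --file-system-id $fs_id" \
       "--posix-user Uid=$uid,Gid=$gid" \
       "--root-directory 'Path=$dir,CreationInfo={OwnerUid=$uid,OwnerGid=$gid,Permissions=755}'"
}
```
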

5. Category D — Performance

Check the throughput mode:
```bash
aws efs describe-file-systems --file-system-id fs-ID --region REGION --query 'FileSystems[0].ThroughputMode'
```
Burst credit exhaustion (Bursting mode only):
```bash
# `date -d` is GNU syntax; on macOS/BSD use `date -u -v-24H +%Y-%m-%dT%H:%M:%S` instead
aws cloudwatch get-metric-statistics --namespace AWS/EFS --metric-name BurstCreditBalance \
  --dimensions Name=FileSystemId,Value=fs-ID --period 3600 --statistics Average \
  --start-time "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" --region REGION
```
If the credit balance is near zero, switch to Elastic throughput:
```bash
aws efs update-file-system --file-system-id fs-ID --throughput-mode elastic --region REGION
```
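
A sketch that turns the burst-credit query into a yes/no answer follows. It assumes GNU `date` with a BSD fallback, uses the `Minimum` statistic rather than `Average` so short dips are not averaged away, and uses a hypothetical `LOW_CREDIT_BYTES` cutoff of 100 GB; what counts as "near zero" is workload-specific, so adjust it.

```bash
#!/usr/bin/env bash
# Flag a Bursting-mode file system whose burst credits ran low in the last 24 hours.
# LOW_CREDIT_BYTES and all function names are hypothetical; tune the cutoff.
LOW_CREDIT_BYTES=${LOW_CREDIT_BYTES:-100000000000}   # 100 GB

# ISO-8601 UTC timestamp N hours ago: GNU date first, BSD/macOS date as fallback.
iso_hours_ago() {
  date -u -d "$1 hours ago" +%Y-%m-%dT%H:%M:%S 2>/dev/null ||
    date -u -v-"$1"H +%Y-%m-%dT%H:%M:%S
}

# Read datapoints on stdin; print LOW if the smallest is below threshold $1,
# OK otherwise, or NO_DATA if there are no numeric datapoints.
burst_verdict() {
  tr -s '[:space:]' '\n' | awk -v t="$1" '
    $1 ~ /^[0-9.eE+]+$/ { v = $1 + 0; if (n == 0 || v < min) min = v; n++ }
    END {
      if (n == 0) print "NO_DATA"
      else if (min < t + 0) print "LOW"
      else print "OK"
    }'
}

check_burst_credits() {
  local fs_id=$1 region=$2
  aws cloudwatch get-metric-statistics --region "$region" \
    --namespace AWS/EFS --metric-name BurstCreditBalance \
    --dimensions Name=FileSystemId,Value="$fs_id" \
    --period 3600 --statistics Minimum \
    --start-time "$(iso_hours_ago 24)" --end-time "$(iso_hours_ago 0)" \
    --query 'Datapoints[].Minimum' --output text |
    burst_verdict "$LOW_CREDIT_BYTES"
}
```

A `LOW` result is the cue to propose the Elastic throughput switch; the switch itself is a write command and still needs confirmation.
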
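General Purpose vs Max I/O:
  • Check the `PercentIOLimit` metric; if it is consistently above 80%, consider Max I/O
  • Note: performance mode is IMMUTABLE. Changing it requires creating a new file system and migrating the data
  • Max I/O does not support Elastic throughput, so it cannot be combined with the Elastic switch above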
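
"Consistently above 80%" needs a concrete rule before it can be checked. The sketch below assumes one (every datapoint in the window is above the threshold); the function name and that rule are hypothetical, and the datapoints would come from the same `get-metric-statistics` call as above with `--metric-name PercentIOLimit`.

```bash
#!/usr/bin/env bash
# Decide whether PercentIOLimit stayed above a threshold (default 80) for the
# whole window. Reads datapoints on stdin (hypothetical helper and rule).
io_limit_verdict() {
  tr -s '[:space:]' '\n' | awk -v t="${1:-80}" '
    $1 ~ /^[0-9.]+$/ { n++; if ($1 + 0 <= t + 0) below++ }
    END {
      if (n == 0)          print "NO_DATA"
      else if (below == 0) print "SUSTAINED_HIGH"
      else                 print "OK"
    }'
}
```
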

6. Category E — Encryption/KMS

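An NFS server error on an encrypted file system usually indicates a KMS key problem.
  • Verify that the key is enabled (in the KMS console or with `aws kms describe-key`)
  • Verify that the EFS service-linked role has KMS permissions on the key
  • If the key is scheduled for deletion, cancel the deletion during the waiting period and then re-enable the key; a key that has already been deleted cannot be recovered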
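
The key checks can be chained from the file system ID. In this sketch the function names are hypothetical; `KmsKeyId` and `KeyMetadata.KeyState` are the fields returned by `describe-file-systems` and `kms describe-key`, and the suggested fixes are only printed so the user can confirm them. When the key is `Enabled`, the service-linked role and key policy still have to be reviewed by hand.

```bash
#!/usr/bin/env bash
# Look up the KMS key state for an encrypted EFS file system (hypothetical helper).
efs_kms_state() {
  local fs_id=$1 region=$2 key
  key=$(aws efs describe-file-systems --file-system-id "$fs_id" --region "$region" \
        --query 'FileSystems[0].KmsKeyId' --output text) || return 1
  aws kms describe-key --key-id "$key" --region "$region" \
    --query 'KeyMetadata.KeyState' --output text
}

# Print (do not run) the next step for a key state; $2 is the key ID or ARN.
# A cancelled deletion leaves the key Disabled, hence the follow-up enable-key.
kms_next_step() {
  case "$1" in
    Enabled)         echo "key is enabled: check the EFS service-linked role and key policy" ;;
    Disabled)        echo "aws kms enable-key --key-id $2" ;;
    PendingDeletion) echo "aws kms cancel-key-deletion --key-id $2 && aws kms enable-key --key-id $2" ;;
    *)               echo "key state '$1': inspect the key in the KMS console" ;;
  esac
}
```
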

7. Category F — VPC DNS

DNS resolution failures usually mean that the VPC's DNS settings are disabled.
```bash
aws ec2 describe-vpc-attribute --vpc-id vpc-ID --attribute enableDnsHostnames
aws ec2 describe-vpc-attribute --vpc-id vpc-ID --attribute enableDnsSupport
```
Both MUST be `true`. If not, enable DNS support first, because DNS hostnames cannot be enabled while DNS support is off:
```bash
aws ec2 modify-vpc-attribute --vpc-id vpc-ID --enable-dns-support Value=true
aws ec2 modify-vpc-attribute --vpc-id vpc-ID --enable-dns-hostnames Value=true
```
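
The check and the fix can be combined into a helper that prints only the commands a VPC actually needs, with DNS support first, since DNS hostnames cannot be enabled while DNS support is off. `vpc_dns_fix_cmds` and `is_true` are hypothetical names; the comparison is case-insensitive because `--output text` may render booleans as `True`/`False`.

```bash
#!/usr/bin/env bash
# Return 0 if the argument is "true" in any letter case.
is_true() {
  [ "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" = "true" ]
}

# Print (do not run) the modify-vpc-attribute commands a VPC still needs.
# DNS support is printed first because DNS hostnames depend on it.
vpc_dns_fix_cmds() {
  local vpc=$1 support hostnames
  support=$(aws ec2 describe-vpc-attribute --vpc-id "$vpc" --attribute enableDnsSupport \
            --query 'EnableDnsSupport.Value' --output text)
  hostnames=$(aws ec2 describe-vpc-attribute --vpc-id "$vpc" --attribute enableDnsHostnames \
              --query 'EnableDnsHostnames.Value' --output text)
  is_true "$support" ||
    echo "aws ec2 modify-vpc-attribute --vpc-id $vpc --enable-dns-support Value=true"
  is_true "$hostnames" ||
    echo "aws ec2 modify-vpc-attribute --vpc-id $vpc --enable-dns-hostnames Value=true"
}
```
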

Troubleshooting

Mount hangs then times out

Most common cause: security group. Verify TCP 2049 is open between compute and mount target.

Auto-mount fails on reboot

The `/etc/fstab` entry MUST include the `_netdev` option so that the mount waits for the network before it runs.
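
Existing entries can be audited for the missing option. This sketch treats a line as an EFS entry when its type field is `efs` (mount helper) or `nfs4`, so it may also flag non-EFS NFS mounts; the function name is hypothetical. A passing entry looks like `fs-ID:/ /mnt/efs efs _netdev,tls 0 0`.

```bash
#!/usr/bin/env bash
# Print the mount point of every efs/nfs4 entry in an fstab-format file whose
# options field lacks _netdev (hypothetical helper; defaults to /etc/fstab).
efs_fstab_missing_netdev() {
  awk '$1 !~ /^#/ && ($3 == "efs" || $3 == "nfs4") && $4 !~ /(^|,)_netdev(,|$)/ { print $2 }' \
    "${1:-/etc/fstab}"
}
```
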

"nfs not responding" after reconnect

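Older kernels have a bug in how they reuse the TCP source port after a reconnect. Update the kernel, or add the `noresvport` mount option so that the client uses a new non-privileged source port when the connection is re-established.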
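
When mounting with the plain NFS client rather than the EFS mount helper, `noresvport` sits alongside the other NFSv4.1 options AWS recommends for EFS. The sketch only prints the command; the function name is hypothetical, and `fs-ID`, `REGION`, and the mount point are placeholders.

```bash
#!/usr/bin/env bash
# Print (do not run) an NFSv4.1 mount command for EFS that includes noresvport.
efs_nfs_mount_cmd() {
  local fs_id=$1 region=$2 mnt=$3
  echo "sudo mount -t nfs4" \
       "-o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport" \
       "$fs_id.efs.$region.amazonaws.com:/ $mnt"
}
```
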

Enable Debug Logs

Set `logging_level = DEBUG` in `/etc/amazon/efs/efs-utils.conf`. Logs are written to `/var/log/amazon/efs/mount.log`.
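
The edit can be made with `sed`, keeping a backup of the original file. This is a sketch: the function name is hypothetical, it assumes the file already contains a `logging_level` line, and editing the real file needs `sudo`, so the command should be shown to the user before it runs.

```bash
#!/usr/bin/env bash
# Set logging_level = DEBUG in an efs-utils config file, keeping a .bak copy
# (hypothetical helper; defaults to the path named above).
efs_enable_debug_logging() {
  local conf=${1:-/etc/amazon/efs/efs-utils.conf}
  if ! grep -q '^logging_level[[:space:]]*=' "$conf"; then
    echo "no logging_level line in $conf" >&2
    return 1
  fi
  sed -i.bak 's/^logging_level[[:space:]]*=.*/logging_level = DEBUG/' "$conf"
}
```
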

Collect Logs for AWS Support

```bash
sudo tar -czf /tmp/efs-logs.tar.gz /var/log/amazon/efs/ /etc/amazon/efs/efs-utils.conf
```
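
Because these logs can contain file system IDs and mount target IPs (see Security Considerations), the archive is worth creating with owner-only permissions. The sketch below does that with `umask 077` in a subshell; the function name and the timestamped file name are hypothetical, and it should be run with `sudo` to read `/var/log/amazon/efs/`.

```bash
#!/usr/bin/env bash
# Bundle EFS logs and config into a timestamped, owner-only archive and print
# its path. Usage: efs_collect_logs [OUT_DIR] [PATH...] (hypothetical helper).
efs_collect_logs() {
  local out_dir=${1:-/tmp} archive
  [ "$#" -gt 0 ] && shift
  [ "$#" -gt 0 ] || set -- /var/log/amazon/efs /etc/amazon/efs/efs-utils.conf
  archive="$out_dir/efs-logs-$(date -u +%Y%m%dT%H%M%SZ).tar.gz"
  ( umask 077 && tar -czf "$archive" "$@" ) || return 1
  echo "$archive"
}
```
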

Security Considerations

  • IAM authorization is only enforced when a file system policy exists; without one, any VPC client with access to port 2049 can mount
  • When troubleshooting access denied errors, verify both identity-based and resource-based policies
  • Use `-o tls` for encryption in transit; unencrypted NFS traffic is visible on the network
  • Restrict access to `/var/log/amazon/efs/`; logs may contain file system IDs and mount target IPs
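
The last point can be checked with `find`'s octal `-perm` test, which works the same in GNU and BSD `find`. The function name is hypothetical; it only reports, and the mode to apply (for example `750`) is a local policy decision.

```bash
#!/usr/bin/env bash
# Report whether a directory is readable by "other" users (hypothetical helper;
# defaults to the efs-utils log directory).
log_dir_exposure() {
  local dir=${1:-/var/log/amazon/efs}
  if [ ! -d "$dir" ]; then
    echo "MISSING"
    return 1
  fi
  if [ -n "$(find "$dir" -prune -perm -004)" ]; then
    echo "WORLD_READABLE"
  else
    echo "RESTRICTED"
  fi
}
```
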

Additional Resources
