linux-server
# Linux Server Administration (Ubuntu / Debian)
## New Server Setup Script

Save as `/root/setup-server.sh` on the new machine, run once as root, then delete it.

```bash
#!/usr/bin/env bash
# =============================================================================
# Production Server Initial Setup: Ubuntu 22.04 / 24.04 LTS
# Run as root immediately after first login.
# =============================================================================
set -euo pipefail

# ---------------------------------------------------------------------------
# 1. Variables (edit before running)
# ---------------------------------------------------------------------------
NEW_USER="deploy"
SSH_PUBLIC_KEY="ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA... user@workstation"
HOSTNAME="app-prod-01"
TIMEZONE="UTC"      # Or e.g. "America/New_York", "Europe/London"
SWAP_SIZE_GB=4      # Set to 0 to skip swap creation

# ---------------------------------------------------------------------------
# 2. Update package lists and apply security patches
# ---------------------------------------------------------------------------
apt-get update -qq
DEBIAN_FRONTEND=noninteractive apt-get upgrade -y -q

# ---------------------------------------------------------------------------
# 3. Set hostname and /etc/hosts
# ---------------------------------------------------------------------------
hostnamectl set-hostname "$HOSTNAME"
# Ensure 127.0.1.1 resolves the hostname (required by some software)
if ! grep -q "$HOSTNAME" /etc/hosts; then
  echo "127.0.1.1 $HOSTNAME" >> /etc/hosts
fi

# ---------------------------------------------------------------------------
# 4. Set timezone
# ---------------------------------------------------------------------------
timedatectl set-timezone "$TIMEZONE"

# ---------------------------------------------------------------------------
# 5. Install essential packages
# ---------------------------------------------------------------------------
DEBIAN_FRONTEND=noninteractive apt-get install -y -q \
  curl wget git vim htop iotop net-tools dnsutils \
  unzip build-essential ca-certificates gnupg lsb-release \
  fail2ban ufw chrony logrotate unattended-upgrades apt-listchanges

# ---------------------------------------------------------------------------
# 6. Configure NTP with chrony
# ---------------------------------------------------------------------------
systemctl enable --now chrony
# Verify synchronisation (should show * on the active source)
chronyc sources -v || true

# ---------------------------------------------------------------------------
# 7. Create sudo user with SSH key access
# ---------------------------------------------------------------------------
if ! id -u "$NEW_USER" &>/dev/null; then
  adduser --disabled-password --gecos "" "$NEW_USER"
fi
usermod -aG sudo "$NEW_USER"

# Set up authorized_keys
SSH_DIR="/home/${NEW_USER}/.ssh"
mkdir -p "$SSH_DIR"
echo "$SSH_PUBLIC_KEY" > "${SSH_DIR}/authorized_keys"
chmod 700 "$SSH_DIR"
chmod 600 "${SSH_DIR}/authorized_keys"
chown -R "${NEW_USER}:${NEW_USER}" "$SSH_DIR"

# ---------------------------------------------------------------------------
# 8. Harden SSH
# ---------------------------------------------------------------------------
cp /etc/ssh/sshd_config "/etc/ssh/sshd_config.bak.$(date +%F)"
# sshd keeps the first value it reads for each keyword, so check that no
# earlier file in sshd_config.d (such as a cloud-init drop-in) sets these options.
cat > /etc/ssh/sshd_config.d/99-hardening.conf <<'SSHD_CONF'
PermitRootLogin no
PasswordAuthentication no
ChallengeResponseAuthentication no
PubkeyAuthentication yes
X11Forwarding no
AllowTcpForwarding no
MaxAuthTries 3
LoginGraceTime 30
ClientAliveInterval 300
ClientAliveCountMax 2
SSHD_CONF
sshd -t                 # Validate the configuration before reloading
systemctl reload ssh    # The unit is named "ssh" on Debian/Ubuntu

# ---------------------------------------------------------------------------
# 9. Configure unattended security upgrades
# ---------------------------------------------------------------------------
cat > /etc/apt/apt.conf.d/20auto-upgrades <<'APT_CONF'
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
APT::Periodic::AutocleanInterval "7";
APT_CONF

cat > /etc/apt/apt.conf.d/50unattended-upgrades <<'APT_UU'
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
    "${distro_id}ESMApps:${distro_codename}-apps-security";
    "${distro_id}ESM:${distro_codename}-infra-security";
};
Unattended-Upgrade::Mail "root";
Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
Unattended-Upgrade::Automatic-Reboot "false";
APT_UU

# ---------------------------------------------------------------------------
# 10. Create swap file
# ---------------------------------------------------------------------------
if [[ "$SWAP_SIZE_GB" -gt 0 ]] && ! swapon --show | grep -q /swapfile; then
  fallocate -l "${SWAP_SIZE_GB}G" /swapfile
  chmod 600 /swapfile
  mkswap /swapfile
  swapon /swapfile
  echo '/swapfile none swap sw 0 0' >> /etc/fstab
  # Reduce swappiness for applications (default is 60)
  echo 'vm.swappiness=10' > /etc/sysctl.d/60-swap.conf
  sysctl -p /etc/sysctl.d/60-swap.conf
fi

# ---------------------------------------------------------------------------
# 11. Apply kernel tuning (idempotent; see sysctl.d section below)
# ---------------------------------------------------------------------------
cat > /etc/sysctl.d/99-production.conf <<'SYSCTL'
# Network performance
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65536
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1

# File system
fs.file-max = 2097152
fs.inotify.max_user_watches = 524288

# Virtual memory
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
SYSCTL
sysctl --system

echo ""
echo "=== Setup complete. Log in as '${NEW_USER}' via SSH key before closing this session. ==="
```
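Because the script disables password and root logins, running it with the placeholder key left in place would lock you out. A minimal pre-flight sketch, assuming a hypothetical helper named `validate_setup_vars` (not part of the script above) that is called with `NEW_USER`, `SSH_PUBLIC_KEY` and `SWAP_SIZE_GB` before step 2. The accepted key types are an illustrative subset:

```bash
#!/usr/bin/env bash
# validate_setup_vars is a hypothetical helper, shown only as a sketch.
# Usage: validate_setup_vars "$NEW_USER" "$SSH_PUBLIC_KEY" "$SWAP_SIZE_GB"
validate_setup_vars() {
  local user="$1" key="$2" swap_gb="$3"
  local errors=0
  local user_re='^[a-z_][a-z0-9_-]*$'
  # "<type> <base64> [comment]"; only a few common key types are accepted here.
  local key_re='^(ssh-ed25519|ssh-rsa|ecdsa-sha2-nistp(256|384|521)) [A-Za-z0-9+/]+=*( .*)?$'

  if [[ ! "$user" =~ $user_re ]]; then
    echo "invalid username: $user" >&2
    errors=$((errors + 1))
  fi

  # Reject the elided placeholder ("...") and anything that is not key-shaped.
  if [[ "$key" == *...* ]] || [[ ! "$key" =~ $key_re ]]; then
    echo "SSH_PUBLIC_KEY is a placeholder or malformed" >&2
    errors=$((errors + 1))
  fi

  if [[ ! "$swap_gb" =~ ^[0-9]+$ ]]; then
    echo "SWAP_SIZE_GB must be a non-negative integer: $swap_gb" >&2
    errors=$((errors + 1))
  fi

  # Succeed only when no check failed.
  [[ "$errors" -eq 0 ]]
}
```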
---

## sysctl.d Production Tuning Reference

The file above (`/etc/sysctl.d/99-production.conf`) covers the most impactful parameters. Apply changes live without rebooting:

```bash
sudo sysctl --system                               # Reload all files in /etc/sysctl.d/
sudo sysctl -p /etc/sysctl.d/99-production.conf    # Reload specific file

# Verify a parameter
sudo sysctl net.core.somaxconn
```
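To verify every parameter at once rather than one by one, the configured values can be compared with the live ones under `/proc/sys`. This is a rough sketch: `sysctl_drift` is a hypothetical helper, and it assumes plain `key = value` lines with no dots inside interface names. Call it as `sysctl_drift /etc/sysctl.d/99-production.conf`; the optional second argument exists only so the function can be tried against a scratch directory.

```bash
#!/usr/bin/env bash
# sysctl_drift is a hypothetical helper: it prints one line per parameter whose
# live value differs from the file, and returns 1 if any drift was found.
sysctl_drift() {
  local conf="$1" proc_root="${2:-/proc/sys}"
  local key want have path drift=0
  local -a parts

  while IFS='=' read -r key want || [[ -n "$key" ]]; do
    key="${key//[[:space:]]/}"
    # Skip blank lines and comments.
    [[ -z "$key" || "$key" == \#* || "$key" == \;* ]] && continue

    # Collapse whitespace so "1024 65535" matches the tab-separated live value.
    read -r -a parts <<<"$want"
    want="${parts[*]}"

    # net.core.somaxconn -> <proc_root>/net/core/somaxconn
    path="${proc_root}/${key//.//}"
    if [[ ! -r "$path" ]]; then
      echo "MISSING $key"
      drift=1
      continue
    fi

    read -r -a parts <"$path"
    have="${parts[*]}"
    if [[ "$have" != "$want" ]]; then
      echo "DRIFT $key: want=$want have=$have"
      drift=1
    fi
  done <"$conf"

  return "$drift"
}
```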
---

## /etc/security/limits.conf: High-Concurrency App

```
# /etc/security/limits.conf
# Changes take effect on next login (not on running processes).

# Application service user (e.g., node app running as 'deploy')
deploy  soft  nofile  65535
deploy  hard  nofile  65535
deploy  soft  nproc   8192
deploy  hard  nproc   8192

# Root (required only if the app runs as root; avoid this)
root    soft  nofile  65535
root    hard  nofile  65535

# Wildcard fallback for all other users
*       soft  nofile  65535
*       hard  nofile  65535
```

For systemd services, `LimitNOFILE` in the unit file is the value that applies: services are not started through a PAM login session, so they do not pick up `/etc/security/limits.conf`. Set both.

Verify effective limits of a running process:

```bash
# PID of your app:
cat /proc/$(pgrep -o node)/limits
```
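The limits file is easier to check in a script if the open-files row is pulled out on its own. A small sketch, assuming two hypothetical helpers (`nofile_limits` and `check_nofile`) that take the path to a limits file such as `/proc/<pid>/limits`; the 65535 default mirrors the value configured above:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Print "soft hard" for the "Max open files" row of a /proc/<pid>/limits file.
nofile_limits() {
  local limits_file="$1"
  awk '/^Max open files/ { print $4, $5 }' "$limits_file"
}

# Return 1 when the soft limit is below the target (default 65535).
check_nofile() {
  local limits_file="$1" target="${2:-65535}" soft hard
  read -r soft hard <<<"$(nofile_limits "$limits_file")"
  if [[ "$soft" == "unlimited" ]] || (( soft >= target )); then
    echo "ok soft=$soft hard=$hard"
  else
    echo "low soft=$soft hard=$hard (target $target)"
    return 1
  fi
}

# Example: check_nofile "/proc/$(pgrep -o node)/limits"
```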
---

## Package Management

```bash
# Update package lists
sudo apt-get update

# Upgrade all packages (non-interactive)
sudo DEBIAN_FRONTEND=noninteractive apt-get upgrade -y

# Install specific package
sudo apt-get install -y nginx

# Remove package and its config files
sudo apt-get purge -y apache2 && sudo apt-get autoremove -y

# Hold a package at its current version (prevent unattended upgrades)
sudo apt-mark hold nginx
sudo apt-mark unhold nginx
sudo apt-mark showhold

# List installed packages
dpkg -l | grep nginx

# Show available versions
apt-cache policy nginx

# Find which package provides a file
dpkg -S /usr/sbin/nginx
apt-file search /usr/sbin/nginx    # needs apt-file package
```
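Since the setup script leaves `Automatic-Reboot` set to `"false"`, upgrades that need a reboot only take full effect once someone restarts the machine. On Debian and Ubuntu a pending reboot is normally signalled by the flag file `/var/run/reboot-required`. The sketch below assumes that convention; `reboot_pending` is a hypothetical helper, and the path is a parameter so the function can be tried safely.

```bash
#!/usr/bin/env bash
# reboot_pending is a hypothetical helper, shown only as a sketch.
# Returns 0 and prints the requesting packages when a reboot is pending.
reboot_pending() {
  local flag="${1:-/var/run/reboot-required}"
  if [[ -f "$flag" ]]; then
    echo "reboot required"
    # A sibling ".pkgs" file lists the packages that asked for the reboot, if present.
    if [[ -f "${flag}.pkgs" ]]; then
      sort -u "${flag}.pkgs"
    fi
    return 0
  fi
  echo "no reboot required"
  return 1
}
```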
---

## File Permissions

```bash
# Symbolic classes: u=user, g=group, o=others, a=all
# Octal digits: r=4, w=2, x=1 (summed per class)
chmod 755 /var/www/myapp           # rwxr-xr-x: directory traversable by all
chmod 644 /var/www/myapp/app.js    # rw-r--r--: file readable by all
chmod 600 /etc/app/secret.env      # rw-------: private config file
chmod -R 750 /opt/myapp            # Recursive; all files/dirs get 750

# Ownership
chown www-data:www-data /var/www/myapp
chown -R deploy:deploy /opt/myapp

# umask: determines default permissions for new files
# Default 022 → new files 644, new dirs 755
# For private app dirs, set UMask=0027 in the [Service] section of the service unit

# ACL: grant extra access without changing ownership
sudo apt-get install -y acl
sudo setfacl -m u:deploy:rwX /var/log/myapp       # User deploy gets rwX
sudo setfacl -m g:developers:rX /var/log/myapp    # Group developers read only
sudo setfacl -d -m u:deploy:rwX /var/log/myapp    # Default ACL for new files in dir
sudo getfacl /var/log/myapp                       # View current ACL
```
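The umask rule is plain bit arithmetic: new files start from 666 and new directories from 777, and the bits set in the umask are cleared from that base. A minimal sketch of the calculation follows; `mode_from_umask` is a hypothetical helper, not a standard command.

```bash
#!/usr/bin/env bash
# mode_from_umask is a hypothetical helper used only for illustration.
# Usage: mode_from_umask <umask> <base>   (base is 666 for files, 777 for dirs)
mode_from_umask() {
  local mask="$1" base="$2"
  # The 8# prefix forces octal; the umask bits are cleared from the base mode.
  printf '%03o\n' "$(( 8#$base & ~8#$mask ))"
}

for mask in 022 027 077; do
  echo "umask $mask: files $(mode_from_umask "$mask" 666), dirs $(mode_from_umask "$mask" 777)"
done
```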
---

## logrotate: Application Log Config

Create `/etc/logrotate.d/myapp`:

```
/var/log/myapp/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 deploy www-data
    sharedscripts
    postrotate
        # Signal app to reopen log file handles
        systemctl kill -s USR1 myapp.service || true
    endscript
}
```

```bash
# Test logrotate config without actually rotating
sudo logrotate --debug /etc/logrotate.d/myapp

# Force rotation immediately (useful after config changes)
sudo logrotate --force /etc/logrotate.d/myapp
```
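The `postrotate` hook only helps if the application reopens its log file when it receives `USR1`. Otherwise it keeps writing to the renamed file. A rough sketch of the application side for a service written in bash: the `open_log` and `log` functions and the use of file descriptor 3 are illustrative choices, and `LOG_FILE` defaults to a temporary file here purely for demonstration.

```bash
#!/usr/bin/env bash
# Sketch of a service that reopens its log on SIGUSR1 (names are hypothetical).
LOG_FILE="${LOG_FILE:-$(mktemp)}"   # a real service would use /var/log/myapp/app.log

# (Re)open file descriptor 3 in append mode on whatever file is now at LOG_FILE.
open_log() {
  exec 3>>"$LOG_FILE"
}

# Write a timestamped line to the currently open log descriptor.
log() {
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >&3
}

# logrotate renames the old file and creates a new one; USR1 tells us to switch.
trap open_log USR1
open_log
```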
---

## journald Retention Limits

```
# /etc/systemd/journald.conf.d/size-limits.conf
# Comments must be on their own lines; systemd does not parse trailing comments.
[Journal]
# Maximum total disk usage for persistent journals
SystemMaxUse=2G
# Minimum free space to leave on the volume
SystemKeepFree=500M
# Rotate journal files older than 1 month
MaxFileSec=1month
# Delete journal entries older than 3 months
MaxRetentionSec=3months
```

```bash
sudo systemctl restart systemd-journald

# Show current journal disk usage
journalctl --disk-usage

# Manually vacuum old entries
sudo journalctl --vacuum-time=90d
sudo journalctl --vacuum-size=1G
```
---

## User Management

```bash
# Create a new system user (no login shell, for running services)
sudo adduser --system --group --no-create-home --shell /usr/sbin/nologin appuser

# Create a regular user interactively
sudo adduser alice

# Add to a group
sudo usermod -aG sudo alice      # Add alice to sudo group
sudo usermod -aG docker alice    # Add to docker group (takes effect on next login)

# Show group memberships
groups alice
id alice

# Lock/unlock account (prevents password login)
sudo passwd -l alice
sudo passwd -u alice

# Expire password (force change on next login)
sudo chage -d 0 alice

# Edit sudoers safely (always use visudo)
sudo visudo

# Grant passwordless sudo for specific command only (add to /etc/sudoers.d/deploy-restart)
echo 'deploy ALL=(ALL) NOPASSWD: /bin/systemctl restart myapp' \
  | sudo tee /etc/sudoers.d/deploy-restart
sudo chmod 440 /etc/sudoers.d/deploy-restart
sudo visudo -cf /etc/sudoers.d/deploy-restart    # Validate syntax
```
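New group memberships only reach a session at login, so a user can be listed in a group and still be denied by the shell they already have open. The sketch below compares the two views using `id -nG`, which reports the account database when given a user name and the current process otherwise. The three function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# True if the account database lists the group for the user.
group_in_database() {
  local user="$1" group="$2"
  id -nG "$user" | tr ' ' '\n' | grep -qx -- "$group"
}

# True if the current shell session already carries the group.
group_in_session() {
  local group="$1"
  id -nG | tr ' ' '\n' | grep -qx -- "$group"
}

# True if the user was added to the group but must log in again to use it.
needs_relogin() {
  local user="$1" group="$2"
  group_in_database "$user" "$group" && ! group_in_session "$group"
}

# Example: needs_relogin "$(id -un)" docker && echo "log out and back in"
```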
---

## cron Scheduling

```bash
# Edit crontab for current user
crontab -e

# Edit crontab for another user
sudo crontab -u deploy -e

# Cron expression format:
# ┌── minute (0-59)
# │ ┌── hour (0-23)
# │ │ ┌── day of month (1-31)
# │ │ │ ┌── month (1-12)
# │ │ │ │ ┌── day of week (0=Sun, 7=Sun)
# │ │ │ │ │
# * * * * * command

# Example crontab entries (these lines go in the crontab, not the shell)
0 3 * * * /opt/myapp/scripts/backup.sh >> /var/log/myapp/backup.log 2>&1
*/15 * * * * /usr/bin/healthcheck.sh
0 0 1 * * /usr/sbin/certbot renew --quiet

# System-wide cron directories: drop in executable scripts (no crontab syntax).
# run-parts skips file names that contain a dot, so name the file "backup", not "backup.sh".
ls /etc/cron.{hourly,daily,weekly,monthly}/

# Run a one-off command at a specific time
echo "/opt/scripts/maintenance.sh" | at 02:00 tomorrow
atq       # List pending at jobs
atrm 3    # Remove job #3
```
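A frequently scheduled job such as the `*/15` health check can overlap with its own previous run if one execution is slow. One common guard is a lock directory, since `mkdir` either creates it or fails atomically. The sketch below assumes a hypothetical wrapper named `run_locked`. A crashed run leaves the lock behind, so a real job would also need stale-lock handling.

```bash
#!/usr/bin/env bash
# run_locked is a hypothetical wrapper, shown only as a sketch.
# Usage: run_locked <lock_dir> <command> [args...]
run_locked() {
  local lock_dir="$1"
  shift

  # mkdir is atomic: only one caller can create the directory.
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) skipped: lock $lock_dir is held" >&2
    return 75   # EX_TEMPFAIL in sysexits.h
  fi

  # Run the job, remember its exit status, and always release the lock.
  local status=0
  "$@" || status=$?
  rmdir "$lock_dir"
  return "$status"
}

# Example crontab command: a script that calls
#   run_locked /tmp/healthcheck.lock /usr/bin/healthcheck.sh
```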
---

## Disk Management

```bash
# Disk space overview
df -hT    # Human-readable, with filesystem type
df -ih    # inode usage (relevant when disk shows space but can't write)

# Find large files / directories
du -sh /var/log/*                        # Size of each entry in /var/log
du -ah /var/log | sort -rh | head -20

# Block device layout (partitions, mount points, sizes)
lsblk -f      # With filesystem types and UUIDs
sudo blkid    # UUIDs and filesystem types only

# Mount a new volume
sudo mkfs.ext4 /dev/sdb1
sudo mkdir -p /data
sudo mount /dev/sdb1 /data

# Persist mount in /etc/fstab (use UUID, not device name)
UUID=$(sudo blkid -s UUID -o value /dev/sdb1)
echo "UUID=$UUID /data ext4 defaults,noatime 0 2" | sudo tee -a /etc/fstab
sudo mount -a    # Test fstab without rebooting

# Resize ext4 filesystem after volume expansion (e.g., cloud disk resize)
sudo resize2fs /dev/sda1    # Online resize; no unmount needed on ext4 with kernel ≥ 2.6

# Check filesystem for errors (unmounted)
sudo fsck -n /dev/sdb1      # Dry run: no changes

# Find which process has a deleted file open (common cause of "disk full" despite empty dirs)
sudo lsof | grep '(deleted)' | awk '{print $7, $1, $2}' | sort -rn | head -20
# Kill or restart that process to release the space
```
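Both block usage and inode usage can be watched with the same small filter, because `df -P` and `df -Pi` print the percentage in the fifth column and the mount point in the sixth. A minimal sketch, assuming a hypothetical function `over_threshold` that reads `df` output on standard input, for example `df -P | over_threshold 90`. Mount points that contain spaces are not handled.

```bash
#!/usr/bin/env bash
# over_threshold is a hypothetical helper, shown only as a sketch.
# Reads `df -P` or `df -Pi` output on stdin and prints "<mount> <pct>%"
# for every filesystem at or above the given percentage.
over_threshold() {
  local threshold="$1"
  awk -v limit="$threshold" '
    NR > 1 {
      pct = $5
      sub(/%$/, "", pct)
      # Skip rows such as "-" where no percentage is reported.
      if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0) print $6, pct "%"
    }'
}
```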
---

## Anti-Patterns

| Anti-pattern | Why it's harmful | Fix |
|---|---|---|
| Running the application as root | Root process compromise = full system compromise | Create a dedicated low-privilege service user; use |
| No swap configured | OOM killer terminates processes without warning on memory spikes | Create and mount |
| No log rotation | Logs fill the disk; service crashes with no space left | Add |
| Timezone not set (default UTC may be fine but implicit) | Log timestamps ambiguous when comparing with client logs | Run |
| Password SSH auth enabled | Brute-force attacks succeed with weak passwords | |
| | Direct root login bypasses audit trail | |
| Not using | Security patches not applied; server stays vulnerable | Enable and configure |
| ulimit not raised for app user | App hits default 1024 FD limit → "too many open files" errors under load | Set |
| Ignoring | Disk reports free space but writes fail because inodes exhausted | Monitor inode usage: |
| Global | Every user and process can write to the directory, so data can be destroyed | Identify which user needs access and use |
## Troubleshooting

| Symptom | Likely cause | Diagnostic / Fix |
|---|---|---|
| Disk full | Logs, core dumps, or deleted-but-open files | |
| "Too many open files" | Process hit FD limit ( | |
| "cannot set locale" | Missing locale package or LANG mismatch | |
| Time drift / NTP not working | chrony not running, or firewall blocking UDP 123 | |
| | User in sudoers but no NOPASSWD rule | Add |
| Can't SSH after sshd_config change | Syntax error or PermitRootLogin disabled with no other user | Always test: |
| unattended-upgrades not running | Service disabled or config typo | |
| Swap not being used | | |
| cron job not running | Permission error, wrong PATH, or no output | Check |
| resize2fs "Device or resource busy" | Trying to resize a mounted ext2/3 filesystem | ext4 supports online resize; for others, boot from rescue mode |
## Essential Commands Cheat-Sheet

```bash
# Process management
ps aux --sort=-%mem | head -20    # Top memory consumers
top -b -n1 | head -20             # Snapshot of all processes
kill -9 <pid>                     # Force-kill
pkill -u deploy                   # Kill all processes of a user

# Network
ss -tlnp          # TCP listening sockets with PID
ss -s             # Socket summary
netstat -tulpn    # Alternative (older)
ip addr           # Interface addresses
ip route          # Routing table

# Memory
free -h           # RAM and swap usage
vmstat -s         # Detailed memory stats
cat /proc/meminfo

# Load average
uptime
w
sar -u 1 5        # CPU utilisation every 1s for 5 samples (sysstat package)

# Who is logged in
who
last | head -20   # Login history

# Find recently modified files
find /etc -newer /etc/passwd -type f 2>/dev/null | head -20

# Check open ports
ss -tlnp | awk 'NR>1 {print $4, $6}'

# Check failed login attempts
journalctl -u ssh --since "1 hour ago" | grep Failed
grep 'Failed password' /var/log/auth.log | tail -20
```
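Raw `Failed password` lines are easier to act on once they are grouped by source address. A short sketch, assuming the usual OpenSSH message shape (`Failed password for <user> from <ip> port <n> ssh2`). `failed_logins_by_ip` is a hypothetical helper that reads log lines on standard input, for example `journalctl -u ssh --since "1 hour ago" | failed_logins_by_ip`.

```bash
#!/usr/bin/env bash
# failed_logins_by_ip is a hypothetical helper, shown only as a sketch.
# Reads sshd log lines on stdin and prints "<count> <ip>", most frequent first.
failed_logins_by_ip() {
  awk '
    /Failed password/ {
      # The address is the field that follows the word "from".
      for (i = 1; i < NF; i++) {
        if ($i == "from") { count[$(i + 1)]++; break }
      }
    }
    END { for (ip in count) print count[ip], ip }
  ' | sort -k1,1nr -k2,2
}
```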