linux-server

Linux Server Administration (Ubuntu / Debian)

New Server Setup Script

Save as `/root/setup-server.sh` on the new machine, run once as root, then delete it.

```bash
#!/usr/bin/env bash
# =============================================================================
# Production Server Initial Setup: Ubuntu 22.04 / 24.04 LTS
# Run as root immediately after first login.
# =============================================================================
set -euo pipefail

# ---------------------------------------------------------------------------
# 1. Variables: edit before running
# ---------------------------------------------------------------------------
NEW_USER="deploy"
SSH_PUBLIC_KEY="ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA... user@workstation"
HOSTNAME="app-prod-01"
TIMEZONE="UTC"            # Or e.g. "America/New_York", "Europe/London"
SWAP_SIZE_GB=4            # Set to 0 to skip swap creation

# ---------------------------------------------------------------------------
# 2. Update package lists and apply security patches
# ---------------------------------------------------------------------------
apt-get update -qq
DEBIAN_FRONTEND=noninteractive apt-get upgrade -y -q

# ---------------------------------------------------------------------------
# 3. Set hostname and /etc/hosts
# ---------------------------------------------------------------------------
hostnamectl set-hostname "$HOSTNAME"
# Ensure 127.0.1.1 resolves the hostname (required by some software)
if ! grep -q "$HOSTNAME" /etc/hosts; then
    echo "127.0.1.1 $HOSTNAME" >> /etc/hosts
fi

# ---------------------------------------------------------------------------
# 4. Set timezone
# ---------------------------------------------------------------------------
timedatectl set-timezone "$TIMEZONE"

# ---------------------------------------------------------------------------
# 5. Install essential packages
# ---------------------------------------------------------------------------
DEBIAN_FRONTEND=noninteractive apt-get install -y -q \
    curl wget git vim htop iotop net-tools dnsutils \
    unzip build-essential ca-certificates gnupg lsb-release \
    fail2ban ufw chrony logrotate unattended-upgrades apt-listchanges

# ---------------------------------------------------------------------------
# 6. Configure NTP with chrony
# ---------------------------------------------------------------------------
systemctl enable --now chrony
# Verify synchronisation (should show * on the active source)
chronyc sources -v || true

# ---------------------------------------------------------------------------
# 7. Create sudo user with SSH key access
# ---------------------------------------------------------------------------
if ! id -u "$NEW_USER" &>/dev/null; then
    adduser --disabled-password --gecos "" "$NEW_USER"
fi
usermod -aG sudo "$NEW_USER"

# Set up authorized_keys
SSH_DIR="/home/${NEW_USER}/.ssh"
mkdir -p "$SSH_DIR"
echo "$SSH_PUBLIC_KEY" > "${SSH_DIR}/authorized_keys"
chmod 700 "$SSH_DIR"
chmod 600 "${SSH_DIR}/authorized_keys"
chown -R "${NEW_USER}:${NEW_USER}" "$SSH_DIR"

# ---------------------------------------------------------------------------
# 8. Harden SSH
# ---------------------------------------------------------------------------
cp /etc/ssh/sshd_config "/etc/ssh/sshd_config.bak.$(date +%F)"
# sshd keeps the first value it reads for each keyword, so check that no
# earlier file in sshd_config.d (e.g. one written by cloud-init) sets these.
cat > /etc/ssh/sshd_config.d/99-hardening.conf <<'SSHD_CONF'
PermitRootLogin no
PasswordAuthentication no
ChallengeResponseAuthentication no
PubkeyAuthentication yes
X11Forwarding no
AllowTcpForwarding no
MaxAuthTries 3
LoginGraceTime 30
ClientAliveInterval 300
ClientAliveCountMax 2
SSHD_CONF
sshd -t                   # Abort (set -e) if the configuration does not parse
systemctl reload ssh      # The unit is named "ssh" on Debian/Ubuntu

# ---------------------------------------------------------------------------
# 9. Configure unattended security upgrades
# ---------------------------------------------------------------------------
cat > /etc/apt/apt.conf.d/20auto-upgrades <<'APT_CONF'
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
APT::Periodic::AutocleanInterval "7";
APT_CONF

cat > /etc/apt/apt.conf.d/50unattended-upgrades <<'APT_UU'
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
    "${distro_id}ESMApps:${distro_codename}-apps-security";
    "${distro_id}ESM:${distro_codename}-infra-security";
};
Unattended-Upgrade::Mail "root";
Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
Unattended-Upgrade::Automatic-Reboot "false";
APT_UU

# ---------------------------------------------------------------------------
# 10. Create swap file
# ---------------------------------------------------------------------------
if [[ "$SWAP_SIZE_GB" -gt 0 ]] && ! swapon --show | grep -q /swapfile; then
    fallocate -l "${SWAP_SIZE_GB}G" /swapfile
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile
    echo '/swapfile none swap sw 0 0' >> /etc/fstab
    # Reduce swappiness for applications (default is 60)
    echo 'vm.swappiness=10' > /etc/sysctl.d/60-swap.conf
    sysctl -p /etc/sysctl.d/60-swap.conf
fi

# ---------------------------------------------------------------------------
# 11. Apply kernel tuning (idempotent; see the sysctl.d section below)
# ---------------------------------------------------------------------------
cat > /etc/sysctl.d/99-production.conf <<'SYSCTL'
# Network performance
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65536
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1

# File system
fs.file-max = 2097152
fs.inotify.max_user_watches = 524288

# Virtual memory
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
SYSCTL
sysctl --system

echo ""
echo "=== Setup complete. Log in as '${NEW_USER}' via SSH key before closing this session. ==="
```
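
The variables in section 1 are easy to mistype, and the script only notices when a later step fails. A minimal pre-flight sketch follows. The function name `validate_setup_vars` and the specific rules it enforces are illustrative assumptions, not part of the original script.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the variables in section 1.
# Prints one line per problem and exits non-zero if any check fails.
validate_setup_vars() (
    new_user=$1 ssh_key=$2 host=$3 swap_gb=$4
    rc=0

    # User name: lowercase letter or underscore first, then a-z, 0-9, _ or -
    case $new_user in
        ''|[!a-z_]*|*[!a-z0-9_-]*) echo "invalid user name: '$new_user'"; rc=1 ;;
    esac

    # Public key: a known key type, a space, then the key material
    case $ssh_key in
        'ssh-ed25519 '?*|'ssh-rsa '?*|'ecdsa-sha2-'*' '?*) ;;
        *) echo "SSH_PUBLIC_KEY does not look like an OpenSSH public key"; rc=1 ;;
    esac

    # Hostname: letters, digits and hyphens; no leading or trailing hyphen
    case $host in
        ''|-*|*-|*[!A-Za-z0-9-]*) echo "invalid hostname: '$host'"; rc=1 ;;
    esac

    # Swap size: non-negative integer
    case $swap_gb in
        ''|*[!0-9]*) echo "SWAP_SIZE_GB must be a non-negative integer"; rc=1 ;;
    esac

    exit "$rc"
)
```

Calling `validate_setup_vars "$NEW_USER" "$SSH_PUBLIC_KEY" "$HOSTNAME" "$SWAP_SIZE_GB" || exit 1` directly after the variable block would stop the run before anything on the system is changed.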

---

sysctl.d Production Tuning Reference

The file above (`/etc/sysctl.d/99-production.conf`) covers the most impactful parameters. Apply changes live without rebooting:

```bash
sudo sysctl --system                               # Reload all sysctl configuration files, including /etc/sysctl.d/
sudo sysctl -p /etc/sysctl.d/99-production.conf    # Reload a specific file

# Verify a parameter
sudo sysctl net.core.somaxconn
```
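
Checking parameters one at a time gets tedious once the file grows. The sketch below compares every setting in a sysctl.d file with the live kernel value. Both helper names (`sysctl_conf_pairs`, `sysctl_drift`) are hypothetical, and the parser assumes one `key = value` per line with comments on their own lines, as in the file above.

```shell
#!/bin/sh
# Print "key=value" for each setting in a sysctl.d file.
# Comment lines (# or ;) and blank lines are dropped; whitespace is trimmed.
sysctl_conf_pairs() {
    sed -e '/^[[:space:]]*[#;]/d' -e '/^[[:space:]]*$/d' \
        -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        -e 's/[[:space:]]*=[[:space:]]*/=/' "$1"
}

# Print every setting whose live value differs from the file.
# Multi-value settings are compared after collapsing whitespace to single spaces.
sysctl_drift() {
    sysctl_conf_pairs "$1" | while IFS='=' read -r key want; do
        have=$(sysctl -n "$key" 2>/dev/null | tr -s '[:space:]' ' ' | sed 's/ $//')
        [ "$have" = "$want" ] || echo "$key: file=$want live=$have"
    done
}
```

Running `sysctl_drift /etc/sysctl.d/99-production.conf` after `sysctl --system` should print nothing if every value was applied.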

---

/etc/security/limits.conf — High-Concurrency App

```
# /etc/security/limits.conf
# Changes take effect on next login (not on running processes).

# Application service user (e.g., node app running as 'deploy')
deploy  soft  nofile  65535
deploy  hard  nofile  65535
deploy  soft  nproc   8192
deploy  hard  nproc   8192

# Root (required if app runs as root; avoid this)
root    soft  nofile  65535
root    hard  nofile  65535

# Wildcard fallback for all other users
*       soft  nofile  65535
*       hard  nofile  65535
```

systemd services normally do not pass through PAM, so `/etc/security/limits.conf` does not apply to them and `LimitNOFILE` in the unit file is what takes effect. Set both, so that login sessions and services get the same limit.

Verify effective limits of a running process:

```bash
# PID of your app:
cat /proc/$(pgrep -o node)/limits
```
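
For monitoring or a deploy check, the soft limit can be pulled out of that file as a single number. The helper below is a small sketch; the name `soft_nofile` is an assumption, and it relies on the column layout of `/proc/<pid>/limits`, where the soft limit is the fourth whitespace-separated field of the `Max open files` row.

```shell
#!/bin/sh
# Print the soft "Max open files" limit from a /proc/<pid>/limits file.
# With no argument, it reads the limits of the awk process itself, which
# inherits them from the calling shell.
soft_nofile() {
    awk '/^Max open files/ { print $4 }' "${1:-/proc/self/limits}"
}
```

For example, `soft_nofile "/proc/$(pgrep -o node)/limits"` should print `65535` once the limits above are in effect for the service.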

---

Package Management

```bash
# Update package lists
sudo apt-get update

# Upgrade all packages (non-interactive)
sudo DEBIAN_FRONTEND=noninteractive apt-get upgrade -y

# Install specific package
sudo apt-get install -y nginx

# Remove package and its config files
sudo apt-get purge -y apache2 && sudo apt-get autoremove -y

# Hold a package at its current version (prevent unattended upgrades)
sudo apt-mark hold nginx
sudo apt-mark unhold nginx
sudo apt-mark showhold

# List installed packages matching a name
dpkg -l | grep nginx

# Show available versions
apt-cache policy nginx

# Find which package provides a file
dpkg -S /usr/sbin/nginx
apt-file search /usr/sbin/nginx    # needs apt-file package
```
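
The `Installed:` and `Candidate:` lines of `apt-cache policy` are enough to tell whether an upgrade is pending for one package. The following sketch reads that output on stdin; the function name `policy_status` and its three output strings are assumptions made for this example.

```shell
#!/bin/sh
# Read `apt-cache policy <pkg>` output on stdin and summarise it as
# "not-installed", "up-to-date", or "upgrade-available <old> -> <new>".
policy_status() {
    awk '
        $1 == "Installed:" { installed = $2 }
        $1 == "Candidate:" { candidate = $2 }
        END {
            if (installed == "" || installed == "(none)")
                print "not-installed"
            else if (installed == candidate)
                print "up-to-date"
            else
                print "upgrade-available " installed " -> " candidate
        }'
}
```

Usage: `apt-cache policy nginx | policy_status`. A held package still reports a pending upgrade here, because the hold does not change the candidate version.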

---

File Permissions

```bash
# Octal mode: one digit each for user, group, others; r=4, w=2, x=1
# (symbolic mode uses the letters u=user, g=group, o=others, a=all)
chmod 755 /var/www/myapp            # rwxr-xr-x: directory traversable by all
chmod 644 /var/www/myapp/app.js     # rw-r--r--: file readable by all
chmod 600 /etc/app/secret.env       # rw-------: private config file
chmod -R 750 /opt/myapp             # Recursive; all files/dirs get 750

# Ownership
chown www-data:www-data /var/www/myapp
chown -R deploy:deploy /opt/myapp

# umask: determines default permissions for new files
# Default 022 → new files 644, new dirs 755
# For private app dirs, set UMask=0027 in the [Service] section of the systemd unit

# ACL: grant extra access without changing ownership
sudo apt-get install -y acl
sudo setfacl -m u:deploy:rwX /var/log/myapp       # User deploy gets rwX
sudo setfacl -m g:developers:rX /var/log/myapp    # Group developers read only
sudo setfacl -d -m u:deploy:rwX /var/log/myapp    # Default ACL for new files in dir
sudo getfacl /var/log/myapp                       # View current ACL
```
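
The octal digits and the umask rule above are plain bit arithmetic, which a few lines of shell can make concrete. This is an illustrative sketch only: `octal_to_symbolic` and `default_modes` are hypothetical helper names, and special bits (setuid, setgid, sticky) are ignored.

```shell
#!/bin/sh
# Convert a three-digit octal mode (e.g. 750) to its rwx string (rwxr-x---).
octal_to_symbolic() (
    mode=$1
    case $mode in
        [0-7][0-7][0-7]) ;;
        *) echo "expected three octal digits, got '$mode'" >&2; exit 1 ;;
    esac
    out=
    for digit in $(echo "$mode" | sed 's/./& /g'); do
        [ $(( $digit & 4 )) -ne 0 ] && out="${out}r" || out="${out}-"
        [ $(( $digit & 2 )) -ne 0 ] && out="${out}w" || out="${out}-"
        [ $(( $digit & 1 )) -ne 0 ] && out="${out}x" || out="${out}-"
    done
    echo "$out"
)

# Show the modes new files and directories get under a given umask:
# files start from 666, directories from 777, and the umask bits are cleared.
default_modes() {
    printf 'files %03o, dirs %03o\n' $(( 0666 & ~0$1 )) $(( 0777 & ~0$1 ))
}
```

`default_modes 027` shows why that umask suits private application directories: group members can read, and others get nothing.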

---

logrotate — Application Log Config

Create `/etc/logrotate.d/myapp`:

```
/var/log/myapp/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 deploy www-data
    sharedscripts
    postrotate
        # Signal app to reopen log file handles
        systemctl kill -s USR1 myapp.service || true
    endscript
}
```

```bash
# Test logrotate config without actually rotating
sudo logrotate --debug /etc/logrotate.d/myapp

# Force rotation immediately (useful after config changes)
sudo logrotate --force /etc/logrotate.d/myapp
```

---

journald Retention Limits

```
# /etc/systemd/journald.conf.d/size-limits.conf
# Comments must sit on their own lines; systemd does not accept trailing comments.
[Journal]
# Maximum total disk usage for persistent journals
SystemMaxUse=2G
# Minimum free space to leave on the volume
SystemKeepFree=500M
# Rotate journal files older than 1 month
MaxFileSec=1month
# Delete journal entries older than 3 months
MaxRetentionSec=3months
```

```bash
sudo systemctl restart systemd-journald

# Show current journal disk usage
journalctl --disk-usage

# Manually vacuum old entries
sudo journalctl --vacuum-time=90d
sudo journalctl --vacuum-size=1G
```

---

User Management

```bash
# Create a new system user (no login shell, for running services)
sudo adduser --system --group --no-create-home --shell /usr/sbin/nologin appuser

# Create a regular user interactively
sudo adduser alice

# Add to a group
sudo usermod -aG sudo alice      # Add alice to sudo group
sudo usermod -aG docker alice    # Add to docker group (takes effect on next login)

# Show group memberships
groups alice
id alice

# Lock/unlock account (prevents password login)
sudo passwd -l alice
sudo passwd -u alice

# Expire password (force change on next login)
sudo chage -d 0 alice

# Edit sudoers safely (always use visudo)
sudo visudo

# Grant passwordless sudo for a specific command only (written to /etc/sudoers.d/deploy-restart)
echo 'deploy ALL=(ALL) NOPASSWD: /bin/systemctl restart myapp' \
    | sudo tee /etc/sudoers.d/deploy-restart
sudo chmod 440 /etc/sudoers.d/deploy-restart
```
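
Writing a drop-in with `tee` skips the syntax check that `visudo` performs, and sudo ignores drop-in files whose names contain a `.` or end in `~`. The sketch below guards against both. The function names are hypothetical, and `install_sudoers_dropin` must run as root and assumes `visudo` and `install` are on the PATH.

```shell
#!/bin/sh
# Return 0 if sudo would read a file with this name from /etc/sudoers.d.
# sudo skips names that contain "." or end in "~".
sudoers_dropin_name_ok() {
    case $1 in
        ''|*.*|*'~') return 1 ;;
        *)           return 0 ;;
    esac
}

# Validate a single rule with visudo before installing it as a drop-in.
install_sudoers_dropin() (
    name=$1 rule=$2
    sudoers_dropin_name_ok "$name" || { echo "bad drop-in name: $name" >&2; exit 1; }
    tmp=$(mktemp) || exit 1
    printf '%s\n' "$rule" > "$tmp"
    if visudo -cf "$tmp" >/dev/null; then
        install -m 0440 -o root -g root "$tmp" "/etc/sudoers.d/$name"
    else
        echo "syntax check failed; nothing installed" >&2
        rm -f "$tmp"
        exit 1
    fi
    rm -f "$tmp"
)
```

With this in place, the rule above could be installed as `install_sudoers_dropin deploy-restart 'deploy ALL=(ALL) NOPASSWD: /bin/systemctl restart myapp'`.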

---

cron Scheduling

```bash
# Edit crontab for current user
crontab -e

# Edit crontab for another user
sudo crontab -u deploy -e

# Cron expression format:
# ┌── minute (0-59)
# │ ┌── hour (0-23)
# │ │ ┌── day of month (1-31)
# │ │ │ ┌── month (1-12)
# │ │ │ │ ┌── day of week (0=Sun, 7=Sun)
# │ │ │ │ │
# * * * * * command

0 3 * * * /opt/myapp/scripts/backup.sh >> /var/log/myapp/backup.log 2>&1
*/15 * * * * /usr/bin/healthcheck.sh
0 0 1 * * /usr/sbin/certbot renew --quiet

# System-wide cron directories (plain executable scripts, no crontab syntax;
# run-parts skips file names containing a dot, so drop the .sh extension)
ls /etc/cron.{hourly,daily,weekly,monthly}/

# Run a one-off command at a specific time
echo "/opt/scripts/maintenance.sh" | at 02:00 tomorrow
atq       # List pending at jobs
atrm 3    # Remove job #3
```
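
Cron jobs that fail silently are hard to debug, so it helps to record when each run started and how it exited. The wrapper below is a minimal sketch of that idea. `run_logged` is a hypothetical name, and any file it would be sourced from in a crontab is likewise an assumption.

```shell
#!/bin/sh
# Run a command, append its output to a log between start and exit markers,
# and pass the command's exit status back to the caller.
run_logged() (
    log=$1
    shift
    {
        echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] start: $*"
        "$@" && status=0 || status=$?
        echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] exit=$status"
        exit "$status"
    } >> "$log" 2>&1
)
```

A backup job would then call `run_logged /var/log/myapp/backup.log /opt/myapp/scripts/backup.sh`, which keeps the redirect out of the crontab line and makes a non-zero exit visible in the log.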

---

Disk Management

```bash
# Disk space overview
df -hT    # Human-readable, with filesystem type
df -ih    # inode usage (relevant when disk shows space but can't write)

# Find large files / directories
du -sh /var/log/*    # Size of each entry in /var/log
du -ah /var/log | sort -rh | head -20

# Block device layout (partitions, mount points, sizes)
lsblk -f    # With filesystem types and UUIDs
blkid       # UUIDs and filesystem types only

# Mount a new volume
sudo mkfs.ext4 /dev/sdb1
sudo mkdir -p /data
sudo mount /dev/sdb1 /data

# Persist mount in /etc/fstab (use UUID, not device name)
UUID=$(blkid -s UUID -o value /dev/sdb1)
echo "UUID=$UUID /data ext4 defaults,noatime 0 2" | sudo tee -a /etc/fstab
sudo mount -a    # Test fstab without rebooting

# Resize ext4 filesystem after volume expansion (e.g., cloud disk resize)
sudo resize2fs /dev/sda1    # Online resize; no unmount needed on ext4 with kernel ≥ 2.6

# Check filesystem for errors (unmounted)
sudo fsck -n /dev/sdb1    # Dry run, no changes

# Find which process has a deleted file open (common cause of "disk full" despite empty dirs)
sudo lsof | grep '(deleted)' | awk '{print $7, $1, $2}' | sort -rn | head -20
# Kill or restart that process to release the space
```
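
Both `df -P` and `df -iP` print the usage percentage in the fifth column, so one filter can flag block and inode exhaustion alike. The sketch below assumes that POSIX output format and mount points without spaces. The function name `over_threshold` is hypothetical.

```shell
#!/bin/sh
# Read `df -P` or `df -iP` output on stdin and print each mount point whose
# usage is at or above the given percentage.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit) print $6, use "%"
    }'
}
```

Usage: `df -P | over_threshold 90` for disk space and `df -iP | over_threshold 90` for inodes. Empty output means nothing is above the threshold.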

---

Anti-Patterns

| Anti-pattern | Why it's harmful | Fix |
|---|---|---|
| Running the application as root | Root process compromise = full system compromise | Create a dedicated low-privilege service user; use `User=` in systemd unit |
| No swap configured | OOM killer terminates processes without warning on memory spikes | Create and mount `/swapfile`; set `vm.swappiness=10` |
| No log rotation | Logs fill the disk; service crashes with no space left | Add `/etc/logrotate.d/myapp` with `compress`, `rotate 14`, `daily` |
| Timezone not set (default UTC may be fine but implicit) | Log timestamps ambiguous when comparing with client logs | Run `timedatectl set-timezone` on every server |
| Password SSH auth enabled | Brute-force attacks succeed with weak passwords | `PasswordAuthentication no` in sshd_config; key-only access |
| `PermitRootLogin yes` | Direct root login bypasses audit trail | `PermitRootLogin no`; use sudo from named user |
| Not using `unattended-upgrades` | Security patches not applied; server stays vulnerable | Enable and configure `unattended-upgrades` for security repos |
| ulimit not raised for app user | App hits default 1024 FD limit → "too many open files" errors under load | Set `LimitNOFILE` in systemd unit AND in `/etc/security/limits.conf` |
| Ignoring `df -ih` (inode usage) | Disk reports free space but writes fail because inodes exhausted | Monitor inode usage: `df -ih`; clean up small file accumulation |
| Global `chmod -R 777` to "fix permissions" | Every user and process can write to the directory, risking data destruction | Identify which user needs access and use `chown` + `chmod 750` or ACLs |

Troubleshooting

| Symptom | Likely cause | Diagnostic / Fix |
|---|---|---|
| Disk full | Logs, core dumps, or deleted-but-open files | `df -h`; `du -ah / --max-depth=3 \| sort -rh \| head`; `lsof \| grep deleted` |
| "Too many open files" | Process hit FD limit (`ulimit -n`) | `cat /proc/$(pgrep -o myapp)/limits`; raise `LimitNOFILE` in unit file |
| "cannot set locale" | Missing locale package or LANG mismatch | `locale -a`; `sudo locale-gen en_US.UTF-8`; `update-locale LANG=en_US.UTF-8` |
| Time drift / NTP not working | chrony not running, or firewall blocking UDP 123 | `chronyc tracking`; `systemctl status chrony`; check UFW rules for UDP 123 |
| `sudo` asks for password for deploy user | User in sudoers but no NOPASSWD rule | Add `deploy ALL=(ALL) NOPASSWD: ALL` in `/etc/sudoers.d/deploy` (scope it tightly) |
| Can't SSH after sshd_config change | Syntax error or PermitRootLogin disabled with no other user | Always test: `sshd -t`; keep a second SSH session open before reloading |
| unattended-upgrades not running | Service disabled or config typo | `systemctl status unattended-upgrades`; `unattended-upgrade --debug` |
| Swap not being used | `vm.swappiness=0` or swap not mounted | `free -h`; `swapon --show`; `sysctl vm.swappiness` |
| cron job not running | Permission error, wrong PATH, or no output | Check `/var/log/syslog` for cron entries; redirect output to log file; use absolute paths |
| resize2fs "Device or resource busy" | Trying to resize a mounted ext2/3 filesystem | ext4 supports online resize; for others, boot from rescue mode |

Essential Commands Cheat-Sheet

```bash
# Process management
ps aux --sort=-%mem | head -20    # Top memory consumers
top -b -n1 | head -20             # Snapshot of all processes
kill -9 <pid>                     # Force-kill
pkill -u deploy                   # Kill all processes of a user

# Network
ss -tlnp          # TCP listening sockets with PID
ss -s             # Socket summary
netstat -tulpn    # Alternative (older)
ip addr           # Interface addresses
ip route          # Routing table

# Memory
free -h      # RAM and swap usage
vmstat -s    # Detailed memory stats
cat /proc/meminfo

# Load average
uptime
w
sar -u 1 5    # CPU utilisation every 1s for 5 samples (sysstat package)

# Who is logged in
who
last | head -20    # Login history

# Find recently modified files
find /etc -newer /etc/passwd -type f 2>/dev/null | head -20

# Check open ports
ss -tlnp | awk 'NR>1 {print $4, $6}'

# Check failed login attempts
journalctl -u ssh --since "1 hour ago" | grep Failed
grep 'Failed password' /var/log/auth.log | tail -20
```
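
A raw list of failed logins is more useful once grouped by source address. The sketch below counts `Failed password` lines per IP. It assumes the usual sshd message shape (`... from <address> port <n> ...`), and the function name `failed_logins_by_ip` is hypothetical.

```shell
#!/bin/sh
# Read auth-log or journal lines on stdin and print "<count> <address>" for
# each source of failed SSH password attempts, most frequent first.
failed_logins_by_ip() {
    grep 'Failed password' |
        sed -n 's/.* from \([0-9a-fA-F.:]*\) port .*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}
```

Usage: `journalctl -u ssh --since "1 hour ago" | failed_logins_by_ip`, or feed it `/var/log/auth.log` with `failed_logins_by_ip < /var/log/auth.log`.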