etcd Performance Testing Guide

Posted on 2023-05-03, filed under etcd

Overview

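Kubernetes uses etcd as the backing data store for the cluster. etcd is a distributed key-value store that relies on the Raft consensus algorithm to keep the data the cluster needs strongly consistent. The performance of an etcd cluster depends heavily on the performance of its storage: a fast disk is the most critical factor for etcd performance and stability, while a slow disk increases etcd request latency and eventually affects the whole Kubernetes cluster. For production environments, we strongly recommend backing etcd with high-performance SSDs (more than 500 sequential IOPS).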

To find out whether a disk is fast enough for etcd, we can benchmark it with fio.

Prerequisites

Install fio (version 3.5 or later).
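A quick way to check the version requirement in a script is to compare the number that `fio --version` prints (for example `fio-3.28`) against 3.5. The sketch below is only an illustration: `version_ge` is a hypothetical helper name, and it assumes a `sort` that supports the `-V` (version sort) flag, as GNU coreutils does.

```shell
# Hypothetical helper, not part of fio: succeeds when version $1 >= version $2.
# Assumes `sort -V` (version sort) is available, e.g. from GNU coreutils.
version_ge() {
  # After a version sort the smaller version comes first, so $1 >= $2
  # exactly when that first line equals $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

It could be used as `version_ge "$(fio --version | sed 's/^fio-//')" 3.5 || echo "fio is older than 3.5"`. A version sort matters here because a plain numeric comparison would treat 3.28 as smaller than 3.5.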

Performance test

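Run the test with the fio command. The options make fio write 100 MB sequentially in 2300-byte blocks and call fdatasync after every write. Create the target directory (test-data here) on the disk you want to measure before running it.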

fio --rw=write --ioengine=sync --fdatasync=1 --directory=test-data --size=100m --bs=2300 --name=mytest

This is the result from a test on my local server; the disk is an NVMe SSD.

mytest: (g=0): rw=write, bs=(R) 2300B-2300B, (W) 2300B-2300B, (T) 2300B-2300B, ioengine=sync, iodepth=1
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=2054KiB/s][w=914 IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=1): err= 0: pid=120887: Mon Jun 23 23:49:44 2025
  write: IOPS=909, BW=2043KiB/s (2092kB/s)(100.0MiB/50122msec); 0 zone resets
    clat (usec): min=2, max=8967, avg=354.58, stdev=336.18
     lat (usec): min=3, max=8967, avg=354.86, stdev=336.19
    clat percentiles (usec):
     |  1.00th=[    5],  5.00th=[    5], 10.00th=[    6], 20.00th=[    6],
     | 30.00th=[    9], 40.00th=[   13], 50.00th=[  506], 60.00th=[  553],
     | 70.00th=[  594], 80.00th=[  635], 90.00th=[  701], 95.00th=[  758],
     | 99.00th=[  971], 99.50th=[ 1139], 99.90th=[ 1991], 99.95th=[ 2343],
     | 99.99th=[ 7373]
   bw (  KiB/s): min= 1118, max= 2609, per=100.00%, avg=2043.62, stdev=279.82, samples=100
   iops        : min=  498, max= 1162, avg=910.08, stdev=124.56, samples=100
  lat (usec)   : 4=0.64%, 10=36.38%, 20=5.44%, 50=1.22%, 100=0.06%
  lat (usec)   : 250=0.07%, 500=5.55%, 750=45.01%, 1000=4.75%
  lat (msec)   : 2=0.78%, 4=0.08%, 10=0.02%
  fsync/fdatasync/sync_file_range:
    sync (usec): min=285, max=74171, avg=740.77, stdev=3375.04
    sync percentiles (usec):
     |  1.00th=[  388],  5.00th=[  424], 10.00th=[  449], 20.00th=[  482],
     | 30.00th=[  506], 40.00th=[  529], 50.00th=[  553], 60.00th=[  578],
     | 70.00th=[  611], 80.00th=[  644], 90.00th=[  717], 95.00th=[  799],
     | 99.00th=[ 1205], 99.50th=[ 1893], 99.90th=[71828], 99.95th=[71828],
     | 99.99th=[71828]
  cpu          : usr=0.98%, sys=5.16%, ctx=160060, majf=1, minf=16
  IO depths    : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,45590,0,0 short=45590,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=2043KiB/s (2092kB/s), 2043KiB/s-2043KiB/s (2092kB/s-2092kB/s), io=100.0MiB (105MB), run=50122-50122msec

Disk stats (read/write):
  vda: ios=25460/92028, merge=30/922, ticks=15353/35986, in_queue=54818, util=98.95%

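In the results, focus on the 99th percentile in the fsync/fdatasync section (the 99.00th entry under "sync percentiles"). In my test it is 1205 usec, that is, 1.205 ms. The etcd documentation recommends that, for storage to be fast enough, the 99th percentile of the fdatasync calls made when writing to the WAL file be less than 10 ms. If the 99th percentile is close to or above 10 ms, the current storage cannot meet etcd's read/write performance requirements.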
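The check described above can also be scripted. The following is a minimal sketch, assuming the human-readable report format shown in this article: it reads the fio report on stdin, picks the 99.00th entry under "sync percentiles", converts it to milliseconds according to the unit in the section header, and compares it with the 10 ms limit. The function names `sync_p99_ms` and `sync_p99_ok` are hypothetical helpers, not part of fio or etcd.

```shell
# Hypothetical helper: print the 99th percentile of the fsync/fdatasync
# latency, in milliseconds, from a fio text report read on stdin.
sync_p99_ms() {
  awk '
    # The section header carries the unit, e.g. "sync percentiles (usec):".
    /sync percentiles \(/ {
      in_sync = 1
      unit = $0
      sub(/.*\(/, "", unit)
      sub(/\).*/, "", unit)
      next
    }
    # Only look at 99.00th after the sync header, so the clat table is skipped.
    in_sync && /99\.00th=\[/ {
      value = $0
      sub(/.*99\.00th=\[ */, "", value)
      sub(/\].*/, "", value)
      if (unit == "nsec") value /= 1000000
      else if (unit == "usec") value /= 1000
      printf "%.3f\n", value
      exit
    }
  '
}

# Hypothetical helper: exit status 0 when the p99 is below the limit in ms
# (first argument, 10 by default), 1 when it is not, 2 when nothing was found.
sync_p99_ok() {
  p99=$(sync_p99_ms)
  [ -n "$p99" ] || return 2
  awk -v v="$p99" -v limit="${1:-10}" 'BEGIN { exit !(v + 0 < limit + 0) }'
}
```

For the report in this article, `sync_p99_ms` prints 1.205. One possible use is `fio --rw=write --ioengine=sync --fdatasync=1 --directory=test-data --size=100m --bs=2300 --name=mytest | sync_p99_ok && echo "p99 fdatasync latency is below 10 ms"`.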

Other recommendations

Set the etcd disk I/O priority

ionice -c2 -n0 -p $(pgrep -x etcd)

etcd parameter tuning

## rke
services:
  etcd:
    extra_args:
      # Set the space quota to $((6*1024*1024*1024)) bytes; the default is 2 GB and the suggested maximum is 8 GB
      quota-backend-bytes: '6442450944'
      auto-compaction-mode: periodic
      auto-compaction-retention: 60m
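      snapshot-count: 100000 # By default a snapshot is written to disk only after 100000 transactions; until then the data is kept in memory. If the node is short on memory you can lower this value, but that makes snapshots more frequent and increases the disk load. default: "100000"
      max-request-bytes: 104857600 # The default is 1.5 MiB, see https://etcd.io/docs/v3.3/dev-guide/limit/#request-size-limit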
      log-level: info # supports debug, info, warn, error, panic, or fatal.
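The byte values in this configuration are plain powers-of-1024 multiples, so they are easy to derive or double-check with shell arithmetic. A small sketch, where `gib_to_bytes` and `mib_to_bytes` are hypothetical helper names introduced only for illustration:

```shell
# Hypothetical helpers: convert GiB / MiB into the byte counts used above.
gib_to_bytes() { echo $(( $1 * 1024 * 1024 * 1024 )); }
mib_to_bytes() { echo $(( $1 * 1024 * 1024 )); }

gib_to_bytes 6     # quota-backend-bytes   -> 6442450944
mib_to_bytes 100   # max-request-bytes     -> 104857600
```

Computing the values this way makes it easier to adjust the quota (for example to 4 or 8 GiB) without miscounting digits.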

References:

https://etcd.io/docs/v3.4/op-guide/hardware/#disks

https://github.com/etcd-io/etcd/blob/release-3.4/Documentation/faq.md

https://www.suse.com/support/kb/doc/?id=000020100