NUMA balancing impact on common benchmarks
NUMA balancing can degrade performance on NUMA-based arm64 systems: when tasks migrate between nodes,
their memory accesses can suffer additional latency.
| System | Information |
|---|---|
| Architecture | aarch64 |
| Processor version | Kunpeng 920-6426 |
| CPUs | 128 |
| NUMA nodes | 4 |
| Kernel release | 5.6.0+ |
| Node name | ARMv2-3 |
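
The page does not say how the numa_balancing-ON and numa_balancing-OFF configurations were selected. A minimal sketch, assuming the standard `kernel.numa_balancing` sysctl (exposed as `/proc/sys/kernel/numa_balancing` on kernels built with `CONFIG_NUMA_BALANCING`) was used; the helper names `numa_balancing_state` and `set_numa_balancing` are hypothetical and only for illustration:

```shell
# Assumption: the ON/OFF states map to the kernel.numa_balancing sysctl.
# The path can be overridden through the environment, e.g. for a dry run.
NUMA_BALANCING_FILE="${NUMA_BALANCING_FILE:-/proc/sys/kernel/numa_balancing}"

# Print ON or OFF depending on the current setting (hypothetical helper).
numa_balancing_state() {
    if [ ! -r "$NUMA_BALANCING_FILE" ]; then
        echo "unsupported"
        return 1
    fi
    case "$(cat "$NUMA_BALANCING_FILE")" in
        0) echo "OFF" ;;
        *) echo "ON" ;;
    esac
}

# Write 0 (disable) or 1 (enable); needs root on a real system (hypothetical helper).
set_numa_balancing() {
    echo "$1" > "$NUMA_BALANCING_FILE"
}
```

On a real system, running `set_numa_balancing 0` as root before a benchmark and `set_numa_balancing 1` afterwards would switch between the two states; `sysctl -w kernel.numa_balancing=0` should have the same effect.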
Test results
PerfBenchSchedPipe
perf bench -f simple sched pipe
| Test | Result |
|---|---|
| numa_balancing-ON | 10.012 (usecs/op) |
| numa_balancing-OFF | 10.509 (usecs/op) |
PerfBenchSchedMessaging
perf bench -f simple sched messaging -l 10000
| Test | Result |
|---|---|
| numa_balancing-ON | 6.417 (Sec) |
| numa_balancing-OFF | 6.494 (Sec) |
PerfBenchMemMemset
perf bench -f simple mem memset -s 4GB -l 5 -f default
| Test | Result |
|---|---|
| numa_balancing-ON | 17.438783330964565 (GB/sec) |
| numa_balancing-OFF | 17.63163114627642 (GB/sec) |
PerfBenchFutexWake
perf bench -f simple futex wake -s -t 1024 -w 1
| Test | Result |
|---|---|
| numa_balancing-ON | 9.2742 (ms) |
| numa_balancing-OFF | 9.2178 (ms) |
SysBenchCpu
sysbench cpu --time=10 --threads=64 --cpu-max-prime=10000 run
| Test | Result |
|---|---|
| numa_balancing-ON | 214960.28 (Events/sec) |
| numa_balancing-OFF | 214965.55 (Events/sec) |
SysBenchMemory
sysbench memory --memory-access-mode=rnd --threads=64 run
| Test | Result |
|---|---|
| numa_balancing-ON | 1645 (MB/s) |
| numa_balancing-OFF | 1959 (MB/s) |
SysBenchThreads
sysbench threads --threads=64 run
| Test | Result |
|---|---|
| numa_balancing-ON | 4604 (Events/sec) |
| numa_balancing-OFF | 5390 (Events/sec) |
SysBenchMutex
sysbench mutex --mutex-num=1 --threads=512 run
| Test | Result |
|---|---|
| numa_balancing-ON | 33.2165 (Sec) |
| numa_balancing-OFF | 32.1088 (Sec) |
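
To make the ON and OFF rows above easier to compare, the relative change of the ON result against the OFF result can be computed for each test. The following is only a sketch: `pct_change` is a hypothetical helper, not part of `perf` or `sysbench`, and the sign has to be read per metric (for throughput figures such as MB/s or Events/sec a negative value means ON is slower; for time figures such as Sec, ms or usecs/op a negative value means ON is faster).

```shell
# Hypothetical helper: percent change of the ON result relative to the OFF result.
# Usage: pct_change <on_value> <off_value>
pct_change() {
    awk -v on="$1" -v off="$2" 'BEGIN { printf "%.1f\n", (on - off) / off * 100 }'
}

# SysBenchMemory, values taken from the table above (MB/s, higher is better).
pct_change 1645 1959    # prints -16.0

# SysBenchThreads, values taken from the table above (Events/sec, higher is better).
pct_change 4604 5390    # prints -14.6
```

Applied this way, the two largest differences in these runs are SysBenchMemory and SysBenchThreads, where the ON configuration is roughly 16% and 15% lower than OFF.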