NUMA balancing impact on common benchmarks
NUMA balancing can degrade performance on NUMA-based arm64 systems: when tasks migrate to another node,
their memory accesses can incur additional latency until the memory they use is local again.
| System | Information |
|---|---|
| Architecture | aarch64 |
| Processor version | Kunpeng 920-6426 |
| CPUs | 128 |
| NUMA nodes | 4 |
| Kernel release | 5.6.0+ |
| Node name | ARMv2-3 |
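The results on this page are labelled numa_balancing-ON and numa_balancing-OFF, but the page does not record how the setting was switched. On Linux kernels built with CONFIG_NUMA_BALANCING, automatic NUMA balancing is exposed through the kernel.numa_balancing sysctl, that is /proc/sys/kernel/numa_balancing (0 = disabled, 1 = enabled). The following is a minimal sketch, assuming that sysctl is the control behind the ON/OFF labels, of how the state could be read and recorded next to each run. The helper name `read_numa_balancing` is hypothetical and not part of any tool used here.

```python
from pathlib import Path

# Standard procfs location of the kernel.numa_balancing sysctl.
# Assumption: this is the switch behind the ON/OFF labels on this page.
NUMA_BALANCING_PATH = Path("/proc/sys/kernel/numa_balancing")


def read_numa_balancing(path=NUMA_BALANCING_PATH):
    """Return the integer value of the sysctl file, or None if unavailable.

    0 means automatic NUMA balancing is disabled, 1 means it is enabled.
    None is returned when the file is missing (for example a kernel built
    without CONFIG_NUMA_BALANCING), unreadable, or not an integer.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


state = read_numa_balancing()
label = {0: "numa_balancing-OFF", 1: "numa_balancing-ON"}.get(state, "unknown")
print(f"current setting: {label}")
```

Changing the setting requires root, for example by writing 0 or 1 to the same file; the sketch only reads the value so that it can be logged with each result.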
Test results
PerfBenchSchedPipe
perf bench -f simple sched pipe
| Test | Result |
|---|---|
| numa_balancing-ON | 10.012 (usecs/op) |
| numa_balancing-OFF | 10.509 (usecs/op) |
PerfBenchSchedMessaging
perf bench -f simple sched messaging -l 10000
| Test | Result |
|---|---|
| numa_balancing-ON | 6.417 (Sec) |
| numa_balancing-OFF | 6.494 (Sec) |
PerfBenchMemMemset
perf bench -f simple mem memset -s 4GB -l 5 -f default
| Test | Result |
|---|---|
| numa_balancing-ON | 17.438783330964565 (GB/sec) |
| numa_balancing-OFF | 17.63163114627642 (GB/sec) |
PerfBenchFutexWake
perf bench -f simple futex wake -s -t 1024 -w 1
| Test | Result |
|---|---|
| numa_balancing-ON | 9.2742 (ms) |
| numa_balancing-OFF | 9.2178 (ms) |
SysBenchCpu
sysbench cpu --time=10 --threads=64 --cpu-max-prime=10000 run
| Test | Result |
|---|---|
| numa_balancing-ON | 214960.28 (Events/sec) |
| numa_balancing-OFF | 214965.55 (Events/sec) |
SysBenchMemory
sysbench memory --memory-access-mode=rnd --threads=64 run
| Test | Result |
|---|---|
| numa_balancing-ON | 1645 (MB/s) |
| numa_balancing-OFF | 1959 (MB/s) |
SysBenchThreads
sysbench threads --threads=64 run
| Test | Result |
|---|---|
| numa_balancing-ON | 4604 (Events/sec) |
| numa_balancing-OFF | 5390 (Events/sec) |
SysBenchMutex
sysbench mutex --mutex-num=1 --threads=512 run
| Test | Result |
|---|---|
| numa_balancing-ON | 33.2165 (Sec) |
| numa_balancing-OFF | 32.1088 (Sec) |
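Because the results use different units and directions (time per operation versus throughput), the ON/OFF gap is easier to compare as a relative change. The sketch below copies the values from the tables above and computes the change of ON relative to OFF in percent. The `higher_is_better` flags are an assumption inferred from the units (times are treated as lower-is-better, rates as higher-is-better), and the function names are illustrative only.

```python
# (benchmark, unit, ON value, OFF value, higher_is_better)
# Values are copied from the result tables; the last column is an assumption
# derived from the unit (time: lower is better, rate: higher is better).
RESULTS = [
    ("PerfBenchSchedPipe", "usecs/op", 10.012, 10.509, False),
    ("PerfBenchSchedMessaging", "Sec", 6.417, 6.494, False),
    ("PerfBenchMemMemset", "GB/sec", 17.438783330964565, 17.63163114627642, True),
    ("PerfBenchFutexWake", "ms", 9.2742, 9.2178, False),
    ("SysBenchCpu", "Events/sec", 214960.28, 214965.55, True),
    ("SysBenchMemory", "MB/s", 1645.0, 1959.0, True),
    ("SysBenchThreads", "Events/sec", 4604.0, 5390.0, True),
    ("SysBenchMutex", "Sec", 33.2165, 32.1088, False),
]


def relative_change(on, off):
    """Change of the ON result relative to the OFF result, in percent."""
    return (on - off) / off * 100.0


def verdict(on, off, higher_is_better):
    """Say which setting gave the better number for this metric."""
    if on == off:
        return "same"
    on_is_better = (on > off) == higher_is_better
    return "ON better" if on_is_better else "OFF better"


for name, unit, on, off, higher_is_better in RESULTS:
    delta = relative_change(on, off)
    print(f"{name:<24} {delta:+7.2f}% ({unit})  {verdict(on, off, higher_is_better)}")
```

On these numbers the largest gaps are SysBenchMemory (about 16% lower throughput with balancing ON) and SysBenchThreads (about 14.6% lower); the remaining benchmarks differ by less than 5% in either direction. The page does not state how many runs were taken or the run-to-run variance, so the small differences may not be significant.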