NUMA balancing impact on common benchmarks
NUMA balancing can degrade performance on NUMA-based arm64 systems: when tasks migrate between nodes, their memory accesses can incur additional latency.
  
    
| System | Information |
| --- | --- |
| Architecture | aarch64 |
| Processor version | Kunpeng 920-6426 |
| CPUs | 128 |
| NUMA nodes | 4 |
| Kernel release | 5.6.0+ |
| Node name | ARMv2-3 |
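To compare another host against the one above, most rows of the system table can be regenerated with standard tools. The sketch below assumes a Linux system with sysfs mounted at /sys; `count_numa_nodes` and `print_system_info` are hypothetical helper names. The Processor version row is left out because where the model string comes from varies by platform (tools such as `lscpu` may report it).

```sh
#!/bin/sh
# Sketch: print the system-information table for the current host.
# Assumes Linux with sysfs at /sys. The "Processor version" row is omitted.

count_numa_nodes() {
    # Each online NUMA node appears as a directory /sys/devices/system/node/nodeN.
    n=0
    for d in /sys/devices/system/node/node[0-9]*; do
        if [ -d "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

print_system_info() {
    echo "| System | Information |"
    echo "| --- | --- |"
    echo "| Architecture | $(uname -m) |"
    echo "| CPUs | $(getconf _NPROCESSORS_ONLN) |"
    echo "| NUMA nodes | $(count_numa_nodes) |"
    echo "| Kernel release | $(uname -r) |"
    echo "| Node name | $(uname -n) |"
}

print_system_info
```

The NUMA node count falls back to 0 when /sys/devices/system/node is absent, for example inside some containers.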
  
Test results
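The numa_balancing-ON and numa_balancing-OFF labels presumably correspond to the `kernel.numa_balancing` sysctl (`/proc/sys/kernel/numa_balancing`), where 1 enables automatic NUMA balancing and 0 disables it. A minimal sketch of running one benchmark in both modes, assuming root access and a kernel built with `CONFIG_NUMA_BALANCING`; `run_both_modes` and the `NUMA_KNOB` override are hypothetical names used only for illustration.

```sh
#!/bin/sh
# Sketch: run a command with NUMA balancing enabled, then disabled,
# and restore the original setting afterwards.
# NUMA_KNOB is a hypothetical override so the function can be tried
# against a scratch file instead of the real /proc entry.
NUMA_KNOB=${NUMA_KNOB:-/proc/sys/kernel/numa_balancing}

run_both_modes() {
    saved=$(cat "$NUMA_KNOB") || return 1
    for mode in 1 0; do
        # 1 = on, 0 = off (equivalent to: sysctl -w kernel.numa_balancing=N)
        echo "$mode" > "$NUMA_KNOB" || return 1
        if [ "$mode" = 1 ]; then label=ON; else label=OFF; fi
        echo "== numa_balancing-$label =="
        "$@"
    done
    # Restore whatever value was set before the run.
    echo "$saved" > "$NUMA_KNOB"
}
```

For example, as root: `run_both_modes perf bench -f simple sched pipe`. The saved value is restored only after both runs complete; if a write to the knob fails partway through, the function returns early without restoring it.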
PerfBenchSchedPipe
perf bench -f simple sched pipe  
  
    
| Setting | Result (usecs/op, lower is better) |
| --- | --- |
| numa_balancing-ON | 10.012 |
| numa_balancing-OFF | 10.509 |
  
PerfBenchSchedMessaging
perf bench -f simple sched messaging -l 10000  
  
    
| Setting | Result (s, lower is better) |
| --- | --- |
| numa_balancing-ON | 6.417 |
| numa_balancing-OFF | 6.494 |
  
PerfBenchMemMemset
perf bench -f simple mem memset -s 4GB -l 5 -f default
  
    
| Setting | Result (GB/s, higher is better) |
| --- | --- |
| numa_balancing-ON | 17.438783330964565 |
| numa_balancing-OFF | 17.63163114627642 |
  
PerfBenchFutexWake
perf bench -f simple futex wake -s -t 1024 -w 1  
  
    
| Setting | Result (ms, lower is better) |
| --- | --- |
| numa_balancing-ON | 9.2742 |
| numa_balancing-OFF | 9.2178 |
  
SysBenchCpu
sysbench cpu --time=10 --threads=64 --cpu-max-prime=10000 run  
  
    
| Setting | Result (events/s, higher is better) |
| --- | --- |
| numa_balancing-ON | 214960.28 |
| numa_balancing-OFF | 214965.55 |
  
SysBenchMemory
sysbench memory --memory-access-mode=rnd --threads=64 run  
  
    
| Setting | Result (MB/s, higher is better) |
| --- | --- |
| numa_balancing-ON | 1645 |
| numa_balancing-OFF | 1959 |
  
SysBenchThreads
sysbench threads --threads=64 run  
  
    
| Setting | Result (events/s, higher is better) |
| --- | --- |
| numa_balancing-ON | 4604 |
| numa_balancing-OFF | 5390 |
  
SysBenchMutex
sysbench mutex --mutex-num=1 --threads=512 run  
  
    
| Setting | Result (s, lower is better) |
| --- | --- |
| numa_balancing-ON | 33.2165 |
| numa_balancing-OFF | 32.1088 |
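Each table reports one value per setting, and the metrics point in different directions (latency and elapsed time are better when lower, throughput when higher). The sketch below, using a hypothetical `compare_modes` helper built on awk, turns an ON/OFF pair into the change of the OFF result relative to the ON result and names the setting that performed better. The caller supplies the direction for each metric.

```sh
#!/bin/sh
# Sketch: summarise one ON/OFF pair from the result tables.
# Usage: compare_modes ON_RESULT OFF_RESULT higher|lower
#   The third argument states which direction is better for the metric.
# Prints the OFF result's change relative to ON, then "ON", "OFF" or "tie".
compare_modes() {
    awk -v on="$1" -v off="$2" -v dir="$3" 'BEGIN {
        on += 0; off += 0
        pct = (off - on) / on * 100
        if (off == on) better = "tie"
        else if ((dir == "higher") == (off > on)) better = "OFF"
        else better = "ON"
        printf "%+.2f%% %s\n", pct, better
    }'
}

compare_modes 10.012 10.509 lower    # PerfBenchSchedPipe: prints +4.96% ON
compare_modes 1645 1959 higher       # SysBenchMemory:     prints +19.09% OFF
compare_modes 4604 5390 higher       # SysBenchThreads:    prints +17.07% OFF
compare_modes 33.2165 32.1088 lower  # SysBenchMutex:      prints -3.33% OFF
```

Passing the direction explicitly avoids inferring it from the unit string, which differs between perf bench and sysbench output.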