Benchmarking llama.cpp on legacy hardware

Inspired by the benchmarking results on Apple Silicon published here, I used the llama-bench tool to produce comparable results on some older and less powerful devices.

These results may help give a sense of the rate of progress in LLM performance over a longer time span (reaching back to before anyone thought of running DNNs of the size we take for granted today on consumer hardware). The numbers are not intended as a scientifically rigorous study, but as quick order-of-magnitude estimates across different platforms.

Typical command line:

llama-bench.exe -m models\llama2\llama-2-7b.Q4_0.gguf -p 512 -n 128 -t 4
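The ± figures in the tables below are the spread across llama-bench's repeated runs of each test (it reports throughput as mean ± standard deviation over its repetitions). A minimal sketch of that aggregation in Python, using made-up per-run throughput samples:

```python
import statistics

# Hypothetical per-repetition throughput samples (tokens/second), as if
# collected from repeated llama-bench runs of the same test.
samples = [1.96, 1.99, 2.00, 1.97, 1.98]

mean = statistics.mean(samples)
stdev = statistics.stdev(samples)  # sample standard deviation across runs

# Same "mean ± stddev" shape as the t/s column in the tables below.
print(f"{mean:.2f} ± {stdev:.2f} t/s")
```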

Dell Latitude E6420

  • Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz
  • 8 GB RAM
  • Release date: 2011

Build 80f19b4

| model         | size     | params | backend | threads | test  | t/s         |
| ------------- | -------- | ------ | ------- | ------- | ----- | ----------- |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU     | 4       | pp512 | 1.98 ± 0.02 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU     | 4       | tg128 | 1.90 ± 0.10 |

Samsung S22 Ultra

  • Release date: 2022

Build 80f19b4

| model                | size     | params | backend | threads | test  | t/s         |
| -------------------- | -------- | ------ | ------- | ------- | ----- | ----------- |
| llama 7B Q3_K - Small | 2.75 GiB | 6.74 B | CPU    | 8       | pp512 | 2.53 ± 0.04 |
| llama 7B Q3_K - Small | 2.75 GiB | 6.74 B | CPU    | 8       | tg128 | 1.73 ± 0.53 |