Benchmarking llama.cpp on legacy hardware
Inspired by the Apple Silicon benchmarking results published here, I used the llama-bench tool to produce comparable results on some older and less powerful devices.
This may help give an idea of the rate of progress in LLM performance over a longer time span (reaching back to a time before anyone thought of running DNNs of today's sizes on consumer hardware). These numbers are not intended as a scientifically rigorous study, but as a few quick order-of-magnitude estimates across different platforms.
Typical command line:
```
llama-bench.exe -m models\llama2\llama-2-7b.Q4_0.gguf -p 512 -n 128 -t 4
```
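For readers unfamiliar with the tool: `-m` selects the GGUF model, `-p` runs a prompt-processing test of the given token count (the pp512 rows below), `-n` runs a token-generation test (the tg128 rows), and `-t` sets the CPU thread count. llama-bench also accepts comma-separated value lists for its parameters, which is handy for finding the best thread count on an unfamiliar machine. A sketch (the model path here is illustrative):

```sh
# Sweep several thread counts in one run; llama-bench prints one table row
# per parameter combination, so the fastest setting is easy to spot.
llama-bench -m models/llama-2-7b.Q4_0.gguf -p 512 -n 128 -t 1,2,4,8
```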
Dell Latitude E6420
- Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz
- 8 GB RAM
- Release date: 2011
Build 80f19b4
| model | size | params | backend | threads | test | t/s |
|---|---|---|---|---|---|---|
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 4 | pp512 | 1.98 ± 0.02 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 4 | tg128 | 1.90 ± 0.10 |
Samsung S22 Ultra
- Release date: 2022
Build 80f19b4
Note: this device was tested with the smaller Q3_K_S quantization rather than the Q4_0 used above, so the two tables are not directly comparable.
| model | size | params | backend | threads | test | t/s |
|---|---|---|---|---|---|---|
| llama 7B Q3_K - Small | 2.75 GiB | 6.74 B | CPU | 8 | pp512 | 2.53 ± 0.04 |
| llama 7B Q3_K - Small | 2.75 GiB | 6.74 B | CPU | 8 | tg128 | 1.73 ± 0.53 |
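For anyone who wants to reproduce the Android numbers, one way to get llama-bench onto a phone is a CPU-only build under Termux. A minimal sketch, with the caveat that the package list and the build target name are assumptions that may vary across llama.cpp versions, and the model path is illustrative:

```sh
# Build llama-bench natively under Termux (assumption: a plain CPU build,
# with no GPU/NPU acceleration involved).
pkg install clang git make
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make llama-bench

# Run against a quantization small enough to fit in the phone's RAM
# (substitute wherever you have placed the model file).
./llama-bench -m ./models/llama-2-7b.Q3_K_S.gguf -p 512 -n 128 -t 8
```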