llama.cpp / ollama GPU Performance Ladder: CUDA, ROCm, Vulkan

Based on the scoreboard threads in GitHub Discussions, this post collects the full llama.cpp GPU benchmark tables for CUDA, ROCm, and Vulkan, and explains how to read metrics such as pp512, tg128, Q4_0, and FA.

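Understanding the parameters first

What is Q4_0

Q4_0 is a 4-bit quantization format. Its point is not a "stronger model" but a smaller one that needs less VRAM and fits on more devices. Most of these scoreboards standardize on Llama 2 7B, Q4_0 to reduce the number of variables, so that results from different GPUs can be compared side by side.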
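
To make "smaller" concrete, here is a rough sketch of the arithmetic. It assumes the GGML Q4_0 layout of 32 weights per block with one 16-bit scale, and a parameter count of about 6.74 billion for Llama 2 7B. Neither figure comes from the scoreboard, and real GGUF files differ slightly because some tensors are stored in other formats.

```python
# Rough size estimate for a Q4_0 model (an illustrative sketch, not llama.cpp code).
# Assumption: each Q4_0 block stores 32 weights as 4-bit values plus one
# 16-bit (2-byte) scale.
BLOCK_WEIGHTS = 32
SCALE_BYTES = 2

# Assumption: Llama 2 7B has roughly 6.74e9 parameters.
N_PARAMS = 6.74e9


def q4_0_bits_per_weight() -> float:
    """Average storage cost per weight, including the per-block scale."""
    block_bytes = BLOCK_WEIGHTS * 4 / 8 + SCALE_BYTES  # 16 + 2 = 18 bytes
    return block_bytes * 8 / BLOCK_WEIGHTS


def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


bpw = q4_0_bits_per_weight()
print(f"Q4_0: {bpw} bits/weight, ~{estimated_size_gb(N_PARAMS, bpw):.2f} GB")
print(f"F16 : 16 bits/weight, ~{estimated_size_gb(N_PARAMS, 16):.2f} GB")
```

Under these assumptions the Q4_0 weights take a bit under 4 GB, against roughly 13.5 GB for F16, which is why the 7B Q4_0 test fits even on small cards.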

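What is pp512

pp512 stands for prompt processing with 512 tokens, that is, the throughput when processing a 512-token input.

  • pp = prompt processing
  • 512 = the input length is 512 tokens
  • t/s = tokens per second

Think of it as the speed of "ingesting the prompt". This stage parallelizes well, so the numbers are usually very high.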

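What is tg128

tg128 stands for text generation of 128 tokens, that is, the speed of generating 128 tokens in a row.

  • tg = text generation
  • 128 = 128 tokens generated consecutively
  • t/s = tokens per second

It is closer to how fast the model feels when it answers. Because generation proceeds one token at a time, with each token depending on the ones before it, tg128 is usually much lower than pp512.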

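What is FA

FA is Flash Attention. Put simply, it is an optimization switch for the attention computation.

  • with FA means Flash Attention is enabled
  • no FA means Flash Attention is disabled

On many cards, FA improves pp512 more than tg128. The gain is not consistent across backends, drivers, and architectures, however: on some devices PP rises while TG barely changes, and on others PP actually drops.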
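
The uneven effect can be checked directly against the tables later in this article. The sketch below copies three rows (mean values only, error bars dropped) and prints the relative change from enabling FA; the helper names are made up for illustration.

```python
# Relative change from enabling Flash Attention, using rows copied from the
# CUDA and ROCm tables in this article (mean values only).
ROWS = {
    # chip: ((pp512 no FA, tg128 no FA), (pp512 with FA, tg128 with FA))
    "RTX 4090 (CUDA)": ((11992.70, 186.21), (14770.63, 188.96)),
    "RTX 3090 (CUDA)": ((5174.69, 158.16), (5560.06, 161.89)),
    "Pro V620 (ROCm)": ((1803.65, 74.66), (1256.86, 70.83)),
}


def pct_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


def fa_gain(no_fa, with_fa):
    """Return (pp512 change in %, tg128 change in %)."""
    return pct_change(no_fa[0], with_fa[0]), pct_change(no_fa[1], with_fa[1])


for chip, (no_fa, with_fa) in ROWS.items():
    pp, tg = fa_gain(no_fa, with_fa)
    print(f"{chip:18s} pp512 {pp:+6.1f}%   tg128 {tg:+5.1f}%")
```

For the RTX 4090 this gives roughly +23% on pp512 but only about +1.5% on tg128, while the Pro V620 entry loses about 30% on pp512 with FA enabled.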

How to read t/s

t/s is tokens per second. It is not a frame rate and not FLOPS; it is the directly measured throughput of the model.
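
A t/s figure becomes more tangible when converted into time. The following sketch takes the RTX 4090 CUDA no-FA row from the table in this article and computes how long the 512-token prompt and the 128-token answer would take at those rates; `seconds_for` is a made-up helper name.

```python
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time implied by a t/s figure for a given token count."""
    return tokens / tokens_per_second


# RTX 4090, CUDA, no FA: mean values copied from the table in this article.
PP512_TPS = 11992.70
TG128_TPS = 186.21

prompt_seconds = seconds_for(512, PP512_TPS)
generation_seconds = seconds_for(128, TG128_TPS)

print(f"512-token prompt : {prompt_seconds * 1000:.1f} ms")
print(f"128-token answer : {generation_seconds * 1000:.1f} ms")
```

At these rates the prompt takes about 43 ms and the answer about 0.69 s, so generation dominates the wait even though its token count is four times smaller.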

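The most important thing when reading the scoreboard is to confirm that you are comparing the same kind of test.

  • Do not compare pp512 and tg128 directly against each other
  • Do not mix no FA and with FA results
  • Do not treat CUDA, ROCm, and Vulkan results as points on one fully equivalent curve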
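
To produce a number that is comparable with the scoreboard, the test settings have to match. The sketch below only builds a `llama-bench` command line and prints it. The flags (`-m`, `-p`, `-n`, `-ngl`, `-fa`) are the ones llama-bench accepts as far as I know; the model path is hypothetical, and offloading 99 layers is an assumption meaning "put everything on the GPU".

```python
import shlex


def llama_bench_argv(model_path, n_prompt=512, n_gen=128,
                     flash_attn=(0, 1), n_gpu_layers=99,
                     binary="llama-bench"):
    """Build an argv list for a pp512 / tg128 run with FA off and on."""
    return [
        binary,
        "-m", model_path,            # GGUF model file
        "-p", str(n_prompt),         # prompt-processing length -> pp512
        "-n", str(n_gen),            # generation length -> tg128
        "-ngl", str(n_gpu_layers),   # layers offloaded to the GPU (assumed 99)
        "-fa", ",".join(str(v) for v in flash_attn),  # 0 = no FA, 1 = with FA
    ]


# Hypothetical model location; adjust to wherever the Q4_0 file actually lives.
argv = llama_bench_argv("models/llama-2-7b.Q4_0.gguf")
print(shlex.join(argv))
```

The list can be passed to `subprocess.run(argv)` on a machine where llama-bench is installed; keeping the command in one place makes it harder to compare runs with mismatched settings by accident.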

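Conclusions first

From the data currently visible in these discussion threads, a few conclusions are worth keeping in mind:

  • CUDA is still the strongest line in llama.cpp GPU benchmarks and has the densest set of samples; high-end Nvidia cards in particular have a large lead on pp512.
  • ROCm already delivers very respectable results on high-end AMD and Instinct cards; the MI300X, 7900 XTX, and W7900 entries are all strong.
  • Vulkan's advantage is not being "the absolute fastest" but having the widest coverage: there are entries for Nvidia, AMD, Intel, Apple via Asahi / MoltenVK, and even many old cards and integrated GPUs.
  • tg128 tends to track everyday perceived speed, while pp512 is better for judging throughput. Many chart-topping cards do not lead by the same margin in both.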
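
The last point is easy to verify by sorting the same rows on each metric. This sketch uses five entries copied from the CUDA no-FA table in this article; `ranking` is a made-up helper.

```python
# (chip, pp512 t/s, tg128 t/s) copied from the CUDA "no FA" table.
ROWS = [
    ("RTX 5090", 14073.41, 290.02),
    ("RTX PRO 6000 Blackwell", 14854.63, 274.20),
    ("H100 80 GB", 9918.34, 267.81),
    ("A100 80 GB", 4849.53, 190.88),
    ("RTX 4090", 11992.70, 186.21),
]


def ranking(rows, metric_index):
    """Chip names sorted from fastest to slowest on the chosen column."""
    ordered = sorted(rows, key=lambda row: row[metric_index], reverse=True)
    return [row[0] for row in ordered]


print("by pp512:", ranking(ROWS, 1))
print("by tg128:", ranking(ROWS, 2))
```

Sorted by pp512 the RTX PRO 6000 Blackwell comes first and the RTX 4090 third; sorted by tg128 the RTX 5090 leads and the RTX 4090 drops to last of the five.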

Full CUDA scoreboard

Llama 2 7B, Q4_0, no FA

Chip Memory pp512 t/s tg128 t/s Commit Thanks to
RTX 5090 32 GB / GDDR7 / 512 bit 14073.41 ± 115.16 290.02 ± 1.10 8cf6b42 @totaldev
RTX PRO 6000 Blackwell 96 GB / GDDR7 / 512 bit 14854.63 ± 22.73 274.20 ± 0.14 79c1160 @Tom94
H100 80 GB 80 GB / HBM3 / 5120 bit 9918.34 ± 176.97 267.81 ± 1.54 5143fa8 @Hedede
A100 80 GB 80 GB / HBM2e / 5120 bit 4849.53 ± 8.94 190.88 ± 0.33 5143fa8 @Hedede
RTX 4090 D 24 GB / GDDR6X / 384 bit 10293.86 ± 134.72 189.33 ± 0.19 79c1160 @autonomous-AI-lab
RTX 4090 24 GB / GDDR6X / 384 bit 11992.70 ± 107.99 186.21 ± 0.13 2241453 @lhl
RTX 5080 16 GB / GDDR7 / 256 bit 8297.36 ± 9.50 181.99 ± 0.42 8a4280c @Hedede
RTX 5070 Ti 16 GB / GDDR7 / 256 bit 6952.38 ± 13.73 176.85 ± 0.07 933414c @TinyServal
RTX 6000 Ada 48 GB / GDDR6 / 384 bit 9229.23 ± 101.78 176.07 ± 0.26 b8e09f0 @Hedede
RTX 3090 Ti 24 GB / GDDR6X / 384 bit 6567.49 ± 20.30 171.19 ± 3.98 9c35706 @slaren
RTX 3090 24 GB / GDDR6X / 384 bit 5174.69 ± 21.83 158.16 ± 0.21 c76b420 @m18coppola
L40 48 GB / GDDR6 / 384 bit 8870.49 ± 378.76 152.01 ± 0.28 ee09828 @Hedede
RTX 4080 SUPER 16 GB / GDDR6X / 256 bit 8125.15 ± 41.05 148.33 ± 0.20 81086cd @zacharyarnaise
RTX 4080 16 GB / GDDR6X / 256 bit 8031.64 ± 26.49 142.49 ± 0.16 20638e4 @Ristovski
RTX 3080 10 GB / GDDR6X / 320 bit 5013.86 ± 24.80 139.65 ± 0.99 9c35706 @slaren
RTX A6000 48 GB / GDDR6 / 384 bit 4913.93 ± 6.79 138.73 ± 2.75 4795c91 @Hedede
RTX 4070 Ti SUPER 16 GB / GDDR6X / 256 bit 6924.53 ± 13.87 132.26 ± 0.16 9c35706 @Ristovski
RTX PRO 4000 Blackwell 24 GB / GDDR7 / 192 bit 4992.83 ± 113.52 131.66 ± 0.20 7d77f07 @Hedede
RTX A5000 24 GB / GDDR6 / 384 bit 4028.16 ± 19.14 130.07 ± 2.74 e5155e6 @Hedede
Tesla V100 32 GB / HBM2 / 4096 bit 3042.64 ± 40.71 129.08 ± 0.05 51f5a45 @Hedede
RTX 5070 12 GB / GDDR7 / 192 bit 5184.75 ± 18.70 127.54 ± 0.46 - @Spyro000
A40 48 GB / GDDR6 / 384 bit 4609.01 ± 10.67 124.11 ± 0.17 3470a5c @Hedede
A30 24 GB / HBM2e / 3072 bit 2767.10 ± 1.88 124.81 ± 0.16 583cb83 @Hedede
Titan V 12 GB / HBM2 / 3072 bit 2617.46 ± 2.10 108.79 ± 0.05 e56abd2 @Hedede
RTX 2080 Ti 11 GB / GDDR6 / 352 bit 2890.66 ± 2.42 107.51 ± 0.21 9c35706 @ariya
Quadro RTX 6000 24 GB / GDDR6 / 384 bit 2751.18 ± 19.43 102.77 ± 0.04 b8e09f0 @Hedede
Quadro RTX 8000 48 GB / GDDR6 / 384 bit 2709.95 ± 3.35 102.68 ± 0.03 b8e09f0 @Hedede
RTX A4500 20 GB / GDDR6 / 320 bit 2827.20 ± 66.43 97.32 ± 2.80 5cdb27e @aleksyx
RTX 5060 Ti 16 GB 16 GB / GDDR7 / 128 bit 3737.25 ± 6.79 90.94 ± 0.02 89d1029 @mike-llamacpp
RTX 2070 SUPER 8 GB / GDDR6 / 256 bit 2088.34 ± 1.94 88.06 ± 0.28 bc07349 @phstudy
RTX A4000 16 GB / GDDR6 / 256 bit 2684.06 ± 15.28 83.77 ± 0.37 65349f2 @TinyServal
Titan Xp 12 GB / GDDR5X / 384 bit 1154.96 ± 1.46 76.08 ± 0.08 c4510dc @Hedede
RTX 3060 12 GB / GDDR6 / 192 bit 2137.50 ± 10.12 75.57 ± 0.07 baa9255 @QuantiusBenignus
Quadro RTX 4000 8 GB / GDDR6 / 256 bit 1536.89 ± 0.90 65.62 ± 0.62 7d77f07 @Hedede
RTX 4060 Ti 8 GB 8 GB / GDDR6 / 128 bit 3394.63 ± 7.44 63.86 ± 0.01 89d1029 @mike-llamacpp
GTX 1080 Ti 11 GB / GDDR5X / 352 bit 1084.41 ± 3.01 62.49 ± 0.06 9c35706 @ariya
RTX A4000 Ada 20 GB / GDDR6 / 160 bit 2779.77 ± 9.91 61.83 ± 0.04 a74a0d6 @sdwolfz
RTX 2060 SUPER 8 GB / GDDR6 / 256 bit 1420.24 ± 1.95 60.04 ± 0.01 5c0eb5e @ggerganov
Tesla P100 16 GB / HBM2 / 4096 bit 760.80 ± 2.92 58.35 ± 0.00 b8372ee @Hedede
DGX Spark 128 GB / LPDDR5x 3062.31 ± 11.02 57.21 ± 0.06 5acd455 @ggerganov
Tesla P40 24 GB / GDDR5 / 384 bit 1007.42 ± 1.23 54.74 ± 0.07 c76b420 @m18coppola
RTX 2000 Ada 16 GB / GDDR6 / 128 bit 1956.22 ± 7.74 50.62 ± 0.04 756cfea @DigitalRudeness
Tesla T4 16 GB / GDDR6 / 256 bit 1219.06 ± 4.18 46.38 ± 0.73 d32e03f @pt13762104
RTX 4050 Laptop 6 GB / GDDR6 / 96 bit 1725.85 ± 17.85 43.72 ± 0.41 d79d8f3 @TimCabbage
GTX 1660 6 GB / GDDR5 / 192 bit 148.91 ± 0.01 41.35 ± 0.02 9515c61 @ariya
Tesla M40 24 GB / GDDR5 / 384 bit 282.65 ± 0.15 38.04 ± 0.02 97d5117 @Hedede
GTX 1070 Ti 8 GB / GDDR5 / 256 bit 714.44 ± 2.04 37.82 ± 0.02 79c1160 @pebaryan
Jetson AGX Orin 64 GB / LPDDR5 / 256 bit 991.31 ± 1.15 33.58 ± 0.14 c1b1876 @TinyServal
Tesla P4 8 GB / GDDR5 / 256 bit 514.53 ± 3.06 33.29 ± 0.00 c76b420 @m18coppola
P106-100 6 GB / GDDR5 / 192 bit 406.94 ± 0.25 30.40 ± 0.02 5fd160b @pebaryan
GTX 1060 6 GB / GDDR5 / 192 bit 416.85 ± 1.75 27.79 ± 0.02 5fd160b @pebaryan
Quadro T1000 4 GB / GDDR5 / 128 bit 79.44 ± 0.01 27.82 ± 0.18 f6da8cb @hanabu
Quadro P2000 5 GB / GDDR5 / 160 bit 309.30 ± 0.05 23.63 ± 0.00 baa9255 @TinyServal
Quadro P1000 4 GB / GDDR5 / 128 bit 183.40 ± 0.11 13.99 ± 0.13 1e74897 @aleksyx
Tesla K80 12 GB / GDDR5 / 384 bit 133.14 ± 0.55 13.80 ± 0.02 32732f2 @pebaryan
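
Because the rows above are plain space-separated text, loading them into a script takes a little care: chip and memory descriptions contain spaces and numbers of their own. A minimal sketch, assuming every data row contains two "mean ± error" pairs (a few rows use "+" instead of "±", which the pattern also accepts); `parse_row` is a made-up helper, not part of llama.cpp.

```python
import re

NUM = r"\d+(?:\.\d+)?"
ROW_RE = re.compile(
    rf"^(?P<head>.+?) "
    rf"(?P<pp>{NUM}) (?:±|\+) (?P<pp_err>{NUM}) "
    rf"(?P<tg>{NUM}) (?:±|\+) (?P<tg_err>{NUM})"
    rf"(?: (?P<rest>.*))?$"
)


def parse_row(line: str):
    """Split one scoreboard row into fields; return None for non-data lines."""
    match = ROW_RE.match(line.strip())
    if match is None:
        return None
    return {
        "head": match["head"],          # chip name (plus memory, if present)
        "pp512": float(match["pp"]),
        "pp512_err": float(match["pp_err"]),
        "tg128": float(match["tg"]),
        "tg128_err": float(match["tg_err"]),
        "rest": match["rest"] or "",    # commit, contributor, or comments
    }


row = parse_row(
    "RTX 5090 32 GB / GDDR7 / 512 bit 14073.41 ± 115.16 290.02 ± 1.10 8cf6b42 @totaldev"
)
print(row)
```

The header line has no "±" pairs, so `parse_row` returns None for it and it can simply be skipped when reading a whole table.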

Llama 2 7B, Q4_0, with FA

Chip Memory pp512 t/s tg128 t/s Commit Thanks to
RTX 5090 32 GB / GDDR7 / 512 bit 14970.15 ± 381.06 300.40 ± 0.28 8cf6b42 @totaldev
RTX PRO 6000 Blackwell 96 GB / GDDR7 / 512 bit 16618.98 ± 20.66 281.11 ± 0.41 5143fa8 @Tom94
H100 80 GB 80 GB / HBM3 / 5120 bit 11263.29 ± 98.34 280.74 ± 1.17 5143fa8 @Hedede
A100 80 GB 80 GB / HBM2e / 5120 bit 5285.96 ± 6.58 200.90 ± 0.12 5143fa8 @Hedede
RTX 4090 D 24 GB / GDDR6X / 384 bit 12506.97 ± 11.51 191.57 ± 0.03 79c1160 @autonomous-AI-lab
RTX 4090 24 GB / GDDR6X / 384 bit 14770.63 ± 102.93 188.96 ± 0.05 2241453 @lhl
RTX 5080 16 GB / GDDR7 / 256 bit 9487.70 ± 21.89 184.68 ± 0.05 8a4280c @Hedede
RTX 5070 Ti 16 GB / GDDR7 / 256 bit 8419.56 ± 35.50 182.43 ± 0.09 933414c @TinyServal
RTX 6000 Ada 48 GB / GDDR6 / 384 bit 10576.85 ± 530.21 179.47 ± 0.32 b8e09f0 @Hedede
RTX 3090 Ti 24 GB / GDDR6X / 384 bit 6924.01 ± 10.76 172.26 ± 1.31 9c35706 @slaren
RTX PRO 4500 Blackwell 32 GB / GDDR7 / 256 bit 7251.66 ± 92.40 168.90 ± 0.20 becc481 @Hedede
RTX 3090 24 GB / GDDR6X / 384 bit 5560.06 ± 16.28 161.89 ± 0.18 c76b420 @m18coppola
L40 48 GB / GDDR6 / 384 bit 10097.64 ± 671.22 153.76 ± 0.12 ee09828 @Hedede
RTX 4080 SUPER 16 GB / GDDR6X / 256 bit 9439.01 ± 56.75 147.48 ± 1.41 81086cd @zacharyarnaise
RTX 4080 16 GB / GDDR6X / 256 bit 9205.93 ± 22.31 143.47 ± 0.02 20638e4 @Ristovski
RTX A6000 48 GB / GDDR6 / 384 bit 5662.39 ± 13.87 144.87 ± 0.18 4795c91 @Hedede
RTX 3080 10 GB / GDDR6X / 320 bit 5569.56 ± 14.04 139.95 ± 0.95 9c35706 @slaren
RTX PRO 4000 Blackwell 24 GB / GDDR7 / 192 bit 5674.44 ± 139.53 136.38 ± 0.13 7d77f07 @Hedede
RTX A5000 24 GB / GDDR6 / 384 bit 4552.15 ± 9.68 135.83 ± 0.11 e5155e6 @Hedede
Tesla V100 32 GB / HBM2 / 4096 bit 2973.78 ± 3.62 134.76 ± 0.02 51f5a45 @Hedede
RTX 4070 Ti SUPER 16 GB / GDDR6X / 256 bit 7612.32 ± 37.35 132.85 ± 0.31 9c35706 @Ristovski
A30 24 GB / HBM2e / 3072 bit 3068.72 ± 0.63 131.93 ± 0.18 583cb83 @Hedede
RTX 5070 12 GB / GDDR7 / 192 bit 5783.44 ± 36.95 128.21 ± 2.52 - @Spyro000
A40 48 GB / GDDR6 / 384 bit 5256.38 ± 19.39 126.24 ± 0.06 3470a5c @Hedede
Titan V 12 GB / HBM2 / 3072 bit 2481.25 ± 1.31 112.17 ± 0.01 e56abd2 @Hedede
RTX 2080 Ti 11 GB / GDDR6 / 352 bit 3107.61 ± 4.34 109.17 ± 0.07 9c35706 @ariya
Quadro RTX 6000 24 GB / GDDR6 / 384 bit 3053.96 ± 1.37 104.38 ± 0.04 b8e09f0 @Hedede
Quadro RTX 8000 48 GB / GDDR6 / 384 bit 3052.35 ± 5.64 103.63 ± 0.02 b8e09f0 @Hedede
RTX A4500 20 GB / GDDR6 / 320 bit 3453.10 ± 49.19 103.00 ± 0.25 5cdb27e @aleksyx
RTX 5060 Ti 16 GB 16 GB / GDDR7 / 128 bit 4195.53 ± 1.98 93.46 ± 0.01 89d1029 @mike-llamacpp
RTX 2070 SUPER 8 GB / GDDR6 / 256 bit 2293.29 ± 5.91 87.71 ± 0.29 bc07349 @phstudy
RTX A4000 16 GB / GDDR6 / 256 bit 2807.83 ± 52.44 85.17 ± 0.66 65349f2 @TinyServal
RTX 3060 12 GB / GDDR6 / 192 bit 2407.67 ± 3.73 76.92 ± 0.03 baa9255 @QuantiusBenignus
Titan Xp 12 GB / GDDR5X / 384 bit 1218.12 ± 1.82 73.84 ± 0.04 c4510dc @Hedede
Quadro RTX 4000 8 GB / GDDR6 / 256 bit 1662.80 ± 2.04 67.62 ± 0.67 7d77f07 @Hedede
RTX 4060 Ti 8 GB 8 GB / GDDR6 / 128 bit 3803.45 ± 70.80 64.03 ± 0.53 89d1029 @mike-llamacpp
Tesla P100 16 GB / HBM2 / 4096 bit 787.36 ± 3.27 61.99 ± 0.00 b8372ee @Hedede
GTX 1080 Ti 11 GB / GDDR5X / 352 bit 1138.14 ± 2.02 61.38 ± 0.03 9c35706 @ariya
RTX A4000 Ada 20 GB / GDDR6 / 160 bit 3171.86 ± 4.34 61.37 ± 0.01 a74a0d6 @sdwolfz
RTX 2060 SUPER 8 GB / GDDR6 / 256 bit 1563.77 ± 0.51 61.13 ± 0.05 5c0eb5e @ggerganov
DGX Spark 128 GB / LPDDR5x 3661.37 ± 38.66 56.74 ± 0.03 5acd455 @ggerganov
Tesla P40 24 GB / GDDR5 / 384 bit 1079.66 ± 0.18 53.73 ± 0.05 c76b420 @m18coppola
RTX 2000 Ada 16 GB / GDDR6 / 128 bit 2250.14 ± 5.91 50.71 ± 0.01 756cfea @DigitalRudeness
Tesla T4 16 GB / GDDR6 / 256 bit 1309.73 ± 1.02 44.03 ± 0.57 d32e03f @pt13762104
GTX 1660 6 GB / GDDR5 / 192 bit 154.45 ± 0.52 41.43 ± 0.01 9515c61 @ariya
Tesla M40 24 GB / GDDR5 / 384 bit 290.17 ± 0.11 39.98 ± 0.01 97d5117 @Hedede
GTX 1070 Ti 8 GB / GDDR5 / 256 bit 790.52 ± 2.39 37.87 ± 0.00 79c1160 @pebaryan
Jetson AGX Orin 64 GB / LPDDR5 / 256 bit 1171.96 ± 4.70 35.88 ± 0.18 c1b1876 @TinyServal
Tesla P4 8 GB / GDDR5 / 256 bit 529.53 ± 2.12 33.12 ± 0.03 c76b420 @m18coppola
P106-100 6 GB / GDDR5 / 192 bit 438.49 ± 0.38 30.64 ± 0.06 5fd160b @pebaryan
GTX 1060 6 GB / GDDR5 / 192 bit 446.19 ± 0.81 28.18 ± 0.01 5fd160b @pebaryan
Quadro T1000 4 GB / GDDR5 / 128 bit 27.46 ± 0.23 27.46 ± 0.23 f6da8cb @hanabu
Quadro P2000 5 GB / GDDR5 / 160 bit 311.55 ± 0.19 23.76 ± 0.01 baa9255 @TinyServal
Tesla K80 12 GB / GDDR5 / 384 bit 133.36 ± 0.60 14.27 ± 0.32 32732f2 @pebaryan
Quadro P1000 4 GB / GDDR5 / 128 bit 173.82 ± 0.02 13.65 ± 0.14 1e74897 @aleksyx

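Apple Silicon for reference

Discussion #4167 differs from the other three mainly in that it established a uniform methodology earlier, and in addition to Q4_0 it also reports F16 and Q8_0. It is very helpful for understanding PP / TG / t/s.

The explanation given directly in the discussion is:

  • PP means prompt processing
  • TG means text-generation
  • t/s means tokens per second

One time-series example visible in the thread is the M2 Ultra, measured on the same machine as the build and Flash Attention support evolved: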

Date | Device | Build / notes | Bandwidth (GB/s) | GPU cores | F16 PP | F16 TG | Q8_0 PP | Q8_0 TG | Q4_0 PP | Q4_0 TG
2023-11-21 | M2 Ultra | 8e672ef | 800 | 76 | 1401.85 | 41.02 | 1248.59 | 66.64 | 1238.48 | 94.27
2024-11-12 | M2 Ultra | 86ed72d + FA | 800 | 76 | 1525.95 | 43.15 | 1368.18 | 73.11 | 1391.78 | 108.80
2025-08-02 | M2 Ultra | 5c0eb5e + FA | 800 | 76 | 1561.35 | 43.24 | 1386.97 | 73.35 | 1412.42 | 109.41

Device | Q4_0 PP | Q4_0 TG | Q8_0 PP | Q8_0 TG | F16 PP | F16 TG
M1 Pro (16 GPU cores) | 266.25 | 36.41 | 270.37 | 22.34 | 302.14 | 12.75
M2 Ultra (76 GPU cores) | 1238.48 | 94.27 | 1248.59 | 66.64 | 1401.85 | 41.02
M3 Max (40 GPU cores) | 690.99 | 65.85 | 749.37 | 43.00 | 794.26 | 25.27
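
The bandwidth column invites a back-of-the-envelope check of why TG rises as the quantization gets smaller. The sketch below assumes that generating one token streams every weight once, that Llama 2 7B has about 6.74 billion parameters, and that the formats cost 16, 8.5, and 4.5 bits per weight. All three are simplifying assumptions that are not stated in the discussion, and the model ignores the KV cache and compute time entirely.

```python
# Implied weight traffic per second on the M2 Ultra (2023-11-21 row).
N_PARAMS = 6.74e9                      # assumption: Llama 2 7B parameter count
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}  # assumed block layouts
BANDWIDTH_GBPS = 800                   # from the table
TG_TPS = {"F16": 41.02, "Q8_0": 66.64, "Q4_0": 94.27}      # from the table


def effective_bandwidth_gbps(tg_tps: float, bits_per_weight: float,
                             n_params: float = N_PARAMS) -> float:
    """GB/s of weights read if each generated token reads all weights once."""
    bytes_per_token = n_params * bits_per_weight / 8
    return tg_tps * bytes_per_token / 1e9


for fmt, tps in TG_TPS.items():
    used = effective_bandwidth_gbps(tps, BITS_PER_WEIGHT[fmt])
    print(f"{fmt:5s} {tps:7.2f} t/s -> ~{used:5.0f} GB/s "
          f"({used / BANDWIDTH_GBPS:.0%} of {BANDWIDTH_GBPS} GB/s)")
```

Under these assumptions every format stays below the 800 GB/s figure, which is consistent with token generation being limited largely by memory traffic rather than raw compute.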

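The Apple thread is not reproduced in full here; the rest of this article focuses on the scoreboards for the three discrete-GPU backends.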

Full ROCm / HIP scoreboard

Llama 2 7B, Q4_0, no FA

Chip Memory pp512 t/s tg128 t/s Commit Thanks to
Instinct MI300X 192 GB / HBM3 / 8192 bit 11476.40 ± 72.79 232.92 ± 0.53 ee3a9fc @yeahdongcn
RX 7900 XTX 24 GB / GDDR6 / 384 bit 3552.27 ± 101.96 167.11 ± 0.50 2f0c2db @Diablo-D3
Instinct MI210 64 GB / HBM2e / 4096 bit 2486.22 ± 9.58 124.51 ± 0.04 8160b38 @65a
Pro W7900 48 GB / GDDR6 / 384 bit 3213.17 ± 80.47 121.18 ± 0.06 8160b38 @65a
RX 7900 XT 20 GB / GDDR6 / 320 bit 3098.38 ± 24.02 116.15 ± 0.06 1e15bfd @AdamNiederer
RX 9070 16 GB / GDDR6 / 256 bit 2381.77 ± 3.68 114.48 ± 0.60 d0660f2 @andj1210
Instinct MI100 32 GB / HBM2 / 4096 bit 2732.83 ± 1.98 110.48 ± 0.14 9c35706 @firefox42
RX 9070 XT 16 GB / GDDR6 / 256 bit 5055.19 ± 109.58 101.27 ± 0.27 583cb83 @Hadrianneue
RX 7800 XT 16 GB / GDDR6 / 256 bit 2151.81 ± 17.94 100.94 ± 0.10 00131d6 @olegshulyakov
Instinct MI50 32 GB / HBM2 / 4096 bit 1057.24 ± 0.53 98.95 ± 0.25 97d5117 @wtarreau
RX 7900 GRE 16 GB / GDDR6 / 256 bit 1456.98 ± 12.39 96.07 ± 0.10 6fa3b55 @MihaiBojescu
AI PRO R9700 32 GB / GDDR6 / 256 bit 4443.54 ± 339.25 93.84 ± 0.26 bd4ef13 @gogich77
Instinct MI60 32 GB / HBM2 / 4096 bit 1289.11 ± 0.62 91.46 ± 0.13 504af20 @Said-Akbar
RX 6900 XT 16 GB / GDDR6 / 256 bit 1889.84 ± 31.21 88.49 ± 0.00 a972fae @notgood
Pro VII 16 GB / HBM2 / 4096 bit 1064.99 ± 1.18 87.45 ± 0.04 2739a71 @8XXD8
RX 6800 XT 16 GB / GDDR6 / 256 bit 1447.07 ± 1.36 83.92 ± 0.03 79c1160 @MrLavender
Pro V620 32 GB / GDDR6 / 256 bit 1803.65 ± 2.54 74.66 ± 0.01 5c0eb5e @samteezy
RX 9060 XT 16 GB / GDDR6 / 256 bit 1419.67 ± 3.64 67.58 ± 0.24 a0e13dc @lcy0321
RX 5700 XT 8 GB / GDDR6 / 256 bit 354.17 ± 0.18 67.55 ± 0.04 c05e8c9 @daniandtheweb
Instinct MI25 16 GB / HBM2 / 2048 bit 409.83 ± 0.23 63.94 ± 0.06 2739a71 @8XXD8
AI Max+ 395 128 GB / LPDDR5 911.36 ± 1.79 50.01 ± 0.07 e60f241 @firefox42
RX 7600 XT 16 GB / GDDR6 / 128 bit 1099.64 ± 2.05 48.58 ± 0.06 9c35706 @wbruna
RX Vega 64 8 GB / HBM2 / 2048 bit 240.68 ± 0.09 48.46 ± 0.09 ec428b0 @davispuh
Radeon 8060S System Shared / DDR5 351.36 ± 0.67 47.97 ± 0.33 1d0125b @hspak
Radeon 880M System Shared / DDR5 163.25 ± 13.86 12.97 ± 1.63 c55d53a @Hedede

Llama 2 7B, Q4_0, with FA

Chip Memory pp512 t/s tg128 t/s Commit Thanks to
Instinct MI300X 192 GB / HBM3 / 8192 bit 11945.97 ± 54.29 218.53 ± 0.09 ee3a9fc @yeahdongcn
RX 7900 XTX 24 GB / GDDR6 / 384 bit 3874.25 ± 11.92 170.12 ± 0.56 2f0c2db @Diablo-D3
Pro W7900 48 GB / GDDR6 / 384 bit 3472.86 ± 52.86 127.43 ± 0.12 8160b38 @65a
Instinct MI210 64 GB / HBM2e / 4096 bit 2571.82 ± 2.89 130.18 ± 0.06 8160b38 @65a
RX 9070 16 GB / GDDR6 / 256 bit 2452.68 ± 1.33 115.32 ± 0.52 d0660f2 @andj1210
RX 7900 XT 20 GB / GDDR6 / 320 bit 3261.75 ± 9.09 112.30 ± 0.06 1e15bfd @AdamNiederer
Instinct MI50 32 GB / HBM2 / 4096 bit 1129.43 ± 0.15 105.82 ± 0.07 97d5117 @wtarreau
Instinct MI100 32 GB / HBM2 / 4096 bit 2755.00 ± 3.68 104.71 ± 0.10 9c35706 @firefox42
AI PRO R9700 32 GB / GDDR6 / 256 bit 4773.07 ± 49.30 97.98 ± 0.13 bd4ef13 @gogich77
RX 7900 GRE 16 GB / GDDR6 / 256 bit 1598.79 ± 11.48 97.53 ± 0.06 6fa3b55 @MihaiBojescu
RX 9070 XT 16 GB / GDDR6 / 256 bit 4903.51 ± 96.36 97.28 ± 0.13 583cb83 @Hadrianneue
RX 7800 XT 16 GB / GDDR6 / 256 bit 2304.63 ± 2.85 95.99 ± 0.21 00131d6 @olegshulyakov
RX 6900 XT 16 GB / GDDR6 / 256 bit 1948.31 ± 13.51 85.04 ± 0.02 a972fae @notgood
Pro V620 32 GB / GDDR6 / 256 bit 1256.86 ± 0.55 70.83 ± 0.02 5c0eb5e @samteezy
RX 9060 XT 16 GB / GDDR6 / 256 bit 1479.27 ± 0.71 65.42 ± 0.19 a0e13dc @lcy0321
RX 5700 XT 8 GB / GDDR6 / 256 bit 314.17 ± 0.29 62.02 ± 0.05 c05e8c9 @daniandtheweb
AI Max+ 395 128 GB / LPDDR5 1003.53 ± 2.91 49.87 ± 0.02 e60f241 @firefox42
Radeon 8060S System Shared / DDR5 366.08 ± 1.44 48.97 ± 0.15 1d0125b @hspak
RX 7600 XT 16 GB / GDDR6 / 128 bit 1199.16 ± 1.07 47.65 ± 0.06 9c35706 @wbruna
RX Vega 64 8 GB / HBM2 / 2048 bit 153.17 ± 0.72 42.46 ± 0.40 ec428b0 @davispuh
Radeon 880M System Shared / DDR5 213.31 ± 14.05 16.16 ± 1.41 c55d53a @Hedede

Full Vulkan scoreboard

Llama 2 7B, Q4_0, no FA

Chip pp512 t/s tg128 t/s Commit Comments
Nvidia RTX 5090 10381.64 ± 508.84 263.63 ± 0.91 ca71fb9 coopmat2
AMD Radeon RX 7900 XTX 3531.93 ± 31.74 191.28 ± 0.20 2f0c2db
Nvidia RTX 4090 9452.03 ± 187.70 187.97 ± 0.21 4ae88d0 coopmat2
Nvidia RTX 5080 7444.99 ± 20.11 185.10 ± 0.54 f6b533d coopmat2
Nvidia A100 6389.86 ± 4.83 160.78 ± 0.16 2257758 coopmat2
Nvidia RTX 3090 4298.97 ± 10.59 160.13 ± 0.25 4ae88d0 coopmat2
Nvidia RTX 4080 Super 7101.18 ± 269.79 147.13 ± 5.64 81086cd coopmat2
Nvidia RTX 3080 4287.11 ± 55.50 139.15 ± 0.05 7c7d6ce coopmat2
Nvidia RTX A5000 3641.55 ± 9.05 139.89 ± 0.69 4ae88d0 coopmat2
AMD Radeon RX 9070 XT 5036.04 ± 88.16 137.11 ± 0.02 e9fd8dc
Nvidia RTX 5070 Ti 6213.63 ± 27.72 135.63 ± 0.18 d13d0f6 coopmat2
AMD Radeon AI Pro R9700 4036.04 ± 34.58 130.19 ± 0.39 3191462
Nvidia Tesla V100 1391.39 ± 1.19 129.58 ± 0.58 7d77f07
Nvidia RTX 4070 Ti Super 6099.18 ± 154.30 129.45 ± 0.18 4ae88d0 coopmat2
AMD Radeon RX 7900 XT 2941.58 ± 17.17 123.18 ± 0.40 71e74a3
AMD Radeon RX 9070 3164.10 ± 66.84 119.71 ± 3.40 21c17b5
AMD Radeon RX 7800 XT 2017.33 ± 19.30 118.27 ± 0.27 4fdbc1e
AMD Radeon RX 7900 GRE 2336.31 ± 7.52 116.11 ± 0.26 4b2a477
Apple M3 Ultra 1116.83 ± 0.55 115.54 ± 0.78 2d451c8 MoltenVK
Intel Arc Pro B70 3379.00 ± 47.92 112.02 ± 1.08 b863507
Nvidia Titan V 984.36 ± 4.13 108.86 ± 0.28 e56abd2
AMD Radeon Pro VII 1078.54 ± 0.86 107.82 ± 0.14 N/A
AMD Radeon RX 6900 XT 1837.21 ± 25.44 104.60 ± 0.30 a972fae
Intel Arc Pro A60 2261.11 ± 9.53 104.25 ± 0.07 97d5117
AMD Radeon RX 6800 XT 1752.92 ± 1.71 100.32 ± 0.97 N/A
AMD Radeon VII 1059.14 ± 0.56 101.19 ± 0.53 77d6ae4
Nvidia RTX 2080 Ti 1888.24 ± 9.20 97.58 ± 6.60 N/A
AMD Radeon RX 6800 1698.69 ± 0.80 95.61 ± 0.19 4b385bf
AMD Radeon Pro W6800X Duo 687.71 ± 4.33 94.82 ± 0.12 N/A
Nvidia RTX 5060 Ti 3460.92 ± 7.16 93.51 ± 0.15 89f10ba coopmat2
Nvidia RTX 4070 3179.37 ± 46.16 92.29 ± 0.28 9a48399
AMD Radeon Pro W6800X 510.80 ± 0.13 86.47 ± 0.46 13b4548 MoltenVK
AMD Radeon RX 6700 XT 1051.20 ± 0.98 83.88 ± 0.08 6d75883
AMD Radeon RX 6750 XT 1040.58 ± 0.35 81.98 ± 0.03 228f34c
AMD Radeon Pro V620 1595.32 ± 1.59 81.78 ± 0.06 03d4698
Nvidia RTX 3070 2113.02 ± 7.38 78.71 ± 0.13 1b8fb81
AMD Radeon Instinct MI60 369.26 ± 2.48 78.16 ± 1.40 504af20
Nvidia RTX 3060 1815.70 ± 5.85 75.94 ± 0.80 92c0b38 coopmat2
Apple M4 Max 724.77 ± 20.93 75.02 ± 0.14 1ece0cb6
Nvidia Tesla T10 1692.70 ± 2.05 75.01 ± 0.21 7f76692 coopmat2
Nvidia RTX A4000 2248.14 ± 7.59 73.74 ± 0.08 f5245b5 coopmat2
AMD Radeon RX 5700 XT 529.69 ± 0.26 70.73 ± 0.04 4fdbc1e
AMD Radeon RX 9060 XT 2141.67 ± 6.87 70.54 ± 0.74 ed52f36
Intel Arc B580 620.94 ± 15.33 70.14 ± 0.28 7f76692
AMD Radeon Pro V540 583.88 ± 6.56 69.64 ± 0.24 9da3dcd
AMD Radeon Pro W5700 449.85 ± 0.46 68.55 ± 0.15 23bc779
Intel Arc Pro B60 522.36 ± 3.60 68.55 ± 0.01 516a4ca
Nvidia GTX 1080 Ti 540.69 ± 0.71 64.99 ± 0.08 360d653
Nvidia RTX 2070 Super 1199.13 ± 7.70 64.64 ± 0.20 b7552cf
Nvidia RTX 3070 Mobile 1689.40 ± 19.57 63.64 ± 0.39 ceff6bb coopmat2
Nvidia Tesla P100 678.14 ± 1.40 63.16 ± 0.06 eec1e33
AMD BC-250 370.66 ± 0.04 62.32 ± 0.32 5886f4f
AMD Radeon RX 6650 XT 1029.52 ± 1.21 62.14 ± 0.02 dbb852b
Nvidia RTX 4060 Mobile 2135.66 ± 23.18 59.53 ± 0.03 a5c07dc coopmat2
Nvidia Tesla P40 488.06 ± 0.27 59.36 ± 0.16 N/A
Nvidia GTX 1660 Ti Mobile 511.67 ± 2.85 56.60 ± 0.07 b43556e
AMD Radeon Instinct MI25 439.42 ± 0.34 54.69 ± 0.03 2739a71
AMD Radeon RX 6600 XT 574.65 ± 0.86 53.92 ± 0.11 091592d
AMD Ryzen AI Max+ 395 1288.96 ± 6.49 53.59 ± 0.38 7f76692
AMD Radeon RX 7600 XT 840.85 ± 3.02 53.02 ± 0.01 01d8eaa
Intel Arc A770 1073.85 ± 29.68 52.56 ± 0.11 a69d54f
Nvidia GB10 2737.79 ± 19.56 52.28 ± 0.03 b9da444 coopmat2
AMD FirePro S9300 x2 247.26 ± 0.43 51.86 ± 0.11 eec1e33 Split across two GPUs
AMD Radeon RX 6600 761.89 ± 1.76 50.63 ± 0.02 b1c70e2
AMD Radeon RX Vega 56 439.87 ± 0.61 50.23 ± 0.14 92c0b38
Intel Arc B570 913.95 ± 0.90 49.64 ± 0.03 7f76692
Nvidia RTX 3060 Mobile 1059.76 ± 3.54 49.03 ± 0.13 dbb3a47
AMD Radeon RX 6800M 861.99 ± 7.67 48.71 ± 0.71 8e6f8bc
AMD Radeon RX 6600M 605.59 ± 0.65 48.21 ± 0.07 fe5b78c
Intel Arc A770M 875.92 ± 2.16 47.69 ± 0.16 eeee367
Nvidia P104-100 311.90 ± 0.22 46.18 ± 0.05 eec1e33
AMD Radeon RX Vega 64 356.08 ± 0.09 45.73 ± 0.18 ec428b0
Nvidia RTX A2000 1245.19 ± 8.76 45.52 ± 0.54 b1afcab coopmat2
AMD Radeon RX 7600M XT 459.39 ± 2.34 45.28 ± 0.10 b9ab0a4 eGPU
AMD Radeon Pro V340 375.41 ± 0.24 45.16 ± 0.06 9da3dcd Split across two GPUs
Nvidia GTX 1070 Ti 297.50 ± 0.54 42.86 ± 1.20 860a9e4 eGPU
Intel Arc A750 1075.94 ± 13.89 42.66 ± 0.18 c1b1876
Nvidia RTX 4050 Mobile 1154.28 ± 15.76 41.89 ± 0.10 d79d8f3
Nvidia GTX 1070 321.57 ± 0.93 41.48 ± 0.09 eec1e33
Intel Arc Pro B50 193.50 ± 0.24 39.99 ± 0.10 7b43f55
Nvidia Tesla M40 92.48 ± 0.02 39.35 ± 1.22 b8372ee
AMD Radeon RX 580 258.03 ± 0.71 39.32 ± 0.03 de4c07f
AMD Radeon RX 470 218.07 ± 0.56 38.63 ± 0.21 e288693
AMD Radeon Pro W5500 315.39 ± 3.76 36.82 ± 0.38 860a9e4
AMD Radeon RX 480 248.66 ± 0.28 34.71 ± 0.14 3b15924
Apple M2 Ultra 205.98 ± 0.02 34.34 ± 0.12 dbb852b Asahi Linux
Nvidia GTX 980 186.24 ± 0.09 33.90 ± 0.51 860a9e4
Nvidia P106-100 183.78 ± 0.26 29.77 ± 0.04 23bc779
AMD FirePro W8100 155.22 ± 0.17 29.52 ± 0.05 4536363
Nvidia Tesla P4 265.54 ± 0.21 28.03 ± 0.14 24d2ee0
AMD Radeon RX 6500 XT 255.25 ± 0.35 27.81 ± 0.10 g9fdfcd
Apple M3 263.70 ± 0.02 26.39 ± 0.14 b9ab0a4 MoltenVK
AMD FirePro S10000 94.78 ± 0.02 25.32 ± 0.02 914a82d Split across two GPUs
Nvidia Quadro P2000 169.55 ± 0.17 23.05 ± 0.03 63f8fe0
Intel Core Ultra 200 Series 544.95 ± 4.15 22.49 ± 0.09 cea560f
AMD Ryzen AI 9 300 Series 479.07 ± 0.41 22.41 ± 0.18 N/A
AMD Ryzen 6000 Series 240.89 ± 0.52 21.26 ± 0.08 ee09828
Apple M2 Pro 62.70 ± 0.03 20.95 ± 0.11 1fe0029 Asahi Linux
Nvidia GTX 1050 Ti 136.42 ± 0.67 20.96 ± 0.21 2f0c2db
AMD Ryzen 8000 Series 266.19 ± 1.36 20.53 ± 0.08 a5c07dc
AMD Ryzen 7000 Series 281.62 ± 1.56 19.91 ± 0.07 ebce03e
AMD Ryzen Z1 Extreme 199.36 ± 7.02 18.77 ± 0.02 53ff6b9
AMD FirePro D700 69.95 ± 0.04 16.62 ± 0.01 d3bd719 MoltenVK, running in FP16 mode on FP32 only chip
AMD Radeon Pro WX 4100 78.79 ± 0.10 16.05 ± 0.07 860a9e4
Apple M2 50.79 ± 0.16 13.50 ± 0.02 8c0d6bb Asahi Linux
Apple M1 38.29 ± 0.00 12.47 ± 0.03 2370665 Asahi Linux
AMD Ryzen 5000 Series 90.55 ± 0.08 10.98 ± 0.07 d84635b
Intel Core 1100 Series 187.20 ± 1.78 10.39 ± 0.04 abb9f3c
AMD Radeon RX 550 52.66 ± 0.49 10.20 ± 0.01 N/A
AMD Ryzen 4000 Series 103.87 ± 0.02 9.63 ± 0.01 4b385bf
Nvidia Tesla K80 89.46 ± 0.10 9.39 ± 0.06 5d46bab Running on single GPU
Nvidia Tesla K40 64.37 ± 0.09 9.30 ± 0.19 eec1e33
MediaTek Dimensity 9400 38.36 ± 15.15 8.92 ± 0.06 b9ab0a4 GPU supports coopmat but pp512 is faster with it turned off
Intel Core Ultra 100 Series 185.51 ± 0.22 8.21 ± 0.07 1d72c84
AMD Ryzen 3000 Series 48.63 ± 0.10 8.49 ± 0.01 1fe0029
CIX CD8180 2.80 ± 0.01 5.51 ± 0.00 4dca015
Intel Core 1000 Series 25.58 ± 0.00 4.25 ± 0.18 N/A
Intel Core 8000 Series 25.43 ± 0.17 3.35 ± 0.03 c4df49a
Intel N150 28.84 ± 0.02 2.93 ± 0.00 4f63cd7

Llama 2 7B, Q4_0, FA enabled

Chip pp512 t/s tg128 t/s Commit Comments
Nvidia RTX 5090 11796.38 ± 601.36 273.68 ± 0.52 ca71fb9 coopmat2
AMD Radeon RX 7900 XTX 3332.90 ± 11.47 195.30 ± 0.23 2f0c2db
Nvidia RTX 5080 8054.59 ± 35.68 192.17 ± 0.21 f6b533d coopmat2
Nvidia RTX 4090 10830.41 ± 36.25 190.10 ± 0.31 4ae88d0 coopmat2
Nvidia A100 7064.40 ± 1.63 170.56 ± 0.02 2257758 coopmat2
Nvidia RTX 3090 4732.33 ± 4.80 162.28 ± 0.21 4ae88d0 coopmat2
Nvidia RTX 4080 Super 8007.37 ± 46.03 150.20 ± 0.26 81086cd coopmat2
Nvidia RTX 3080 4913.83 ± 21.52 145.74 ± 0.16 7c7d6ce coopmat2
Nvidia Tesla V100 1411.25 ± 2.12 142.13 ± 0.03 7d77f07
Nvidia RTX A5000 4071.22 ± 13.13 140.43 ± 0.22 4ae88d0 coopmat2
AMD Radeon RX 9070 XT 4911.74 ± 28.52 138.20 ± 0.18 e9fd8dc
Nvidia RTX 5070 Ti 6764.53 ± 11.95 135.65 ± 0.02 d13d0f6 coopmat2
AMD Radeon AI Pro R9700 4333.83 ± 29.36 130.90 ± 0.12 3191462
AMD Radeon RX 7900 XT 3043.93 ± 10.42 124.20 ± 0.09 71e74a3
AMD Radeon RX 7800 XT 2094.64 ± 14.38 119.63 ± 0.13 4fdbc1e
AMD Radeon RX 9070 3277.24 ± 18.17 119.55 ± 0.06 21c17b5
AMD Radeon RX 7900 GRE 2402.07 ± 22.50 116.77 ± 0.08 4b2a477
Apple M3 Ultra 1115.55 ± 0.75 115.99 ± 0.12 2d451c8 MoltenVK
Intel Arc Pro B70 3314.53 ± 17.95 111.63 ± 0.05 b863507
Nvidia Titan V 792.74 ± 4.30 109.21 ± 0.72 e56abd2
AMD Radeon Pro VII 783.94 ± 0.77 108.45 ± 0.48 N/A
AMD Radeon RX 6900 XT 1761.93 ± 4.75 106.15 ± 0.04 a972fae
Nvidia RTX 2080 Ti 1936.25 ± 32.08 100.99 ± 0.24 N/A
AMD Radeon RX 6800 XT 1704.79 ± 0.71 100.50 ± 0.06 N/A
AMD Radeon Pro W6800X Duo 795.28 ± 0.72 100.08 ± 0.02 N/A
Nvidia RTX 5060 Ti 3912.65 ± 5.86 97.01 ± 0.14 89f10ba coopmat2
AMD Radeon RX 6800 1749.46 ± 3.36 96.65 ± 0.48 4b385bf
Nvidia RTX 4070 4293.57 ± 27.70 91.49 ± 0.89 9a48399 coopmat2
AMD Radeon RX 6750 XT 997.05 ± 0.45 82.29 ± 0.06 228f34c
AMD Radeon RX 6700 XT 1010.90 ± 12.89 81.86 ± 0.19 6d75883
Nvidia RTX 3060 2012.88 ± 10.12 80.59 ± 0.02 92c0b38 coopmat2
AMD Radeon Pro V620 1556.31 ± 2.82 79.24 ± 0.09 03d4698
Nvidia RTX A4000 2482.74 ± 26.05 76.07 ± 0.08 f5245b5 coopmat2
Nvidia Tesla T10 1840.14 ± 1.22 76.05 ± 0.13 7f76692 coopmat2
AMD Radeon RX 5700 XT 538.31 ± 0.35 74.43 ± 0.03 4fdbc1e
Intel Arc B580 419.49 ± 3.37 72.00 ± 0.24 7f76692
Apple M4 Max 557.46 ± 26.87 71.79 ± 4.16 1ece0cb6
AMD Radeon Pro W5700 446.98 ± 0.39 71.30 ± 0.24 23bc779
Intel Arc Pro B60 274.76 ± 0.27 70.54 ± 0.03 516a4ca
AMD Radeon RX 9060 XT 1915.41 ± 7.90 70.52 ± 0.16 ed52f36
Nvidia Tesla P100 685.51 ± 0.88 66.48 ± 0.02 eec1e33
AMD Radeon RX 6650 XT 1088.90 ± 0.40 64.53 ± 0.75 dbb852b
Nvidia GTX 1080 Ti 529.96 ± 0.38 64.63 ± 0.10 360d653
AMD BC-250 356.87 ± 1.24 63.14 ± 0.09 5886f4f
Nvidia RTX 3070 Mobile 1832.07 ± 57.14 62.92 ± 0.37 ceff6bb coopmat2
Nvidia RTX 4060 Mobile 2358.03 ± 12.17 60.01 ± 0.08 a5c07dc coopmat2
Nvidia Tesla P40 484.37 ± 0.27 59.22 ± 0.15 N/A
Nvidia GTX 1660 Ti Mobile 514.34 ± 0.88 57.30 ± 0.42 b43556e
AMD Radeon RX 7600 XT 1024.38 ± 7.56 56.11 ± 0.02 01d8eaa
AMD FirePro S9300 x2 243.33 ± 0.22 55.64 ± 0.06 eec1e33 Split across two GPUs
Nvidia GB10 3279.89 ± 26.78 53.64 ± 0.05 b9da444 coopmat2
AMD Radeon RX 6600 808.76 ± 0.15 53.24 ± 0.03 b1c70e2
Intel Arc A770 1119.68 ± 30.25 53.07 ± 0.09 a69d54f
AMD Ryzen AI Max+ 395 1357.07 ± 10.94 53.00 ± 0.13 7f76692
AMD Radeon RX Vega 56 428.54 ± 0.50 52.66 ± 0.03 92c0b38
Intel Arc B570 288.51 ± 0.09 50.49 ± 0.05 7f76692
Nvidia P104-100 325.30 ± 0.25 48.64 ± 0.04 eec1e33
AMD Radeon Pro V340 360.23 ± 0.74 47.54 ± 0.06 9da3dcd Split across two GPUs
AMD Radeon RX 6800M 784.16 ± 2.76 49.06 ± 0.34 8e6f8bc
AMD Radeon RX Vega 64 320.12 ± 0.22 47.06 ± 0.01 ec428b0
Nvidia RTX A2000 1361.85 ± 3.26 45.69 ± 0.20 b1afcab coopmat2
Intel Arc A770M 384.74 ± 0.78 45.68 ± 0.06 eeee367
Intel Arc A750 303.37 ± 1.44 43.96 ± 0.03 c1b1876
Nvidia GTX 1070 Ti 292.85 ± 0.23 43.42 ± 0.34 860a9e4 eGPU
Nvidia GTX 1070 330.84 ± 1.02 43.33 ± 0.06 360d653
Nvidia Tesla M40 93.35 ± 0.01 41.68 ± 0.01 b8372ee
Intel Arc Pro B50 132.48 ± 0.04 41.02 ± 0.04 7b43f55
AMD Radeon RX 470 197.26 ± 0.27 37.28 ± 0.11 3769fe6
AMD Radeon RX 480 194.52 ± 0.61 37.23 ± 0.09 0bcb40b
Apple M2 Ultra 198.83 ± 0.85 198.83 ± 0.85 dbb852b Asahi Linux
Nvidia GTX 980 180.97 ± 0.74 34.16 ± 0.10 860a9e4
Nvidia P106-100 183.40 ± 0.34 30.79 ± 0.32 23bc779
AMD FirePro W8100 140.52 ± 0.34 29.28 ± 0.14 4536363
Nvidia Tesla P4 287.14 ± 0.29 28.37 ± 0.24 24d2ee0
Nvidia Quadro P2000 181.71 ± 0.12 23.77 ± 0.02 63f8fe0
Intel Core Ultra 200 Series 536.48 ± 1.27 23.05 ± 0.04 cea560f
AMD Ryzen AI 9 300 Series 532.59 ± 3.55 22.31 ± 0.06 N/A
AMD Ryzen 6000 Series 277.91 ± 0.37 21.15 ± 0.09 ee09828
Apple M2 Pro 58.86 ± 0.02 20.97 ± 0.03 1fe0029 Asahi Linux
AMD Ryzen 8000 Series 297.39 ± 1.22 20.59 ± 0.38 a5c07dc
AMD Ryzen 7000 Series 312.85 ± 2.51 20.09 ± 0.35 835b2b9
Nvidia GTX 1050 Ti 127.54 ± 1.03 20.08 ± 0.17 2f0c2db
AMD Radeon Pro WX 4100 75.59 ± 0.19 16.56 ± 0.04 860a9e4
Apple M1 35.93 ± 0.00 12.85 ± 0.02 2370665 Asahi Linux
Apple M2 46.81 ± 0.08 12.25 ± 2.30 8c0d6bb Asahi Linux
AMD Ryzen 5000 Series 79.06 ± 0.01 10.75 ± 0.00 5d195f1
Intel Core 1100 Series 174.77 ± 4.47 10.58 ± 0.03 abb9f3c
Nvidia Tesla K40 64.37 ± 0.02 9.92 ± 0.06 eec1e33
AMD Ryzen 4000 Series 113.32 ± 0.01 9.87 ± 0.01 4b385bf
Nvidia Tesla K80 88.26 ± 0.19 9.49 ± 0.01 5d46bab Running on single GPU
AMD Ryzen 5 3000 Series 47.41 ± 0.14 8.47 ± 0.01 1fe0029
Intel Core Ultra 100 Series 77.66 ± 2.75 7.75 ± 0.05 2e89f76
Intel Core 8000 Series 25.55 ± 0.04 3.35 ± 0.02 c4df49a
Intel N150 25.59 ± 0.00 2.91 ± 0.00 4f63cd7

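How to use these tables

If you just want to buy a card, or see roughly which tier your current machine falls into, the most practical way to read the tables is in three steps:

  1. First decide whether you care about tg128 or pp512.
    For everyday conversation, coding, and chat responsiveness, look at tg128 first; for long-context throughput, batch processing, and prompt-heavy server workloads, pp512 matters more.

  2. Then look at the backend you actually run.
    For Nvidia, CUDA is usually closest to the real ceiling; for AMD machines, check ROCm and Vulkan first; for cross-platform compatibility, Vulkan is the better reference.

  3. Finally, look at FA.
    On many cards, enabling FA raises pp512 noticeably, but tg128 does not necessarily rise with it, so do not look only at the single highest score.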
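
The three steps can be sketched as a small lookup: pick the metric, then compare backends for the same card. The values are the no-FA means copied from the tables in this article; `best_backend` is a made-up helper, and since each entry is a single submission on a different commit, small gaps should not be over-read.

```python
# (chip, backend) -> (pp512 t/s, tg128 t/s), no FA, copied from the tables.
RESULTS = {
    ("RX 7900 XTX", "ROCm"): (3552.27, 167.11),
    ("RX 7900 XTX", "Vulkan"): (3531.93, 191.28),
    ("RTX 4090", "CUDA"): (11992.70, 186.21),
    ("RTX 4090", "Vulkan"): (9452.03, 187.97),
}
METRIC_INDEX = {"pp512": 0, "tg128": 1}


def best_backend(chip: str, metric: str) -> str:
    """Backend with the highest score for this chip on the chosen metric."""
    index = METRIC_INDEX[metric]
    candidates = {
        backend: scores[index]
        for (name, backend), scores in RESULTS.items()
        if name == chip
    }
    return max(candidates, key=candidates.get)


for chip in ("RX 7900 XTX", "RTX 4090"):
    for metric in ("pp512", "tg128"):
        print(f"{chip:12s} {metric}: {best_backend(chip, metric)}")
```

For both cards the answer depends on the metric: the vendor backend wins pp512, while the Vulkan entry is ahead on tg128, by a wide margin on the RX 7900 XTX and by about 1% on the RTX 4090.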

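One-sentence summary

Within the same llama.cpp benchmark, pp512, tg128, Q4_0, FA, and CUDA / ROCm / Vulkan each describe a different dimension. Sort out what is being measured first, then look at the numbers; only then is the scoreboard meaningful.

If you only want to remember the shortest possible takeaway:

  • CUDA is currently the strongest overall
  • ROCm is already very competitive on high-end AMD cards
  • Vulkan has the widest coverage: old cards, integrated GPUs, Intel Arc, and Apple under Asahi all have comparable entries
  • tg128 is closer to everyday perceived speed than pp512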
