Finding the sweet spot on my Precision 7520 with an RTX 2060 12 GB eGPU connected through an M.2-to-OcuLink adapter.
I used llama.cpp and played around with the -ncmoe flag (`--n-cpu-moe`, which keeps the MoE expert weights of the first N layers on the CPU) to find the sweet spot for tokens per second. To measure speed, this time I simply used the web page provided by the llama.cpp server and asked for 20 prime numbers in each run. The goal was the best possible speed while still keeping a large context window.
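To repeat a sweep like this, a small script can print one server command per -ncmoe value, so each run differs only in that flag. This is a minimal sketch, assuming `llama-server` is on the PATH; the model path, `-ngl 99` and the context size are hypothetical placeholders, not the settings used in this post.

```python
import shlex

# Hypothetical placeholder: replace with the path to your own GGUF model.
MODEL_PATH = "model.gguf"


def server_command(ncmoe: int, ctx: int, model: str = MODEL_PATH) -> list[str]:
    """Build one llama-server command line for a given -ncmoe value."""
    return [
        "llama-server",
        "-m", model,
        "-ngl", "99",          # assumption: offload all layers to the GPU first
        "-ncmoe", str(ncmoe),  # then keep the expert weights of the first N layers on the CPU
        "-c", str(ctx),        # context size in tokens
    ]


def sweep(ncmoe_values, ctx: int) -> list[list[str]]:
    """One command per -ncmoe value, all with the same context size."""
    return [server_command(n, ctx) for n in ncmoe_values]


if __name__ == "__main__":
    # Start each server in turn, send the same prompt, note tokens/s, then stop it.
    for cmd in sweep(range(36, 41), ctx=4096):
        print(shlex.join(cmd))
```

Keeping the prompt identical across runs (here, 20 prime numbers) is what makes the tokens/s numbers comparable.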

| Test | -ncmoe | Speed (tokens/s) | Context (tokens) |
|---|---|---|---|
| 1 | 50 | 16.5 | 262144 |
| 2 | 40 | 19.1 | 147456 |
| 3 | 30-35 | failed | failed |
| 4 | 36 | OOM | 4096 |
| 5 | 37 | 20.2 | 4096 |
| 6 | 38 | 19.3 | 40448 |
| 7 | 39 | 19.1 | 93952 |
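Picking the sweet spot from the table can be made explicit: take the fastest successful run that still offers at least a required context size. This is a minimal sketch; the `Run` type and `sweet_spot` function are hypothetical helpers, and the numbers are copied from the table (test 3 covered a range of values and is left out).

```python
from typing import NamedTuple, Optional


class Run(NamedTuple):
    test: int
    ncmoe: int
    tok_per_s: Optional[float]  # None when the run failed or ran out of memory
    ctx: Optional[int]          # context size in tokens


RUNS = [
    Run(1, 50, 16.5, 262144),
    Run(2, 40, 19.1, 147456),
    Run(4, 36, None, 4096),     # OOM
    Run(5, 37, 20.2, 4096),
    Run(6, 38, 19.3, 40448),
    Run(7, 39, 19.1, 93952),
]


def sweet_spot(runs, min_ctx: int) -> Optional[Run]:
    """Fastest successful run whose context is at least min_ctx tokens."""
    ok = [r for r in runs
          if r.tok_per_s is not None and r.ctx is not None and r.ctx >= min_ctx]
    if not ok:
        return None
    # Prefer higher speed; on a tie, prefer the larger context.
    return max(ok, key=lambda r: (r.tok_per_s, r.ctx))


for need in (4096, 50000, 200000):
    print(need, sweet_spot(RUNS, need))
```

With a 50k-token minimum, tests 2 and 7 tie at 19.1 tokens/s, and the tie-break favors -ncmoe 40 because it offers 147456 tokens of context instead of 93952.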
Test 1
```
llama_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
llama_memory_breakdown_print: | – CUDA0 (RTX 2060) | 11833 = 5843 + ( 5757 = 1358 + 3339 + 1060) + 231 |
llama_memory_breakdown_print: | – Host | 40969 = 40449 + 0 + 520 |
```
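Each device row follows the pattern given in the header, `total = free + (self = model + context + compute) + unaccounted`, all in MiB. A minimal sketch that pulls those seven numbers out of a CUDA0 row and checks both sums; `parse_device_row` and `residuals` are hypothetical helpers, and it only handles device rows, not the shorter Host rows.

```python
import re

# Device rows have the shape: total = free + (self = model + context + compute) + unaccounted
DEVICE_ROW = re.compile(
    r"\|\s*(\d+)\s*=\s*(\d+)\s*\+\s*\(\s*(\d+)\s*=\s*(\d+)\s*\+\s*(\d+)\s*\+\s*(\d+)\s*\)\s*\+\s*(\d+)"
)
FIELDS = ("total", "free", "self", "model", "context", "compute", "unaccounted")


def parse_device_row(line: str) -> dict[str, int]:
    """Extract the seven MiB values from one device row of the breakdown."""
    m = DEVICE_ROW.search(line)
    if m is None:
        raise ValueError("not a device breakdown row: " + line)
    return dict(zip(FIELDS, map(int, m.groups())))


def residuals(row: dict[str, int]) -> tuple[int, int]:
    """How far each sum is off; small gaps are presumably rounding to whole MiB."""
    self_gap = row["self"] - (row["model"] + row["context"] + row["compute"])
    total_gap = row["total"] - (row["free"] + row["self"] + row["unaccounted"])
    return self_gap, total_gap


test1 = ("llama_memory_breakdown_print: | - CUDA0 (RTX 2060) | "
         "11833 = 5843 + ( 5757 = 1358 + 3339 + 1060) + 231 |")
print(parse_device_row(test1))
print(residuals(parse_device_row(test1)))  # (0, 2)
```

For Test 1 the `self` part adds up exactly, while `total` is 2 MiB more than its parts, so a gap of a MiB or two in these rows is not worth worrying about.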
Test 2
```
llama_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
llama_memory_breakdown_print: | – CUDA0 (RTX 2060) | 11833 = 1191 + (10417 = 7886 + 1911 + 620) + 223 |
llama_memory_breakdown_print: | – Host | 34031 = 33735 + 0 + 296 |
```
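The Host rows show 0 MiB of context in every test, so the whole context sits on the GPU. Dividing the CUDA0 `context` figure by the context size gives a rough per-token cost. This is a minimal sketch using only the Test 1 and Test 2 numbers; it assumes the context bucket grows roughly linearly with context size (presumably because it is dominated by the KV cache), which the logs themselves do not state.

```python
# (context size in tokens, "context" MiB on CUDA0) from the Test 1 and Test 2 breakdowns
TEST1 = (262144, 3339)
TEST2 = (147456, 1911)


def kib_per_token(ctx_tokens: int, context_mib: int) -> float:
    """Average context memory per token, in KiB."""
    return context_mib * 1024 / ctx_tokens


def marginal_kib_per_token(a, b) -> float:
    """Extra KiB per additional token between two runs (slope of memory vs. context size)."""
    return (a[1] - b[1]) * 1024 / (a[0] - b[0])


print(f"test 1 average:    {kib_per_token(*TEST1):.2f} KiB/token")  # 13.04
print(f"test 2 average:    {kib_per_token(*TEST2):.2f} KiB/token")  # 13.27
print(f"marginal (1 vs 2): {marginal_kib_per_token(TEST1, TEST2):.2f} KiB/token")  # 12.75
```

Both runs land at roughly 13 KiB per token, and the slope between them (12.75 KiB) is a little lower than either average, which hints at a small fixed overhead on top of the per-token cost. At that rate, 1 GiB of freed VRAM is worth on the order of 80k tokens of context.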
Tests 3 and 4 failed (test 4 ran out of memory), so there is no breakdown for them.
Test 5
```
llama_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
llama_memory_breakdown_print: | – CUDA0 (RTX 2060) | 11833 = 801 + (10802 = 10334 + 126 + 342) + 228 |
llama_memory_breakdown_print: | – Host | 31229 = 31213 + 0 + 16 |
```
Test 6
```
llama_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
llama_memory_breakdown_print: | – CUDA0 (RTX 2060) | 11833 = 1095 + (10508 = 9518 + 579 + 411) + 229 |
llama_memory_breakdown_print: | – Host | 32140 = 32053 + 0 + 87 |
```
Test 7
```
llama_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
llama_memory_breakdown_print: | – CUDA0 (RTX 2060) | 11833 = 1147 + (10463 = 8702 + 1245 + 515) + 222 |
llama_memory_breakdown_print: | – Host | 33085 = 32894 + 0 + 191 |
```
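Comparing the CUDA0 `model` column across the runs shows how much VRAM each extra -ncmoe layer frees. This is a minimal sketch; `MODEL_MIB` and `mib_per_step` are hypothetical names, and the values are copied from the breakdowns of tests 5, 6, 7, 2 and 1.

```python
# -ncmoe value -> "model" MiB on CUDA0 (tests 5, 6, 7, 2 and 1)
MODEL_MIB = {37: 10334, 38: 9518, 39: 8702, 40: 7886, 50: 1358}


def mib_per_step(a: int, b: int) -> float:
    """Average MiB of weights moved off the GPU per extra -ncmoe layer from a to b."""
    return (MODEL_MIB[a] - MODEL_MIB[b]) / (b - a)


for a, b in [(37, 38), (38, 39), (39, 40), (40, 50)]:
    print(f"-ncmoe {a} -> {b}: {mib_per_step(a, b):.1f} MiB per step")
```

Each step from 37 to 40 frees exactly 816 MiB, and from test 6 onward each step bought exactly 53504 more tokens of context in the table (40448, 93952, 147456). From 40 to 50 the average drops to 652.8 MiB per step; 6528 MiB is exactly 8 × 816 MiB, which would fit a model with fewer than 50 MoE layers, but the post does not name the model, so that is only a guess.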