How to Install Local AI on Dell Precision 7520

Goal

An LLM that:

  • runs locally and uses the GPU.
  • (optional) can be used for vibe coding with IntelliJ.
  • is accessible via a web browser.
  • has a WhatsApp relay (a second phone number is recommended).

Prepare

  • Disable all radio devices on the Dell in the BIOS.
  • Attach a wired network connection.
    I first tried the installation without a network attached, but that just made my life harder.
  • Download Ubuntu 22.04 Server, even though a newer version is available.
    The error I ran into with 24.04 was: fail curtin command block-meta dev/pve/data not an existing file of block device…
    I tried several things, from partitioning by hand to vgremove, pvremove, and wipefs, among other solutions, all unsuccessfully.
    The older installer is just not as picky as the one from 24.04.
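
For reference, the manual cleanup I attempted looked roughly like this (a sketch: the volume group name pve comes from the error message above, /dev/sda3 and /dev/sda stand in for your LVM partition and disk, and these commands destroy all data on them):

    sudo vgremove -f pve
    sudo pvremove -ff /dev/sda3
    sudo wipefs -a /dev/sda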

Installation

Install Ubuntu LTS 22.04 Server:

  • Choose Ubuntu Server with the HWE Kernel at boot
  • Check “Ubuntu Server (minimized)”
  • Check “Search for third-party drivers”
  • No LVM; I needed to disable it.
    • I also edited the automatically created partitions, reducing the size of the main partition in favor of a 16 GB swap partition.
  • Check “Install OpenSSH Server”

Install GPU drivers

  • update repository
    sudo apt update
  • check drivers
    sudo ubuntu-drivers devices

vendor : NVIDIA Corporation
model : GM107GLM [Quadro M1200 Mobile]
driver : nvidia-driver-535-server - distro non-free
driver : nvidia-driver-470 - distro non-free
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-535 - distro non-free
driver : nvidia-driver-580 - distro non-free recommended
driver : nvidia-driver-580-server - distro non-free
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-390 - distro non-free
driver : nvidia-driver-418-server - distro non-free
driver : nvidia-driver-545 - distro non-free
driver : nvidia-driver-570 - distro non-free
driver : nvidia-driver-570-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin

Install the recommended one:

  • remove all old NVIDIA packages
    sudo apt purge 'nvidia*'
  • get what is needed to build the DKMS module
    sudo apt install build-essential linux-headers-$(uname -r)
  • get a newer gcc to prevent later compilation problems with nvcc and llama.cpp
    sudo apt install gcc-12 g++-12
  • install the driver
    sudo apt install nvidia-driver-580
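
Before rebooting, you can check that the kernel module was built via DKMS (the version strings on your system will differ):

    dkms status

It should list an nvidia/580.x entry marked as installed for your running kernel.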

Reboot.

Then check the installation; it should show your card:

nvidia-smi

Sun Feb  1 09:05:38 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.09             Driver Version: 580.126.09     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro M1200                   Off |   00000000:01:00.0 Off |                  N/A |
| N/A   46C    P8            N/A  /  200W |       2MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Install llama.cpp

Prepare

  • Get necessary software
    sudo apt install git
    sudo apt install cmake
    sudo apt install dialog
    sudo apt install openssl
  • Install a newer version of the CUDA toolkit (we need version 12, but the repo has version 11)
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
    sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
    sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub
    sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"
    sudo apt update
    sudo apt install cuda-toolkit-12-5
    This will take a while; grab a coffee or clean your room.
  • Assign CUDA to the paths (you may also want to persist these in your .bashrc; see the snippet after the export lines):
export CUDA_HOME=/usr/local/cuda-12.5
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
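
# Optional: persist the CUDA paths by appending them to ~/.bashrc
# (a sketch; adjust the CUDA version if yours differs)
cat >> ~/.bashrc <<'EOF'
export CUDA_HOME=/usr/local/cuda-12.5
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
EOF
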
  • Test the installed version

which nvcc

/usr/local/cuda-12.5/bin/nvcc

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0

  • Clone the repository
    git clone https://github.com/ggerganov/llama.cpp
  • Check the following page for the current CUDA build instructions:
    https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md
  • Build a llama
    cd llama.cpp
    cmake -B build -DGGML_CUDA=ON -DCMAKE_C_COMPILER=gcc-12 -DCMAKE_CXX_COMPILER=g++-12
    cmake --build build --config Release -j 8
    -j 8 is the number of parallel jobs; it speeds up the build a lot.

Go cooking or make something meaningful. This will take a long time!
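
Once the build finishes, the binaries land in build/bin. A quick smoke test (it should print the version and build info):

    ./build/bin/llama-server --version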

Install Model

  • Switch to the home folder
    cd
  • Create models folder
    mkdir -p models
  • Download the 7B model into the models folder (the server command below expects it under ~/models)
    curl -L -o models/openhermes-2.5-mistral-7b.Q4_K_M.gguf \
      https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF/resolve/main/openhermes-2.5-mistral-7b.Q4_K_M.gguf
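
The file is large (roughly 4.4 GB for this quantization), so check that the download completed:

    ls -lh models/openhermes-2.5-mistral-7b.Q4_K_M.gguf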

Start LLM

My Dell Precision 7520 has:
  • 32 GB RAM (dual channel)
  • 16 GB RAM (single channel)

Depending on the context size, the communication will be faster or slower:

--ctx-size 4096

On 4 GB VRAM (Quadro M1200), this was a good starting point:

--n-gpu-layers 32

~/llama.cpp/build/bin/llama-server \
   -m ~/models/openhermes-2.5-mistral-7b.Q4_K_M.gguf \
   --ctx-size 4096 \
   --n-gpu-layers 32 \
   --host 0.0.0.0 \
   --port 8080

All these settings depend heavily on the model that is used. I tried different models with more or less success. My approach is to run two services: one for a smaller model and one for a bigger model. 3B models run well on the GPU, meaning most of their layers fit in VRAM.
For the bigger ones, I set --n-gpu-layers 0 to keep them off the GPU entirely; they live in RAM and are processed on the CPU, as sketched below.
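
A possible second, CPU-only instance (the file name cpu_model.gguf matches my setup below; port 8081 is just an example):

~/llama.cpp/build/bin/llama-server \
   -m ~/models/cpu_model.gguf \
   --n-gpu-layers 0 \
   --host 0.0.0.0 \
   --port 8081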

Besides the one used for this article, I currently have the following models running:
  • cpu_model.gguf -> glm-4.7-flash-claude-4.5-opus.q4_k_m.gguf
  • gpu_model.gguf -> OwlLM2-e2b.Q8_0.gguf

I recommend putting each server into a systemd service and using systemctl to start it.
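
A minimal sketch of such a unit, saved e.g. as /etc/systemd/system/llama-gpu.service (the user name, paths, and unit name are assumptions; adjust them to your setup):

[Unit]
Description=llama.cpp server (GPU model)
After=network-online.target
Wants=network-online.target

[Service]
# 'youruser' and the paths are placeholders; systemd does not expand ~
User=youruser
ExecStart=/home/youruser/llama.cpp/build/bin/llama-server -m /home/youruser/models/openhermes-2.5-mistral-7b.Q4_K_M.gguf --ctx-size 4096 --n-gpu-layers 32 --host 0.0.0.0 --port 8080
Restart=on-failure

[Install]
WantedBy=multi-user.target

Then reload and enable it:

sudo systemctl daemon-reload
sudo systemctl enable --now llama-gpu.service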

llama.cpp ships with its own web UI, reachable on the specified port, e.g. http://<server-ip>:8080.

I get speeds of about 6-12 tokens per second on both.

For the WhatsApp relay, I first tried https://github.com/openclaw/openclaw, but it doesn't work well here: OpenClaw needs a context size of at least 16k, which this system can't manage, at least not for the GPU model. Depending on the model, I had to wait at least 3 minutes for just a "Hello." As an alternative, I now use https://github.com/HKUDS/nanobot, which does not need such a big context size.

In the end, I achieved all the goals above. The IntelliJ part is still ongoing.
