My Random Posts

Yet another blog

My first impressions on ROCm and Strix Halo

Tags = [ rocm ]

Here I'll share my first impressions with ROCm and Strix Halo and how I've set up everything.

Strix Halo on htop 128GB efficiently shared between the CPU and GPU.

OS choice and driver installation

I'm used to working with Ubuntu, so I stuck with it in the supported 24.04 LTS version, and just followed the official installation instructions.

BIOS update

It seems that things wouldn't work without a BIOS update: PyTorch was unable to find the GPU. This was easily done in the BIOS settings: it was able to connect to my Wi-Fi network and download it automatically.

BIOS settings and Grub changes

Also in the BIOS settings, you might need to make sure you set the reserved video memory to a low value and let the memory be shared between the CPU and GPU using the GTT. The reserved memory can be as low as 512 MB.

Implications:

  • The CPU is not able to use the GPU reserved memory.
  • The GPU can use the total of reserved memory plus GTT, but utilizing both simultaneously can be less efficient than a single large GTT pool due to fragmentation and addressing overhead.
  • Some legacy games or software sadly might see the GPU memory as 512 MB and refuse to work; this has not happened to me so far, though.

Then on /etc/default/grub, I've made this change:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash ttm.pages_limit=32768000 amdgpu.gttsize=114688"

and then ran sudo update-grub.

Note that amdgpu.gttsize shouldn't include the whole system memory; you should leave some memory (I read from 4 GB to 12 GB) reserved for the CPU (total memory minus reserved GPU memory minus GTT) for the sake of the stability of the Linux kernel.

PyTorch with UV

This was somewhat tricky because of the weird dependency graph of PyTorch, but eventually I've got it working with:

[project]
name = "myproject"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "torch==2.11.0+rocm7.2",
    "triton-rocm",
]

[tool.uv]
environments = ["sys_platform == 'linux'"]

[[tool.uv.index]]
name = "pytorch-rocm"
url = "https://download.pytorch.org/whl/rocm7.2"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-rocm" }
torchvision = { index = "pytorch-rocm" }
triton-rocm = { index = "pytorch-rocm" }

and you can even add this to your .bashrc:

alias pytorch='''uvx --extra-index-url https://download.pytorch.org/whl/rocm7.2 \
    --index-strategy unsafe-best-match \
    --with torch==2.11.0+rocm7.2,triton-rocm \
    ipython -c "import torch; print(f\"ROCM: {torch.version.hip}\"); \
    print(f\"GPU available: {torch.cuda.is_available()}\"); import torch.nn as nn" -i
'''

Llama.cpp

podman run --rm -it --name qwen-coder --device /dev/kfd --device /dev/dri \
--security-opt label=disable --group-add keep-groups \
-p 8080:8080 -v /some_path/models:/models:z  ghcr.io/ggml-org/llama.cpp:server-rocm \
-m /models/qwen3.6/model.gguf -ngl 99 -c 327680 --host 0.0.0.0 --port 8080 \
--flash-attn on --no-mmap

Note that you can easily download the model with:

uvx hf download Qwen/Qwen3.6-35B-A3B --local-dir /some_path/models/qwen3.6

And convert it to gguf with the convert_hf_to_gguf.py script from the llama.cpp repo:

git clone https://github.com/ggerganov/llama.cpp.git /some_path/llama.cpp
cd /some_path/models/qwen3.6 &&
uvx --extra-index-url https://download.pytorch.org/whl/rocm7.2 \
    --index-strategy unsafe-best-match \
    --with torch==2.11.0+rocm7.2,triton-rocm,transformers \
    ipython /some_path/llama.cpp/convert_hf_to_gguf.py \
    -- . --outfile model.gguf

Opencode

I'm using Podman to run Opencode; see my repo for how to set it up.

And this is my config to have it work with Llama.cpp:

{
    "$schema": "https://opencode.ai/config.json",
    "provider": {
        "local": {
            "options": {
                "baseURL": "http://localhost:8080/v1",
                "apiKey": "any-string",
                "reasoningEffort": "auto",
                "textVerbosity": "high",
                "supportsToolCalls": true
            },
            "models": {
                "qwen-coder-local": {}
            }
        }
    },
    "model": "local/qwen-coder-local",
    "permission": {
        "*": "ask",
        "read": {
            "*": "allow",
            "*.env": "deny",
            "**/secrets/**": "deny"
        },
        "bash": "allow",
        "edit": "allow",
        "glob": "allow",
        "grep": "allow",
        "websearch": "allow",
        "codesearch": "allow",
        "webfetch": "allow"
    },
    "disabled_providers": [
        "opencode"
    ]
}

Conclusion

So as I promised, my first impressions are: so far, so good. I was able to play with PyTorch and run Qwen3.6 on llama.cpp with a large context window. There were some rough edges, but I think it was quite worth it.