Historically, AMD GPUs have been considered less suitable for deep learning tasks, leading many deep learning users to prefer Nvidia GPUs. Recently, however, LLMs (Large Language Models) have gained significant attention, and numerous research teams have released LLaMA-based models, which inspired me to experiment. Since I have several AMD GPUs with ample VRAM, I decided to test these cards for running LLMs.
Test Environment
The current testing environment is a VM with GPU passthrough, configured as follows:
- CPU: 16-core AMD EPYC 7302P
- RAM: 96GB DDR4 ECC 2933
- GPU: AMD Instinct MI25 (WX9100 BIOS)
- OS: Ubuntu 20.04
- Host OS: PVE 7.4-3
- ROCm 4.5.2
ROCm is AMD's open-source platform designed for deep learning-related applications, including HIP, which provides CUDA porting capabilities.
System Configuration
ROCm Installation Steps
First, we need to add the user to the video group to gain the necessary permissions.
sudo usermod -a -G video $LOGNAME
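The change takes effect after you log out and back in. As a sanity check, a small Python sketch using only the standard library can confirm the membership (the group name `video` is the only assumption carried over from the command above):

```python
import getpass
import grp


def in_video_group(user=None):
    """Check whether `user` is listed as a member of the 'video' group.

    Note: this inspects supplementary membership recorded in /etc/group,
    so it does not cover the case where 'video' is the user's primary group.
    """
    user = user or getpass.getuser()
    try:
        members = grp.getgrnam("video").gr_mem
    except KeyError:
        # The 'video' group does not exist on this system.
        return False
    return user in members


print(in_video_group())
```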
Then download and install the installation script:
sudo apt-get update
wget https://repo.radeon.com/amdgpu-install/5.4.3/ubuntu/focal/amdgpu-install_5.4.50403-1_all.deb
sudo apt-get install ./amdgpu-install_5.4.50403-1_all.deb
Finally, install the GPU driver and ROCm. Note that I specifically pin version 4.5.2: the latest release is currently 5.4.3, but the newest version that supports the AMD Instinct MI25/WX9100 is 4.5.2.
sudo amdgpu-install --usecase=rocm,dkms --rocmrelease=4.5.2
The rocm and dkms combination is a typical use case. To list all available use cases, run the following command:
sudo amdgpu-install --list-usecase
The installation process may take some time, mainly depending on network speed. After installation, we need to reboot the system to load the new driver.
sudo reboot
PyTorch Installation Steps
Most projects currently use PyTorch as their deep learning framework, but to use ROCm you must explicitly specify the ROCm build when installing it.
Note that most projects pin PyTorch in their requirements.txt, which installs the CUDA build by default; if that version is already installed, you need to uninstall it first.
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm4.5.2
Finally, specify the ROCm version matching your installation via the index URL; here we install the build for 4.5.2.
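To confirm which backend the installed wheel actually targets, you can inspect `torch.version`: ROCm builds populate `torch.version.hip`, while CUDA builds populate `torch.version.cuda`. A minimal sketch (wrapped so it also reports when PyTorch is missing):

```python
def torch_backend():
    """Return which accelerator backend the installed PyTorch build targets."""
    try:
        import torch
    except ImportError:
        return "not-installed"
    # ROCm wheels set torch.version.hip (e.g. "4.5.2"); CUDA wheels set
    # torch.version.cuda instead. CPU-only wheels set neither.
    if getattr(torch.version, "hip", None):
        return "rocm"
    if getattr(torch.version, "cuda", None):
        return "cuda"
    return "cpu"


print(torch_backend())
```

If this prints `cuda`, the CUDA build is still installed and should be removed before installing the ROCm build.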
Test
To test whether PyTorch can use the GPU, enter the following commands in Python:
import torch
torch.cuda.is_available()
If it returns True, the installation was successful!
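For a slightly more informative check, the sketch below also reports the device name when a GPU is found (it assumes nothing beyond the torch calls used above; note that ROCm builds of PyTorch expose the GPU through the `torch.cuda` namespace, so the same calls work for AMD cards):

```python
def gpu_status():
    """Return a short status string describing GPU availability in PyTorch."""
    try:
        import torch
    except ImportError:
        return "pytorch not installed"
    # ROCm builds reuse the torch.cuda API, so is_available() covers AMD GPUs too.
    if torch.cuda.is_available():
        return "GPU detected: " + torch.cuda.get_device_name(0)
    return "no GPU detected; check the ROCm driver and the PyTorch build"


print(gpu_status())
```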
Summary
After hands-on testing, I found that running deep learning applications on AMD GPUs is not difficult; I have successfully run stable-diffusion and Vicuna-7B. If you have an AMD graphics card and would like to experiment with deep learning projects, exploring it can save you the cost of buying an Nvidia card.
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.