Historically, AMD GPUs have been considered less suitable for deep learning tasks, leading many deep learning users to prefer Nvidia GPUs. Recently, however, LLMs (Large Language Models) have gained significant attention, and numerous research teams have released LLaMA-based models, which inspired me to experiment. Since I have several AMD GPUs with ample VRAM, I decided to test these cards for running LLMs.
Test Environment
The current testing environment is a VM with GPU passthrough, configured as follows:
- CPU: 16-core AMD EPYC 7302P
- RAM: 96GB DDR4 ECC 2933
- GPU: AMD Instinct MI25 (WX9100 BIOS)
- OS: Ubuntu 20.04
- Host OS: PVE 7.4-3
- ROCm 4.5.2
ROCm is AMD's open-source platform designed for deep learning-related applications, including HIP, which provides CUDA porting capabilities.
System Configuration
ROCm Installation Steps
First, we need to add the user to the video group to gain the necessary permissions.
sudo usermod -a -G video $LOGNAME
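The change takes effect after you log out and back in. As a sanity check, a small Python sketch using only the standard library can confirm the membership (the group name `video` is the only assumption carried over from the command above):

```python
import getpass
import grp


def in_video_group(user=None):
    """Check whether `user` is listed as a member of the 'video' group.

    Note: this inspects supplementary membership recorded in /etc/group,
    so it does not cover the case where 'video' is the user's primary group.
    """
    user = user or getpass.getuser()
    try:
        members = grp.getgrnam("video").gr_mem
    except KeyError:
        # The 'video' group does not exist on this system.
        return False
    return user in members


print(in_video_group())
```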
Then download and install the installation script:
sudo apt-get update
wget https://repo.radeon.com/amdgpu-install/5.4.3/ubuntu/focal/amdgpu-install_5.4.50403-1_all.deb
sudo apt-get install ./amdgpu-install_5.4.50403-1_all.deb
Finally, install the GPU driver and ROCm. Note that I specifically pin version 4.5.2: the latest release is currently 5.4.3, but the newest version that supports the AMD Instinct MI25/WX9100 is 4.5.2.
sudo amdgpu-install --usecase=rocm,dkms --rocmrelease=4.5.2
The rocm and dkms combination is a typical use case. To list all available use cases, run the following command:
sudo amdgpu-install --list-usecase
The installation process may take some time, mainly depending on network speed. After installation, we need to reboot the system to load the new driver.
sudo reboot
PyTorch Installation Steps
Most projects currently use PyTorch as their deep learning framework, but to use ROCm you must explicitly specify the ROCm build when installing it.
Note that most projects pin PyTorch in their requirements.txt, which installs the CUDA build by default; if that version is already installed, you need to uninstall it first.
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm4.5.2
Finally, specify the ROCm version matching your installation via the index URL; here we install the build for 4.5.2.
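To confirm which backend the installed wheel actually targets, you can inspect `torch.version`: ROCm builds populate `torch.version.hip`, while CUDA builds populate `torch.version.cuda`. A minimal sketch (wrapped so it also reports when PyTorch is missing):

```python
def torch_backend():
    """Return which accelerator backend the installed PyTorch build targets."""
    try:
        import torch
    except ImportError:
        return "not-installed"
    # ROCm wheels set torch.version.hip (e.g. "4.5.2"); CUDA wheels set
    # torch.version.cuda instead. CPU-only wheels set neither.
    if getattr(torch.version, "hip", None):
        return "rocm"
    if getattr(torch.version, "cuda", None):
        return "cuda"
    return "cpu"


print(torch_backend())
```

If this prints `cuda`, the CUDA build is still installed and should be removed before installing the ROCm build.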
Test
To test whether PyTorch can use the GPU, enter the following commands in Python:
import torch
torch.cuda.is_available()
If it returns True, the installation was successful!
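For a slightly more informative check, the sketch below also reports the device name when a GPU is found (it assumes nothing beyond the torch calls used above; note that ROCm builds of PyTorch expose the GPU through the `torch.cuda` namespace, so the same calls work for AMD cards):

```python
def gpu_status():
    """Return a short status string describing GPU availability in PyTorch."""
    try:
        import torch
    except ImportError:
        return "pytorch not installed"
    # ROCm builds reuse the torch.cuda API, so is_available() covers AMD GPUs too.
    if torch.cuda.is_available():
        return "GPU detected: " + torch.cuda.get_device_name(0)
    return "no GPU detected; check the ROCm driver and the PyTorch build"


print(gpu_status())
```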
Summary
After hands-on testing, I found that running deep learning applications on AMD GPUs is not difficult; I have successfully run stable-diffusion and Vicuna-7B. If you have an AMD graphics card and would like to experiment with deep learning projects, exploring it can save you the cost of buying an Nvidia card.
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.