Over 6,000 cloud edge servers revealed! A real-world Nvidia Tesla T10 performance review

Tesla T10 GPU-Z

A special graphics card has recently appeared on the Chinese market: the Tesla T10, a GPU originally designed by NVIDIA exclusively for cloud gaming services and used primarily in the GeForce NOW cloud gaming platform. These retired cards have now entered the secondary market, currently selling on Chinese platforms for around 1,350 RMB (approximately 190 USD). Given how affordable they are, I purchased two to examine their performance.
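
For readers who want to poke at such a card themselves, a minimal sketch along these lines can dump the basic specs reported by the driver. This is my own illustration rather than anything from the original post; it assumes the NVIDIA driver and the nvidia-ml-py package (imported as pynvml) are installed.

```python
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)        # first GPU in the system
    name = pynvml.nvmlDeviceGetName(handle)              # e.g. "Tesla T10"
    if isinstance(name, bytes):                          # older pynvml versions return bytes
        name = name.decode()
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # VRAM sizes in bytes
    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    print(f"{name}: {mem.total / 1024**3:.1f} GiB VRAM, {temp} C")
finally:
    pynvml.nvmlShutdown()
```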

Leveraging Nvidia GPU on Kubernetes for LLM Chatbot Deployment

With the rapid advancement of artificial intelligence and large language models, more and more companies and developers want to integrate language models into their own chatbot systems (LLM chatbots). This article guides readers through deploying a high-performance LLM chatbot in a Kubernetes environment using Nvidia GPUs, covering everything from the required installations and tools to the detailed deployment steps.
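
As a rough preview of the kind of manifest involved, the sketch below (my own, not the article's) uses the official kubernetes Python client to create a Deployment that reserves one GPU through the nvidia.com/gpu resource limit. The image name and labels are placeholders, and it assumes the NVIDIA device plugin is already running in the cluster.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside a Pod

# Container that will serve the chatbot; the image is a placeholder.
container = client.V1Container(
    name="llm-chatbot",
    image="example.registry/llm-chatbot:latest",
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"}                 # schedule onto a node with one free GPU
    ),
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="llm-chatbot"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "llm-chatbot"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "llm-chatbot"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```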

Understanding and Applying Linux PSI (Pressure Stall Information) Metrics

How to monitor container PSI information

When CPU, memory, or I/O resources are under contention, workloads may suffer increased latency, degraded performance, or even abrupt OOM (Out of Memory) termination. Without monitoring to detect such contention, users risk over-committing their hardware, leading to frequent crashes. Since version 4.20, the Linux kernel has provided PSI (Pressure Stall Information), a set of metrics that lets users understand precisely how resource shortages affect overall system performance. This article briefly explains PSI and how to interpret its data.
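
As a taste of what the raw data looks like, the following sketch (my own, assuming a 4.20+ kernel with PSI enabled) parses the system-wide pressure files under /proc/pressure. Per-cgroup files such as cpu.pressure, memory.pressure, and io.pressure in a cgroup v2 directory follow the same format, which is how container-level pressure can be read.

```python
from pathlib import Path

def read_psi(resource: str) -> dict:
    """Parse /proc/pressure/<resource> into {'some': {...}, 'full': {...}}."""
    metrics = {}
    for line in Path(f"/proc/pressure/{resource}").read_text().splitlines():
        # e.g. "some avg10=0.12 avg60=0.08 avg300=0.02 total=123456"
        kind, *fields = line.split()
        metrics[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}
    return metrics

if __name__ == "__main__":
    for resource in ("cpu", "memory", "io"):
        print(resource, read_psi(resource))
```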

Deploying Charmed Kubernetes with OpenStack Integrator

Charmed Kubernetes is a Kubernetes deployment solution provided by Canonical that can deploy Kubernetes across various environments through Juju. This article walks through deploying Charmed Kubernetes on OpenStack and leveraging the OpenStack Integrator to provide Persistent Volumes and Load Balancers to the cluster.
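
To make the end goal concrete, here is a small sketch of how one might confirm the integration from the Kubernetes side once everything is up. It is an illustration under assumptions (the kubernetes Python client is installed, a kubeconfig for the new cluster is available, and a workload labelled app=demo exists); the exact storage class and provisioner names depend on the deployment.

```python
from kubernetes import client, config

config.load_kube_config()

# List storage classes to confirm the integrator registered Cinder-backed storage.
for sc in client.StorageV1Api().list_storage_class().items:
    print("StorageClass:", sc.metadata.name, "->", sc.provisioner)

# Expose a (hypothetical) workload through an OpenStack-provisioned load balancer.
svc = client.V1Service(
    api_version="v1",
    kind="Service",
    metadata=client.V1ObjectMeta(name="demo-lb"),
    spec=client.V1ServiceSpec(
        type="LoadBalancer",                           # fulfilled by the OpenStack cloud provider
        selector={"app": "demo"},                      # assumes a Deployment labelled app=demo
        ports=[client.V1ServicePort(port=80, target_port=80)],
    ),
)
client.CoreV1Api().create_namespaced_service(namespace="default", body=svc)
```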

AMD GPU and Deep Learning: Practical Learning Guide

Historically, AMD GPUs have been considered less suitable for deep learning, which is why most deep learning users prefer Nvidia GPUs. Recently, however, LLMs (Large Language Models) have attracted significant attention, and numerous research teams have released models based on LLaMA, which inspired me to experiment. Since I have several AMD GPUs with ample VRAM, I decided to test how well these cards run LLMs.
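
Before loading any model, a quick sanity check like the one below helps confirm the ROCm stack is usable from PyTorch. This is my own sketch, assuming a ROCm build of PyTorch is installed; on ROCm, the familiar torch.cuda API is reused for AMD devices.

```python
import torch

print("GPU available:", torch.cuda.is_available())
print("HIP/ROCm version:", torch.version.hip)           # None on CUDA-only builds
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    x = torch.randn(4096, 4096, device="cuda")           # "cuda" maps to the ROCm device here
    print("Matmul OK:", (x @ x).shape)
```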

Introduction to Kubernetes Cluster-API

Kubernetes has been evolving in the cloud-native world for many years, and numerous specialized projects for managing its lifecycle have emerged, such as Kops and Rancher. The Kubernetes community's Cluster Lifecycle SIG has meanwhile launched a project named Cluster API, which leverages Kubernetes' own capabilities to manage other Kubernetes clusters. This article will briefly introduce the Cluster API project.
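
To illustrate the "Kubernetes managing Kubernetes" idea, the sketch below (mine, not from the article) lists the Cluster custom resources in a management cluster with the kubernetes Python client. It assumes the Cluster API CRDs (group cluster.x-k8s.io, version v1beta1) are installed and a kubeconfig for the management cluster is available.

```python
from kubernetes import client, config

config.load_kube_config()                                # kubeconfig of the management cluster
api = client.CustomObjectsApi()

# Workload clusters managed by Cluster API are just custom resources.
clusters = api.list_cluster_custom_object(
    group="cluster.x-k8s.io", version="v1beta1", plural="clusters"
)
for item in clusters.get("items", []):
    meta = item["metadata"]
    phase = item.get("status", {}).get("phase", "Unknown")
    print(f"{meta['namespace']}/{meta['name']}: {phase}")
```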