Deploying Prometheus Monitoring System on Arista Switches

Overview

This article shows how to run node_exporter and snmp_exporter as Docker containers on Arista switches so that switch status can be monitored with Prometheus.

System Architecture

This monitoring solution includes the following main components:

  • node_exporter: Collects host-level metrics data
  • snmp_exporter: Collects network device data via SNMP protocol
  • Prometheus: Serves as a time-series database for storing and querying monitoring data

Container Management Settings

Basic Container Settings

container-manager
   container-profile default
      networking mode host

Node Exporter Settings

container node-exporter
   image prom/node-exporter
   no shutdown
   profile default
   command --collector.disable-defaults --collector.cpu --collector.hwmon --collector.meminfo --collector.vmstat --collector.stat
   persist storage
      mount src file:/ dst /host

This configuration:

  • Uses the official node_exporter image
  • Disables the default collectors and enables only the cpu, hwmon, meminfo, vmstat, and stat collectors
  • Mounts the host filesystem at /host so that system data can be collected

SNMP Exporter Settings

container snmp-exporter
   image prom/snmp-exporter:latest
   no shutdown
   profile default

The SNMP exporter is designed as a centralized monitoring agent that can monitor thousands of devices simultaneously.

Network Access Control Configuration

Starting from Arista's default control-plane ACL, we add rules that open the ports used by node_exporter and snmp_exporter. The complete ACL configuration is as follows:

ip access-list default-with-exporter
   counters per-entry
   10 permit icmp any any
   20 permit ip any any tracked
   30 permit udp any any eq bfd ttl eq 255
   40 permit udp any any eq bfd-echo ttl eq 254
   50 permit udp any any eq multihop-bfd micro-bfd sbfd
   60 permit udp any eq sbfd any eq sbfd-initiator
   70 permit ospf any any
   80 permit tcp any any eq ssh telnet www snmp bgp https msdp ldp netconf-ssh gnmi
   90 permit udp any any eq bootps bootpc ntp snmp ptp-event ptp-general rip ldp
   100 permit tcp any any eq mlag ttl eq 255
   110 permit udp any any eq mlag ttl eq 255
   120 permit vrrp any any
   130 permit ahp any any
   140 permit pim any any
   150 permit igmp any any
   160 permit tcp any any range 5900 5910
   170 permit tcp any any range 50000 50100
   180 permit udp any any range 51000 51100
   190 permit tcp any any eq 3333
   200 permit tcp any any eq nat ttl eq 255
   210 permit tcp any eq bgp any
   220 permit rsvp any any
   230 permit tcp any any eq 9340
   240 permit tcp any any eq 9559
   250 permit udp any any eq 8503
   260 permit udp any any eq lsp-ping
   270 permit udp any eq lsp-ping any
   280 permit tcp any any eq 9116
   290 permit tcp any any eq 9100

This ACL contains:

  • The standard control-plane access control rules for Arista switches
  • Two additional rules that permit the ports used by the Prometheus exporters:
    • 9100 (sequence 290): Default port for node_exporter
    • 9116 (sequence 280): Default port for snmp_exporter

Apply the ACL to the control plane:

system control-plane
   ip access-group default-with-exporter in
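
Once the ACL is applied, a plain TCP connection test from the Prometheus server is a quick way to confirm that the two exporter ports are reachable. The following is a minimal sketch, not part of any Prometheus tooling: port_open is a hypothetical helper name, and any hostname passed to it is whatever management address your switch uses.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_exporter_ports(host: str) -> dict:
    """Check the node_exporter (9100) and snmp_exporter (9116) ports on host."""
    return {port: port_open(host, port) for port in (9100, 9116)}
```

Calling check_exporter_ports with the switch's address returns a mapping from each port to True or False. A False result can mean either that the ACL still blocks the port or that the container is not listening, so it is worth checking both.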

SNMP Configuration

snmp-server community public ro

This configures a read-only SNMP community string (public) so that snmp_exporter can collect data from the switch.
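
snmp_exporter retrieves SNMP data through its /snmp endpoint, which takes the device to poll and the module to use as the target and module query parameters. The sketch below only builds such a URL. It assumes the exporter listens on localhost:9116, that it polls the switch's own SNMP agent at 127.0.0.1, and that the if_mib module from the exporter's stock configuration is available. The community string configured above must also match what that module's configuration expects. snmp_probe_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def snmp_probe_url(exporter: str, target: str, module: str = "if_mib") -> str:
    """Build the snmp_exporter URL that polls `target` using `module`."""
    query = urlencode({"target": target, "module": module})
    return f"http://{exporter}/snmp?{query}"


# Assumed values: exporter on the switch itself, polling its own SNMP agent.
print(snmp_probe_url("localhost:9116", "127.0.0.1"))
# prints: http://localhost:9116/snmp?target=127.0.0.1&module=if_mib
```

Fetching the printed URL with curl on the switch should return interface metrics if the community string and module are accepted.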

Verify Deployment

After deployment is complete, we can use curl with the following commands to verify that the exporters are running correctly:

Testing node_exporter

# Check that node_exporter is running
curl -s localhost:9100/metrics | head -n 5

The expected output should look like this:

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 8953.12
node_cpu_seconds_total{cpu="0",mode="system"} 245.45
node_cpu_seconds_total{cpu="0",mode="user"} 189.76

Testing snmp_exporter

# Check that snmp_exporter is running
curl -s localhost:9116/metrics | head -n 5

The expected output should look like this:

# HELP snmp_exporter_build_info A metric with a constant '1' value labeled by version
# TYPE snmp_exporter_build_info gauge
snmp_exporter_build_info{version="0.20.0"} 1

If you see output similar to the above, it indicates that the exporters are running normally and are ready to receive data scraping requests from Prometheus.
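
The manual curl checks can also be scripted. The sketch below parses the Prometheus text format and reports which expected metric names are absent. The helper names metric_names and missing_metrics are hypothetical, and which metrics to expect is an assumption that depends on the collectors you enabled.

```python
import urllib.request


def metric_names(exposition: str) -> set[str]:
    """Extract metric names from Prometheus text-format output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the first '{' (labels) or space (value).
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def missing_metrics(url: str, expected: set[str]) -> set[str]:
    """Fetch url and return the expected metric names that are absent."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        body = resp.read().decode("utf-8")
    return expected - metric_names(body)


sample = '''# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 8953.12
'''
print(metric_names(sample))  # prints: {'node_cpu_seconds_total'}
```

On the switch, calling missing_metrics with http://localhost:9100/metrics and the set {"node_cpu_seconds_total", "node_memory_MemTotal_bytes"} should return an empty set when the cpu and meminfo collectors are active.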

Conclusion

With these settings, we can:

  1. Run monitoring components natively on Arista switches
  2. Collect comprehensive system and network performance metrics
  3. Integrate with existing Prometheus monitoring systems

This solution is particularly well-suited for large-scale network device monitoring, enabling centralized management and monitoring of multiple devices. Once deployment is complete, simply configure the corresponding targets on the Prometheus server to begin collecting monitoring data.
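
One way to supply those targets is Prometheus file-based service discovery, which reads JSON files listing targets and optional labels. The sketch below renders such a file for the node_exporter port. The switch hostnames and the role label are placeholders, file_sd_targets is a hypothetical helper, and the file_sd_configs entry in prometheus.yml that points at the generated file is not shown.

```python
import json


def file_sd_targets(hosts: list[str], port: int, labels: dict[str, str]) -> str:
    """Render one Prometheus file_sd target group as a JSON document."""
    group = {"targets": [f"{h}:{port}" for h in hosts], "labels": labels}
    return json.dumps([group], indent=2)


# Placeholder switch names; 9100 is the node_exporter port opened in the ACL.
switches = ["switch1.example.net", "switch2.example.net"]
print(file_sd_targets(switches, 9100, {"role": "arista-switch"}))
```

Scraping SNMP data through snmp_exporter also needs target and module query parameters on the /snmp path, which this sketch does not cover.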
