Zabbix Monitoring for ISPs | Grafana and Alerts

Zabbix with custom templates for ISP equipment

A generic Zabbix template is not enough for ISP equipment. Huawei NE8000 has specific MIBs for PPPoE sessions, BGP prefixes and module temperature. ZTE and FiberHome OLTs use different OIDs for ONU optical power. MikroTik CCR exposes monitoring data through its API that SNMP does not expose well.

We develop and maintain our own templates for the equipment we operate. This means you see what matters — not a generic CPU and memory dashboard that tells you nothing when the problem is a downed BGP session.

Templates for Huawei NE8000/NE40: BGP sessions, active PPPoE sessions, CGNAT utilization, temperature
Templates for Juniper MX: interface state, BGP sessions, QoS queues, RE heap
Templates for MikroTik CCR/CHR: CPU per core, PPPoE sessions, BGP peers, IP pool usage
Templates for Huawei MA5800, ZTE C320/C600 and FiberHome AN5516 OLTs: optical power per ONU, slot alarms, temperature

Centralized LibreNMS on Rasys infrastructure: external visibility

Zabbix installed on the ISP's own infrastructure has a blind spot: if that infrastructure goes down, Zabbix goes down with it. To keep visibility from the outside, and to see that the ISP is offline even when everything inside is down, we maintain a centralized LibreNMS instance on Rasys infrastructure that collects metrics from clients' edge equipment.

LibreNMS on Rasys infrastructure collecting via SNMP from the ISP's edge equipment
Dual metrics: local Zabbix (high granularity) + external LibreNMS (external availability)
Unavailability alerts arrive even when the ISP's internal NOC is blind
Uptime and availability history for each edge link without depending on the ISP's own infrastructure
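
The dual view described above can be sketched as a small decision function. This is only an illustration of the reasoning, not the logic of Zabbix or LibreNMS: the three boolean inputs and the verdict names are assumptions made for the example.

```python
from enum import Enum


class Verdict(Enum):
    HEALTHY = "healthy"
    INTERNAL_ISSUE = "internal issue, edge still reachable from outside"
    EDGE_UNREACHABLE = "edge unreachable from outside"
    SITE_OUTAGE = "probable full outage, internal monitoring is blind"


def classify(local_reporting: bool, local_ok: bool, external_reachable: bool) -> Verdict:
    """Combine the internal and external views of one edge device.

    local_reporting:    the on-site Zabbix is still delivering data
    local_ok:           the on-site Zabbix reports no active problem
    external_reachable: the external poller still gets SNMP answers
    (All three inputs are hypothetical names for this sketch.)
    """
    if not local_reporting and not external_reachable:
        return Verdict.SITE_OUTAGE
    if not external_reachable:
        return Verdict.EDGE_UNREACHABLE
    if not local_reporting or not local_ok:
        return Verdict.INTERNAL_ISSUE
    return Verdict.HEALTHY


# Internal monitoring silent and the edge unreachable from outside:
print(classify(local_reporting=False, local_ok=False, external_reachable=False).value)
```

The case that matters most is the first one: only the external poller can raise it, because the internal monitoring is down together with everything else.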

Grafana: dashboards that make sense for operators

Zabbix stores data. Grafana presents it in a way that lets you understand what is happening in seconds — without navigating menus or having to remember which metric to look at.

Network overview dashboard: upstream state, IX.br, PPPoE sessions, backbone utilization
Per-equipment dashboard: load profile over the day, day-over-day comparison
OLT dashboard: PONs with most alarms, ONUs with low optical power
BGP dashboard: prefixes announced/received per session, flap history
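
As a rough idea of what sits behind a flap history panel, the sketch below counts how many times a session left the Established state in a series of polled samples. The list of states is made-up sample data, and the function name is hypothetical; a real dashboard would read these values from the monitoring database.

```python
from typing import Sequence


def count_flaps(states: Sequence[str]) -> int:
    """Count transitions from 'Established' to any other BGP FSM state."""
    flaps = 0
    for previous, current in zip(states, states[1:]):
        if previous == "Established" and current != "Established":
            flaps += 1
    return flaps


# Made-up samples for one session, oldest first.
samples = ["Established", "Idle", "Active", "Established", "Established", "Idle"]
print(count_flaps(samples))  # 2
```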

NetFlow and sFlow: top talkers and DDoS detection

Knowing that an interface is saturated is not enough: you need to know who is causing the saturation. NetFlow and sFlow answer that in near real time.

NetFlow/sFlow export configuration on edge and distribution equipment
Collection with ntopng, pmacct or GoFlow2 depending on scale and budget
Top talker identification by source IP, destination IP, port and protocol
DDoS detection based on anomalous packets-per-second rates or on traffic concentration by source ASN
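
To make the two ideas above concrete, here is a minimal sketch of top-talker aggregation and a naive packets-per-second anomaly check. It does not use ntopng, pmacct or GoFlow2; the `Flow` record, its field names and the three-sigma threshold are assumptions chosen for the example, and production collectors use more robust baselines.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Iterable, List, NamedTuple, Sequence, Tuple


class Flow(NamedTuple):
    # Hypothetical, simplified flow record for this sketch.
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str
    byte_count: int
    packet_count: int


def top_talkers(flows: Iterable[Flow], key: str = "src_ip", n: int = 3) -> List[Tuple[object, int]]:
    """Sum bytes per value of `key` and return the n largest totals."""
    totals: Counter = Counter()
    for flow in flows:
        totals[getattr(flow, key)] += flow.byte_count
    return totals.most_common(n)


def pps_is_anomalous(history: Sequence[float], current: float, sigmas: float = 3.0) -> bool:
    """Flag `current` when it exceeds the baseline mean by `sigmas` standard deviations."""
    if len(history) < 2:
        return False  # not enough data for a baseline
    spread = max(pstdev(history), 1.0)  # floor avoids a zero-width baseline
    return current > mean(history) + sigmas * spread


flows = [
    Flow("10.0.0.1", "8.8.8.8", 53, "udp", 500, 5),
    Flow("10.0.0.2", "1.1.1.1", 443, "tcp", 1500, 10),
    Flow("10.0.0.1", "1.1.1.1", 443, "tcp", 2000, 12),
]
print(top_talkers(flows))                # [('10.0.0.1', 2500), ('10.0.0.2', 1500)]
print(top_talkers(flows, key="dst_ip"))  # [('1.1.1.1', 3500), ('8.8.8.8', 500)]
print(pps_is_anomalous([1000, 1100, 900, 1050], 5000))  # True
```

Changing `key` to `dst_port` or `protocol` gives the other breakdowns listed above with the same function.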

Telegram and Discord alerts

An alert that only arrives by email is an alert that gets ignored. We configure Zabbix alerts for Telegram and Discord with a clear message: what went down, the severity level, and a direct link to the host in Zabbix.

Telegram bot with alerts by severity (information, warning, average, high, disaster)
Discord webhook with color-coded embed by criticality level
Channels separated by severity: informational alerts go to the general group, disaster goes to the dedicated high-priority channel
Alert suppression during maintenance windows configured in Zabbix
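
The routing and message format described above could look roughly like the sketch below. It only builds the JSON bodies (a Telegram `sendMessage` body with `chat_id` and `text`, and a Discord webhook body with one embed) and does not send anything. The colour values, the channel names, the host URL and the rule that every severity below disaster goes to the general group are assumptions for illustration.

```python
from typing import Dict

# Zabbix severities named in the text, mapped to illustrative embed colours.
SEVERITY_COLOURS: Dict[str, int] = {
    "information": 0x3498DB,
    "warning": 0xF1C40F,
    "average": 0xE67E22,
    "high": 0xE74C3C,
    "disaster": 0x992D22,
}


def route(severity: str) -> str:
    """Pick a destination channel (assumed rule: only disaster is high priority)."""
    return "high-priority" if severity == "disaster" else "general"


def telegram_payload(chat_id: str, host: str, problem: str, severity: str, url: str) -> dict:
    """Body for the Telegram Bot API sendMessage method."""
    text = f"[{severity.upper()}] {host}: {problem}\n{url}"
    return {"chat_id": chat_id, "text": text}


def discord_payload(host: str, problem: str, severity: str, url: str) -> dict:
    """Body for a Discord webhook with one colour-coded embed."""
    return {
        "embeds": [
            {
                "title": f"{host}: {problem}",
                "description": f"Severity: {severity}",
                "url": url,
                "color": SEVERITY_COLOURS[severity],
            }
        ]
    }


# Hypothetical host name and link, for illustration only.
link = "https://zabbix.example.net/host/core-rtr-01"
print(route("disaster"))
print(telegram_payload("-100123", "core-rtr-01", "BGP session down", "disaster", link)["text"])
```

Keeping what went down, the severity and the link in every message is the point: the person on call should not have to open Zabbix just to learn which host is affected.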

Centralized syslog

Logs scattered on each device are logs no one reads. With centralized syslog, you have event correlation across multiple devices — and can reconstruct what happened during an incident without SSH-ing into every box.

Remote syslog configuration on routing equipment, OLTs and edge switches
Centralized collection with rsyslog or Graylog depending on volume
Parsing of BGP, PPPoE, OSPF and GPON messages for structured indexing
Configurable retention by severity (errors for 90 days, informational for 30)
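
As a small illustration of retention by severity, the sketch below decodes the standard syslog priority value (facility times 8 plus severity) and maps it to a retention period. The 90 and 30 day figures come from the list above; treating every severity from error upwards as "errors" and everything else as 30 days is an assumption, and the sample message is made up.

```python
import re
from typing import Tuple

PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def decode_pri(line: str) -> Tuple[int, int, str]:
    """Split a raw syslog line into (facility, severity, rest of message)."""
    match = PRI_RE.match(line)
    if match is None:
        raise ValueError("line does not start with a <PRI> field")
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        raise ValueError(f"priority out of range: {pri}")
    return pri // 8, pri % 8, match.group(2)


def retention_days(severity: int) -> int:
    """Severity 0-3 (emergency to error) kept 90 days, the rest 30 (assumed split)."""
    return 90 if severity <= 3 else 30


# Made-up sample line: facility 23 (local7), severity 3 (error).
facility, severity, message = decode_pri("<187>rtr1 BGP: peer 192.0.2.1 Down")
print(facility, severity, retention_days(severity))  # 23 3 90
```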

How we work and how to get started

We work on a monthly plan: no one-off projects and no hourly-rate diagnostics. The first conversation is at no cost: we call, you share an AnyDesk session and show us the live environment, and we share observations on what is being monitored and what is missing. If it makes sense for both sides, we agree on the monthly plan and go from there. We ask for full admin access from the start, because we need it to work effectively.

Talk to us — initial conversation, no commitment. See also: 24x7 NOC, BGP for ISPs.

FREQUENTLY ASKED QUESTIONS

How does the work start with you?

The first conversation is at no cost. You reach out, we call, and you open an AnyDesk session to show us the monitoring environment live, or the lack of one. We share observations on what needs to be monitored and how. If it makes sense for both sides, we agree on the monthly plan and start the following week.

Do you charge a setup or onboarding fee?

No. The monthly plan covers everything: initial configuration, template creation, dashboards, alerts and ongoing maintenance.

Do I need to host Zabbix on my infrastructure, or do you host it?

Zabbix runs on the ISP's infrastructure to have direct SNMP access to the equipment. The external LibreNMS instance runs on our infrastructure. We install and fully maintain Zabbix.

How long does it take to have monitoring working?

For a mid-size ISP with Zabbix already installed, getting templates, alerts and basic dashboards live typically takes 1 to 2 weeks. If Zabbix is not yet installed, we include the installation, and the timeline is 2 to 4 weeks to have everything working.

Do you only find out about problems when customers call?

Zabbix on your server, with custom templates and alerts that make sense. Initial conversation at no cost.

Talk about monitoring on WhatsApp
