Zabbix Monitoring for Internet Service Providers
Knowing the network went down before the subscriber calls changes everything. Proper monitoring is the difference between resolving an incident in 5 minutes and spending 2 hours figuring out what happened.
Zabbix with custom templates for ISP equipment
A generic Zabbix template does not work for ISP equipment. The Huawei NE8000 has specific MIBs for PPPoE sessions, BGP prefixes and module temperature. ZTE and FiberHome OLTs use different OIDs for ONU optical power. The MikroTik CCR exposes monitoring data through its API that SNMP does not expose well.
We develop and maintain our own templates for the equipment we operate. This means you see what matters — not a generic CPU and memory dashboard that tells you nothing when the problem is a downed BGP session.
- Templates for Huawei NE8000/NE40: BGP sessions, active PPPoE sessions, CGNAT utilization, temperature
- Templates for Juniper MX: interface state, BGP sessions, QoS queues, RE heap
- Templates for MikroTik CCR/CHR: CPU per core, PPPoE sessions, BGP peers, IP pool usage
- Templates for Huawei MA5800, ZTE C320/C600 and FiberHome AN5516 OLTs: optical power per ONU, slot alarms, temperature
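As an illustration of why a BGP-aware check matters more than a CPU graph, here is a minimal Python sketch that flags sessions that are not established. It assumes the peer state is polled as the integer defined by the standard BGP4-MIB (bgpPeerState); vendor templates may read vendor-specific MIBs instead. The function name and the sample peers are hypothetical.

```python
# Peer states as defined by the standard BGP4-MIB (bgpPeerState).
BGP_PEER_STATES = {
    1: "idle",
    2: "connect",
    3: "active",
    4: "opensent",
    5: "openconfirm",
    6: "established",
}


def down_bgp_sessions(peer_states):
    """Return {peer: state_name} for every peer that is not established.

    peer_states maps a peer address to the raw integer polled via SNMP.
    Values outside the MIB range are reported as "unknown".
    """
    down = {}
    for peer, raw in peer_states.items():
        name = BGP_PEER_STATES.get(raw, "unknown")
        if name != "established":
            down[peer] = name
    return down


# Hypothetical poll result, using documentation address ranges.
polled = {"192.0.2.1": 6, "198.51.100.7": 3, "203.0.113.9": 1}
print(down_bgp_sessions(polled))
```

In a real template the same logic would live in a Zabbix trigger expression; the sketch only shows the decision being made.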
Centralized LibreNMS on Rasys infrastructure: external visibility
Zabbix installed on the ISP's own infrastructure has a blind spot: if that infrastructure goes down, Zabbix goes with it. To keep visibility from the outside, and see that the ISP is offline even when everything inside is down, we maintain a centralized LibreNMS instance on Rasys infrastructure that collects metrics from clients' edge equipment.
- LibreNMS on Rasys infrastructure collecting via SNMP from the ISP's edge equipment
- Dual metrics: local Zabbix (high granularity) + external LibreNMS (external availability)
- Unavailability alerts arrive even when the ISP's internal NOC is blind
- Uptime and availability history for each edge link without depending on the ISP's own infrastructure
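The value of having two vantage points can be sketched as a small decision function. This is a hypothetical correlation, not a feature of Zabbix or LibreNMS: the function name, the returned labels and the idea of treating a silent local monitor as None are all assumptions made for illustration.

```python
def classify_outage(local_sees_up, external_sees_up):
    """Combine the local and external views of one edge device.

    local_sees_up: True or False as reported by the local monitor,
                   or None if the local monitor has gone silent.
    external_sees_up: True or False as reported by the external poller.
    """
    if external_sees_up:
        if local_sees_up:
            return "healthy"
        if local_sees_up is None:
            return "local monitoring silent"
        return "local-only problem"
    # From here on the external poller cannot reach the edge.
    if local_sees_up:
        return "unreachable from outside"
    return "ISP offline"


# The blind-spot case: the local monitor is silent and the edge is unreachable.
print(classify_outage(None, False))
```

The last case is the one the external instance exists for: the internal NOC sees nothing, yet an alert can still be raised.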
Grafana: dashboards that make sense for operators
Zabbix stores data. Grafana presents it in a way that lets you understand what is happening in seconds — without navigating menus or having to remember which metric to look at.
- Network overview dashboard: upstream state, IX.br, PPPoE sessions, backbone utilization
- Per-equipment dashboard: load profile over the day, day-over-day comparison
- OLT dashboard: PONs with most alarms, ONUs with low optical power
- BGP dashboard: prefixes announced/received per session, flap history
NetFlow and sFlow: top talkers and DDoS detection
Knowing that an interface is saturated is not enough — you need to know who is causing the saturation. NetFlow and sFlow answer that in real time.
- NetFlow/sFlow export configuration on edge and distribution equipment
- Collection with ntopng, pmacct or GoFlow2 depending on scale and budget
- Top talker identification by source IP, destination IP, port and protocol
- DDoS detection by anomalous packets-per-second volume or by source ASN
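To make the top-talker and packet-rate ideas concrete, here is a minimal Python sketch that works on flow records already decoded by a collector. The record fields (src, dst, bytes), the function names and the 10x baseline factor are assumptions for illustration; they are not the output format of ntopng, pmacct or GoFlow2.

```python
from collections import Counter


def top_talkers(flows, key="src", n=3):
    """Sum bytes per value of `key` and return the n largest as (value, bytes)."""
    totals = Counter()
    for flow in flows:
        totals[flow[key]] += flow["bytes"]
    return totals.most_common(n)


def pps_anomalies(pps_by_dst, baseline_pps, factor=10):
    """Return destinations whose packets per second exceed factor * baseline."""
    limit = factor * baseline_pps
    return sorted(dst for dst, pps in pps_by_dst.items() if pps > limit)


# Hypothetical decoded flow records.
flows = [
    {"src": "10.0.0.1", "dst": "203.0.113.5", "bytes": 5000},
    {"src": "10.0.0.2", "dst": "203.0.113.5", "bytes": 1200},
    {"src": "10.0.0.1", "dst": "198.51.100.2", "bytes": 3000},
]
print(top_talkers(flows, key="src", n=1))
print(pps_anomalies({"203.0.113.5": 90000, "198.51.100.2": 400}, baseline_pps=1000))
```

A fixed multiple of a baseline is the simplest possible detector; production setups usually compare against a per-destination history instead.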
Telegram and Discord alerts
An alert that only arrives by email is an alert that gets ignored. We configure Zabbix alerts for Telegram and Discord with a clear message: what went down, the severity level, and a direct link to the host in Zabbix.
- Telegram bot with alerts by severity (information, warning, average, high, disaster)
- Discord webhook with color-coded embed by criticality level
- Channels separated by severity: informational alerts go to the general group, disaster goes to the dedicated high-priority channel
- Alert suppression during maintenance windows configured in Zabbix
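The message format described above can be sketched as a function that builds a Discord webhook payload. The embed fields used (title, description, url, color) are standard Discord embed fields; the color palette, the channel names, the host URL and the rule that everything below disaster goes to the general channel are assumptions for illustration. The sketch only builds the payload and does not send it.

```python
import json

# Assumed palette: one integer RGB color per Zabbix severity name.
SEVERITY_COLORS = {
    "information": 0x7499FF,
    "warning": 0xFFC859,
    "average": 0xFFA059,
    "high": 0xE97659,
    "disaster": 0xE45959,
}
DEFAULT_COLOR = 0x97AAB3


def discord_payload(host, problem, severity, host_url):
    """Build a webhook body with one color-coded embed linking to the host."""
    sev = severity.lower()
    return {
        "embeds": [
            {
                "title": f"[{sev.upper()}] {host}",
                "description": problem,
                "url": host_url,
                "color": SEVERITY_COLORS.get(sev, DEFAULT_COLOR),
            }
        ]
    }


def channel_for(severity):
    """Assumed routing: disaster to a dedicated channel, the rest to general."""
    return "high-priority" if severity.lower() == "disaster" else "general"


# Hypothetical alert; the URL is a placeholder, not a real Zabbix path.
payload = discord_payload(
    "edge-router-1", "BGP session down", "Disaster",
    "https://zabbix.example.net/hosts/42",
)
print(channel_for("Disaster"), json.dumps(payload))
```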
Centralized syslog
Logs scattered on each device are logs no one reads. With centralized syslog, you have event correlation across multiple devices — and can reconstruct what happened during an incident without SSH-ing into every box.
- Remote syslog configuration on routing equipment, OLTs and edge switches
- Centralized collection with rsyslog or Graylog depending on volume
- Parsing of BGP, PPPoE, OSPF and GPON messages for structured indexing
- Configurable retention by severity (errors for 90 days, informational for 30)
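Retention by severity depends on reading the priority value that prefixes each syslog message. A minimal Python sketch: the PRI field encodes facility * 8 + severity, so both can be recovered with integer division and remainder. The 90-day and 30-day figures come from the list above; treating severities 0 to 3 (emergency through error) as "errors" and everything else as 30 days is an assumption, as are the function names.

```python
import re

# A syslog line starts with "<PRI>", where PRI = facility * 8 + severity.
PRI_RE = re.compile(r"^<(\d{1,3})>")


def parse_pri(line):
    """Return (facility, severity) from the leading PRI, or None if absent."""
    match = PRI_RE.match(line)
    if match is None:
        return None
    pri = int(match.group(1))
    return pri // 8, pri % 8


def retention_days(severity):
    """Assumed policy: severities 0-3 are kept 90 days, the rest 30 days."""
    return 90 if severity <= 3 else 30


# Hypothetical message: PRI 187 = facility 23 (local7), severity 3 (error).
line = "<187>Jan  1 00:00:00 edge-router-1 bgp: peer 192.0.2.1 down"
facility, severity = parse_pri(line)
print(facility, severity, retention_days(severity))
```

With rsyslog or Graylog this decision would normally be expressed as routing rules or index retention settings; the sketch only shows the arithmetic behind it.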
How we work and how to get started
We work on a monthly plan: no one-off projects and no hourly-rate diagnostics. The first conversation is at no cost: we call, you share an AnyDesk session and show us the live environment, and we share observations on what is being monitored and what is missing. If it makes sense for both sides, we agree on the monthly plan and go from there. We require full admin access from the start, because we need it to work effectively.
Talk to us — initial conversation, no commitment. See also: 24x7 NOC, BGP for ISPs.
Frequently asked questions
How does the work start with you?
The first conversation is at no cost. You reach out, we call, and you open an AnyDesk session to show us the monitoring environment live, or the lack of one. We share observations on what needs to be monitored and how. If it makes sense for both sides, we agree on the monthly plan and start the following week.
Do you charge a setup or onboarding fee?
No. The monthly plan covers everything: initial configuration, template creation, dashboards, alerts and ongoing maintenance.
Do I need to host Zabbix on my infrastructure, or do you host it?
Zabbix runs on the ISP's infrastructure to have direct SNMP access to the equipment. The external LibreNMS instance runs on our infrastructure. We install and fully maintain Zabbix.
How long does it take to have monitoring working?
For a mid-size ISP with Zabbix already installed, getting templates, alerts and basic dashboards live typically takes 1 to 2 weeks. If Zabbix is not yet installed, we include the installation, and the timeline is 2 to 4 weeks to have everything working.