Monitoring
The Monitoring page provides network device monitoring, uptime tracking, tag-based rollup alerting, and discovery tools. It includes ping, TCP port, and HTTP endpoint checks, plus ARP-based and port-range network scanning.
Overview
The page is split into three tabs:
- Monitors — create and manage uptime monitors
- Scan — ARP sweep with optional TCP port-range scan
- Tags — group monitors into rollup alert sets with shared health rules and dependency suppression
Monitors Tab
Stats Dashboard
A strip of tiles at the top of the tab summarizes fleet health:
- Total — total monitors configured
- Up — currently responding
- Down — currently failing
- Unknown — never checked or indeterminate
- Skipped — not checked this cycle (dependency down, disabled, etc.)
- Tags Down — number of Monitor Tags currently rolled up to down (with degraded count when applicable). Hidden when no tags are configured.
- Avg Response — rolling average response time across all monitors (ms)
Stats auto-refresh on every state change, and a periodic poll (~30 s) keeps counts aligned with the server even if a subscription event is missed.
Filtering
- Search box — filters by name, IP, state, or tags
- Tag chips — click a tag to filter; multiple tags combine as OR. Clear removes all tag filters. Each chip carries a small dot showing the tag's current rollup state — green/red/amber/grey/dashed for up/down/degraded/unknown/inert.
- Result count shows filtered / total.
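The filter rules above can be sketched as follows. This is only an illustration, not the page's actual code: the `Monitor` fields and the `filter_monitors` helper are hypothetical, and both case-insensitive substring matching and combining the search box with the tag chips as AND are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Monitor:
    # Hypothetical record shape; field names are assumptions for illustration.
    name: str
    ip: str
    state: str
    tags: list = field(default_factory=list)


def filter_monitors(monitors, search="", selected_tags=()):
    """Apply the search box, then the tag chips (multiple tags combine as OR)."""
    needle = search.strip().lower()
    wanted = set(selected_tags)
    result = []
    for m in monitors:
        haystack = [m.name, m.ip, m.state, *m.tags]
        if needle and not any(needle in value.lower() for value in haystack):
            continue
        # OR semantics: a monitor passes if it carries any selected tag.
        if wanted and not wanted.intersection(m.tags):
            continue
        result.append(m)
    return result


monitors = [
    Monitor("Lobby Camera", "10.0.0.21", "up", ["cameras", "lobby"]),
    Monitor("Core Switch", "10.0.0.2", "down", ["network"]),
    Monitor("NVR", "10.0.0.30", "up", ["cameras"]),
]
visible = filter_monitors(monitors, selected_tags=["network", "lobby"])
print(f"{len(visible)} / {len(monitors)}")  # prints "2 / 3"
```

Selecting both `network` and `lobby` keeps any monitor that carries either tag, which is why two of the three monitors remain visible.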
Bulk Edit
Click Bulk Edit in the filter row to enter multi-select mode. Each card grows a checkbox; an action bar above the list offers:
- Select all (filtered) / Clear — operate on the current visible filter set
- Add Tag — add the chosen tags to every selected monitor
- Remove Tag — strip the chosen tags from every selected monitor
- Replace Tags — replace each selected monitor's tag set with exactly the chosen tags (empty picker strips all tags)
Tag pickers in the bulk modal pick by id from the live monitor_tag table. Click Done in the filter row to exit bulk mode.
Monitor Cards
Each card displays:
- Name and optional linked device (D#123)
- Paused badge when disabled
- Type / Target / Interval summary line
- Depends on and Power zone references (if configured)
- Tag chips — colored by the live rollup state of each tag the monitor belongs to (up = green, down = red, degraded = amber, unknown = grey). A "suppressed by <tag>" chip appears when an upstream tag in the dependency chain is currently down, indicating that member alerts on this monitor are muted.
- Last check timestamp, response time, and uptime (for monitors currently up)
- Error text if the last check failed
- Meta icons — email recipient count, SMS recipient count, notification-profile override flag, last alert time with throttle-remaining
- State badge — up, down, error, checking, or unknown
When a monitor changes state, its card briefly flashes green (up) or red (down) so the transition is visible at a glance.
Card Actions
Icon buttons on each card:
- Pause / Play — toggle monitor enabled state
- Run — force a check immediately
- History — open the response-time history modal (see below)
- Edit — open the edit modal
- Delete — remove the monitor
History Modal
The History icon opens a full-size chart of the monitor's response times, populated from the persisted monitor_history table so the data survives server restarts.
The chart shows:
- Blue line — response time per check (lower = faster)
- Red dots — failed checks
- Orange bells at the top with a dashed vertical line — checks that fired a notification. Hover for subject, recipient counts, dependent-monitor impact, and send timestamp.
- X-axis — time labels across the window (includes date when the window spans more than one day)
- Max label — top-left corner shows the peak response time in the window
Below the chart, four tiles summarize the period: total checks, success rate, average response time, and max response time. A running failure count is shown beneath the tiles.
:::info History Retention
Monitor check results are retained for 30 days and pruned hourly. The in-memory cache keeps the most recent 100 checks for fast subscription replay; older data is served from the database.
:::
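The retention behavior can be pictured with a small sketch. The `prune` helper and the row shape are hypothetical, and treating a row that is exactly 30 days old as still retained is an assumption; only the 30-day window and the 100-entry cache size come from the note above.

```python
from collections import deque
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # retention window from the note above
CACHE_SIZE = 100                # most recent checks kept in memory


def prune(rows, now):
    """Return only rows inside the retention window (the effect of the hourly job)."""
    cutoff = now - RETENTION
    return [row for row in rows if row["checked_at"] >= cutoff]


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
# One sample row every 5 days, from today back to 40 days ago.
rows = [{"checked_at": now - timedelta(days=d), "ms": 10 + d} for d in range(0, 45, 5)]
kept = prune(rows, now)

# A bounded deque drops its oldest entry automatically once it is full.
cache = deque(maxlen=CACHE_SIZE)
for i in range(250):
    cache.append(i)
```

After the loop, `cache` holds only the last 100 values, which mirrors how the in-memory cache stays small while older checks live in the database.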
Creating a Monitor
Click Add Monitor in the filter row, or use the Monitor button on a Scan tab row to prefill from discovery.
Monitor Types
- Ping (ICMP) — fastest; tests reachability only
- TCP — connects to a specific port; good for service-level checks
- HTTP / HTTPS — issues a request and validates the response status
Basic Fields
- Name — display label
- Type — ping / tcp / http
- Target — IP, hostname, or URL (HTTP monitors take a full URL)
Check Configuration
- Check Interval — 30 s, 1 min, 5 min, 15 min, or 1 hour
- Timeout — seconds before a single check is treated as failed (1–30)
- Retries — consecutive failed checks required before the monitor is flagged down (1–10)
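As a rough illustration of how the Retries setting gates the down state, the sketch below counts consecutive failures and only flips to down once the threshold is reached. The `RetryTracker` class is hypothetical and not GEM's implementation; resetting the counter on any successful check is an assumption.

```python
class RetryTracker:
    """Flags a monitor down only after `retries` consecutive failed checks."""

    def __init__(self, retries):
        if not 1 <= retries <= 10:
            raise ValueError("retries must be between 1 and 10")
        self.retries = retries
        self.consecutive_failures = 0
        self.state = "unknown"

    def record(self, success):
        if success:
            # Assumption: one successful check clears the failure streak.
            self.consecutive_failures = 0
            self.state = "up"
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.retries:
                self.state = "down"
        return self.state


tracker = RetryTracker(retries=3)
checks = [True, False, False, True, False, False, False]
states = [tracker.record(ok) for ok in checks]
print(states[-1])  # prints "down"
```

With Retries set to 3, the two isolated failures in the middle never flag the monitor; only the final run of three does.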
Type-Specific Fields
- TCP — Port
- HTTP — Method (GET/POST/HEAD) and Expected Status Code
Associations
- Associate with Device — link to a GEM device (shows as D#id on the card)
- Depends on Monitor — skip checks while a parent is down, and suppress alert storms for downstream outages
- Power Zone (auto-reboot on down) — power-cycles the zone when the monitor stays down past its retry threshold
Tags
Comma-separated values. Tags become filter chips on the Monitors tab.
:::tip Prefer the Tags tab for new groupings
The legacy comma-separated tags field is still honored for filtering, but new rollup groupings should be created on the Tags Tab. Tag rows are first-class records with health rules, dependency edges, and rollup notifications — the legacy string is filter-only.
:::
Notifications
- Email Recipients — pick from GEM users or type addresses directly
- SMS Recipients — pick from GEM users or type numbers directly
- Alert Throttle — minimum time between alerts for this monitor. Options: no throttle, 5 min, 15 min, 1 h, 6 h, or 24 h (default).
- Bypass notification profile (always send) — ignores day/hour windows on the recipient's notification profile. Use for critical alerts that should page at any hour.
:::tip Send Test Notification
The Send Test Notification button at the bottom of the modal fires a real email/SMS to every configured recipient using the currently entered settings (whether the monitor is saved or not). Results are shown inline per recipient so you can verify alerting is wired up before relying on it.
:::
Real-Time Updates
The page subscribes to live monitor events, so the list updates without reloading:
- check_result — refreshes the card state and last-response line
- state_change — triggers the flash highlight and stats refresh
- created / updated / deleted — keeps the list in sync across sessions
Scan Tab
Network discovery combining an ARP sweep with an optional TCP port-range scan. Use this tab to enumerate hosts on the LAN, identify services on remote subnets, and onboard devices or monitors in one pass.
Settings
- Network Interface — auto-fills the IP range from the interface's CIDR
- IP Range — CIDR (192.168.1.0/24) or range form
- Port Range — comma and dash syntax, e.g. 23,80,443,5900-5902
- Quick Scan (skip port scan) — skip the TCP port scan entirely and only discover hosts via ARP. Returns results in seconds; disables the Port Range field.
Click Start to begin a scan, or Stop to cancel a running port scan.
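To make the two input syntaxes concrete, here is a minimal sketch of how a CIDR range and a port-range string could be expanded. The `parse_port_range` helper is hypothetical, and the scanner's real parser may accept or reject different inputs; the CIDR expansion uses Python's standard `ipaddress` module.

```python
import ipaddress


def parse_port_range(spec):
    """Expand a spec such as "23,80,443,5900-5902" into a sorted list of ports."""
    ports = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
        else:
            low = high = int(part)
        if not (1 <= low <= high <= 65535):
            raise ValueError(f"invalid port range: {part!r}")
        ports.update(range(low, high + 1))
    return sorted(ports)


ports = parse_port_range("23,80,443,5900-5902")
print(ports)  # prints [23, 80, 443, 5900, 5901, 5902]

# hosts() skips the network and broadcast addresses of a /24.
hosts = list(ipaddress.ip_network("192.168.1.0/24").hosts())
print(len(hosts))  # prints 254
```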
How the Scan Works
- An ARP pre-pass enumerates every responder in the IP range. Each host row is populated with IP, MAC, vendor, and an initial driver suggestion.
- If Quick Scan is unchecked, the TCP port scan runs afterward. Open ports stream in live and merge into the matching host's row, and the driver suggestion re-ranks as new ports arrive.
- Hosts that respond to the port scan but not to ARP (cross-subnet, ARP blocked) appear as bare rows with no MAC or vendor.
Results
Each row shows:
- Select — checkbox for bulk monitor creation; the header checkbox toggles all rows
- Vendor — from the MAC OUI lookup; falls back to Unknown
- IP — clickable when the host is already a GEM device; opens that device's editor in a modal in place (no navigation away from the scan)
- MAC
- Open Ports — port pills, added live as the scan progresses
- Suggested Driver — top match with a confidence score and the primary reason. Low-confidence suggestions (<40%) render in muted grey. For hosts already in GEM the column shows the device's configured driver display name instead of a guess.
- Actions — Import opens the device creator prefilled with IP / MAC / port; Monitor opens the monitor creator (port 80/443 prefill as HTTP, a single other port as TCP, no port prefill for multi-port hosts).
Hosts already linked to a GEM device are highlighted in green, show an Already in GEM pill with the device name/label and ID, and the Import button is disabled.
:::tip Reused on the Devices page
The same scanner is embedded as the Scan Network modal on the Devices page (with the bulk-monitor controls hidden) so you can onboard hardware without leaving the device admin.
:::
Bulk Monitor Creation
Select multiple hosts via row checkboxes to create monitors for all of them in one click. A bar appears above the results table when any rows are selected:
- Selected count — how many hosts will be monitored
- Tags — comma-separated tags applied to every created monitor (defaults to discovered)
- Create N Monitors — runs the same prefill logic as the per-row Monitor button for each selected host (HTTP for port 80/443, TCP for a single other port, otherwise ping) and saves them sequentially. A toast reports successes and failures.
- Clear — deselects all rows without creating anything
Selections survive incremental port-scan updates but are wiped when a new scan starts.
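The prefill rule shared by the per-row Monitor button and bulk creation can be sketched like this. The `prefill_monitor` helper and its return shape are hypothetical. Checking the web ports first, preferring HTTPS when both 80 and 443 are open, and building the URL directly from the IP are all assumptions that the description does not settle.

```python
def prefill_monitor(ip, open_ports):
    """Pick a monitor type from a host's open ports (hypothetical helper)."""
    ports = sorted(open_ports)
    # Assumption: web ports are checked first, and HTTPS wins when both are open.
    if 443 in ports:
        return {"type": "http", "target": f"https://{ip}"}
    if 80 in ports:
        return {"type": "http", "target": f"http://{ip}"}
    if len(ports) == 1:
        return {"type": "tcp", "target": ip, "port": ports[0]}
    # No open ports, or several non-web ports: fall back to a ping monitor.
    return {"type": "ping", "target": ip}


print(prefill_monitor("10.0.0.5", [22]))
# prints {'type': 'tcp', 'target': '10.0.0.5', 'port': 22}
```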
Tags Tab
Monitor Tags group correlated monitors so a single rollup alert covers a shared outage instead of N member alerts. Each tag is a first-class record with its own health rule, dependency edges, and notification fan-out.
Why Tags
Without tags, every member monitor that goes down sends its own email/SMS — a flapping internet uplink can produce dozens of duplicate pages for the same root cause. Tags collapse that into one rollup alert per outage, and dependency edges suppress noise when a deeper outage is already firing.
Tag Cards
Each tag card shows:
- Label (or Name when no label is set) and a state badge: up / down / degraded / unknown / inert
- Rule — anchor monitor name, threshold percentage, or "all members down"
- Members — count plus how many are currently down
- Depends on — chip list of upstream tags
- Recipient summary — N email · M sms, plus down/recovery macro chips
- A red warning when the tag has no recipients and no macros (transitions are inert)
- History, Edit, and Delete action buttons
The inert — needs setup badge means the tag is configured to never roll up to down (an anchor rule with no anchor, or a threshold rule with 0%). Inert tags are safe — they were created either by the legacy migration or by an unfinished edit, and they fire nothing until configured.
Creating or Editing a Tag
Click Add Tag (or the edit icon on a card). The modal exposes:
Identity
- Name — lowercase identifier (filter-friendly)
- Label — optional human-readable label shown on dashboards and emails
- Description — optional free text
Health Rule
- Anchor monitor — the tag mirrors a single anchor monitor's state. Tag goes down when the anchor goes down. Leaving the anchor empty makes the tag inert.
- All members down — tag is down only when every member is down; degraded when some members are down; up when all members are up.
- Threshold % of members down — tag goes down when the configured percentage of members are down. It is degraded while below the threshold but with at least one member down.
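The three health rules can be compared side by side in a short sketch. The `rollup_state` function and its rule names are hypothetical, not the engine's API. Several details are assumptions: the threshold is treated as "at least N percent", members in any state other than down count as not down, and a tag with no members reports unknown.

```python
def rollup_state(rule, member_states, anchor_state=None, threshold_pct=0):
    """Evaluate a tag's health rule over a list of member state strings."""
    down = sum(1 for state in member_states if state == "down")
    total = len(member_states)

    if rule == "anchor":
        # An anchor rule with no anchor configured makes the tag inert.
        return "inert" if anchor_state is None else anchor_state

    if rule == "threshold" and threshold_pct <= 0:
        # A threshold rule at 0% can never roll up to down, so it is inert too.
        return "inert"

    if rule in ("all_down", "threshold"):
        if total == 0:
            return "unknown"  # assumption: no members means no verdict
        needed_pct = 100 if rule == "all_down" else threshold_pct
        # Integer comparison avoids floating-point rounding at the boundary.
        if down * 100 >= needed_pct * total:
            return "down"
        return "degraded" if down else "up"

    raise ValueError(f"unknown rule: {rule!r}")


print(rollup_state("all_down", ["down", "up", "up"]))                     # prints "degraded"
print(rollup_state("threshold", ["down", "down", "up", "up"], None, 50))  # prints "down"
```

Under these assumptions, "all members down" behaves like a threshold rule fixed at 100%.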
Members — pick monitors via the searchable picker. Tags can have any number of members.
Depends On (Upstream Tags) — pick other tags this one depends on. If any transitive upstream tag is currently down, this tag's rollup alert is suppressed and member alerts on its monitors are muted. Cycles are rejected at save time.
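One way a save-time cycle check could work is sketched below: before adding an edge, walk the upstream chain from the proposed dependency and reject the edge if the walk reaches the tag being edited. The `would_create_cycle` helper and the dictionary shape are hypothetical; how the server actually validates edges is not documented here.

```python
def would_create_cycle(depends_on, tag, new_upstream):
    """Return True if adding the edge tag -> new_upstream closes a dependency loop.

    `depends_on` maps each tag name to the set of tags it depends on.
    """
    # A cycle appears if `tag` is already reachable from `new_upstream`.
    stack, seen = [new_upstream], set()
    while stack:
        current = stack.pop()
        if current == tag:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(depends_on.get(current, ()))
    return False


edges = {"cameras": {"switches"}, "switches": {"uplink"}, "uplink": set()}
print(would_create_cycle(edges, "uplink", "cameras"))  # prints True
print(would_create_cycle(edges, "cameras", "uplink"))  # prints False
```

In the example, making `uplink` depend on `cameras` is rejected because `cameras` already depends on `uplink` through `switches`.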
Trigger Macros on State Change
- On Down — run macro — fires once per rollup transition into down. Suppressed when an upstream tag is also down.
- On Recovery — run macro — fires once when the tag transitions down → up. Receives outage_duration_ms in context.
The macro context for both fields includes monitor_tag_id, monitor_tag_name, monitor_tag_label, previous_state, new_state, trigger_monitor_id / trigger_monitor_name / trigger_monitor_ip, and rollup_reason. See Macros — Context Simulator for previewing these values while authoring steps.
Rollup Notifications
- Email Recipients / SMS Recipients — pick from GEM users or type in addresses/numbers
- Alert Throttle — minimum time between rollup alerts for this tag. Options: no throttle, 5 min, 15 min, 1 h, 6 h, or 24 h (default). Throttle is per-direction, so a recovery alert is never gated by a recent down alert.
- Bypass notification profile (always send) — same semantics as on per-monitor notifications
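Per-direction throttling can be illustrated by keeping one timestamp per direction. The `AlertThrottle` class below is a hypothetical sketch rather than the engine's implementation, and timestamps are plain seconds for simplicity.

```python
class AlertThrottle:
    """Tracks the last alert per direction so recovery is never gated by a down alert."""

    def __init__(self, min_interval_s):
        self.min_interval_s = min_interval_s
        self.last_sent = {}  # direction ("down" or "up") -> timestamp in seconds

    def should_send(self, direction, now_s):
        last = self.last_sent.get(direction)
        if last is not None and now_s - last < self.min_interval_s:
            return False  # still inside the window for this direction
        self.last_sent[direction] = now_s
        return True


throttle = AlertThrottle(min_interval_s=15 * 60)
decisions = [
    throttle.should_send("down", 0),     # first down alert
    throttle.should_send("up", 60),      # recovery one minute later
    throttle.should_send("down", 120),   # second down inside the 15 min window
    throttle.should_send("down", 1000),  # window has elapsed
]
print(decisions)  # prints [True, True, False, True]
```

The recovery at 60 seconds is sent even though a down alert went out a minute earlier, because each direction is tracked separately.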
Suppression Semantics
- Member suppression — when a tag transitions in the same direction as one of its members, the tag's single rollup alert covers the member; the per-monitor email/SMS is muted.
- Upstream suppression — when any transitive upstream tag is currently down, both the member alert and the dependent tag's rollup are muted. Only the deepest still-up→down transition produces side effects.
- Boot-time resync gate — the recovery edge (down → up) only fires if a down alert was actually dispatched. On boot, the engine seeds per-direction last-alert timestamps from the most recent notifying rows in monitor_history and monitor_tag_history, so a monitor that went down before the restart still produces a recovery alert when it comes back up, and a monitor that boots down without any prior down alert won't emit a phantom recovery.
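Upstream suppression amounts to a walk up the dependency chain looking for a tag that is currently down. The sketch below shows one way to express that; `suppressing_upstream` and the data shapes are hypothetical, and returning the first down tag found (for example, to label a "suppressed by" chip) is an assumption.

```python
def suppressing_upstream(tag, depends_on, states):
    """Return a transitive upstream tag that is currently down, or None."""
    stack, seen = list(depends_on.get(tag, [])), set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if states.get(current) == "down":
            return current
        # Keep climbing: a tag further up the chain may still be down.
        stack.extend(depends_on.get(current, []))
    return None


depends_on = {"cameras": ["switches"], "switches": ["uplink"], "uplink": []}
states = {"cameras": "down", "switches": "up", "uplink": "down"}
print(suppressing_upstream("cameras", depends_on, states))  # prints "uplink"
```

Here `cameras` is muted by `uplink` even though its direct parent `switches` is up, which is what "transitive" means in this context.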
Tag History
Click the history icon on a tag card to open the rollup transition timeline:
- A horizontal SVG timeline of state segments (up / down / degraded / unknown), color-coded
- A table of every transition with timestamp, previous→new state, rollup reason, triggering monitor, and dispatch summary (macro id, email/sms counts)
- Outage duration is computed from the cheap last_down_time column populated when the tag enters down/degraded, so no history walk is needed in the hot path
History rows are pruned alongside monitor_history (30 days, hourly).
Legacy Tag Migration
Legacy comma-separated monitor.tags strings are migrated into monitor_tag rows on first boot after upgrade. Migrated tags default to the anchor rule with no anchor configured — meaning they're inert and won't fire any rollup alerts until an admin opens them and configures the rule. The migration can never increase alert volume.
The migration is idempotent and gated by a system attribute (monitor_tag_migration_done), so deleting a migrated tag and rebooting won't resurrect it from a stale legacy string.
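The gating behavior can be sketched with plain dictionaries standing in for the database. Everything here except the `monitor_tag_migration_done` attribute name is hypothetical, including the `migrate_legacy_tags` helper, the row shape, and the lowercasing of tag names.

```python
def migrate_legacy_tags(monitors, tag_rows, system_attrs):
    """Create inert tag rows from legacy tag strings, at most once."""
    if system_attrs.get("monitor_tag_migration_done"):
        return 0  # gate: the migration already ran, so do nothing
    created = 0
    for monitor in monitors:
        for raw in monitor.get("tags", "").split(","):
            name = raw.strip().lower()  # assumption: names are normalized
            if name and name not in tag_rows:
                # Anchor rule with no anchor configured: inert until an admin edits it.
                tag_rows[name] = {"rule": "anchor", "anchor_monitor_id": None}
                created += 1
    system_attrs["monitor_tag_migration_done"] = True
    return created


monitors = [{"tags": "cameras, lobby"}, {"tags": "cameras"}]
tag_rows, system_attrs = {}, {}
first_run = migrate_legacy_tags(monitors, tag_rows, system_attrs)

del tag_rows["lobby"]  # an admin deletes a migrated tag
second_run = migrate_legacy_tags(monitors, tag_rows, system_attrs)
print(first_run, second_run, sorted(tag_rows))  # prints 2 0 ['cameras']
```

The second run creates nothing, so the deleted `lobby` tag stays deleted even though the legacy string still mentions it.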
Auto-Reboot on Failure
When a monitor has a Power Zone configured and the monitor stays down past its retry threshold, GEM power-cycles the zone to recover the device. Useful for:
- Cameras and NVRs that hang
- Routers and switches that need periodic reboots
- Equipment without remote management
:::warning Don't auto-reboot critical life-safety equipment
Don't assign a power zone to monitors for HVAC controllers, access control panels, or other equipment where an unattended reboot could cause harm.
:::
Monitor Dependencies
Depends-on relationships build a hierarchy:
Monitor 1: Internet Gateway (8.8.8.8)
Monitor 2: Local Router (depends on Monitor 1)
Monitor 3: Camera (depends on Monitor 2)
When a parent is down, dependent monitors are counted as skipped rather than alerting separately — eliminating alert storms during upstream outages.
When a monitor with dependents goes down, the alert email lists every downstream monitor affected, and the SMS includes a compact count (impacts 4 dependent monitors) so the recipient sees the blast radius at a glance. The same payload is recorded on the triggering history row and is visible as an orange marker on the History Modal chart.
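The blast-radius count in the alert can be derived by walking the dependency links downward from the failed monitor. The sketch below uses the three-monitor hierarchy from this section; `downstream_monitors` and the `parent_of` mapping are hypothetical, and the walk assumes the links contain no cycles.

```python
def downstream_monitors(parent_of, root):
    """List every monitor that depends, directly or transitively, on `root`."""
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    affected, queue = [], list(children.get(root, []))
    while queue:
        current = queue.pop(0)  # breadth-first: nearest dependents first
        affected.append(current)
        queue.extend(children.get(current, []))
    return affected


# Each monitor has at most one parent, matching the Depends on Monitor field.
parent_of = {"Local Router": "Internet Gateway", "Camera": "Local Router"}
affected = downstream_monitors(parent_of, "Internet Gateway")
print(f"impacts {len(affected)} dependent monitors")  # prints "impacts 2 dependent monitors"
```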
:::tip Dependencies vs Tags
Per-monitor depends_on_monitor_id is a one-to-one parent/child link that pauses checks while the parent is down. Tag dependencies are tag-to-tag edges that suppress alerts (member emails and dependent tag rollups) without affecting the underlying check schedule. Use both together: dependency edges to model "if A is down, don't bother probing B"; tag edges to collapse a noisy outage into a single page.
:::
Related Documentation
- Devices — device configuration
- Device Health — historical uptime data
- Data Retention — log retention settings
- Notification Profiles — recipient day/hour windows and channels