Understanding Alerts — Triage & Resolution
Phase 4 — Monitoring & Policies · OpenFrame Onboarding
Read this first. OpenFrame doesn't yet have a single unified "alerts inbox" with formal open→acknowledged→closed states — a dedicated notification/alert center is on the roadmap. Today, "alerts" surface across a few places: the Monitoring compliance views, the Logs stream, and per-device Alert Configuration. This guide shows where to look and how to work a problem from trigger to resolved with what's in the product now.
The goal of monitoring is simple: know something's wrong before the client does. Here's how that signal reaches you and what to do with it.
What triggers an alert
- A policy fails. A device stops matching a compliance check (encryption turned off, OS out of date) and flips to FAILING; the policy becomes non-compliant.
- A device goes offline or overdue. The thresholds in that device's Alert Configuration decide when this fires (see below).
- A tool reports an event. Fleet, the agent, and other tools write events to the Logs stream.
Where alerts surface today
1. The Monitoring dashboard (your first stop).
On Monitoring → Policies, the Failed Policies tile tells you if anything is non-compliant right now. Open a failing policy and its Devices table shows exactly which machines are FAILING — that's your triage list.
2. The Logs stream.
Logs (left nav) is the running event feed, with columns for Log ID, Status (e.g. INFO), Tool (e.g. Fleet), Source, and Log Details. Use the column filters and "Search for Logs" to narrow the feed to a tool or status, and Refresh to pull the latest. This is where tool- and system-level events land.
3. Per-device Alert Configuration.
On a device's detail page, the Security tab has an Alert Configuration section: Email / Text / Dashboard alert toggles, an Alert Template, and offline/overdue thresholds. This is where you decide what counts as alert-worthy for that device and how you want to be told.
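To make the offline/overdue thresholds concrete, here is a minimal Python sketch of how such a check could be evaluated. It assumes both states are derived from the time since the device's last check-in; the `AlertThresholds` class, the field names, and the example durations are hypothetical and are not OpenFrame's actual schema or defaults.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AlertThresholds:
    """Hypothetical per-device thresholds; not OpenFrame's real schema."""
    offline_after: timedelta  # assumed: silent this long => "offline"
    overdue_after: timedelta  # assumed: silent this long => "overdue"


def classify_device(last_seen: datetime, now: datetime, t: AlertThresholds) -> str:
    """Return "ok", "offline", or "overdue" from time since last check-in."""
    silent_for = now - last_seen
    if silent_for >= t.overdue_after:
        return "overdue"
    if silent_for >= t.offline_after:
        return "offline"
    return "ok"


# Example values only; pick thresholds that match each device's importance.
thresholds = AlertThresholds(offline_after=timedelta(minutes=15),
                             overdue_after=timedelta(days=7))
now = datetime(2025, 1, 8, 12, 0, tzinfo=timezone.utc)

print(classify_device(now - timedelta(minutes=5), now, thresholds))  # ok
print(classify_device(now - timedelta(hours=2), now, thresholds))    # offline
print(classify_device(now - timedelta(days=10), now, thresholds))    # overdue
```

The point of the sketch is the trade-off: tighter thresholds surface problems sooner but alert more often, which is why they are set per device rather than globally.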
A triage workflow
When Failed Policies is non-zero, or a device looks unhealthy:
- Identify. Open the failing policy → note which devices are FAILING. Or start from the device's detail page if a specific machine is in question.
- Understand. Look at the policy's Query to see exactly what it checks, and the device's Security / Compliance tabs for context (encryption, patches, agent health).
- Check the agent. Many apparent "failures" turn out to be an offline or missing Fleet agent, so confirm on the device's Agents tab before chasing the check itself.
- Fix. Remediate on the device — often the fastest path is Run Script from the device's "…" menu (Phase 3) for a repeatable fix, or Remote Control/Shell for hands-on work.
- Verify (resolution). After the agent's next check-in, the device should flip back to PASSING and the policy return to COMPLIANT. Refresh to confirm — that's your "closed."
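The decision order in the workflow above can be sketched as a small Python function. This is an illustration only: the `DeviceRow` type and its fields are hypothetical stand-ins for what you read off the failing policy's Devices table and the device's Agents tab, not an OpenFrame API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceRow:
    """Hypothetical row noted down from a failing policy's Devices table."""
    hostname: str
    policy_status: str  # "PASSING" or "FAILING", as shown in the console
    agent_online: bool  # assumed to be read from the device's Agents tab


def next_step(row: DeviceRow) -> str:
    """Map one device to its next triage action, in workflow order."""
    if row.policy_status == "PASSING":
        return "verified: no action needed"
    if not row.agent_online:
        # Rule out the agent before investigating the check itself.
        return "check the agent"
    return "remediate, then verify after the next check-in"


rows = [
    DeviceRow("laptop-01", "FAILING", agent_online=False),
    DeviceRow("laptop-02", "FAILING", agent_online=True),
    DeviceRow("laptop-03", "PASSING", agent_online=True),
]
for row in rows:
    print(f"{row.hostname}: {next_step(row)}")
```

Putting the agent check ahead of remediation mirrors the workflow: a device whose agent is offline cannot report a fix, so it would stay FAILING no matter what you change on it.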
Getting alerts in front of your team
Since there's no unified alert center yet, route signal to where your team already works:
- Slack is the cleanest real-time path — connect it (Phase 8) and send alerts to a dedicated channel.
- Per-device Alert Configuration lets you enable Email / Text / Dashboard notifications and set offline/overdue thresholds for the machines that matter most.
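If you script any of your own notifications alongside the built-in options, a generic Slack incoming webhook is one way to post into that dedicated channel. The sketch below is not OpenFrame's Slack integration (that is covered in Phase 8) and may not be needed once it is connected; the function names, the message format, and the example policy name are assumptions for illustration. It relies only on the standard Slack incoming-webhook contract of POSTing a JSON body with a `text` field.

```python
import json
import urllib.request


def build_slack_payload(device: str, policy: str, status: str) -> dict:
    """Build a minimal Slack incoming-webhook message body."""
    return {"text": f"[{status}] {device}: policy '{policy}'"}


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload to a Slack incoming webhook; return the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


# Hypothetical device and policy names, for illustration only.
payload = build_slack_payload("laptop-01", "Disk encryption enabled", "FAILING")
print(json.dumps(payload))

# To actually send it, pass the webhook URL Slack generates for your channel:
# post_to_slack(your_webhook_url, payload)
```

Keeping the payload builder separate from the network call makes the message format easy to check without sending anything to Slack.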
What's missing today (and coming)
So you know the edges: there's no single queue that tracks each alert through acknowledged/assigned/closed states, and no global severity-routing rules engine. Those are part of the planned notification/alert center. If a unified alerts workflow is important to you, it's worth a vote in the OpenMSP Slack community.
Quick checklist
- Check Failed Policies on the Monitoring dashboard regularly
- Open failing policies to find the specific FAILING devices
- Rule out an offline/missing Fleet agent before deep-diving a failure
- Remediate (Run Script / Remote) and verify the device returns to PASSING
- Set per-device Alert Configuration thresholds and route alerts to Slack
What's next
That completes Phase 4 — Monitoring & Policies. Next is Phase 5 — Scripts & Automation, where you'll run and schedule the scripts you reach for during exactly this kind of remediation.
Based on OpenFrame v0.9.19. This area is actively evolving and a unified notification/alert center is on the roadmap, so re-check the console before assuming any of the "missing" pieces are still missing.
