Building Alert Rules That Actually Catch Threats (Without the Noise)
Design effective SQL-based security alert rules with smart throttling, channel routing, and exclusion patterns. A practical guide to reducing noise while catching real threats.
Alert fatigue is the silent killer of security operations. According to research from the Ponemon Institute, the average SOC team receives over 11,000 alerts per day, and analysts can realistically investigate fewer than half. The result is predictable: real threats hide in the noise, response times stretch from minutes to hours, and talented analysts burn out chasing false positives.
The problem isn't that detection is hard. The problem is that most alerting systems give you a binary choice: either you cast a wide net and drown in noise, or you tighten your rules so much that you miss genuine threats. Building effective alert rules requires a different approach—one that combines precise detection logic with smart throttling, channel routing, and noise reduction.
SecureNow's alert rule system is built on this philosophy. Rules are defined as SQL queries against ClickHouse trace data, giving you the full power of a query language designed for analytical workloads. Combined with cron scheduling, configurable throttling, multi-channel delivery, and exclusion patterns, you get a detection system that catches what matters without burying your team in notifications.
The Anatomy of a SecureNow Alert Rule
Every alert rule in SecureNow has four core components: the detection query, the schedule, the throttle configuration, and the delivery channels. Understanding how these pieces fit together is the foundation for building rules that work.
The Detection Query and QueryMapping System
At the heart of every alert rule is a SQL query that runs against your ClickHouse trace data. SecureNow uses a QueryMapping system that maps each rule to a specific SQL query designed to detect a particular threat pattern. This isn't a simplified query builder with limited expressiveness—it's full SQL, which means you can detect virtually any pattern that exists in your OpenTelemetry trace data.
A basic detection query might look like this:
```sql
SELECT
    peer_ip,
    count(*) AS request_count,
    countIf(status_code >= 400 AND status_code < 500) AS client_errors
FROM traces
WHERE timestamp >= now() - INTERVAL 15 MINUTE
  AND http_target LIKE '/api/auth%'
GROUP BY peer_ip
HAVING client_errors > 50
ORDER BY client_errors DESC
```
This query identifies IPs generating more than 50 client errors against authentication endpoints in the last 15 minutes—a classic indicator of credential stuffing or brute force activity. The HAVING clause is your sensitivity dial: set it too low and you'll alert on legitimate users with forgotten passwords; set it too high and you'll miss the early stages of an attack.
The QueryMapping system lets you define these queries as reusable templates. Each mapping specifies the SQL, the parameters that can be tuned (like thresholds and time windows), and the metadata that gets included in the resulting notification. This means you can create a library of detection patterns and deploy them across multiple applications without rewriting queries from scratch.
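The template idea can be sketched in Python. The field names (`sql`, `params`, `metadata`) and the `render` helper here are hypothetical illustrations of a reusable detection template, not SecureNow's actual QueryMapping schema:

```python
from string import Template

# Hypothetical QueryMapping-style template. The SQL mirrors the detection
# query above; only the threshold and time window are left as parameters.
AUTH_BRUTE_FORCE = {
    "sql": Template(
        "SELECT peer_ip, "
        "countIf(status_code >= 400 AND status_code < 500) AS client_errors "
        "FROM traces "
        "WHERE timestamp >= now() - INTERVAL $window MINUTE "
        "AND http_target LIKE '/api/auth%' "
        "GROUP BY peer_ip HAVING client_errors > $threshold"
    ),
    "params": {"window": 15, "threshold": 50},  # tunable per deployment
    "metadata": {"title": "Auth brute force", "severity": "high"},
}

def render(mapping: dict, **overrides) -> str:
    """Fill the SQL template with defaults, letting callers override thresholds."""
    params = {**mapping["params"], **overrides}
    return mapping["sql"].substitute(params)
```

The same mapping can then be deployed against several applications with different thresholds, e.g. `render(AUTH_BRUTE_FORCE, threshold=200)` for a high-traffic service.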
Cron Scheduling: When and How Often
Every alert rule runs on a cron schedule. The default interval is every 15 minutes, which provides a reasonable balance between detection speed and system load for most use cases. But this is fully configurable—you can run critical rules every minute for near-real-time detection, or reduce frequency for lower-priority monitoring.
The scheduling decision should be driven by two factors: how quickly you need to detect the threat, and how much query load you can tolerate. A rule detecting active credential stuffing should probably run every 1–5 minutes. A rule monitoring for slow-burn reconnaissance can comfortably run every 30 minutes or hourly.
Keep in mind that the schedule interval interacts with your query's time window. If your rule runs every 15 minutes but queries the last 5 minutes of data, you'll have 10-minute gaps where events aren't evaluated. Align your schedule interval with your query's lookback period to ensure continuous coverage.
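The gap arithmetic from the paragraph above is simple enough to sketch directly:

```python
def coverage_gap_minutes(schedule_min: int, lookback_min: int) -> int:
    """Minutes of trace data each cycle leaves unevaluated (0 = continuous coverage).

    A rule that runs every `schedule_min` minutes but only queries the last
    `lookback_min` minutes never sees events older than its lookback window.
    """
    return max(0, schedule_min - lookback_min)

# Runs every 15 min, looks back 5 min: 10-minute blind spots each cycle.
assert coverage_gap_minutes(15, 5) == 10
# Lookback >= schedule: continuous (overlapping) coverage.
assert coverage_gap_minutes(15, 15) == 0
```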
Throttle Configuration: Preventing Alert Storms
Throttling is arguably the most important—and most underappreciated—component of an alert rule. Without throttling, a sustained attack that triggers your detection query will fire a notification every time the cron job runs. During a multi-hour credential stuffing campaign, that's dozens of identical notifications flooding your channels.
Each SecureNow alert rule has a configurable cooldown period. After the rule triggers and sends a notification, it enters a cooldown state. During this period, even if the detection query returns results, no new notification is sent. The default cooldown is 15 minutes, but you should adjust it based on the nature of the threat.
For acute threats where rapid escalation is expected, a shorter cooldown (5–10 minutes) ensures you stay updated on evolving situations. For persistent, low-grade threats like scanning activity, a longer cooldown (30–60 minutes) prevents noise while still ensuring the activity is periodically flagged.
The key insight is that throttling doesn't suppress detection—it suppresses notification. The rule still runs, the query still executes, and the results are still available. What changes is whether a new notification gets pushed to your channels. This means you can always check rule execution history to see what happened during a cooldown period.
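This detect-always, notify-sometimes behavior can be captured in a minimal sketch. The class and field names below are illustrative, not SecureNow's actual implementation:

```python
import time

class ThrottledRule:
    """Sketch of notification throttling: detection always runs and is recorded,
    but notifications are suppressed during the cooldown window."""

    def __init__(self, cooldown_seconds=900):  # default cooldown: 15 minutes
        self.cooldown_seconds = cooldown_seconds
        self.last_notified = float("-inf")
        self.history = []  # every execution is recorded, throttled or not

    def evaluate(self, query_results, now=None):
        """Return True if a notification should be sent for this execution."""
        now = time.time() if now is None else now
        self.history.append((now, len(query_results)))  # detection always logged
        if not query_results:
            return False
        if now - self.last_notified < self.cooldown_seconds:
            return False  # in cooldown: suppress notification, keep the record
        self.last_notified = now
        return True
```

During a sustained attack, `evaluate` keeps appending to `history` on every run, so the execution record stays complete even while notifications are suppressed.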
Alert Channels: Getting the Right Alert to the Right Place
Not every alert deserves the same delivery treatment. A critical detection on your production authentication service should probably wake someone up. A low-severity monitoring alert about unusual traffic patterns can wait for the next shift review.
SecureNow supports three alert channels, and you can configure multiple channels per rule:
Email (via Resend) — Best for alerts that need a persistent, searchable record. Email alerts include the full detection context: the query results, the IPs involved, the time window, and direct links to investigate in SecureNow. Use email for medium-severity alerts and as a backup channel for critical rules.
Slack (via webhooks) — Best for real-time team awareness. Slack alerts appear in your configured channel with enough context for quick triage. They're ideal for high-severity rules where you want the on-call analyst to see the alert immediately. Configure your webhook to post to a dedicated security alerts channel to avoid drowning general channels.
In-app notifications — Best for operational tracking. In-app notifications feed directly into SecureNow's notification triage workflow, where they can be acknowledged, investigated, and resolved through the standard status lifecycle. Every alert rule should include in-app as a channel, since it's the primary interface for investigation.
A well-configured rule for critical authentication threats might use all three: in-app for investigation workflow, Slack for immediate visibility, and email for audit trail. A lower-priority rule monitoring for unusual geographic access patterns might use only in-app notifications.
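The routing guidance above amounts to a small severity-to-channel table. The channel names mirror the three SecureNow channels; the mapping itself is an illustrative convention, not a built-in default:

```python
# Hypothetical severity-to-channel routing. Every severity includes in_app,
# since it feeds the notification triage workflow.
CHANNELS_BY_SEVERITY = {
    "critical": ["in_app", "slack", "email"],  # full redundancy + audit trail
    "high":     ["in_app", "slack"],           # immediate team visibility
    "medium":   ["in_app", "email"],           # searchable record, no paging
    "low":      ["in_app"],                    # triage queue only
}

def channels_for(severity: str) -> list:
    # Unknown severities fall back to in-app so nothing is silently dropped.
    return CHANNELS_BY_SEVERITY.get(severity, ["in_app"])
```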
<!-- CTA:trial -->

Per-Rule Exclusions: Surgical Noise Reduction
Even well-designed detection queries generate noise. Health check endpoints, monitoring probes, internal service-to-service traffic, and legitimate automated systems can all trigger rules designed to catch malicious activity. The solution isn't to make your queries less sensitive—it's to exclude the known benign patterns.
SecureNow provides two levels of exclusion patterns:
Global exclusions apply across all alert rules. These are ideal for patterns that are universally benign in your environment—health check endpoints like /api/health or /healthz, monitoring probes from known IPs, and internal load balancer traffic. Setting these once eliminates an entire class of false positives from every rule.
Per-rule exclusions apply only to a specific rule. These are for context-dependent noise. For example, your authentication monitoring rule might need to exclude /api/auth/reset-password because legitimate password reset flows generate the same 4xx patterns as credential stuffing. But that exclusion doesn't make sense for your API scanning detection rule.
Exclusion patterns use path matching, so you can target specific endpoints or endpoint groups. The pattern /api/internal/* would exclude all internal API paths, while /api/auth/reset-password targets a single endpoint.
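Glob-style matching approximates this behavior. The sketch below uses Python's `fnmatch` as a stand-in for SecureNow's actual pattern syntax, with the two exclusion levels modeled as a global list plus a per-rule argument:

```python
from fnmatch import fnmatch

# Patterns that are benign everywhere in this (hypothetical) environment.
GLOBAL_EXCLUSIONS = ["/api/health*", "/healthz", "/api/internal/*"]

def is_excluded(path, rule_exclusions=()):
    """True if the path matches a global pattern or this rule's own patterns."""
    patterns = list(GLOBAL_EXCLUSIONS) + list(rule_exclusions)
    return any(fnmatch(path, pattern) for pattern in patterns)
```

An auth-monitoring rule would pass `rule_exclusions=["/api/auth/reset-password"]`, while the scanning-detection rule omits it, so the password-reset carve-out stays scoped to the one rule that needs it.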
For more advanced exclusion strategies, including AI-suggested exclusions and test-before-apply workflows, see our guide on reducing false positives in SOC alerting.
Bulk Operations: Managing Rules at Scale
As your detection library grows, managing individual rules becomes impractical. SecureNow supports bulk enable/disable operations, letting you activate or deactivate groups of rules in a single action.
This is particularly useful during incidents. If a false positive pattern is generating noise across multiple rules, you can temporarily disable the affected rules while you create the appropriate exclusion patterns, then re-enable them once the exclusions are in place. It's also valuable for scheduled maintenance windows—disable your availability monitoring rules before the planned outage, re-enable them when it's over.
Each rule also tracks operational statistics: when it was last triggered, how many times it has executed, and its current enabled/disabled state. These stats help you identify rules that aren't firing (which might indicate they need tuning or that the threat model has changed) and rules that fire constantly (which almost certainly need tighter thresholds or additional exclusions).
Real-World Alert Rule Examples
Theory is useful, but let's look at practical rules that SOC teams deploy in production.
4xx Spike Detection
```sql
SELECT
    peer_ip,
    count(*) AS total_requests,
    countIf(status_code >= 400 AND status_code < 500) AS error_count,
    round(error_count / total_requests * 100, 2) AS error_rate
FROM traces
WHERE timestamp >= now() - INTERVAL 15 MINUTE
  AND http_target NOT LIKE '/api/health%'
GROUP BY peer_ip
HAVING error_count > 100 AND error_rate > 80
ORDER BY error_count DESC
LIMIT 50
```
This rule catches IPs with a high volume of client errors and a high error rate—classic indicators of scanning, credential stuffing, or API fuzzing. The error_rate > 80 condition filters out IPs that are generating some errors alongside mostly successful traffic (likely legitimate users), focusing on IPs whose activity is overwhelmingly unsuccessful.
Recommended config: 5-minute schedule, 15-minute throttle, Slack + in-app channels.
5xx Error Monitoring
```sql
SELECT
    http_target,
    count(*) AS error_count,
    uniqExact(peer_ip) AS unique_ips,
    min(timestamp) AS first_error,
    max(timestamp) AS last_error
FROM traces
WHERE timestamp >= now() - INTERVAL 10 MINUTE
  AND status_code >= 500
GROUP BY http_target
HAVING error_count > 20
ORDER BY error_count DESC
```
Unlike the 4xx rule, which focuses on attacker behavior, this rule monitors for application health issues that could indicate exploitation. A sudden spike in 500 errors on a specific endpoint might mean an attacker found an input that crashes your service—a potential denial of service or exploitation vector.
Recommended config: 10-minute schedule, 30-minute throttle, Email + in-app channels.
IP-Based Anomaly Detection
```sql
SELECT
    peer_ip,
    uniqExact(http_target) AS unique_endpoints,
    count(*) AS total_requests,
    min(timestamp) AS first_seen,
    max(timestamp) AS last_seen,
    dateDiff('second', first_seen, last_seen) AS duration_seconds
FROM traces
WHERE timestamp >= now() - INTERVAL 30 MINUTE
GROUP BY peer_ip
HAVING unique_endpoints > 30 AND duration_seconds < 300
ORDER BY unique_endpoints DESC
```
This rule identifies IPs that hit a large number of unique endpoints in a short time window—a hallmark of automated scanning and API enumeration. Legitimate users rarely visit more than 30 unique endpoints in under 5 minutes. Automated tools do it constantly.
Recommended config: 15-minute schedule, 30-minute throttle, in-app channel.
For more on querying trace data for security insights, see our guide on ClickHouse queries for application security analytics.
Best Practices for Alert Rule Design
Building effective alert rules is as much an art as a science. These practices, distilled from real SOC team experience, will help you build a detection library that improves over time rather than decaying into noise.
Start Conservative, Then Tune
New rules should launch with higher thresholds and longer throttle periods. It's far better to miss a few edge cases during the first week than to flood your team with false positives that erode trust in the alerting system. Monitor the rule's execution stats, review a sample of triggers, and gradually lower thresholds as you build confidence in the rule's precision.
Document the Intent, Not Just the Query
Every rule should have a clear description of what threat it's designed to detect and what response it should trigger. Six months from now, when a new analyst sees the rule fire, they should understand both what happened and what to do about it without reverse-engineering the SQL.
Align Schedule, Lookback, and Throttle
These three parameters interact in ways that aren't always obvious. A rule that runs every 5 minutes with a 15-minute lookback and a 15-minute throttle will evaluate the same data multiple times but only alert once per throttle window. A rule that runs every 15 minutes with a 15-minute lookback and a 5-minute throttle will evaluate each data window once but could rapid-fire if the condition persists. Map out the interaction before deploying.
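A rough way to map the interaction before deploying is to compute two numbers for a given configuration: how many times each event gets evaluated, and the fastest rate at which a persistent condition can alert. This is back-of-the-envelope arithmetic, not a SecureNow feature:

```python
def rule_dynamics(schedule_min, lookback_min, throttle_min):
    """Evaluations per event, and minimum minutes between alerts for a
    persistently-true condition, under a given configuration."""
    # Overlapping lookback windows re-evaluate the same data each run.
    evaluations_per_event = max(1, lookback_min // schedule_min)
    # An alert can fire no more often than once per run, and no more often
    # than once per throttle window.
    min_minutes_between_alerts = max(schedule_min, throttle_min)
    return evaluations_per_event, min_minutes_between_alerts

# 5-min schedule, 15-min lookback, 15-min throttle:
# each event is evaluated 3 times, but alerts fire at most every 15 minutes.
assert rule_dynamics(5, 15, 15) == (3, 15)
# 15-min schedule, 15-min lookback, 5-min throttle:
# each event is evaluated once; a persistent condition can alert on every run.
assert rule_dynamics(15, 15, 5) == (1, 15)
```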
Use Exclusions Proactively
Don't wait for false positives to add exclusions. If you know your rule will match health check endpoints, internal monitoring probes, or CDN edge nodes, add the exclusions when you create the rule. Proactive exclusion is cheaper than reactive triage.
Review and Retire Regularly
Alert rules have a shelf life. The threat landscape changes. Your application evolves. Endpoints get renamed or deprecated. Schedule quarterly reviews of your rule library. Disable rules that haven't fired in months—they're either perfectly calibrated (unlikely) or detecting a pattern that no longer exists. Archive rules that are no longer relevant rather than leaving them cluttering the active set.
<!-- CTA:demo -->

Layer Your Detection
No single rule catches everything. Design your rules in layers:
- Layer 1: Volume-based — Detect high-volume activity that exceeds normal baselines (4xx spikes, request floods).
- Layer 2: Pattern-based — Detect specific attack signatures (sequential endpoint scanning, authentication failures from distributed IPs).
- Layer 3: Behavioral — Detect anomalous patterns that don't match known signatures (unusual request timing, geographic anomalies).
Each layer catches threats the others miss. Together, they create a detection surface that's far more robust than any individual rule.
Test Against Historical Data
Before deploying a new rule to production, run the detection query manually against historical trace data using SecureNow's forensics query engine. This lets you see what the rule would have detected over the past days or weeks, giving you a realistic preview of its precision and recall before it starts generating real notifications.
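A simple backtest harness just slices the historical range into the windows the rule would have seen. The sketch below is an illustrative harness only; how each window's query actually executes is deployment-specific:

```python
from datetime import datetime, timedelta

def backtest_windows(start, end, window_minutes=15):
    """Yield (window_start, window_end) pairs covering a historical range,
    so a detection query can be replayed window by window before going live."""
    step = timedelta(minutes=window_minutes)
    cursor = start
    while cursor < end:
        yield cursor, min(cursor + step, end)
        cursor += step

# One day of history at the default 15-minute interval: 96 replay windows.
windows = list(backtest_windows(datetime(2024, 1, 1), datetime(2024, 1, 2)))
assert len(windows) == 96
```

Counting how many windows would have triggered gives a realistic false-positive rate for the rule before it sends a single notification.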
From Rules to Response
Alert rules are the front door of your detection and response pipeline. They generate the notifications that feed into your triage workflow, which drives IP investigation, forensic analysis, and ultimately incident resolution. A well-designed rule library doesn't just detect threats—it shapes the quality and efficiency of everything downstream.
The investment you make in thoughtful rule design, disciplined throttling, and proactive noise reduction pays compound dividends. Every false positive you eliminate is an analyst-minute redirected toward genuine threats. Every well-documented rule is institutional knowledge that survives team changes. Every layered detection strategy is a safety net that catches what individual rules miss.
Build your rules with the same rigor you'd apply to production code. Test them, document them, review them, and iterate. The threats will evolve. Your detection should evolve with them.
Frequently Asked Questions
What query language does SecureNow use for alert rules?
SecureNow uses SQL queries against ClickHouse trace data. Each rule maps to a QueryMapping that defines the SQL used to detect specific threat patterns in your application traces.
How does alert throttling prevent alert storms?
Each alert rule has a configurable cooldown period (default 15 minutes). After triggering, the rule won't fire again until the cooldown expires, preventing notification floods during sustained attacks.
Can I send alerts to multiple channels?
Yes, SecureNow supports Email (via Resend), Slack (via webhooks), and in-app notifications. You can configure multiple channels per alert rule for redundant delivery.
How do exclusion patterns work with alert rules?
Exclusion patterns filter out known benign paths (like /api/health) from alert results. You can set global exclusions that apply to all rules or per-rule exclusions for targeted filtering.