Back to Blog

Writing Custom ClickHouse Queries for Application Security Analytics

A developer's tutorial on writing ClickHouse SQL queries for security analytics — find suspicious IPs, detect error patterns, and analyze application traffic using trace data.

Posted by

Writing Custom ClickHouse Queries for Application Security Analytics

Your application generates thousands of traces per hour. Each trace contains spans with HTTP methods, URLs, status codes, client IPs, database queries, response times, and service-to-service call chains. That data sits in ClickHouse — a columnar database engine built for exactly the kind of analytical queries that security investigations demand. The question is whether you know how to ask it the right questions.

Most developers are comfortable with PostgreSQL or MySQL. ClickHouse SQL looks similar on the surface, but its columnar architecture and specialized functions unlock analytical patterns that traditional databases struggle with. Scanning billions of spans to find the ten IPs that exhibited suspicious behavior in the last hour? ClickHouse does that in under a second.

This tutorial teaches you the ClickHouse query patterns that matter for application security: the schema you're querying, the functions that make security analytics practical, and a library of ready-to-use queries you can run in SecureNow's forensics console or convert into alert rules.

Understanding the Trace Schema

SecureNow stores OpenTelemetry trace data in the SigNoz format. The primary table you'll query is signoz_traces.distributed_signoz_index_v2. Understanding its columns is essential for writing effective queries.

Key Columns for Security Analytics

ColumnTypeDescription
timestampDateTime64When the span was created
traceIDStringUnique trace identifier
spanIDStringUnique span identifier
parentSpanIDStringParent span (empty for root spans)
serviceNameLowCardinality(String)Application/service name
nameLowCardinality(String)Span name (usually the route or operation)
kindInt8Span kind (1=internal, 2=server, 3=client)
durationUInt64Span duration in nanoseconds
response_status_codeLowCardinality(String)HTTP status code
attribute_string_http_methodLowCardinality(String)HTTP method (GET, POST, etc.)
attribute_string_http_urlStringFull request URL
attribute_string_http_hostStringHTTP Host header
attribute_string_peer_ipStringClient IP address
attribute_string_db_systemLowCardinality(String)Database system (postgres, mysql, etc.)
attribute_string_db_statementStringDatabase query statement

Additional columns exist for custom attributes you've added to your spans. Use the schema introspection feature in SecureNow's forensics console to explore the full column list for your specific data.

The __USER_APP_KEYS__ Placeholder

When writing queries for alert rules, use __USER_APP_KEYS__ instead of hardcoding service names. SecureNow replaces this placeholder with your registered application keys at execution time:

WHERE serviceName IN (__USER_APP_KEYS__)

This ensures rules automatically apply to all your registered services and adapt as you add new applications.

For ad-hoc forensic queries, you can either filter by specific service names or select an application context in the forensics UI, which injects the appropriate filter automatically.

ClickHouse Functions Every Security Analyst Should Know

ClickHouse includes a rich library of functions that make security analytics practical. These are the ones you'll reach for most often.

Time Bucketing with toStartOfInterval

Group events into time windows for trend analysis. This is essential for detecting spikes and anomalies:

SELECT
  toStartOfInterval(timestamp, INTERVAL 5 MINUTE) AS time_bucket,
  count(*) AS request_count,
  countIf(response_status_code >= 400) AS error_count
FROM signoz_traces.distributed_signoz_index_v2
WHERE timestamp >= now() - INTERVAL 1 HOUR
  AND serviceName IN (__USER_APP_KEYS__)
GROUP BY time_bucket
ORDER BY time_bucket

This gives you a 5-minute-resolution time series of total requests vs. errors — the basic shape of your traffic that makes spikes visually obvious.

Conditional Counting with countIf and sumIf

Count or sum only rows matching a condition, without needing subqueries:

countIf(response_status_code = 401) AS unauthorized_count,
countIf(response_status_code = 403) AS forbidden_count,
countIf(response_status_code >= 500) AS server_errors

This is dramatically more efficient than running separate filtered queries and joining the results.

Unique Value Collection with groupArray and groupUniqArray

Collect distinct values into an array for each group — perfect for seeing which endpoints an IP targeted or which methods were used:

groupArray(DISTINCT name) AS targeted_endpoints,
groupArray(DISTINCT attribute_string_http_method) AS methods_used

Exact and Approximate Distinct Counts

uniqExact() gives precise distinct counts. uniq() gives approximate counts that are faster on large datasets:

uniqExact(traceID) AS unique_traces,
uniq(attribute_string_peer_ip) AS approx_unique_ips

For security analytics, use uniqExact when precision matters (e.g., counting affected users) and uniq when you just need order-of-magnitude estimates.

String Matching with like, match, and extract

ClickHouse supports both SQL LIKE patterns and full regular expressions via match():

WHERE name LIKE '%/api/auth%'
WHERE match(attribute_string_http_url, '.*(union|select|insert|drop).*')
WHERE extract(attribute_string_http_url, '(\d+\.\d+\.\d+\.\d+)') != ''

The match() function uses RE2 syntax and is significantly more powerful than LIKE for pattern detection.

Array Operations with arrayJoin

OpenTelemetry attributes sometimes contain array values. arrayJoin expands arrays into individual rows, letting you analyze nested data:

SELECT
  arrayJoin(attribute_string_tags) AS tag,
  count(*) AS occurrences
FROM signoz_traces.distributed_signoz_index_v2
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY tag
ORDER BY occurrences DESC

Security Query Library

Here are eight ready-to-use queries covering the most common security analytics scenarios. Run these directly in SecureNow's forensics console, save them to your query library, or adapt them into alert rules.

<!-- CTA:trial -->

Query 1: Top IPs by Error Rate

Identify IPs that generate a disproportionate number of errors — the primary indicator of scanning and brute-force activity:

SELECT
  attribute_string_peer_ip AS ip,
  count(*) AS total_requests,
  countIf(response_status_code >= 400 AND response_status_code < 500) AS client_errors,
  countIf(response_status_code >= 500) AS server_errors,
  round(client_errors / total_requests * 100, 2) AS client_error_rate,
  round(server_errors / total_requests * 100, 2) AS server_error_rate,
  min(timestamp) AS first_seen,
  max(timestamp) AS last_seen
FROM signoz_traces.distributed_signoz_index_v2
WHERE timestamp >= now() - INTERVAL 1 HOUR
  AND serviceName IN (__USER_APP_KEYS__)
  AND attribute_string_peer_ip != ''
GROUP BY ip
HAVING total_requests > 10
ORDER BY client_errors DESC
LIMIT 25

The HAVING total_requests > 10 clause filters out IPs with too few requests to be statistically meaningful. Adjust this threshold based on your traffic volume.

Query 2: Suspicious User Agent Detection

Automated tools often use distinctive or missing User-Agent strings. This query surfaces unusual agents hitting your application:

SELECT
  attribute_string_http_user_agent AS user_agent,
  count(*) AS request_count,
  uniqExact(attribute_string_peer_ip) AS unique_ips,
  countIf(response_status_code >= 400) AS error_count,
  round(error_count / request_count * 100, 2) AS error_rate,
  groupArray(DISTINCT name)(10) AS sample_endpoints
FROM signoz_traces.distributed_signoz_index_v2
WHERE timestamp >= now() - INTERVAL 6 HOUR
  AND serviceName IN (__USER_APP_KEYS__)
  AND attribute_string_http_user_agent != ''
GROUP BY user_agent
HAVING request_count > 20
ORDER BY error_rate DESC
LIMIT 20

Look for user agents like python-requests, curl, Go-http-client, or empty strings with high error rates — these are common scanner signatures. The sample_endpoints column shows what they're targeting.

Query 3: Endpoint Abuse Detection

Find endpoints receiving abnormally high traffic. This catches API abuse, enumeration attacks, and denial-of-service attempts:

SELECT
  name AS endpoint,
  attribute_string_http_method AS method,
  count(*) AS request_count,
  uniqExact(attribute_string_peer_ip) AS unique_ips,
  countIf(response_status_code >= 400) AS error_count,
  avg(duration) / 1000000 AS avg_duration_ms,
  max(duration) / 1000000 AS max_duration_ms
FROM signoz_traces.distributed_signoz_index_v2
WHERE timestamp >= now() - INTERVAL 1 HOUR
  AND serviceName IN (__USER_APP_KEYS__)
  AND kind = 2
GROUP BY endpoint, method
ORDER BY request_count DESC
LIMIT 20

The kind = 2 filter selects only server spans (incoming requests), excluding internal and client spans. This gives you the true request count per endpoint.

Query 4: Slow Query Detection (Potential DoS or Injection)

Unusually slow database operations can indicate SQL injection probes, resource exhaustion attacks, or queries designed to cause denial-of-service:

SELECT
  attribute_string_db_system AS db_system,
  attribute_string_db_statement AS query,
  duration / 1000000 AS duration_ms,
  serviceName AS service,
  attribute_string_peer_ip AS source_ip,
  timestamp
FROM signoz_traces.distributed_signoz_index_v2
WHERE timestamp >= now() - INTERVAL 1 HOUR
  AND serviceName IN (__USER_APP_KEYS__)
  AND attribute_string_db_system != ''
  AND duration > 5000000000
ORDER BY duration DESC
LIMIT 20

The duration > 5000000000 filter catches queries taking more than 5 seconds (duration is stored in nanoseconds). Examine the db_statement column for injection patterns — UNION SELECT, SLEEP(), BENCHMARK(), or pg_sleep() are strong indicators.

Query 5: Traffic Distribution Over Time

Visualize traffic patterns to spot anomalies — sudden spikes, unusual off-hours activity, or traffic drops that might indicate a DDoS:

SELECT
  toStartOfInterval(timestamp, INTERVAL 5 MINUTE) AS time_bucket,
  count(*) AS total_requests,
  uniqExact(attribute_string_peer_ip) AS unique_ips,
  countIf(response_status_code >= 200 AND response_status_code < 300) AS success,
  countIf(response_status_code >= 400 AND response_status_code < 500) AS client_error,
  countIf(response_status_code >= 500) AS server_error
FROM signoz_traces.distributed_signoz_index_v2
WHERE timestamp >= now() - INTERVAL 24 HOUR
  AND serviceName IN (__USER_APP_KEYS__)
  AND kind = 2
GROUP BY time_bucket
ORDER BY time_bucket

Plot the results and look for patterns: does traffic follow a normal diurnal curve? Are there sudden bursts of client errors? Does the unique IP count spike at unusual times?

Query 6: Cross-Service Request Tracing

Track how a suspicious request propagated through your microservices. This is essential for understanding the blast radius of an attack:

SELECT
  serviceName AS service,
  name AS operation,
  spanID,
  parentSpanID,
  duration / 1000000 AS duration_ms,
  response_status_code AS status,
  attribute_string_http_method AS method,
  attribute_string_http_url AS url,
  attribute_string_db_statement AS db_query
FROM signoz_traces.distributed_signoz_index_v2
WHERE traceID = 'your-trace-id-here'
ORDER BY timestamp

Replace 'your-trace-id-here' with the trace ID from an alert or the trace explorer. This query reconstructs the full request path across services, showing every database query, outbound call, and status code in execution order.

Query 7: Authentication Endpoint Analysis

Deep analysis of authentication patterns — the most critical attack surface for most applications:

SELECT
  attribute_string_peer_ip AS ip,
  countIf(response_status_code = 200) AS successful_logins,
  countIf(response_status_code = 401) AS failed_logins,
  countIf(response_status_code = 429) AS rate_limited,
  count(*) AS total_attempts,
  round(failed_logins / total_attempts * 100, 2) AS failure_rate,
  min(timestamp) AS first_attempt,
  max(timestamp) AS last_attempt,
  dateDiff('second', min(timestamp), max(timestamp)) AS duration_seconds
FROM signoz_traces.distributed_signoz_index_v2
WHERE timestamp >= now() - INTERVAL 6 HOUR
  AND serviceName IN (__USER_APP_KEYS__)
  AND (name LIKE '%/auth%' OR name LIKE '%/login%' OR name LIKE '%/token%')
  AND attribute_string_peer_ip != ''
GROUP BY ip
HAVING total_attempts > 5
ORDER BY failed_logins DESC
LIMIT 25

The duration_seconds column reveals attack velocity — an IP generating 100 failed logins in 30 seconds is running an automated tool, while 100 failures spread over 6 hours might be a shared NAT.

Query 8: Outbound Connection Monitoring

Detect your application making unexpected outbound HTTP calls — a key indicator of SSRF, data exfiltration, or compromised dependencies:

SELECT
  attribute_string_http_host AS destination_host,
  count(*) AS call_count,
  groupArray(DISTINCT serviceName) AS source_services,
  groupArray(DISTINCT attribute_string_http_method) AS methods,
  countIf(response_status_code >= 400) AS error_count,
  min(timestamp) AS first_seen,
  max(timestamp) AS last_seen
FROM signoz_traces.distributed_signoz_index_v2
WHERE timestamp >= now() - INTERVAL 24 HOUR
  AND serviceName IN (__USER_APP_KEYS__)
  AND kind = 3
  AND attribute_string_http_host != ''
GROUP BY destination_host
ORDER BY call_count DESC
LIMIT 30

The kind = 3 filter selects client spans — outbound calls your application makes. Review the list of destination hosts: do you recognize all of them? Unknown domains, IP addresses instead of hostnames, or cloud metadata endpoints (169.254.169.254) are red flags. For a real-world example of catching a supply chain attack through outbound call monitoring, see How a Developer Caught a Supply Chain Attack in Trace Data.

Using the Query Library

Writing queries from scratch every time is inefficient. SecureNow's Query Library lets you save, organize, and reuse queries.

Saving Queries

After running a query in the forensics console, click Save to Library. Add:

  • Name — descriptive, like "Top error-generating IPs (1 hour)"
  • Description — what the query detects and when to use it
  • Category — group related queries (e.g., "Authentication", "IP Analysis", "Traffic Patterns")
  • Tags — flexible labels for cross-category search (e.g., "brute-force", "scanning", "baseline")

Executing Saved Queries

Open the Query Library, find your saved query, and click Execute. The query runs immediately against your current application context. No copy-pasting, no remembering syntax.

Sharing with Your Team

Saved queries are visible to your team members. Build a shared library of security queries tailored to your applications — new team members can start investigating immediately without learning the schema from scratch.

For more on the forensics system including natural language queries, see Security Forensics with Natural Language.

Schema Introspection

Not sure which columns are available in your data? SecureNow's forensics console includes a Schema Introspection feature that lets you explore:

  • All databases and tables available in your ClickHouse instance
  • Column names, data types, and cardinality for each table
  • Sample values from key columns

This is especially useful when your OpenTelemetry instrumentation includes custom span attributes. If you've added attributes like auth.user_id or billing.plan_type to your spans, the schema introspection shows them as queryable columns, and you can incorporate them into your security queries.

Multi-Instance Considerations

If you're running dedicated ClickHouse instances for different applications (rather than the shared default instance), keep these points in mind:

  • Forensic queries execute against the instance bound to the selected application
  • Alert rules with __USER_APP_KEYS__ automatically route to the correct instance
  • Cross-instance queries are not supported — if you need to correlate data across applications on different instances, run separate queries and correlate the results manually
  • Each instance has its own schema, which may differ if applications use different OpenTelemetry instrumentation configurations

Converting Queries to Alert Rules

The bridge between forensics and detection is straightforward. Any query that identifies a threat pattern can become an alert rule. When adapting a forensic query for alerting:

  1. Replace absolute time filters with relative intervals (now() - INTERVAL 15 MINUTE)
  2. Add the __USER_APP_KEYS__ filter if the query should scope to your applications
  3. Ensure the query returns rows only when the threat condition is met (use HAVING for thresholds)
  4. Add a LIMIT clause to cap result size
  5. Remove any columns that aren't useful in the notification context

For a complete guide on building alert rules, see From Traces to Security Alerts: A Developer's Guide.

<!-- CTA:demo -->

Writing Queries That Perform

ClickHouse is fast, but poorly structured queries can still take longer than necessary on large datasets. Follow these guidelines:

  • Always filter by timestamp first. ClickHouse partitions data by time. A time filter lets the engine skip entire partitions.
  • Use LowCardinality columns in WHERE clauses. Columns like serviceName, name, and attribute_string_http_method are LowCardinality, meaning ClickHouse can filter them extremely efficiently.
  • Prefer countIf over subqueries. Multiple countIf expressions in a single query are orders of magnitude faster than separate queries with different WHERE clauses.
  • Use LIMIT generously. You rarely need more than 25 results for investigation. Limiting output reduces memory usage and network transfer.
  • Avoid SELECT *. Columnar databases perform best when you select only the columns you need. Each additional column is an additional data stream to read.

Your trace data is a security goldmine. The queries in this guide give you the tools to extract threat intelligence from it — whether through ad-hoc investigation, saved library queries, or automated alert rules running around the clock.

Frequently Asked Questions

What ClickHouse tables does SecureNow use?

SecureNow queries signoz_traces.distributed_signoz_index_v2 for trace data, signoz_logs for log data, and signoz_metrics for metrics. The traces table contains span data with HTTP methods, URLs, status codes, IP addresses, and more.

Can I run arbitrary SQL queries?

SecureNow restricts forensic queries to SELECT statements for safety. You can query any column in the trace tables, use aggregations, window functions, and ClickHouse-specific functions.

How do I scope queries to my applications?

Use the serviceName column to filter by application, or let SecureNow automatically inject your application keys using the __USER_APP_KEYS__ placeholder in alert rules.

Can I save and reuse queries?

Yes, the query library lets you save queries with names, descriptions, categories, and tags. You can execute saved queries with one click and share them with your team.

Frequently Asked Questions

What ClickHouse tables does SecureNow use?

SecureNow queries signoz_traces.distributed_signoz_index_v2 for trace data, signoz_logs for log data, and signoz_metrics for metrics. The traces table contains span data with HTTP methods, URLs, status codes, IP addresses, and more.

Can I run arbitrary SQL queries?

SecureNow restricts forensic queries to SELECT statements for safety. You can query any column in the trace tables, use aggregations, window functions, and ClickHouse-specific functions.

How do I scope queries to my applications?

Use the serviceName column to filter by application, or let SecureNow automatically inject your application keys using the __USER_APP_KEYS__ placeholder in alert rules.

Can I save and reuse queries?

Yes, the query library lets you save queries with names, descriptions, categories, and tags. You can execute saved queries with one click and share them with your team.