TotallyWildAi · splunk-detection-poc

Five Splunk detection capabilities, end to end.

A self-contained Splunk Enterprise POC running on AWS — built in code, deployed by CI, including AI-assisted detection authoring with a human review gate. Every piece of state is reproducible from git in ~15 minutes.

8 detections deployed
3 CIM datamodels accelerated
6 CI jobs (1 validate, 5 deploy)
~15 min full rebuild from git
~$70/mo AWS infra (business hours)

1. Data parsing & ingestion

Approach. Splunk gets data through purpose-built TAs layered over modular inputs. The TA-aws account and input stanzas are versioned as YAML in taaws-config/config.yml and pushed via REST on every CI run, so reproducing them takes no UI clicks.

Figure: Splunk ingestion pipeline. CloudTrail and VPC Flow Logs → S3 → SQS → TA-aws on Splunk EC2, which polls SQS for new objects. The tw_cim_accel app overlays props.conf, eventtypes.conf, tags.conf, and datamodels.conf so indexed events populate three accelerated CIM datamodels: Authentication, Network_Traffic, and Change.
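The S3 → SQS hop works because S3 publishes an event notification for each new log object, and the SQS-based S3 input reads those messages to learn what to fetch. The sketch below is not TA-aws's code; it only illustrates, assuming S3 publishes directly to SQS (no SNS envelope), how one message body turns into bucket/key pairs. Object keys in these notifications are URL-encoded, hence `unquote_plus`. The bucket name in the demo is hypothetical.

```python
import json
from urllib.parse import unquote_plus


def objects_from_sqs_body(body: str) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification delivered via SQS.

    Assumes direct S3 -> SQS delivery. S3's one-off s3:TestEvent message has no
    "Records" list and therefore yields nothing.
    """
    event = json.loads(body)
    pairs = []
    for record in event.get("Records", []):
        # Only object-creation events point at new log files worth ingesting.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Keys arrive URL-encoded (a space is sent as '+').
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


if __name__ == "__main__":
    sample = json.dumps({"Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {"bucket": {"name": "example-log-bucket"},
               "object": {"key": "AWSLogs/example+file.json.gz"}},
    }]})
    print(objects_from_sqs_body(sample))
```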

What's running

Demonstrable artifacts

2. Data Model Acceleration (DMA)

Approach. Every detection is fit to a CIM datamodel first, with raw SPL as the fallback. Acceleration is configured as code in tw_cim_accel/default/datamodels.conf; on every deploy, install-apps.sh syncs that file into Splunk_SA_CIM/local/datamodels.conf, the only path Splunk reliably honors for DMA overrides (verified the hard way).
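A minimal sketch of the DMA half of install-apps.sh, written in Python for illustration (the real script is shell and isn't reproduced here). It renders one acceleration stanza per datamodel and copies the file over Splunk_SA_CIM/local/datamodels.conf. `acceleration` and `acceleration.earliest_time` are standard datamodels.conf settings; the `-7d` window, the function names, and the `$SPLUNK_HOME/etc/apps` layout used here are assumptions.

```python
import shutil
import tempfile
from pathlib import Path

# The three datamodels this POC accelerates. The look-back window is an assumption.
ACCELERATED = ["Authentication", "Network_Traffic", "Change"]


def render_datamodels_conf(models: list[str], earliest: str = "-7d") -> str:
    """Render one acceleration stanza per CIM datamodel, separated by blank lines."""
    return "\n".join(
        f"[{m}]\nacceleration = true\nacceleration.earliest_time = {earliest}\n"
        for m in models
    )


def sync_to_cim_local(src: Path, splunk_home: Path) -> Path:
    """Copy the versioned file over Splunk_SA_CIM/local/datamodels.conf."""
    dest = splunk_home / "etc" / "apps" / "Splunk_SA_CIM" / "local" / "datamodels.conf"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "datamodels.conf"
        src.write_text(render_datamodels_conf(ACCELERATED))
        print(sync_to_cim_local(src, Path(tmp) / "splunk").read_text())
```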

What's running

Figure: The DMA Benchmark dashboard at /app/tw_cim_accel/dma_benchmark, comparing |tstats and |search latency alongside per-datamodel acceleration health. |tstats over the accelerated Change datamodel returns in milliseconds, compared with a raw-event scan over the same dataset.
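The same comparison can be run outside the dashboard with a small timing harness. This is a hypothetical sketch: `run_query` stands in for whatever executes a search (for example a REST export call), and both SPL strings are illustrative assumptions, not the dashboard's actual searches. The raw variant uses the `datamodel` command's search mode, which evaluates the datamodel over raw events instead of the acceleration summaries.

```python
import statistics
import time
from typing import Callable

# Illustrative pair over the same data: accelerated summaries vs. an
# unaccelerated datamodel search that scans raw events.
TSTATS_QUERY = "| tstats summariesonly=true count from datamodel=Change by All_Changes.action"
RAW_QUERY = "| datamodel Change All_Changes search | stats count by All_Changes.action"


def median_latency(run_query: Callable[[str], object], query: str, repeats: int = 5,
                   clock: Callable[[], float] = time.perf_counter) -> float:
    """Run a query `repeats` times and return the median wall-clock latency in seconds."""
    samples = []
    for _ in range(repeats):
        start = clock()
        run_query(query)
        samples.append(clock() - start)
    return statistics.median(samples)
```

With a real runner, `median_latency(run, RAW_QUERY) / median_latency(run, TSTATS_QUERY)` gives the speedup. The median is used so that a single cold-cache run doesn't dominate the result.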

Demonstrable artifacts

3. Splunk detection engineering

Approach. Detection content is schema-validated YAML, mapped to ATT&CK (against the current v19.1 framework) and written in CIM-aligned SPL where a datamodel covers the data, or in well-documented raw SPL where it doesn't. Every rule is ES-compatible: it drops into a licensed Enterprise Security (ES) instance as a correlation search without a rewrite.

What's running — 8 detections in detections/aws/

Detection · ATT&CK · Pattern
CloudTrail Disabled · T1685.002 · |tstats over datamodel=Change
Console Login Without MFA · T1078.004 · raw SPL (MFA flag not in CIM)
Console Login Failures Burst · T1110.001 · time-windowed stats
IAM CreateAccessKey For Different User · T1098 / T1078.004 · cross-field SPL comparison
Security Group Open To World · T1133 / T1190 · spath + mvexpand on nested JSON
S3 Bucket Policy Public Write · T1567 / T1530 · raw SPL on policy JSON
Root Account Usage · T1078.004 · raw SPL
IAM Password Spray Test (AI-drafted) · T1110.003 · raw SPL, scheduled, alerting
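The |tstats pattern in the CloudTrail Disabled row can be generated instead of hand-typed. A hypothetical helper that composes it; the assumption that CloudTrail's eventName lands in the CIM Change field `All_Changes.command`, and the choice of StopLogging and DeleteTrail as the disabling API calls, are illustrative and not the repo's actual search.

```python
def tstats_change_search(commands: list[str]) -> str:
    """Compose a |tstats search over the accelerated Change datamodel.

    Assumes the CIM mapping puts the CloudTrail API name in All_Changes.command.
    """
    if not commands:
        raise ValueError("need at least one command to match")
    where = " OR ".join(f'All_Changes.command="{c}"' for c in commands)
    return (
        "| tstats summariesonly=true count min(_time) as firstTime max(_time) as lastTime "
        f"from datamodel=Change where ({where}) "
        "by All_Changes.user All_Changes.src All_Changes.command"
    )


if __name__ == "__main__":
    print(tstats_change_search(["StopLogging", "DeleteTrail"]))
```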

Each YAML carries: ATT&CK technique IDs, kill-chain phase, CIS/NIST mappings, false-positive tuning hints, prereqs, references to Splunk's security_content, AWS docs, and MITRE.
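The schema gate is described but not shown here. A minimal stand-in, assuming the YAML has already been parsed into a dict: it checks for required top-level keys and that every ATT&CK ID has the shape Tnnnn or Tnnnn.nnn. The key names are hypothetical, since SCHEMA.md isn't reproduced on this page.

```python
import re

# Hypothetical key names; the authoritative list is the repo's SCHEMA.md.
REQUIRED_KEYS = ("name", "search", "mitre_attack_id", "kill_chain_phases", "references")
# ATT&CK technique IDs: "T" + 4 digits, optionally ".NNN" for a sub-technique.
ATTACK_ID = re.compile(r"T\d{4}(\.\d{3})?")


def validate_detection(doc: dict) -> list[str]:
    """Return human-readable problems; an empty list means the detection passes."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in doc]
    for technique in doc.get("mitre_attack_id", []):
        if not ATTACK_ID.fullmatch(str(technique)):
            problems.append(f"bad ATT&CK ID: {technique}")
    return problems
```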

Figure: Splunk Security Essentials (SSE) MITRE ATT&CK heat map filtered to Originating app = totallywildai. Every coloured tile is a technique covered by one of our 8 custom detections, plotted against the full ATT&CK v19.1 Enterprise matrix.

Note: the Active / Available counters above the matrix read 0 because SSE gates them on an internal data-inventory score that it owns end-to-end; we deliberately don't fight its discovery worker. The operational reality lives in the saved-searches manager (8 scheduled rules firing on real data); see capability 4.

Demonstrable artifacts

4. Detections as Code (CI/CD)

Approach. Two pipelines with a strict split: infrastructure changes don't trigger content deploys, and content changes don't trigger infrastructure plans. Both run on GitHub Actions with OIDC-authenticated AWS access, so there are zero static credentials, and the Splunk management port (:8089) is never publicly exposed.
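The split comes down to path filters on each workflow. A rough Python model of that routing: `terraform/` comes from the terraform.yml description, while the content prefixes for splunk-config.yml are assumptions based on directories named elsewhere on this page. GitHub evaluates the workflows' own `paths:` filters; this only mirrors the intent.

```python
# Prefix routing. The splunk-config prefixes are assumptions, not copied from the workflow.
PIPELINES = {
    "terraform.yml": ("terraform/",),
    "splunk-config.yml": ("apps-src/", "detections/", "taaws-config/"),
}


def pipelines_for(changed_files: list[str]) -> set[str]:
    """Return which workflows a set of changed paths would trigger."""
    triggered = set()
    for path in changed_files:
        for workflow, prefixes in PIPELINES.items():
            # str.startswith accepts a tuple of candidate prefixes.
            if path.startswith(prefixes):
                triggered.add(workflow)
    return triggered
```

An infra-only change such as `["terraform/main.tf"]` yields `{"terraform.yml"}`, and a content-only change never touches the Terraform plan.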

Figure: Detections-as-Code deploy pipeline. GitHub Actions → S3 (artifact upload) → SSM SendCommand → Splunk EC2 (private subnet). The runner authenticates to AWS via OIDC (no static keys), uploads the payload to S3, and issues an SSM SendCommand; the EC2 fetches the payload from S3 and runs install-apps.sh locally, so the CI runner needs no inbound access to :8089, which is never publicly exposed.
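The SSM hop is what keeps :8089 private. A sketch of the request CI would send, assuming boto3's SSM client and the AWS-managed `AWS-RunShellScript` document; the function name, bucket, key, instance ID, `/opt/deploy` working directory, and the location of install-apps.sh inside the payload are all hypothetical. The function only builds keyword arguments, so it runs without AWS credentials.

```python
import shlex


def build_send_command(instance_id: str, bucket: str, key: str,
                       workdir: str = "/opt/deploy") -> dict:
    """Build kwargs for boto3's ssm.send_command(); the EC2 pulls the payload itself."""
    s3_uri = f"s3://{bucket}/{key}"
    archive = f"{workdir}/payload.tgz"
    commands = [
        f"mkdir -p {shlex.quote(workdir)}",
        # The instance role (not the CI runner) reads the payload from S3.
        f"aws s3 cp {shlex.quote(s3_uri)} {shlex.quote(archive)}",
        f"tar -xzf {shlex.quote(archive)} -C {shlex.quote(workdir)}",
        f"bash {shlex.quote(workdir + '/install-apps.sh')}",
    ]
    return {
        "InstanceIds": [instance_id],
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {"commands": commands},
        "Comment": "splunk-config deploy",
    }
```

With credentials from the OIDC-assumed role, this would be passed as `boto3.client("ssm").send_command(**build_send_command(...))`.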

Pipelines

terraform.yml

Runs terraform plan on pull requests (the plan is saved as an artifact) and terraform apply on push to main. Triggers only on changes under terraform/**.

splunk-config.yml — 6 jobs

  1. validate — YAML schema check on detections; .conf sanity on apps-src (BOM, CRLF, duplicate stanzas)
  2. deploy-apps — builds .tgz from apps-src/, syncs to S3, SSM-triggers install-apps.sh
  3. deploy-sse-content — REST PUT to SSE custom_content KV-store
  4. deploy-detections — REST POST to /services/saved/searches
  5. deploy-taaws-config — REST POST to /servicesNS/.../Splunk_TA_aws/data/inputs/aws_sqs_based_s3
  6. deploy-sse-data-inventory — REST PATCH on data_inventory_products KV-store
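The validate job's .conf sanity pass names three checks: a UTF-8 BOM, CRLF line endings, and duplicate stanza headers. A minimal Python version of those three checks, offered as a sketch rather than the repo's actual script; the function name is hypothetical.

```python
import re
from collections import Counter

UTF8_BOM = b"\xef\xbb\xbf"
STANZA = re.compile(r"^\s*\[(?P<name>[^\]]+)\]\s*$")


def conf_problems(raw: bytes) -> list[str]:
    """Flag a BOM, CRLF line endings, and repeated [stanza] headers in a .conf file."""
    problems = []
    if raw.startswith(UTF8_BOM):
        problems.append("UTF-8 BOM at start of file")
    if b"\r\n" in raw:
        problems.append("CRLF line endings")
    # Normalize before parsing so the stanza check works regardless of the above.
    text = raw.decode("utf-8-sig").replace("\r\n", "\n")
    names = [m.group("name") for line in text.split("\n") if (m := STANZA.match(line))]
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"duplicate stanza [{name}] x{count}")
    return problems
```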

All five deploy jobs share one pattern: CI uploads the payload to S3 and sends a single SSM SendCommand, and the EC2 fetches the payload and runs it locally. Port 8089 is never publicly exposed.
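For deploy-detections, the on-box step amounts to turning each detection YAML into fields for POST /services/saved/searches. A sketch assuming the YAML is already parsed into a dict with hypothetical keys (`name`, `search`, `description`, `cron`); the field names on the output side are standard savedsearches.conf settings, while the 15-minute default and matching dispatch window are assumptions.

```python
from urllib.parse import urlencode


def saved_search_params(detection: dict) -> dict:
    """Map a parsed detection dict (hypothetical keys) to saved/searches POST fields."""
    return {
        "name": detection["name"],
        "search": detection["search"],
        "description": detection.get("description", ""),
        "is_scheduled": "1",
        "cron_schedule": detection.get("cron", "*/15 * * * *"),
        # Assumed look-back window that matches a 15-minute schedule.
        "dispatch.earliest_time": "-15m",
        "dispatch.latest_time": "now",
    }


def form_body(params: dict) -> bytes:
    """Encode fields as application/x-www-form-urlencoded, as Splunk's REST API accepts."""
    return urlencode(params).encode("ascii")
```

Splunk's REST API creates an entity by POSTing to the collection with `name`, and updates it by POSTing to the entity path without `name`, so a re-run needs to pick the right endpoint.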

Disaster recovery verified 2026-05-13

Running terraform apply -replace on the EC2 instance and then re-running the workflow brought every piece of state back from git in ~15 minutes total. The rebuild test surfaced one real bug (an HTTP 400 vs 404 mismatch in the TA-aws input REST handling), which is now fixed in code.
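Because the same config is pushed on every CI run, each REST call has to work both on a freshly rebuilt instance (stanza absent) and on a running one (stanza present). A hypothetical create-or-update helper for that: it tries the entity endpoint first and falls back to creating via the collection. Treating both 400 and 404 as "not there yet" is an assumption for illustration, not a description of the actual fix; the transport is injected so the logic can be tested offline.

```python
from typing import Callable
from urllib.parse import quote

# post(url, fields) -> HTTP status code. Injected so the logic runs without Splunk.
Post = Callable[[str, dict], int]

# Assumption for illustration: treat both codes as "stanza not there yet".
MISSING_STATUSES = {400, 404}


def upsert(post: Post, collection_url: str, name: str, fields: dict) -> str:
    """Update a REST entity in place, or create it under the collection if it's missing."""
    status = post(f"{collection_url}/{quote(name, safe='')}", fields)
    if 200 <= status < 300:
        return "updated"
    if status not in MISSING_STATUSES:
        raise RuntimeError(f"update of {name!r} failed: HTTP {status}")
    # Fresh instance (e.g. after terraform apply -replace): create the entity.
    status = post(collection_url, {"name": name, **fields})
    if 200 <= status < 300:
        return "created"
    raise RuntimeError(f"create of {name!r} failed: HTTP {status}")
```

The trade-off of accepting 400 on the update path is that a genuinely malformed update also triggers a create attempt; that attempt then has to fail on its own and raise.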

Demonstrable artifacts

5. AI for detection development

Approach. AI is a junior author with senior-author review. Claude drafts the detection YAML, the existing CI gates it on schema, a human reviews the draft PR, and the merge triggers the same deploy pipeline as a hand-written rule. Both the authoring loop and the validation loop run inside the existing repo structure, with no bespoke tooling.

End-to-end demonstrated 2026-05-13

  1. Operator typed 4 inputs into the GitHub Actions form (T1110.003 / iam-password-spray-test / "Detects bursts of failed ConsoleLogin events…")
  2. Claude (claude-sonnet-4-6) drafted a schema-conformant YAML against SCHEMA.md + the cloudtrail-disabled.yml exemplar embedded in the prompt
  3. Draft PR opened automatically with the YAML + reviewer checklist (~30 sec from button click)
  4. Existing validate CI job ran — YAML schema-checked clean
  5. Human reviewed the PR + merged
  6. splunk-config workflow auto-fired on merge → live as a scheduled saved search alerting on production CloudTrail data, every 15 minutes
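Step 2 depends on putting the schema, one exemplar, and the operator's inputs in front of the model. A sketch of that assembly; the function name, section tags, and instruction wording are illustrative assumptions, not the workflow's actual prompt, and the model call itself is left out.

```python
def build_prompt(technique: str, slug: str, description: str,
                 schema: str, exemplar: str) -> str:
    """Assemble a drafting prompt: schema, one exemplar, then the operator's form inputs."""
    sections = [
        "Draft exactly one Splunk detection as YAML that conforms to the schema below.",
        f"<schema>\n{schema}\n</schema>",
        f"<exemplar>\n{exemplar}\n</exemplar>",
        f"ATT&CK technique: {technique}\nDetection slug: {slug}\nDescription: {description}",
        "Return only the YAML document, with no commentary.",
    ]
    return "\n\n".join(sections)
```

In the workflow, `schema` and `exemplar` would be the contents of SCHEMA.md and cloudtrail-disabled.yml. Embedding a complete, already-deployed rule presumably gives the model a concrete structure to mirror, which is what lets the existing validate job catch any drift.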

Guardrails

Cost: ~$0.005 per draft on Sonnet 4.6 (~3K input + ~800 output tokens).

Figure: GitHub PR #1, Claude's draft for T1110.003 (IAM Password Spray), opened by the workflow with the AI-drafted detection YAML, the reviewer checklist embedded in the PR body, and the validate CI job passing, ready for human merge.

Demonstrable artifacts

Capability scorecard

Capability · Status · Notes
Data parsing & ingestion · Fully demonstrated · CloudTrail + VPC Flow live; GuardDuty + HEC are obvious next additions for breadth
DMA optimization & alignment · Fully demonstrated · 3 CIM datamodels accelerated, benchmark dashboard shipped
Detection engineering · Fully demonstrated · 8 rules, mapped to current ATT&CK v19.1, ES-compatible
Detections as Code (CI/CD) · Fully demonstrated · 6 CI jobs green, DR-verified, :8089 never publicly exposed
AI for detection development · Fully demonstrated · End-to-end live; an automated Docker-Splunk test runner for AI drafts is Phase 6.1, deferred