Ewake is an AI SRE agent built on a live map of your production environment. When an alert fires, or when a proactive task is triggered, ewake pulls data from across your stack, correlates signals in real time, and returns ranked root-cause hypotheses with supporting evidence in seconds, directly in Slack.

Technology: The Cartographer Agents

Before ewake can investigate an issue, it needs to understand your production environment. That’s why its first job is to draw the map. Ewake continuously scans your production environment and assembles it into a single, coherent live map, updated as your stack evolves. This is what turns a generic notification like “error spike on payments-api” into a precise diagnosis: “This looks like the same failure pattern as incident #47, triggered by a deploy on the auth-service dependency two hours ago.”

How It Works

Ewake’s investigation pipeline runs in five steps, from raw signal to actionable output.

Signal ingestion

Ewake continuously ingests signals from your connected sources in real time.

Production map construction

Ingested signals are mapped into a persistent knowledge map. This map is the foundation every investigation is built on.
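
This page does not describe ewake's internal map format. As a purely illustrative sketch, one way to picture a production map is as a directed dependency graph that can be walked from an alerting service to everything it relies on. The `PRODUCTION_MAP` structure, the `token-store` service, and the `upstream_dependencies` helper are all hypothetical; only the payments-api to auth-service dependency comes from the example earlier on this page.

```python
from collections import deque

# Hypothetical map: each service lists the services it depends on.
# Only the payments-api -> auth-service edge comes from the page's example;
# "token-store" is invented for illustration.
PRODUCTION_MAP = {
    "payments-api": ["auth-service"],
    "auth-service": ["token-store"],
    "token-store": [],
}


def upstream_dependencies(service, graph):
    """Return every service reachable from `service`, nearest first (BFS)."""
    seen = []
    queue = deque(graph.get(service, []))
    while queue:
        current = queue.popleft()
        if current in seen or current == service:
            continue  # skip repeats and guard against dependency cycles
        seen.append(current)
        queue.extend(graph.get(current, []))
    return seen


print(upstream_dependencies("payments-api", PRODUCTION_MAP))
# prints ['auth-service', 'token-store']
```

With a structure like this, an alert on payments-api can be widened to its dependencies before any signals are compared, which is the kind of context a flat alert lacks.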

Trigger and correlation

When a trigger fires, ewake pulls relevant context from the knowledge map and runs correlation across signals.

Hypothesis ranking

Ewake produces a ranked list of probable root causes, each backed by direct evidence.
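
To make the idea of ranking concrete, here is a minimal sketch that scores recent deploys by how shortly they preceded an alert. It is not ewake's algorithm, which this page does not specify: the `Deploy` type, the `rank_deploy_hypotheses` function, the three-hour window, the linear score, and the example.com commit URLs are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deploy:
    service: str
    deployed_at: datetime
    commit_url: str  # placeholder for the evidence a hypothesis would link to


def rank_deploy_hypotheses(alert_time, related_services, deploys,
                           window=timedelta(hours=3)):
    """Rank deploys on related services by how shortly they preceded the alert."""
    candidates = []
    for deploy in deploys:
        lag = alert_time - deploy.deployed_at
        if deploy.service in related_services and timedelta(0) <= lag <= window:
            # Toy score: 1.0 for a deploy at alert time, falling linearly
            # to 0.0 at the edge of the window.
            score = 1.0 - lag / window
            candidates.append((score, deploy))
    # Sort on the score only, most likely cause first.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return candidates


alert = datetime(2024, 1, 1, 12, 0)
deploys = [
    Deploy("auth-service", alert - timedelta(hours=2), "https://example.com/commit/1"),
    Deploy("pricing-service", alert - timedelta(minutes=5), "https://example.com/commit/2"),
    Deploy("auth-service", alert - timedelta(minutes=12), "https://example.com/commit/3"),
]
ranked = rank_deploy_hypotheses(alert, {"payments-api", "auth-service"}, deploys)
for score, deploy in ranked:
    print(f"{score:.2f} {deploy.service} {deploy.commit_url}")
```

In this toy setup the auth-service deploy from 12 minutes before the alert ranks first, and the pricing-service deploy is ignored because that service is not related to the alerting one. A real ranking would weigh many more signals than deploy timing, such as logs, latency charts, and past incidents.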

Output and feedback loop

Results are delivered to your Slack thread or Dashboard. Every resolved incident feeds back into the knowledge map.

Suggestion-first, not autonomous. Every output is a hypothesis for your engineer to validate.
  • You see why: every hypothesis links directly to the evidence that produced it
  • You can override: corrections are captured and improve future investigations
  • You stay in control: ewake never silences alerts, restarts services, or modifies code without explicit human approval

A first look at ewake

A Datadog alert fires on payments-api. Before your on-call engineer opens their laptop, ewake has already posted in the alert thread:
  • Probable cause: error spike on payments-api correlates with a deploy on auth-service 12 minutes ago
  • Evidence: 3 relevant log lines, latency chart showing degradation from deploy time, link to the commit
  • Suggested next steps: open a PR to revert the token validation change in auth-service, or inspect the new logic in the linked commit before deciding
Your engineer arrives to context, not a blank terminal.

Every day at 8am, before standup, ewake posts a structured report to #monitoring:
  • System health: all services green except checkout-service (error rate +12% vs yesterday)
  • Anomalies: latency spike on inventory-api between 02:00–03:30 UTC, self-resolved
  • Notable changes: 2 deploys overnight on pricing-service, no correlated degradation
  • Risk signal: checkout-service has shown this pattern 3 times in the past 30 days, each time preceded by a full incident
Your team walks into standup already informed.

A P0 is ongoing. The team is in a war room channel, debating what’s causing a cascade. Someone mentions @ewake:

“@ewake what’s causing the latency spike on checkout-service? Correlate with recent deploys.”

Ewake responds in seconds with a ranked breakdown: most likely cause first, with each hypothesis linked to concrete evidence from Datadog, GitHub, and past incidents. The team stops guessing and starts acting.

Questions?

Have a question or want to learn more about ewake? Reach out directly at support@ewake.ai.