# Embedding Operations Runbook

Operate and triage hosted embedding generation and retrieval regressions.

This runbook covers queue health, worker failures, retrieval latency/fallback regressions, and rollback procedures for hosted embeddings.
## Quick Health Checks

Fetch the observability snapshot:

```bash
curl -s -H "Authorization: Bearer $MEMORIES_API_KEY" \
  "https://memories.sh/api/sdk/v1/management/embeddings/observability?windowHours=24" | jq
```

Check backfill status for a scoped model:

```bash
curl -s -H "Authorization: Bearer $MEMORIES_API_KEY" \
  "https://memories.sh/api/sdk/v1/management/embeddings/backfill?modelId=openai/text-embedding-3-small" | jq
```

Review monthly usage/cost summary:

```bash
curl -s -H "Authorization: Bearer $MEMORIES_API_KEY" \
  "https://memories.sh/api/sdk/v1/management/embeddings/usage?usageMonth=2026-02-01" | jq
```

## Alert Thresholds
Current SLO thresholds emitted in the observability payload:

- Queue lag: warning `>= 120000ms` (2m), critical `>= 600000ms` (10m)
- Dead-letter rate: warning `>= 2%`, critical `>= 5%` (after at least 20 attempts)
- Retrieval fallback rate: warning `>= 5%`, critical `>= 15%` (after at least 20 requests)
- Retrieval p95 latency: warning `>= 1200ms`, critical `>= 2500ms` (after at least 10 samples)
- Backfill error runs: warning `>= 1`, critical `>= 5` in window
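The thresholds above can be exercised offline against a captured payload. The sketch below is illustrative only: the field names (`queue_lag_ms`, `dead_letter_rate`, and so on) and the fraction-based rates are assumptions, not the observability payload's actual schema.

```python
def classify(value, warn, crit, samples=None, min_samples=0):
    """Return 'ok' | 'warning' | 'critical' for one metric.

    Metrics with a minimum-sample gate stay 'ok' until enough
    attempts/requests/samples have accumulated, matching the
    "(after at least N ...)" qualifiers in the threshold list.
    """
    if samples is not None and samples < min_samples:
        return "ok"  # not enough data to alarm yet
    if value >= crit:
        return "critical"
    if value >= warn:
        return "warning"
    return "ok"

def evaluate_alarms(m):
    """Apply the runbook thresholds to a metrics dict (assumed keys)."""
    return {
        "queue_lag": classify(m["queue_lag_ms"], 120_000, 600_000),
        "dead_letter": classify(m["dead_letter_rate"], 0.02, 0.05,
                                m.get("attempts"), 20),
        "fallback": classify(m["fallback_rate"], 0.05, 0.15,
                             m.get("requests"), 20),
        "p95_latency": classify(m["retrieval_p95_ms"], 1_200, 2_500,
                                m.get("latency_samples"), 10),
        "backfill_errors": classify(m["backfill_error_runs"], 1, 5),
    }

if __name__ == "__main__":
    sample = {
        "queue_lag_ms": 180_000,                   # 3m lag -> warning
        "dead_letter_rate": 0.06, "attempts": 50,  # -> critical
        "fallback_rate": 0.10, "requests": 5,      # too few requests -> ok
        "retrieval_p95_ms": 900, "latency_samples": 40,  # -> ok
        "backfill_error_runs": 0,                  # -> ok
    }
    print(evaluate_alarms(sample))
```

Rates are expressed as fractions here (`0.02` for 2%); normalize whichever representation the real payload uses before comparing.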
## Triage Flow

- Check `health` and `alarms` in `/management/embeddings/observability`.
- If queue alarms fire, inspect `queueLagMs`, `staleProcessingCount`, and `deadLetterCount`.
- If worker alarms fire, inspect `worker.topErrorCodes` and the dead-letter ratio.
- If retrieval alarms fire, inspect fallback reason trends and graph rollout mode.
- Confirm cost impact by comparing `cost.customerCostUsd` with usage trends.
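The branch points above can be collapsed into a small dispatch that maps firing alarms to a rollback lever. The alarm names and the priority order here are illustrative assumptions, not part of any payload contract:

```python
def pick_rollback(active_alarms):
    """Map the set of firing alarms to one rollback action.

    Priority: queue/worker pressure first (pause backfill), then
    retrieval regressions (drop rollout to shadow), then persistent
    backfill errors (pin the last known good model).
    """
    if {"queue_lag", "dead_letter"} & active_alarms:
        return "pause-backfill"   # worker pressure rollback
    if {"fallback", "p95_latency"} & active_alarms:
        return "rollout-shadow"   # retrieval rollback
    if "backfill_errors" in active_alarms:
        return "model-pin"        # embedding model rollback
    return "no-action"
```

A real triage loop would feed this from the observability `alarms` array rather than hand-built sets.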
### Worker graph-relationship signals

For relationship-edge extraction incidents, focus on:

- `GRAPH_RELATIONSHIP_SYNC_FAILED` in `worker.topErrorCodes`: indicates edge write/sync failures that trigger retry/dead-letter behavior
- `GRAPH_RELATIONSHIP_PARTIAL_DEGRADE` in successful worker metrics: indicates writes succeeded but LLM classification/semantic extraction degraded (`GRAPH_LLM_CLASSIFICATION_FAILED`, `GRAPH_LLM_SEMANTIC_EXTRACTION_FAILED`)
- summary payload format is structured for aggregation: `relationship_issues=<total>;GRAPH_LLM_CLASSIFICATION_FAILED:<count>,GRAPH_LLM_SEMANTIC_EXTRACTION_FAILED:<count>`
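Since the summary payload format is fixed, aggregation tooling can parse it directly. A minimal sketch (the helper name is hypothetical; the format string is the one documented above):

```python
def parse_relationship_summary(summary):
    """Parse 'relationship_issues=<total>;CODE:<count>,CODE:<count>'.

    Returns (total, {code: count}); the code list after ';' is optional.
    """
    head, _, tail = summary.partition(";")
    total = int(head.split("=", 1)[1])
    counts = {}
    if tail:
        for part in tail.split(","):
            code, _, n = part.partition(":")
            counts[code] = int(n)
    return total, counts
```

For example, `parse_relationship_summary("relationship_issues=3;GRAPH_LLM_CLASSIFICATION_FAILED:2,GRAPH_LLM_SEMANTIC_EXTRACTION_FAILED:1")` returns `(3, {"GRAPH_LLM_CLASSIFICATION_FAILED": 2, "GRAPH_LLM_SEMANTIC_EXTRACTION_FAILED": 1})`.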
Operational note:

- When retries are caused by graph sync failures after embedding upsert, workers reuse stored vectors and avoid extra embeddings API calls when possible.
## Rollback + Feature Flags

### 1) Retrieval rollback (high fallback/latency)

Move graph rollout to shadow immediately:

```bash
curl -s -X PATCH \
  -H "Authorization: Bearer $MEMORIES_API_KEY" \
  -H "Content-Type: application/json" \
  "https://memories.sh/api/sdk/v1/graph/rollout" \
  -d '{"mode":"shadow"}' | jq
```

If the incident persists, move the rollout to `off` and serve baseline retrieval only.
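Automation that drives this endpoint should reject unknown rollout modes before issuing the PATCH. A small sketch; the valid-mode set here is an assumption (the runbook only explicitly references `shadow`, `off`, and restoring canary settings):

```python
import json

# Assumed mode set; confirm against the rollout endpoint's contract.
VALID_MODES = {"canary", "shadow", "off"}

def rollout_patch_body(mode):
    """Build the JSON body for PATCH /graph/rollout, failing fast on typos."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown rollout mode: {mode}")
    return json.dumps({"mode": mode})
```

Failing fast on an unknown mode is preferable to discovering a typo'd flag mid-incident via a 4xx response.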
### 2) Embedding model rollback (provider/model regressions)

Pin the workspace/project to the last known good model via the embedding config endpoint/UI.

- Verify the default model and allowlist from `/api/sdk/v1/embeddings/models`
- Remove the failing model from the allowlist for the affected tenant/project
- Resume backfill only after queue lag and dead-letter rates normalize
### 3) Worker pressure rollback (queue saturation)

- Pause backfill for the affected scope:

  ```bash
  curl -s -X POST \
    -H "Authorization: Bearer $MEMORIES_API_KEY" \
    -H "Content-Type: application/json" \
    "https://memories.sh/api/sdk/v1/management/embeddings/backfill" \
    -d '{"action":"pause","modelId":"openai/text-embedding-3-small"}' | jq
  ```

- Let realtime write-path jobs drain
- Resume with a smaller `batchLimit` and a higher `throttleMs`
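A resume request with gentler pacing can be built mechanically from the previous settings. In this sketch the `"action":"resume"` value mirrors the pause example but is an assumption, and the halve/double scaling factors are an illustrative choice, not a documented policy:

```python
import json

def resume_backfill_body(model_id, prev_batch_limit, prev_throttle_ms):
    """Build a backfill resume payload with reduced pressure:
    half the previous batchLimit (at least 1), twice the throttleMs.
    """
    return json.dumps({
        "action": "resume",
        "modelId": model_id,
        "batchLimit": max(1, prev_batch_limit // 2),
        "throttleMs": prev_throttle_ms * 2,
    })
```

For example, resuming `openai/text-embedding-3-small` after running at `batchLimit=100`, `throttleMs=200` yields a payload with `batchLimit=50` and `throttleMs=400`.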
## Incident Checklist

- Capture the observability payload and active alarms.
- Capture backfill state (`status`, `estimatedRemaining`, `lastError`).
- Record top worker error codes and the fallback reason distribution.
- Apply the rollback (rollout mode / model pin / backfill pause) matching the dominant alarm.
- Confirm alarms clear for at least one full window before restoring canary settings.