# Embedding Operations Runbook

Operate and triage hosted embedding generation and retrieval regressions.

This runbook covers queue health, worker failures, retrieval latency/fallback regressions, and rollback procedures for hosted embeddings.
## Quick Health Checks

Fetch the observability snapshot:

```bash
curl -s -H "Authorization: Bearer $MEMORIES_API_KEY" \
  "https://memories.sh/api/sdk/v1/management/embeddings/observability?windowHours=24" | jq
```

Check backfill status for a scoped model:

```bash
curl -s -H "Authorization: Bearer $MEMORIES_API_KEY" \
  "https://memories.sh/api/sdk/v1/management/embeddings/backfill?modelId=openai/text-embedding-3-small" | jq
```

Review monthly usage/cost summary:

```bash
curl -s -H "Authorization: Bearer $MEMORIES_API_KEY" \
  "https://memories.sh/api/sdk/v1/management/embeddings/usage?usageMonth=2026-02-01" | jq
```

## Alert Thresholds
Current SLO thresholds emitted in the observability payload:

- Queue lag: warning `>= 120000ms` (2m), critical `>= 600000ms` (10m)
- Dead-letter rate: warning `>= 2%`, critical `>= 5%` (after at least 20 attempts)
- Retrieval fallback rate: warning `>= 5%`, critical `>= 15%` (after at least 20 requests)
- Retrieval p95 latency: warning `>= 1200ms`, critical `>= 2500ms` (after at least 10 samples)
- Backfill error runs: warning `>= 1`, critical `>= 5` in window
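The thresholds above can be exercised offline against a captured payload. The sketch below is illustrative only: the field names (`queue_lag_ms`, `dead_letter_rate`, and so on) and the fraction-based rates are assumptions, not the observability payload's actual schema.

```python
def classify(value, warn, crit, samples=None, min_samples=0):
    """Return 'ok' | 'warning' | 'critical' for one metric.

    Metrics with a minimum-sample gate stay 'ok' until enough
    attempts/requests/samples have accumulated, matching the
    "(after at least N ...)" qualifiers in the threshold list.
    """
    if samples is not None and samples < min_samples:
        return "ok"  # not enough data to alarm yet
    if value >= crit:
        return "critical"
    if value >= warn:
        return "warning"
    return "ok"

def evaluate_alarms(m):
    """Apply the runbook thresholds to a metrics dict (assumed keys)."""
    return {
        "queue_lag": classify(m["queue_lag_ms"], 120_000, 600_000),
        "dead_letter": classify(m["dead_letter_rate"], 0.02, 0.05,
                                m.get("attempts"), 20),
        "fallback": classify(m["fallback_rate"], 0.05, 0.15,
                             m.get("requests"), 20),
        "p95_latency": classify(m["retrieval_p95_ms"], 1_200, 2_500,
                                m.get("latency_samples"), 10),
        "backfill_errors": classify(m["backfill_error_runs"], 1, 5),
    }

if __name__ == "__main__":
    sample = {
        "queue_lag_ms": 180_000,                   # 3m lag -> warning
        "dead_letter_rate": 0.06, "attempts": 50,  # -> critical
        "fallback_rate": 0.10, "requests": 5,      # too few requests -> ok
        "retrieval_p95_ms": 900, "latency_samples": 40,  # -> ok
        "backfill_error_runs": 0,                  # -> ok
    }
    print(evaluate_alarms(sample))
```

Rates are expressed as fractions here (`0.02` for 2%); normalize whichever representation the real payload uses before comparing.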
## Triage Flow

- Check `health` and `alarms` in `/management/embeddings/observability`.
- If queue alarms fire, inspect `queueLagMs`, `staleProcessingCount`, and `deadLetterCount`.
- If worker alarms fire, inspect `worker.topErrorCodes` and the dead-letter ratio.
- If retrieval alarms fire, inspect fallback reason trends and graph rollout mode.
- Confirm cost impact by comparing `cost.customerCostUsd` with usage trends.
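The branch points above can be collapsed into a small dispatch that maps firing alarms to a rollback lever. The alarm names and the priority order here are illustrative assumptions, not part of any payload contract:

```python
def pick_rollback(active_alarms):
    """Map the set of firing alarms to one rollback action.

    Priority: queue/worker pressure first (pause backfill), then
    retrieval regressions (drop rollout to shadow), then persistent
    backfill errors (pin the last known good model).
    """
    if {"queue_lag", "dead_letter"} & active_alarms:
        return "pause-backfill"   # worker pressure rollback
    if {"fallback", "p95_latency"} & active_alarms:
        return "rollout-shadow"   # retrieval rollback
    if "backfill_errors" in active_alarms:
        return "model-pin"        # embedding model rollback
    return "no-action"
```

A real triage loop would feed this from the observability `alarms` array rather than hand-built sets.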
### Worker graph-relationship signals

For relationship-edge extraction incidents, focus on:

- `GRAPH_RELATIONSHIP_SYNC_FAILED` in `worker.topErrorCodes`: indicates edge write/sync failures that trigger retry/dead-letter behavior
- `GRAPH_RELATIONSHIP_PARTIAL_DEGRADE` in successful worker metrics: indicates writes succeeded but LLM classification/semantic extraction degraded (`GRAPH_LLM_CLASSIFICATION_FAILED`, `GRAPH_LLM_SEMANTIC_EXTRACTION_FAILED`)
- summary payload format is structured for aggregation: `relationship_issues=<total>;GRAPH_LLM_CLASSIFICATION_FAILED:<count>,GRAPH_LLM_SEMANTIC_EXTRACTION_FAILED:<count>`
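Since the summary payload format is fixed, aggregation tooling can parse it directly. A minimal sketch (the helper name is hypothetical; the format string is the one documented above):

```python
def parse_relationship_summary(summary):
    """Parse 'relationship_issues=<total>;CODE:<count>,CODE:<count>'.

    Returns (total, {code: count}); the code list after ';' is optional.
    """
    head, _, tail = summary.partition(";")
    total = int(head.split("=", 1)[1])
    counts = {}
    if tail:
        for part in tail.split(","):
            code, _, n = part.partition(":")
            counts[code] = int(n)
    return total, counts
```

For example, `parse_relationship_summary("relationship_issues=3;GRAPH_LLM_CLASSIFICATION_FAILED:2,GRAPH_LLM_SEMANTIC_EXTRACTION_FAILED:1")` returns `(3, {"GRAPH_LLM_CLASSIFICATION_FAILED": 2, "GRAPH_LLM_SEMANTIC_EXTRACTION_FAILED": 1})`.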
Operational note:

- When retries are caused by graph sync failures after embedding upsert, workers reuse stored vectors and avoid extra embeddings API calls when possible.
## Rollback + Feature Flags

### 1) Retrieval rollback (high fallback/latency)

Move graph rollout to shadow immediately:

```bash
curl -s -X PATCH \
  -H "Authorization: Bearer $MEMORIES_API_KEY" \
  -H "Content-Type: application/json" \
  "https://memories.sh/api/sdk/v1/graph/rollout" \
  -d '{"mode":"shadow"}' | jq
```

If the incident persists, move the rollout to `off` and serve baseline retrieval only.
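Automation that drives this endpoint should reject unknown rollout modes before issuing the PATCH. A small sketch; the valid-mode set here is an assumption (the runbook only explicitly references `shadow`, `off`, and restoring canary settings):

```python
import json

# Assumed mode set; confirm against the rollout endpoint's contract.
VALID_MODES = {"canary", "shadow", "off"}

def rollout_patch_body(mode):
    """Build the JSON body for PATCH /graph/rollout, failing fast on typos."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown rollout mode: {mode}")
    return json.dumps({"mode": mode})
```

Failing fast on an unknown mode is preferable to discovering a typo'd flag mid-incident via a 4xx response.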
### 2) Embedding model rollback (provider/model regressions)

Pin the workspace/project to the last known good model via the embedding config endpoint/UI.

- Verify the default model and allowlist from `/api/sdk/v1/embeddings/models`
- Remove the failing model from the allowlist for the affected tenant/project
- Resume backfill only after queue lag and dead-letter rates normalize
### 3) Worker pressure rollback (queue saturation)

- Pause backfill for the affected scope:

  ```bash
  curl -s -X POST \
    -H "Authorization: Bearer $MEMORIES_API_KEY" \
    -H "Content-Type: application/json" \
    "https://memories.sh/api/sdk/v1/management/embeddings/backfill" \
    -d '{"action":"pause","modelId":"openai/text-embedding-3-small"}' | jq
  ```

- Let realtime write-path jobs drain
- Resume with a smaller `batchLimit` and a higher `throttleMs`
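A resume request with gentler pacing can be built mechanically from the previous settings. In this sketch the `"action":"resume"` value mirrors the pause example but is an assumption, and the halve/double scaling factors are an illustrative choice, not a documented policy:

```python
import json

def resume_backfill_body(model_id, prev_batch_limit, prev_throttle_ms):
    """Build a backfill resume payload with reduced pressure:
    half the previous batchLimit (at least 1), twice the throttleMs.
    """
    return json.dumps({
        "action": "resume",
        "modelId": model_id,
        "batchLimit": max(1, prev_batch_limit // 2),
        "throttleMs": prev_throttle_ms * 2,
    })
```

For example, resuming `openai/text-embedding-3-small` after running at `batchLimit=100`, `throttleMs=200` yields a payload with `batchLimit=50` and `throttleMs=400`.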
## Incident Checklist

- Capture the observability payload and active alarms.
- Capture backfill state (`status`, `estimatedRemaining`, `lastError`).
- Record top worker error codes and the fallback reason distribution.
- Apply the rollback (rollout mode / model pin / backfill pause) matching the dominant alarm.
- Confirm alarms clear for at least one full window before restoring canary settings.