Memory Vault Has a REST API
I planned the dashboard next. Then I started thinking about how a React app in a browser would talk to Postgres, and the answer is it can't. A dashboard needs an API to talk to. So I swapped M5 and M6 — the API comes first, the dashboard comes second.
Milestone 5 shipped: Memory Vault now has a REST API. Every MCP tool is also an HTTP endpoint, so you can ingest, search, and manage memories from any app, any language, any script.
What's in it
Eight endpoints, split across five routers:
- `GET /api/health` — service + database health. No auth required. This is what Docker's healthcheck hits.
- `GET /api/spaces` — list memory spaces with active chunk counts.
- `POST /api/search` — hybrid search. Same engine as the CLI and MCP server: vector (HNSW) + full-text (tsvector) + Reciprocal Rank Fusion merging, plus query enrichment for variations.
- `GET /api/chunks` — list chunks with pagination, filter by space, sort by recency or importance.
- `GET /api/chunks/{id}` — fetch a single chunk.
- `DELETE /api/chunks/{id}` — soft-delete (same semantics as the MCP `forget` tool).
- `POST /api/ingest/text` — quick-ingest a single text string as a chunk.
- `POST /api/ingest/file` — upload a file and run it through the full ingestion pipeline. Adapter auto-detected from filename and content.
OpenAPI docs auto-generate at /docs and /redoc. Every endpoint has schemas, examples, and types.
Authentication
All endpoints except /api/health require a bearer token. Tokens are stored as SHA-256 hashes in the database — the plaintext is shown once at creation and never again. If you lose it, revoke it and make a new one.
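The verification side of this scheme is simple enough to sketch. This is not the actual Memory Vault code — function names and the `stored_hashes` shape are illustrative — but it shows the core idea: hash the presented token and compare against what's in the database, so the plaintext never needs to be stored.

```python
import hashlib


def hash_token(token: str) -> str:
    """SHA-256 hex digest of the plaintext token; only this is stored."""
    return hashlib.sha256(token.encode()).hexdigest()


def verify_token(presented: str, stored_hashes: set[str]) -> bool:
    """A request is authorized if the hash of the presented bearer token
    matches the stored hash of a non-revoked token."""
    return hash_token(presented) in stored_hashes
```

A database leak exposes only hashes; without the plaintext, an attacker can't authenticate.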
Creating a token is one CLI command:
docker compose exec app memory-vault token create my-app
Output:
Token created. Copy it now — it will NOT be shown again.
Name: my-app
Token: mv_qGJqZeFHH7ypQGgLvc3vtxhg1OMdkA94E6WfNo2JHlg
Then send it as a header:
curl -H "Authorization: Bearer mv_..." http://localhost:8000/api/spaces
You can list tokens (memory-vault token list) to see names, prefixes, created dates, and which are active vs revoked. You can revoke by prefix (memory-vault token revoke mv_qGJqZeFH). Last-used timestamp updates on every valid request.
For local dev, set API_AUTH_ENABLED=false and auth is bypassed entirely. Don't ship that to production.
Rate limiting
Per-client sliding-window rate limit, in-memory, keyed by IP. Default is 120 requests per minute, configurable via API_RATE_LIMIT_PER_MIN. Health and docs endpoints are exempt.
The middleware keeps a deque of request timestamps per client, drops anything older than the window, and returns 429 Too Many Requests with a Retry-After header when the limit is hit. Good enough for a local-first single-user system. If you're exposing this to the internet, put a real reverse proxy in front.
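The mechanics are compact enough to sketch in full. This is a simplified stand-in for the real middleware (class and method names are mine), but it implements the same deque-of-timestamps sliding window described above:

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Per-client sliding window: keep a deque of request timestamps,
    evict those older than the window, reject once the deque is full."""

    def __init__(self, limit: int = 120, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self.hits: dict[str, deque] = defaultdict(deque)

    def allow(self, client_ip: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[client_ip]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] > self.window_s:
            q.popleft()
        if len(q) >= self.limit:
            return False  # caller responds 429 with a Retry-After header
        q.append(now)
        return True
```

Evicting from the left of a deque is O(1), so even a chatty client costs almost nothing per request.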
Example — ingest and search
Ingest a memory:
curl -X POST http://localhost:8000/api/ingest/text \
-H "Authorization: Bearer $MV_TOKEN" \
-H "Content-Type: application/json" \
-d '{"text": "Decided to use RRF for hybrid search merging", "space": "default"}'
Search it back:
curl -X POST http://localhost:8000/api/search \
-H "Authorization: Bearer $MV_TOKEN" \
-H "Content-Type: application/json" \
-d '{"query": "how does hybrid search work", "limit": 5}'
Response includes chunk content, similarity score, space, source, timestamps, query variations used, and query time in milliseconds.
Upload a file:
curl -X POST http://localhost:8000/api/ingest/file \
-H "Authorization: Bearer $MV_TOKEN" \
-F "[email protected]" \
-F "space=default"
The file runs through the same async ingestion pipeline as the CLI — adapter detection, chunking, batch embedding, insert. Response reports how many chunks were created.
Technical decisions
FastAPI, not Flask or Django. FastAPI gives me async-native request handling (important because the database layer uses psycopg 3's async API), automatic OpenAPI generation, Pydantic validation, and dependency injection for auth. All the boilerplate I'd otherwise write, built in.
Lifespan-managed DB pool. The app factory uses FastAPI's lifespan context manager to open the psycopg pool on startup and close it on shutdown. No per-request connection overhead, no thread-local state.
Sliding-window rate limit, not token bucket. Token bucket is more accurate but harder to reason about. A deque of timestamps per client is trivially correct and fast enough for local-first use. If you need distributed rate limiting, drop in slowapi or run behind nginx.
Tokens hashed with SHA-256, not bcrypt or argon2. These aren't user passwords — they're high-entropy random bytes (32 bytes from secrets.token_urlsafe). A 256-bit random token is not brute-forceable. Hashing with SHA-256 is fast and stops a database leak from immediately exposing live tokens. For passwords, use bcrypt/argon2; for API tokens, SHA-256 is the right call.
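Generation is a few lines of stdlib. Function name and the `mv_` prefix handling here are illustrative, not the real implementation, but the ingredients match what's described: `secrets.token_urlsafe(32)` for entropy, SHA-256 for the stored hash.

```python
import hashlib
import secrets


def create_token(prefix: str = "mv_") -> tuple:
    """Return (plaintext, stored_hash). The plaintext is shown to the
    user exactly once; only the hash goes in the database.

    32 random bytes ~ 256 bits of entropy, which is why a fast unsalted
    hash is fine here, unlike low-entropy passwords (bcrypt/argon2)."""
    plaintext = prefix + secrets.token_urlsafe(32)
    stored_hash = hashlib.sha256(plaintext.encode()).hexdigest()
    return plaintext, stored_hash
```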
Soft-delete on DELETE /api/chunks/{id}. Same semantics as the MCP forget tool. The row stays, importance goes to zero, metadata.forgotten gets set to true. Search excludes it. This is deliberate — hard deletes are unforgiving and I've accidentally deleted memories from my own central-memory before.
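The semantics are easy to pin down in code. This pure-Python sketch uses hypothetical field names (in the database it would be a single UPDATE, e.g. zeroing `importance` and setting the flag with `jsonb_set`), but it captures the contract: the record survives, its ranking weight drops to zero, and a metadata flag lets search exclude it.

```python
def soft_delete(chunk: dict) -> dict:
    """Soft-delete semantics: keep the row, zero its importance,
    flag metadata.forgotten. Search filters on the flag, so the
    chunk disappears from results without being destroyed."""
    chunk = {**chunk, "importance": 0}
    meta = dict(chunk.get("metadata") or {})
    meta["forgotten"] = True
    chunk["metadata"] = meta
    return chunk
```

Undoing a mistaken delete is then just flipping the flag back, which is the whole point.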
Bundled with the same docker compose up. No separate service. The FastAPI app runs inside the same container that used to just run the CLI. start.sh now execs python -m src.cli api instead of keeping the container alive with tail -f /dev/null. Healthcheck switched from running memory-vault status to hitting /api/health. One command still starts everything.
What's actually running
Pull the repo, docker compose up -d, and you'll have:
- PostgreSQL 16 + pgvector on port 5432
- Memory Vault API on port 8000
- Interactive docs at http://localhost:8000/docs
- Same MCP server still available for Claude (now exposed alongside the HTTP API, not instead of it)
42 tests passing — 28 of them are integration tests that hit every endpoint against a real test database (memory_vault_test, created and dropped per test session) via httpx.AsyncClient bound to the ASGI transport. Covers health, auth (missing header, wrong scheme, bogus token, valid token, revoked token), spaces, ingest text, ingest file, search, chunks list/get/delete/soft-delete semantics, and the OpenAPI schema + docs HTML. The other 14 are pure-logic tests for token hashing, token generation, CORS parsing, schema validation, and the rate-limit window math.
What's next
Milestone 6 is the dashboard — React + Vite + TypeScript, built on top of this API. Search page, browse page, ingest page, stats page. Bundled into the same container via a multi-stage Docker build so there's still one command to start everything.
Originally the plan had the dashboard as M5 and the API as M6. I swapped them because a dashboard needs an API to talk to and I'd rather build the dependency first than build a disposable "minimal API" that I'd have to throw away. Honest course correction beats sticking to a plan that was wrong.
The repo: github.com/MihaiBuilds/memory-vault