zaydmulani09/mnemo: Local-first AI memory layer for any LLM. Persistent knowledge graph, entity extraction, semantic retrieval. Works with Ollama, OpenAI, Anthropic, or any OpenAI-compatible backend. · GitHub

Posted on June 3, 2026 by safdargal12 in Blog


Local-first AI memory layer for any LLM. Persistent knowledge graph,
entity extraction, semantic retrieval — no cloud required.




Most LLMs forget everything the moment a conversation ends. mnemo fixes that.

mnemo is a sidecar service that watches every conversation you feed it, extracts named entities and relationships using an LLM, builds a persistent knowledge graph in SQLite, and injects relevant context back into future prompts automatically, in under 50 ms. It works with Ollama (fully local, free), OpenAI, Anthropic, or any OpenAI-compatible API. It ships as a single static binary with zero cloud dependency.


  your app
     │
     ▼
  POST /ingest ──► entity extraction (LLM) ──► knowledge graph (SQLite + petgraph)
                                                        │
  POST /retrieve ◄── scoring + ranking ◄── graph traversal + full-text search
     │
     ▼
  context_prompt  ──► inject into your LLM prompt

  1. You POST raw text to /ingest (a conversation turn, a document, a note).
  2. mnemo sends it to your configured LLM and extracts entities (people, tools, places, concepts) and the relationships between them.
  3. Entities are deduplicated by name+type, aliases are merged, and everything is written to SQLite. The in-memory petgraph is updated atomically.
  4. On POST /retrieve, mnemo runs a 6-stage pipeline: full-text chunk search → entity name search → graph expansion (BFS over the knowledge graph) → relation filter → score+rank → assemble a context_prompt string.
  5. You inject context_prompt into your LLM’s system prompt. Done.
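To make the ingest and retrieve steps concrete, here is a toy, pure-Python model of them. It is a sketch under stated assumptions, not mnemo's Rust implementation: the ToyMemory class, the case-insensitive (name, type) dedup key, the keyword match standing in for full-text search, and the 1 / (1 + hops) scoring rule are all hypothetical choices made for illustration. The relation filter stage is omitted.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str  # e.g. "person", "tool", "place", "concept"
    aliases: set = field(default_factory=set)


class ToyMemory:
    """Hypothetical in-memory stand-in for mnemo's SQLite + petgraph store."""

    def __init__(self):
        self.entities = {}  # (lowercased name, kind) -> Entity
        self.edges = {}     # entity key -> set of neighbour keys
        self.chunks = []    # raw ingested text

    def add_entity(self, name, kind, aliases=()):
        # Deduplicate on name + type; a repeat merges its aliases into the original.
        key = (name.lower(), kind)
        entity = self.entities.setdefault(key, Entity(name, kind))
        entity.aliases.update(aliases)
        self.edges.setdefault(key, set())
        return key

    def relate(self, a, b):
        # Undirected edge for simplicity (real relations may be typed and directed).
        self.edges[a].add(b)
        self.edges[b].add(a)

    def retrieve(self, query, max_depth=2, limit=5):
        q = query.lower()
        # Stage 1: naive keyword match standing in for full-text chunk search.
        words = [w.strip("?.,!") for w in q.split()]
        words = [w for w in words if len(w) > 2]
        chunk_hits = [c for c in self.chunks if any(w in c.lower() for w in words)]
        # Stage 2: entity name/alias search against the query text.
        seeds = [k for k, e in self.entities.items()
                 if e.name.lower() in q or any(a.lower() in q for a in e.aliases)]
        # Stage 3: BFS over the graph from the seeds, recording hop distance.
        hops = {k: 0 for k in seeds}
        frontier = deque(seeds)
        while frontier:
            current = frontier.popleft()
            if hops[current] >= max_depth:
                continue
            for neighbour in self.edges[current]:
                if neighbour not in hops:
                    hops[neighbour] = hops[current] + 1
                    frontier.append(neighbour)
        # Stage 4 (relation filter) is omitted in this toy.
        # Stage 5: score by closeness to a seed (1 / (1 + hops)), then rank.
        scores = {k: 1.0 / (1 + h) for k, h in hops.items()}
        ranked = sorted(scores, key=lambda k: (-scores[k], self.entities[k].name))
        top = [self.entities[k].name for k in ranked[:limit]]
        # Stage 6: assemble a context string for prompt injection.
        return "\n".join(["Known entities: " + ", ".join(top)] + chunk_hits[:limit])
```

In this toy, ingesting "Alice is a principal engineer at Stripe working on payment infrastructure." would correspond to add_entity calls for Alice, Stripe and payment infrastructure plus two relate edges; a query that names Alice then reaches Stripe at one hop and payment infrastructure at two, and max_depth caps how far the expansion goes.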

Path A — Docker + Ollama (fully free, recommended)

git clone https://github.com/zaydmulani09/mnemo
cd mnemo
docker compose up -d

# Pull the llama3 model the first time (~4 GB)
docker exec mnemo-ollama ollama pull llama3

# Verify everything is healthy
curl http://localhost:8080/health

Path B — Binary (Ollama or OpenAI running separately)

cargo install --path crates/mnemo-api

# With Ollama
export MNEMO_LLM_BASE_URL=http://localhost:11434/v1
mnemo-api

# With OpenAI
export MNEMO_LLM_BASE_URL=https://api.openai.com/v1
export MNEMO_LLM_API_KEY=sk-...
export MNEMO_LLM_MODEL=gpt-4o-mini
export MNEMO_LLM_PROVIDER=openai
mnemo-api

from mnemo import MnemoClient

client = MnemoClient()  # server at http://localhost:8080

# Store a memory
client.ingest("I'm building a Rust vector database called vecdb")

# Get context for injection into your next LLM prompt
print(client.get_context("what am I working on?"))
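For callers that cannot use the SDK, the same two calls map onto POST /ingest and POST /retrieve. The sketch below uses only the Python standard library and builds requests separately from sending them. The JSON field names are assumptions: the README names the IngestRequest and RetrievalQuery types without listing their fields, so "text", "query", and an optional "session_id" are guesses to check against docs/api.md. Only context_prompt in the response is named elsewhere in this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # base URL given in the API section


def _json_request(method, path, payload=None, base_url=BASE_URL):
    """Build (but do not send) a JSON request against the mnemo REST API."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(base_url + path, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    req.add_header("Accept", "application/json")
    return req


def ingest_request(text, session_id=None, base_url=BASE_URL):
    # ASSUMPTION: IngestRequest carries the raw text under "text".
    payload = {"text": text}
    if session_id is not None:
        payload["session_id"] = session_id  # ASSUMPTION: optional field
    return _json_request("POST", "/ingest", payload, base_url)


def retrieve_request(query, session_id=None, base_url=BASE_URL):
    # ASSUMPTION: RetrievalQuery carries the question under "query".
    payload = {"query": query}
    if session_id is not None:
        payload["session_id"] = session_id  # ASSUMPTION: optional field
    return _json_request("POST", "/retrieve", payload, base_url)


def send(req, timeout=10):
    """Send a built request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running server, `send(ingest_request("I'm building a Rust vector database called vecdb"))` followed by `send(retrieve_request("what am I working on?"))["context_prompt"]` would mirror the SDK calls above, provided the assumed field names match.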

All endpoints accept and return application/json. Base URL: http://localhost:8080.

Method | Path | Description | Request (body, query, or header) | Response
GET | /health | Server + DB + LLM status | (none) | HealthResponse
POST | /ingest | Store text, extract entities | IngestRequest | IngestResponse
POST | /retrieve | Retrieve ranked memory context | RetrievalQuery | RetrievalResult
GET | /entities | List entities (paginated) | ?limit&offset | Entity[]
GET | /entities/:id | Get entity by UUID | (none) | Entity
DELETE | /entities/:id | Delete entity (cascades) | (none) | {"deleted":true}
GET | /entities/:id/neighbors | Knowledge graph neighbors | ?depth (max 5) | GraphNode[]
GET | /chunks | List memory chunks (paginated) | ?limit&offset&session_id | MemoryChunk[]
GET | /chunks/:id | Get chunk by UUID | (none) | MemoryChunk
DELETE | /chunks/:id | Delete chunk | (none) | {"deleted":true}
POST | /search | Full-text search over entities and chunks | {"query", "limit"} | {"entities", "chunks"}
DELETE | /wipe | Delete all memory (irreversible) | Header X-Confirm-Wipe: true | {"wiped":true}
GET | /stats | Entity/chunk/graph counts + uptime | (none) | StatsResponse
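Two details in the endpoint table are easy to get wrong from a client: the list endpoints page with limit and offset, and /wipe requires the X-Confirm-Wipe: true header. The sketch below is a standard-library illustration, not part of the SDK; iter_paginated, list_url and wipe_request are hypothetical helper names, and it assumes a page shorter than limit means there is nothing left to fetch.

```python
import urllib.parse
import urllib.request


def iter_paginated(fetch_page, limit=50):
    """Yield every item from a limit/offset-paginated list endpoint.

    fetch_page(limit, offset) must return a list; a short page ends iteration.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            return
        offset += limit


def list_url(path, limit, offset, base_url="http://localhost:8080", **extra):
    # Builds e.g. /entities?limit=50&offset=0 or /chunks?...&session_id=abc
    params = {"limit": limit, "offset": offset, **extra}
    return f"{base_url}{path}?{urllib.parse.urlencode(params)}"


def wipe_request(base_url="http://localhost:8080"):
    # DELETE /wipe is irreversible; the table lists this confirmation header.
    return urllib.request.Request(
        base_url + "/wipe", method="DELETE", headers={"X-Confirm-Wipe": "true"}
    )
```

In practice fetch_page would wrap urllib.request.urlopen(list_url("/entities", limit, offset)) plus json.loads; it is passed in as a function so the paging logic can be exercised without a running server.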

Key request/response types:

Full endpoint documentation with curl examples: docs/api.md


Variable | Default | Description
MNEMO_DB_PATH | mnemo.db | SQLite database file path
MNEMO_PORT | 8080 | API server port
MNEMO_LLM_BASE_URL | http://localhost:11434/v1 | OpenAI-compatible LLM base URL
MNEMO_LLM_MODEL | llama3 | Model name for entity extraction
MNEMO_LLM_API_KEY | ollama | API key (any value works for Ollama)
MNEMO_LLM_PROVIDER | ollama | Provider type: ollama, openai, anthropic, custom

Pass --config path/to/config.toml to mnemo-api. See mnemo.example.toml:

db_path = "mnemo.db"
port = 8080

[llm]
provider = "ollama"
base_url = "http://localhost:11434/v1"
model = "llama3"
api_key = "ollama"
timeout_secs = 30
max_retries = 3
max_tokens = 2048
temperature = 0.1

Environment variables take precedence over TOML values. The active config source is reported in GET /health → config_source.
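The precedence rule (defaults, then TOML, then environment variables) can be sketched as a small merge. This is an illustration, not mnemo's config loader: it takes an already-parsed TOML document as a dict, uses the defaults and variable names from the environment-variable table above, and the dotted keys and per-key "default"/"toml"/"env" labels are hypothetical and need not match what GET /health reports in config_source.

```python
import os

# Defaults and env-var names as listed in the environment-variable table.
DEFAULTS = {
    "db_path": "mnemo.db",
    "port": 8080,
    "llm.base_url": "http://localhost:11434/v1",
    "llm.model": "llama3",
    "llm.api_key": "ollama",
    "llm.provider": "ollama",
}
ENV_VARS = {
    "MNEMO_DB_PATH": "db_path",
    "MNEMO_PORT": "port",
    "MNEMO_LLM_BASE_URL": "llm.base_url",
    "MNEMO_LLM_MODEL": "llm.model",
    "MNEMO_LLM_API_KEY": "llm.api_key",
    "MNEMO_LLM_PROVIDER": "llm.provider",
}


def flatten(toml_doc, prefix=""):
    """Flatten nested TOML tables ({"llm": {"model": ...}}) into dotted keys."""
    flat = {}
    for key, value in toml_doc.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, dotted + "."))
        else:
            flat[dotted] = value
    return flat


def resolve_config(toml_doc=None, env=None):
    """Merge defaults < TOML file < environment; return (config, sources)."""
    env = os.environ if env is None else env
    config = dict(DEFAULTS)
    sources = {key: "default" for key in config}
    for key, value in flatten(toml_doc or {}).items():
        config[key], sources[key] = value, "toml"
    for var, key in ENV_VARS.items():
        if var in env:
            raw = env[var]
            # Environment values are strings; the port is converted to an int.
            config[key] = int(raw) if key == "port" else raw
            sources[key] = "env"
    return config, sources
```

On Python 3.11+, the toml_doc argument could come from opening mnemo.example.toml in binary mode and passing the file to tomllib.load.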


Install:

cargo install --path crates/mnemo-cli

Usage:

# Store a memory
mnemo ingest "I use Neovim and prefer dark mode"

# Retrieve relevant context
mnemo search "what editor do I use?"

# List all extracted entities
mnemo entities

# Show entity detail + graph neighbors
mnemo entity <uuid> --neighbors

# List memory chunks
mnemo chunks

# Server health
mnemo health

# Memory statistics
mnemo stats

# Delete everything (prompts for confirmation)
mnemo wipe

# Skip confirmation prompt
mnemo wipe --yes

# Point at a non-default server
mnemo --server http://192.168.1.10:8080 stats

Install:

See sdk/python/README.md for the full API reference.

Async example:

import asyncio
from mnemo import AsyncMnemoClient

async def main():
    async with AsyncMnemoClient() as client:
        await client.ingest(
            "Alice is a principal engineer at Stripe working on payment infrastructure.",
            session_id="session-001",
        )
        context = await client.get_context(
            "what does Alice work on?",
            session_id="session-001",
        )
        print(context)

asyncio.run(main())

A working standalone example: examples/basic_usage.py
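When backfilling many notes, the async client's ingest calls can be overlapped. The sketch below assumes only what the async example above shows (an awaitable ingest(text, session_id=...)); ingest_many and the max_concurrency bound are hypothetical. Each ingest triggers an LLM extraction on the server, so the semaphore keeps the backend from being flooded. Requests may complete out of order, so if turn order matters to your data, ingest sequentially instead.

```python
import asyncio


async def ingest_many(client, texts, session_id, max_concurrency=4):
    """Ingest several texts concurrently, at most max_concurrency at a time.

    client is any object with an async ingest(text, session_id=...) method,
    such as AsyncMnemoClient from the example above.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def one(text):
        async with sem:
            return await client.ingest(text, session_id=session_id)

    # gather returns results in input order even though the calls overlap.
    return await asyncio.gather(*(one(t) for t in texts))
```

Inside the `async with AsyncMnemoClient() as client:` block, `await ingest_many(client, notes, session_id="session-001")` would replace a loop of individual awaits.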


Four Rust crates wired together:

Crate | Type | Role
mnemo-core | lib | Entity extraction, graph ops, retrieval engine, DB layer
mnemo-api | bin | Axum REST API: a thin handler layer over mnemo-core
mnemo-cli | bin | CLI tool using blocking reqwest against the API
mnemo-bench | bin | Performance benchmarks (12 suites)

Full architecture documentation: docs/architecture.md


Benchmarked on Apple M2, SQLite WAL mode, in-memory petgraph. Debug build numbers — release build (--release) is 3–5× faster.

Operation | Avg latency | Throughput
Entity insert (SQLite) | ~0.12 ms | ~8,300 ops/s
Entity lookup by ID | ~0.08 ms | ~12,500 ops/s
Chunk insert | ~0.14 ms | ~7,100 ops/s
Full-text chunk search | ~0.28 ms | ~3,500 ops/s
Graph neighbor (depth=1) | ~0.21 ms | ~4,700 ops/s
Graph neighbor (depth=2) | ~0.89 ms | ~1,100 ops/s
Full retrieval pipeline | ~4.2 ms | ~238 ops/s

Run cargo run -p mnemo-bench to benchmark on your hardware.
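The throughput column in the benchmark table above is consistent, within rounding, with a single caller issuing operations back to back, where ops/s = 1000 / (average latency in ms). The snippet below reproduces that arithmetic from the listed latencies; it is an interpretation of the published numbers, not a description of how mnemo-bench measures throughput.

```python
# Average latencies (ms) copied from the benchmark table.
LATENCY_MS = {
    "Entity insert (SQLite)": 0.12,
    "Entity lookup by ID": 0.08,
    "Chunk insert": 0.14,
    "Full-text chunk search": 0.28,
    "Graph neighbor (depth=1)": 0.21,
    "Graph neighbor (depth=2)": 0.89,
    "Full retrieval pipeline": 4.2,
}


def ops_per_second(latency_ms):
    """Sequential throughput implied by an average per-operation latency."""
    return 1000.0 / latency_ms


for name, ms in LATENCY_MS.items():
    print(f"{name:<26} {ops_per_second(ms):>8,.0f} ops/s")
```

The table's figures are rounded down (for example ~7,100 where the formula gives about 7,143), so small differences are expected.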


cargo test --workspace          # run all 122 tests
make coverage                  # HTML coverage report (requires cargo-llvm-cov)
make coverage-summary          # summary to stdout
cd sdk/python && pytest tests/ -v
cargo run -p mnemo-bench                    # all 12 benchmarks
cargo run -p mnemo-bench -- --filter graph  # graph benchmarks only
cargo run -p mnemo-bench -- --json out.json # save results to JSON

Current test counts: 122 Rust tests · 21 Python tests · 12 benchmarks


PRs welcome. Please run make fmt && make lint before submitting.
Open an issue first for large changes.

See CONTRIBUTING.md for full setup instructions, code style guide, and how to add a new LLM provider.


MIT — see LICENSE


