Now on npm — v3.4.0

Vector search
in the browser.

Sub-millisecond HNSW search, background ONNX embedding, and OPFS-cached knowledge cartridges. Zero server dependencies. 172 KB of WASM.

$ npm install vrom.js
⚡ Try Live Demo · GitHub · npm
  • <1 ms HNSW search
  • 172 KB WASM binary
  • 100% client-side
  • 0 server dependencies

Search documentation instantly

This runs entirely in your browser — WASM vector search + ONNX embeddings. No API calls, no server.

🔍 Semantic Search

Example queries: text generation pipeline · LoRA fine-tuning · tokenizer usage · model quantization · loading datasets · DPO training

    Type a query above to search HuggingFace documentation with sub-millisecond vector search — running 100% in your browser.

Five lines to RAG

From zero to a searchable knowledge base in seconds.

1. Initialize: load the 172 KB WASM engine and spawn the embedding worker.

2. Mount a vROM: slot in a pre-built knowledge cartridge, cached in OPFS for instant reload.

3. Search: embed your query in the background worker, then run an HNSW search in under 1 ms.

4. Use context: format results for LLM injection with token budgeting and source citations.

Dead simple API

The entire SDK is one import and five methods.

Plug-and-play RAG for any LLM

Mount a vROM knowledge cartridge, search with natural language, and inject context into your LLM — all client-side, all under 50 lines of code.

  • Auto-caches to OPFS — works offline after first load
  • Hot-swaps between vROMs without reloading the model
  • Context expansion follows chunk linked-list pointers
  • Token-budgeted formatting for any context window size
  • Works with Vite, Next.js, or vanilla <script> tags
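The "context expansion follows chunk linked-list pointers" bullet can be sketched in plain TypeScript: each stored chunk keeps prev/next ids to its document neighbors, so a search hit can be widened to a window of surrounding text. This is an illustrative sketch, not the library's internals; StoredChunk, expand, and radius are assumed names.

```typescript
// Illustrative sketch of linked-list context expansion (names are assumptions).
interface StoredChunk {
  id: number;
  text: string;
  prev: number | null; // id of the previous chunk in the source document
  next: number | null; // id of the next chunk in the source document
}

// Widen a single hit to `radius` neighbors on each side by following pointers.
function expand(
  hitId: number,
  chunks: Map<number, StoredChunk>,
  radius: number
): string[] {
  const hit = chunks.get(hitId);
  if (!hit) return [];
  const window: StoredChunk[] = [hit];
  let left = hit.prev;
  let right = hit.next;
  for (let i = 0; i < radius; i++) {
    if (left !== null) {
      const c = chunks.get(left)!;
      window.unshift(c); // prepend earlier context
      left = c.prev;
    }
    if (right !== null) {
      const c = chunks.get(right)!;
      window.push(c); // append later context
      right = c.next;
    }
  }
  return window.map((c) => c.text);
}
```

Because pointers are stored with the chunks, expansion needs no extra vector searches, only map lookups.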
app.ts
import { AgentMemory } from 'vrom.js';

// Initialize — loads WASM + spawns worker
const memory = new AgentMemory();
await memory.init();

// Mount a knowledge base (auto-cached)
await memory.mount('hf-transformers-docs');

// Search with natural language
const results = await memory.search(
  'how to fine-tune with LoRA',
  { topK: 5, expandContext: true }
);

// Format for your LLM
const context = memory.formatContext(
  results,
  { maxTokens: 2000 }
);

// → Ready for system prompt injection
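The token budgeting behind formatContext can be approximated with a greedy loop: append chunks until the next one would overflow the budget. A minimal standalone sketch, assuming a Chunk shape and a rough ~4 characters/token estimate; neither is the library's actual implementation.

```typescript
// Illustrative sketch of token-budgeted context formatting (assumed shapes).
interface Chunk {
  text: string;
  source: string; // citation label, e.g. a doc path
}

function formatBudgeted(chunks: Chunk[], maxTokens: number): string {
  // Crude stand-in for a real tokenizer: ~4 characters per token.
  const estimateTokens = (s: string) => Math.ceil(s.length / 4);
  const parts: string[] = [];
  let used = 0;
  for (const c of chunks) {
    const entry = `[${c.source}]\n${c.text}`;
    const cost = estimateTokens(entry);
    if (used + cost > maxTokens) break; // stop before overflowing the budget
    parts.push(entry);
    used += cost;
  }
  return parts.join("\n\n");
}
```

Chunks arrive already ranked by relevance, so a greedy cutoff keeps the best results inside any context window.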

Built for browser AI

Everything you need for client-side RAG, nothing you don't.

Sub-millisecond HNSW

Rust-compiled WASM engine runs HNSW approximate nearest neighbor search on 1000+ vectors in under 1ms.
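For intuition, here is the exact-search baseline that HNSW approximates: a brute-force cosine top-K over the same vectors. This is illustrative TypeScript, not the WASM engine's code; HNSW gets its sub-millisecond speed by skipping most of these comparisons via a layered graph.

```typescript
// Brute-force cosine similarity search: the O(n) baseline HNSW approximates.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the indices of the k most similar vectors to the query.
function topK(query: number[], vectors: number[][], k: number): number[] {
  return vectors
    .map((v, i) => ({ i, score: cosine(query, v) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => r.i);
}
```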

🧠

Background Embedding

ONNX models run in a Web Worker via transformers.js. The UI never freezes, even during inference.
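The non-blocking behavior is ordinary worker message passing: the main thread posts a request with an id and resolves the matching promise when a reply arrives. A minimal sketch of that pattern, with a plain callback standing in for a real Worker; EmbedClient and the message shapes are assumptions, not the SDK's API.

```typescript
// Illustrative request/reply pattern for off-main-thread embedding.
type Reply = { id: number; vector: number[] };

class EmbedClient {
  private pending = new Map<number, (v: number[]) => void>();
  private nextId = 0;

  // `post` stands in for worker.postMessage in this sketch.
  constructor(private post: (msg: { id: number; text: string }) => void) {}

  // Send a request and return a promise that resolves on the matching reply.
  embed(text: string): Promise<number[]> {
    const id = this.nextId++;
    const promise = new Promise<number[]>((resolve) =>
      this.pending.set(id, resolve)
    );
    this.post({ id, text });
    return promise;
  }

  // Called when the worker replies (worker.onmessage in a real setup).
  onReply(reply: Reply): void {
    this.pending.get(reply.id)?.(reply.vector);
    this.pending.delete(reply.id);
  }
}
```

Because the main thread only awaits promises, the UI stays responsive while the worker runs inference.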

🧩

vROM Cartridges

Pre-computed HNSW indexes you slot in like game cartridges. One-click load, instant search, offline-first.

💾

OPFS Persistence

Indexes and models are cached in the Origin Private File System. Reload the page — everything is still there.

🔄

Hot-Swap Context

Switch between vROMs without reloading the embedding model. Same-model swaps complete in under 500ms.

📦

178 KB on npm

The full package including WASM binary, TypeScript declarations, and embed worker fits in 178 KB.

Benchmark numbers

Measured in Chrome on an M1 MacBook Air.

Metric                 Value      Notes
HNSW Search            < 1 ms     1,356 vectors, top-5
Embedding              ~50 ms     Per sentence, q8 model
vROM Mount (cached)    < 500 ms   OPFS → WASM load
Hot-Swap               < 500 ms   Same model, different vROM
WASM Binary            172 KB     Gzipped: ~80 KB
npm Package            178 KB     ESM + CJS + types + worker

How it's built

Rust → WASM for speed. Web Workers for non-blocking UI. OPFS for persistence.

┌──────────────────────────────────────────────────────────────────────┐
│ Browser (Main Thread)                                                │
│                                                                      │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │ AgentMemory                                                 │   │
│   │ init() → mount() → search() → formatContext() → destroy()   │   │
│   └───────┬──────────────────┬────────────────────┬─────────────┘   │
│           │                  │                    │                 │
│   ┌───────▼────────┐  ┌──────▼─────────┐  ┌───────▼─────────────┐   │
│   │ VectorDB       │  │ VromCache      │  │ Embed Worker        │   │
│   │ Rust → WASM    │  │ OPFS layer     │  │ Web Worker thread   │   │
│   │ HNSW <1ms      │  │ 1h registry TTL│  │ transformers.js     │   │
│   │ 172 KB binary  │  │ Stream DL      │  │ ONNX ~50ms/embed    │   │
│   └────────────────┘  └────────────────┘  └─────────────────────┘   │
│                                                                      │
│   ┌──────────────────────────────────────────────────────────────┐  │
│   │ 💾 OPFS — vROM cache · Registry · Offline-first              │  │
│   └──────────────────────────────────────────────────────────────┘  │
│                                                                      │
│   ┌──────────────────────────────────────────────────────────────┐  │
│   │ 📡 HF Hub CDN — vROM Registry · Index files · ONNX models    │  │
│   └──────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────┘
Get Started

Build browser RAG today

Install vrom.js, mount a knowledge base, and start searching — in five lines of code.

📦 npm install vrom.js · 📖 Documentation · ⭐ Star on GitHub