Now on npm — v3.4.0

Vector search
in the browser.

Sub-millisecond HNSW search, background ONNX embedding, and OPFS-cached knowledge cartridges. Zero server dependencies. 172 KB of WASM.

$ npm install vrom.js
⚡ Try Live Demo · GitHub · npm
  • <1 ms HNSW search
  • 172 KB WASM binary
  • 100% client-side
  • 0 server dependencies

Search documentation instantly

This runs entirely in your browser — WASM vector search + ONNX embeddings. No API calls, no server.

🔍 Semantic Search

Example queries: text generation pipeline · LoRA fine-tuning · tokenizer usage · model quantization · loading datasets · DPO training

    Type a query above to search HuggingFace documentation with sub-millisecond vector search — running 100% in your browser.

Five lines to RAG

From zero to a searchable knowledge base in seconds.

1. Initialize: load the 172 KB WASM engine and spawn the embedding worker.

2. Mount a vROM: slot in a pre-built knowledge cartridge, cached in OPFS for instant reload.

3. Search: embed your query in the background worker, then run an HNSW search in under 1 ms.

4. Use context: format results for LLM injection with token budgeting and source citations.

Dead simple API

The entire SDK is one import and five methods.

Plug-and-play RAG for any LLM

Mount a vROM knowledge cartridge, search with natural language, and inject context into your LLM — all client-side, all under 50 lines of code.

  • Auto-caches to OPFS — works offline after first load
  • Hot-swaps between vROMs without reloading the model
  • Context expansion follows chunk linked-list pointers
  • Token-budgeted formatting for any context window size
  • Works with Vite, Next.js, or vanilla <script> tags
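The "context expansion follows chunk linked-list pointers" bullet can be sketched in plain TypeScript: each stored chunk keeps prev/next ids to its document neighbors, so a search hit can be widened to a window of surrounding text. This is an illustrative sketch, not the library's internals; StoredChunk, expand, and radius are assumed names.

```typescript
// Illustrative sketch of linked-list context expansion (names are assumptions).
interface StoredChunk {
  id: number;
  text: string;
  prev: number | null; // id of the previous chunk in the source document
  next: number | null; // id of the next chunk in the source document
}

// Widen a single hit to `radius` neighbors on each side by following pointers.
function expand(
  hitId: number,
  chunks: Map<number, StoredChunk>,
  radius: number
): string[] {
  const hit = chunks.get(hitId);
  if (!hit) return [];
  const window: StoredChunk[] = [hit];
  let left = hit.prev;
  let right = hit.next;
  for (let i = 0; i < radius; i++) {
    if (left !== null) {
      const c = chunks.get(left)!;
      window.unshift(c); // prepend earlier context
      left = c.prev;
    }
    if (right !== null) {
      const c = chunks.get(right)!;
      window.push(c); // append later context
      right = c.next;
    }
  }
  return window.map((c) => c.text);
}
```

Because pointers are stored with the chunks, expansion needs no extra vector searches, only map lookups.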
app.ts
import { AgentMemory } from 'vrom.js';

// Initialize — loads WASM + spawns worker
const memory = new AgentMemory();
await memory.init();

// Mount a knowledge base (auto-cached)
await memory.mount('hf-transformers-docs');

// Search with natural language
const results = await memory.search(
  'how to fine-tune with LoRA',
  { topK: 5, expandContext: true }
);

// Format for your LLM
const context = memory.formatContext(
  results,
  { maxTokens: 2000 }
);

// → Ready for system prompt injection
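The token budgeting behind formatContext can be approximated with a greedy loop: append chunks until the next one would overflow the budget. A minimal standalone sketch, assuming a Chunk shape and a rough ~4 characters/token estimate; neither is the library's actual implementation.

```typescript
// Illustrative sketch of token-budgeted context formatting (assumed shapes).
interface Chunk {
  text: string;
  source: string; // citation label, e.g. a doc path
}

function formatBudgeted(chunks: Chunk[], maxTokens: number): string {
  // Crude stand-in for a real tokenizer: ~4 characters per token.
  const estimateTokens = (s: string) => Math.ceil(s.length / 4);
  const parts: string[] = [];
  let used = 0;
  for (const c of chunks) {
    const entry = `[${c.source}]\n${c.text}`;
    const cost = estimateTokens(entry);
    if (used + cost > maxTokens) break; // stop before overflowing the budget
    parts.push(entry);
    used += cost;
  }
  return parts.join("\n\n");
}
```

Chunks arrive already ranked by relevance, so a greedy cutoff keeps the best results inside any context window.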

Built for browser AI

Everything you need for client-side RAG, nothing you don't.

Sub-millisecond HNSW

Rust-compiled WASM engine runs HNSW approximate nearest neighbor search on 1000+ vectors in under 1ms.
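For intuition, here is the exact-search baseline that HNSW approximates: a brute-force cosine top-K over the same vectors. This is illustrative TypeScript, not the WASM engine's code; HNSW gets its sub-millisecond speed by skipping most of these comparisons via a layered graph.

```typescript
// Brute-force cosine similarity search: the O(n) baseline HNSW approximates.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the indices of the k most similar vectors to the query.
function topK(query: number[], vectors: number[][], k: number): number[] {
  return vectors
    .map((v, i) => ({ i, score: cosine(query, v) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => r.i);
}
```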

🧠

Background Embedding

ONNX models run in a Web Worker via transformers.js. The UI never freezes, even during inference.
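The non-blocking behavior is ordinary worker message passing: the main thread posts a request with an id and resolves the matching promise when a reply arrives. A minimal sketch of that pattern, with a plain callback standing in for a real Worker; EmbedClient and the message shapes are assumptions, not the SDK's API.

```typescript
// Illustrative request/reply pattern for off-main-thread embedding.
type Reply = { id: number; vector: number[] };

class EmbedClient {
  private pending = new Map<number, (v: number[]) => void>();
  private nextId = 0;

  // `post` stands in for worker.postMessage in this sketch.
  constructor(private post: (msg: { id: number; text: string }) => void) {}

  // Send a request and return a promise that resolves on the matching reply.
  embed(text: string): Promise<number[]> {
    const id = this.nextId++;
    const promise = new Promise<number[]>((resolve) =>
      this.pending.set(id, resolve)
    );
    this.post({ id, text });
    return promise;
  }

  // Called when the worker replies (worker.onmessage in a real setup).
  onReply(reply: Reply): void {
    this.pending.get(reply.id)?.(reply.vector);
    this.pending.delete(reply.id);
  }
}
```

Because the main thread only awaits promises, the UI stays responsive while the worker runs inference.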

🧩

vROM Cartridges

Pre-computed HNSW indexes you slot in like game cartridges. One-click load, instant search, offline-first.

💾

OPFS Persistence

Indexes and models are cached in the Origin Private File System. Reload the page — everything is still there.

🔄

Hot-Swap Context

Switch between vROMs without reloading the embedding model. Same-model swaps complete in under 500ms.

📦

178 KB on npm

The full package including WASM binary, TypeScript declarations, and embed worker fits in 178 KB.

Benchmark numbers

Measured in Chrome on an M1 MacBook Air.

Metric                 Value      Notes
HNSW Search            < 1 ms     1,356 vectors, top-5
Embedding              ~50 ms     Per sentence, q8 model
vROM Mount (cached)    < 500 ms   OPFS → WASM load
Hot-Swap               < 500 ms   Same model, different vROM
WASM Binary            172 KB     Gzipped: ~80 KB
npm Package            178 KB     ESM + CJS + types + worker

How it's built

Rust → WASM for speed. Web Workers for non-blocking UI. OPFS for persistence.

┌──────────────────────────────────────────────────────────────────────┐
│ Browser (Main Thread)                                                │
│                                                                      │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │ AgentMemory                                                 │   │
│   │ init() → mount() → search() → formatContext() → destroy()   │   │
│   └───────┬──────────────────┬────────────────────┬─────────────┘   │
│           │                  │                    │                 │
│   ┌───────▼────────┐  ┌──────▼─────────┐  ┌───────▼─────────────┐   │
│   │ VectorDB       │  │ VromCache      │  │ Embed Worker        │   │
│   │ Rust → WASM    │  │ OPFS layer     │  │ Web Worker thread   │   │
│   │ HNSW <1ms      │  │ 1h registry TTL│  │ transformers.js     │   │
│   │ 172 KB binary  │  │ Stream DL      │  │ ONNX ~50ms/embed    │   │
│   └────────────────┘  └────────────────┘  └─────────────────────┘   │
│                                                                      │
│   ┌──────────────────────────────────────────────────────────────┐  │
│   │ 💾 OPFS — vROM cache · Registry · Offline-first              │  │
│   └──────────────────────────────────────────────────────────────┘  │
│                                                                      │
│   ┌──────────────────────────────────────────────────────────────┐  │
│   │ 📡 HF Hub CDN — vROM Registry · Index files · ONNX models    │  │
│   └──────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────┘
Get Started

Build browser RAG today

Install vrom.js, mount a knowledge base, and start searching — in five lines of code.

📦 npm install vrom.js · 📖 Documentation · ⭐ Star on GitHub