Tutorial: Performance Tuning
Performance tuning in LynseDB is mostly about four choices:
- data shape: dimension, number of vectors, metadata size, named fields;
- write path: batch size, commit frequency, local vs remote mode;
- search path: index family, metric, filters, k, nprobe, and returned fields;
- operations: server limits, snapshots, compaction, and monitoring.
Always measure with your own embeddings and query workload. Recall and latency depend heavily on the data distribution, so results on synthetic vectors may not carry over to production data.
1. Establish a baseline
Start with a small benchmark that matches your application:
```python
import time

import numpy as np
import lynse

dim = 128
n = 50_000
k = 10

client = lynse.VectorDBClient(uri="./perf-demo")
db = client.create_database("perf", drop_if_exists=True)
collection = db.require_collection("vectors", dim=dim, drop_if_exists=True)

# Random data is only a placeholder: substitute your own embeddings and queries.
vectors = np.random.rand(n, dim).astype(np.float32)
collection.bulk_add_binary(vectors, batch_size=10_000, enable_progress_bar=False)
collection.commit()

query = np.random.rand(dim).astype(np.float32)

# Exact search: the quality reference for every other index.
collection.build_index("FLAT-L2")
start = time.perf_counter()
flat = collection.search(query, k=k)
flat_ms = (time.perf_counter() - start) * 1000

# Approximate search with HNSW; nprobe sets the search breadth.
collection.build_index("HNSW-L2")
start = time.perf_counter()
hnsw = collection.search(query, k=k, nprobe=64)
hnsw_ms = (time.perf_counter() - start) * 1000

print("flat_ms", flat_ms, flat.ids.tolist())
print("hnsw_ms", hnsw_ms, hnsw.ids.tolist())
```
Use the flat result as a quality reference when evaluating approximate indexes.
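A single timed query is noisy: the first call can include warm-up cost, and one query says nothing about tail latency. The sketch below times a whole list of queries and reports mean and nearest-rank percentiles. The helper name measure_latency is hypothetical, not part of LynseDB; it accepts any callable, so you can pass something like lambda q: collection.search(q, k=k, nprobe=64).

```python
import math
import statistics
import time
from typing import Any, Callable, Dict, Sequence


def measure_latency(search: Callable[[Any], Any],
                    queries: Sequence[Any],
                    warmup: int = 5) -> Dict[str, float]:
    """Run search(q) for each query and summarize latencies in milliseconds."""
    if len(queries) == 0:
        raise ValueError("need at least one query")
    # Warm-up calls are executed but not recorded, so one-time costs don't skew results.
    for q in queries[:warmup]:
        search(q)
    samples = []
    for q in queries:
        start = time.perf_counter()
        search(q)
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()

    def pct(p: float) -> float:
        # Nearest-rank percentile on the sorted samples.
        rank = max(1, math.ceil(p * len(samples) / 100))
        return samples[rank - 1]

    return {
        "count": len(samples),
        "mean_ms": statistics.mean(samples),
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
    }


# Stand-in workload; replace the lambda with a real search call.
report = measure_latency(lambda q: sorted(range(1_000), key=lambda x: -x), list(range(50)))
print(report)
```

Percentiles from a few dozen queries are rough; use a few hundred representative queries before comparing indexes.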
2. Tune ingestion
Use bulk_add_items() when you need metadata:
```python
# items holds your prepared records (vector, ID, and metadata fields).
with collection.insert_session() as session:
    session.bulk_add_items(items, batch_size=1000, enable_progress_bar=False)
```
Use bulk_add_binary() when you have a dense NumPy array and no metadata in the same call:

```python
collection.bulk_add_binary(vectors, batch_size=50_000, enable_progress_bar=False)
collection.commit()
```
Ingestion tips:
- convert embeddings to contiguous float32 arrays before insertion;
- choose batch sizes that fit memory comfortably;
- commit after meaningful batches, not after every row;
- build or rebuild indexes after bulk loading;
- use local mode for single-process offline ingestion when possible;
- use remote mode when several processes must share one database.
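The batching and commit advice can be sketched generically. This assumes two callables you supply: add_batch, which inserts one batch (for example a thin wrapper around bulk_add_items), and commit, which wraps collection.commit(). The helper name ingest and its parameters are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def ingest(rows: Iterable[T],
           add_batch: Callable[[List[T]], None],
           commit: Callable[[], None],
           batch_size: int = 1_000,
           batches_per_commit: int = 10) -> int:
    """Insert rows in fixed-size batches and commit every few batches, never per row."""
    it = iter(rows)
    total = 0
    pending = 0  # batches added since the last commit
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        add_batch(batch)
        total += len(batch)
        pending += 1
        if pending >= batches_per_commit:
            commit()
            pending = 0
    if pending:
        commit()  # flush the final partial group of batches
    return total
```

Because rows is consumed lazily, only one batch is materialized at a time, which keeps memory bounded for large streaming imports.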
3. Tune search payload size
Returning fields increases payload size. Use return_fields=False in hot paths when IDs and scores are enough, and fetch fields later only for the final IDs you actually need.
Use query_vectors() only when raw vectors are actually needed.
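To get a feel for how much returned fields cost, you can compare serialized sizes. This sketch builds hypothetical search hits in plain Python and measures their JSON size with and without a metadata dictionary; the record layout and field names are illustrative, not LynseDB's wire format.

```python
import json
import random


def payload_bytes(k: int, with_fields: bool, text_len: int = 500) -> int:
    """Approximate the JSON size of k hits, optionally carrying a metadata dict per hit."""
    rng = random.Random(0)  # fixed seed so the sizes are reproducible
    hits = []
    for i in range(k):
        hit = {"id": i, "score": round(rng.random(), 6)}
        if with_fields:
            # Hypothetical metadata: a text snippet plus a few small fields.
            hit["fields"] = {"text": "x" * text_len, "lang": "en", "tags": ["a", "b"]}
        hits.append(hit)
    return len(json.dumps(hits).encode("utf-8"))


for k in (10, 100):
    lean = payload_bytes(k, with_fields=False)
    full = payload_bytes(k, with_fields=True)
    print(f"k={k}: ids+scores {lean} B, with fields {full} B")
```

The ratio depends entirely on your metadata; large text fields usually dominate.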
4. Tune k and server limits
Large k values can dominate latency and response size. Keep k close to what
the user interface or downstream model actually consumes.
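One way to see why k matters: exact top-k selection over n candidate scores with a bounded heap does O(n log k) work and returns k records, so both the selection work and the response grow with k. This brute-force sketch uses the standard library heapq module; it illustrates the scaling only and is not how LynseDB searches internally.

```python
import heapq
import random
from typing import List, Sequence, Tuple


def top_k(distances: Sequence[float], k: int) -> List[Tuple[float, int]]:
    """Return the k smallest (distance, id) pairs, nearest first."""
    # heapq.nsmallest keeps a heap of size k while scanning all candidates.
    return heapq.nsmallest(k, ((d, i) for i, d in enumerate(distances)))


rng = random.Random(42)
distances = [rng.random() for _ in range(10_000)]
for k in (10, 1_000):
    hits = top_k(distances, k)
    print(k, len(hits))
```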
In server mode, protect shared deployments:
```bash
lynse serve \
  --data-dir ./server-data \
  --max-top-k 1000 \
  --max-batch-vectors 50000 \
  --max-collection-vectors 10000000 \
  --max-collection-vector-bytes 1099511627776
```
Set a lower --max-top-k for user-facing APIs.
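The byte limit in the example above is 1099511627776 = 2^40 bytes (1 TiB). A quick sanity check for limits is raw vector storage, vectors × dim × 4 bytes for float32. This sketch assumes the server counts only raw float32 vector bytes against --max-collection-vector-bytes; the real accounting may include more, so treat the estimate as a lower bound. The helper names are hypothetical.

```python
def raw_float32_bytes(n_vectors: int, dim: int) -> int:
    """Raw storage for n float32 vectors: 4 bytes per component."""
    return n_vectors * dim * 4


def fits_limits(n_vectors: int, dim: int,
                max_vectors: int = 10_000_000,
                max_vector_bytes: int = 1_099_511_627_776) -> bool:
    """Check a planned collection against the example limits (assumed raw-byte accounting)."""
    return n_vectors <= max_vectors and raw_float32_bytes(n_vectors, dim) <= max_vector_bytes


print(1_099_511_627_776 == 2**40)          # True: the example byte limit is 1 TiB
print(raw_float32_bytes(10_000_000, 128))  # 5120000000 bytes, about 5.1 GB
print(fits_limits(10_000_000, 1_536))      # True
print(fits_limits(10_000_000, 30_000))     # False: 1.2e12 bytes exceeds 2**40
```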
5. Tune filters
Selective metadata filters reduce candidate work: the fewer items that pass the filter, the fewer vectors need to be scored.
Filter tips:
- keep field types stable;
- use tenant, language, visibility, category, and date filters early;
- use CONTAINS for tag arrays;
- use filter_ids when you already know candidate IDs;
- inspect surprising behavior with search_profile().
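Why selective filters help: when the predicate runs before distance computation, only matching items are scored. The in-memory sketch below combines a tenant equality check with a CONTAINS-style check on a tag array and reports how many distances it computed. The data layout and the function filtered_search are hypothetical stand-ins for what the database does internally.

```python
import math
from typing import Dict, List, Sequence, Tuple

Item = Tuple[int, List[float], Dict[str, object]]  # (id, vector, fields)


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_search(items: List[Item], query: Sequence[float], k: int,
                    tenant: str, tag: str) -> Tuple[List[int], int]:
    """Pre-filter on metadata, then score only the survivors; returns (ids, distances computed)."""
    candidates = [
        (item_id, vec) for item_id, vec, fields in items
        # Equality on tenant plus CONTAINS on the tag array.
        if fields.get("tenant") == tenant and tag in fields.get("tags", [])
    ]
    scored = sorted((l2(vec, query), item_id) for item_id, vec in candidates)
    return [item_id for _, item_id in scored[:k]], len(candidates)


items = [
    (i, [float(i), 0.0], {"tenant": "acme" if i % 4 == 0 else "other",
                          "tags": ["news"] if i % 2 == 0 else ["blog"]})
    for i in range(100)
]
ids, computed = filtered_search(items, [10.0, 0.0], k=3, tenant="acme", tag="news")
print(ids, computed)  # [8, 12, 4] 25
```

Only 25 of the 100 items were scored here; with a filter that matches almost everything, the saving disappears.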
6. Tune index family
| Need | Try |
|---|---|
| maximum recall and simple behavior | FLAT-* |
| low-latency online search | HNSW-* |
| explicit recall/latency tradeoff | IVF-* with n_clusters and nprobe |
| lower memory pressure from graph search | DiskANN-* |
| smaller memory or disk footprint | SQ8, PQ, RaBitQ, or PolarVec variants |
| binary vectors | Hamming or Jaccard binary indexes |
Start every tuning session with a flat baseline, then compare each alternative index against it on the same queries.
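The comparison is commonly summarized as recall@k: the fraction of the flat (exact) top-k IDs that the approximate index also returns, averaged over many representative queries. A minimal sketch, assuming you have collected one ID list per query under each index, for example from collection.search(q, k=k).ids.tolist().

```python
from typing import Sequence


def recall_at_k(exact: Sequence[Sequence[int]],
                approx: Sequence[Sequence[int]], k: int) -> float:
    """Mean fraction of each query's exact top-k IDs found in the approximate top-k."""
    if len(exact) != len(approx):
        raise ValueError("need one approximate result per exact result")
    if not exact:
        return 0.0
    total = 0.0
    for truth, found in zip(exact, approx):
        truth_k = set(truth[:k])
        total += len(truth_k & set(found[:k])) / max(1, len(truth_k))
    return total / len(exact)


flat_ids = [[1, 2, 3], [4, 5, 6]]
hnsw_ids = [[1, 2, 9], [4, 5, 6]]
print(recall_at_k(flat_ids, hnsw_ids, k=3))
```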
7. Tune IVF
IVF exposes two knobs: n_clusters, chosen when the index is built, and nprobe, chosen per search.
IVF knobs:
- more clusters can reduce scanned vectors per query;
- too many clusters can hurt recall unless nprobe also increases;
- higher nprobe improves recall and increases latency;
- compare against flat results for representative queries.
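To make the n_clusters and nprobe tradeoff concrete, here is a toy IVF in pure Python: a few k-means iterations pick centroids, each vector goes into the inverted list of its nearest centroid, and a query scans only the nprobe closest lists. It illustrates the technique on small synthetic data and is not LynseDB's IVF implementation; all names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared L2 distance (same ranking as L2)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(v: Sequence[float], centroids: Sequence[Vector]) -> int:
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))


def assign(data: Sequence[Vector], centroids: Sequence[Vector]) -> Dict[int, List[int]]:
    """Inverted lists: centroid index -> IDs of the vectors closest to it."""
    lists: Dict[int, List[int]] = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(data):
        lists[nearest_centroid(v, centroids)].append(i)
    return lists


def build_ivf(data: List[Vector], n_clusters: int, iters: int = 5, seed: int = 0):
    """Toy IVF build: a few k-means (Lloyd) iterations, then one inverted list per centroid."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, n_clusters)]
    dim = len(data[0])
    for _ in range(iters):
        for c, members in assign(data, centroids).items():
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(data[i][d] for i in members) / len(members) for d in range(dim)]
    return centroids, assign(data, centroids)


def ivf_search(data, centroids, lists, query, k: int, nprobe: int) -> Tuple[List[int], int]:
    """Scan only the nprobe closest clusters; return (top-k IDs, number of vectors scanned)."""
    probe = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    ranked = sorted(candidates, key=lambda i: (sq_dist(query, data[i]), i))
    return ranked[:k], len(candidates)


rng = random.Random(1)
data = [[rng.random() for _ in range(8)] for _ in range(1_000)]
centroids, lists = build_ivf(data, n_clusters=16)
query = [rng.random() for _ in range(8)]
exact = sorted(range(len(data)), key=lambda i: (sq_dist(query, data[i]), i))[:10]

for nprobe in (1, 4, 16):
    ids, scanned = ivf_search(data, centroids, lists, query, k=10, nprobe=nprobe)
    recall = len(set(ids) & set(exact)) / len(exact)
    print(f"nprobe={nprobe:2d} scanned={scanned:4d} recall@10={recall:.2f}")
```

With nprobe equal to n_clusters every list is scanned and the result matches exact search; smaller nprobe values scan fewer vectors at some cost in recall.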
8. Tune HNSW
For HNSW, nprobe acts as the search breadth. Increase it for recall; decrease it for latency.
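A common way to choose the value is a sweep: measure recall and latency for several nprobe values and keep the smallest one that meets your recall target. This sketch assumes a hypothetical evaluate(nprobe) function that you write yourself, for example by running representative queries through collection.search(q, k=k, nprobe=nprobe) and scoring them against flat results. The fake_evaluate stub uses made-up curves purely to exercise the function.

```python
from typing import Callable, Iterable, Optional, Tuple


def pick_nprobe(evaluate: Callable[[int], Tuple[float, float]],
                candidates: Iterable[int],
                target_recall: float) -> Optional[Tuple[int, float, float]]:
    """Return (nprobe, recall, latency_ms) for the smallest nprobe meeting target_recall."""
    for nprobe in sorted(candidates):
        recall, latency_ms = evaluate(nprobe)
        print(f"nprobe={nprobe}: recall={recall:.3f} latency={latency_ms:.2f} ms")
        if recall >= target_recall:
            return nprobe, recall, latency_ms
    return None  # no candidate reached the target; widen the sweep


# Stand-in evaluator with made-up monotone curves (not measurements).
def fake_evaluate(nprobe: int) -> Tuple[float, float]:
    return min(1.0, 0.5 + nprobe / 128), 0.05 * nprobe


print(pick_nprobe(fake_evaluate, [16, 32, 64, 128, 256], target_recall=0.95))
```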
9. Tune quantized indexes
Quantized indexes are useful when memory bandwidth, index size, or disk size is the bottleneck:
```python
collection.build_index("FLAT-L2-SQ8")
collection.build_index("FLAT-L2-PQ")
collection.build_index("FLAT-L2-RABITQ")
collection.build_index("FLAT-L2-POLARVEC")
```
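Back-of-envelope sizing helps decide whether quantization is worth trying. This sketch uses textbook code sizes: float32 stores 4 bytes per dimension, 8-bit scalar quantization (SQ8) 1 byte per dimension, and product quantization with pq_m subquantizers of 256 centroids 1 byte per subvector. These are assumptions about typical encodings, not LynseDB's exact layout, and they ignore codebooks, graph links, and metadata.

```python
def code_bytes(n: int, dim: int, scheme: str, pq_m: int = 16) -> int:
    """Approximate vector payload size under common encodings (assumed, see lead-in)."""
    if scheme == "float32":
        return n * dim * 4
    if scheme == "sq8":
        return n * dim  # one byte per dimension
    if scheme == "pq":
        if dim % pq_m:
            raise ValueError("dim must be divisible by pq_m")
        return n * pq_m  # one byte (256-centroid code) per subvector
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("float32", "sq8", "pq"):
    print(scheme, code_bytes(1_000_000, 128, scheme) / 1e6, "MB")
```

Smaller codes only pay off if recall stays acceptable, which is what the overlap check that follows measures.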
Evaluate quality carefully:
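```python
# Build the exact index first so flat_ids is a true baseline, not a quantized result.
collection.build_index("FLAT-L2")
flat_ids = collection.search(query, k=20).ids.tolist()

collection.build_index("FLAT-L2-PQ")
pq_ids = collection.search(query, k=20).ids.tolist()

# Fraction of the exact top-20 IDs that the PQ index also returns.
overlap = len(set(flat_ids) & set(pq_ids)) / max(1, len(flat_ids))
print(overlap)
```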
Overlap with flat results is not the same as user relevance, but it is a useful first check.
10. Monitor server mode
Server mode exposes:
```bash
curl http://127.0.0.1:7637/healthz
curl http://127.0.0.1:7637/readyz
curl http://127.0.0.1:7637/metrics
```
Watch:
- request counts and latency;
- slow query warnings;
- WAL bytes;
- data directory bytes;
- vector index bytes;
- process memory;
- index build progress and failures.
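If the /metrics endpoint uses the Prometheus text exposition format (an assumption; check the output of the curl command above), a few lines of standard-library Python can fetch it and keep only the series you watch. The helper names and the sample metric names below are hypothetical, and the parser handles only simple lines.

```python
import urllib.request
from typing import Dict, Iterable


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse simple 'name{labels} value [timestamp]' lines, skipping # comments."""
    values: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            end = line.rindex("}") + 1
            key, rest = line[:end], line[end:]
        else:
            key, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            values[key] = float(fields[0])  # ignore an optional trailing timestamp
    return values


def fetch_metrics(url: str = "http://127.0.0.1:7637/metrics", timeout: float = 5.0) -> Dict[str, float]:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


def select(values: Dict[str, float], needles: Iterable[str]) -> Dict[str, float]:
    """Keep series whose name contains any of the given substrings."""
    needles = list(needles)
    return {k: v for k, v in values.items() if any(n in k for n in needles)}


sample = """# HELP example_requests_total Hypothetical request counter
# TYPE example_requests_total counter
example_requests_total{route="/search"} 42
example_wal_bytes 1048576
"""
print(select(parse_metrics(sample), ["requests", "wal"]))
```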
Enable slow query warnings so that slow searches are reported in the server logs.
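Server-side slow query warnings can be complemented on the client. This sketch times each call and logs a warning through Python's logging module when it exceeds a threshold; the wrapper name timed and the 200 ms default are illustrative choices, not LynseDB settings.

```python
import logging
import time
from typing import Any, Callable, Tuple

logger = logging.getLogger("search.slow")


def timed(fn: Callable[..., Any], *args: Any,
          threshold_ms: float = 200.0, **kwargs: Any) -> Tuple[Any, float, bool]:
    """Call fn(*args, **kwargs); return (result, elapsed_ms, was_slow) and log slow calls."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    slow = elapsed_ms > threshold_ms
    if slow:
        logger.warning("slow call %s took %.1f ms (threshold %.1f ms)",
                       getattr(fn, "__name__", repr(fn)), elapsed_ms, threshold_ms)
    return result, elapsed_ms, slow
```

For example, hits, ms, slow = timed(collection.search, query, k=10, threshold_ms=50) forwards k to the search call and flags anything slower than 50 ms.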
11. Maintain storage
Deletes are soft deletes. Many tombstones can waste space and affect inspection:
```python
collection.delete_items(ids_to_remove)  # soft delete: rows become tombstones
collection.commit()
print(collection.stats())

removed = collection.compact()  # reclaim space held by tombstoned rows
print(removed)
```
Run compaction during a maintenance window for large collections. Take a snapshot or export before risky maintenance.
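To decide when compaction is worth a maintenance window, one simple rule is to compact once tombstones exceed a fixed fraction of stored rows. The sketch takes plain counts because the exact keys returned by collection.stats() are not covered here; read the counts from your own stats output. The 20% threshold is an arbitrary starting point, not a LynseDB default.

```python
def tombstone_ratio(total_rows: int, deleted_rows: int) -> float:
    """Fraction of stored rows (live plus soft-deleted) that are soft-deleted."""
    if total_rows <= 0:
        return 0.0
    return deleted_rows / total_rows


def should_compact(total_rows: int, deleted_rows: int, threshold: float = 0.2) -> bool:
    """Suggest compaction when tombstones exceed the given fraction of stored rows."""
    return tombstone_ratio(total_rows, deleted_rows) > threshold


print(should_compact(1_000_000, 50_000))   # False: 5% tombstones
print(should_compact(1_000_000, 300_000))  # True: 30% tombstones
```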
12. Performance checklist
- Use contiguous float32 vectors.
- Batch writes and avoid per-row commits.
- Build indexes after large ingestion jobs.
- Keep metadata fields useful but not bloated.
- Use filters to reduce candidate work.
- Keep k as small as your product allows.
- Return fields only when needed.
- Use flat search as a recall baseline.
- Tune nprobe for HNSW and IVF.
- Use quantized indexes only after measuring quality.
- Monitor server metrics and slow query logs.