Quickstart
This guide walks through the main LynseDB workflow: connect, create a collection, insert vectors, build an index, search, filter metadata, inspect results, and clean up.
1. Install and import
Native Linux and macOS environments are supported. Native Windows environments are not supported; on Windows, use WSL 2 (Windows Subsystem for Linux) or Docker.
2. Connect
Use local mode for a single Python process:
Use remote mode when several processes or services need the same database:
With server authentication:
Health and operations endpoints:
curl http://127.0.0.1:7637/healthz
curl http://127.0.0.1:7637/readyz
curl http://127.0.0.1:7637/metrics
curl http://127.0.0.1:7637/openapi.json
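The same endpoints can be polled from Python with only the standard library. This is a minimal sketch: the base URL and paths come from the curl examples above, while `endpoint_urls` and `probe` are hypothetical helper names and not part of LynseDB. What status code each endpoint returns is not stated in this guide, so the sketch only reports whatever the server answers.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Paths taken from the curl examples above.
ENDPOINT_PATHS = ("healthz", "readyz", "metrics", "openapi.json")


def endpoint_urls(base_url="http://127.0.0.1:7637"):
    """Map each endpoint path to its full URL (hypothetical helper)."""
    base = base_url.rstrip("/")
    return {path: f"{base}/{path}" for path in ENDPOINT_PATHS}


def probe(url, timeout=2.0):
    """Return the HTTP status code, or None if the server is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return exc.code
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None
```

Calling `probe(url)` for each value of `endpoint_urls()` gives a quick readiness check before a client connects in remote mode.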
3. Create a database and collection
drop_if_exists=True is destructive. Use it only in tests or when you really
want to discard existing data.
db = client.create_database("quickstart", drop_if_exists=True)
collection = db.require_collection(
    "documents",
    dim=4,
    drop_if_exists=True,
    description="quickstart collection",
)
Open an existing collection safely:
4. Insert vectors
Each vector has a user-provided integer ID. Metadata fields are optional JSON-like dicts and can be used later for filtering, text search, or result display.
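Because IDs and metadata are supplied by the caller, it can help to check a batch before inserting it. The sketch below assumes the `(vector, id, field)` tuple layout used in this guide; `validate_items` is a hypothetical pre-flight helper, not a LynseDB function, and the rules it enforces (unique integer IDs, fixed dimension, JSON-serializable metadata) are a reading of the paragraph above rather than documented constraints.

```python
import json
import math


def validate_items(items, dim):
    """Check (vector, id, field) tuples before inserting them.

    Hypothetical helper; returns the number of valid items.
    """
    seen = set()
    for vector, item_id, field in items:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(item_id, bool) or not isinstance(item_id, int):
            raise TypeError(f"ID must be an integer, got {item_id!r}")
        if item_id in seen:
            raise ValueError(f"duplicate ID {item_id}")
        seen.add(item_id)
        if len(vector) != dim:
            raise ValueError(f"ID {item_id}: expected {dim} values, got {len(vector)}")
        if not all(math.isfinite(x) for x in vector):
            raise ValueError(f"ID {item_id}: vector contains NaN or infinity")
        if field is not None:
            # Raises TypeError if the metadata is not JSON-serializable.
            json.dumps(field)
    return len(seen)
```

For example, `validate_items(items, dim=4)` could be called on a list of tuples right before it is handed to an insert session.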
items = [
    ([0.10, 0.20, 0.30, 0.40], 1, {"title": "LynseDB intro", "lang": "en", "rank": 1, "tags": ["vector", "rust"]}),
    ([0.11, 0.19, 0.29, 0.39], 2, {"title": "Vector guide", "lang": "en", "rank": 2, "tags": ["vector"]}),
    ([0.80, 0.10, 0.20, 0.10], 3, {"title": "French note", "lang": "fr", "rank": 3, "tags": ["note"]}),
    ([0.75, 0.12, 0.18, 0.12], 4, {"title": "Another note", "lang": "fr", "rank": 4, "tags": ["note", "archive"]}),
]
with collection.insert_session() as session:
    session.bulk_add_items(items, enable_progress_bar=False)
insert_session() commits automatically if the block exits successfully. If an
exception is raised, pending buffered writes are discarded and the original
exception is preserved.
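The commit-or-discard behavior described above follows the usual context-manager pattern. The toy class below illustrates that pattern only; `BufferedStore` is a hypothetical stand-in and says nothing about how LynseDB buffers or persists writes internally.

```python
from contextlib import contextmanager


class BufferedStore:
    """Toy stand-in for a collection with buffered writes."""

    def __init__(self):
        self.committed = []
        self._pending = []

    def add(self, item):
        # Writes go to a buffer first, not to committed storage.
        self._pending.append(item)

    @contextmanager
    def insert_session(self):
        try:
            yield self
        except BaseException:
            # The block failed: drop buffered writes, then re-raise
            # the original exception unchanged.
            self._pending.clear()
            raise
        else:
            # The block exited normally: commit everything at once.
            self.committed.extend(self._pending)
            self._pending.clear()
```

With this shape, the caller never has to remember an explicit commit, and a failure halfway through a batch leaves committed data untouched.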
For large dense arrays without per-row metadata:
import numpy as np

vectors = np.random.rand(10_000, 4).astype(np.float32)
added = collection.bulk_add_binary(vectors, batch_size=5000, enable_progress_bar=False)
collection.commit()
print(added)
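As a rough illustration of what a batch_size argument usually implies, the sketch below slices a dense array into consecutive chunks. It assumes that batching means "process at most batch_size rows at a time"; how `bulk_add_binary` actually splits or encodes the data is not described in this guide, and `iter_batches` is a hypothetical helper.

```python
import numpy as np


def iter_batches(vectors, batch_size):
    """Yield consecutive row slices of at most batch_size rows."""
    for start in range(0, len(vectors), batch_size):
        yield vectors[start:start + batch_size]


vectors = np.random.rand(10_000, 4).astype(np.float32)
sizes = [len(batch) for batch in iter_batches(vectors, 5000)]
print(sizes)  # [5000, 5000]
```

Slicing a NumPy array this way returns views, so no vector data is copied while iterating.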
5. Build an index
Flat search is the simplest and most recall-friendly default. Use HNSW or IVF as data grows and latency matters.
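To make "flat" concrete, here is a NumPy sketch of exhaustive search over the four quickstart vectors: every stored vector is scored against the query and the best k are kept. It illustrates the idea only and is not LynseDB's implementation; `flat_search` and its metric names are hypothetical.

```python
import numpy as np


def flat_search(vectors, ids, query, k, metric="l2"):
    """Exhaustive k-NN: score every stored vector, then keep the best k."""
    vectors = np.asarray(vectors, dtype=np.float32)
    query = np.asarray(query, dtype=np.float32)
    if metric == "l2":
        # Squared L2 distance; smaller is better.
        scores = np.sum((vectors - query) ** 2, axis=1)
        order = np.argsort(scores, kind="stable")
    elif metric == "ip":
        # Inner product; larger is better.
        scores = vectors @ query
        order = np.argsort(-scores, kind="stable")
    else:
        raise ValueError(f"unsupported metric: {metric}")
    top = order[:k]
    return np.asarray(ids)[top], scores[top]


vectors = [
    [0.10, 0.20, 0.30, 0.40],
    [0.11, 0.19, 0.29, 0.39],
    [0.80, 0.10, 0.20, 0.10],
    [0.75, 0.12, 0.18, 0.12],
]
ids = [1, 2, 3, 4]
top_ids, top_scores = flat_search(vectors, ids, [0.10, 0.20, 0.30, 0.40], k=2)
print(top_ids.tolist())  # [1, 2]
```

The cost grows linearly with the number of stored vectors, which is why approximate indexes become attractive as data grows.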
Only IVF uses n_clusters; other index families accept the argument and ignore it:
6. Search
query = np.array([0.10, 0.20, 0.30, 0.40], dtype=np.float32)
result = collection.search(query, k=2, return_fields=True)
print(result.ids)
print(result.distances)
print(result.fields)
print(result.to_list())
Filter by metadata during vector search:
result = collection.search(
    query,
    k=3,
    where="lang = 'en' AND rank <= 2",
    return_fields=True,
)
print(result.to_list())
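One way to picture a filtered search is "keep only rows that satisfy the predicate, then rank what is left by distance". The sketch below does exactly that for the quickstart data, with a plain Python function standing in for the where expression. Whether LynseDB filters before or after the vector scan is not stated in this guide, so treat the ordering here as an assumption; `matches` and `filtered_search` are hypothetical names.

```python
import numpy as np

rows = [
    (1, [0.10, 0.20, 0.30, 0.40], {"lang": "en", "rank": 1}),
    (2, [0.11, 0.19, 0.29, 0.39], {"lang": "en", "rank": 2}),
    (3, [0.80, 0.10, 0.20, 0.10], {"lang": "fr", "rank": 3}),
    (4, [0.75, 0.12, 0.18, 0.12], {"lang": "fr", "rank": 4}),
]


def matches(field):
    # Python equivalent of: lang = 'en' AND rank <= 2
    return field["lang"] == "en" and field["rank"] <= 2


def filtered_search(rows, query, k, predicate):
    """Drop rows that fail the predicate, then rank the rest by squared L2."""
    kept = [(item_id, vec) for item_id, vec, field in rows if predicate(field)]
    if not kept:
        return []
    ids = np.array([item_id for item_id, _ in kept])
    matrix = np.array([vec for _, vec in kept], dtype=np.float32)
    dist = np.sum((matrix - np.asarray(query, dtype=np.float32)) ** 2, axis=1)
    order = np.argsort(dist, kind="stable")[:k]
    return [(int(ids[j]), float(dist[j])) for j in order]


hits = filtered_search(rows, [0.10, 0.20, 0.30, 0.40], k=3, predicate=matches)
print([item_id for item_id, _ in hits])  # [1, 2]
```

Only two rows satisfy the predicate, so the result has two hits even though k is 3.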
For IVF and HNSW, nprobe controls search breadth. Higher values generally
improve recall and increase latency.
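The recall and latency trade-off is easiest to see on a toy IVF index, where nprobe is the number of clusters scanned per query. The sketch below is a simplified illustration under stated assumptions: centroids are random stored vectors rather than trained k-means centers, the metric is squared L2, and `build_ivf` and `ivf_search` are hypothetical names. It does not model HNSW, and it is not LynseDB's implementation.

```python
import numpy as np


def build_ivf(vectors, n_clusters, seed=0):
    """Toy IVF: random stored vectors act as centroids; bucket rows by nearest centroid."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), size=n_clusters, replace=False)]
    dist = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = dist.argmin(axis=1)
    lists = [np.flatnonzero(assign == c) for c in range(n_clusters)]
    return centroids, lists


def ivf_search(vectors, centroids, lists, query, k, nprobe):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    centroid_dist = ((centroids - query) ** 2).sum(axis=1)
    probed = np.argsort(centroid_dist)[:nprobe]
    candidates = np.concatenate([lists[c] for c in probed])
    dist = ((vectors[candidates] - query) ** 2).sum(axis=1)
    return candidates[np.argsort(dist)[:k]]


rng = np.random.default_rng(42)
vectors = rng.random((2000, 4), dtype=np.float32)
query = rng.random(4, dtype=np.float32)
centroids, lists = build_ivf(vectors, n_clusters=16)

# Ground truth from an exhaustive scan.
exact = set(np.argsort(((vectors - query) ** 2).sum(axis=1))[:10].tolist())
for nprobe in (1, 4, 16):
    found = set(ivf_search(vectors, centroids, lists, query, k=10, nprobe=nprobe).tolist())
    print(nprobe, len(found & exact) / 10)
```

With nprobe equal to the number of clusters, every row is scanned and the result matches the exhaustive scan; smaller values scan fewer candidates and may miss true neighbors.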
Approximate flat distance rounding is available for IP, L2, and cosine metrics:
Flat, PQ, RaBitQ, PolarVec, and named vector-field searches ignore nprobe.
Hamming and Jaccard metrics ignore approx and eps.
7. Query metadata and vectors
Use query() when you need IDs and fields. Use query_vectors() when you need
stored vectors too.
rows = collection.query(where="tags CONTAINS 'vector'")
print(rows.ids)
print(rows.fields)
vectors = collection.query_vectors(filter_ids=[1, 2])
print(vectors.ids)
print(vectors.vectors.shape)
Calling query() or query_vectors() without where or filter_ids returns an
empty ResultView; it does not perform a full scan.
8. Text and hybrid search
Text search uses BM25 over stored metadata fields:
text_result = collection.text_search(
    "vector guide",
    k=3,
    text_fields=["title"],
    return_fields=True,
)
print(text_result.to_list())
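For readers unfamiliar with BM25, the sketch below scores the four quickstart titles against the same query using the standard Okapi BM25 formula. The whitespace tokenizer and the parameters k1=1.2 and b=0.75 are conventional choices made for this illustration; LynseDB's tokenizer and parameters are not documented here and may differ.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic lowercase whitespace tokenizer (an assumption).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


titles = {1: "LynseDB intro", 2: "Vector guide", 3: "French note", 4: "Another note"}
scores = bm25_scores("vector guide", list(titles.values()))
ranked = sorted(zip(titles, scores), key=lambda pair: pair[1], reverse=True)
print(ranked[0][0])  # 2
```

Only the title of ID 2 contains either query term, so it is the only document with a non-zero score.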
Hybrid search combines vector and text candidates:
hybrid = collection.hybrid_search(
    vector=query,
    text="vector",
    text_fields=["title", "tags"],
    fusion="rrf",
    k=3,
    return_fields=True,
)
print(hybrid.to_list())
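The fusion="rrf" option presumably refers to reciprocal rank fusion, which merges ranked lists using only rank positions. The sketch below shows the textbook formula, where each list contributes 1 / (k + rank) to an item's score. The constant k=60 is the commonly used default from the RRF literature, not a documented LynseDB setting, and the two input rankings are made up for illustration.

```python
def rrf_fuse(rankings, k=60, top_n=None):
    """Reciprocal rank fusion: sum 1 / (k + rank) over every input ranking."""
    scores = {}
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by smaller ID for determinism.
    fused = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return fused[:top_n]


# Illustrative candidate lists (best first), not real LynseDB output.
vector_ranking = [1, 2, 4]
text_ranking = [2, 3, 1]
fused = rrf_fuse([vector_ranking, text_ranking], top_n=3)
print([item_id for item_id, _ in fused])  # [2, 1, 3]
```

Because only ranks are used, vector distances and text scores never need to be normalized onto a common scale.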
9. Named and sparse vectors
Named vector fields store additional embeddings for the same IDs. This is useful for multimodal records, for example text and image embeddings on one item.
collection.create_vector_field("image", dim=3, metric="l2")
image_vectors = np.array(
    [
        [0.10, 0.20, 0.30],
        [0.12, 0.19, 0.28],
        [0.90, 0.20, 0.10],
    ],
    dtype=np.float32,
)
collection.add_named_vectors("image", image_vectors, ids=[1, 2, 3])
collection.build_index("HNSW-L2", field_name="image")
collection.commit()
image_result = collection.search(
    [0.11, 0.20, 0.29],
    k=2,
    vector_field="image",
    return_fields=True,
)
print(image_result.to_list())
Sparse vectors store weights keyed by feature ID and are searched by inner product:
collection.add_sparse_vectors(
    vectors=[
        {10: 1.0, 42: 0.5},
        {11: 1.0, 42: 0.8},
    ],
    ids=[1, 2],
)
collection.commit()
sparse_result = collection.search_sparse({42: 1.0}, k=2, return_fields=True)
print(sparse_result.to_list())
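The inner product of two sparse vectors only involves feature IDs present in both. The sketch below recomputes the scores for the example above in plain Python; `sparse_dot` and the local `search_sparse` function are hypothetical helpers written for this illustration, not LynseDB's implementation.

```python
def sparse_dot(a, b):
    """Inner product of two {feature_id: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    return sum(weight * b[feature] for feature, weight in a.items() if feature in b)


def search_sparse(stored, query, k):
    """Rank stored sparse vectors by inner product with the query, best first."""
    scored = [(item_id, sparse_dot(vec, query)) for item_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


stored = {
    1: {10: 1.0, 42: 0.5},
    2: {11: 1.0, 42: 0.8},
}
hits = search_sparse(stored, {42: 1.0}, k=2)
print(hits)  # [(2, 0.8), (1, 0.5)]
```

Feature 42 is the only overlap with the query, so ID 2 (weight 0.8) ranks ahead of ID 1 (weight 0.5).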
10. Update, delete, and compact
Use upsert_item() to insert a record, or to replace the existing record with the same ID:
collection.upsert_item(
    [0.12, 0.20, 0.31, 0.41],
    id=1,
    field={"title": "updated intro", "lang": "en", "rank": 1},
)
collection.commit()
Deletes are soft deletes. Deleted IDs disappear from search and query results, but their raw storage is kept, and they can be restored, until compaction:
collection.delete_items([4])
print(collection.list_deleted_ids())
collection.restore_items([4])
print(collection.list_deleted_ids())
collection.delete_items([4])
removed = collection.compact()
print(removed)
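The delete, restore, and compact life cycle can be modeled with a set of tombstones next to the raw rows. The toy class below mirrors the method names used above but is a hypothetical stand-in, not LynseDB's storage layout; in particular, it assumes that compact() returns the number of rows physically removed, which this guide does not state.

```python
class TombstoneStore:
    """Toy model of soft deletes backed by a tombstone set."""

    def __init__(self, rows):
        self._rows = dict(rows)  # id -> payload (raw storage)
        self._deleted = set()    # tombstoned IDs

    def delete_items(self, ids):
        # Mark rows as deleted without touching raw storage.
        self._deleted.update(i for i in ids if i in self._rows)

    def restore_items(self, ids):
        # Possible only while the raw row still exists.
        self._deleted.difference_update(ids)

    def list_deleted_ids(self):
        return sorted(self._deleted)

    def visible_ids(self):
        return sorted(i for i in self._rows if i not in self._deleted)

    def compact(self):
        # Physically drop tombstoned rows; they cannot be restored afterwards.
        removed = len(self._deleted)
        for item_id in self._deleted:
            del self._rows[item_id]
        self._deleted.clear()
        return removed


store = TombstoneStore({1: "a", 2: "b", 3: "c", 4: "d"})
store.delete_items([4])
print(store.list_deleted_ids())  # [4]
store.restore_items([4])
print(store.list_deleted_ids())  # []
store.delete_items([4])
print(store.compact())           # 1
print(store.visible_ids())       # [1, 2, 3]
```

Keeping tombstones separate from the data makes deletes cheap and reversible, at the price of storage that is reclaimed only at compaction time.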
11. What you learned
This quickstart covered the everyday workflow:
- choose local or remote mode with VectorDBClient;
- create a database and collection;
- insert vectors with stable integer IDs and metadata fields;
- commit writes through insert_session();
- build and tune an index;
- search by vector, filter by metadata, query fields, and retrieve vectors;
- use text, hybrid, named vector, and sparse vector retrieval;
- update, soft-delete, restore, and compact rows.
For a complete curriculum, continue with the Learning path.
12. ResultView
Search, query, head, tail, and range APIs return a ResultView.
result = collection.search(query, k=2, return_fields=True)
ids, distances, fields = result
print(result.ids) # numpy array
print(result.distances) # numpy array
print(result.fields) # list[dict]
print(result.to_dict())
print(result.to_list())
print(result.to_json())
Use to_list() for row iteration:
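A sketch of the iteration pattern, using a hypothetical stand-in class so that it runs without a database. `ResultSketch` only mimics the attributes shown above (ids, distances, fields) and tuple unpacking; the row shape returned by its to_list(), one dict per hit with id, distance, and fields keys, is an assumption and may differ from the real ResultView.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ResultSketch:
    """Hypothetical stand-in for ResultView; the real class may differ."""

    ids: np.ndarray
    distances: np.ndarray
    fields: list

    def __iter__(self):
        # Enables: ids, distances, fields = result
        return iter((self.ids, self.distances, self.fields))

    def to_list(self):
        # Assumed row shape: one dict per hit.
        return [
            {"id": int(i), "distance": float(d), "fields": f}
            for i, d, f in zip(self.ids, self.distances, self.fields)
        ]


result = ResultSketch(
    ids=np.array([1, 2]),
    distances=np.array([0.0, 0.0004]),
    fields=[{"title": "LynseDB intro"}, {"title": "Vector guide"}],
)
ids, distances, fields = result
for row in result.to_list():
    print(row["id"], row["distance"], row["fields"]["title"])
```

Row-wise dicts are convenient for display and JSON output, while the array attributes suit bulk numeric work.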
Optional dataframe conversions are available when the dependency is installed: