Tutorial: Core Concepts¶
LynseDB is a vector database with a Python-first API and a Rust backend. The main workflow is:
- connect with
VectorDBClient; - create or open a database;
- create or open a collection;
- insert vectors, IDs, and metadata fields;
- build an index when needed;
- search, filter, query, update, delete, and maintain the collection.
Client¶
lynse.VectorDBClient is the entry point.
import lynse
local_client = lynse.VectorDBClient(uri="./data")
remote_client = lynse.VectorDBClient("http://127.0.0.1:7637")
The uri decides the mode:
uri value |
Mode | Meaning |
|---|---|---|
None |
Local | Use the default root path from LynseDB config. |
| filesystem path | Local | Use the Rust backend directly in this Python process. |
http://... or https://... |
Remote | Use the HTTP server. |
Use local mode when one process owns the data directory. Use remote mode when more than one process, worker, or service needs shared access.
Database¶
A database is a named group of collections:
db = local_client.create_database("app")
same_db = local_client.get_database("app")
print(local_client.list_databases())
Use separate databases for separate applications, tenants, or environments when you want independent lifecycle operations such as drop, snapshot, or restore.
Collection¶
A collection is the unit of vector storage and search:
The primary collection dimension is fixed. Every primary dense vector inserted
into this collection must have dim values.
Use separate collections when:
- vector dimensions differ;
- index or metric choices differ;
- data has a different lifecycle;
- permission or tenant boundaries should be physically separate.
Use metadata fields when the records belong together but need filtering.
Row¶
Each row has:
| Part | Required | Notes |
|---|---|---|
| ID | yes | User-provided integer ID, unique inside the collection. |
| primary vector | yes | Dense float32 vector with collection dimension. |
| metadata field | no | JSON-like dict used for filters, text search, and display. |
| named vectors | no | Extra dense vectors attached to the same ID. |
| sparse vector | no | Feature-ID weights attached to the same ID. |
Example:
with collection.insert_session() as session:
session.add_item(
[0.1, 0.2, 0.3, 0.4],
id=1001,
field={
"title": "vector database intro",
"lang": "en",
"tenant": "acme",
"published": True,
"tags": ["vector", "python"],
"created_at": "2026-06-05",
},
)
IDs¶
IDs are external IDs owned by your application. They are not auto-generated by
add_item() or bulk_add_items().
Good ID practice:
- use integers;
- keep IDs unique within one collection;
- prefer incrementing IDs for predictable storage layout;
- store your source document ID or chunk ID in metadata when you need a string business identifier.
bulk_add_binary() is the exception: it assigns sequential IDs after the
current maximum ID because it is optimized for dense arrays without per-row
metadata.
Metadata fields¶
Fields are JSON-like dictionaries:
field = {
"title": "LynseDB guide",
"score": 0.92,
"active": True,
"tags": ["docs", "retrieval"],
"source": {"name": "manual", "page": 3},
}
Use fields for:
- result display;
- filters through
where=...; - BM25 text search;
- reranker payloads;
- application bookkeeping.
Keep field types stable. For example, do not store "rank": "10" in some rows
and "rank": 10 in others.
Vector metrics¶
The metric describes how similarity is measured:
| Metric | Common index suffix | Meaning | Result ordering |
|---|---|---|---|
| Inner product | FLAT, HNSW, IVF, DiskANN |
Larger score is better. | descending score |
| Squared L2 | -L2 |
Smaller distance is better. | ascending distance |
| Cosine | -COS or -Cos |
Larger similarity is better. | descending score |
| Hamming | -HAMMING-BINARY |
Smaller binary distance is better. | ascending distance |
| Jaccard | -JACCARD-BINARY |
Smaller set distance is better. | ascending distance |
Choose the metric that matches your embedding model. Many modern embedding models are evaluated with cosine similarity or inner product after normalization.
Indexes¶
An index controls how search scans candidates:
collection.build_index("FLAT-L2")
collection.build_index("HNSW-L2")
collection.build_index("IVF-L2", n_clusters=256)
Flat indexes are simplest and make good correctness baselines. ANN indexes such as HNSW and IVF trade exactness for latency. Quantized indexes trade some quality or extra reranking work for lower memory or disk use.
ResultView¶
Search and query methods return ResultView:
result = collection.search([0.1, 0.2, 0.3, 0.4], k=3, return_fields=True)
print(result.ids)
print(result.distances)
print(result.fields)
print(result.to_list())
Use attributes for program logic and to_list() for row-shaped display.
Commits and durability¶
Writes are buffered. In everyday code, prefer insert_session():
The session commits when the block succeeds. If the block raises an exception, pending buffered writes from that session are discarded and the original exception is preserved.
Use explicit lifecycle calls for services and operations:
Local and remote parity¶
The high-level Python API is intentionally similar in local and remote mode:
client = lynse.VectorDBClient(uri="./data")
# or
client = lynse.VectorDBClient("http://127.0.0.1:7637", api_key="secret")
db = client.create_database("app")
collection = db.require_collection("docs", dim=4)
This makes it practical to prototype locally, then move to HTTP mode when the application needs multiple processes or deployment controls.