Tutorial: Core Concepts¶
LynseDB is a vector database with a Python-first API and a Rust backend. The main workflow is:
- connect with
VectorDBClient; - create or open a database;
- create or open a collection;
- insert vectors, IDs, and metadata fields;
- build an index when needed;
- search, filter, query, update, delete, and maintain the collection.
Client¶
lynse.VectorDBClient is the entry point.
import lynse
local_client = lynse.VectorDBClient(uri="./data")
remote_client = lynse.VectorDBClient("http://127.0.0.1:7637")
The uri decides the mode:
uri value |
Mode | Meaning |
|---|---|---|
None |
Local | Use the default root path from LynseDB config. |
| filesystem path | Local | Use the Rust backend directly in this Python process. |
http://... or https://... |
Remote | Use the HTTP server. |
Use local mode when one process owns the data directory. Use remote mode when more than one process, worker, or service needs shared access.
Database¶
A database is a named group of collections:
db = local_client.create_database("app")
same_db = local_client.get_database("app")
print(local_client.list_databases())
Use separate databases for separate applications, tenants, or environments when you want independent lifecycle operations such as drop, snapshot, or restore.
Collection¶
A collection is the unit of vector storage and search:
The primary collection dimension is fixed. Every primary dense vector inserted
into this collection must have dim values.
Use separate collections when:
- vector dimensions differ;
- index or metric choices differ;
- data has a different lifecycle;
- permission or tenant boundaries should be physically separate.
Use metadata fields when the records belong together but need filtering.
Row¶
Each row has:
| Part | Required | Notes |
|---|---|---|
| ID | yes | Public string or non-negative integer ID, unique inside the collection. |
| primary vector | yes | Dense float32 vector with collection dimension. |
| metadata field | no | JSON-like dict used for filters, BM25 search, and display. |
| named vectors | no | Extra dense vectors attached to the same ID. |
| sparse vector | no | Feature-ID weights attached to the same ID. |
Example:
collection.add(
ids="doc-1001",
vectors=[0.1, 0.2, 0.3, 0.4],
fields={
"title": "vector database intro",
"lang": "en",
"tenant": "acme",
"published": True,
"tags": ["vector", "python"],
"created_at": "2026-06-05",
},
)
IDs¶
IDs passed to add() are public external IDs owned by your application. LynseDB
keeps those IDs stable and maps them to internal monotonic integer IDs allocated
by the Rust backend.
Good ID practice:
- use strings or non-negative integers;
- keep IDs unique within one collection;
- use strings for natural IDs such as
"doc-123#chunk-4"; - store source document IDs, chunk numbers, and display payloads in metadata when they are useful for filtering or rendering;
- do not depend on internal IDs for application logic.
Metadata fields¶
Fields are JSON-like dictionaries:
field = {
"title": "LynseDB guide",
"score": 0.92,
"active": True,
"tags": ["docs", "retrieval"],
"source": {"name": "manual", "page": 3},
}
Use fields for:
- result display;
- filters through
where=...; - BM25 search;
- reranker payloads;
- application bookkeeping.
Keep field types stable. For example, do not store "rank": "10" in some rows
and "rank": 10 in others.
Vector metrics¶
The metric describes how similarity is measured:
| Metric | Common index suffix | Meaning | Result ordering |
|---|---|---|---|
| Inner product | -IP |
Larger score is better. | descending score |
| Squared L2 | -L2 |
Smaller distance is better. | ascending distance |
| Cosine distance | -COS or -Cos |
1 - cosine_similarity; smaller is better. |
ascending distance |
| Manhattan | -L1 |
Sum of absolute component differences. | ascending distance |
| Haversine | -HAVERSINE |
Great-circle distance in meters for [longitude, latitude]. |
ascending distance |
| Correlation | -CORRELATION |
1 - Pearson r for aligned profiles. |
ascending distance |
| Hellinger | -HELLINGER |
Distance between non-negative distributions. | ascending distance |
| Wasserstein-1D | -WASSERSTEIN |
Earth-mover distance over equal-width ordered bins. | ascending distance |
| Jensen–Shannon | -JENSEN-SHANNON |
Symmetric distance between non-negative distributions. | ascending distance |
| Chebyshev | -CHEBYSHEV |
Largest absolute component difference. | ascending distance |
| Canberra | -CANBERRA |
Sum of normalized component differences. | ascending distance |
| Bray–Curtis | -BRAY-CURTIS |
Normalized total absolute difference. | ascending distance |
| Hamming | -HAMMING-BINARY |
Smaller binary distance is better. | ascending distance |
| Jaccard | -JACCARD-BINARY |
Smaller set distance is better. | ascending distance |
| Tanimoto | -TANIMOTO-BINARY |
Binary Jaccard distance using chemistry terminology. | ascending distance |
| Sørensen-Dice | -DICE-BINARY |
Binary Dice distance. | ascending distance |
Choose the metric that matches your embedding model. Many modern embedding models are evaluated with cosine similarity or inner product after normalization.
Read Domain-aware distance metrics before using coordinate, profile, distribution, or fingerprint metrics; each has an explicit input contract and index compatibility matrix.
Indexes¶
An index controls how search scans candidates:
collection.build_index("FLAT-L2")
collection.build_index("HNSW-L2")
collection.build_index("IVF-L2", n_clusters=256)
Flat indexes are simplest and make good correctness baselines. ANN indexes such as HNSW and IVF trade exactness for latency. Quantized indexes trade some quality or extra reranking work for lower memory or disk use.
ResultView¶
Search and query methods return ResultView:
result = collection.search([0.1, 0.2, 0.3, 0.4], k=3, return_fields=True)
print(result.ids)
print(result.distances)
print(result.fields)
print(result.to_list())
Use attributes for program logic and to_list() for row-shaped display.
Commits and durability¶
add() is the simple write-through path. For grouped ingestion, prefer
insert_session():
with collection.insert_session() as session:
session.add(
ids="doc-1",
vectors=[0.1, 0.2, 0.3, 0.4],
fields={"title": "first row"},
)
The session commits when the block succeeds. If the block raises an exception, pending buffered writes from that session are discarded and the original exception is preserved.
Use explicit lifecycle calls for services and operations:
collection.commit() # fast logical commit
collection.checkpoint() # durable checkpoint
collection.flush() # advanced: flush bytes without clearing WAL
collection.close()
client.close()
commit() is optimized for write latency. It makes the batch visible and clears
WAL state, but it does not promise that data has reached stable storage at the
instant the call returns. checkpoint() is the deterministic durability
boundary; call it before backups, snapshots, controlled shutdowns, or critical
write acknowledgements. flush() is mostly useful for storage-level workflows
that need bytes pushed out while keeping WAL state.
Local and remote parity¶
The high-level Python API is intentionally similar in local and remote mode:
client = lynse.VectorDBClient(uri="./data")
# or
client = lynse.VectorDBClient("http://127.0.0.1:7637", api_key="secret")
db = client.create_database("app")
collection = db.require_collection("docs", dim=4)
This makes it practical to prototype locally, then move to HTTP mode when the application needs multiple processes or deployment controls.