Tutorial: Core Concepts

LynseDB is a vector database with a Python-first API and a Rust backend. The main workflow is:

connect with VectorDBClient;
create or open a database;
create or open a collection;
insert vectors, IDs, and metadata fields;
build an index when needed;
search, filter, query, update, delete, and maintain the collection.

Client

lynse.VectorDBClient is the entry point.

import lynse

local_client = lynse.VectorDBClient(uri="./data")
remote_client = lynse.VectorDBClient("http://127.0.0.1:7637")

The uri decides the mode:

`uri` value	Mode	Meaning
`None`	Local	Use the default root path from LynseDB config.
filesystem path	Local	Use the Rust backend directly in this Python process.
`http://...` or `https://...`	Remote	Use the HTTP server.

Use local mode when one process owns the data directory. Use remote mode when more than one process, worker, or service needs shared access.
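
The table above can be read as a small decision rule. The helper below is a hypothetical illustration (`describe_mode` is not part of LynseDB) and only inspects the string; how the client actually parses `uri` is not covered in this tutorial.

```python
from typing import Optional


def describe_mode(uri: Optional[str]) -> str:
    """Classify a uri the way the table above describes (illustrative only)."""
    if uri is None:
        return "local (default root path from LynseDB config)"
    if uri.startswith(("http://", "https://")):
        return "remote (HTTP server)"
    # Anything else is treated as a filesystem path.
    return "local (Rust backend in this process)"


print(describe_mode(None))
print(describe_mode("./data"))
print(describe_mode("http://127.0.0.1:7637"))
```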

Database

A database is a named group of collections:

db = local_client.create_database("app")
same_db = local_client.get_database("app")

print(local_client.list_databases())

Use separate databases for separate applications, tenants, or environments when you want independent lifecycle operations such as drop, snapshot, or restore.

Collection

A collection is the unit of vector storage and search:

collection = db.require_collection("docs", dim=768)

The primary collection dimension is fixed when the collection is created. Every primary dense vector inserted into this collection must have exactly dim values (768 in the example above).
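
A client-side check before insertion can catch a dimension mismatch early. This is a minimal sketch: `check_dim` is a hypothetical helper, and how LynseDB itself reports a mismatch is not described here.

```python
from typing import Sequence


def check_dim(vector: Sequence[float], dim: int) -> None:
    """Raise ValueError if the vector does not have exactly `dim` values."""
    if len(vector) != dim:
        raise ValueError(f"expected {dim} values, got {len(vector)}")


check_dim([0.0] * 768, dim=768)  # passes silently

try:
    check_dim([0.1, 0.2, 0.3, 0.4], dim=768)
except ValueError as exc:
    print(exc)  # expected 768 values, got 4
```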

Use separate collections when:

vector dimensions differ;
index or metric choices differ;
data has a different lifecycle;
permission or tenant boundaries should be physically separate.

Use metadata fields when the records belong together but need filtering.

Row

Each row has:

Part	Required	Notes
ID	yes	Public string or non-negative integer ID, unique inside the collection.
primary vector	yes	Dense `float32` vector with collection dimension.
metadata field	no	JSON-like dict used for filters, BM25 search, and display.
named vectors	no	Extra dense vectors attached to the same ID.
sparse vector	no	Feature-ID weights attached to the same ID.

Example (this assumes a collection created with dim=4, because the vector has four values):

collection.add(
    ids="doc-1001",
    vectors=[0.1, 0.2, 0.3, 0.4],
    fields={
        "title": "vector database intro",
        "lang": "en",
        "tenant": "acme",
        "published": True,
        "tags": ["vector", "python"],
        "created_at": "2026-06-05",
    },
)

IDs

IDs passed to add() are public external IDs owned by your application. LynseDB keeps those IDs stable and maps them to internal monotonic integer IDs allocated by the Rust backend.

Good ID practice:

use strings or non-negative integers;
keep IDs unique within one collection;
use strings for natural IDs such as "doc-123#chunk-4";
store source document IDs, chunk numbers, and display payloads in metadata when they are useful for filtering or rendering;
do not depend on internal IDs for application logic.
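
The ID guidelines above can be enforced in application code before rows reach the database. The sketch below is illustrative: `validate_id` and `chunk_id` are hypothetical helpers, and rejecting `bool` is a choice made here (Python treats `True` as the integer 1), not a documented LynseDB rule.

```python
from typing import Union

PublicId = Union[str, int]


def validate_id(value: PublicId) -> PublicId:
    """Accept strings and non-negative integers; reject everything else."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool):
        raise TypeError("bool is not a valid ID")
    if isinstance(value, str):
        return value
    if isinstance(value, int):
        if value < 0:
            raise ValueError("integer IDs must be non-negative")
        return value
    raise TypeError(f"unsupported ID type: {type(value).__name__}")


def chunk_id(doc_id: str, chunk: int) -> str:
    """Build a natural string ID such as 'doc-123#chunk-4'."""
    return f"{doc_id}#chunk-{chunk}"


ids = [validate_id(chunk_id("doc-123", n)) for n in range(3)]
assert len(set(ids)) == len(ids)  # unique within this batch
print(ids)
```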

Metadata fields

Fields are JSON-like dictionaries:

field = {
    "title": "LynseDB guide",
    "score": 0.92,
    "active": True,
    "tags": ["docs", "retrieval"],
    "source": {"name": "manual", "page": 3},
}

Use fields for:

result display;
filters through where=...;
BM25 search;
reranker payloads;
application bookkeeping.

Keep field types stable. For example, do not store "rank": "10" in some rows and "rank": 10 in others.
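
One way to catch unstable field types before ingestion is to scan a batch of field dicts and report keys that appear with more than one Python type. This is a sketch under simple assumptions: `field_type_conflicts` is a hypothetical helper and it only inspects top-level keys, not nested dicts.

```python
from typing import Any, Dict, Iterable, Set


def field_type_conflicts(rows: Iterable[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Return keys whose values use more than one Python type across rows."""
    seen: Dict[str, Set[str]] = {}
    for row in rows:
        for key, value in row.items():
            seen.setdefault(key, set()).add(type(value).__name__)
    return {key: types for key, types in seen.items() if len(types) > 1}


rows = [
    {"title": "a", "rank": "10"},
    {"title": "b", "rank": 10},
]
# "rank" is reported because it is a str in one row and an int in the other.
print(field_type_conflicts(rows))
```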

Vector metrics

The metric describes how similarity is measured:

Metric	Common index suffix	Meaning	Result ordering
Inner product	`-IP`	Larger score is better.	descending score
Squared L2	`-L2`	Smaller distance is better.	ascending distance
Cosine distance	`-COS` or `-Cos`	`1 - cosine_similarity`; smaller is better.	ascending distance
Manhattan	`-L1`	Sum of absolute component differences.	ascending distance
Haversine	`-HAVERSINE`	Great-circle distance in meters for `[longitude, latitude]`.	ascending distance
Correlation	`-CORRELATION`	`1 - Pearson r` for aligned profiles.	ascending distance
Hellinger	`-HELLINGER`	Distance between non-negative distributions.	ascending distance
Wasserstein-1D	`-WASSERSTEIN`	Earth-mover distance over equal-width ordered bins.	ascending distance
Jensen–Shannon	`-JENSEN-SHANNON`	Symmetric distance between non-negative distributions.	ascending distance
Chebyshev	`-CHEBYSHEV`	Largest absolute component difference.	ascending distance
Canberra	`-CANBERRA`	Sum of normalized component differences.	ascending distance
Bray–Curtis	`-BRAY-CURTIS`	Normalized total absolute difference.	ascending distance
Hamming	`-HAMMING-BINARY`	Smaller binary distance is better.	ascending distance
Jaccard	`-JACCARD-BINARY`	Smaller set distance is better.	ascending distance
Tanimoto	`-TANIMOTO-BINARY`	Binary Jaccard distance using chemistry terminology.	ascending distance
Sørensen-Dice	`-DICE-BINARY`	Binary Dice distance.	ascending distance

Choose the metric that matches your embedding model. Many modern embedding models are evaluated with cosine similarity or inner product after normalization.

Read Domain-aware distance metrics before using coordinate, profile, distribution, or fingerprint metrics; each has an explicit input contract and index compatibility matrix.
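
For the first three metrics in the table, the relationship after normalization can be checked by hand. The pure-Python sketch below is independent of LynseDB (all function names are illustrative) and shows that, for unit-length vectors, squared L2, inner product, and cosine distance produce the same ranking.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def inner_product(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    # Undefined for zero vectors; callers should filter those out first.
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / norm


def normalize(v: Vector) -> List[float]:
    length = math.sqrt(inner_product(v, v))
    return [x / length for x in v]


q = normalize([1.0, 2.0, 2.0])
d = normalize([2.0, 1.0, 2.0])

# For unit-length vectors: squared L2 = 2 - 2 * IP = 2 * cosine distance.
assert math.isclose(squared_l2(q, d), 2.0 - 2.0 * inner_product(q, d))
assert math.isclose(squared_l2(q, d), 2.0 * cosine_distance(q, d))
```

Because the three quantities are monotonic transformations of each other on normalized inputs, the choice between them mostly affects score interpretation and sort direction, not which neighbors are returned.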

Indexes

An index controls how a search selects and scans candidate vectors:

collection.build_index("FLAT-L2")
collection.build_index("HNSW-L2")
collection.build_index("IVF-L2", n_clusters=256)

Flat indexes are simplest and make good correctness baselines. ANN indexes such as HNSW and IVF trade exactness for latency. Quantized indexes trade some quality or extra reranking work for lower memory or disk use.
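
To make "correctness baseline" concrete, the sketch below implements an exhaustive scan and a recall measure in plain Python. It does not call LynseDB; `exact_top_k` and `recall_at_k` are illustrative names, and the `approx` list stands in for whatever an ANN index would return.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(query: Vector, vectors: Sequence[Vector], k: int) -> List[int]:
    """Scan every vector and return positions of the k nearest (squared L2)."""
    order = sorted(range(len(vectors)), key=lambda i: squared_l2(query, vectors[i]))
    return order[:k]


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    """Fraction of the exact top-k that the approximate result also returned."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
truth = exact_top_k([0.1, 0.0], vectors, k=2)
approx = [0, 2]  # pretend output of an approximate index
print(truth, recall_at_k(approx, truth))  # [0, 1] 0.5
```

In practice the exact result would come from a flat index and the approximate result from the ANN index under evaluation, using the same queries and k.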

ResultView

Search and query methods return ResultView:

result = collection.search([0.1, 0.2, 0.3, 0.4], k=3, return_fields=True)

print(result.ids)
print(result.distances)
print(result.fields)
print(result.to_list())

Use attributes for program logic and to_list() for row-shaped display.
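
As an illustration of attribute-based program logic, the sketch below applies a distance cutoff to result columns. It uses plain lists as stand-ins and assumes that `result.ids`, `result.distances`, and `result.fields` are aligned by position, which this tutorial does not state explicitly; `rows_within` is a hypothetical helper.

```python
from typing import Any, Dict, List, Sequence


def rows_within(
    ids: Sequence[Any],
    distances: Sequence[float],
    fields: Sequence[Dict[str, Any]],
    max_distance: float,
) -> List[Dict[str, Any]]:
    """Zip parallel result columns into rows and keep those within max_distance."""
    return [
        {"id": i, "distance": d, "fields": f}
        for i, d, f in zip(ids, distances, fields)
        if d <= max_distance
    ]


# Stand-in values; with LynseDB these would come from result.ids,
# result.distances, and result.fields (assumed to be aligned by position).
ids = ["doc-1", "doc-2", "doc-3"]
distances = [0.02, 0.40, 0.95]
fields = [{"title": "a"}, {"title": "b"}, {"title": "c"}]

for row in rows_within(ids, distances, fields, max_distance=0.5):
    print(row["id"], row["distance"], row["fields"]["title"])
```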

Commits and durability

add() is the simple write-through path. For grouped ingestion, prefer insert_session():

with collection.insert_session() as session:
    session.add(
        ids="doc-1",
        vectors=[0.1, 0.2, 0.3, 0.4],
        fields={"title": "first row"},
    )

The session commits when the block succeeds. If the block raises an exception, pending buffered writes from that session are discarded and the original exception is preserved.

Use explicit lifecycle calls for services and operations:

collection.commit()      # fast logical commit
collection.checkpoint()  # durable checkpoint
collection.flush()       # advanced: flush bytes without clearing WAL
collection.close()
client.close()           # the VectorDBClient that opened this collection

commit() is optimized for write latency. It makes the batch visible and clears WAL state, but it does not promise that data has reached stable storage at the instant the call returns. checkpoint() is the deterministic durability boundary; call it before backups, snapshots, controlled shutdowns, or critical write acknowledgements. flush() is mostly useful for storage-level workflows that need bytes pushed out while keeping WAL state.
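
The pieces above can be combined into a bulk-ingestion pattern: write rows in sessions, then call checkpoint() once as the durability boundary. The sketch below is illustrative only. `ingest`, `_write`, `batch_size`, and `FakeCollection` are hypothetical; `FakeCollection` is an in-memory stand-in that mimics the session behavior described in this section so the example runs without LynseDB. Only insert_session(), session.add(), and checkpoint() come from this tutorial.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterable, List, Tuple

Row = Tuple[str, List[float], Dict[str, Any]]


class FakeCollection:
    """In-memory stand-in so the sketch runs without LynseDB installed."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.checkpoints = 0

    @contextmanager
    def insert_session(self):
        pending: List[Row] = []

        class Session:
            def add(self, ids, vectors, fields=None):
                pending.append((ids, vectors, fields or {}))

        yield Session()
        # Reached only if the block did not raise: commit buffered rows.
        self.rows.extend(pending)

    def checkpoint(self) -> None:
        self.checkpoints += 1


def _write(collection, batch: List[Row]) -> None:
    with collection.insert_session() as session:
        for row_id, vector, fields in batch:
            session.add(ids=row_id, vectors=vector, fields=fields)


def ingest(collection, rows: Iterable[Row], batch_size: int = 1000) -> None:
    """Write rows in sessions of batch_size, then take one durable checkpoint."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            _write(collection, batch)
            batch = []
    if batch:
        _write(collection, batch)
    collection.checkpoint()


fake = FakeCollection()
ingest(fake, [(f"doc-{n}", [0.1, 0.2, 0.3, 0.4], {"n": n}) for n in range(5)], batch_size=2)
print(len(fake.rows), fake.checkpoints)  # 5 1
```

Checkpointing once at the end keeps per-batch latency low; a service that must acknowledge each batch as durable would instead checkpoint after every session, at a higher cost per write.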

Local and remote parity

The high-level Python API is intentionally similar in local and remote mode:

client = lynse.VectorDBClient(uri="./data")
# or
client = lynse.VectorDBClient("http://127.0.0.1:7637", api_key="secret")

db = client.create_database("app")
collection = db.require_collection("docs", dim=4)

This makes it practical to prototype locally, then move to HTTP mode when the application needs multiple processes or deployment controls.