Skip to content

Cluster Deployment and Maintenance

LynseDB cluster mode is a lightweight sharding layer for remote HTTP deployments. You start several ordinary LynseDB servers as shard nodes, then start a coordinator in front of them. For high availability, you may run extra standby coordinators that use the same metadata authority. Applications keep using VectorDBClient("http://coordinator:7637") just like a single remote server.

Use cluster mode when one server is not enough for your data size or query throughput, or when you want a primary plus replica layout for shard failover. For a first production deployment, start with one coordinator and at least two shard groups. Each shard group should have one primary and, if you need failover, one replica.

Architecture

Python client
    |
    v
Coordinator :7637
    |
    +-- shard group sg0
    |     +-- primary http://10.0.0.11:7638
    |     +-- replica http://10.0.0.12:7638
    |
    +-- shard group sg1
          +-- primary http://10.0.0.21:7638
          +-- replica http://10.0.0.22:7638

The coordinator owns cluster metadata and request routing:

  • collection metadata is stored on the metadata owner shard(s);
  • IDs are mapped to shard groups with stable hash buckets;
  • writes are routed to the owning shard group;
  • searches fan out to all shard groups and are merged by the coordinator;
  • writes are mirrored to active replicas when write_mirror_replicas is true;
  • a replica that misses a write is marked stale and is not auto-promoted.

Shard nodes are normal LynseDB HTTP servers. They also start an internal custom RPC listener automatically. The coordinator derives the internal RPC port from the shard HTTP port: http_port + 10000, or http_port - 10000 for high ports. For example, shard HTTP port 7638 uses internal RPC port 17638. Search, batch search, add, upsert, delete, and restore use RPC when available and fall back to HTTP when RPC is unavailable.

Before You Start

Install LynseDB on every node:

pip install lynsedb

Plan these pieces before production:

Item Recommendation
Coordinator Run one active coordinator for the simplest setup. For HA, run standby coordinators with the same metadata owner shard(s).
Shards Run one LynseDB server per primary or replica data directory.
Data directories Give every shard process its own persistent directory. Do not share one directory between nodes.
Network Let clients reach the coordinator. Let the coordinator reach every shard HTTP port and derived RPC port.
Auth Use --api-key on shards and --shard-api-key on the coordinator. Protect the coordinator with a private network or reverse proxy.
Backups Back up every shard data directory. Metadata owner shard data directories include coordinator metadata.

The coordinator currently does not enforce client authentication itself. If the coordinator is reachable outside a trusted network, put it behind a reverse proxy, gateway, firewall, or service mesh that provides authentication.

Coordinator Metadata Mental Model

Coordinator metadata includes routing tables, generated ID allocation, shard promotions, replica state, and the active coordinator lease. It is stored on metadata owner shard(s), so cluster mode does not need a shared disk.

By default, the coordinator infers metadata owners from cluster.json. If the config has three or more shard primaries, the first three primaries are used as replicated metadata owners with majority writes and automatic repair. Smaller clusters use the first primary as a single metadata owner.

Use --metadata-owners only when you want explicit control:

Owners Behavior
omitted Use the first three shard primaries when available; otherwise use the first primary.
one URI Store coordinator metadata on that shard through internal RPC.
three or more URIs Store coordinator metadata with majority CAS and repair stale or missing owners from the committed majority.

--cluster-state is only a local cache path for the coordinator process. Each coordinator can use its own local path, even if the path text is the same on different machines. The authoritative metadata lives on the metadata owner shard(s).

Quick Local Cluster

This example starts two shard groups on one machine. It is the easiest way to learn the moving parts before deploying on separate hosts.

Create a cluster config:

cluster.json
{
  "bucket_count": 4096,
  "write_mirror_replicas": true,
  "shard_groups": [
    {
      "name": "sg0",
      "primary": "http://127.0.0.1:7638",
      "replicas": [
        "http://127.0.0.1:7639"
      ]
    },
    {
      "name": "sg1",
      "primary": "http://127.0.0.1:7640",
      "replicas": [
        "http://127.0.0.1:7641"
      ]
    }
  ]
}

Start the shard nodes in separate terminals:

lynse serve --host 127.0.0.1 --port 7638 --data-dir ./data/sg0-primary
lynse serve --host 127.0.0.1 --port 7639 --data-dir ./data/sg0-replica
lynse serve --host 127.0.0.1 --port 7640 --data-dir ./data/sg1-primary
lynse serve --host 127.0.0.1 --port 7641 --data-dir ./data/sg1-replica

Start the coordinator:

lynse serve \
  --role coordinator \
  --host 127.0.0.1 \
  --port 7637 \
  --cluster-config ./cluster.json \
  --cluster-state ./cluster_state.cache.json

Check that the coordinator is running:

curl http://127.0.0.1:7637/
curl http://127.0.0.1:7637/cluster_info

Use the coordinator from Python:

from lynse import VectorDBClient

client = VectorDBClient("http://127.0.0.1:7637")
db = client.create_database("demo", drop_if_exists=True)
collection = db.require_collection("docs", dim=4)

collection.add(
    ids=["doc-1", "doc-2"],
    vectors=[
        [0.1, 0.2, 0.3, 0.4],
        [0.4, 0.3, 0.2, 0.1],
    ],
    fields=[
        {"title": "first"},
        {"title": "second"},
    ],
)
collection.commit()

result = collection.search([0.1, 0.2, 0.3, 0.4], k=2, return_fields=True)
print(result)

Production Layout

In production, run each shard primary and replica on separate machines or separate failure domains.

Example:

/etc/lynsedb/cluster.json
{
  "bucket_count": 4096,
  "write_mirror_replicas": true,
  "shard_groups": [
    {
      "name": "sg0",
      "primary": "http://10.0.0.11:7638",
      "replicas": [
        "http://10.0.0.12:7638"
      ]
    },
    {
      "name": "sg1",
      "primary": "http://10.0.0.21:7638",
      "replicas": [
        "http://10.0.0.22:7638"
      ]
    }
  ]
}

Start each shard:

lynse serve \
  --host 0.0.0.0 \
  --port 7638 \
  --data-dir /var/lib/lynsedb/shard \
  --api-key "${LYNSE_SHARD_API_KEY}"

Start the coordinator:

lynse serve \
  --role coordinator \
  --host 0.0.0.0 \
  --port 7637 \
  --cluster-config /etc/lynsedb/cluster.json \
  --cluster-state /var/lib/lynsedb/coordinator/cluster_state.cache.json \
  --shard-api-key "${LYNSE_SHARD_API_KEY}" \
  --health-interval-secs 1.0 \
  --health-failures 3

For coordinator HA, run another coordinator with the same cluster.json. If --metadata-owners is omitted, every coordinator infers the same metadata owner set from the shard primaries:

node-a
lynse serve \
  --role coordinator \
  --host 0.0.0.0 \
  --port 7637 \
  --cluster-config /etc/lynsedb/cluster.json \
  --cluster-state /var/lib/lynsedb/coordinator/cluster_state.cache.json \
  --coordinator-uri http://node-a:7637 \
  --shard-api-key "${LYNSE_SHARD_API_KEY}"
node-b
lynse serve \
  --role coordinator \
  --host 0.0.0.0 \
  --port 7637 \
  --cluster-config /etc/lynsedb/cluster.json \
  --cluster-state /var/lib/lynsedb/coordinator/cluster_state.cache.json \
  --coordinator-uri http://node-b:7637 \
  --shard-api-key "${LYNSE_SHARD_API_KEY}"

In this mode, cluster_state.cache.json is only a local cache on each coordinator. The authoritative metadata is stored on the inferred metadata owner shard(s). With the two-shard example above, the first primary http://10.0.0.11:7638 is the single metadata owner. With three or more shard primaries, the first three primaries are used automatically and a stale or restarted metadata owner is repaired from the committed majority.

If you prefer to pin a single metadata owner explicitly, add:

--metadata-owners http://10.0.0.11:7638

If you prefer to pin replicated metadata owners explicitly, pass three or more owners:

lynse serve \
  --role coordinator \
  --host 0.0.0.0 \
  --port 7637 \
  --cluster-config /etc/lynsedb/cluster.json \
  --cluster-state /var/lib/lynsedb/coordinator/cluster_state.cache.json \
  --metadata-owners http://10.0.0.11:7638,http://10.0.0.12:7638,http://10.0.0.21:7638 \
  --coordinator-uri http://node-a:7637 \
  --shard-api-key "${LYNSE_SHARD_API_KEY}"

On later coordinator restarts, keep using the same cluster.json and --metadata-owners value if you set one. --cluster-state remains only the local cache path. You may pass both --cluster-config and --cluster-state, but existing metadata on the owner shard(s) is the source of truth after the first start.

Configuration Reference

Cluster config fields:

Field Default Description
bucket_count 4096 Number of routing buckets assigned to shard groups when a collection is created. Keep this value stable after data exists.
write_mirror_replicas true Mirror writes to replicas whose state is active.
shard_groups required List of shard groups. Each group needs a primary URI and may have replicas.
state_path cluster_state.cache.json Optional local coordinator metadata cache path when --cluster-state is not provided.
shard_api_key none Optional key used by the coordinator when forwarding requests to shards.
metadata_owners or metadata.owners inferred from shard primaries Optional metadata owner shard URI(s). Omit to use the first three primaries when available, or the first primary for small clusters. Use one URI to force single-owner metadata, or 3+ URIs for replicated metadata.

Replica entries can be simple strings:

"replicas": ["http://10.0.0.12:7638"]

Or objects with an explicit state:

"replicas": [
  {"uri": "http://10.0.0.12:7638", "state": "active"}
]

Coordinator CLI flags:

Flag Description
--role coordinator Start coordinator mode instead of a normal shard server.
--cluster-config JSON config used to seed metadata on first start and infer the default metadata owners.
--cluster-state Local coordinator metadata cache path. It does not need to be shared between machines.
--shard-api-key API key sent to shard nodes as Authorization: Bearer ....
--metadata-owners Optional comma-separated metadata owner shard HTTP URIs. Omit to infer owners from shard primaries; provide 3+ URIs for replicated metadata.
--coordinator-uri Advertised URI for this coordinator. Other coordinators proxy to this address when this process is leader.
--coordinator-id Stable coordinator ID. Defaults to --coordinator-uri; most deployments do not need to set it.
--coordinator-lease-secs Leader lease duration. Lower values detect coordinator failure faster but are more sensitive to storage or scheduling stalls.
--request-timeout-secs Timeout for coordinator-to-shard requests.
--health-interval-secs Interval between shard health probes.
--health-failures Consecutive failed probes before a node is considered unhealthy.

Environment variables are also supported:

export LYNSE_ROLE=coordinator
export LYNSE_CLUSTER_CONFIG=/etc/lynsedb/cluster.json
export LYNSE_CLUSTER_STATE=/var/lib/lynsedb/coordinator/cluster_state.cache.json
export LYNSE_SHARD_API_KEY=your_shard_key
export LYNSE_CLUSTER_METADATA_OWNERS=http://10.0.0.11:7638
export LYNSE_COORDINATOR_URI=http://node-a:7637
export LYNSE_HEALTH_INTERVAL_SECS=1.0
export LYNSE_HEALTH_FAILURES=3

How Routing Works

When a collection is created, the coordinator stores its routing table in the metadata authority. Each bucket points to one shard group.

For explicit string or integer IDs, LynseDB hashes the external ID and routes the item to the owning shard group. For records that need generated integer IDs, the coordinator allocates IDs and then routes them. This means clients can continue to use normal add, upsert, delete, restore, search, and query APIs through the coordinator.

Search requests are sent to every shard group. The coordinator asks one healthy node per group, merges the per-shard top results, and returns one result set to the client.

Health and Failover

Shard Failover

The coordinator probes every primary and replica. A node is marked unhealthy after --health-failures consecutive failed probes.

If a primary is unhealthy and an active healthy replica exists in the same shard group, the coordinator promotes that replica. The old primary is moved into the replica list with state stale.

Check cluster state:

curl http://127.0.0.1:7637/cluster_info

Look for:

  • primary: current primary URI for each shard group;
  • primary_epoch: increments when a new primary is promoted;
  • replica state: active replicas can receive mirrored writes and can be promoted; stale replicas cannot be promoted automatically;
  • meta_epoch: increments when coordinator metadata changes.

Important behavior:

  • If a primary write fails, the client request fails.
  • If a replica write fails, the request can still succeed, but that replica is marked stale.
  • A stale replica may be missing data and must be rebuilt before it is marked active again.
  • If a shard group has no healthy active node, reads and writes for that group will fail until a node is restored.

Coordinator Failover

Coordinator failover protects the coordinator process itself. It does not copy or repair shard data; shard primary promotion is handled separately by the active coordinator.

When coordinator failover is enabled, one coordinator holds the leader lease and serves as leader. Other coordinators stay online as standbys. A standby can accept client traffic and proxy it to the leader. If the leader stops renewing its lease, a standby takes over, reloads the latest metadata, and starts serving requests directly.

Check coordinator status:

curl http://coordinator:7637/coordinator_status

Look for:

  • role: leader, standby, or single;
  • coordinator_uri: the URI this coordinator advertises;
  • leader.leader_uri: the current leader address;
  • leader.lease_epoch: increments when leadership moves to another coordinator;
  • metadata.mode: single or replicated;
  • metadata.degraded: whether replicated metadata has unavailable or stale owners.

Maintenance Tasks

Restart a Shard

For a short restart of a healthy node:

  1. Stop the shard process.
  2. Start it again with the same --data-dir, host, port, and API key.
  3. Check curl http://shard-host:7638/healthz.
  4. Check curl http://coordinator:7637/cluster_info.

If the node did not miss writes, it can continue serving. If it missed writes and is marked stale, rebuild it before relying on it.

Restart the Coordinator

The coordinator keeps only a local metadata cache. Restart it with the same cluster.json and the same --metadata-owners value if you set one:

lynse serve \
  --role coordinator \
  --host 0.0.0.0 \
  --port 7637 \
  --cluster-config /etc/lynsedb/cluster.json \
  --cluster-state /var/lib/lynsedb/coordinator/cluster_state.cache.json \
  --shard-api-key "${LYNSE_SHARD_API_KEY}"

Existing metadata is loaded from the metadata owner shard(s). The cache file can be deleted and rebuilt by the coordinator.

Rebuild a Stale Replica

A stale replica should be treated as out of date. Rebuild it from the current primary for that shard group.

Recommended process:

  1. Stop the stale replica.
  2. Keep it out of service while rebuilding.
  3. Take a consistent backup or filesystem copy from the current primary data directory during a maintenance window.
  4. Restore that copy into the replica data directory.
  5. Start the replica with the same port and API key.
  6. Verify the replica health endpoint.
  7. Update coordinator metadata so that replica state changes from stale to active.
  8. Restart or refresh the coordinator so it observes the updated metadata.

Only mark a replica active after you are confident it has the same data as the current primary.

Add Shards

New shard groups are used by collections created after the routing table includes them. Existing collections keep their original bucket_to_group mapping and are not automatically rebalanced.

For a new deployment, choose the shard group count before loading large data. For an existing deployment, the safest scaling path is:

  1. Start a new cluster config with the desired shard groups.
  2. Create a new database or collection.
  3. Re-ingest or migrate the data through the coordinator.
  4. Switch application traffic after validation.

Do not edit bucket_to_group for an existing collection unless you are also moving the corresponding data and have tested the migration offline.

Back Up a Cluster

Back up these items together:

  • every shard primary data directory;
  • every replica data directory if you want faster replica restore;
  • metadata owner shard data directories, which include coordinator metadata;
  • the cluster config and service files.

Before a planned backup, run checkpoints through the coordinator:

from lynse import VectorDBClient

client = VectorDBClient("http://coordinator:7637")
db = client.get_database("demo")
collection = db.get_collection("docs")
collection.checkpoint()

For multiple collections, checkpoint each collection. Then snapshot or copy the metadata owner shard data directories and shard data directories as one backup set.

Upgrade

Use a maintenance window for upgrades:

  1. Back up all shard data directories, including metadata owner shards.
  2. Stop writes at the application layer.
  3. Checkpoint active collections.
  4. Upgrade replicas first and start them.
  5. Upgrade primaries.
  6. Restart the coordinator.
  7. Run smoke tests through the coordinator.
  8. Resume writes.

For large clusters, test the exact version upgrade on a staging copy first.

Monitoring

Monitor each shard as a normal LynseDB server:

curl http://10.0.0.11:7638/healthz
curl http://10.0.0.11:7638/readyz
curl http://10.0.0.11:7638/metrics

Monitor the coordinator:

curl http://coordinator:7637/
curl http://coordinator:7637/cluster_info
curl http://coordinator:7637/coordinator_status

Alert on:

  • shard health or readiness failures;
  • coordinator process down;
  • unexpected coordinator leader changes;
  • replica state changing to stale;
  • unexpected primary_epoch changes;
  • disk usage on shard data volumes;
  • high query latency or write failures on shards.

Troubleshooting

Symptom What to check
Coordinator will not start Pass --cluster-config, or pass --metadata-owners explicitly when no config is available.
cluster config requires at least one shard group The config must contain shard_groups or shards with at least one entry.
Shard requests return unauthorized Make sure shards use --api-key and the coordinator uses the same --shard-api-key.
Metadata owner is unreachable Open the derived RPC port from each coordinator to each metadata owner shard. Metadata reads, writes, lease renewals, and owner repair use internal RPC.
Data RPC is not used Open the derived RPC port between coordinator and shard. Ordinary shard data requests can fall back to HTTP, but metadata owner traffic requires RPC.
metadata.degraded is true A replicated metadata owner is unavailable or behind. The coordinator repairs stale owners from the committed majority when they are reachable again.
Replica becomes stale It missed a mirrored write. Rebuild it from the primary before marking it active.
Search misses recent writes Confirm writes were sent to the coordinator, check shard health, then inspect cluster_info for stale or promoted nodes.
Existing data does not rebalance after adding a shard Existing collection routing is fixed. Create a new collection and migrate or re-ingest.

Operational Checklist

Before accepting production traffic:

  • shard nodes have persistent data directories;
  • metadata owner shard data directories are on persistent storage;
  • coordinator-to-shard HTTP and derived RPC ports are reachable;
  • shard authentication is configured if the shard network is not fully trusted;
  • coordinator access is protected by network policy or a proxy;
  • /cluster_info shows the expected primaries and active replicas;
  • a backup and restore procedure has been tested;
  • application clients connect only to the coordinator.