Text Search

mandatory

Substring and full-text search inside the Koder Stack uses the kdb-native substrate; external search engines (Elasticsearch, Meilisearch, Algolia, Typesense, Solr, Sphinx) are not adopted as dependencies of any Koder component. The substrate ships in two shapes — trigram inverted index for `LIKE` / `ILIKE` / similarity (PG `pg_trgm` equivalent) and GIN over tokenized columns for full-text (PG `tsvector` equivalent). Both live in `kdb-record` and are queried through `kdb-pgwire`'s standard SQL surface. Applied case of `self-hosted-first.kmd` for the search domain.

Policy — Text Search

Substring search and full-text search inside the Koder Stack use the *kdb-native substrate*. External search engines are not adopted as dependencies of any Koder component.

This is the *applied case* of self-hosted-first.kmd for the search domain. The 5 gates of self-hosted-first apply here the same way they apply to web servers (Koder Jet vs nginx) and media engines (kodec vs FFmpeg).

Position

| Shape | Substrate | PG analogue | kdb ticket |
|---|---|---|---|
| Substring / LIKE '%…%' / ILIKE / similarity | Trigram inverted index in kdb-record | pg_trgm | #679 (this drop scaffolds; future slices wire planner + storage) |
| Full-text (tokenize → stem → rank) | GIN over tokenized JSON path in kdb-record | tsvector + GIN | #537 (RFC-012 §7 document substrate + GIN) |
| Vector similarity (already shipped) | HNSW index in kdb-record::vector | pgvector | #535 (v1 linear) → #540 (HNSW) |
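
To make the first row concrete, the following is a minimal sketch of how a trigram inverted index can answer an `ILIKE '%…%'` query. It is not the kdb-record implementation: the class `TrigramIndex` and its methods are hypothetical names chosen for illustration, and the sketch omits the word-boundary padding that pg_trgm applies. The idea it shows is the general one: a matching row must contain every trigram of the search string, so the index narrows the candidate set and a final recheck removes false positives.

```python
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Return the set of lowercase 3-character windows in text."""
    lowered = text.lower()
    return {lowered[i:i + 3] for i in range(len(lowered) - 2)}


class TrigramIndex:
    """Toy inverted index (hypothetical): trigram -> ids of rows containing it."""

    def __init__(self) -> None:
        self.rows: dict[int, str] = {}
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, row_id: int, text: str) -> None:
        self.rows[row_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(row_id)

    def ilike_contains(self, needle: str) -> list[int]:
        """Answer ILIKE '%needle%' via the index, then recheck candidates."""
        grams = trigrams(needle)
        if not grams:
            # Needles shorter than 3 characters have no trigrams; scan all rows.
            candidates = set(self.rows)
        else:
            # A matching row must contain every trigram of the needle.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        lowered = needle.lower()
        return sorted(r for r in candidates if lowered in self.rows[r].lower())


index = TrigramIndex()
index.add(1, "Koder Talk message search")
index.add(2, "Kortex document store")
index.add(3, "research notes")

print(index.ilike_contains("SEARCH"))  # [1, 3]
```

Row 3 matches because "research" contains "search"; the recheck step is what keeps the result exact even though trigram overlap alone is only a necessary condition.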

All three shapes:

  • live inside infra/data/kdb (kdb-record storage + kdb-planner use)
  • multi-tenant by tenant_id path-prefix, per policies/multi-tenant-by-default.kmd
  • queried via standard SQL, with no separate API or client library
  • isolated per tenant at the substrate level (cross-tenant reads return [], not 403)
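
The isolation rule above can be sketched as follows. This is an illustrative toy, not kdb code: `TenantScopedStore`, its key layout (`<tenant_id>/<key>`), and the assumption that tenant ids never contain `/` are all hypothetical. What it shows is the behavior the policy describes: a search scoped to one tenant's prefix simply never sees another tenant's rows, so a cross-tenant read yields an empty list rather than an authorization error.

```python
class TenantScopedStore:
    """Toy store (hypothetical) whose keys are prefixed with the tenant id."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, tenant_id: str, key: str, value: str) -> None:
        # Assumed layout: "<tenant_id>/<key>"; tenant ids contain no "/".
        self._data[f"{tenant_id}/{key}"] = value

    def search(self, tenant_id: str, needle: str) -> list[str]:
        """Return matching values under this tenant's prefix only."""
        prefix = f"{tenant_id}/"
        return sorted(
            value
            for path, value in self._data.items()
            if path.startswith(prefix) and needle in value
        )


store = TenantScopedStore()
store.put("acme", "msg/1", "quarterly report")
store.put("globex", "msg/1", "quarterly forecast")

print(store.search("acme", "quarterly"))     # ['quarterly report']
print(store.search("initech", "quarterly"))  # []
```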

Why not external

The 5 gates of self-hosted-first.kmd evaluated against the search domain:

| Gate | Score |
|---|---|
| *1. Capability* | Trigram + GIN cover the use cases the Koder Stack actually has. The Stack does not index web-scale public corpora; it indexes user-facing data (messages in koder-talk, docs in koder-kortex, repo content in flow.koder.dev). The capability ceiling matches the need; no unbounded scaling is required. |
| *2. Performance* | The trigram index for LIKE (closed in #679's criterion) lands kdb within 1.2× of PG; PG itself is competitive with Elasticsearch for sub-100M-doc corpora. Specialized engines win at >100M docs; the Stack doesn't operate there. |
| *3. Ergonomics* | Standard SQL: zero new client lib, zero new dialect, zero new operational surface. Specialized engines require a separate cluster, separate auth, separate indexing pipeline, separate query language (DSL or domain-specific JSON), and separate monitoring. |
| *4. Self-coverage* | kdb is already the substrate for the Stack; adding trigram + GIN to the substrate is incremental. Adding Elasticsearch would create a second source of truth requiring shadow-write + reconciliation forever. |
| *5. Migration cost* | Zero. The Stack has not adopted an external search engine, so this policy doesn't gate migration; it gates introduction. |

All five gates pass for the Stack's use cases. External search engines remain off the dependency list.

What this policy forbids

  • Adding a dependency on Elasticsearch / OpenSearch / Meilisearch / Algolia / Typesense / Solr / Sphinx / Lucene-derived embedded search to any Koder Stack component.
  • Hand-rolling a search service in any Koder product (Flow, Talk, Kortex, Hub, etc.) instead of using the kdb substrate.
  • Shipping a "search adapter" that bypasses SQL and talks to a parallel store.
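
One way the first prohibition could be checked mechanically is a name scan over a component's declared dependencies. The sketch below is an assumption, not an existing Koder tool: `FORBIDDEN` and `forbidden_dependencies` are hypothetical names, and the input is assumed to be a plain list of dependency names already extracted from whatever manifest a component uses. Substring matching is deliberately coarse, so hits need human review (for example, a documentation generator named "sphinx" would be flagged even though the policy targets the Sphinx search engine).

```python
# Engine names taken from the policy text; matching is case-insensitive.
FORBIDDEN = (
    "elasticsearch",
    "opensearch",
    "meilisearch",
    "algolia",
    "typesense",
    "solr",
    "sphinx",
    "lucene",
)


def forbidden_dependencies(names: list[str]) -> list[str]:
    """Return the dependency names that mention a forbidden search engine."""
    hits = []
    for name in names:
        lowered = name.lower()
        if any(token in lowered for token in FORBIDDEN):
            hits.append(name)
    return hits


declared = ["serde", "elasticsearch-rs", "tokio", "Meilisearch-SDK"]
print(forbidden_dependencies(declared))  # ['elasticsearch-rs', 'Meilisearch-SDK']
```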

What this policy requires

  • New search-shaped features inside the Stack go through kdb. If the kdb substrate doesn't yet have the index type needed (e.g. trigram before #679 lands), open a kdb sub-ticket against #679 / #537 / #540 rather than reaching outside.
  • LIKE '%…%' queries must be benchmarked against the trigram index when one exists. Linear scans on a column that should be indexed are a perf regression, not the default.
  • Existing trigram / GIN / vector code paths reuse the existing kdb-record::trigram / kdb-record::gin / kdb-record::vector modules per policies/reuse-first.kmd; no parallel implementations.
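
The benchmarking requirement can be illustrated with a small in-memory comparison. This is a sketch under stated assumptions, not a kdb benchmark harness: the corpus is synthetic, the function names (`linear_scan`, `build_index`, `indexed_scan`, `timed`) are hypothetical, and the timings depend entirely on the machine. The point it demonstrates is the shape of the check: run the same substring query through a linear scan and through a trigram index, confirm the results agree, and compare elapsed time.

```python
import time
from collections import defaultdict


def trigrams(text):
    lowered = text.lower()
    return {lowered[i:i + 3] for i in range(len(lowered) - 2)}


def linear_scan(rows, needle):
    """Baseline: test every row, as an unindexed LIKE '%needle%' would."""
    lowered = needle.lower()
    return [i for i, text in enumerate(rows) if lowered in text.lower()]


def build_index(rows):
    """Map each trigram to the set of row positions containing it."""
    postings = defaultdict(set)
    for i, text in enumerate(rows):
        for gram in trigrams(text):
            postings[gram].add(i)
    return postings


def indexed_scan(rows, postings, needle):
    """Intersect posting sets, then recheck the surviving candidates."""
    grams = trigrams(needle)
    if not grams:
        return linear_scan(rows, needle)
    candidates = set.intersection(*(postings.get(g, set()) for g in grams))
    lowered = needle.lower()
    return sorted(i for i in candidates if lowered in rows[i].lower())


def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Synthetic corpus: 20,000 rows, exactly one of which mentions "trigram".
rows = [f"message {i} about topic {i % 97}" for i in range(20_000)]
rows[12_345] = "deploy the trigram index"
postings = build_index(rows)

scan_hits, scan_secs = timed(linear_scan, rows, "trigram")
index_hits, index_secs = timed(indexed_scan, rows, postings, "trigram")

assert scan_hits == index_hits == [12_345]
print(f"linear scan: {scan_secs:.6f}s, trigram index: {index_secs:.6f}s")
```

Index build time is excluded from the comparison here because an index is built once and queried many times; a real benchmark would also report build and update cost.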

Strictness

*Strict.* No advisory bypass. If a Koder product genuinely needs external search capability the Stack doesn't have, the path is to open a kdb ticket to add it, not to reach outside. Exceptions require RFC modification of this policy or self-hosted-first.kmd.

Sibling policies

Source: ../home/koder/dev/koder/meta/docs/stack/policies/text-search.kmd