Text Search
Substring and full-text search inside the Koder Stack uses the kdb-native substrate; external search engines (Elasticsearch, Meilisearch, Algolia, Typesense, Solr, Sphinx) are not adopted as dependencies of any Koder component. The substrate ships in two shapes — trigram inverted index for `LIKE` / `ILIKE` / similarity (PG `pg_trgm` equivalent) and GIN over tokenized columns for full-text (PG `tsvector` equivalent). Both live in `kdb-record` and are queried through `kdb-pgwire`'s standard SQL surface. Applied case of `self-hosted-first.kmd` for the search domain.
Policy — Text Search
Substring search and full-text search inside the Koder Stack use the *kdb-native substrate*. External search engines are not adopted as dependencies of any Koder component.
This is the *applied case* of `self-hosted-first.kmd` for the search domain. The 5 gates of self-hosted-first apply here the same way they apply to web servers (Koder Jet vs nginx) and media engines (kodec vs FFmpeg).
Position
| Shape | Substrate | PG analogue | kdb ticket |
|---|---|---|---|
| Substring / `LIKE '%…%'` / `ILIKE` / similarity | Trigram inverted index in `kdb-record` | `pg_trgm` | #679 (this drop scaffolds; future slices wire planner + storage) |
| Full-text (tokenize → stem → rank) | GIN over tokenized JSON path in `kdb-record` | `tsvector` + GIN | #537 (RFC-012 §7 document substrate + GIN) |
| Vector similarity (already shipped) | HNSW index in `kdb-record::vector` | `pgvector` | #535 (v1 linear) → #540 (HNSW) |
All three shapes:
- live inside `infra/data/kdb` (`kdb-record` storage + `kdb-planner` use)
- are multi-tenant by prefix per `tenant_id` path
- are queried via standard SQL, with no separate API or client library
- are multi-tenant isolated at the substrate level (cross-tenant reads return `[]`, not 403)
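
The full-text shape and the tenant-prefix rule above can be illustrated with a small in-memory model. This is a minimal sketch, not the `kdb-record` implementation: the class `TenantInvertedIndex`, the toy `stem` function, and the match-count ranking are all assumptions made for illustration. It shows the tokenize → stem → rank pipeline and why a cross-tenant read yields `[]` when the tenant is part of the key.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Toy stemmer (assumption): strip a trailing 's' from longer tokens."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


class TenantInvertedIndex:
    """Hypothetical model: posting lists keyed by (tenant_id, term)."""

    def __init__(self):
        self.postings = defaultdict(set)  # (tenant_id, term) -> {doc_id}

    def add(self, tenant_id, doc_id, text):
        for term in map(stem, tokenize(text)):
            self.postings[(tenant_id, term)].add(doc_id)

    def search(self, tenant_id, query):
        """Return doc ids ranked by the number of distinct query terms matched."""
        scores = defaultdict(int)
        for term in set(map(stem, tokenize(query))):
            # Lookups never leave the caller's tenant prefix.
            for doc_id in self.postings.get((tenant_id, term), ()):
                scores[doc_id] += 1
        return sorted(scores, key=lambda d: (-scores[d], d))


index = TenantInvertedIndex()
index.add("acme", 1, "Trigram indexes speed up substring search")
index.add("acme", 2, "Full-text search ranks documents")
index.add("globex", 1, "Search inside another tenant")

print(index.search("acme", "search indexes"))  # [1, 2]
print(index.search("initech", "search"))       # [] (unknown tenant, no error)
```

Because the tenant id is part of every key, a query for another tenant's data finds no posting lists at all, so there is nothing to deny and the result is simply empty.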
Why not external
The 5 gates of self-hosted-first.kmd evaluated against the search domain:
| Gate | Score |
|---|---|
| *1 — Capability* | Trigram + GIN cover the use cases the Koder Stack actually has. The Stack does not index web-scale corpora; it indexes its own bounded content (koder-talk, docs in koder-kortex, repo content in flow.koder.dev). Capability ceiling matches need; no infinite scaling required. |
| *2 — Performance* | Trigram index for `LIKE` (closed under #679's acceptance criteria) lands kdb within 1.2× PG; PG itself is competitive with Elasticsearch for sub |
| *3 — Ergonomics* | Standard SQL — zero new client lib, zero new dialect, zero new operational surface. Specialized engines require a separate cluster, separate auth, separate indexing pipeline, separate query language (DSL or domain-specific JSON), and separate monitoring. |
| *4 — Self-coverage* | kdb is already the substrate for the Stack; adding trigram + GIN to the substrate is incremental. Adding Elasticsearch would create a second source of truth requiring shadow-write + reconciliation forever. |
| *5 — Migration cost* | Zero — the Stack has not adopted an external search engine, so this policy doesn't gate migration, it gates introduction. |
All five gates pass for the Stack's use cases. External search engines remain off the dependency list.
What this policy forbids
- Adding a dependency on Elasticsearch / OpenSearch / Meilisearch /
Algolia / Typesense / Solr / Sphinx / Lucene-derived embedded search to any Koder Stack component.
- Hand-rolling a search service in any Koder product (Flow, Talk,
Kortex, Hub, etc.) instead of using the kdb substrate.
- Shipping a "search adapter" that bypasses SQL and talks to a
parallel store.
What this policy requires
- New search-shaped features inside the Stack go through kdb. If the kdb substrate doesn't yet have the index type needed (e.g. trigram before #679 lands), open a kdb sub-ticket against #679 / #537 / #540 rather than reaching outside.
- `LIKE '%…%'` queries must be benchmarked against the trigram index when one exists. Linear scans on a column that should be indexed are a perf regression, not the default.
- Existing trigram / GIN / vector code paths reuse the existing `kdb-record::trigram` / `kdb-record::gin` / `kdb-record::vector` modules per `policies/reuse-first.kmd`; no parallel implementations.
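
To make the benchmarking requirement concrete, the following is a minimal sketch of a trigram-indexed substring match compared against a linear scan. It is an in-memory illustration only: the function names, the synthetic rows, and the timing harness are assumptions, not the `kdb-record::trigram` API. The indexed path intersects the posting lists for the needle's trigrams, then rechecks each candidate, which is the usual shape of trigram-accelerated `ILIKE '%…%'`.

```python
import timeit
from collections import defaultdict


def trigrams(s):
    """All 3-character substrings of the lowercased string."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def build_trigram_index(rows):
    """Map each trigram to the set of row positions that contain it."""
    index = defaultdict(set)
    for pos, value in enumerate(rows):
        for gram in trigrams(value):
            index[gram].add(pos)
    return index


def linear_scan(rows, needle):
    """Baseline: case-insensitive substring match over every row."""
    needle = needle.lower()
    return [pos for pos, value in enumerate(rows) if needle in value.lower()]


def indexed_scan(rows, index, needle):
    """Intersect posting lists for the needle's trigrams, then recheck candidates."""
    grams = trigrams(needle)
    if not grams:  # needles shorter than 3 characters cannot use the index
        return linear_scan(rows, needle)
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    needle = needle.lower()
    return sorted(pos for pos in candidates if needle in rows[pos].lower())


# Synthetic data (assumption): 1 row in 50 mentions "search".
rows = [f"ticket {i} about {'search' if i % 50 == 0 else 'billing'}" for i in range(5000)]
index = build_trigram_index(rows)

# Both paths must agree before their timings are worth comparing.
assert indexed_scan(rows, index, "search") == linear_scan(rows, "search")

t_linear = timeit.timeit(lambda: linear_scan(rows, "search"), number=20)
t_indexed = timeit.timeit(lambda: indexed_scan(rows, index, "search"), number=20)
print(f"linear: {t_linear:.4f}s  indexed: {t_indexed:.4f}s")
```

The recheck step matters: trigram intersection only narrows the candidate set, so a row can contain every trigram of the needle without containing the needle itself. A real benchmark would run the same comparison through SQL against kdb rather than in process.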
Strictness
*Strict.* No advisory bypass. If a Koder product genuinely needs external search capability the Stack doesn't have, the path is to open a kdb ticket to add it, not to reach outside. Exceptions require RFC modification of this policy or `self-hosted-first.kmd`.
Sibling policies
- `self-hosted-first.kmd`: the algorithm; this doc is the data for the search domain.
- `reuse-first.kmd`: within-Stack reuse; forbids parallel `kdb-record::trigram` clones.
- `multi-tenant-by-default.kmd`: trigram + GIN indexes carry the `tenant_id` path prefix from commit 1.
- `hyperscale-first.kmd`: substrate shape scales without external dependency.