Data Blob

  • *Area:* Data Platform
  • *Path:* data/blob
  • *Kind:* Distributed blob storage substrate (RFC-001)

Role in the stack

kdb-blob is the object storage substrate for everything in the Stack that produces large volumes of bytes: uploads, transcoded media segments, backups, snapshots, and ML models. It splits content into 4 MB chunks addressed by their BLAKE3-256 hash and supports range reads, multipart uploads, replication, erasure coding, lifecycle tiers, and an S3-compatible gateway.
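
A minimal sketch of how a range read maps onto fixed-size chunks, assuming "4 MB" means 4 MiB and that a blob's chunks are laid out back to back in order. ChunkSlice and chunks_for_range are hypothetical names used only for illustration, not part of kdb_blob; the BLAKE3-256 addressing of each chunk is left out to keep the example dependency-free.

```rust
/// Chunk size assumed for this sketch: 4 MiB.
const CHUNK_SIZE: u64 = 4 * 1024 * 1024;

/// The part of one chunk that a range read needs (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
struct ChunkSlice {
    chunk_index: u64, // position of the chunk within the blob
    start: u64,       // first byte to read inside the chunk (inclusive)
    end: u64,         // end of the read inside the chunk (exclusive)
}

/// Map the byte range [offset, offset + len) of a blob of `blob_len` bytes
/// onto the chunks that hold it. The range is clamped to the end of the blob.
fn chunks_for_range(blob_len: u64, offset: u64, len: u64) -> Vec<ChunkSlice> {
    let end = offset.saturating_add(len).min(blob_len);
    let mut slices = Vec::new();
    let mut pos = offset;
    while pos < end {
        let chunk_index = pos / CHUNK_SIZE;
        let chunk_start = chunk_index * CHUNK_SIZE;
        // Stop at the chunk boundary or at the end of the requested range.
        let slice_end = (chunk_start + CHUNK_SIZE).min(end);
        slices.push(ChunkSlice {
            chunk_index,
            start: pos - chunk_start,
            end: slice_end - chunk_start,
        });
        pos = slice_end;
    }
    slices
}

fn main() {
    const MIB: u64 = 1024 * 1024;
    // A 10 MiB blob occupies three chunks (4 + 4 + 2 MiB).
    // Reading 2 MiB from offset 3 MiB crosses the boundary between chunks 0 and 1.
    for slice in chunks_for_range(10 * MIB, 3 * MIB, 2 * MIB) {
        println!("{:?}", slice);
    }
}
```

In this example only chunks 0 and 1 are touched, so a server serving the range would need to fetch two chunks rather than the whole blob.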

Deployed as kdb-blob-server (axum; listens on 0.0.0.0:7400 by default). RUST_LOG controls tracing output, and Prometheus metrics are exposed at /metrics.

Substrate

  • *Backend:* FsBlobStore (filesystem-rooted, content-addressed). v0.2 swaps in networked backends behind the same trait surface.
  • *Replication (BLOB):* pick_placement chooses N nodes per chunk; a production worker (BLOB) drives the actual fan-out.
  • *Erasure coding (BLOB):* Reed-Solomon 6+3 for the cold tier. Storing 9 shards for every 6 data shards costs 1.5× the raw size versus 3× for triple replication, i.e. 2× the space efficiency, while still tolerating 3 node losses.
  • *Lifecycle (BLOB):* hot → warm → cold → archive transitions driven by access patterns. The daemon scheduler is queued (BLOB).
  • *GC (BLOB):* refcount-based, with a grace period before space is reclaimed.
  • *Scrub (BLOB-):* a daemon walks chunks, verifies their BLAKE3 hashes, and repairs from peers when it detects corruption.
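
The refcount-with-grace-period GC can be illustrated with a small in-memory ledger. This is a sketch under assumptions: RefLedger and its methods are hypothetical (the crate's actual GcLedger API is not described here), timestamps are plain seconds, and a chunk becomes reclaimable only after its count has stayed at zero for the whole grace period; a new reference inside that window cancels the reclaim.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory refcount ledger for chunk GC.
/// Each entry holds (live references, time in seconds when the count reached zero).
struct RefLedger {
    grace_secs: u64,
    entries: HashMap<String, (u64, Option<u64>)>,
}

impl RefLedger {
    fn new(grace_secs: u64) -> Self {
        RefLedger { grace_secs, entries: HashMap::new() }
    }

    /// A blob (or multipart part) now references `chunk`.
    fn incr(&mut self, chunk: &str) {
        let entry = self.entries.entry(chunk.to_string()).or_insert((0, None));
        entry.0 += 1;
        entry.1 = None; // a new reference cancels any pending reclaim
    }

    /// A reference to `chunk` was dropped at time `now`.
    fn decr(&mut self, chunk: &str, now: u64) {
        if let Some(entry) = self.entries.get_mut(chunk) {
            entry.0 = entry.0.saturating_sub(1);
            if entry.0 == 0 && entry.1.is_none() {
                entry.1 = Some(now); // start the grace period
            }
        }
    }

    /// Remove and return (sorted) every chunk whose count has been zero for at
    /// least the grace period. A caller would then delete the chunk bytes.
    fn sweep(&mut self, now: u64) -> Vec<String> {
        let mut reclaimed = Vec::new();
        for (chunk, &(count, zero_since)) in &self.entries {
            if let (0, Some(t)) = (count, zero_since) {
                if now.saturating_sub(t) >= self.grace_secs {
                    reclaimed.push(chunk.clone());
                }
            }
        }
        for chunk in &reclaimed {
            self.entries.remove(chunk);
        }
        reclaimed.sort();
        reclaimed
    }
}

fn main() {
    let mut ledger = RefLedger::new(60);
    ledger.incr("chunk-a");
    ledger.decr("chunk-a", 100);
    // Still inside the 60 s grace period: nothing is reclaimed yet.
    println!("{:?}", ledger.sweep(130));
    // Once the grace period has elapsed the unreferenced chunk is reclaimed.
    println!("{:?}", ledger.sweep(160));
}
```

A plausible motivation for the grace period is that a concurrent upload may re-reference a chunk (content addressing makes identical chunks shareable) just after its last owner was deleted; delaying reclaim gives that reference time to land.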

Key features

  • [x] Single-shot PUT / GET / HEAD / DELETE (BLOB#002#003).
  • [x] Multipart resumable upload (BLOB-).
  • [x] Replication topology (BLOB) + erasure coding (BLOB).
  • [x] Lifecycle tiers (BLOB) + GC (BLOB) + scrub/repair (BLOB-).
  • [x] S3-compatible gateway: ListObjectsV2, SigV4 parsing (BLOB).
  • [x] FS backend (BLOB), HTTP server (BLOB), Prometheus + tracing (BLOB-/#014).
  • [x] GC + scrub daemons (BLOB-).
  • [x] Bench harness (BLOB-).
  • [x] koder_foundation_util::RateLimiter integration (BLOB-).
  • [x] BlobEventSink: PutAccepted / GetServed / ChunkScrubbed / GcSwept / ReplicationLag (BLOB-).
  • [x] #[instrument] parity on PUT/GET/HEAD/DELETE (BLOB-).
  • [x] Tenant isolation auth: Bearer token → allowed_tenants (BLOB-).
  • [x] Per-tenant rate limit on PUT (BLOB).
  • [x] Blob size cap (100 GB) + Content-Length validation (BLOB).
  • [x] kdb_blob_put_seconds / get_seconds per-size-bucket histograms (BLOB-).
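
The Bearer token → allowed_tenants check can be sketched as a lookup table. TokenTable and AuthError are hypothetical names, and the crate's actual TenantAuth type may be shaped differently; the tenant is assumed to come from the {tenant} segment of /v1/{tenant}/blobs/{id}.

```rust
use std::collections::{HashMap, HashSet};

/// Why a request was rejected (hypothetical error type).
#[derive(Debug, PartialEq, Eq)]
enum AuthError {
    MissingOrMalformed, // no "Authorization: Bearer <token>" header
    UnknownToken,       // token not present in the table
    TenantForbidden,    // valid token, but not for this tenant
}

/// Hypothetical table mapping each bearer token to the tenants it may access.
struct TokenTable {
    allowed_tenants: HashMap<String, HashSet<String>>,
}

impl TokenTable {
    fn new(entries: Vec<(&str, Vec<&str>)>) -> Self {
        let allowed_tenants: HashMap<String, HashSet<String>> = entries
            .into_iter()
            .map(|(token, tenants)| {
                let set: HashSet<String> = tenants.into_iter().map(String::from).collect();
                (token.to_string(), set)
            })
            .collect();
        TokenTable { allowed_tenants }
    }

    /// Check the raw Authorization header value against the {tenant} path segment.
    fn authorize(&self, authorization: Option<&str>, tenant: &str) -> Result<(), AuthError> {
        let token = authorization
            .and_then(|h| h.strip_prefix("Bearer "))
            .map(str::trim)
            .filter(|t| !t.is_empty())
            .ok_or(AuthError::MissingOrMalformed)?;
        let tenants = self.allowed_tenants.get(token).ok_or(AuthError::UnknownToken)?;
        if tenants.contains(tenant) {
            Ok(())
        } else {
            Err(AuthError::TenantForbidden)
        }
    }
}

fn main() {
    let table = TokenTable::new(vec![("tok-acme", vec!["acme"])]);
    println!("{:?}", table.authorize(Some("Bearer tok-acme"), "acme"));
    println!("{:?}", table.authorize(Some("Bearer tok-acme"), "globex"));
}
```

A handler would typically answer MissingOrMalformed and UnknownToken with 401 Unauthorized and TenantForbidden with 403 Forbidden; that status mapping is an assumption, not documented behaviour.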

SLO targets (RFC §SLOs)

  • p99 PUT (small blob ≤ 1 MB) ≤ *0 ms*
  • p99 GET (cached) ≤ *0 ms*
  • Durability: *1 9s* (replication + erasure coding).
  • Availability: *9.99%* (multi-region replication).
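
To check the small-PUT p99 target against raw samples (for example from the bench harness), a nearest-rank percentile over PUTs of at most 1 MB is enough. This sketch assumes 1 MB means 1 MiB and that each sample carries payload size and latency; PutSample and small_put_percentile are hypothetical. Prometheus histograms such as kdb_blob_put_seconds only yield bucket-interpolated quantile estimates, so the two numbers can differ.

```rust
/// One observed PUT (hypothetical sample format).
struct PutSample {
    size_bytes: u64,
    latency_ms: f64,
}

/// Upper bound for a "small" blob; 1 MB is read as 1 MiB here (assumption).
const SMALL_BLOB_MAX: u64 = 1024 * 1024;

/// Nearest-rank percentile `p` (1..=100) of latency over small PUTs only.
/// Returns None when there are no small PUTs.
fn small_put_percentile(samples: &[PutSample], p: usize) -> Option<f64> {
    let mut latencies: Vec<f64> = samples
        .iter()
        .filter(|s| s.size_bytes <= SMALL_BLOB_MAX)
        .map(|s| s.latency_ms)
        .collect();
    if latencies.is_empty() {
        return None;
    }
    latencies.sort_by(|a, b| a.total_cmp(b));
    let n = latencies.len();
    // Nearest rank = ceil(p / 100 * n), computed in integers; ranks are 1-based.
    let rank = ((p * n + 99) / 100).clamp(1, n);
    Some(latencies[rank - 1])
}

fn main() {
    // 100 small PUTs taking 1..=100 ms, plus one large PUT that must be ignored.
    let mut samples: Vec<PutSample> = (1..=100)
        .map(|ms| PutSample { size_bytes: 64 * 1024, latency_ms: ms as f64 })
        .collect();
    samples.push(PutSample { size_bytes: 10 * SMALL_BLOB_MAX, latency_ms: 5_000.0 });
    // prints Some(99.0)
    println!("{:?}", small_put_percentile(&samples, 99));
}
```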

Interfaces

  • HTTP (Koder native): PUT /v1/{tenant}/blobs/{id}, GET, HEAD, DELETE, /v1/{tenant}/blobs/{id}/head, /healthz, /metrics.
  • HTTP (S3-compatible): s3compat::list_objects_v2 and SigV4 parsing scaffolding (signature verification queued in BLOB).
  • Library: kdb_blob::{FsBlobStore, MultipartManager, lifecycle_evaluate, GcLedger, ScrubStats, HttpState, TenantAuth, BlobEventSink}.
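
A raw HTTP/1.1 ranged GET against the native interface, as a sketch. It assumes range reads are requested with a standard Range: bytes=first-last header (inclusive bounds), that the tenant-auth token travels in Authorization: Bearer, and that tenant and id need no percent-encoding; none of these request details are specified above. ranged_get_request and send are hypothetical helpers, and the tenant, id, and token values are made up.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build a raw HTTP/1.1 GET for bytes first..=last of a blob on the native
/// /v1/{tenant}/blobs/{id} path. Returns None for an inverted range.
/// Assumes tenant and id are already URL-safe (no percent-encoding is done).
fn ranged_get_request(host: &str, tenant: &str, id: &str, token: &str, first: u64, last: u64) -> Option<String> {
    if first > last {
        return None;
    }
    Some(format!(
        "GET /v1/{tenant}/blobs/{id} HTTP/1.1\r\n\
         Host: {host}\r\n\
         Authorization: Bearer {token}\r\n\
         Range: bytes={first}-{last}\r\n\
         Connection: close\r\n\
         \r\n"
    ))
}

/// Send a request and read the whole response (needs a running kdb-blob-server).
#[allow(dead_code)]
fn send(addr: &str, request: &str) -> std::io::Result<Vec<u8>> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(request.as_bytes())?;
    let mut response = Vec::new();
    stream.read_to_end(&mut response)?;
    Ok(response)
}

fn main() {
    // First 1 KiB of a blob for tenant "acme" (example values).
    let req = ranged_get_request("127.0.0.1:7400", "acme", "example-id", "tok-acme", 0, 1023)
        .expect("valid range");
    print!("{req}");
    // With a server on the default port: send("127.0.0.1:7400", &req)
}
```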

Open backlog

Four follow-ups are pending in infra/data/blob/backlog/pending/: per-tenant storage quota (BLOB), SigV4 signature verification (BLOB), the lifecycle daemon scheduler (BLOB), and replication worker fan-out (BLOB-).

Source: ../home/koder/dev/koder/meta/docs/stack/modules/data-blob.md