Pg dump v17

active

pg-dump v17 — Wire Compatibility Spec

Normative spec for what pg_dump --version 17 expects from a Postgres-wire server. Used as the contract that infra/data/kdb (kdbpgwire / kdbgateway) must satisfy to support pg_dump roundtrips, and as the sourceof-truth for the per-table coverage work tracked in infra/data/kdb#341.

1. Provenance

The catalogue of queries below was captured live from PostgreSQL 17.9 (Debian 17.9-0+deb13u1) using infra/data/kdb/scripts/pgdump-v17-capture.sh.

The script spins a throw-away PG 17 cluster, applies a varied schema (5 tables, 4 indexes including partial, 2 explicit sequences, FK + CHECK constraints, view, materialized view, enum type), and runs pg_dump under log_statement = all so the server records the exact stream of statements pg_dump issued.

Captures are checked in under meta/docs/stack/specs/postgres-compat/captures/pg-dump-v17-queries-<flag>.sql for the following flag combinations:

File suffix Flags
default (no flags)
schema-only --schema-only
data-only --data-only
no-owner-no-acl --no-owner --no-acl
table-users --table=users

Recapture by running the script — output is bytestable across runs of the same PG 17 minor (pg_dump does not emit non-deterministic queries).

2. Session prelude (every invocation)

The first ~13 statements in every pg_dump run are session-shape GUCs. Every server claiming pg_dump-compat MUST accept these without error (returning SET / SELECT is sufficient; the values can be ignored if the server has no equivalent knob):

SELECT pg_catalog.set_config('search_path', '', false);
SELECT pg_catalog.pg_is_in_recovery();
SELECT pg_catalog.set_config('search_path', '', false);
SET DATESTYLE = ISO;
SET INTERVALSTYLE = POSTGRES;
SET extra_float_digits TO 3;
SET synchronize_seqscans TO off;
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET transaction_timeout = 0;
SET row_security = off;
SELECT set_config(name, 'view, foreign-table', false)
  FROM pg_settings
  WHERE name = 'restrict_nonsystem_relation_kind';
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY;

The final two open a readonly repeatableread transaction that wraps the entire dump; servers without MVCC snapshots can accept these as no-ops provided they expose a stable read for the duration of the dump.

3. Catalogue queries — required surface

pg_dump v17 issues queries against the following pg_catalog relations. Servers MUST respond with the row shape PostgreSQL 17 produces; missing columns cause pg_dump to bail with a parser error in its PQgetvalue(... , <col>) calls. Functions in *old*are outofband helpers (not relations) — they MUST exist and return the documented type.

3.1 Relations (must be queryable, even if empty)

Relation Min. columns pg_dump v17 reads Notes
pg_roles oid, rolname Empty result OK if --no-owner
pg_extension tableoid, oid, extname, extrelocatable, extversion, extconfig, extcondition Empty result OK; pg_dump iterates the list
pg_depend classid, objid, refobjid (+ refclassid filter, deptype filter) Drives extension-membership detection
pg_namespace tableoid, oid, nspname, nspowner, nspacl (+ acldefault('n', nspowner)) At minimum public and pg_catalog
pg_class tableoid, oid, relname, relnamespace, relkind, reltype, relowner, relchecks, relhasindex, relhasrules, relpages, relhastriggers, relpersistence, reloftype, relacl, relfrozenxid, relreplident, relrowsecurity, relforcerowsecurity, relminmxid, reloptions, relispopulated, relispartition Drives the entire object stream — see §4 below
pg_proc tableoid, oid, proname, prolang, pronargs, proargtypes, prorettype, proacl, pronamespace, proowner, prokind Filter prokind <> 'a' (non-aggregates)
pg_type tableoid, oid, typname, typnamespace, typacl, typowner, typelem, typrelid, typtype, typisdefined Used to resolve column types
pg_language tableoid, oid, lanname, lanpltrusted, lanplcallfoid, laninline, lanvalidator, lanacl, lanowner Filter WHERE lanispl
pg_attribute attname, atttypmod, attstorage, attcompression, attmissingval, attstattarget, attndims, attlen, attbyval, attalign, attnotnull, atthasdef, attidentity, attgenerated, attisdropped, attislocal, attinhcount *-CRITICAL*— column shape drives CREATE TABLE reconstruction
pg_constraint oid, conname, contype, conrelid, conindid, connamespace, confupdtype, confdeltype, confmatchtype, condeferrable, condeferred, convalidated, conkey, confkey, confrelid CHECK / FK / UNIQUE reconstruction
pg_index indrelid, indkey, indcollation, indclass, indoption, indpred, indexprs, indisclustered, indisreplident, indisvalid, indisready, indislive Drives CREATE INDEX reconstruction
pg_inherits inhrelid, inhparent, inhseqno Partition / inheritance tree
pg_description objoid, classoid, objsubid, description COMMENT ON … reconstruction
pg_publication oid, pubname, pubowner, puballtables, pubinsert, pubupdate, pubdelete, pubtruncate, pubviaroot Empty result OK
pg_settings name, setting set_config(...) FROM pg_settings WHERE name = '…'
pg_foreign_table ftrelid, ftserver Empty result OK if no FDWs

3.2 Relations newly required by PG 17 (gaps vs PG 15/16)

These either did not exist or had different shape in earlier PG and are likely to be missing in a server claiming PG 13/14 compatibility:

Relation Why pg_dump v17 needs it
pg_cast Resolves implicit-cast paths during type reconstruction
pg_collation Percolumn / perindex collation names
pg_conversion Server-encoding conversions
pg_default_acl Default privileges from ALTER DEFAULT PRIVILEGES …
pg_event_trigger Event triggers (DDL hooks) — empty result usually OK
pg_init_privs Captures the "original ACL" assigned by an extension
pg_opclass Index operator class lookup
pg_operator Operator definitions (mostly transitive via opfamily)
pg_opfamily Index operator family
pg_publication_namespace Publicationbyschema (PG 15+)
pg_range Range type bounds
pg_rewrite View / materialized-view definitions (joined with pg_get_viewdef)
pg_seclabels Security labels (SECURITY LABEL …)
pg_sequence Sequence parameters (PG 10+ split-out of pg_class.relkind = 'S')
pg_statistic_ext Extended statistics objects
pg_tablespace Non-default tablespaces — usually empty in cloud setups
pg_transform Transform mappings (e.g. plpython hstore)
pg_ts_config / pg_ts_dict / pg_ts_parser / pg_ts_template Fulltextsearch assets

3.3 Functions

Function Return When pg_dump v17 calls it
pg_is_in_recovery() bool Session prelude — almost certainly returns false
pg_get_viewdef(oid) text View / materialized-view body
pg_get_indexdef(oid) text CREATE INDEX … reconstruction
pg_get_constraintdef(oid) text CHECK / FK / UNIQUE body
pg_get_triggerdef(oid) text Trigger reconstruction
pg_get_expr(node, relid) text Defaultvalue / partialindex predicate body
pg_get_serial_sequence(table, col) text Reidrata SERIAL / IDENTITY columns
pg_options_to_table(text[]) setof FDW server OPTIONS decoding
acldefault(char, oid) aclitem[] Default-ACL synthesis (called inline in §3.1 queries)
array_remove(any[], any) any[] Used inline to drop check_option=… from reloptions

3.4 PG 17 row-shape additions

The big v15→v17 movement in pg_attribute matters:

  • attstorage is "char" in v17 (was oid historically; some legacy

    shims still return oid — pg_dump v17 will reject those rows).

  • attcompression exists from v14+ ("char", default 'p' for pglz).
  • attmissingval is anyarray (often emitted as text).
  • attstattarget is int2 (became nullable in v17 — NULL allowed).
  • attgenerated is "char" ('s' for STORED, '' for none).
  • attidentity is "char" ('a' for ALWAYS, 'd' for BY DEFAULT, '' for none).

pg_class.relrowsecurity and relforcerowsecurity are bool (RLS-related). pg_class.relispartition is bool (PG 10+); MUST exist even on non-partitioned tables (default false).

4. Object-fetch loop (the heart of pg_dump)

After the session prelude, pgdump runs *ne large `pgclass` query*to materialize the entire tableofobjects:

SELECT c.tableoid, c.oid, c.relname, c.relnamespace, c.relkind, c.reltype,
       c.relowner, c.relchecks, c.relhasindex, c.relhasrules, c.relpages,
       c.relhastriggers, c.relpersistence, c.reloftype, c.relacl,
       acldefault(CASE WHEN c.relkind = 'S' THEN 's'::"char" ELSE 'r'::"char" END,
                  c.relowner) AS acldefault,
       CASE WHEN c.relkind = 'f'
            THEN (SELECT ftserver FROM pg_catalog.pg_foreign_table WHERE ftrelid = c.oid)
            ELSE 0 END AS foreignserver,
       c.relfrozenxid, tc.relfrozenxid AS tfrozenxid, tc.oid AS toid,
       tc.relpages AS toastpages, tc.reloptions AS toast_reloptions,
       d.refobjid AS owning_tab, d.refobjsubid AS owning_col,
       tsp.spcname AS reltablespace, false AS relhasoids,
       c.relispopulated, c.relreplident, c.relrowsecurity,
       c.relforcerowsecurity, c.relminmxid, tc.relminmxid AS tminmxid,
       array_remove(array_remove(c.reloptions, 'check_option=local'),
                                  'check_option=cascaded') AS reloptions,
       CASE WHEN 'check_option=local' = ANY (c.reloptions) THEN 'LOCAL'::text
            WHEN 'check_option=cascaded' = ANY (c.reloptions) THEN 'CASCADED'::text
            ELSE NULL END AS checkoption,
       am.amname, (d.deptype = 'i') IS TRUE AS is_identity_sequence,
       c.relispartition AS ispartition
FROM pg_class c …

(Truncated — see captures/pg-dump-v17-queries-schema-only.sql for the full text.) The relkind values returned drive the rest of the dump:

relkind Followed by
'r' pg_attributepg_constraintpg_indexLOCK TABLE … ACCESS SHARECOPY … TO STDOUT (data-only)
'v' pg_get_viewdef(oid)
'm' pg_get_viewdef(oid) (materialized) + REFRESH MATERIALIZED VIEW directive
'S' pg_sequence row + setval('…', last_value, is_called)
'i' Index — usually emitted by following pg_index
'p' Partitioned table — followed by pg_inherits to enumerate children
'c' Composite type — pg_dump may skip if the type is system-managed
'f' Foreign table — pulls the server name from the foreignserver correlated subquery

The LOCK TABLE … IN ACCESS SHARE MODE is non-negotiable: servers MUST either implement table locks or accept the syntax as a no-op (returning LOCK TABLE). Without it, pg_dump v17 aborts the dump.

5. Data path (--data-only)

For each non-system table whose relkind = 'r', pg_dump issues:

COPY public.users (id, handle, email, score, created_at,
                   tags, metadata, avatar, status) TO STDOUT;

The column list is materialized from the pg_attribute query above. The server MUST emit the response on the wire as CopyOutResponse → CopyData* → CopyDone → CommandComplete("COPY n"). Text format is the default; binary is not used by pg_dump v17.

Implementation in kdbpgwire: ticket [infradatakdb#340](......infradatakdbbacklogdone340pgwirecopyto-stdout.kmd).

6. Outofscope for v17 compat

  • *oreignkey reconstruction across schemas with `-schema=…`*—

    pg_dump v17 chases FKs to outside-schema parents; the audit script pins to public so the captures do not exercise this corner.

  • *erverside COMMENT ON LARGE OBJECT*— largeobject dump uses a

    separate code path (--blobs / --no-blobs) not exercised by the audit. Tracked separately if/when lobs become a goal.

  • *arallel pg_dump (-j N)*— parallel restore uses extra COPY

    sessions but the per-session query stream is the same.

  • *pg_dump -nserts** — uses INSERT INTO … VALUES (…)` instead of

    COPY. Audit captures focus on the default COPY path.

7. Versioning

Captures are pinned to PG 17.9. Re-run the capture script when:

  1. PostgreSQL 18 ships and pg_dump v18 diverges (verify with a fresh

    capture against PG 18 in /usr/lib/postgresql/18/bin/).

  2. A user reports a pg_dump failure on kdb-gateway that the diff report

    does not cover.

See kdb-gateway-v0.0.13-vs-pg17-pgdump.md for the perquery coverage delta against the current kdbpgwire surface.

8. Conformance status (20260511)

The 8 originallyidentified RCRITICAL gaps have been closed across a multiday series of fixes culminating in the FK roundtrip milestone. Latest harness state (scripts/pg-dump-roundtrip.sh):

Check State
pg_dump --schema-only exit code **✓
pg_dump --data-only exit code **✓
psql -f kdb-schema.sql (PG 17) *c=0*✓
psql -f kdb-data.sql (PG 17) *c=0*✓
Rowcount roundtrip *00% match*✓
FK guard (orphan INSERT rejected) *locked*(enforced) ✓
Schema diff lines *5*(was 77 originally)

Closed work-items

  • #341 umbrella — pg_catalog coverage for the 8 R-CRITICAL gaps
  • #342 — round-trip test harness
  • #343 — extended-query set_config emission
  • #344 (16 waves) — universal tableoid + per-table column gaps
  • #345 — planner IS TRUE / IS FALSE
  • #346 — subquery catalog inheritance
  • #347IS DISTINCT FROM
  • #348 — text [N] subscript
  • #349ARRAY (SELECT …) constructor
  • #350Text → Array<T> cast
  • #351pg_catalog. prefix-strip for aggregates
  • #353 — FK round-trip umbrella (PK ALTER, FK ALTER, conindid linkage)
  • #357 — regclass cast → oid
  • #360 — FK capture under tenant config
  • #361 — synthetic pgindex + pgclass for PK/UNIQUE

Open polish items (post-milestone)

  • #352 — schema-diff cosmetic cleanup (35 lines remaining; \restrict

    random tokens unavoidable; public SCHEMA placeholder block deferred)

  • #359 — public SCHEMA block suppression (requires pg_dump v17 source

    vendoring to map the suppression algorithm)

Regression coverage

infra/data/kdb/crates/kdb-pgwire/tests/pgwire_fk_roundtrip.rs guards the milestone's three load-bearing wirings (see registry entries 518-520).

Source: ../home/koder/dev/koder/meta/docs/stack/specs/postgres-compat/pg-dump-v17.kmd