tuning-incremental-sync-configद्वारा posthog

A sync's configuration lives on the ExternalDataSchema and can be changed any time via external-data-schemas-partial-update . Most changes are non-destructive (take effect on the next sync), but a few (switching sync_type, changing primary keys) require careful handling to avoid corrupting the synced data.

npx skills add https://github.com/posthog/ai-plugin --skill tuning-incremental-sync-config

Tuning incremental sync config

A sync's configuration lives on the ExternalDataSchema and can be changed any time via external-data-schemas-partial-update. Most changes are non-destructive (take effect on the next sync), but a few (switching sync_type, changing primary keys) require careful handling to avoid corrupting the synced data.

When to use this skill

  • The user wants to change how an already-connected table is synced
  • A diagnosis flagged the incremental field or primary key as wrong
  • The table is syncing too often / not often enough
  • Switching an incremental table to CDC (or vice versa)
  • The source table was changed on the other side (new columns, dropped columns) and the sync config needs to catch up

If the user is setting up a brand-new source, use setting-up-a-data-warehouse-source instead — configuration is chosen at creation time there.

Available tools

ToolPurpose
external-data-schemas-retrieveCurrent sync_type, incremental_field, PKs, sync_frequency
external-data-schemas-incremental-fields-createRefresh candidate incremental fields from the live source
external-data-schemas-partial-updateApply the config change
external-data-schemas-reloadTrigger a sync with the new config
external-data-schemas-resyncWipe and re-import from scratch when the change invalidates existing data
external-data-schemas-delete-dataDrop the synced table while keeping the schema entry
external-data-sources-check-cdc-prerequisites-createPre-flight Postgres CDC (only when switching to/from CDC)
external-data-sources-webhook-info-retrieveCurrent webhook state (when switching to/from sync_type=webhook)
external-data-sources-create-webhook-createRegister a webhook after switching a schema to sync_type=webhook
external-data-sources-update-webhook-inputs-createRotate a webhook signing secret
external-data-sources-delete-webhook-createUnregister webhook when switching schemas off sync_type=webhook

The fields you can tune

From the partial-update endpoint:

FieldValuesNotes
sync_typefull_refresh, incremental, append, cdc, webhookSource must support the target type — check via incremental-fields
incremental_fieldColumn name from the sourceMust appear in incremental_fields list for the schema
incremental_field_typedatetime, date, timestamp, integer, numeric, objectidMust match the column's real type
primary_key_columnsArray of column namesRequired for CDC. Used for upsert dedup on incremental
cdc_table_modeconsolidated, cdc_only, bothOnly meaningful when sync_type=cdc
sync_frequency1min, 5min, 15min, 30min, 1hour, 6hour, 12hour, 24hour, 7day, 30day, neverApplies to all non-CDC types
sync_time_of_dayHH:MM:SSWhen sync_frequency is daily/weekly-scale
should_synctrue / falsePause the schema without deleting it

Workflow

Step 1 — Read the current config

Always start with external-data-schemas-retrieve({id}). Understanding the current state prevents mistakes like "fixing" an incremental_field that's actually correct.

Note:

  • Current sync_type, incremental_field, incremental_field_type, primary_key_columns
  • Current status (don't tune a schema that's currently Running — wait or cancel first)
  • last_synced_at (so you can tell if the next sync worked)
  • latest_error if present (the error often tells you exactly what to change)

Step 2 — If changing sync_type or incremental_field, refresh candidates

Call external-data-schemas-incremental-fields-create({id}). Even though the operation name says "create", it re-reads the source and returns the current candidate fields — use it to confirm the field you want to set actually exists on the source and which sync types are now available for this table.

The response:

{
  "incremental_fields": [{"field": "updated_at", "type": "datetime", ...}, ...],
  "incremental_available": true,
  "append_available": true,
  "cdc_available": true,
  "full_refresh_available": true,
  "detected_primary_keys": ["id"],
  "available_columns": [...]
}

If your target incremental_field isn't in the list, tell the user — they need to either pick a different field or change the source table to add one.

Step 3 — Apply the change

Call external-data-schemas-partial-update({id}, {...changed fields}).

Only send the fields that are actually changing. Partial update means unspecified fields stay as they are.

Examples:

// Switch from full_refresh to incremental
{
  "sync_type": "incremental",
  "incremental_field": "updated_at",
  "incremental_field_type": "datetime"
}

// Change sync frequency to hourly
{"sync_frequency": "1hour"}

// Fix wrong PK on a CDC table
{"primary_key_columns": ["tenant_id", "order_id"]}

// Pause a schema
{"should_sync": false}

Step 4 — Decide whether existing data is still valid

This is the step that's easy to get wrong. Some config changes invalidate the synced data; others don't.

Changes that DON'T invalidate existing data:

  • sync_frequency, sync_time_of_day — scheduling only
  • should_sync — on/off
  • cdc_table_mode in most cases — next sync will start writing to the new shape, but historical consolidated rows stay valid
  • Switching between incremental and full_refresh with the same incremental_field — next sync just re-runs fresh
  • Switching to or from sync_type: "webhook" — the synced data stays valid; only the ingestion path changes. Remember to register or unregister the webhook (see sections below) alongside the sync_type change.

Changes that MAY invalidate existing data and need a resync:

  • Changing incremental_field to a different column — the high-water mark is from the old column and won't match. Without a resync you'll miss rows that were updated between the two fields' histories.
  • Changing primary_key_columns — existing rows may be deduplicated incorrectly against new PK definitions.
  • Switching from full_refresh to append — the existing rows don't have the version-history shape that append expects.
  • Switching from append to full_refresh — opposite problem; you'll end up with duplicate historical versions.
  • Switching to/from cdc — the table shape changes fundamentally.

When the change invalidates data, the clean flow is:

  1. external-data-schemas-partial-update with the new config
  2. Warn the user this is destructive
  3. external-data-schemas-resync to wipe and re-import under the new config

Or equivalently, external-data-schemas-delete-dataexternal-data-schemas-reload. delete-data + reload is cleaner when the table is large and the user wants to start from zero.

Step 5 — Trigger and confirm

For non-destructive changes, call external-data-schemas-reload({id}) to pick up the new config immediately rather than waiting for the schedule.

Wait a moment, then external-data-schemas-retrieve({id}) to confirm status = Running then Completed. Report last_synced_at and any new latest_error.

Specific common changes

Switching full_refresh → incremental

  1. incremental-fields-create to confirm the desired field exists and incremental_available: true.
  2. partial-update: {sync_type: "incremental", incremental_field, incremental_field_type}.
  3. No data wipe needed — next sync just switches strategy. If the source is growing fast, the next incremental sync is the cheap one.

Switching incremental → cdc (Postgres only)

  1. Run external-data-sources-check-cdc-prerequisites-create on the parent source. Only proceed if valid: true.
  2. incremental-fields-create to confirm cdc_available: true and see detected_primary_keys.
  3. partial-update: {sync_type: "cdc", primary_key_columns: [...], cdc_table_mode: "consolidated"}.
  4. Resync required — CDC tables have a different shape. Trigger external-data-schemas-resync after the update. Warn the user this wipes existing data.

Fixing a stale incremental field after schema drift

Source dropped the updated_at column. Sync has been failing with "column does not exist".

  1. incremental-fields-create to see what fields remain.
  2. Pick a replacement (or switch to full_refresh if none are suitable).
  3. partial-update with the new field + type (or new sync_type).
  4. reload to retry.

Changing primary keys on a CDC table

  1. partial-update: {primary_key_columns: [...]}.
  2. Resync required — existing CDC tombstones and upsert keys won't match the new PK definition, leading to row duplication or missed updates.
  3. resync, warn the user.

Changing sync_frequency

  1. partial-update: {sync_frequency: "1hour"}.
  2. No reload needed — the next scheduled sync picks up the new cadence. Or reload manually if the user wants to confirm nothing broke.

Switching a schema to sync_type: "webhook"

Only works for sources that implement WebhookSource (today: Stripe) and tables where supports_webhooks: true from incremental-fields-create.

  1. incremental-fields-create to confirm supports_webhooks: true for the table.
  2. partial-update: {sync_type: "webhook"}.
  3. If the source doesn't already have a webhook registered (check with webhook-info-retrieve), call external-data-sources-create-webhook-create({source_id}) to register it.
  4. No resync required — the schema's existing bulk-synced data stays, and the webhook becomes the primary ingestion path once the next reconciliation finishes.
  5. Keep sync_frequency set (e.g. 24hour) — it acts as a safety-net reconciliation in case any webhook delivery is missed.

Switching off sync_type: "webhook"

  1. partial-update: {sync_type: "incremental"} (or whatever bulk type is appropriate) with the required incremental_field + incremental_field_type.
  2. If no other schemas on the source are still using sync_type: "webhook", call external-data-sources-delete-webhook-create({source_id}) to unregister. Leaving an orphaned webhook registered on the source side just means events will be received and dropped — not harmful, but messy.
  3. If other schemas on the source are still on webhook, leave the webhook registered — it's shared across all webhook-type schemas on the source.

Rotating a webhook signing secret

The source's signing secret (e.g. Stripe's whsec_...) was rotated, and payloads are now failing signature verification.

  1. Grab the new secret from the source's dashboard.
  2. external-data-sources-update-webhook-inputs-create({source_id}, {inputs: {signing_secret: "whsec_..."}}).
  3. No reload needed — the next inbound webhook payload will verify against the new secret.

Pausing a schema

  1. partial-update: {should_sync: false}. Schema stops syncing but stays configured.
  2. To resume later: partial-update: {should_sync: true}, then reload for an immediate run.

Important notes

  • Read before you write. Always retrieve the current config first. partial-update doesn't complain if you set a field to the value it already had, but you might be about to change something you didn't realize was already set.
  • Not every sync_type is available on every schema. The incremental-fields-create response tells you what's available right now, which can be different from what was available at creation (e.g. CDC may have been enabled for the team since).
  • Wipe when the shape changes. Switching sync strategy often changes the physical table. If you don't resync, you'll be mixing row shapes and queries will return garbage.
  • CDC needs prerequisites. Never switch to sync_type: "cdc" without running check-cdc-prerequisites-create first. The sync will just fail immediately.
  • Don't touch a Running schema. If the schema is currently running, either wait for it to finish or external-data-schemas-cancel before applying the change. Updating config mid-sync can leave the incremental high-water mark inconsistent.
  • Sync frequency is cheap to change. Encourage experimentation there. Sync_type and incremental_field are expensive to change — encourage care.
  • Webhooks are registered at the source level, not the schema level. Multiple webhook-type schemas on the same source share one webhook registration. Only delete the webhook when the last webhook-type schema on that source is being switched away, otherwise other schemas stop receiving pushes.

NotebookLM Web Importer

एक क्लिक में वेब पेज और YouTube वीडियो NotebookLM में आयात करें। 200,000+ उपयोगकर्ताओं द्वारा विश्वसनीय।

Chrome एक्सटेंशन इंस्टॉल करें