neon-postgres-egress-optimizer

Author: neondatabase

npx skills add https://github.com/neondatabase/agent-skills --skill neon-postgres-egress-optimizer

Postgres Egress Optimizer

Guide the user through diagnosing and fixing application-side query patterns that cause excessive data transfer (egress) from their Postgres database. Most high egress bills come from the application fetching more data than it uses.

Step 1: Diagnose

Identify which queries transfer the most data. The primary tool is the pg_stat_statements extension.

Check if pg_stat_statements is available

SELECT 1 FROM pg_stat_statements LIMIT 1;

If this errors, the extension needs to be created:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

On Neon, the extension is available by default, but you may still need to run this CREATE EXTENSION step in the database you are inspecting.

Handle empty stats

Stats are cleared when a Neon compute scales to zero and restarts. If the stats are empty or the compute recently woke up:

  1. Reset the stats to start a clean measurement window: SELECT pg_stat_statements_reset();
  2. Let the application run under representative traffic for at least an hour.
  3. Return and run the diagnostic queries below.

If the user has stats from a production database, use those. If they have no access to production stats, proceed to Step 2 and analyze the codebase directly — code-level patterns are often sufficient to identify the worst offenders.

Diagnostic queries

Run these to identify the top egress contributors. Focus on queries that return many rows, return wide rows (JSONB, TEXT, BYTEA columns), or are called very frequently.

Queries returning the most total rows:

SELECT query, calls, rows AS total_rows, round(rows::numeric / calls, 1) AS avg_rows_per_call
FROM pg_stat_statements
WHERE calls > 0
ORDER BY rows DESC
LIMIT 10;

Queries returning the most rows per execution (poorly scoped SELECTs, missing pagination):

SELECT query, calls, rows AS total_rows, round(rows::numeric / calls, 1) AS avg_rows_per_call
FROM pg_stat_statements
WHERE calls > 0
ORDER BY avg_rows_per_call DESC
LIMIT 10;

Most frequently called queries (candidates for caching):

SELECT query, calls, rows AS total_rows, round(rows::numeric / calls, 1) AS avg_rows_per_call
FROM pg_stat_statements
WHERE calls > 0
ORDER BY calls DESC
LIMIT 10;

Queries with the highest cumulative execution time (not a direct egress measure, but helps identify problem queries during a spike; the total_exec_time column requires PostgreSQL 13 or later):

SELECT query, calls, rows AS total_rows,
  round(total_exec_time::numeric, 2) AS total_exec_time_ms
FROM pg_stat_statements
WHERE calls > 0
ORDER BY total_exec_time DESC
LIMIT 10;

Interpret the results

Rank findings by estimated egress impact:

  • High row count + wide rows = biggest egress. A query returning 1,000 rows where each row includes a 50KB JSONB column transfers ~50MB per call.
  • Extreme call frequency on even small queries adds up. A query called 50,000 times/day returning 10 rows each = 500,000 rows/day.
  • Cross-reference with the schema to identify which columns are wide. Look for JSONB, TEXT, BYTEA, and large VARCHAR columns.
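
The arithmetic in the bullets above can be sketched as a small helper. This is only a rough estimate: `estimateEgress` and its field names are hypothetical, the average row width is something you have to measure or guess, wire-protocol overhead is ignored, and decimal units (1 KB = 1,000 bytes) are assumed.

```typescript
// Hypothetical helper: rough egress estimate for one query pattern.
// Assumes decimal units (1 KB = 1,000 bytes) and ignores protocol overhead.
interface QueryProfile {
  rowsPerCall: number; // e.g. rows / calls from pg_stat_statements
  avgRowBytes: number; // assumed average row width in bytes
  callsPerDay: number; // assumed daily call volume
}

interface EgressEstimate {
  bytesPerCall: number;
  bytesPerDay: number;
}

function estimateEgress(profile: QueryProfile): EgressEstimate {
  const bytesPerCall = profile.rowsPerCall * profile.avgRowBytes;
  return { bytesPerCall, bytesPerDay: bytesPerCall * profile.callsPerDay };
}

// The wide-row example from the text: 1,000 rows x 50 KB each.
const wide = estimateEgress({ rowsPerCall: 1_000, avgRowBytes: 50_000, callsPerDay: 1 });
console.log(wide.bytesPerCall / 1_000_000, "MB per call");
```

Sorting candidate queries by `bytesPerDay` gives a first-pass ranking; the row-width input is usually the least certain number, so treat the output as an ordering aid rather than a bill forecast.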

Step 2: Analyze codebase

For each query identified in Step 1, or for each database query in the codebase if no stats are available, check:

  • Does it select only the columns the response needs?
  • Does it return a bounded number of rows (LIMIT/pagination)?
  • Is it called frequently enough to benefit from caching?
  • Does it fetch raw data that gets aggregated in application code?
  • Does it use a JOIN that duplicates parent data across child rows?
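
When no stats are available, a first pass over raw SQL strings in the codebase can be partly automated. The sketch below is a naive, regex-based heuristic, not a parser: `auditQuery` and the `Finding` labels are hypothetical, it will not see ORM-generated queries, and it can misfire on subqueries, comments, or aggregate queries that legitimately have no LIMIT.

```typescript
// Hypothetical heuristic checker for SQL strings found in application code.
// Flags only; every finding still needs a human look.
type Finding = "select-star" | "no-limit" | "join";

function auditQuery(sql: string): Finding[] {
  const text = sql.replace(/\s+/g, " ").trim();
  const findings: Finding[] = [];
  if (!/^select\b/i.test(text)) return findings; // only plain SELECTs are checked
  if (/\bselect\s+(\w+\.)?\*/i.test(text)) findings.push("select-star");
  if (!/\blimit\s+(\d+|\$\d+|\?)/i.test(text)) findings.push("no-limit");
  if (/\bjoin\b/i.test(text)) findings.push("join");
  return findings;
}

// Usage: run it over the queries from this guide.
console.log(auditQuery("SELECT * FROM products;"));
console.log(auditQuery("SELECT id, name, price FROM products ORDER BY id LIMIT 50 OFFSET 0;"));
```

The remaining checklist items (call frequency and in-application aggregation) depend on runtime behavior and surrounding code, so they are left to manual review.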

Step 3: Fix

Apply the appropriate fix for each problem found. Below are the most common egress anti-patterns and how to fix them.

Unused columns (SELECT *)

Problem: The query fetches all columns but the application only uses a few. Large columns (JSONB blobs, TEXT fields) get transferred over the wire and discarded.

Before:

SELECT * FROM products;

After:

SELECT id, name, price, image_urls FROM products;
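
One way to keep column lists explicit in application code is to declare, per endpoint, the columns the response actually renders and build the SELECT from that list. A minimal sketch, assuming hand-written SQL rather than an ORM; `PRODUCT_LIST_COLUMNS` and `buildSelect` are hypothetical names, and the identifier check is deliberately strict (lowercase letters, digits, underscores).

```typescript
// Hypothetical allow-list of the columns the product list endpoint renders.
const PRODUCT_LIST_COLUMNS = ["id", "name", "price", "image_urls"] as const;

// Builds a SELECT with an explicit column list; rejects "*" and anything
// that is not a plain lowercase identifier.
function buildSelect(table: string, columns: readonly string[]): string {
  const ident = /^[a-z_][a-z0-9_]*$/;
  const valid = ident.test(table) && columns.length > 0 && columns.every((c) => ident.test(c));
  if (!valid) {
    throw new Error("invalid table or column name");
  }
  return `SELECT ${columns.join(", ")} FROM ${table}`;
}

console.log(buildSelect("products", PRODUCT_LIST_COLUMNS));
```

Adding a wide column to the response then becomes a visible, reviewable change to the list instead of a silent side effect of a schema migration.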

Missing pagination

Problem: A list endpoint returns all rows with no LIMIT. This is an unbounded egress risk — every new row in the table increases data transfer on every request. Flag this regardless of current table size.

This is easy to miss because the application may work fine with small datasets. But at scale, an unpaginated endpoint returning 10,000 rows with even moderate column widths can transfer hundreds of megabytes per day.

Before:

SELECT id, name, price FROM products;

After:

SELECT id, name, price FROM products
ORDER BY id
LIMIT 50 OFFSET 0;

When adding pagination, check whether the consuming client already supports paginated responses. If not, pick sensible defaults and document the pagination parameters in the API.
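
Picking sensible defaults can look like the sketch below, which turns untrusted query-string values into a bounded LIMIT/OFFSET pair. `parsePage`, the `limit`/`page` parameter names, and the `MAX_LIMIT` cap of 200 are assumptions for illustration; the default of 50 mirrors the example query above.

```typescript
// Hypothetical pagination helper for an HTTP list endpoint.
const DEFAULT_LIMIT = 50; // matches the LIMIT 50 example
const MAX_LIMIT = 200; // assumed upper bound; tune for your API

interface Page {
  limit: number;
  offset: number;
}

function parsePage(rawLimit?: string, rawPage?: string): Page {
  const limit = Number.parseInt(rawLimit ?? "", 10);
  const page = Number.parseInt(rawPage ?? "", 10);
  // Missing, non-numeric, or non-positive values fall back to the defaults.
  const safeLimit = Number.isFinite(limit) && limit > 0 ? Math.min(limit, MAX_LIMIT) : DEFAULT_LIMIT;
  const safePage = Number.isFinite(page) && page > 0 ? page : 1;
  return { limit: safeLimit, offset: (safePage - 1) * safeLimit };
}

// Clients that send no parameters still get a bounded first page.
const firstPage = parsePage();
const pagedSql = "SELECT id, name, price FROM products ORDER BY id LIMIT $1 OFFSET $2";
console.log(pagedSql, [firstPage.limit, firstPage.offset]);
```

The cap matters as much as the default: without it, a client can request `limit=1000000` and recreate the unbounded query.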

High-frequency queries on static data

Problem: A query is called thousands of times per day but returns data that rarely changes. Every call transfers the same rows from the database. This pattern is only visible from pg_stat_statements — the code itself looks normal.

Look for queries with extremely high call counts relative to other queries. Common examples: configuration tables, category lists, feature flags, user role definitions.

Fix: Add a caching layer between the application and the database so it avoids hitting the database on every request.
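
A minimal in-memory version of such a caching layer is sketched below. `TtlCache` and the loader function are hypothetical, the 60-second TTL is an arbitrary choice, and the cache is per process: with several application instances, each keeps its own copy, so a shared cache may fit better.

```typescript
// Minimal in-memory TTL cache sketch. The clock is injectable for testing.
class TtlCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value while it is fresh; otherwise calls `load`.
  get(key: string, load: () => T): T {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.value;
    const value = load(); // only this path reaches the database
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

// Usage with a fake loader standing in for a database query.
let dbCalls = 0;
const categoryCache = new TtlCache<string[]>(60_000);
const loadCategories = (): string[] => {
  dbCalls += 1;
  return ["books", "games"];
};
categoryCache.get("categories", loadCategories);
categoryCache.get("categories", loadCategories);
console.log(dbCalls); // the second call is served from memory
```

With an async database client, `T` can be a `Promise`, which also collapses concurrent requests into one query; in that case a rejected promise stays cached until the TTL expires, so real code would want to evict on failure.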

Application-side aggregation

Problem: The application fetches all rows from a table and then computes aggregates (averages, counts, sums, groupings) in application code. The full dataset transfers over the wire even though the result is a small summary.

Fix: Push the aggregation into SQL.

Before: The application fetches entire tables and aggregates in code with loops or .reduce().

After:

SELECT p.category_id,
       AVG(r.rating) AS avg_rating,
       COUNT(r.id) AS review_count
FROM reviews r
INNER JOIN products p ON r.product_id = p.id
GROUP BY p.category_id;
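
For recognizing the "before" pattern in a codebase, this is roughly what the in-application equivalent of the query above looks like. The row shapes are assumed from the SQL, `averageRatingByCategory` is a hypothetical name, and ratings are assumed non-null. Both input arrays are entire tables pulled over the wire, while the output is one small row per category.

```typescript
// Sketch of the anti-pattern: aggregate in application code after fetching everything.
interface ReviewRow { id: number; product_id: number; rating: number; }
interface ProductRow { id: number; category_id: number; }
interface CategoryStats { category_id: number; avg_rating: number; review_count: number; }

function averageRatingByCategory(reviews: ReviewRow[], products: ProductRow[]): CategoryStats[] {
  const categoryOf = new Map<number, number>();
  for (const p of products) categoryOf.set(p.id, p.category_id);

  const totals = new Map<number, { sum: number; count: number }>();
  for (const r of reviews) {
    const categoryId = categoryOf.get(r.product_id);
    if (categoryId === undefined) continue; // like INNER JOIN, drop unmatched reviews
    const t = totals.get(categoryId) ?? { sum: 0, count: 0 };
    t.sum += r.rating;
    t.count += 1;
    totals.set(categoryId, t);
  }

  return Array.from(totals, ([category_id, t]) => ({
    category_id,
    avg_rating: t.sum / t.count,
    review_count: t.count,
  }));
}
```

Because the output shape matches the SQL version, a function like this can be kept temporarily in tests to confirm that the pushed-down query returns the same summary.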

JOIN duplication

Problem: A JOIN between a wide parent table and a child table duplicates all parent columns across every child row. If a product has 200 reviews and the product row includes a 50KB JSONB column, the join sends that 50KB × 200 = ~10MB for a single request.

This is distinct from the SELECT * problem. Even if you select only needed columns, a JOIN still repeats the parent data for every child row. The fix is structural: avoid the join entirely.

Before:

SELECT * FROM products
LEFT JOIN reviews ON reviews.product_id = products.id
WHERE products.id = 1;

After (two separate queries):

SELECT id, name, price, description, image_urls FROM products WHERE id = 1;
SELECT id, user_name, rating, body FROM reviews WHERE product_id = 1;

Two queries instead of one JOIN. The product data is fetched once. The reviews are fetched once. No duplication.
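
After the split, the application stitches the two result sets together itself. A minimal sketch, assuming row shapes based on the two SELECTs above (`assembleProduct`, `parentBytesSent`, and the field types are hypothetical; for instance, many drivers return numeric prices as strings):

```typescript
// Assumed row shapes for the two queries.
interface Product { id: number; name: string; price: number; description: string; image_urls: string[]; }
interface Review { id: number; user_name: string; rating: number; body: string; }
interface ProductWithReviews extends Product { reviews: Review[]; }

// Combine the single product row with its review rows in application code.
function assembleProduct(productRows: Product[], reviewRows: Review[]): ProductWithReviews | null {
  const product = productRows[0];
  if (product === undefined) return null; // product not found
  return { ...product, reviews: reviewRows };
}

// Rough count of parent-row bytes sent: a JOIN repeats the parent per child row
// (a LEFT JOIN with no children still returns the parent once).
function parentBytesSent(parentRowBytes: number, childRows: number): { join: number; split: number } {
  return { join: parentRowBytes * Math.max(childRows, 1), split: parentRowBytes };
}

// The example from the text: a 50 KB parent row and 200 reviews.
console.log(parentBytesSent(50_000, 200));
```

The extra round trip is the trade-off; for a wide parent row and many children, the saved bytes usually outweigh it.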

Step 4: Verify

After applying fixes:

  1. Run existing tests to confirm nothing broke.
  2. Check the responses — make sure the API still returns the same data shape. Column selection and pagination changes can break clients that depend on specific fields or full result sets.
  3. Measure the improvement — if pg_stat_statements data is available, reset it (SELECT pg_stat_statements_reset();), let traffic run, then re-run the diagnostic queries to compare before and after.
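
The before/after comparison in step 3 can be reduced to a rows-per-call change for each query you fixed. A sketch, with `rowsPerCallChange` and `StatRow` as hypothetical names: the two inputs are the `calls` and `rows` values you recorded for the same logical query before and after the fix. A rewritten query appears under new query text in pg_stat_statements, so the pairing has to be done by hand.

```typescript
// Values copied from the diagnostic queries for one logical query.
interface StatRow {
  calls: number;
  rows: number;
}

// Percentage change in average rows per call; negative means fewer rows returned.
function rowsPerCallChange(before: StatRow, after: StatRow): number {
  if (before.calls <= 0 || after.calls <= 0 || before.rows <= 0) {
    throw new Error("need non-empty stats on both sides");
  }
  const perCallBefore = before.rows / before.calls;
  const perCallAfter = after.rows / after.calls;
  return ((perCallAfter - perCallBefore) / perCallBefore) * 100;
}

// Example: an unpaginated list endpoint before and after adding LIMIT 50.
const change = rowsPerCallChange({ calls: 100, rows: 1_000_000 }, { calls: 100, rows: 5_000 });
console.log(change.toFixed(1) + "% rows per call");
```

pg_stat_statements counts rows, not bytes, so this captures pagination and aggregation fixes but not narrower column lists; for those, compare response sizes or the egress metric on your bill.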

Neon Infrastructure as Code (neon.ts)

The fixes above cut egress (data transferred out of Postgres). The other big non-prod cost lever is compute, and you can codify it durably in neon.ts — Neon's infrastructure-as-code file (see the neon skill for the full reference) — so dev, preview, and CI branches stay cheap by default instead of relying on per-branch flags:

npm i @neon/config

// neon.ts
import { defineConfig } from "@neon/config/v1";

export default defineConfig({
  branch: (branch) => {
    if (branch.exists || branch.isDefault) return {}; // don't touch prod
    return {
      ttl: "7d", // ephemeral branches auto-expire instead of accruing storage
      postgres: {
        computeSettings: {
          autoscalingLimitMinCu: 0.25, // smallest compute size while active
          autoscalingLimitMaxCu: 1, // cap autoscaling on throwaway branches
          suspendTimeout: "5m", // suspend (scale to zero) after 5 minutes idle
        },
      },
    };
  },
});

neon config apply   # apply to the current branch (neon deploy is an alias)

This is complementary, not a substitute: query-pattern fixes are what actually reduce egress charges, while these settings keep non-production compute and storage from quietly inflating the same bill. Because neon checkout applies the policy when it creates a branch, new dev/preview branches inherit the cheap profile automatically.
