
PostgreSQL vs Distributed Databases in 2026: When to Scale Horizontally

PostgreSQL with modern extensions can handle far more than most engineers assume. Here is the honest decision framework for when horizontal scaling is actually warranted.

iBuidl Research · 2026-03-10 · 13 min read
TL;DR
  • PostgreSQL on modern hardware handles 100K+ TPS with proper indexing and connection pooling — most "we need to scale" decisions are premature
  • Neon's serverless PostgreSQL and Aurora PostgreSQL v3 have extended Postgres's ceiling significantly through storage disaggregation
  • Distributed SQL (CockroachDB, YugabyteDB, Spanner) is warranted for multi-region writes and regulatory data residency — almost nothing else
  • The cost of distributed database operational complexity is systematically underestimated at the architecture decision stage

Section 1 — The Premature Scaling Problem

The most expensive database mistake in 2026 is not choosing the wrong distributed database — it is choosing a distributed database when PostgreSQL would have been sufficient. The distributed SQL pitch is compelling: unlimited horizontal scale, multi-region active-active, no single point of failure. The reality is that these capabilities come with substantial operational complexity that teams consistently underestimate.

PostgreSQL on a well-tuned bare metal instance (64 cores, 512GB RAM, NVMe SSDs) handles 150,000+ transactions per second for OLTP workloads. With Citus for horizontal sharding, that extends to millions of TPS. With PgBouncer or PgCat for connection pooling, connection overhead becomes negligible. Before you reach those limits, you will encounter dozens of other bottlenecks — application logic, N+1 queries, cache invalidation — that distributed databases do not solve.
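To make the connection-pooling point concrete, here is a minimal PgBouncer configuration sketch. The database name, host, and pool sizes are illustrative placeholders, not tuned recommendations for any particular workload:

```ini
; Minimal PgBouncer sketch — appdb, the host, and the pool sizes
; are illustrative, not recommendations for a specific workload.
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
; Transaction pooling lets thousands of client connections share a
; few dozen server connections — this is where the overhead savings
; come from, at the cost of session-level features (e.g. SET, LISTEN).
pool_mode = transaction
default_pool_size = 20
max_client_conn = 2000
```

The key design choice is `pool_mode = transaction`: server connections are returned to the pool at each commit, so the expensive Postgres backend processes stay few while client counts scale.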

  • 150K+ — PostgreSQL max TPS (well-tuned): bare metal, NVMe, optimized workload
  • 2–5x — CockroachDB latency premium vs single-region Postgres for OLTP
  • <500ms — Neon cold start time: serverless PostgreSQL branch activation
  • <5% — engineering orgs (by headcount) that actually need distributed SQL

Section 2 — What Modern PostgreSQL Can Do

The PostgreSQL extension ecosystem has matured dramatically. pgvector handles vector similarity search for ML applications. TimescaleDB extends Postgres to time-series at scale. pg_partman automates table partitioning. PostGIS handles geospatial queries. Hydra turns Postgres into a columnar OLAP store.

-- PostgreSQL 17 with pgvector: semantic search over 10M documents
-- Without leaving Postgres
CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding VECTOR(1536),  -- OpenAI ada-002 dimensions
  created_at TIMESTAMPTZ DEFAULT NOW(),
  metadata JSONB
);

-- HNSW index for fast approximate nearest neighbor
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- Hybrid search: semantic + keyword
SELECT
  id,
  content,
  1 - (embedding <=> $1::vector) AS semantic_score,
  ts_rank(to_tsvector('english', content), query) AS keyword_score
FROM
  documents,
  plainto_tsquery('english', $2) query
WHERE
  to_tsvector('english', content) @@ query
  OR (embedding <=> $1::vector) < 0.3
ORDER BY
  (0.7 * (1 - (embedding <=> $1::vector))) +
  (0.3 * ts_rank(to_tsvector('english', content), query)) DESC
LIMIT 20;

-- This replaces Pinecone + Elasticsearch + Postgres (three services → one)

The "single database for everything" pattern, long dismissed as naive, is now legitimate architecture for many applications. PostgreSQL 17's improved parallel query execution, combined with proper read replicas, handles most OLAP workloads that teams previously shunted to Redshift or BigQuery.
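One concrete example of keeping large workloads inside Postgres is native declarative partitioning, which pg_partman automates. A sketch, with hypothetical table and column names, showing the manual version of what pg_partman would generate:

```sql
-- Illustrative sketch: range partitioning on a large append-only
-- events table (names are hypothetical; pg_partman automates the
-- per-month partition creation shown manually here).
CREATE TABLE events (
  id          BIGINT GENERATED ALWAYS AS IDENTITY,
  occurred_at TIMESTAMPTZ NOT NULL,
  payload     JSONB,
  PRIMARY KEY (id, occurred_at)  -- partition key must be part of the PK
) PARTITION BY RANGE (occurred_at);

CREATE TABLE events_2026_03 PARTITION OF events
  FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');

-- Queries with a time predicate touch only the matching partitions
-- (partition pruning), so scan cost tracks the queried window, not
-- total table size:
SELECT count(*)
FROM events
WHERE occurred_at >= '2026-03-10' AND occurred_at < '2026-03-11';
```

Combined with parallel query and read replicas, this is the mechanism behind the claim above: the OLAP-ish queries prune to a few partitions instead of scanning years of data.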


Section 3 — Database Options Comparison

| Database | Scale Ceiling | Operational Complexity | Cost | Best For |
| --- | --- | --- | --- | --- |
| PostgreSQL (bare metal) | ~150K TPS | Low — well-understood | Low | Most applications, <10TB |
| Neon (serverless PG) | Auto-scales | Very low — managed | Medium, scale-to-zero | Variable workloads, dev environments |
| CockroachDB | Unlimited horizontal | High — distributed systems | High | Multi-region writes, global ACID |
| PlanetScale (Vitess) | Very high (sharded) | Medium — managed | Medium-high | MySQL-compatible, massive scale |
| Spanner | Planetary scale | Very low — managed | Very high | Google ecosystem, finance/gaming |

Section 4 — When Distributed SQL Is Actually Justified

There are three genuine reasons to adopt distributed SQL, and they are narrower than most architectural discussions acknowledge.

Multi-region active-active writes: If you have users in the US, EU, and APAC who all write data and need sub-100ms latency to their nearest region, distributed SQL is the correct answer. Primary-replica PostgreSQL cannot provide this — writes must round-trip to the primary. CockroachDB's regional-by-row table locality or Spanner's global transactions solve this. But be honest about whether your users actually require <100ms write latency across regions, or whether eventual consistency with a CDN would suffice.

Regulatory data residency with shared schema: If regulations require EU user data to never leave EU data centers but your application shares a schema with US users, distributed SQL's row-level locality is elegant. The alternative — separate databases per region with application-level routing — is viable but creates a schema synchronization problem that compounds over time.

True unbounded write scale: If you are ingesting sensor data, financial transactions, or events at rates exceeding what a single Postgres primary can handle (typically >200K inserts/sec sustained), horizontal sharding becomes necessary. Before reaching for CockroachDB, evaluate whether Kafka + a time-series database (TimescaleDB, InfluxDB) matches your access patterns better.
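As a sketch of that time-series alternative, here is what the ingest table might look like as a TimescaleDB hypertable. The table and column names are hypothetical, and the chunk interval is illustrative:

```sql
-- Hypothetical ingest sketch: a sensor table as a TimescaleDB
-- hypertable, which chunks by time under the hood while remaining
-- a single logical Postgres table.
CREATE EXTENSION IF NOT EXISTS timescaledb;

CREATE TABLE sensor_readings (
  device_id   BIGINT NOT NULL,
  recorded_at TIMESTAMPTZ NOT NULL,
  reading     DOUBLE PRECISION
);

-- Convert to a hypertable partitioned on the time column; the
-- one-day chunk interval is an assumption, not a recommendation.
SELECT create_hypertable('sensor_readings', 'recorded_at',
                         chunk_time_interval => INTERVAL '1 day');
```

Inserts land in the current chunk and old chunks can be compressed or dropped wholesale, which is often a better fit for append-heavy telemetry than a general-purpose distributed OLTP store.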

-- CockroachDB regional table: EU data stays in EU
CREATE TABLE users (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  region crdb_internal_region NOT NULL,
  email TEXT UNIQUE,
  data JSONB
) LOCALITY REGIONAL BY ROW;

-- Rows automatically routed to closest region
-- Reads from closest replica, writes to regional primary
-- ACID guarantees preserved across regions (with latency cost)

The Operational Cost Is Compounding

CockroachDB, YugabyteDB, and similar systems require engineers who deeply understand distributed systems — consensus protocols, clock skew, network partition behavior. These engineers are rare and expensive. The operational cost is not a one-time setup cost; it is a permanent ongoing cost in engineering complexity. Model this honestly before committing.


Section 5 — The Neon/Aurora Middle Ground

The most interesting development in the database market is the emergence of serverless PostgreSQL (Neon, Aurora Serverless v2) as a genuine middle ground. These services provide PostgreSQL compatibility, auto-scaling storage, branching for development environments, and scale-to-zero for cost efficiency — all without the operational complexity of distributed SQL.

The majority of applications that outgrow a single PostgreSQL instance are read-heavy, and for those, Aurora PostgreSQL with read replicas is the right answer. It is expensive compared to self-hosted Postgres, but it is dramatically cheaper and operationally simpler than CockroachDB.

The architecture decision tree is simple: Can a single well-tuned Postgres primary + read replicas handle your load? If yes, use it. Do you need multi-region writes or true unbounded write scale? Then evaluate distributed SQL. Does cost or operational simplicity matter more than multi-region? Consider Neon or Aurora Serverless.


Verdict

Overall score
7.5
Distributed Database Adoption Justification / 10

PostgreSQL remains the correct default for 90%+ of applications in 2026. Invest in Postgres optimization before evaluating distributed alternatives — most teams that switch to distributed SQL would have been better served by a DBA and proper indexing. When you do need horizontal scale, evaluate the managed options (Neon, Aurora, PlanetScale) before self-managed distributed SQL. The operational complexity premium of CockroachDB and YugabyteDB is real and permanent.


Data as of March 2026.

— iBuidl Research Team
