Comparing distributed SQL, NoSQL, and NewSQL solutions for modern high-concurrency applications.
The Database is Never Just a Detail
In the early days of a product, the database is often an afterthought — spin up a Postgres instance, define a schema, and move on. This pragmatism is correct for the early stage. But the database choice made at scale-up — when you are processing millions of transactions per day and the data model has calcified around years of production usage — is one of the most consequential technical decisions an engineering organisation will make. Getting it wrong is extremely expensive to fix.
Relational Databases: Still the Default for Good Reason
PostgreSQL remains the correct default for the vast majority of applications. It offers ACID transactions, a powerful and mature query planner, first-class support for semi-structured data via the jsonb column type, an extensive extension ecosystem (TimescaleDB for time-series, PostGIS for geospatial), and world-class operational tooling.
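Postgres's jsonb operators (->> for field extraction, @> for containment) let you filter on fields inside semi-structured payloads without a fixed schema. As a rough, runnable sketch of the same query shape, here is the equivalent using Python's bundled SQLite and its JSON1 functions as a stand-in for Postgres (the table and field names are invented for illustration):

```python
import json
import sqlite3

# In-memory SQLite as a stand-in for Postgres; json_extract() plays the
# role of the jsonb ->> operator. Schema and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute(
    "INSERT INTO events (payload) VALUES (?)",
    (json.dumps({"type": "signup", "plan": "pro"}),),
)

# Filter on one JSON field, project another.
row = conn.execute(
    "SELECT json_extract(payload, '$.plan') FROM events "
    "WHERE json_extract(payload, '$.type') = 'signup'"
).fetchone()
```

In Postgres the same query would read `SELECT payload->>'plan' FROM events WHERE payload->>'type' = 'signup'`, and a GIN index on the jsonb column makes containment queries efficient.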
For most workloads up to tens of millions of rows and thousands of queries per second, a well-tuned PostgreSQL instance with an appropriate indexing strategy will serve you well. The managed offerings — AWS RDS, GCP Cloud SQL, Azure Database for PostgreSQL — eliminate the operational burden of running PostgreSQL yourself, providing automated backups, failover, and patching.
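The payoff of an appropriate index shows up directly in the query plan: a filtered lookup switches from a full table scan to an index search. A minimal sketch, again using in-memory SQLite as a stand-in for Postgres (the EXPLAIN output format differs, but the planner's scan-versus-index decision is the same idea; the schema is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN output is the human-readable detail.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE customer_id = 42"
before = plan(query)  # full table scan: every row is examined
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)   # index search: only matching rows are touched
```

In Postgres you would run `EXPLAIN ANALYZE` and look for `Seq Scan` turning into `Index Scan`; the discipline of checking plans for your hottest queries is the same on both systems.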
When to Go Distributed
The case for a distributed database begins when your workload exceeds what a single-node PostgreSQL instance can handle — typically in the range of 100,000+ writes per second, or when your dataset grows into the hundreds of terabytes and you need to distribute data geographically for low-latency access.
In this range, you have two primary options: distributed SQL or NoSQL. Distributed SQL databases like CockroachDB, Spanner, and YugabyteDB provide the SQL interface and ACID semantics of a traditional relational database while distributing data across nodes for horizontal scalability. They are the right choice when your application logic depends on multi-row transactions or complex joins that NoSQL architectures struggle with.
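One practical consequence of those ACID semantics at distributed scale: contended multi-row transactions can fail with retryable serialization errors (SQLSTATE 40001, for example, which CockroachDB returns on transaction conflicts), and the client guides for these databases recommend wrapping transactions in a retry loop with backoff. A minimal sketch of that pattern, with a simulated failure standing in for a real driver:

```python
import time

class SerializationError(Exception):
    """Stand-in for a retryable serialization failure (e.g. SQLSTATE 40001)."""

def run_transaction(txn_fn, max_retries=5):
    # Client-side retry loop: re-run the whole transaction function on a
    # serialization conflict, backing off exponentially between attempts.
    for attempt in range(max_retries):
        try:
            return txn_fn()
        except SerializationError:
            time.sleep(0.001 * (2 ** attempt))
    raise RuntimeError("transaction failed after %d retries" % max_retries)

# Demo: a transaction that conflicts twice, then commits.
attempts = {"n": 0}

def flaky_transfer():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationError()
    return "committed"

result = run_transaction(flaky_transfer)
```

The important property is that `txn_fn` must be safe to re-execute from the top, which is why side effects outside the database should happen only after the transaction commits.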
The NoSQL Landscape
NoSQL databases are not a single category — they are a family of data models, each optimised for a specific access pattern. Document stores (MongoDB, Firestore) are excellent for hierarchical, schema-flexible data accessed primarily by a single document ID. Key-value stores (Redis, DynamoDB in key-value mode) are the fastest way to cache and retrieve data by a known key. Wide-column stores (Cassandra, Bigtable) are optimised for high-throughput writes and time-series or event data.
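To make the key-value pattern concrete, here is a toy in-process cache with Redis-style TTL expiry. It is a sketch of the access pattern (fetch by a known key, values expire after a deadline), not a Redis client; all names are illustrative:

```python
import time

class TTLCache:
    """Toy key-value store with per-key time-to-live, in the spirit of Redis SETEX."""

    def __init__(self):
        self._store = {}  # key -> (value, expiry deadline)

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            # Lazily evict expired entries on read, as Redis does
            # alongside its periodic active expiry.
            del self._store[key]
            return None
        return value

cache = TTLCache()
cache.set("user:1", "alice", ttl_seconds=60)
```

The point of the model is what it omits: there are no joins, no secondary indexes, no range queries, and in exchange every operation is a single hash lookup.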
The rule for NoSQL adoption is: fit the data model to your access patterns, not the other way around. If you are forcing your Cassandra schema to accommodate a query that Cassandra was not designed for, you are fighting the database, and you will lose.
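Concretely, fitting the data model to the access pattern in a wide-column store means deriving the partition key from the query you must serve. A sketch with invented names: sensor readings are partitioned by (sensor, day), so the query "all readings for one sensor on one day" is always a single-partition read, which is the shape Cassandra and Bigtable are built for:

```python
from collections import defaultdict
from datetime import datetime

# Toy model of a wide-column table: partition key -> list of rows.
table = defaultdict(list)

def partition_key(sensor_id, ts):
    # (sensor, day) as the partition key: bounds partition size and
    # matches the query we need to serve. Names are illustrative.
    return (sensor_id, ts.date().isoformat())

def insert(sensor_id, ts, value):
    table[partition_key(sensor_id, ts)].append((ts, value))

def readings_for_day(sensor_id, day):
    # The query the schema was designed for: one partition, no scatter-gather.
    return table[partition_key(sensor_id, day)]

insert("sensor-1", datetime(2024, 5, 1, 10, 0), 21.5)
insert("sensor-1", datetime(2024, 5, 1, 11, 0), 22.0)
insert("sensor-1", datetime(2024, 5, 2, 9, 0), 19.0)
```

A query the schema was not designed for, such as "all sensors above 20 degrees last week", would have to touch every partition, and that is exactly the fight you lose.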
The Multi-Model Future
The most sophisticated data architectures use multiple database engines in concert, each playing to its strengths: a transactional Postgres database as the system of record, a Redis cache for sub-millisecond reads on hot data, an Elasticsearch index for full-text search and analytics queries, and a data warehouse (Snowflake, BigQuery, or Redshift) for long-running analytical queries that should not compete for resources with the OLTP workload.
The challenge of this architecture is data consistency across stores. Change Data Capture (CDC) tools like Debezium stream row-level changes from Postgres into Kafka, from which they are consumed by downstream services to update their respective data stores. This pattern achieves eventual consistency across the multi-model architecture without requiring distributed transactions.
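A downstream consumer of those change events is conceptually simple. This sketch applies Debezium-style events, which carry an op code (c/u/d for create, update, delete) and the row's new state in an "after" field, to a downstream store; the events are shown as plain dicts rather than Kafka messages, and the dict-based cache is an invented stand-in for whatever store the consumer maintains:

```python
# Downstream materialised view kept eventually consistent via CDC events.
cache = {}

def apply_change(event):
    op, key = event["op"], event["key"]
    if op in ("c", "u"):
        # Creates and updates carry the full new row in "after",
        # so applying them is an idempotent upsert.
        cache[key] = event["after"]
    elif op == "d":
        # Delete events have no "after" payload.
        cache.pop(key, None)

events = [
    {"op": "c", "key": 1, "after": {"id": 1, "email": "a@example.com"}},
    {"op": "u", "key": 1, "after": {"id": 1, "email": "b@example.com"}},
    {"op": "c", "key": 2, "after": {"id": 2, "email": "c@example.com"}},
    {"op": "d", "key": 1, "after": None},
]
for event in events:
    apply_change(event)
```

Because each event carries the complete new row state, replaying the stream from any earlier offset converges to the same result, which is what makes the eventual-consistency guarantee practical to operate.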



