---
title: "Owning the pipe: physical replication, cloud neutrality, and the escape from DBaaS lock-in"
date: 2026-04-14T10:32:59+10:00
description: "Why the physical replication stream is the key primitive that DBaaS providers deliberately withhold — and how a cloud-neutral stack built on PostgreSQL, Kubernetes, and CloudNativePG gives it back to you."
tags: ["postgresql", "postgres", "kubernetes", "k8s", "cloudnativepg", "cnpg", "dok", "data on kubernetes", "dbaas", "sovereignty", "wal", "physical-replication", "open-source", "cncf"]
cover: cover.jpg
thumb: thumb.jpg
draft: false
---
_This article examines how managed database services deliberately suppress
access to the physical replication stream, turning operational convenience into
permanent lock-in. It makes the case for a cloud-neutral stack — PostgreSQL,
Kubernetes, and CloudNativePG — as the only architecture that returns full
operational sovereignty to the organisation that owns the data._

<!--more-->

---
Over the past decade, Kubernetes has done something remarkable: it turned
infrastructure into a portable abstraction. Compute workloads can now move
between any cloud, any data centre, and any bare-metal cluster without
rewriting a line of application code. The underlying hardware has been
effectively commoditised.

The database has not.

While every other layer of the stack has been liberated, the data layer remains
tethered. PostgreSQL sits at the centre of this story. As the world's most
deployed open-source relational database, it is also the engine most targeted
by hyperscaler DBaaS offerings — and the one whose most powerful primitive is
most deliberately withheld: the WAL (write-ahead log) stream, PostgreSQL's
physical replication mechanism.
## The Day 2 reality of managed databases

The appeal of Database-as-a-Service is real. On Day 1, you click a button and a
production-grade PostgreSQL cluster appears. No storage provisioning, no
replication configuration, no backup policy to write. It is genuinely
impressive, and it is easy to understand why organisations reach for it.

Day 2 is where the architecture reveals itself.

High availability, disaster recovery, point-in-time recovery, performance
tuning, major version upgrades — all of this is managed through a proprietary
control plane that your team does not own, cannot inspect, and cannot export.
The operational intelligence that should live in your platform, expressed as
code, reviewed by your engineers, and versioned in your repositories, is instead
locked inside a hyperscaler's console.

This is not merely an inconvenience. When you need to respond to a compliance
requirement, a regulatory change, or a geopolitical shift that demands you move
workloads to a different jurisdiction or cloud, you discover that the
operational steering wheel is not in your hands. The muscle memory required to
operate your database at scale was never yours to begin with.
## The physical replication gap

The most consequential thing a managed database provider withholds is access to
the WAL stream — the physical replication stream that is the beating heart of
PostgreSQL.

Physical replication is what makes it possible to maintain a byte-for-byte
replica of a primary instance in real time. It underpins streaming WAL to
object storage for backup and point-in-time recovery, live standby clusters
across regions, and the kind of frictionless, ongoing portability that makes
cloud neutrality operational rather than aspirational.
The distinction between PostgreSQL's logical tools matters here. Logical backup
and restore — `pg_dump` and `pg_restore` — require a maintenance window
proportional to dataset size, which makes them impractical for large production
databases. Logical replication is a different matter entirely: operating
continuously at the level of decoded changes, it is well suited to a
controlled, one-time migration out of a managed service and is the foundation
of blue-green major version upgrades. It is, in fact, the exact mechanism
described in the migration section later in this article. But logical
replication is not designed for permanent, ongoing portability: it does not
replicate DDL, sequences, or large objects, and it cannot sustain the
continuous multi-cluster replication that operational sovereignty requires over
the long term.
That sustained capability requires the WAL stream. And managed database
providers deliberately do not expose it. This is not an oversight — it is the
architecture of lock-in. Once your data reaches the scale where ongoing
physical replication matters, and that stream is withheld, the cost of leaving
grows faster than the cost of staying. The provider knows it.
## The cloud-neutral resolution

The solution is not to avoid the cloud. It is to refuse the false choice between
cloud convenience and operational control.

A cloud-neutral PostgreSQL architecture, built on open-source components, gives
you both. The stack is straightforward:

- **Compute:** Kubernetes — the software-defined, portable infrastructure layer
  that runs identically on any cloud or bare-metal environment.
- **Operator:** CloudNativePG — the open-source Kubernetes operator that
  codifies all Day 2 operational tasks declaratively.
- **Engine:** Standard PostgreSQL — unmodified, fully open, with no proprietary
  extensions or behavioural divergence.
What makes this stack significant is not any individual component, but the fact
that the entire configuration lives in your version control system as Kubernetes
manifests. High availability topology, backup schedules, retention policies,
replication configuration, resource limits — all of it is declarative, auditable
and portable. It moves with you.
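For illustration, a minimal `Cluster` manifest along these lines captures the
topology, storage, retention policy and resource requests in a single
reviewable file. All names, sizes and secret references below are placeholders,
and the backup stanza sketched here uses the in-tree Barman object-store
support, whose exact shape may differ across CloudNativePG releases:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-main                  # placeholder cluster name
spec:
  instances: 3                   # one primary plus two standbys: HA topology as code
  storage:
    size: 100Gi
  resources:
    requests:
      cpu: "2"
      memory: 4Gi
  backup:
    retentionPolicy: "30d"       # retention policy, versioned with the manifest
    barmanObjectStore:           # continuous WAL archiving to object storage
      destinationPath: s3://example-backups/pg-main
      s3Credentials:
        accessKeyId:
          name: backup-creds     # placeholder Secret holding object-store credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-creds
          key: SECRET_ACCESS_KEY
```

Because this file lives in Git and is applied with `kubectl apply`, changing
the HA topology or the retention policy becomes an ordinary pull request.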
I explored the broader implications of this approach in a
[post on the CNCF blog](https://www.cncf.io/blog/2024/11/20/cloud-neutral-postgres-databases-with-kubernetes-and-cloudnativepg/),
if you want to go deeper on the cloud-neutrality angle.
## What CloudNativePG actually delivers on Day 2

[CloudNativePG](https://cloudnative-pg.io) was purpose-built for the Day 2
problem. As a CNCF Sandbox project — the first relational database operator to
enter the CNCF since 2018 and the first ever for PostgreSQL — it automates the
full lifecycle of a PostgreSQL cluster on Kubernetes: automated failover,
synchronous replication, point-in-time recovery, rolling updates, major version
upgrades, and more.

Crucially, because CNPG manages standard PostgreSQL with full access to the
engine internals, the physical replication stream is yours. You own the pipe.
You can stream your WAL to object storage for backup and PITR. You can maintain
a physical standby in a separate Kubernetes cluster — in a different region or
a different cloud entirely — using CloudNativePG's
[distributed topology for replica clusters](https://cloudnative-pg.io/docs/current/replica_cluster#distributed-topology).
You can migrate your entire dataset to a new environment by promoting that
standby — with downtime measured in seconds, not hours.
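A sketch of what the destination side can look like: a `Cluster` in a second
Kubernetes cluster that takes its initial copy with `pg_basebackup` over the
replication protocol and then keeps following the source as a read-only
replica. Hostnames, credentials and secret names are placeholders, and the
replica-cluster fields should be checked against the documentation for your
CloudNativePG version:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-dr                    # replica cluster in another region or cloud
spec:
  instances: 3
  storage:
    size: 100Gi
  replica:
    enabled: true                # stay read-only and keep streaming from the source
    source: pg-main
  bootstrap:
    pg_basebackup:               # initial byte-for-byte copy of the source cluster
      source: pg-main
  externalClusters:
    - name: pg-main
      connectionParameters:
        host: pg-main-rw.eu-cluster.example.com   # placeholder reachable endpoint
        user: streaming_replica
        dbname: postgres
      password:
        name: pg-main-replica-credentials          # placeholder Secret
        key: password
```

Cutover then amounts to fencing the old primary and disabling `replica.enabled`
here; the downtime is roughly the replication lag at that moment.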
This is the capability that managed services deliberately withhold, and it is
the capability that makes portability permanent rather than theoretical.
## Observability as a first-class concern

Sovereignty over data and compute is necessary but not sufficient. If your
metrics, logs, and traces are trapped in a proprietary cloud console, you lose
operational visibility the moment you move.

CloudNativePG integrates natively with the CNCF observability stack. It
produces [structured JSON logs directly to stdout](https://cloudnative-pg.io/docs/current/logging),
making them immediately consumable by any log aggregation pipeline. It exposes
a rich set of [PostgreSQL metrics via a native Prometheus endpoint](https://cloudnative-pg.io/docs/current/monitoring),
and it supports OpenTelemetry for distributed tracing.

Your "eyes and ears" are as portable as your data. There is no
proprietary dashboard you must replicate or vendor-specific agent you must
re-instrument when you change cloud providers.
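Enabling the Prometheus side of this is itself declarative. A fragment like the
following, inside the `Cluster` spec, is all it takes; `enablePodMonitor`
assumes the Prometheus Operator CRDs are installed in the cluster, and the
resource name is a placeholder:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-main
spec:
  instances: 3
  storage:
    size: 100Gi
  monitoring:
    enablePodMonitor: true   # creates a PodMonitor so Prometheus scrapes each instance
```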
## Migrating without a maintenance window

For organisations currently running on a managed database service, the migration
path follows a clear sequence.

First, build a parallel environment. Use
[logical replication](https://cloudnative-pg.io/docs/current/logical_replication)
to synchronise your data from the managed service into a CNPG-managed cluster.
This phase can run indefinitely alongside production — it is low-risk,
reversible, and gives your team the operational experience of running the new
platform under real load before it matters.
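Recent CloudNativePG releases expose this step declaratively through
`Publication` and `Subscription` resources. A sketch of the subscription side,
assuming a publication named `migration_pub` already exists on the managed
source and that the destination `Cluster` defines an `externalClusters` entry
called `rds-source` pointing at it (every name here is a placeholder; verify
the fields against the logical replication documentation for your version):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Subscription
metadata:
  name: migration-sub
spec:
  name: migration_sub               # subscription name inside PostgreSQL
  dbname: app                       # placeholder database to replicate into
  publicationName: migration_pub    # publication created on the managed source
  externalClusterName: rds-source   # source defined in the Cluster's externalClusters
  cluster:
    name: pg-destination            # the CNPG-managed destination cluster
```

While the subscription runs, the destination stays continuously in sync;
dropping it after cutover is equally declarative.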
Second, perform the cutover. Because the data is continuously synchronised,
the cutover is a controlled pivot rather than a disruptive migration. Downtime
is a function of the replication lag at the moment you flip, not of dataset
size.

Third, maintain permanent portability. Once you are within the CloudNativePG
ecosystem and running standard PostgreSQL with full WAL access, you can replicate
your cluster anywhere — different cloud, different region, bare metal — using
native physical replication. The investment in moving is a one-time cost. The
freedom it buys is permanent.
The financial services sector illustrates this well. At KubeCon Amsterdam,
Laurent Parodi and I gave a
[talk](https://www.youtube.com/watch?v=m0LBKjlxrog) in which he walked through
how HSBC approached this migration, navigating the intersection of strict
regulatory requirements and the operational scale you would expect from one of
the world's largest financial institutions. It is one of the more instructive
real-world examples of this architecture in a heavily regulated environment.
## Staying in the cloud, leaving the DBaaS

For many organisations, the most immediate path forward does not require moving
away from the cloud at all. If your applications already run on a
hyperscaler-managed Kubernetes service — Amazon EKS, Azure Kubernetes Service,
Google GKE — you are already closer to the solution than you might think.

The logical first step is not to migrate to a different provider or to bare
metal. It is to move the PostgreSQL database from the hyperscaler's DBaaS
offering — Amazon RDS, Azure Database for PostgreSQL Flexible Server — into the
Kubernetes cluster you already operate, colocated with the applications that
connect to it. CloudNativePG runs on EKS or AKS exactly as it does on any
other conformant Kubernetes distribution. Your application manifests do not
change. Your network topology typically improves, since the database is now
inside the same cluster rather than accessed over a managed service endpoint.
The outcome is immediate and compounding: you recover the operational
intelligence currently locked inside RDS or Flexible Server, you eliminate the
DBaaS premium from your cloud bill, and — crucially — you regain access to the
WAL stream. From that point, replicating to a second region, streaming WAL to
object storage, or moving to a different environment entirely are all decisions
you make on your own terms, at a time of your choosing.

For a step-by-step walkthrough of this migration — covering Amazon RDS, Azure
Database for PostgreSQL Flexible Server, and Google Cloud SQL as source systems
— I wrote
[CloudNativePG Recipe 5]({{< relref "../20240327-zero-cutover-migrations/index.md" >}}),
which covers the full logical replication setup for a near-zero-downtime
cutover into Kubernetes. Some operational details will have evolved with newer
releases, but the approach and the underlying mechanics remain sound.
If you are running on Azure AKS specifically, this [walkthrough on deploying
CloudNativePG on AKS](https://www.youtube.com/watch?v=KEApG5twaA4) is a good
companion. The same logic applies across all hyperscaler Kubernetes offerings:
today, the cloud is not the problem. The DBaaS is.
## Compliance is now a pull force

For organisations operating under the EU Data Act or preparing for the Cyber
Resilience Act, operational sovereignty is no longer purely an architectural
preference — it is a compliance requirement. Both frameworks demand demonstrable
data portability and the ability to move critical workloads between providers or
onto private infrastructure.

A cloud-neutral architecture built on open standards is the most direct path to
satisfying these requirements, and the architecture described here is precisely
what auditors and regulators mean when they ask for evidence of portability. It
is also the architecture that gives you the operational capability to actually
execute a migration under time pressure, rather than just asserting in a
compliance document that you could.
## The bottom line

DBaaS lock-in is not inevitable. It is the product of a specific architectural
choice — handing Day 2 operational responsibility to a managed service that
withholds the one primitive that makes portability possible at scale.

The alternative is not to build everything yourself. CloudNativePG handles the
hard operational problems. Kubernetes handles infrastructure portability.
Standard PostgreSQL handles your data, with no proprietary divergence. The
stack is mature, production-proven, and already running mission-critical
workloads at organisations including IBM, Google Cloud, Microsoft Azure, HSBC,
Tesla, GEICO Tech and Novo Nordisk. The
[full adopters list](https://github.com/cloudnative-pg/cloudnative-pg/blob/main/ADOPTERS.md)
is publicly maintained and growing.

Owning the pipe — keeping access to the physical replication stream — is the
difference between a database that can follow your organisation wherever it
needs to go, and one that cannot.

That distinction is worth building for.
---

If you are interested in the practicalities of running this stack in production,
I encourage you to explore the [CloudNativePG documentation](https://cloudnative-pg.io/docs/)
and [get in touch with the community](https://github.com/cloudnative-pg#getting-in-touch).
The project is open, governed transparently under the CNCF, and built to remain
so.

The themes in this article also formed the basis of a talk I gave with Floor
Drees at [Open Sovereign Cloud Day, KubeCon EU 2026](https://colocatedeventseu2026.sched.com/event/2H5Uc/beyond-the-dbaas-trap-achieving-data-sovereignty-with-kubernetes-and-cloudnativepg-floor-drees-gabriele-bartolini-edb)
— titled "Beyond the DBaaS Trap: Achieving Data Sovereignty with Kubernetes
and CloudNativePG". If you prefer the spoken version, that is a good
companion to this article.

---

Stay tuned for the upcoming recipes! For the latest updates, consider
subscribing to my [LinkedIn](https://www.linkedin.com/in/gbartolini/) and
[Twitter](https://twitter.com/_GBartolini_) channels.

If you found this article informative, feel free to share it within your
network on social media using the provided links below. Your support is
immensely appreciated!

_This article was drafted and refined with the assistance of Claude (Anthropic).
All technical content, corrections and editorial direction are the author's own._