From 2PC to Sagas: Managing Transactions in Distributed Systems

Modern distributed applications rarely resemble the monoliths we built a decade ago. Instead of a single codebase and database, today’s systems are composed of many independently deployed services, often communicating through an event bus such as Apache Kafka. This architectural shift brings scalability and autonomy—but it fundamentally changes how transactions work.

In a monolith, transactions are straightforward. In distributed systems, they are not.

This post explains why traditional transactions break down, why the Outbox pattern is not enough, and how the Saga pattern addresses long-running, distributed business workflows.

Life Was Simple in the Monolith

In a monolithic architecture:

All modules run in a single process
All data typically lives in a single database
Business operations execute inside one transactional context

A checkout flow that records a payment, reserves inventory, and queues a confirmation can be wrapped in a single database transaction. If any step fails, the database rolls back every write made in that transaction.
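To make the rollback behavior concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, the `make_db` helper, and the `checkout` function are invented for illustration, and the `fail` flag only simulates an error partway through the flow.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two illustrative tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT)")
    conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, reserved INTEGER)")
    conn.execute("INSERT INTO inventory VALUES ('book', 0)")
    conn.commit()
    return conn


def checkout(conn: sqlite3.Connection, order_id: int, sku: str, fail: bool = False) -> None:
    """Create an order and reserve stock inside one local transaction."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO orders (id, sku) VALUES (?, ?)", (order_id, sku))
        conn.execute(
            "UPDATE inventory SET reserved = reserved + 1 WHERE sku = ?", (sku,)
        )
        if fail:
            # Simulated failure after both writes but before the commit.
            raise RuntimeError("simulated failure before commit")
```

Calling `checkout(conn, 1, "book", fail=True)` raises, and neither the order row nor the inventory update survives. That is the guarantee the rest of this post has to rebuild by hand.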

A single local transaction, or two-phase commit (2PC) when more than one resource is involved, provides:

Atomicity
Consistency
Clear failure semantics

Developers rarely have to think about partial success.

The Distributed Reality

When applications are decomposed into services:

Each service owns its own database
Communication happens via events or remote calls
Transactions may span multiple services and long time windows

At this point, 2PC becomes impractical:

It couples services tightly
It scales poorly
It introduces blocking and failure amplification

More importantly, many real-world workflows are long-running. You cannot lock databases across services for minutes—or hours—waiting for a business process to complete.

Something has to change.

The Dual Write Problem

A common challenge in distributed systems is the dual write problem:

A service updates its database
It publishes an event or calls another service
One succeeds while the other fails

The result is inconsistency.

For example, the database might successfully store a new order, but the event notifying downstream services might never be published. Those services would then remain unaware that the order exists.
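A small sketch of that failure mode, with a dictionary standing in for the database and a hypothetical `FlakyBroker` class standing in for the message broker. Neither is a real client library; the point is only that the two writes are independent.

```python
class FlakyBroker:
    """Hypothetical stand-in for a message broker whose publish call can fail."""

    def __init__(self, fail: bool) -> None:
        self.fail = fail
        self.published = []

    def publish(self, event: dict) -> None:
        if self.fail:
            raise ConnectionError("broker unavailable")
        self.published.append(event)


def create_order_dual_write(db: dict, broker: FlakyBroker, order_id: int) -> None:
    """Naive dual write: two independent systems, no shared transaction."""
    # Write 1: the local "database" write succeeds and is visible immediately.
    db[order_id] = {"status": "CREATED"}
    # Write 2: a separate system. If this raises, write 1 is NOT undone.
    broker.publish({"type": "OrderCreated", "order_id": order_id})
```

With `FlakyBroker(fail=True)`, the order ends up in `db` while `broker.published` stays empty. Swapping the order of the two writes only moves the problem: the event could then be published for an order that was never stored.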

The Outbox Pattern (and Its Limits)

The Transactional Outbox pattern addresses this reliability problem.

Instead of publishing a message directly after updating the database, the service first writes the outgoing message to an outbox table in the same database transaction as the business data change. Because both writes occur in a single transaction, they either succeed or fail together.
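Here is a minimal sketch of the write side, again using sqlite3. The `outbox` schema (a topic, a JSON payload, and a `published` flag) and the function names are assumptions made for this example; real outbox tables vary.

```python
import json
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with an orders table and an outbox table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute(
        "CREATE TABLE outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " topic TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " published INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def create_order(conn: sqlite3.Connection, order_id: int,
                 fail_before_commit: bool = False) -> None:
    """Store the order and its outgoing event in the same local transaction."""
    event = {"type": "OrderCreated", "order_id": order_id}
    with conn:  # commit on success, roll back on exception
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'CREATED')", (order_id,)
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders", json.dumps(event)),
        )
        if fail_before_commit:
            # Simulated crash: both inserts are rolled back together.
            raise RuntimeError("simulated crash before commit")
```

Note that nothing is sent to the broker here. The service only records its intent to publish, which is what makes the write atomic.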

Once the transaction commits, a separate process retrieves messages from the outbox table and publishes them to the message broker. This publishing step can be implemented in different ways, such as:

A polling publisher, which periodically queries the outbox table for new messages
Transaction log tailing, where changes are detected directly from the database’s transaction log

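This approach delivers messages reliably without requiring distributed transactions. Delivery is at-least-once: if the publisher crashes after sending a message but before recording that it was sent, the message is sent again, so consumers should be idempotent.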
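The polling variant could look roughly like the sketch below. It assumes an `outbox` table with a `published` flag (an illustrative schema, created by the `make_outbox_db` helper) and takes the actual broker call as a plain `publish` callable, so no specific client library is implied.

```python
import json
import sqlite3
from typing import Callable


def make_outbox_db() -> sqlite3.Connection:
    """Create an in-memory database holding only the illustrative outbox table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " topic TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " published INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def publish_pending(conn: sqlite3.Connection,
                    publish: Callable[[str, dict], None],
                    batch_size: int = 100) -> int:
    """Publish unpublished outbox rows in insertion order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox"
        " WHERE published = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for row_id, topic, payload in rows:
        # If publish raises, the row keeps published = 0 and is retried next poll.
        publish(topic, json.loads(payload))
        # The row is marked only after publish returns. A crash between these
        # two steps means the same message is published again on the next poll.
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent
```

In a running service this function would be called on a timer or in a loop with a short sleep. The polling interval is a trade-off between publishing latency and database load.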

However, it is important to understand the scope of the problem this pattern solves. The Outbox pattern guarantees reliable communication between services, but it does not guarantee business-level consistency across multiple services.

When Reliable Messaging Is Not Enough

Consider a checkout process split across services:

Order Service creates an order
Inventory Service reserves stock
Shipping Service creates a label
Notification Service sends an email

All of these steps must succeed—or the system must undo the ones that already ran.

In a monolith, this is trivial. In a distributed system, there is no shared transaction to roll back.

This is the class of problems the Saga pattern is designed to solve.

What Is the Saga Pattern?

A Saga models a business transaction as a sequence of local transactions, each executed by a different service.

Key characteristics:

Each service commits its own database transaction
There is no global rollback
Failures are handled through compensating actions

Instead of rolling back, the system explicitly undoes completed steps.

Example:

Inventory reserved → later failure → inventory is released
Payment captured → later failure → refund is issued

This shifts responsibility from the database to the application.
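In code, that responsibility can be sketched as a list of steps, each paired with the action that undoes it. The `Step` and `run_saga` names are made up for this example, and the sketch assumes compensations themselves never fail, which a production system could not take for granted.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Step:
    """One local transaction plus the action that semantically undoes it."""
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]


def run_saga(steps: Sequence[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step committed nothing, so only earlier steps are undone.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True
```

If a saga reserves inventory, captures a payment, and then fails while creating a shipping label, `run_saga` issues the refund first and releases the inventory second, mirroring the order in which the work was done.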

Two Ways to Implement a Saga

1. Choreography (Event-Driven Sagas)

In choreography:

There is no central coordinator
Each service executes its local transaction, then publishes an event
The next service reacts to that event and runs its own local transaction
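A toy version of a choreographed flow is sketched below. The `EventBus` class is an in-memory, synchronous stand-in for a real broker, and the event names (`OrderPlaced`, `OrderCreated`, `StockReserved`, `StockRejected`) are invented for the example.

```python
from collections import defaultdict


class EventBus:
    """In-memory stand-in for a broker such as Kafka (synchronous, one process)."""

    def __init__(self) -> None:
        self.handlers = defaultdict(list)
        self.history = []

    def subscribe(self, event_type: str, handler) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.history.append(event_type)
        for handler in list(self.handlers[event_type]):
            handler(payload)


def wire_services(bus: EventBus, stock: dict) -> dict:
    """Register hypothetical Order and Inventory handlers; return the order store."""
    orders = {}

    def on_order_placed(p):    # Order Service: local transaction, then event
        orders[p["order_id"]] = "PENDING"
        bus.publish("OrderCreated", p)

    def on_order_created(p):   # Inventory Service: reacts to the order event
        if stock.get(p["sku"], 0) > 0:
            stock[p["sku"]] -= 1
            bus.publish("StockReserved", p)
        else:
            bus.publish("StockRejected", p)

    def on_stock_reserved(p):  # Order Service: happy path
        orders[p["order_id"]] = "CONFIRMED"

    def on_stock_rejected(p):  # Order Service: compensating action
        orders[p["order_id"]] = "CANCELLED"

    bus.subscribe("OrderPlaced", on_order_placed)
    bus.subscribe("OrderCreated", on_order_created)
    bus.subscribe("StockReserved", on_stock_reserved)
    bus.subscribe("StockRejected", on_stock_rejected)
    return orders
```

No function in this sketch describes the whole checkout. The flow exists only as the sum of the subscriptions, which is both the appeal of choreography and the source of its debugging cost.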

Advantages

Highly decoupled
Naturally fits event-driven architectures
Scales well

Trade-offs

Business flow is implicit and scattered
Harder to trace and debug
Failure handling logic is distributed

2. Orchestration (Centralized Control)

In orchestration:

A Saga Orchestrator coordinates the process
It tells services what to do and when
It decides when to trigger compensating actions

Communication may be:

Synchronous (HTTP, gRPC)
Asynchronous (messaging, Kafka)
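For comparison, here is a rough sketch of an orchestrated checkout using synchronous calls. `CheckoutOrchestrator`, `InventoryStub`, and `ShippingStub` are hypothetical names. The stubs stand in for remote services, and the in-memory `state` dictionary stands in for storage that a real orchestrator would need to persist.

```python
class InventoryStub:
    """Stand-in for a remote Inventory Service."""

    def __init__(self) -> None:
        self.reserved = set()

    def reserve(self, order_id: int, sku: str) -> None:
        self.reserved.add((order_id, sku))

    def release(self, order_id: int, sku: str) -> None:
        self.reserved.discard((order_id, sku))


class ShippingStub:
    """Stand-in for a remote Shipping Service that can be made to fail."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.labels = []

    def create_label(self, order_id: int) -> None:
        if self.fail:
            raise RuntimeError("carrier API unavailable")
        self.labels.append(order_id)


class CheckoutOrchestrator:
    """Owns the step order, the saga state, and the compensation decisions."""

    def __init__(self, inventory: InventoryStub, shipping: ShippingStub) -> None:
        self.inventory = inventory
        self.shipping = shipping
        self.state = {}  # order_id -> state name; in memory only in this sketch

    def run(self, order_id: int, sku: str) -> str:
        self.state[order_id] = "STARTED"
        try:
            self.inventory.reserve(order_id, sku)
            self.state[order_id] = "STOCK_RESERVED"
            self.shipping.create_label(order_id)
            self.state[order_id] = "COMPLETED"
        except Exception:
            # Compensate only what the recorded state says was completed.
            if self.state[order_id] == "STOCK_RESERVED":
                self.inventory.release(order_id, sku)
            self.state[order_id] = "FAILED"
        return self.state[order_id]
```

The whole workflow is readable in one method, and `state` answers the question "where is order 42 right now?" directly.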

Advantages

Business logic is explicit and centralized
Easier to reason about complex workflows
Clear visibility into process state

Trade-offs

Additional component to manage
Risk of becoming a bottleneck or a single point of failure if poorly designed

The Core Mental Shift

The most important takeaway is conceptual:

Distributed systems do not give you rollback for free.

You must:

Accept partial failure as normal
Design compensating actions explicitly
Treat business workflows as state machines, not transactions

The Saga pattern formalizes this approach.
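One way to picture the state-machine view is an explicit transition table. The states and allowed transitions below are an illustrative choice for a two-step checkout, not a standard set.

```python
from enum import Enum


class SagaState(Enum):
    STARTED = "started"
    STOCK_RESERVED = "stock_reserved"
    COMPLETED = "completed"
    COMPENSATING = "compensating"
    FAILED = "failed"


# Allowed transitions. COMPLETED and FAILED are terminal states.
TRANSITIONS = {
    SagaState.STARTED: {SagaState.STOCK_RESERVED, SagaState.FAILED},
    SagaState.STOCK_RESERVED: {SagaState.COMPLETED, SagaState.COMPENSATING},
    SagaState.COMPENSATING: {SagaState.FAILED},
    SagaState.COMPLETED: set(),
    SagaState.FAILED: set(),
}


def advance(current: SagaState, target: SagaState) -> SagaState:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

Writing the table down forces a decision for every partial-failure case, for example that a saga with reserved stock cannot jump straight to FAILED without passing through COMPENSATING.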

For how Sagas handle isolation and prevent cross-saga interference, see Saga Isolation and Semantic Locks.

When to Use What

Single service, simple consistency → local transactions
Reliable event publication → Outbox pattern
Multi-step, cross-service business processes → Saga pattern

In practice, these patterns are often used together:

Outbox ensures that each state change and its event are committed together, so the event is published reliably
Saga ensures that multi-service workflows remain consistent

Closing Thoughts

Migrating from monoliths to distributed systems is not just a technical refactor—it is a transactional paradigm shift.

The Saga pattern is not a silver bullet, but it provides a practical and scalable way to manage long-running, distributed business processes in a world where traditional transactions no longer apply.

Understanding this shift is essential for designing systems that fail predictably—and recover correctly.

Happy coding! 💻