Scaling Databases ✨

Clustering

Clustering involves connecting multiple servers so that they work together as a single unit to improve database performance, scalability, and fault tolerance.

Key concepts to note:

Load Balancing:= Distributing incoming queries across multiple nodes in the cluster
High Availability:= If one node fails, others can take over to ensure uptime
Horizontal Scaling:= Easily add more nodes to handle increased load
Distributed Data:= Data can be partitioned or replicated across nodes for optimal performance
Consistency Challenges:= Ensuring data consistency across nodes may require consensus algorithms like Paxos or Raft
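
As a rough illustration of load balancing, high availability, and horizontal scaling together, here is a minimal sketch of a round-robin balancer that skips failed nodes. The class and method names (RoundRobinBalancer, mark_down, add_node, route) are made up for this example and do not come from any particular database product.

```python
class RoundRobinBalancer:
    """Toy balancer: rotates queries across nodes and skips failed ones."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0  # position of the next node to try

    def mark_down(self, node):
        # High availability: a failed node is taken out of rotation.
        self.healthy.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def add_node(self, node):
        # Horizontal scaling: a new node simply joins the rotation.
        self.nodes.append(node)
        self.healthy.add(node)

    def route(self):
        # Try each node at most once per call.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")
```

With nodes a, b, and c, successive calls to route() return a, b, c, a. If b is then marked down, the next calls return c and a, so the remaining nodes absorb its share of the traffic.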

Replication

Replication involves creating and maintaining copies of the same database on multiple servers to ensure redundancy and performance.

Types of Replication:

Master-Slave Replication:
- One master server handles writes; replicas (slaves) handle reads
- Reduces load on the master
Master-Master Replication:
- All servers can handle reads and writes
- Used for high availability and geographical distribution
Synchronous Replication:
- A write is acknowledged only after the replicas have applied it
- Ensures data consistency but adds write latency
Asynchronous Replication:
- A write is acknowledged immediately and copied to replicas afterwards
- Faster, but replicas can lag behind and serve stale data
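
The difference between synchronous and asynchronous replication can be sketched with a toy in-memory model. Everything here (Primary, Replica, the synchronous flag, flush) is a hypothetical name invented for the illustration; real databases ship changes over the network through a replication log.

```python
class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    """Toy primary node; `synchronous` is a flag made up for this sketch."""

    def __init__(self, replicas, synchronous):
        self.data = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending = []  # changes not yet shipped to the replicas

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Acknowledge only after every replica has applied the change.
            for replica in self.replicas:
                replica.apply(key, value)
        else:
            # Acknowledge right away; replicas catch up later.
            self.pending.append((key, value))
        return "ack"

    def flush(self):
        # Stands in for the background replication stream catching up.
        for key, value in self.pending:
            for replica in self.replicas:
                replica.apply(key, value)
        self.pending.clear()
```

In the asynchronous case, a read from a replica between write() and flush() returns the old value. That window is the inconsistency risk mentioned above.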

Consensus v/s Quorum

Consensus

1) A process in distributed systems where all nodes agree on a single value or decision

2) Ensures consistency across nodes, even in the presence of failures or communication issues

3) Examples: Algorithms like Paxos, Raft, and ZooKeeper’s Zab Protocol achieve consensus

4) Focus: Reaching agreement among distributed nodes
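
To make "agreeing on a single value" concrete, here is a heavily simplified sketch of one voting round for electing a leader. It borrows a single rule from Raft (each node grants at most one vote per term) and is not an implementation of Raft or Paxos. The names Voter, request_vote, and run_election are invented for this example.

```python
class Voter:
    """A node that grants at most one vote per term."""

    def __init__(self):
        self.voted_for = {}  # term -> candidate this node voted for

    def request_vote(self, term, candidate):
        # The first candidate to ask in a term gets the vote; repeated
        # requests from that same candidate are granted again.
        granted_to = self.voted_for.setdefault(term, candidate)
        return granted_to == candidate


def run_election(reachable_voters, cluster_size, term, candidate):
    """Return True if `candidate` wins a majority of the whole cluster."""
    votes = sum(
        1 for voter in reachable_voters if voter.request_vote(term, candidate)
    )
    return votes > cluster_size // 2
```

Because every node votes only once per term and a winner needs more than half of the cluster, two different candidates cannot both win the same term, even if some nodes are unreachable.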

Quorum

1) A subset of nodes (majority or a specific minimum) required to perform an operation (like read/write)

2) Purpose: Ensures a minimum number of nodes are involved in an operation to maintain consistency and reliability

3) Example: A quorum might require over half the nodes to approve a write operation for it to be valid

4) Focus: Determining the minimum number of nodes needed for consensus-related decisions or operations

Aspect	Consensus	Quorum
Purpose	Agreement on a value/decision	Define participation threshold
Scope	Entire system	Subset of nodes
Implementation	Algorithms like Raft	Used in quorum reads/writes
Example	Electing a leader	Allowing a write if 3/5 nodes agree
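
The "3/5 nodes" example from the table can be sketched in a few lines. The function names are made up for illustration. The overlap rule R + W > N is the usual condition for quorum reads and writes in quorum-based stores; it is not stated in these notes, so treat it as an added assumption.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that is more than half of n."""
    return n // 2 + 1


def write_succeeds(acks: int, n: int) -> bool:
    """A write is valid once a majority of the n nodes acknowledged it."""
    return acks >= majority(n)


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """If R + W > N, every read quorum shares a node with every write quorum."""
    return r + w > n


def quorum_read(responses, r):
    """Pick the newest value among (version, value) pairs from the replicas.

    Raises if fewer than r replicas answered.
    """
    if len(responses) < r:
        raise RuntimeError("read quorum not reached")
    return max(responses, key=lambda pair: pair[0])[1]
```

For a 5-node cluster, majority(5) is 3, so a write with 3 acknowledgements is accepted and one with 2 is rejected. With R = 3 and W = 3 the quorums overlap, so a quorum read always sees at least one replica holding the latest write.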

Partitioning & Sharding

Partitioning involves dividing a database into smaller, manageable pieces (partitions) to improve performance and scalability.

Key concepts:

Horizontal Partitioning (Sharding):
- Splits rows of a table across different database servers
- Examples: Users with IDs 1-1000 go to one partition, and IDs 1001-2000 go to another
Vertical Partitioning:
- Splits columns of a table across different servers
- Examples: User profile data in one server and user purchase history in another
Range-Based Partitioning:
- Data is partitioned based on ranges of a key (e.g. dates, IDs)
Hash-Based Partitioning:
- Uses a hash function to evenly distribute rows across partitions
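
Range-based and hash-based partitioning can be compared with a small sketch. The range boundaries reuse the 1-1000 / 1001-2000 example above, with a third range added for illustration. SHA-256 is chosen only because it gives the same result on every run, unlike Python's built-in hash() for strings; it is an assumption of this sketch, not a recommendation from the notes.

```python
import bisect
import hashlib

# Hypothetical upper bounds: partition 0 holds IDs up to 1000,
# partition 1 holds 1001-2000, partition 2 holds 2001-3000.
RANGE_UPPER_BOUNDS = [1000, 2000, 3000]


def range_partition(user_id: int) -> int:
    """Return the partition whose ID range contains user_id."""
    index = bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)
    if index == len(RANGE_UPPER_BOUNDS):
        raise ValueError(f"user_id {user_id} is outside all ranges")
    return index


def hash_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash of its bytes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Range partitioning keeps neighbouring keys together, which helps range scans but can create hot partitions. Hash partitioning spreads keys roughly evenly at the cost of scattering neighbouring keys.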

Sharding is a form of horizontal partitioning where data is distributed across multiple databases (shards). Each shard operates independently.

Key Concepts:

Shard Key:
- A field used to determine which shard data belongs to.
- Example: UserID % TotalShards.
Scalability:
- Adding more shards increases the system’s capacity.
Fault Isolation:
- Failure of one shard doesn’t affect others.

Complexity:
- Sharding adds complexity in managing queries, rebalancing shards, and ensuring consistency.
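
The rebalancing cost is easy to see with the UserID % TotalShards shard key from above. The sketch below (function names are illustrative) counts how many users land on a different shard when the shard count changes.

```python
def shard_for(user_id: int, total_shards: int) -> int:
    """Shard key from the notes: UserID % TotalShards."""
    return user_id % total_shards


def moved_keys(user_ids, old_total: int, new_total: int):
    """Return the user IDs whose shard changes when the shard count changes."""
    return [
        uid
        for uid in user_ids
        if shard_for(uid, old_total) != shard_for(uid, new_total)
    ]
```

For user IDs 1 to 12, growing from 3 shards to 4 moves 9 of the 12 users (IDs 3 through 11). With plain modulo sharding, adding capacity means migrating most of the data, which is one reason rebalancing appears in the complexity list.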

CAP Theorem

The CAP Theorem states that in a distributed system, it is impossible to simultaneously guarantee all three of the following properties:

Consistency : All nodes in the system see the same data at the same time.

For Eg:= When data is updated on one node, the update is immediately reflected on all the other nodes

Availability : The system remains operational and responds to every request, even if some nodes fail

For Eg:= Every request receives a response, though it may not be the most recent data

Partition Tolerance : The system continues to operate even when network failures drop or delay messages between nodes

Let’s learn more about Partition Tolerance

Partition tolerance addresses the scenario where network communication failures occur between nodes in a distributed system. In a partitioned system:

Nodes cannot communicate with each other due to a network failure.
Each partition might operate independently for a period.
Partition tolerance ensures that the system does not completely fail because of these network partitions. Instead, it continues to function—perhaps in a limited capacity—until the partition is resolved.

3 Types of DBs

Consistency + Availability (No Partition Tolerance):
- Suitable for systems with reliable networks where partitioning is rare.
- Example: Relational databases with strong consistency.
Consistency + Partition Tolerance (No Availability):
- Prioritizes accuracy of data but sacrifices availability during network partitions.
- Example: Distributed databases like MongoDB configured for strong consistency (for example, majority write concern with linearizable reads).
Availability + Partition Tolerance (No Consistency) (Eventual Consistency):
- Prioritizes uptime but may serve stale or inconsistent data.
- Example: Eventually consistent databases like DynamoDB or Cassandra.
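
The CP versus AP trade-off during a network partition can be sketched with a toy two-replica store. TwoNodeStore, Unavailable, and the "CP"/"AP" mode strings are hypothetical names for this illustration. Real systems decide this per operation with quorums and timeouts rather than a single flag.

```python
class Unavailable(Exception):
    """Raised when a CP store refuses a request during a partition."""


class TwoNodeStore:
    """Toy store with replicas "a" and "b"; mode is "CP" or "AP"."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = {"a": "v1", "b": "v1"}
        self.partitioned = False  # True while a and b cannot communicate

    def write(self, replica, value):
        if not self.partitioned:
            # Healthy network: the write reaches both replicas.
            for name in self.replicas:
                self.replicas[name] = value
        elif self.mode == "CP":
            # Keep consistency by giving up availability.
            raise Unavailable("cannot reach peer; refusing write")
        else:
            # Stay available by accepting the write locally; replicas diverge.
            self.replicas[replica] = value

    def read(self, replica):
        return self.replicas[replica]
```

During a partition, the CP store rejects the write and both replicas keep returning v1. The AP store accepts a write of v2 on replica a while replica b still returns v1, which is the stale read that eventual consistency allows until the partition heals.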

Different Types of DBs

Type	Example
RDBMS	MySQL
NoSQL	MongoDB
Hierarchical	IBM IMS
Object-Oriented DB	ObjectDB
Column-Family DB	Cassandra
Graph DB	Neo4j
Time Series DB	InfluxDB
Distributed DB	Google Spanner
Cloud DB	Amazon RDS
Key-value(cache) DB	Redis