Why do Databases use B Trees🌲?

when we talk about storage RAM and HardDisks come to the picture

Let’s understand there fucntions and how they differ from each other

Functionality	RAM	HardDisk
Purpose	Temporary Data Storage	Permanent Data Storage
Speed	Much Faster	Slower
Volatility	Loses data when power is off	Data’s are persistent
Size	Smaller Capacities (GBs)	Large Capacity(TBs)
Usage	Supports active tasks and processes	Store files, applications, and OS
Price	Expensive	Cheap

Let’s understand why is HardDisk Slow ?

Hard disks are inherently slower due to the way they retreive datas, which involves blocks and indexing

Block based Retrieval : Data on hard disk is stored in blocks. Accessing a specifi piece of data often requires reading an entire block, even if only a small portion of the block is needed, adding overhead.
Indexing Overhead: Hard disks rely on file system indexes (like FAT or NTFS) to locate data. Traversing these indexes adds extra time before the actual data can be accessed
Fragmentation: Over time, as files are created and deleted, data mayb become fragmenred across the disk,requiring the read/write head to move between multiple physical locations to retrieve a single file
Seek Time: The moving read/write head must physically position itself over the correct track on the spinning platter, introducing latency
Rotational Latency: The disk must spin to the correct position to access data, further slowing retrieval compared to solid-state alternatives like SSDs
RAM relies over Electronics (semiconductors), HD relieves over Magnetism, SSD uses flash memory (NAND-based)to store data electronically
for 1 million data it takes nearly 100 seconds

This is known as single-level indexing, which is efficient for up to 1 million data storage entries.

for 1 milion data entries it takes nearly 250 ms

MultiLevel Indexing

If the data entries crosses billions then this wont be effiecient at all simple looking like a linear search irrespective whether u use binary search ,Index sequential search or indexing it would be hell slow and a pure over head

do forgive me for the numbers they are rought estimate but hope u get the concept behind it

this takes 3 ms

Let’s rotate this image clockwise 90*

There’s ur B Tree

just lot of index tables hope u get the picture here

Data Structure	Search	Insert	Delete
Balanced BST	20ms	20ms	20ms
Unbalanced BST	1 million ms	1 million ms	1 million ms
Sorted Array	20 ms	1 million ms	1 million ms
B Tree	3 ms	3 ms	3 ms

log to the base 2 (10^6) = 20ms

log to the base m (10^6) = 3ms

Where m is the number of childrens of the parent let’s consider it to be 100

then we get log to the base 100 (10^6) = 3ms

Difference Between B and B+ Trees

Aspect	B-Tree	B+ Tree
Structure	Internal nodes store both keys and values.	Internal nodes store only keys; values are in leaf nodes.
Leaf Nodes	Leaf nodes are at different levels.	All leaf nodes are at the same level.
Search Efficiency	Searching may involve traversing through internal nodes and leaves.	Searching is faster as only leaf nodes store values.
Range Queries	Range queries are less efficient.	Range queries are more efficient due to linked leaf nodes.
Insertion and Deletion	More complex due to the internal node structure.	Simplified since only leaf nodes are affected.

Why do Databases use B Trees🌲?

Table of contents

Let’s understand why is HardDisk Slow ?

MultiLevel Indexing

Difference Between B and B+ Trees

Did you find this article valuable?