Why do Databases use B Trees🌲?

When we talk about storage, RAM and hard disks come into the picture.

Let's understand their functions and how they differ from each other.

Functionality | RAM | Hard Disk
Purpose | Temporary data storage | Permanent data storage
Speed | Much faster | Slower
Volatility | Loses data when power is off | Data is persistent
Size | Smaller capacity (GBs) | Larger capacity (TBs)
Usage | Supports active tasks and processes | Stores files, applications, and the OS
Price | More expensive per GB | Cheaper per GB

Let's understand why a hard disk is slow.

Hard disks are inherently slower because of the way they retrieve data, which involves blocks, indexing, and moving mechanical parts:

  1. Block-Based Retrieval: Data on a hard disk is stored in blocks. Accessing a specific piece of data often requires reading an entire block, even if only a small portion of it is needed, which adds overhead.

  2. Indexing Overhead: Hard disks rely on file system indexes (like those in FAT or NTFS) to locate data. Traversing these indexes adds extra time before the actual data can be accessed.

  3. Fragmentation: Over time, as files are created and deleted, data may become fragmented across the disk, requiring the read/write head to move between multiple physical locations to retrieve a single file.

  4. Seek Time: The moving read/write head must physically position itself over the correct track on the spinning platter, which introduces latency.

  5. Rotational Latency: The disk must spin to the correct position to access data, further slowing retrieval compared to solid-state alternatives like SSDs.

  6. Storage Medium: RAM relies on electronics (semiconductors), hard disks rely on magnetism, and SSDs use flash memory (NAND-based) to store data electronically.

  7. Overall Cost: for 1 million records, a lookup can take nearly 100 seconds.
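To make the block-based retrieval point concrete, here is a minimal Python sketch of a toy disk. It assumes fixed-size blocks of 100 records (a hypothetical number) and treats every block read as one slow disk access; `ToyDisk` and `linear_search` are illustrative names, not a real API. Without any index, finding a key means reading blocks one after another.

```python
BLOCK_SIZE = 100  # records per block (hypothetical value)


class ToyDisk:
    """Stores records in fixed-size blocks and counts block reads."""

    def __init__(self, records, block_size=BLOCK_SIZE):
        self.blocks = [records[i:i + block_size]
                       for i in range(0, len(records), block_size)]
        self.block_reads = 0

    def read_block(self, block_no):
        # The whole block is read even if only one record is needed.
        self.block_reads += 1
        return self.blocks[block_no]


def linear_search(disk, key):
    """No index: read blocks in order until the key turns up."""
    for block_no in range(len(disk.blocks)):
        for record_key, value in disk.read_block(block_no):
            if record_key == key:
                return value
    return None


records = [(k, f"row-{k}") for k in range(100_000)]  # scaled down from 1 million
disk = ToyDisk(records)
print(linear_search(disk, 99_999), disk.block_reads)  # → row-99999 1000
```

Counting block reads rather than comparisons is the point: on a hard disk each block read pays seek time and rotational latency, so the number of blocks touched dominates the cost.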

Keeping a separate index table that maps keys to the disk blocks holding them is known as single-level indexing, and it works well for up to about 1 million entries.

For 1 million entries, a lookup takes nearly 250 ms.
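A single-level index can be sketched the same way. This is a minimal sketch assuming the records are sorted by key and split into blocks of 100; the in-memory `index` holds the first key of each block (a hypothetical layout, and `make_blocks`, `build_index`, and `indexed_lookup` are illustrative names). A binary search over the index picks the one block that could hold the key, so only a single data block is read.

```python
import bisect

BLOCK_SIZE = 100  # records per block (hypothetical value)


def make_blocks(records, block_size=BLOCK_SIZE):
    """Split key-sorted records into fixed-size disk blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def build_index(blocks):
    """Single-level index: the first key of every block."""
    return [block[0][0] for block in blocks]


def indexed_lookup(blocks, index, key):
    """Binary-search the index, then read one data block.

    Returns (value or None, number of data blocks read).
    """
    pos = bisect.bisect_right(index, key) - 1  # last block whose first key <= key
    if pos < 0:
        return None, 0
    for record_key, value in blocks[pos]:  # the only data block read
        if record_key == key:
            return value, 1
    return None, 1


records = [(k, f"row-{k}") for k in range(100_000)]
blocks = make_blocks(records)
index = build_index(blocks)
print(len(index), indexed_lookup(blocks, index, 99_999))  # → 1000 ('row-99999', 1)
```

With 100,000 records the index has 1,000 entries. With billions of records the index itself grows to millions of entries spread over many blocks, and searching it becomes the bottleneck, which is the problem multi-level indexing addresses.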

Multi-Level Indexing

If the number of entries grows into the billions, a single-level index stops being efficient: the index itself becomes so large that searching it starts to look like a linear search. Whether you use binary search, indexed sequential search, or plain indexing, it becomes very slow and pure overhead.

Please forgive me for the numbers; they are rough estimates, but I hope you get the concept behind them.

With multi-level indexing, a lookup takes about 3 ms.

Let's rotate this image 90° clockwise.

There's your B-Tree.

It's just a lot of index tables stacked on top of each other; I hope you get the picture.
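Stacking index tables on top of each other gives the multi-level index that a B-Tree generalizes. The sketch below assumes sorted integer keys and a fanout of 100 entries per block (hypothetical values); `build_multilevel_index` and `search` are illustrative names. Each level indexes the first keys of the blocks in the level below, and a lookup reads exactly one block per level.

```python
import bisect

FANOUT = 100  # entries per block (hypothetical value)


def chunk(items, size):
    """Split a list into consecutive blocks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_multilevel_index(sorted_keys, fanout=FANOUT):
    """Return (levels, data_blocks); levels are ordered root first.

    Each index block lists the first key of `fanout` consecutive blocks
    in the level below; levels are added until one root block remains.
    """
    if not sorted_keys:
        raise ValueError("need at least one key")
    data_blocks = chunk(sorted_keys, fanout)
    level = chunk([block[0] for block in data_blocks], fanout)
    levels = [level]
    while len(level) > 1:
        level = chunk([block[0] for block in level], fanout)
        levels.append(level)
    levels.reverse()  # root level first
    return levels, data_blocks


def search(levels, data_blocks, key, fanout=FANOUT):
    """Descend one block per level; return (found, blocks_read)."""
    block_no, reads = 0, 0  # the root level holds exactly one block
    for level in levels:
        block = level[block_no]
        reads += 1
        slot = bisect.bisect_right(block, key) - 1
        if slot < 0:  # key is smaller than every key in the file
            return False, reads
        block_no = block_no * fanout + slot  # child block in the level below
    data = data_blocks[block_no]
    reads += 1
    i = bisect.bisect_left(data, key)
    return (i < len(data) and data[i] == key), reads


keys = list(range(1_000_000))
levels, data_blocks = build_multilevel_index(keys)
print(len(levels), search(levels, data_blocks, 999_999))  # → 2 (True, 3)
```

For 1 million keys this builds two index levels above 10,000 data blocks, so every lookup reads exactly 3 blocks, which lines up with the 3 ms figure if each block read costs about 1 ms. A real B-Tree adds insertion and deletion with node splits and merges on top of this static structure.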

Data Structure (10^6 entries) | Search | Insert | Delete
Balanced BST | 20 ms | 20 ms | 20 ms
Unbalanced BST | 1 million ms | 1 million ms | 1 million ms
Sorted Array | 20 ms | 1 million ms | 1 million ms
B-Tree | 3 ms | 3 ms | 3 ms
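The gap between the balanced and unbalanced BST rows comes from tree height. This minimal Python sketch (a plain, non-balancing BST; all names are illustrative) inserts keys in sorted order, which degenerates the tree into a linked list, and then inserts the same keys median-first, which keeps it balanced.

```python
from collections import deque


class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(root, key):
    """Plain BST insert with no rebalancing (iterative, returns the root)."""
    new = Node(key)
    if root is None:
        return new
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = new
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new
                return root
            node = node.right


def height(root):
    """Number of nodes on the longest root-to-leaf path (iterative DFS)."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return best


def median_first_order(n):
    """Order keys 0..n-1 so every subrange's median is inserted before its halves."""
    order, queue = [], deque([(0, n)])
    while queue:
        lo, hi = queue.popleft()
        if lo < hi:
            mid = (lo + hi) // 2
            order.append(mid)
            queue.append((lo, mid))
            queue.append((mid + 1, hi))
    return order


def build(keys):
    root = None
    for k in keys:
        root = bst_insert(root, k)
    return root


n = 1000
print(height(build(range(n))))               # sorted inserts → 1000
print(height(build(median_first_order(n))))  # median-first inserts → 10
```

With 1,000 keys the sorted-order tree is 1,000 levels deep while the median-first tree is only 10. At 1 million keys the same effect gives roughly 1,000,000 versus 20 levels, matching the BST rows of the table.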

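log_2(10^6) ≈ 20, so a balanced BST is about 20 levels deep, which is about 20 ms at roughly 1 ms per level.

For a B-Tree the depth is log_m(10^6),

where m is the number of children per node (the branching factor); let's take m = 100.

Then log_100(10^6) = 3, so the B-Tree is only 3 levels deep, which is about 3 ms.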
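The same arithmetic in a few lines of Python, using integer math so there is no floating-point rounding. `levels_needed` is an illustrative name, and the 1 ms per level is the rough assumption behind the numbers above, not a measured disk latency.

```python
def levels_needed(n, m):
    """Smallest h such that m**h >= n: levels needed to tell n keys apart
    when each node has m children."""
    if m < 2:
        raise ValueError("a node needs at least 2 children")
    h, reach = 0, 1
    while reach < n:
        reach *= m
        h += 1
    return h


MS_PER_LEVEL = 1  # rough assumption: ~1 ms per level (one disk read)

for m in (2, 100):
    h = levels_needed(10**6, m)
    print(f"m={m}: {h} levels, about {h * MS_PER_LEVEL} ms")
# m=2: 20 levels, about 20 ms
# m=100: 3 levels, about 3 ms
```

Because depth grows with log_m(n), going from a million to a billion entries only takes the m = 100 tree from 3 to 5 levels.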

Difference Between B-Trees and B+ Trees

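Aspect | B-Tree | B+ Tree
Structure | Internal nodes store both keys and values. | Internal nodes store only keys; values are stored in leaf nodes.
Leaf Nodes | All leaf nodes are at the same level (the tree is balanced), but a value can also live in an internal node. | All leaf nodes are at the same level and together hold every value.
Search Efficiency | A search can stop early at an internal node, but storing values there leaves room for fewer keys per node, so the tree tends to be taller. | Internal nodes hold only keys, so each node fits more keys, the tree is shallower, and every search takes the same number of steps down to a leaf.
Range Queries | Less efficient: the scan has to move up and down between levels. | More efficient: the scan follows the linked leaf nodes.
Insertion and Deletion | More complex, because the key being deleted may sit in an internal node. | Simpler, since values are always added to or removed from leaves (splits and merges can still propagate upward).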
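The linked-leaf advantage for range queries can be sketched in Python. This is a simplified sketch, not a full B+ Tree: it assumes a single internal node, represented by the `separators` list, over a few hand-built leaves, and it omits insertion and splitting; `Leaf`, `link_leaves`, and `range_query` are illustrative names. One descent finds the starting leaf, then the scan just follows `next_leaf` pointers.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    keys: list
    values: list
    next_leaf: Optional[Leaf] = None  # link to the next leaf in key order


def link_leaves(leaves):
    """Chain the leaves left to right, as a B+ Tree keeps them."""
    for left, right in zip(leaves, leaves[1:]):
        left.next_leaf = right


def range_query(separators, leaves, lo, hi):
    """Return (key, value) pairs with lo <= key <= hi.

    separators[i] is the first key of leaves[i + 1], playing the role of
    one internal node that routes the search to the starting leaf.
    """
    leaf = leaves[bisect.bisect_right(separators, lo)]
    result = []
    while leaf is not None:
        for key, value in zip(leaf.keys, leaf.values):
            if key > hi:
                return result
            if key >= lo:
                result.append((key, value))
        leaf = leaf.next_leaf
    return result


leaves = [Leaf([1, 3], ["a", "b"]), Leaf([5, 7], ["c", "d"]), Leaf([9, 11], ["e", "f"])]
link_leaves(leaves)
separators = [leaf.keys[0] for leaf in leaves[1:]]  # [5, 9]
print(range_query(separators, leaves, 3, 9))
# [(3, 'b'), (5, 'c'), (7, 'd'), (9, 'e')]
```

In a B-Tree the same query would have to climb back up to parent nodes between leaves, because values are spread across levels and leaves are not chained.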

Did you find this article valuable?

Support Thirumalai by becoming a sponsor. Any amount is appreciated!
