Database Glossary

Database Glossary

·

11 min read

Databases involve a variety of important terms and concepts that are crucial for understanding how data is stored, managed, and accessed

Basic Concepts

  1. Database:

    • An organized collection of data, generally stored and accessed electronically from a computer system.
  2. DBMS(Database Management System):

    • Software that interacts with end users, applications, and the database itself to capture and analyze data. Examples include MySQL, PostgreSQL, Oracle, and MongoDB.
  3. Schema:

    • The structure of a database, defined in a formal language supported by the DBMS. It includes the tables, fields, relationships, views, indexes, and other elements

Data Models

  1. Relational Models

    • Organizes data into tables(relations) with rows and columns. Each table represents an entity, and each row represents a record.

    • SQL(Structured Query Language) is used to interact with relational databases

  2. NoSQL

    • A broad class of database systems that are not relational. NoSQL databases can be document-based, key-value stores, wide-column stores, or graph databases

    • Examples: MongoDB(document), Redis (key-value), Cassandra(wide-column), Neo4j (graph)

Key Concepts

  1. Table:

    • A collection of related data entries in a relational database, consisting of rows and columns
  2. Row(Tuple):

    • A single record in a table, representing a single data item.
  3. Column (Attribute):

    • A single field in a table, representing a data attribute
  4. Primary Key:

    • A unique identifier for a row in a table, ensuring that each record can be uniquely identified
  5. Foreign Key:

    • An attribute in a table that creates a link between two tables. It corresponds to the primary key in another table
  6. Index:

    • A database object that improves the speed of data retrieval operations on a table. It is created on one or more columns of a table

Transactions and Concurrency

  1. Transaction:

    • A unit of work that is performed against a database. It is often a sequence of operations performed as a single logical unit of work

    • ACID Properties

      • Atomicity: Ensures that all operations within a transaction are completed; if not the transaction is aborted

      • Consistency: Ensures that the database remains in a consistent state before and after the transaction

      • Isolation: Ensures that transactions are securely and independently processed at the same time without interface

      • Durability: Ensures that the results of a completed transacition are permanently stored in the database.

  2. Locking:

    • Mechanism to control concurrent access to data in a database. Locks can be applied to rows, pages or tables to ensure data integrity
  3. Dead Lock:

    • A situation where two or more transactions are waiting for each other to release locks, resulting in a standstill for long time

Database Design

  1. Normalization:

    • The process of organizing the fields and tables of a relational database to minimize redundancy and dependecy. Normal forms(1NF, 2NF, 3NF, etc..)
  2. Denormalization:

    • The process of adding redundancy to a database design to improve read performance at the cost of write performance and data integrity
  3. ER Model(Entity-Relationship Model):

    • A data modeling techinque used to visualize the database structure. It involves entities (objects) and relationship between them

Advanced Concepts

  1. Replication:

    • The process of copying and maintaining database objects, such as tables in multiple database instances to ensure data availability and reliability

  2. Sharding

    • The practice of partioning data across multiple databases to improve perfromance and scalibilty

  3. Partioning:

    • The division of a database or its components into smaller, more manageable pieces called partitions

  4. Backup and Recovery

    • Techniques for creating copies of the database to proect against data loss and for restoring the database to consistent state after a failure
  5. ETL(Extract, Transform, Load):

    • A prcoess in data warehousing that involves extracting data from different sources, transforming it into a suitable format, and loadiing it into a data warehouse

Normalization in Depth

Normalization is a systematic approach to organizing data in a database to reduce redundancy and improve data integrity. It involves dividing large tables into smaller, related tables and defining relationships between them. The process of normalization is guided by various normal forms, each addressing specific issues related to data redundancy and dependency. Here is an in-depth look at normalization, from basics to advanced concepts.

Basics of Normalization

1. First Normal Form (1NF):

  • Definition: A table is in 1NF if all columns contain only atomic (indivisible) values, and each column contains values of a single type.

  • Goals: Eliminate repeating groups and ensure that each column contains atomic values.

  • Example: A table with columns for CustomerID, OrderID, and a list of ProductIDs is not in 1NF. To normalize, split the ProductIDs into separate rows.

Example Table (Not in 1NF):

CustomerIDOrderIDProductIDs
11011, 2, 3
21022, 3

Normalized Table (1NF):

CustomerIDOrderIDProductID
11011
11012
11013
21022
21023

2. Second Normal Form (2NF):

  • Definition: A table is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.

  • Goals: Eliminate partial dependency, where a non-key attribute depends only on part of the primary key.

  • Example: In a table with a composite primary key (OrderID, ProductID), if CustomerName depends only on OrderID, it's not in 2NF.

Example Table (1NF but not 2NF):

OrderIDProductIDCustomerName
1011Alice
1012Alice
1023Bob

Normalized Tables (2NF):

  • Orders Table:

    | OrderID | CustomerName | | --- | --- | | 101 | Alice | | 102 | Bob |

  • OrderDetails Table:

    | OrderID | ProductID | | --- | --- | | 101 | 1 | | 101 | 2 | | 102 | 3 |

3. Third Normal Form (3NF):

  • Definition: A table is in 3NF if it is in 2NF and all attributes are functionally dependent only on the primary key.

  • Goals: Eliminate transitive dependency, where a non-key attribute depends on another non-key attribute.

  • Example: In a table with OrderID, CustomerID, and CustomerAddress, if CustomerAddress depends on CustomerID, not OrderID, it's not in 3NF.

Example Table (2NF but not 3NF):

OrderIDCustomerIDCustomerAddress
1011123 Main St
1022456 Elm St

Normalized Tables (3NF):

  • Orders Table:

    | OrderID | CustomerID | | --- | --- | | 101 | 1 | | 102 | 2 |

  • Customers Table:

    | CustomerID | CustomerAddress | | --- | --- | | 1 | 123 Main St | | 2 | 456 Elm St |

Advanced Normal Forms

4. Boyce-Codd Normal Form (BCNF):

  • Definition: A table is in BCNF if it is in 3NF and every determinant is a candidate key.

  • Goals: Handle situations where 3NF does not eliminate all anomalies.

  • Example: In a table with StudentID, CourseID, and InstructorID, if each course has only one instructor but an instructor can teach multiple courses, anomalies can occur.

Example Table (3NF but not BCNF):

StudentIDCourseIDInstructorID
1CS1011001
2CS1011001
3CS1021002

Normalized Tables (BCNF):

  • Enrollments Table:

    | StudentID | CourseID | | --- | --- | | 1 | CS101 | | 2 | CS101 | | 3 | CS102 |

  • Courses Table:

    | CourseID | InstructorID | | --- | --- | | CS101 | 1001 | | CS102 | 1002 |

5. Fourth Normal Form (4NF):

  • Definition: A table is in 4NF if it is in BCNF and contains no multi-valued dependencies.

  • Goals: Eliminate multi-valued dependencies, where one attribute depends on another attribute independent of the primary key.

  • Example: In a table with StudentID, CourseID, and Hobby, if a student can have multiple hobbies and enroll in multiple courses independently, it leads to redundancy.

Example Table (BCNF but not 4NF):

StudentIDCourseIDHobby
1CS101Reading
1CS101Swimming
1CS102Reading
1CS102Swimming

Normalized Tables (4NF):

  • StudentCourses Table:

    | StudentID | CourseID | | --- | --- | | 1 | CS101 | | 1 | CS102 |

  • StudentHobbies Table:

    | StudentID | Hobby | | --- | --- | | 1 | Reading | | 1 | Swimming |

6. Fifth Normal Form (5NF):

  • Definition: A table is in 5NF if it is in 4NF and all join dependencies are implied by candidate keys.

  • Goals: Ensure that the decomposition of the table does not lead to loss of information.

  • Example: In a table with ProjectID, EmployeeID, and TaskID, if projects, employees, and tasks are related independently, it can cause anomalies.

Example Table (4NF but not 5NF):

ProjectIDEmployeeIDTaskID
1101A
1102A
2101B
2102B

Normalized Tables (5NF):

  • ProjectEmployees Table:

    | ProjectID | EmployeeID | | --- | --- | | 1 | 101 | | 1 | 102 | | 2 | 101 | | 2 | 102 |

  • ProjectTasks Table:

    | ProjectID | TaskID | | --- | --- | | 1 | A | | 2 | B |

  • EmployeeTasks Table:

    | EmployeeID | TaskID | | --- | --- | | 101 | A | | 102 | A | | 101 | B | | 102 | B |

Summary of Normal Forms

  • 1NF: Eliminate repeating groups; ensure atomicity.

  • 2NF: Eliminate partial dependencies on the primary key.

  • 3NF: Eliminate transitive dependencies.

  • BCNF: Ensure every determinant is a candidate key.

  • 4NF: Eliminate multi-valued dependencies.

  • 5NF: Ensure that the table can be reconstructed from its decompositions without loss of information.

Normalization is a fundamental concept in database design, ensuring data integrity and reducing redundancy. Understanding and applying the appropriate normal forms can significantly enhance the efficiency and reliability of a database.

Codd's Rule for Optimized Design

Codd's Twelve Rules

Rule 0: Foundation Rule

  • A system must qualify as a relational database management system by supporting these rules; otherwise, it is not considered relational.

Rule 1: Information Rule

  • All information in a relational database is represented explicitly at the logical level and in exactly one way: by values in tables.

Rule 2: Guaranteed Access Rule

  • Each piece of data (atomic value) in a relational database is guaranteed to be accessible by using a combination of table name, primary key value, and column name.

Rule 3: Systematic Treatment of Null Values

  • Null values (distinct from the empty string or a zero) are supported fully for representing missing information and inapplicable information in a systematic way, independent of data type.

Rule 4: Dynamic Online Catalog Based on the Relational Model

  • The database description is represented at the logical level as tables and can be queried online via the data manipulation language.

Rule 5: Comprehensive Data Sublanguage Rule

  • A relational system may support several languages and various modes of terminal use. However, there must be at least one language whose statements can express all of the following:

    • Data definition

    • View definition

    • Data manipulation (interactive and by program)

    • Integrity constraints

    • Authorization

    • Transaction boundaries (begin, commit, and rollback)

Rule 6: View Updating Rule

  • All views that are theoretically updatable must be updatable by the system.

Rule 7: High-Level Insert, Update, and Delete

  • The system must support insert, update, and delete operations at a set level (e.g., operating on multiple rows or tables simultaneously).

Rule 8: Physical Data Independence

  • Changes to the physical level (how data is stored) must not require a change to an application based on the structure.

Rule 9: Logical Data Independence

  • Changes to the logical level (tables, columns, rows) must not require a change to an application based on the structure.

Rule 10: Integrity Independence

  • Integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and stored in the catalog, not in the application programs.

Rule 11: Distribution Independence

  • The distribution of data across various locations should be invisible to the end-users. Users should not be aware of whether the data is centralized or distributed.

Rule 12: Non-Subversion Rule

  • If a relational system has a low-level (single-record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher-level (set-oriented) relational language.

Summary

Codd's rules are intended to provide a guideline for the features and behavior of relational database management systems, ensuring they adhere to the principles of the relational model. These rules focus on data integrity, data independence, and the systematic treatment of data, which are critical for maintaining a robust and reliable database system. While not all modern RDBMSs fully comply with all of Codd's rules, they remain foundational principles in the design and evaluation of relational databases.

Did you find this article valuable?

Support Thirumalai by becoming a sponsor. Any amount is appreciated!