relational data model illustration overcoded

Relational Database Model: A Powerful Framework for Modern Data

Data are commonly stored in databases and accessed via a database management system (DBMS) such as PostgreSQL. Together, these make up a database system that can be defined, at least in part, by its data model. The relational data model is the most common model for representing data relationships in databases.

This model, having been around 50 years, is based on set theory and provides a robust framework by which simple and complex relationships can be modeled within a database system.

Introduction

The relational data model (RM) is the most widely-used modeling system for database data. It was first described by Edgar F. Codd in his 1969 work A Relational Model of Data for Large Shared Data Banks [1].

Codd’s relational model replaced the hierarchical data model—which had many performance drawbacks. The Structured Query Language (SQL) was developed to support this new data model and, in 1987, was adopted as the official ANSI-approved language for relational database models [2].

Defining the Relational Model

Relational data models are characterized by the following traits:

  1. Data and relationships within data are represented by tables;
  2. Each table has multiple columns with unique names;
  3. Represents data in a record-based fashion such that each table has records of a particular type (“person” table contains data about individual person objects)
  4. Defined by first-order predicate logic
  5. Provides declarative access to data such that users state what data is needed but not necessarily how to get that data.

Logical Design

In relationship to predicate logic, relational data languages such as SQL correspond in the following way:

  1. Database Table == Predicate Variable
  2. Datbase Table Contents == Predicate Relation
  3. Database data Constraints (keys, etc.) == Predicates

For example, the predicate statement there exists data such that data is a Person and the data is a Student roughly translates to the following SQL query statement:

SELECT type
FROM person
WHERE type = 'student'

SQL vs. Relational Model

SQL is the most common language used to query relational databases. However,  modern SQL-driven applications may implement certain features that do not conform to Codd’s original outline of the relational model. Some of these language features include the following:

  1. Duplicate row values
  2. Unnamed columns in a table
  3. Placing significance on the column order
  4. NULL values

Relational Operations

SQL provides a robust means to access, manipulate, and update databases. Queries can be written by users and programmers to access databases in varying degrees of complexity. The power of relational database design comes from the ability to specify interrelatedness among data.

For example, the most basic SQL query might select all rows from all tables. While having the benefit of being syntactical simple, this doesn’t provide much functional or benefit given that it returns all the data. This would be achieved with the following statement:

SELECT *
FROM *

More often, users desire a filtered view of the data. An example might be a university registrar employee querying the student database for a single student, all students in a particular class, or all students with a GPA of 3.5 or higher.

These filtered views can execute logically across single tables or multiple tables using a join operation. In addition to the join operation, there are several other useful relational operations that can be leveraged to result in logically filtered data:

  1. Join
  2. Project
  3. Restrict
  4. Union
  5. Difference
  6. Intersect
  7. Product

Mathematical Theory for Relational Design

SQL operations such as these draw their theoretical basis from set theory. By applying such theory, a wide range of operations can be defined and engineered for optimization in dealing with relational databases.

Another mathematical basis for the relational data model is that of domain relational calculus (DRC). This was used as early as 1977 by Michael Lacroix and Alain Pirotte in their Domain-Oriented relational Languages work [3]. In the words of those authors:

“DRC is a many-sorted calculus, where the structural role of domains is emphasized by defining types for variables and type-checking rules”

The context of this statement within the domain of the relational data model isn’t immediately clear. However, the abstract of their paper continues to provide a slightly clearer clarification:

“Domain-languages” are contrasted with “tuple languages”, which manipulate as basic objects the relation n-tuples. Directly manipulating domains and domain values interacts more directly with the semantics expressed by the relations and produces simpler and more English-like languages”

A simplified interpretation is DRC allows the querying of data by broader relationships as opposed to specific values. For example, the following statement is considered a domain relational query:

SELECT name
FROM person
WHERE name = "bob"

As opposed to a tuple-relational model that would require something along these lines (pseudocode):

select person.name = 'bob' and person.company = 'acme'

Examples of Relational Design

Consider the database design for a stock-trading application. At its very core, it would need to keep track of which company an investor owns securities in, which exchange those securities are traded on, the price paid, and the date on which that position was taken. Below is a sample design of two tables, where the bold ID attribute is the primary key for each:

Company ID name address
Exchange ID name symbol

Using the Company and Exchange model, an investor can then create a third data model using each as a relation. For this model—one we’ll call Position—the following attributes are defined, where those showing underlined values have Foreign Key relations to entries in the two tables above:

Position ID company_id exchange_id share_count price open_date

In this example, a Position object represents a novel relationship between a Company and an Exchange. The primary key identifier for the referenced object must be present. In addition, non-relational attributes can be defined for a position such as a price paid, the date the position was taken, and the number of shares represented.

Example Object Classes

The example above illustrates the very essence of a relational database design. To better illustrate the application of the logical framework this model uses, consider the following database classes (defined using Django’s ORM class model framework):

from django.db import models


class Company(models.Model):
    """
    A database object model representing a company
    """
    # static attributes
    id          = models.IntegerField(primary_key=True)
    symbol      = models.CharField(max_length=8)
    name        = models.TextField()


class Exchange(models.Model):
    """
    A database object model representing an exchange where securities
    are traded by investors.
    """
    # static attributes
    id          = models.IntegerField(primary_key=True)
    symbol      = models.CharField(max_length=12)
    name        = models.TextField()


class Position(models.Model):
    """
    A database model representing a position taken by an investor
    that describes the security, the exchange from which it was
    obtained, the price-paid, the amount of shares, and date
    """
    # static attributes
    id          = models.IntegerField(primary_key=True)
    price       = models.FloatField()
    count       = models.IntegerField()
    date        = models.DateTimeField()
    
    # related attributes
    company     = models.ForeignKey(to='Company', on_delete=models.CASCADE)
    exchange    = models.ForeignKey(to='Exchange', on_delete=models.CASCADE)

There’s some Django-specific syntax in this example but conceptually it’s aligned with the core tennents relational database design. The id attribute is defined as a primary key for Company and Exchange. This field contains the actual integer value that is used with a Position entry is created that references Company and Exchange.

The static attributes contain information that does not relate to anything other than the specific data entry. For example, Acme Aeronautics would be a static attribute value for a Company name.

In these class models, different data models are used to constrain the attribute domains. These include the IntegerField, FloatField, DateTimeField, and CharField classes. These are Django-specific implementations but express the universal concept of attribute domain restriction found in relational database models. In other words, they dictate what type of values can be stored for particular attributes.

Note: Django automatically defines the id field as the primary key. It was done explicitly here for the sake of completeness. It’s also automatically defined to be auto-incremented as new data are added.

Final Thoughts

The relational data model is the most popular database data model and common to many popular applications we all know and love. Updating your Twitter status? You’re likely interacting with Twitter’s User model from their relational database design.

In many cases, hobbyist-level applications can leverage robust frameworks like Django’s ORM system that hide the nitty-gritty of relational database concepts. The syntax might change a bit—avoiding the need for explicit data modeling language use. The underlying features are usually pretty similar and leverage lower-level constructs that are part of Database management systems like PostgreSQL.

References

  1. Codd, E. F. “A Relational Model of Data for Large Shared Data Banks.” Communications of the ACM, vol. 13, no. 6, June 1970, pp. 377–87. June 1970, doi:10.1145/362384.362685.
  2. International Standards Organization (ISO). “Information Processing Systems – Database Language – SQL”. ISO 9075:1987. June, 1987. ICS 35.060
  3. Lacroix, Michel, and Alain Pirotte. “Domain-Oriented Relational Languages.” Proceedings of the Third International Conference on Very Large Data Bases – Volume 3, VLDB Endowment, 1977, pp. 370–78.