Artificial vs Surrogate Keys: A Star-Crossed Database Debate

The Importance of Terms in Technology

Shakespeare famously asked, "What's in a name?" As it turns out, a great deal, since the term we use to describe a thing affects how we perceive and think about it. Perhaps it's the rapid rate of change that makes technology particularly susceptible to the effects of the words we choose. Terms that are too precise can fall out of use, while terms applied too loosely can obscure essential distinctions.

Definition of Natural Keys

This seems to be the case when folks talk about "artificial" and "surrogate" keys. Let's take a closer look. Everyone seems to agree that a "natural" key is some characteristic out there in the real world that can serve as a key in a database.

For example, employees already have a social security number (or social insurance number, or whatever your country calls it). If you were creating a table of employees, you might well use this unique value as a key.
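
As a minimal sketch of that idea, here is an employee table keyed on the social security number, using Python's built-in sqlite3 module. The table name, column names, and the obviously fake SSN are assumptions made up for illustration; they are not taken from any real schema.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        ssn  TEXT PRIMARY KEY,  -- natural key: the value exists outside the database
        name TEXT NOT NULL
    )
    """
)
conn.execute("INSERT INTO employee VALUES ('000-00-0001', 'Ada Example')")

# The primary key constraint rejects a second row with the same SSN.
try:
    conn.execute("INSERT INTO employee VALUES ('000-00-0001', 'Bob Example')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

The database did not invent the key value here. It simply relies on a number the employee already had.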

The difference between Artificial and Surrogate Keys

In contrast with "natural keys," however, many people use "artificial keys" and "surrogate keys" as if they were synonymous, and in doing so they lose an important distinction. Long before there were computers, numbers were assigned to purchase orders. These numbers are "artificial" because they are generated by a numbering scheme rather than taken from the real world, and the actual numerical value has no intrinsic meaning.

Using a natural key as the primary key of a database table has some drawbacks worth keeping in mind. Firstly, the natural key may not actually be unique across all the records in the table, which could cause data redundancy and inconsistencies.

Secondly, if the value of the natural key changes, all the referencing records in other tables would need to be updated, which could be a complicated and time-consuming task. Thus, it may be beneficial to consider other key types, such as artificial or surrogate keys, to improve database design and efficiency.
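
To make the second drawback concrete, here is a small sketch, again using sqlite3, in which a mistyped SSN has to be corrected. The `timesheet` table and all the values are hypothetical. The foreign key is declared with `ON UPDATE CASCADE`, so the database rewrites the referencing rows for us, but it still has to touch every one of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE employee (
        ssn  TEXT PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE timesheet (
        id           INTEGER PRIMARY KEY,
        employee_ssn TEXT NOT NULL
                     REFERENCES employee(ssn) ON UPDATE CASCADE,
        hours        REAL NOT NULL
    );
    INSERT INTO employee VALUES ('000-00-0001', 'Ada Example');
    INSERT INTO timesheet (employee_ssn, hours)
    VALUES ('000-00-0001', 8.0), ('000-00-0001', 7.5);
    """
)

# Correcting the natural key in the parent table...
conn.execute("UPDATE employee SET ssn = '000-00-0002' WHERE ssn = '000-00-0001'")

# ...forces a rewrite of every child row that referenced the old value.
rewritten = conn.execute(
    "SELECT COUNT(*) FROM timesheet WHERE employee_ssn = '000-00-0002'"
).fetchone()[0]
stale = conn.execute(
    "SELECT COUNT(*) FROM timesheet WHERE employee_ssn = '000-00-0001'"
).fetchone()[0]
```

With two timesheet rows this is trivial. With millions of referencing rows spread over many tables, the same one-line correction becomes a much larger operation.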

Order numbers, however, are used not just by computers as keys. Employees use them, and customers may call to inquire about a particular order number. Initially, in the early days of database technology, the term "surrogate key" was used to talk about values pointing to the physical location of records in a database.

Two Meanings of Surrogate Key

These values are internal to the database and are not available to the database user, nor would any database user want them. We won't consider that meaning here, though the concept might well be applied to the "uniquified" key SQL Server creates when building a clustered index on a non-unique column.

We'll consider the more common meaning of the word "surrogate," which refers to a key created by database designers solely for the internal use of the database, not for use by staff workers or database users. One of the more common examples of a surrogate key is in data warehouses where a date is uniquely identified by an integer, not by a datetime datatype. But, of course, the actual datetime value, along with its associated characteristics, can always be looked up in the dimension table.
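
The date dimension just described can be sketched roughly as follows. The table names `dim_date` and `fact_sales`, the columns, and the sample figures are all assumptions chosen for illustration; a real warehouse would carry many more date attributes.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_date (
        date_key   INTEGER PRIMARY KEY,  -- surrogate key, internal to the warehouse
        full_date  TEXT NOT NULL,        -- the actual date, stored as ISO 8601 text
        is_weekend INTEGER NOT NULL      -- one example of an associated characteristic
    );
    CREATE TABLE fact_sales (
        date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
        amount   REAL NOT NULL
    );
    """
)

# Load two dimension rows; the integer keys carry no meaning of their own.
for key, day in enumerate([date(2018, 12, 10), date(2018, 12, 11)], start=1):
    conn.execute(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        (key, day.isoformat(), int(day.weekday() >= 5)),
    )
conn.execute("INSERT INTO fact_sales VALUES (1, 120.0), (2, 80.0), (2, 45.5)")

# The fact table stores only integers; the real date is looked up by joining.
rows = conn.execute(
    """
    SELECT d.full_date, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY d.date_key, d.full_date
    ORDER BY d.date_key
    """
).fetchall()
```

Nobody outside the warehouse ever needs to know that key 2 stands for December 11. Users see only the date that the join brings back.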

Benefits and Performance Trade-offs in Key Types

An integer date key can provide performance benefits in some cases, particularly if the datetime datatype employed takes up more bytes than an integer. However, there is another potential value to an integer surrogate as well. Specific integers can capture meanings that are not available using datetime datatypes, for example, "not applicable" or "date to be determined."
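
Here is one way those special members might look in practice. The key values -1 and -2, the table layout, and the order numbers are conventions invented for this sketch, not a standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_date (
        date_key    INTEGER PRIMARY KEY,
        full_date   TEXT,               -- NULL for the special members
        description TEXT NOT NULL
    );
    INSERT INTO dim_date VALUES
        (-2, NULL,         'Date to be determined'),
        (-1, NULL,         'Not applicable'),
        ( 1, '2018-12-11', 'Calendar date');

    CREATE TABLE fact_order (
        order_no      TEXT NOT NULL,
        ship_date_key INTEGER NOT NULL REFERENCES dim_date(date_key)
    );
    INSERT INTO fact_order VALUES
        ('PO-1001',  1),   -- shipped on a known date
        ('PO-1002', -2),   -- will ship, date not yet known
        ('PO-1003', -1);   -- a service order that never ships
    """
)

# Every fact row joins to a dimension row, so no NULL handling is needed
# in the fact table and the two "no date" cases stay distinguishable.
status = dict(
    conn.execute(
        """
        SELECT f.order_no, COALESCE(d.full_date, d.description)
        FROM fact_order AS f
        JOIN dim_date AS d ON d.date_key = f.ship_date_key
        """
    )
)
```

A nullable datetime column could only say "no date." The surrogate key lets the design say why there is no date.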

When implementing a database design, there may be performance trade-offs when using artificial keys or surrogate keys. One possible drawback of generated keys is the extra work of producing them, along with storing and indexing an additional column, which in some workloads could negatively impact the database's overall performance.

Conversely, surrogate keys can provide benefits by allowing for faster data retrieval and by simplifying joins that would otherwise have to compare wide or multi-column natural keys. It is, therefore, essential to evaluate the advantages and disadvantages of each key type in the context of the specific database design requirements and choose the most suitable option accordingly.

Remembering the distinction between Natural, Artificial, and Surrogate Keys

To sum up, natural keys are existing characteristic values of database objects that serve as keys. Artificial keys are generated automatically and have no intrinsic meaning, but they are important to database users in their work. Surrogate keys are created by database developers for some specific design purpose and are meaningless to, and not to be used by, clerical staff and database users. With these distinctions and their individual benefits in mind, you can use them wisely and fare better than any of The Bard's tragic protagonists.
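
For readers who like to see the three side by side, here is a rough sketch that puts one of each in a single schema. Everything in it is hypothetical: the table and column names, the `PO-` numbering scheme, and the `place_order` helper.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,  -- surrogate: used only for joins
        ssn         TEXT NOT NULL UNIQUE, -- natural: exists in the real world
        name        TEXT NOT NULL
    );
    CREATE TABLE purchase_order (
        order_id     INTEGER PRIMARY KEY,  -- surrogate: never shown to staff
        order_number TEXT NOT NULL UNIQUE, -- artificial: quoted by customers
        placed_by    INTEGER NOT NULL REFERENCES employee(employee_id)
    );
    INSERT INTO employee (ssn, name) VALUES ('000-00-0001', 'Ada Example');
    """
)

# Hypothetical numbering scheme for the artificial key.
order_sequence = itertools.count(1001)


def place_order(connection, employee_id):
    """Insert an order and return the order number people will use."""
    order_number = f"PO-{next(order_sequence)}"
    connection.execute(
        "INSERT INTO purchase_order (order_number, placed_by) VALUES (?, ?)",
        (order_number, employee_id),
    )
    return order_number


first_number = place_order(conn, 1)

# A customer calls with the order number; the surrogate keys do the
# joining behind the scenes and never appear in the conversation.
placed_by_name = conn.execute(
    """
    SELECT e.name
    FROM purchase_order AS p
    JOIN employee AS e ON e.employee_id = p.placed_by
    WHERE p.order_number = ?
    """,
    (first_number,),
).fetchone()[0]
```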

This piece was originally posted Dec 11, 2018 and has been reposted with an updated title and more information on the benefits of key types.

Written by Dan Buskirk

"The pleasures of the table belong to all ages." Actually, Brillat-Savarin was talking about the dinner table, but the quote applies equally well to Dan’s other big interest, tables of data. Dan has worked with Microsoft Excel since the Dark Ages and has utilized SQL Server since Windows NT first became available to developers as a beta (it was 32 bits! wow!). Since then, Dan has helped corporations and government agencies gather, store, and analyze data and has also taught and mentored their teams using the Microsoft Business Intelligence Stack to impose order on chaos. Dan has taught in Learning Tree’s SQL Server & Microsoft Office curriculums for over 14 years. In addition to his professional data and analysis work, Dan is a proponent of functional programming techniques in general, especially Microsoft’s new .NET functional language F#. Dan enjoys speaking at .NET and F# user groups on these topics.