Note: this article was originally published to the Wharton Computing Developer Center.
At two recent code reviews, we’ve discussed the use of SQL Server auto-incrementing fields as primary keys. Using system-generated numbers to uniquely identify rows (i.e., implementing surrogate keys) is a good practice* because it reduces the likelihood of errors introduced by bad data. However, these surrogate keys are meaningless and exist only for the purpose of identification.
The problem is that many applications do not treat surrogate keys as meaningless numbers, a practice that reduces portability. Consider a “master table” that lists an application’s user roles. The most basic structure for such a table would be two columns, role_id and role_name:
Many applications would do the following to get a list of faculty users:
SELECT user_name FROM tblRole WHERE rold_id = 2.
However, there’s no guarantee that the faculty role will be assigned to the id of 2 in perpetuity. 2 doesn’t mean faculty, it just means 2; if you reload your data and port the application to another environment, the faculty role id might be 4 or 10 or 33,288.
It’s safer to look up the role_id where role_name = ‘faculty’ and use that value instead of assuming 2. This method, of course, would create problems if the application users decide to change the ‘faculty’ role to ‘instructor.’ A solution would be to create more descriptive master tables that supports front-end changes while maintaining meaningless surrogate keys:
This table structure allows you to get the faculty surrogate key by querying for role_code = ‘f.’ The role_name field is what users see and can be modified without breaking the application. Finally, role_sort_order controls the front-end display order of these items and allows you to change the sort order without touching any code.
*The surrogate key versus natural key debate still rages among database geeks, but it seems that surrogate keys are a standard practice at this shop.