Part of my job is moonlighting as a SQL Server database admin, so a co-worker recently asked me to run a script against a production table. The task was simple: add a few new columns and create some indexes. He had generated the script by using SQL Server 2008 Management Studio’s (SSMS) Generate Change Script function.
Although the general sequence of steps in the generated script made sense, I have some complaints about the SSMS output. Here’s a very simple recreation of the scenario.
Consider a table with two columns. One is a primary key, and one has a default constraint:
BEGIN TRANSACTION
GO
ALTERTABLE dbo.phillies
DROP CONSTRAINT df_phillies_year
GO
CREATETABLE dbo.Tmp_phillies (
phils_id int NOTNULL IDENTITY (1,1),
phils_year char(4)NOTNULL,
division_champ_flag bit NOTNULL,
national_champ_flag bit NOTNULL)ON[PRIMARY]
GO
ALTERTABLE dbo.Tmp_phillies
ADD CONSTRAINT
df_phillies_year DEFAULT((2008))FOR phils_year
GO
SET IDENTITY_INSERT dbo.Tmp_phillies ON
GO
IFEXISTS(SELECT*FROM dbo.phillies)
EXEC('INSERT INTO dbo.Tmp_phillies
(phils_id, phils_year)
SELECT phils_id, phils_year
FROM dbo.phillies
WITH (HOLDLOCK TABLOCKX)')
GO
SET IDENTITY_INSERT dbo.Tmp_phillies OFF
GO
DROPTABLE dbo.phillies
GO
EXECUTE sp_rename
N'dbo.Tmp_phillies', N'phillies','OBJECT'
GO
ALTERTABLE dbo.phillies
ADD CONSTRAINT
pk_phils_id PRIMARYKEY CLUSTERED
(phils_id)ON[PRIMARY]
GO
COMMIT
This script contains BEGIN TRANSACTION and COMMIT TRANSACTION statements but doesn’t accompany them with any kind of error handling. So if you run it as-is and encounter an error, nothing gets rolled back.
Any error handling that you might add, however, is thwarted by the fact that the script’s statements are contained in individual batches (i.e., separated by GO statements).
Say you individually check each statement for errors and issue a rollback/return if something goes awry.
BEGIN TRANSACTION
GO
[snip]CREATETABLE dbo.Tmp_phillies (
phils_id int NOTNULL IDENTITY (1,1),
phils_year char(4)NOTNULL,
division_champ_flag bit NOTNULL,
national_champ_flag bit NOTNULL)ON[PRIMARY]
GO
[snip]-- throw an errorSELECT23/0IF @@ERROR <>0
BEGIN
PRINT 'error!'
ROLLBACK TRANSACTION
RETURN
END
[snip]
GO
DROPTABLE dbo.phillies
In this scenario, changes that occurred prior to the error will be rolled back. However, although the RETURN statement exits the current batch, subsequent batches (for example, the one that deletes your table) will execute.
So what’s with the batch-happy SSMS? In older versions of SQL it might have been necessary to separate a CREATE TABLE and subsequent ALTER TABLE statements into separate batches (I haven’t tested this). But SQL 2005 and 2008 provide statement-level recompilation, so the multitude of batches is not necessary.
When I tested a “GO-less” version of the script, it worked swimmingly. It’s understandable that SSMS can’t generate the error-handling, but if it would reduce the scripts’ number of batches to the minimum required, it would be easier to add the error handling yourself.
Conclusion: don’t rely on these SQL Server Management Studio-generated change scripts for any serious work, and definitely don’t use them as a model for how to write robust T-SQL.
I’m extremely happy to work in an organization with an uber-smart and supportive group of developers. Once a month we meet for a lunchtime “tech talk” presentation on a topic chosen via Google Moderator.
This month, Tim Allen and I collaborated and spoke about database design. Working with Tim to research and organize the material was a blast, and even though we put silly pictures of ourselves on the slides, I’m posting them.
We focused most closely on normalization and indexing, with a few of our other best practices thrown in the mix. The concepts should be applicable to any RDBMS, but the details are specific to MS SQL Server, the database used in most of the organization’s applications.
While preparing a database design presentation with my co-worker Tim, I discovered this hypnotic video demonstrating inserts into a b-tree data structure. For some reason, the music compels me to leave the cubicle and pay a visit to Rami’s falafel truck.
Yesterday a fellow developer hit a strange SQL error and determined that the culprit was a T-SQL CASE statement used in his ORDER BY clause.
1
2
3
4
5
6
7
8
9
10
ORDERBY
CASE
WHEN UPPER(@orderBy)='GROUPMESSAGEID' THEN groupMessageID
WHEN UPPER(@orderBy)='GAMEID' THEN gameID
WHEN UPPER(@orderBy)='GROUPID' THEN groupID
WHEN UPPER(@orderBy)='MESSAGEID' THEN messageID
WHEN UPPER(@orderBy)='ROUNDSENT' THEN roundSent
WHEN UPPER(@orderBy)='SENTON' THEN sentOn
ELSE groupMessageID
END
This code results in the following message:
Operand type clash: uniqueidentifier is incompatible with datetime
After some digging around, we found the underlying cause of the error: the case statement return values have different data types. In some cases, returning different data types will behave as expected. However, mixing numeric and character data causes problems.
In the statement above, gameId, groupID, roundSent, and groupMessageID are integers, sentOn is a datetime, and messageID is a uniqueidentifier. Because the data type precedence pecking order in this case is datetime, int, and then uniqueidentifier, SQL choose datetime as the return type. Uniqueidentifiers cannot be converted to datetimes, hence the error message.
At two recent code reviews, we’ve discussed the use of SQL Server auto-incrementing fields as primary keys. Using system-generated numbers to uniquely identify rows (i.e., implementing surrogate keys) is a good practice* because it reduces the likelihood of errors introduced by bad data. However, these surrogate keys are meaningless and exist only for the purpose of identification.
The problem is that many applications do not treat surrogate keys as meaningless numbers, a practice that reduces portability. Consider a “master table” that lists an application’s user roles. The most basic structure for such a table would be two columns, role_id and role_name:
role_id
role_name
1
student
2
faculty
3
TA
4
admin
Many applications would do the following to get a list of faculty users:
SELECT user_name FROM tblRole WHERE rold_id = 2.
However, there’s no guarantee that the faculty role will be assigned to the id of 2 in perpetuity. 2 doesn’t mean faculty, it just means 2; if you reload your data and port the application to another environment, the faculty role id might be 4 or 10 or 33,288.
It’s safer to look up the role_id where role_name = ‘faculty’ and use that value instead of assuming 2. This method, of course, would create problems if the application users decide to change the ‘faculty’ role to ‘instructor.’ A solution would be to create more descriptive master tables that supports front-end changes while maintaining meaningless surrogate keys:
role_id
role_code
role_name
role_sort_order
role_create_date
role_update_date
1
s
student
3
10/31/2007
2
f
faculty
1
10/31/2007
3
t
TA
2
10/31/2007
11/03/2008
4
a
admin
4
10/31/2007
This table structure allows you to get the faculty surrogate key by querying for role_code = ‘f.’ The role_name field is what users see and can be modified without breaking the application. Finally, role_sort_order controls the front-end display order of these items and allows you to change the sort order without touching any code.
*The surrogate key versus natural key debate still rages among database geeks, but it seems that surrogate keys are a standard practice at this shop.
Dynamic SQL came up again at a recent code review. If only for your own maintenance sanity, it’s worth eliminating dynamic SQL where possible, but there are times you can’t avoid it (for example, some implementations of the new SQL 2005 PIVOT function). If you must do the dynamic SQL thing, you should know that there are two ways to execute it
EXEC(@stringofsql)
sp_executesql @stringofsql, @someparameters
The first option is old school, but it still works. Sp_executesql, however, is newer and allows you to use parameters in conjunction with the dymanically-built SQL string. Having the option to use parameters is a great improvement–if the parameters are the only part of the SQL command that changes, the optimizer can reuse the execution plan instead of generating a new one each time the code is run. See SQL Server Books Online for the exact sp_executesql syntax.
If you want to know more about dynamic SQL, take a look at this aricle.
Recently I encountered a SQL Server error-handling trap that caused a transaction not to roll back as expected. At the risk of sounding like an amateur, here’s what happened. The code was issuing two insert statements that were grouped into a single transaction:
BEGIN TRANSACTION
INSERTINTO dbo.Table1 (some COLUMNS)VALUES(some VALUES)SELECT @error = @@error, @rowcount = @@rowcount
IF @error <>0
BEGIN ROLLBACK TRANSACTION RAISERROR('error on the first insert!',16,1)RETURN END
[DO STUFF]INSERTINTO dbo.Table2 (some COLUMNS)VALUES(some VALUES)SELECT @error = @@error, @rowcount = @@rowcount
IF @error <>0
BEGIN ROLLBACK TRANSACTION RAISERROR('error on the second insert!',16,1)RETURN END
COMMIT TRANSACTION
The above error-handling works most of the time. However, if there is a run-time T-SQL error (for example, a missing object or a divide by 0 error), the offending statement fails but the rest of the transaction executes. In my case, the second insert failed because Table2 had been renamed. The subsequent @error check was never invoked, so the first insert wasn’t rolled back.
This behavior can be overridden by setting SET XACT_ABORT to ON, which specifies that the current transaction will automatically roll back in the event of a T-SQL run-time error.