I have been trying out Microsoft's newest version of Visual Studio, the Visual Studio 2010 Ultimate Release Candidate. The new environment is quite nice although there doesn't seem to be a lot of difference with database projects so far. One nice extra is that you can now have windows tear off and be placed outside of the IDE. As I have multiple monitors I like to have the Solution Explorer open in the main IDE, and the Schema View window open on my second monitor. I find this greatly helps with quickly navigating around my project.
Anyway, back to the blog post.
When I am creating a new database schema, I like to have my project automatically load any required default data to my lookup tables. These tables are the ones where the data very rarely changes and are primarily referenced by Foreign keys. Examples of this table type could be a Status table, an Address type or even a Public Holiday date table that you'd like pre loaded.
Where I work we have tried different ways to action this but all the solutions have depended on a post deployment script executing either a bulk copy or multiple insert statements as follows;
-- Non SQL 2008 version
IF NOT EXISTS (SELECT 1 FROM [dbo].[TableName])
BEGIN
INSERT [dbo].[TableName] ([Column1], [Column2])
VALUES (Value1, Value2);
INSERT [dbo].[TableName] ([Column1], [Column2])
VALUES (Value3, Value4);
INSERT [dbo].[TableName] ([Column1], [Column2])
VALUES (Value5, Value6);
END;
-- SQL 2008 version
IF NOT EXISTS (SELECT 1 FROM [dbo].[TableName])
BEGIN
INSERT [dbo].[TableName] ([Column1], [Column2])
VALUES (Value1, Value2)
, (Value3, Value4)
, (Value5, Value6);
END;
-- Bulk Copy version
IF NOT EXISTS (SELECT 1 FROM [dbo].[TableName])
BEGIN
BULK INSERT [dbo].[TableName] FROM '$(DataFilePath)TableName.bcp WITH ...
END;
The main part of this that has always bothered me is the existence check as the first statement. This basically states that if ANY data exists in the target table, we assume that the table is already loaded correctly so nothing further occurs. The problem with this is that if your default table data changes, or has extra rows added, or has extra rows deleted, then unless you empty the target table before you run this script these changes are never made to the table when you deploy the project.
Well this week, while working on a new SQL 2008 project, I realised that the new SQL 2008 only MERGE statement would be an ideal alternate method to pre load these lookup tables. With one statement I would be able to let SQL 2008 do all the work of comparing my project default data to what already exists in the deployment target database and take the appropriate action to either add or delete a row, depending on whether I have added or deleted a data row in my default data.
I set it up as follows for my AddressType table;
-- Create the table
CREATE TABLE [dbo].[AddressType]
(
[Id] INT IDENTITY(1, 1)
CONSTRAINT [PK_AddressType] PRIMARY KEY CLUSTERED
NOT NULL
, [Code] NVARCHAR(20) CONSTRAINT [UQ_AddressType_Code] UNIQUE ([Code])
NOT NULL
, [Description] NVARCHAR(150) NULL
);
Then I create a post deployment script LoadAddressType.sql in the database project with the following code in it;
MERGE [dbo].[AddressType] AS Target
USING
(
SELECT N'Billing', N'Billing Address Type'
UNION ALL
SELECT N'Branch', N'Branch Address Type'
UNION ALL
SELECT N'Head Office', N'Head Office Address Type'
UNION ALL
SELECT N'Postal', N'Postal Address Type'
UNION ALL
SELECT N'Residential', N'Residential Address Type'
) AS Source ([Code], [Description])
ON (TARGET.[Code] = SOURCE.[Code])
WHEN NOT MATCHED BY SOURCE
THEN DELETE
WHEN NOT MATCHED BY TARGET
THEN INSERT ([Code], [Description])
VALUES (SOURCE.[Code], SOURCE.[Description]) ;
This creates a virtual Source table inside the USING block which, at project deployment is compared to the target table on the common [Code] field. If the target table has a row which is not in the virtual table then it is deleted from the target table and if the target table does not have a row which is in the virtual table, then the virtual row gets inserted to the target table. The first time this script is executed will insert 5 rows to the target AddressType table.
If I were then to update the LoadAddressType.sql script as follows, deleting 'Branch' and modifying 'Head Office' to 'Central Office';
MERGE [dbo].[AddressType] AS Target
USING
(
SELECT N'Billing', N'Billing Address Type'
UNION ALL
SELECT N'Central Office', N'Central Office Address Type'
UNION ALL
SELECT N'Postal', N'Postal Address Type'
UNION ALL
SELECT N'Residential', N'Residential Address Type'
) AS Source ([Code], [Description])
ON (TARGET.[Code] = SOURCE.[Code])
WHEN NOT MATCHED BY SOURCE
THEN DELETE
WHEN NOT MATCHED BY TARGET
THEN INSERT ([Code], [Description])
VALUES (SOURCE.[Code], SOURCE.[Description]) ;
Then on the next deployment of the project, the 2 rows with a [Code] value of 'Branch' and 'Head Office' would be deleted from the target table as they are no longer in the virtual table and a new row with a [Code] of 'Central Office' would be inserted. The other 3 rows would remain unchanged.
If you wanted to extend the functionality so that the Description field gets updated if changed, then you could add the following code block to the merge which also compares the [Description] field;
MERGE [dbo].[AddressType] AS Target
USING
(
SELECT N'Billing', N'Billing Address Type'
UNION ALL
SELECT N'Central Office', N'Central Office Address Type'
UNION ALL
SELECT N'Postal', N'Postal Address Type'
UNION ALL
SELECT N'Residential', N'Residential Address Type'
) AS Source ([Code], [Description])
ON (TARGET.[Code] = SOURCE.[Code])
WHEN MATCHED AND TARGET.[Description] <> SOURCE.[Description]
THEN UPDATE
SET TARGET.[Description] = SOURCE.[Description]
WHEN NOT MATCHED BY SOURCE
THEN DELETE
WHEN NOT MATCHED BY TARGET
THEN INSERT ([Code], [Description])
VALUES (SOURCE.[Code], SOURCE.[Description]) ;
The way I have the virtual table coded relies on implicit data conversion for it's success, as well as being easier to read for blog entries but it really should be coded with the proper data type casting as follows;
MERGE [dbo].[AddressType] AS Target
USING
(
SELECT CAST('Billing' AS NVARCHAR(20)), CAST('Billing Address Type' AS NVARCHAR(150))
UNION ALL
SELECT CAST('Central Office' AS NVARCHAR(20)), CAST('Central Office Address Type' AS NVARCHAR(150))
UNION ALL
SELECT CAST('Postal' AS NVARCHAR(20)), CAST('Postal Address Type' AS NVARCHAR(150))
UNION ALL
SELECT CAST('Residential' AS NVARCHAR(20)), CAST('Residential Address Type' AS NVARCHAR(150))
) AS Source ([Code], [Description])
ON (TARGET.[Code] = SOURCE.[Code])
WHEN NOT MATCHED BY SOURCE
THEN DELETE
WHEN NOT MATCHED BY TARGET
THEN INSERT ([Code], [Description])
VALUES (SOURCE.[Code], SOURCE.[Description]) ;
Note that you could also copy the Merge query into an SSMS query and run it directly to get the same results.
One problem you may run into if you delete a value from your virtual source table is that come deployment, if that value is already referenced by a foreign key in another table, then unless you have the ON DELETE CASCADE option set for the foreign key the Merge statement will fail, causing your deployment to fail.
This method works well for us for small amounts of default data, but is still not suitable for large scale Bulk Insert data loads as it would just take too long to set up the virtual table, not not mention it would probably not be very efficient. For these it is probably best to empty the target table before executing the bulk insert.
