Quick and easy data migration check

What is the fastest and easiest way to check if 2 databases contain the same data?

Check that for every table, the number of records is the same in both databases.

Yes, it’s a superficial check. Just because the count is the same in both databases doesn’t mean they contain the same data. But it’s a quick assessment that you’ve done a data migration correctly.

Suppose we have 3 database tables FunLovingDepartments, AwesomeEmployees, and SuperbEmployeeTypes. We could do this:

select count(*) from FunLovingDepartments

and then run it against both databases. Then we do the same for the other 2 tables.

That’s just too tedious. What if we could include the table name and use a union?

select 'FunLovingDepartments', count(*) from FunLovingDepartments
union
select 'AwesomeEmployees', count(*) from AwesomeEmployees
union
select 'SuperbEmployeeTypes', count(*) from SuperbEmployeeTypes

which gives a result something like

'FunLovingDepartment'   3
'AwesomeEmployees'      17
'SuperbEmployeeTypes'   8

Much faster to analyse with everything together. What if you’ve got dozens and dozens of tables? You’re going to get carpal tunnel syndrome from typing all those select statements. Now, what if I told you how you can generate those select statements?

Notice the structure of the select statement you want.

select '{tablename}', count(*) from {tablename} union
select '{tablename}', count(*) from {tablename} union
...
select '{tablename}', count(*) from {tablename} union
select '{tablename}', count(*) from {tablename}

What you do is write the select statement that generates the select statement!

select 'select '''+name+''',count(*) from '+name+' union'
from sysobjects
where type='U'
order by name

There are 2 single quotes to produce 1 single quote because the escape character in a SQL string is the single quote.

This’ll work for SQL Server and Sybase databases. If you’re working with Oracle, no problem.

select 'select '''||TABLE_NAME||''',count(*) from '||TABLE_NAME||' union'
from ALL_TABLES
order by TABLE_NAME

Oracle SQL syntax uses 2 pipe characters for string concatenation.

The above 2 statements will generate the select-union statement that includes every table in your database. All you need is delete the trailing union from the last line. So for our fictional database, the generator SQL will produce this

select 'FunLovingDepartments', count(*) from FunLovingDepartments union
select 'AwesomeEmployees', count(*) from AwesomeEmployees union
select 'SuperbEmployeeTypes', count(*) from SuperbEmployeeTypes union

So just delete the last union keyword and you’re done.

What we have here is a classic case of code generating code. After you run the generated SQL in both databases, you’re going to get 2 sets of results. If you’ve a long list of tables, doing eyeball checks is going to speed up your myopia.

So what you do is run the generator SQL in one database to produce that big chunk of select-union statements. Then run that big chunk of select-union statements to get a set of results. Then copy those results into Excel. Do the same on the other database. What you’ll then have looks something like this in Excel.
Table count comparison in Excel

Columns A and B contain the result set from one database. Columns F and G contain the result set from the other database. Then in column D, you use an Excel formula to do string comparison between the columns. Let me give you the formula for comparing the first row.

=IF(A1=F1,0,99)+IF(B1=G1,0,9999)

What it means is if A1 (table name from database 1) equals to F1 (table name from database 2), then return value 0, otherwise 99. The other part is if B1 (number of records from database 1) equals to G1 (number of records from database 2), then return value 0, otherwise 9999. Then add the two return values. Copy that Excel cell and paste down the line. Excel will automatically make sure the cell rows are correct (A2, A3 and so on).

If the final value is 0, then for that particular table, the number of records is the same in both database. I used 99 and 9999 respectively to distinguish the two different if comparisons. But you can set them to other values, as long as it looks significantly different from 0. Remember, you’ll be scrolling up and down the Excel file (lots of tables), so you don’t want to have your eyes distinguish between for example 8 and 0.

I think the screenshot probably explained it better.

There you have it, a quick and easy data migration check method. This saved me significant amounts of time and effort before. It’s a deadly combination of a SQL statement generating a SQL statement, which in turn generated a result set, which was then copied to Excel for comparison.

Use the existing tools where possible. Not everything needs a custom written program.

P.S. Yay, it’s spring!