Solve the dual problem

You might have seen an expert programmer in action. You ask him for help in debugging some error, and he comes up with a better way of writing the code. You are amazed at how easy it seemed.

Here’s a secret. He didn’t come up with a better solution. He restated the problem and solved that instead.

The De Morgan dual

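You might not know who De Morgan is. Augustus De Morgan was a 19th-century British mathematician who formally stated these laws of logic (strictly speaking, only the first two are De Morgan's laws; the third is double negation, which comes in handy alongside them):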
not (P and Q) <==> (not P) or (not Q)
not (P or Q) <==> (not P) and (not Q)
not (not P) <==> P

Ok, perhaps that was confusing to you. Let me put it in code form:

bool P = false, Q = true;
if (!(P && Q))
{
   Console.WriteLine("First version");
}
// is equivalent to
if (!P || !Q)
{
   Console.WriteLine("Second version");
}

Does it look more familiar?
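P and Q can each only be true or false, so you can check the laws exhaustively. The sketch below uses Python rather than C# only so that it runs on its own. It tries all four combinations of P and Q against each law:

```python
from itertools import product

def de_morgan_holds() -> bool:
    """Check both De Morgan laws and double negation for every combination of P and Q."""
    for p, q in product([False, True], repeat=2):
        # not (P and Q)  <==>  (not P) or (not Q)
        if (not (p and q)) != ((not p) or (not q)):
            return False
        # not (P or Q)  <==>  (not P) and (not Q)
        if (not (p or q)) != ((not p) and (not q)):
            return False
        # not (not P)  <==>  P
        if (not (not p)) != p:
            return False
    return True

print(de_morgan_holds())  # True
```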

Program requirements are sometimes stated in a convoluted manner, but they can be simplified when written in code, or at least made easier to understand.

Suppose the business requirement was this: if the quantity is not less than 10 and the item is not a normal category item, process the order with special priority. Well, you could do this:

if (!(iQuantity < 10) && (iItem != ItemCategory.Normal))
{
// process with special priority
}
else
{
// process normally
}

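That made my mind jump through too many negations. iItem != ItemCategory.Normal is the same as !(iItem == ItemCategory.Normal). So the second law, read backwards as (not P) and (not Q) <==> not (P or Q), lets us transform that into: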

if ( !( (iQuantity < 10) || (iItem == ItemCategory.Normal) ) )
{
// process with special priority
}
else
{
// process normally
}

I added more space because there were a lot of round brackets.

Well, what's the big deal? The condition only chooses between the two branches of an if-else statement. That means we can drop the outer negation and swap the branches instead:

if ( (iQuantity < 10) || (iItem == ItemCategory.Normal) )
{
// process normally
}
else
{
// process with special priority
}

I swapped the contents of the if-else, and thus removed the negation as well. Now, the condition is easier to read.

Just because the requirements were stated in a certain way doesn't mean you can't rewrite the code to solve the same problem. (And that last sentence contained two negations, which made it harder to read.)
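To convince yourself the swap is safe, you can compare both versions over a range of inputs. Here is a minimal sketch in Python. It assumes a hypothetical ItemCategory enum with just NORMAL and SPECIAL values, because the original code's full list of categories isn't shown:

```python
from enum import Enum
from itertools import product

class ItemCategory(Enum):
    NORMAL = "normal"
    SPECIAL = "special"  # hypothetical second category, for illustration only

def original(quantity: int, item: ItemCategory) -> str:
    # As stated in the requirement: quantity not less than 10 AND item not normal
    if not (quantity < 10) and item != ItemCategory.NORMAL:
        return "special priority"
    else:
        return "normal"

def swapped(quantity: int, item: ItemCategory) -> str:
    # Negation removed by negating the whole condition and swapping the branches
    if quantity < 10 or item == ItemCategory.NORMAL:
        return "normal"
    else:
        return "special priority"

# Both versions agree for every quantity from 0 to 20 and every category
assert all(
    original(q, c) == swapped(q, c)
    for q, c in product(range(21), ItemCategory)
)
```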

A practical use is error checking. Usually the code that presents an error is much shorter than the code that runs when the check passes. A typical layout looks like this:

if ({check for something})
{
// do something, usually many lines of code
}
else
{
// display error message
}

There were times when I was tracing code through an if condition, going down line by line, and by the time I hit the else part I had forgotten what the condition was. If it had been written this way:

if ({error condition})
{
// display error message
}
else
{
// do something, usually many lines of code
}

then it's easier to follow. Add some comments in the else part to state the original condition in case you need to swap back.
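Here is a minimal Python sketch of the error-first layout. The process_order function and its rule that the quantity must be positive are hypothetical, made up for illustration. Note the comment in the else branch that records the original condition:

```python
def process_order(quantity: int) -> str:
    # Error condition first: the short branch sits right next to its condition
    if quantity <= 0:
        return "Error: quantity must be positive"
    else:
        # Original condition: quantity > 0
        # ...the long processing code would go here...
        return f"Processed {quantity} item(s)"

print(process_order(0))  # Error: quantity must be positive
print(process_order(3))  # Processed 3 item(s)
```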

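You have to be sure of what you're doing when swapping. Some code cannot be swapped this simply. In a chain of else-if clauses, for example, moving branches around changes which conditions are tested first. The new condition must also be the exact negation of the old one, and that is where De Morgan's laws help. Make sure you know what the contents of the if-else are doing.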

Transform the problem into another equivalent

So we've gone through a simple example of changing an if condition into another, equivalent if condition. Borrowing a term from mathematics, I'll call this a dual problem. We stated the if condition one way and wrote the code for that. Then we stated the if condition in another, equivalent way and wrote the code for that. Both versions solve the same problem.

What are the uses of this concept? Suppose you're stuck on a problem. Maybe you can't write the code to fully express a requirement, or solving it would take a lot of effort. See if you can transform it into an equivalent problem that's easier to solve.

For example, you could be retrieving a bunch of data from the database. Then you iterate through the data, performing calculations on each row, then inserting everything back into the database. The problem is that it's too slow, or too memory intensive. A lot of data is held in memory while you match and sort and compare and update each row.

Transform the problem from doing calculations and updates in your programming environment to doing them in the database environment. The master records are in the database. The detail records are in the database. Why are you retrieving them into memory to do the updates? Use the right programming tool; do them in the database!
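Here is a sketch of that idea: a single set-based UPDATE that recomputes every master total from its detail rows. It runs against an in-memory SQLite database from Python. The OrderHeaders and OrderDetails tables are hypothetical, and prices are stored as whole cents to keep the arithmetic exact. Other databases accept the same correlated-subquery UPDATE, with minor syntax differences:

```python
import sqlite3

def update_totals_in_database(conn: sqlite3.Connection) -> None:
    # One UPDATE statement: the database computes every master total from its detail rows,
    # instead of the program fetching rows, summing in memory and writing each one back.
    conn.execute(
        """
        UPDATE OrderHeaders
        SET Total = (
            SELECT COALESCE(SUM(d.Quantity * d.UnitPrice), 0)
            FROM OrderDetails AS d
            WHERE d.OrderID = OrderHeaders.OrderID
        )
        """
    )

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE OrderHeaders (OrderID TEXT PRIMARY KEY, Total INTEGER);
    CREATE TABLE OrderDetails (OrderID TEXT, ItemID TEXT, Quantity INTEGER, UnitPrice INTEGER);
    INSERT INTO OrderHeaders VALUES ('ORDER01', NULL), ('ORDER02', NULL);
    INSERT INTO OrderDetails VALUES
        ('ORDER01', 'ITEM01', 2, 45),
        ('ORDER01', 'ITEM02', 1, 150),
        ('ORDER02', 'ITEM03', 3, 270);
    """
)
update_totals_in_database(conn)
print(conn.execute("SELECT OrderID, Total FROM OrderHeaders ORDER BY OrderID").fetchall())
# [('ORDER01', 240), ('ORDER02', 810)]
```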

For another example, say the requirement was to have more white space around images when displayed in a web page. If you didn't know any better, you might go through every single image and add a white border to each one. You understood the problem as creating more white space around images, so you went about solving that.

If you understand the emphasis as "when displayed in a web page", the problem shifts. You can use CSS to add padding around the images, and the equivalent problem is solved without touching a single image file.

This is a powerful concept. Solving an equivalent (dual) problem solves the original (primal) problem.

As programmers, we transform requirements into code. There are many ways to write the code, so there are many equivalents to those requirements. Choose the right equivalent problem to solve.

Reusable code not important

We’re moving faster. Technology advances. Businesses change. New services emerge. It’s not how much better you are at reusing code, but how much better you are at creating new code that’s of value.

It’s a new program, with new logic

Someone once said that his team seldom reuses code, because they almost always have to rewrite most of their existing code anyway. Each new program needs new logic and a different kind of optimisation.

I’m talking about demos, the kind made in the demoscene. I can’t remember whether it was ASD or Conspiracy who said that. Or maybe I read it in Hugi magazine.

Once you get past the standard texture, music and 3D geometric mesh/object loading, it’s down to new and creative code. And new and creative code doesn’t come from reusable code. Code used to create a special effect in one demo might not be transferable to another demo.

Businesses change

And so does the business logic. My users think of new ways to conduct business all the time. New services, new forms of price plans and new ways of interacting with data.

I develop .NET web applications for them. Like a good programmer, I separated web pages from business logic code. Then I realised that I had never used any business logic function more than a few times in an entire web application. Each business function had a specific use, so it was a poor candidate for reuse: there was nowhere else to reuse it!

To stay competitive, businesses have to change. They’ve got to keep innovating and keep providing more value to their customers. Programs written for one service might have to be rewritten to take advantage of better tools, or refactored to remove useless parts, or even replaced by a new program specifically written for a new service.

Innovation versus APIs, components and the like

Innovation means new ideas. So it means new code. Sure, you could come up with a new idea by combining two or more existing ideas. Existing code could still be used. Fair enough.

Then you come up with another new idea, based on that combo idea. And another based on existing ideas. Eventually, there comes a point where a new idea is better off being free from any parent idea. The new idea is so completely new that it looks nothing like existing ideas.

That’s innovation. So where’s the code going to come from? From scratch.

Application programming interfaces are meant to be reused. I’ve written custom web controls for reuse. I’ve written JavaScript functions for reuse.

But unless you’re specifically writing code for reuse purposes, write code for the intended purpose first. If it’s easy to extend it to a more general form, then go ahead. Forget generalisation if any major reconstruction is needed to make it usable by an unknown number of applications for unknown uses.

I am the sole web developer in my team. I don’t have time to peer into the future and guess what my users might possibly want to do with my applications. By the time that imagined future arrives, my users might want a different feature. So what would all that speculative work get me?

So should you reuse code or write reusable code?

As with many things, it depends. I want you to think for yourself. Decide for yourself if that piece of code is going to be useful in a generic form. Decide for yourself if your time is better used on writing a feature that provides value immediately.

This decision comes with experience. And programming experience isn’t measured in years, or months. It’s measured in the number of lines of code you write that never see the light of day (which is an excellent topic for another day…).

In these current times, there are excellent code libraries that take out the drudgery of your work. Your time should be used on creating more valuable code on top of that. Reusable code is not important. Valuable code is.

Use the right programming tools

Some programmers never learn beyond their programming language of choice. Or they force a new programming tool into their old, unchallenged way of thinking. Either they have only one tool, or they use the wrong tool.

Dealing with groups of data

Have you ever done this?

  • Select some records from database
  • Iterate through those records
  • For each record, do an insert statement to another table

For example, let’s say we have a table named Items

ItemID Price
ITEM01 0.45
ITEM02 1.50
ITEM03 2.70

Suppose someone bought 2 of everything, and we want to store that information in a table named Orders.

OrderID  ItemID  Quantity  Total
ORDER01  ITEM01  2         0.90
ORDER01  ITEM02  2         3.00
ORDER01  ITEM03  2         5.40

Here is what happened in a C program:

  • Do the select statement
  • For each row fetched, bind the ItemID and Price to 2 variables, sItemID and fPrice
  • Store (2 * fPrice) in a temporary variable fSum
  • Do an insert statement using sItemID and fSum

Since there were 3 rows retrieved, a total of 3 insert statements were issued.

That could easily be accomplished with the following:

insert into Orders (OrderID, ItemID, Quantity, Total)
select 'ORDER01', ItemID, 2, 2 * Price from Items

Ok, that wasn’t a very good example. The point is that SQL operations are meant for manipulating groups of data. You can retrieve rows of records. You can add rows of records. You can update existing chunks of records. You can even wipe out an entire database table.

What happened up there was a failure to understand what database operations are good at. The programmer was still stuck in the standard looping structure of programming languages. He couldn't imagine manipulating chunks of data without iterating through each record.

When dealing with databases, use SQL operations as far as possible. This cuts down on client-server round trips (3 inserts down to 1). When the SQL statement becomes difficult or lengthy to write, then use the programming language to help.
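To make the difference concrete, here is a sketch that runs both approaches against an in-memory SQLite database from Python, using the Items and Orders tables above. Python and SQLite stand in for the original C program and its database. The row-by-row version mimics the C program's loop, and the set-based version is the single insert ... select:

```python
import sqlite3

def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Items (ItemID TEXT PRIMARY KEY, Price REAL);
        CREATE TABLE Orders (OrderID TEXT, ItemID TEXT, Quantity INTEGER, Total REAL);
        INSERT INTO Items VALUES ('ITEM01', 0.45), ('ITEM02', 1.50), ('ITEM03', 2.70);
        """
    )
    return conn

def insert_row_by_row(conn: sqlite3.Connection) -> int:
    # The looping approach: fetch every row, then issue one INSERT per row
    statements = 0
    for item_id, price in conn.execute("SELECT ItemID, Price FROM Items").fetchall():
        conn.execute(
            "INSERT INTO Orders (OrderID, ItemID, Quantity, Total) VALUES (?, ?, ?, ?)",
            ("ORDER01", item_id, 2, 2 * price),
        )
        statements += 1
    return statements

def insert_set_based(conn: sqlite3.Connection) -> int:
    # The set-based approach: a single INSERT ... SELECT handles every row
    conn.execute(
        """
        INSERT INTO Orders (OrderID, ItemID, Quantity, Total)
        SELECT 'ORDER01', ItemID, 2, 2 * Price FROM Items
        """
    )
    return 1

row_conn, set_conn = setup(), setup()
print(insert_row_by_row(row_conn), "statements vs", insert_set_based(set_conn))
# 3 statements vs 1
```

Both connections end up with the same three rows in Orders. The only difference is how many statements crossed from the program to the database.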

The programming environment is very flexible. That doesn’t mean every calculation has to be done in that environment.

Text parsing

Regular expressions are good for manipulating text. If you’ve never heard of regular expressions before, an introductory tutorial is a good place to start.

Regular expressions describe a search pattern. For example, “\d” matches any single digit, and a “+” after a pattern means “one or more times”. So, to search for an IP address such as “12.23.34.45”, we might use “\d+\.\d+\.\d+\.\d+”. The dot is a special character (on its own it matches any character), so we need to escape it with a backslash.

That’ll work. Until we find “1234.54.00847.2” or “6524.738294.8477645.72645”. Remember that each part of an IP address ranges from 0 to 255. Ok, so we try to limit the number of digits in each part, like this: “\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}”.

A “{1,3}” means the pattern before this is repeated a minimum of 1 time, and a maximum of 3 times. So “\d{1,3}” means search for a digit at least 1 time, but up to a maximum of 3 times.

We hit a wall when we get results such as “333.444.555.666”. We try to refine our regular expression more and more, and each time it gets more convoluted and unwieldy. I’m sure someone out there has written a regular expression that correctly matches only valid IP addresses.

You know what I’m going to say: “That’s not the point.”

Regular expressions are fantastic for searching, manipulating and parsing text. A simple search pattern can easily grab something that looks like an IP address. What it can’t easily do is tell whether that something is a valid IP address.

Now think about this. Can your programming language easily search through some text and find something that looks like an IP address? You’re going to read in the text, store it in a string variable, then try to run through every character and check if it looks like a number. Then you’ve got to check if a dot character follows that number. Then a number, then a dot, then a number, then a dot, then a number.

Your string manipulation code is going to be crazy.

It’s hard for regular expressions to verify that the numbers in an IP address fall within a range, but they can easily grab something that looks like an IP address. Your programming language can easily verify that a number is within a specified range, but it’s hard for it to search through text for something that looks like an IP address. Are you getting my drift here?

Use regular expressions to grab text that looks like an IP address. Then use your programming language functions to parse that small piece of string data and verify if the 4 numeric parts in the IP address are within 0 to 255.
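Here is a minimal sketch of that combination in Python. The regular expression only finds IP-like text, and plain integer comparisons then check the 0 to 255 range. The \b word boundaries are my addition and are not in the patterns above. They stop the pattern from matching the tail of a longer number, such as 234.54.100.2 inside 1234.54.100.2:

```python
import re

# Loose pattern: four groups of 1 to 3 digits separated by escaped dots.
# \b stops it from matching the tail of a longer number (234.54.100.2 inside 1234.54.100.2).
CANDIDATE = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")

def find_ip_addresses(text: str) -> list[str]:
    """Let the regex find IP-like text, then let ordinary code check the ranges."""
    valid = []
    for match in CANDIDATE.finditer(text):
        # Each of the 4 parts must be between 0 and 255; int() makes this trivial
        if all(0 <= int(part) <= 255 for part in match.groups()):
            valid.append(match.group(0))
    return valid

print(find_ip_addresses("Server 12.23.34.45 and 333.444.555.666"))  # ['12.23.34.45']
```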

Using the right tool

I see programmers try to force something to work in a certain way when there’s a better way to go about it. They retrieve data, do computations in memory, then push it back into the database, when it’s more effectively done within the database environment. They try to rely completely on regular expressions, when it’s more effective to combine the power of regular expressions with the programming language they’re using.

Learn to use tools beyond your programming language. Then learn to use the right tools for the right job.

Reverting to base nature

This is an article on observation and human behaviour. Faced with a situation outside a person’s comfort zone, how will that person react? Can that person’s reactions tell you anything about his base nature?

Handling hot tea


I read this in a detective comic book. In one of the episodes, there’s a senior detective giving tips to the protagonist, Q (yes, that’s his name). So the senior detective went about his activities while engaging Q in conversation. He casually prepared two cups of tea*, and served one to Q.

After Q drank his tea, the senior detective smiled and told Q that he had just given him a quick test. He had deduced two pieces of information:

  • Q was right-handed
  • Q was in a state of calm

When a person is suddenly, unexpectedly or casually handed a hot drink, he usually takes the cup with his dominant hand so he can handle it carefully. Q had used his right hand to grab the handle.

When a person sips a hot drink, testing the temperature to avoid scalding, he usually relaxes his face and shows his real expression. Because he’s focused on the hot drink, any false expression used to mask his feelings disappears.

For example, suppose a person is smiling at you. If, when he sips a hot drink, he suddenly furrows his eyebrows and the corners of his lips sag a little, he may be worried about something but putting on a happier front.

While sipping a hot drink, a person reveals his base nature.

* I can’t remember if it was tea or coffee.
And I don’t know if any of this is true. I haven’t made nearly enough observations to draw a conclusion…

The truth trailing tough times

I’ve read and heard that one of the ways to determine whether that special someone really suits you is to go travelling with that person. Or go hiking, or camping, or do any activity where both of you are under moderate stress.

Suddenly, one finds out how the other person handles flight delays, inconsiderate hikers, missing toothbrushes, a scratch, a cramp, and decisions affecting both persons. During tough times, a person’s base nature shows up.

Flight delays become “How about that cup of coffee?”, and cramps become “Well, at least the view here is beautiful. Ouch, ouch…”

Or missing toothbrushes become “How could you forget that? I distinctly remember telling you to bring it!”, and decisions affecting both persons become a one-sided dictatorship (The Amazing Race had plenty of those).

Coding under stress

What happens when you code under deadlines, under new and unheard-of requirements, under strange and different environments? You revert to your base nature.

If you have bad coding practices, then under stress, those bad coding practices will show. If you’re lazy about debugging, then under stress, your programs are going to be chock full of bugs. If you’re plain terrible at programming, and have always relied on friends and colleagues, then under stress, well, somebody’s going to notice it.

For example, copying and pasting. I’m sure you have copied and pasted similar pieces of code, never mind the rule about reducing redundant code. This is where the base nature of a programmer shows. The disciplined programmer would copy and paste that code, then quickly go through each line to make sure it’s appropriate to the context (variable names, conditional checks and so on). The careless programmer would just give it a cursory check to see if it compiled.

Copying and pasting is a shortcut that spares us from typing out each line, satisfying our innate programmer laziness. Yet it’s the understanding of the copied code, and the actions taken after pasting, that distinguish the better programmer.

Improving your base nature

Practise your desired qualities when in a calm state. Practise the use of printfs, Console.WriteLines, MessageBoxes or whatever can be used to display variable values. Practise spotting errors before your compiler does. Practise the use of your programming language constructs (how to loop, how to control decision branches) in a non-critical program under a non-critical state.

Even when your desired qualities become second nature to you, that’s not enough, because in the face of adversity that second nature might still fall away. Practise until it becomes your base nature.

Then reverting to your base nature is ok, because your base nature is already great.

Quick and easy data migration check

What is the fastest and easiest way to check if 2 databases contain the same data?

Check that for every table, the number of records is the same in both databases.

Yes, it’s a superficial check. Just because the count is the same in both databases doesn’t mean they contain the same data. But it’s a quick assessment that you’ve done a data migration correctly.

Suppose we have 3 database tables FunLovingDepartments, AwesomeEmployees, and SuperbEmployeeTypes. We could do this:

select count(*) from FunLovingDepartments

and then run it against both databases. Then we do the same for the other 2 tables.

That’s just too tedious. What if we could include the table name and use a union?

select 'FunLovingDepartments', count(*) from FunLovingDepartments
union
select 'AwesomeEmployees', count(*) from AwesomeEmployees
union
select 'SuperbEmployeeTypes', count(*) from SuperbEmployeeTypes

which gives a result something like this:

FunLovingDepartments   3
AwesomeEmployees       17
SuperbEmployeeTypes    8

Much faster to analyse with everything together. What if you’ve got dozens and dozens of tables? You’re going to get carpal tunnel syndrome from typing all those select statements. Now, what if I told you how you can generate those select statements?

Notice the structure of the select statement you want.

select '{tablename}', count(*) from {tablename} union
select '{tablename}', count(*) from {tablename} union
...
select '{tablename}', count(*) from {tablename} union
select '{tablename}', count(*) from {tablename}

What you do is write the select statement that generates the select statement!

select 'select '''+name+''',count(*) from '+name+' union'
from sysobjects
where type='U'
order by name

Inside a SQL string literal, a single quote is escaped by doubling it, so each pair of single quotes ('') produces one single quote in the generated text.

This’ll work for SQL Server and Sybase databases. If you’re working with Oracle, no problem.

-- USER_TABLES lists only the tables you own; ALL_TABLES would also list other schemas' tables
select 'select '''||TABLE_NAME||''',count(*) from '||TABLE_NAME||' union'
from USER_TABLES
order by TABLE_NAME

Oracle SQL syntax uses 2 pipe characters for string concatenation.

The above 2 statements will generate the select-union statement that includes every table in your database. All you need to do is delete the trailing union from the last line. So for our fictional database, the generator SQL will produce this:

select 'FunLovingDepartments', count(*) from FunLovingDepartments union
select 'AwesomeEmployees', count(*) from AwesomeEmployees union
select 'SuperbEmployeeTypes', count(*) from SuperbEmployeeTypes union

So just delete the last union keyword and you’re done.
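If you'd like to try the generator idea without a SQL Server or Oracle instance, here is a sketch in Python against SQLite. SQLite's catalogue table, sqlite_master, plays the role of sysobjects in the SQL Server version; that swap is mine and is not part of the original method. Putting union between the generated selects, rather than after each one, leaves no trailing union to delete. The row counts below are made up for illustration:

```python
import sqlite3

def generate_count_sql(conn: sqlite3.Connection) -> str:
    """Build one select-union statement that counts the rows of every user table."""
    # Names starting with sqlite_ are SQLite's internal tables, so they are skipped.
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    # Assumes plain table names (no quotes or spaces), like the examples in the text.
    selects = [f"select '{name}', count(*) from {name}" for name in names]
    # union goes between the statements, so there is no trailing union to delete.
    return "\nunion\n".join(selects)

def table_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Run the generated SQL and return {table name: row count}."""
    return dict(conn.execute(generate_count_sql(conn)).fetchall())

# Usage with the fictional tables from the text (row counts chosen for illustration)
conn = sqlite3.connect(":memory:")
for table, rows in [("FunLovingDepartments", 3), ("AwesomeEmployees", 17), ("SuperbEmployeeTypes", 8)]:
    conn.execute(f"CREATE TABLE {table} (id INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (?)", [(i,) for i in range(rows)])

print(generate_count_sql(conn))
print(table_counts(conn))
```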

What we have here is a classic case of code generating code. After you run the generated SQL in both databases, you’re going to get 2 sets of results. If you’ve a long list of tables, doing eyeball checks is going to speed up your myopia.

So what you do is run the generator SQL in one database to produce that big chunk of select-union statements. Then run that big chunk of select-union statements to get a set of results. Then copy those results into Excel. Do the same on the other database. What you’ll then have looks something like this in Excel.
[Screenshot: table count comparison in Excel]

Columns A and B contain the result set from one database. Columns F and G contain the result set from the other database. Then in column D, you use an Excel formula to do string comparison between the columns. Let me give you the formula for comparing the first row.

=IF(A1=F1,0,99)+IF(B1=G1,0,9999)

Here is what it means. If A1 (table name from database 1) equals F1 (table name from database 2), return 0, otherwise 99. Likewise, if B1 (number of records from database 1) equals G1 (number of records from database 2), return 0, otherwise 9999. Then add the two values. Copy that cell and paste it down the column. The formula uses relative references, so Excel adjusts the row numbers automatically (A2, F2, A3 and so on).

If the final value is 0, then for that particular table, the number of records is the same in both databases. I used 99 and 9999 to tell the two comparisons apart: 99 means the table names differ, 9999 means the counts differ, and 10098 means both differ. You can use other values, as long as they look significantly different from 0. Remember, you’ll be scrolling up and down the Excel file (lots of tables), so you don’t want your eyes to have to distinguish between, say, 8 and 0.
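To make the formula's arithmetic concrete, here is the same per-row logic as a small Python function. The spreadsheet is still the quicker tool for a one-off check; this only shows what each cell computes. Like the spreadsheet, it compares row by row, so both result sets must be sorted the same way (the generator SQL orders by table name). The sample counts are made up for illustration:

```python
from itertools import zip_longest

def compare_rows(rows1: list[tuple[str, int]], rows2: list[tuple[str, int]]) -> list[int]:
    """Per-row equivalent of =IF(A1=F1,0,99)+IF(B1=G1,0,9999)."""
    flags = []
    # zip_longest pads the shorter list with (None, None), so a table missing
    # from one database shows up as a mismatch instead of being silently dropped.
    for (name1, count1), (name2, count2) in zip_longest(rows1, rows2, fillvalue=(None, None)):
        flag = (0 if name1 == name2 else 99) + (0 if count1 == count2 else 9999)
        flags.append(flag)
    return flags

db1 = [("AwesomeEmployees", 17), ("FunLovingDepartments", 3), ("SuperbEmployeeTypes", 8)]
db2 = [("AwesomeEmployees", 17), ("FunLovingDepartments", 2), ("SuperbEmployeeTypes", 8)]
print(compare_rows(db1, db2))  # [0, 9999, 0]
```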

I think the screenshot probably explained it better.

There you have it, a quick and easy data migration check. This method has saved me significant time and effort. It’s a deadly combination of a SQL statement generating a SQL statement, which in turn generates a result set, which is then copied to Excel for comparison.

Use the existing tools where possible. Not everything needs a custom written program.

P.S. Yay, it’s spring!