Revenue sharing and operations research – part 3

This is a mini-series on how revenue sharing and operations research are linked. You might want to read part 1 on the specific business problem I was solving, and part 2 for the mathematical formulation of the problem. In this final part, I’ll tell you what was the solution eventually used.

First, although I said it was basically an assignment problem in part 1, on further thought, I don’t believe it is. What I was thinking was “how to assign that $0.01 such that there are no rounding errors”. Thus the “assignment” part. I apologise for any confusion.

Second, I said the financial/maths formulation was an integer problem. But the values are money values with decimal values, right? How can it be an integer problem? Because it has to also be correct up to 2 decimal places. That means fixed arithmetic. Therefore it becomes an integer problem. Just think of the values as cents (instead of dollars correct up to 2 decimal places).

Now, if you’ve read part 2 (and I applaud you if you actually sat through that mass of maths formulations), you should have guessed that using operations research to solve the business problem was not advisable. It might not even be suitable.

However, the problem still needed to be solved. How do you get rid of any extra or missing money?

More maths…

Going back to the example I gave in part 1, there were 3 products with revenue to be split between 2 parties. So there were 6 parts. If each part generated a rounding error of +$0.01, then there was a maximum potential difference of $0.06 between the original revenue to be shared and the sum of the parts after revenue sharing calculations.

I remind you that whatever solution I came up with had to make mathematical sense and financial sense to the programmers and the users. There are actually 2 goals:

  • To have the sum of the parts be equal to the original revenue amount
  • To have each part’s amount (after rounding) be as close to the calculated value as possible

The 1st goal ensures no summation errors. After revenue sharing, there are no extra or missing money amounts. This part is actually easy to fulfill. The 2nd goal is fulfilled with a bit of adjustments. So here’s the solution.

The easy-to-understand maths solution

We split the revenue accordingly to each part first, rounding each part’s amount to 2 decimal places. Then we sort each part in ascending order. Then we sum all the part’s amounts together. If there’s a discrepancy, we correct the discrepancy by adjusting the largest amount. This calls for an example.

Here’s the original example used. Total revenue for the 3 products are:

  • ProductA: $63.13
  • ProductB: $20.75
  • ProductC: $16.12

Assuming a 30-70 percentage split, we have:

  • ProductA: $18.94 (us), $44.19 (them)
  • ProductB: $6.23 (us), $14.53 (them)
  • ProductC: $4.84 (us), $11.28 (them)

Sorting all the parts in ascending order, we have:

  • ProductC: $4.84 (us)
  • ProductB: $6.23 (us)
  • ProductC: $11.28 (them)
  • ProductB: $14.53 (them)
  • ProductA: $18.94 (us)
  • ProductA: $44.19 (them)

The sum of the parts’ amounts is $100.01, which is not equal to the original revenue being shared ($100). The discrepancy is a +$0.01. So we adjust the largest amount. Specifically, we deduct $0.01 from the largest amount (because our discrepancy is positive).

So the revenue share for the content provider for ProductA becomes $44.18, and thus the sum of the parts become $100.

This method ensures that the sum of the each part’s amounts is still equal to the original revenue, which is very important (because this is a financial operation!). This satisfies the 1st goal.

And for each part, the amount is rounded to the nearest 2 decimal place. So each part’s amount is as close to the calculated split value as possible. The only exception is the largest amount might be off a little.

Now I chose the largest amount to “absorb” any rounding discrepancy precisely because it is the largest amount. Note that the term “largest” refers to the magnitude, so if you happen to deal with negative values (it happens, even in financial situations. Consider debt as an example), use the maths absolute function to do the sorting.

Any discrepancy can be mathematically shown to be at most equal to (number of parts) multiply by $0.01 (rounding error).

D <= ± (N * $0.01) where D is the discrepancy and N equals the number of parts. Note that the discrepancy is bounded, which is the mathematical way of saying it has an upper and lower limit (or bound). Note also that in a fraction, a larger numerator means a larger fraction and a smaller numerator means a smaller fraction. A larger denominator means a smaller fraction and a smaller denominator means a larger fraction.

Now, whatever the discrepancy value is, it is bounded, it is fixed, and it is a small value. If we want any amount to “absorb” this discrepancy, then the larger the amount, the smaller the resulting error fraction or error percentage.

For example, if the discrepancy is $0.01 and the amount is $1, the resulting error percentage is 1% ($0.01 / $1.00 * 100%). If the amount is $5, the resulting error percentage becomes 0.2% ($0.01 / $5.00 * 100%).

Suppose the discrepancy is $0.02. We could spread the discrepancy error among the largest 2 amounts, each amount absorbing $0.01. But this makes the programming a little more complicated than it is. Also, it makes the algorithm a bit “dynamic”, which makes tracing any calculations by a programmer or user difficult.

Implementing it in code

All the revenue amounts were stored in the database. Because of this, I recommended that any revenue sharing calculations be done within the database environment itself. Namely, with stored procedures running SQL statements.

Benefits of using stored procedures within the database environment:

  • Can sort values easily (use the SORT BY clause)
  • Can handle dynamic number of values (with temp tables or the original tables)
  • Some calculations can be grouped into a single UPDATE statement
  • All the values are in the database!

The last benefit means there’s little switching of context from the database environment to the… whatever environment. In that project, it was scheduled Unix shell scripts combined with C programs that called the stored procedures. We didn’t want the context to switch back to a Unix environment to do calculations. Doing calculations in the Unix environment with C might be fast, but there are many content providers and many products. The context switching might eat up any performance benefits. Besides, having the calculations in a few stored procedures mean better modularity and separation of functions.

Further considerations

Because we’re in the business of … uh, doing business, we might want the customer to have a better deal. Or at least an easier report to read.

In the method above, we sorted the amounts in ascending order, regardless of whether it’s us or them. So it could well mean that the largest revenue share of the content provider be used to absorb the discrepancy.

This might mean when they read the revenue sharing report, they might question why that amount is $44.18 instead of $44.19. It might be just $0.01 to you, but it’s still money!

What we can do is sort the amounts by theirs first, then ours. And within each, sort by ascending order. So we could have this instead:

  • ProductC: $11.28 (them)
  • ProductB: $14.53 (them)
  • ProductA: $44.19 (them)
  • ProductC: $4.84 (us)
  • ProductB: $6.23 (us)
  • ProductA: $18.94 (us)

In this case, we adjust our revenue share for ProductA to be $18.93 (instead of $18.94) so the revenue sum is correct. Basically, we absorb any discrepancy, using our largest revenue share amount.

And that’s the end of the discussion of revenue sharing: the business part, the maths part and the implementing/programming part. Let me know if you have any questions.

Revenue sharing and operations research – part 2

This is a mini-series on how revenue sharing and operations research are linked. You might want to read part 1 on the specific business problem I was solving. In this part, I’ll be telling you about the maths behind the business solution.

But first, I have to tell you about a couple of maths concepts.

Converging from opposite directions

Let’s say someone asked you how tall you are. Instead of giving a straight answer, you decide to give a simple maths puzzle. You say you are at least 6 feet tall. Then you also say you are at most 6 feet tall. So what’s your height? Let’s put those 2 conditions into mathematical form:

  • H >= 6
  • H <= 6

To satisfy those 2 conditions, there’s only 1 answer: You must be 6 feet tall.

It looks elementary, but it will come into play later on. Just keep in mind that an equality can be split into 2 inequalities. This actually reminds me of the squeeze theorem, but I digress.

Reversing the direction of an inequality

The next maths concept is how to reverse the direction of an inequality. For example, we have

x >= 5

This can also be expressed as

-x <= -5 This will also come in handy later on.

System of linear inequalities

My experience in operations research had been confined to course work in an academic semester. Operations research mainly is about maximising or minimising some objective. You are given a series of conditions to fulfill. Then you’re given an objective to maximise or minimise. And you’re to translate those conditions and objective into a mathematical model formulation. Let’s look at an example.

Jake’s mom gave him $15 to buy some sweets (because he did awesomely in a maths test. Parental note: I don’t think this is a good reward, but hey, it’s your kid). But mom told Jake that he cannot buy more than 3 lollipops (although Jake already decided he’s not buying more than 4). There’s this contest where if you submit wrappers from peppermints and lollipops, you win an iPad (I’m totally making this up!). Jake needed just 4 more wrappers, and he didn’t want more than 4 because he wanted to buy more chocolates. Also, Jake had secretly decided to buy at least $10 worth of sweets.

Now, chocolates cost $2 each. Peppermints cost $1 each. And lollipops cost $3 each.

Let C be the number of chocolates bought.
Let P be the number of peppermints bought.
Let L be the number of lollipops bought.

Let the objective be to maximise the number of sweets bought. So in maths form, we typically write it as:
max C + P + L

Let’s formulate the conditions:
2C + P + 3L <= 15 (mom only gave Jake $15) L <= 3 (mom said Jake can't buy more than 3 lollipops) L <= 4 (Jake independently decided he's not buying more than 4) P + L = 4 (Jake just wanted 4 more wrappers) 2C + P + 3L >= 10 (Jake wanted to waste at least $10)

Now, logically speaking the condition for the number of wrappers doesn’t make sense. We could say the number of wrappers for peppermints and lollipops be at least 4, which makes better sense. But this is a hypothetical example. More to the point, it’s my hypothetical example, and I needed an equality condition. So there.

You might also note that there are redundant conditions.
L <= 3 L <= 4 can be represented by just L <= 3 because if it's satisfied, then L is definitely <= 4. This is to show you that in your mathematical formulation, you can get redundant conditions. It's up to you to eliminate them so you work with less conditions. Real life problems are messy, don't you know? Let's clean up the formulation. Objective: max C + P + L 2C + P + 3L <= 15 -2C - P - 3L <= -10 P + L <= 4 -P - L <= -4 L <= 3 I've rearranged the conditions a little for clarity. Note the inequality reversal for the "more than $10" condition. Note the split of the equality to 2 inequalities. Why are we doing this? So we get the general form Ax <= B where A is the coefficient matrix x is the variable vector B is the value vector We need to do the inequality reversal so the "direction" for all the inequalities is the same. Otherwise we cannot have Ax <= B. In this case, A is | 2 1 3 | |-2 -1 -3 | | 0 1 1 | | 0 -1 -1 | | 0 0 1 | x is | C | | P | | L | B is | 15 | |-10 | | 4 | | -4 | | 3 | With this Ax <= B form, we're dealing with matrices and vectors. That means we can use all the mathematical techniques for solving such formulations, such as Gaussian elimination or even Gauss-Jordan elimination. The idea is to solve all of our unknowns at one go.

You might notice that we have more inequalities than unknowns. This means we have multiple solutions.

For example, a possible solution for (C, P, L) is (2, 3, 1). That solution satisfies all the conditions. It doesn’t mean it’s optimal, but it’s a solution. This means solutions exist for the problem (sometimes just knowing this is reassuring…).

The dual

I want you to know that the objective can always be stated in the opposite manner. If the objective is to “maximise X”, it is equivalent to “minimising -X”. This is known as the dual of the problem. For example, maximising profit is equivalent to minimising the negative value of profit. NOTE: Maximising profit is NOT equivalent to minimising cost. Profit and cost are generally not the opposite of each other (they’re different business terms), even if they sound like they are (and sometimes give the same solutions). The mathematical formulation and solution is different for both of them. Know what you’re solving.

This duality property is useful if your program is optimised to solve only minimisation problems. So you just convert a maximisation problem to a minimisation problem, and your program works just fine!

Linearity

And you might also note that the objective and the conditions are all linear, meaning unknowns are up to power of 1 (as opposed to quadratic or cubic). Sometimes you get a non-linear condition, and depending on the situation, you might be able to form linear conditions that are equivalent to that non-linear condition.

For example, CP = 2 (number of chocolates multiply by number of peppermints equal to 2).
This set of conditions might be equivalent:

  • C >= 1
  • P >= 1
  • C <= 2
  • P <= 2
  • C + P = 3

The first 2 conditions are inferred from the fact that C and P cannot be zero. If they are, then CP = 2 is never satisfied. If they’re never zero, then they must be at least 1.

The 3rd and 4th conditions are inferred from the fact that C and P cannot be more than 2. If they are, then the other unknown is fractional. For example, if C is 4, then P must be 1/2. But these are number of items, so they must be integer. Therefore, C and P must be at most 2.

Based on the first 4 conditions, C and P can only take on either 1 or 2 as values. The only combination of permutations that still satisfies the “CP = 2” condition is one unknown must be 1 and the other unknown be 2. Hence the sum of the two unknowns must be equal to 3.

Not all non-linear conditions can be translated into linear conditions. Go ask an operations research professor for more.

Integral conditions

And we come to the worst part. All the unknowns must be integer (you can’t buy 2.37 chocolates. Well, at least not without the candy store owner going crazy). This integral condition makes the problem much harder to solve. If you solve the Ax <= B form, you generally get fractional values for the unknowns. This is the optimum solution, but doesn't satisfy the integral condition. If I remember correctly, you solve Ax <= B as usual. Then you take one of the unknowns and work with the 2 integers that are the floor and ceiling values of that unknown. Then you solve iteratively again, with reduced and reformulated conditions for Py <= Q. For example, if you get C = 2.37, then you split off 2 scenarios with C = 2 and C = 3 as the optimum solutions. Scenario 1 is Objective: max P + L P + 3L <= 11 -P - 3L <= -6 P + L <= 4 -P - L <= -4 L <= 3 Scenario 2 is Objective: max P + L P + 3L <= 9 -P - 3L <= -4 P + L <= 4 -P - L <= -4 L <= 3 Basically, you just substitute C = 2 and C = 3. For each scenario, you continue splitting with the other unknowns. There are 3 unknowns, so there are eventually 8 scenarios (2^3 = 8). It's a binary tree. Well, it's at most 8 scenarios, because you might get lucky and hit an original optimised solution with one or more of the unknowns having an original value that's already an integer (now that's a long sentence...). For each of the scenarios, you either get a solution or you get a contradiction somewhere (meaning it's unsolvable). The contradiction can come from the splitting, because when you substitute a possible value into one of the unknowns, the resulting set of conditions contain a contradiction. For example, out of the set of conditions, you get 2 conditions as follows: x <= 4 -x <= -8 ( meaning x >= 8 )

Since an unknown variable cannot be both (less than or equal to 4) AND be (more than or equal to 8), the scenario becomes unsolvable.

Then you examine all those scenarios with solutions, substitute the solutions to the unknowns, and compare the result. If it’s a maximising problem, you check which scenario gave a solution that’s the largest. If it’s a minimisation problem, you check which scenario gave a solution that’s the smallest. And that particular solution becomes the optimum integral solution.

Let me tell you, you do not want to do this by hand. This is why software is written to handle the Gaussian eliminations and checking all the possible scenarios due to the integral condition. With just 3 unknowns, you’re expected to do 9 Gaussian eliminations (1 for the original, 8 for the branch scenarios), and check through all 8 scenarios. Imagine having hundreds of variables and conditions…

Mathematics and programming in one. Having skills in one but not the other makes writing the software quite difficult. Imagine a mathematician who cannot express his ideas in programming code, or a programmer who cannot understand the maths involved.

Back to the business problem

After that extremely long mathematical discussion, we can now turn back to the original business problem. If you haven’t read part 1, you should do so now, otherwise you’d be lost.

Let R be the total revenue for a content provider for that particular month
Let R1 be the total revenue for ProductA
Let R2 be the total revenue for ProductB
Let R3 be the total revenue for ProductC
(so R1 + R2 + R3 = R)
Let X1 be the revenue share for the telecommunications company for ProductA
Let X2 be the revenue share for the telecommunications company for ProductB
Let X3 be the revenue share for the telecommunications company for ProductC
Let Y1 be the revenue share for the content provider for ProductA
Let Y2 be the revenue share for the content provider for ProductB
Let Y3 be the revenue share for the content provider for ProductC

Let the revenue sharing split for the telecommunications company be 30%
Let the revenue sharing split for the content provider be 70%

The objective is to minimise rounding errors. Actually, there should be no errors, but we can live with a possible solution first. If it’s non-zero, then we panic…

The unknowns in this case are X1, X2, X3, Y1, Y2 and Y3.

Objective: minimise ABS(R – X1 – X2 – X3 – Y1 – Y2 – Y3)
X1 + Y1 = R1
X2 + Y2 = R2
X3 + Y3 = R3
X1 = 0.30 * R1
X2 = 0.30 * R2
X3 = 0.30 * R3
Y1 = 0.70 * R1
Y2 = 0.70 * R2
Y3 = 0.70 * R3

And Xi’s and Yi’s must be integers, for i = 1, 2, 3

And ABS stands for the absolute function.

As you can see, there’s a lot of equal conditions. And it’s a problem with integer conditions. Frankly speaking, I don’t want to program a solution either. Just looking at what I’ve written is already giving me a headache…

I’m glad my ex-colleague said no. *whew* This was too complicated to implement. And the content provider might have more products, so the number of unknowns is unknown (no pun intended).

There’s actually a little bit more to the maths formulation, but I’ll tell you about it in the 3rd part. I had to come up with a solution that made mathematical sense and could be understood by other people. Mostly, the users (that is, “normal” people) must be able to understand the logic behind it.

Leave a comment if you have any questions about the maths involved (or even tell me I’m wrong), and I’ll do my best to answer them.

Revenue sharing and operations research – part 1

I’ve been putting this one off for too long. First, it was a problem I encountered when I was employed, and I had to be careful about what I wrote. Second, it’s one of those articles where it’s a mix of mathematics and programming, and in this case, business as well. You might not know it, but I get a headache categorising articles like these… that’s why I just dump them into a generic category. Third, fully explaining the problem and the solution makes a long article.

You still here? Good.

I will write this in 3 parts. I’ll explain the business problem first. In the 2nd part, I’ll tell you why mathematics was involved, and how it eventually wasn’t involved because fully implementing the maths model in code wasn’t advisable. Also, I will then be the only one who can understand the code (because of the maths concept). In the 3rd part, I’ll tell you about the solution used (something simpler that other people could understand).

Ok, let’s get to it.

Revenue sharing

I have to tell you about the business situation first because the term “revenue sharing” means differently depending on the context. In that particular job, I was employed by a telecommunications company. One of my users dealt with content providers, companies who create content such as ring tones, downloadable phone backgrounds (I think?), songs, and eventually included videos-on-demand.

So we (the telecommunications company) charged the end users (people like you and me) based on the content consumed, which was included in the phone bill. The content providers needed us mostly because they don’t have the distribution (number of users) or because of the billing platform (the phone bills).

A parallel would be the Apple App Store. Apple charges the consumers, then gives a cut to the content providers (the app developers).

In the beginning, my users used a simple Excel spreadsheet to do the revenue sharing calculations. Most of the contracts were simple percentage splits, such as 30-70 (as in us 30% and content providers 70%). As the number of content providers and products (from the providers) grew, the simple spreadsheet was proving inadequate. Moreover, the spreadsheet solution was the limiting condition for approaching other content providers to partner with us (and thus growing the business).

There were tax considerations because overseas and local content providers were taxed differently. Withholding tax was a slight headache. The Singapore goods and services tax was a nightmare then, because at the time, we were moving from 3% to 4% then 5% (it’s now 7%). Some content providers wanted minimum sum conditions (“Don’t share revenue with me unless it hits at least $X per month”). Also, for internal accounting purposes, the Singapore dollar equivalent must be available (there were overseas content providers, remember?), so the currency exchange rate must also be used. Like I mentioned, the spreadsheet wasn’t working too well at that point…

Eventually, my users came to my department for help. Basically, we were to write up a software system to do those revenue sharing calculations for them every month.

“Uh, so what’s the problem?” I’m getting to that.

Rounding errors

In the report to the content providers, the exact breakdown of each product must be given. This presents the possibility of rounding errors. Let me give you an example.

Let’s say for a content provider, the total revenue is $100 for that month. Let’s use the 30-70 split. Also, there are 3 products involved. Here’s the total revenue breakdown for the products:

  • ProductA: $63.13
  • ProductB: $20.75
  • ProductC: $16.12

Splitting revenues, we have (rounded to the nearest 2 decimal point because of currency considerations):

  • ProductA: $18.94 (us), $44.19 (them)
  • ProductB: $6.23 (us), $14.53 (them)
  • ProductC: $4.84 (us), $11.28 (them)

Now, let’s get the sum total revenue shared.
We get $18.94 + $6.23 + $4.84 = $30.01
The content provider gets $44.19 + $14.53 + $11.28 = $70.00
(note the split for ProductB for the rounding “error”)

But $30.01 + $70.00 is not equal to $100! Therein lies the problem.

I did research on rounding. There’s “normal rounding” (1.14 becomes 1.1, 1.15 becomes 1.2). There’s rounding half up (1.15 becomes 1.2, but -1.15 becomes -1.1), and rounding half down (1.15 becomes 1.1, but -1.15 becomes -1.2). There’s even banker’s rounding.

The department I worked in supported users by providing programming assistance to financial problems. Let’s just say an extra $0.01 or a missing $0.02 causes great panic throughout the team…

So I was consulted on how to solve this problem. This is basically an assignment problem. Which are problems addressed by operations research, which I took a full semester’s course on.

Let’s just say my colleague was not pleased that we might have to implement a full-blown mathematical model just to do revenue sharing. In the next part, I’ll tell you how a maths model might be used, if it was implemented.