High Frequency Trading: 2012

Sunday, September 16, 2012

Stop renewing this blog - please see my main blog

Thursday, August 16, 2012

Calculate the sum of all the primes below two million

The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
Find the sum of all the primes below two million.

Question is from Project Euler Q10 , which is a very interesting website.

My Solution is as below, which has performance:
time a.out
sum of Prime = 142913828922
0.090u 0.005s 0:00.09 100.0% 0+0k 0+0io 0pf+0w

int l_iCount = 2;
const int l_iMax = 2000000;
const int IS_PRIME = 0;
const int NOT_PRIME = 1;
int l_arIntegers[l_iMax];
//initialize and assume all to be prime
memset(l_arIntegers, 0, sizeof(l_arIntegers));
l_arIntegers[0] = NOT_PRIME; //for 1
l_arIntegers[1] = IS_PRIME; //for 2
long long l_iSumOfPrimes = 0;
while(l_iCount < l_iMax)
{
if(l_arIntegers[l_iCount - 1] == IS_PRIME)
{
for(int i = 2 * l_iCount; i < l_iMax; )
{
if(l_arIntegers[i - 1] != NOT_PRIME)
l_arIntegers[i - 1] = NOT_PRIME;
i += l_iCount;
}
//it is a prime
l_iSumOfPrimes += l_iCount;
std::cout << "found a Prime = " << l_iCount << std::endl;
}
l_iCount++;
}
std::cout << "sum of Prime = " << l_iSumOfPrimes << std::endl;

When I started to code the solution, I tried the normal check for a prime number, it was too slow to be acceptable. then I thought maybe it is faster to create an array of all integers to be checked as above, just having simple comparison and assignment to 1, without tons of mod (%) operations. And the execution result proved to be working.

I have checked the recommended solutions from Project Euler after solving it, my solution is very close to the optimal answer.

Here is my Project Euler Label:

Wednesday, July 25, 2012

Find the greatest product of five consecutive digits

the problem is as below, got from ProjectEuler.net Problem 8:

Find the greatest product of five consecutive digits in the 1000-digit number.

7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450

To do brute force calculation, logic is easy but kind of having bad performance. To practice and brainstorming some good algorithms to boost the checking speed can be ultimate fun in solving problems we faced.

My strategy is to check whether we can safely eliminate any 5 digits with 0 inside (which is obvious), then check for 1, 2, 3, till 9. comparing integer values must be much faster than multiply operations, not mention in thousands.

Below is my code, not polished yet, just to illustrate my ideas. My intention is not to show off my code but to ignite better ideas and faster solutions.

int main()
{
        char l_arIntegers[1001] = {     "73167176531330624919225119674426574742355349194934\
96983520312774506326239578318016984801869478851843\
85861560789112949495459501737958331952853208805511\
12540698747158523863050715693290963295227443043557\
66896648950445244523161731856403098711121722383113\
62229893423380308135336276614282806444486645238749\
30358907296290491560440772390713810515859307960866\
70172427121883998797908792274921901699720888093776\
65727333001053367881220235421809751254540594752243\
52584907711670556013604839586446706324415722155397\
53697817977846174064955149290862569321978468622482\
83972241375657056057490261407972968652414535100474\
82166370484403199890008895243450658541227588666881\
16427171479924442928230863465674813919123162824586\
17866458359124566529476545682848912883142607690042\
24219022671055626321111109370544217506941658960408\
07198403850962455444362981230987879927244284909188\
84580156166097919133875499200524063689912560717606\
05886116467109405077541002256983155200055935729725\
71636269561882670428252483600823257530420752963450"};

        char* pIntegers = (char *)l_arIntegers;
        int l_iarMaxWithIdx[10] = {0, 9*9*9*9*1, 9*9*9*9*2, 9*9*9*9*3, 9*9*9*9*4, 9*9*9*9*5, 9*9*9*9*6, 9*9*9*9*7, 9*9*9*9*8, 9*9*9*9*9};


        int l_iMaxProduct = 1;
        int l_iEliminate = -1;

        for(int m = 0; m < 10; m++)
        {
                std::cout << "now check for m = " << m << ", Eliminate = " << l_iEliminate << " and max product " << l_iarMaxWithIdx[m] << " and current max product as " << l_iMaxProduct << std::endl;
                //check whether we can eliminate 1s
                if(l_iMaxProduct > l_iarMaxWithIdx[m])
                {
                        //we can safely eliminate ms already.
                        l_iEliminate += 1;
                        continue;
                }
                bool l_bReachEnd = false;
                for(int i = 4; i < 20 * 51;)
                {
                        if(pIntegers[i] - '0' <= l_iEliminate)
                        {
                                i += 5;
                                if(i >= 1001)
                                        l_bReachEnd = true;
                        }
                        else if(pIntegers[i - 1] - '0' <= l_iEliminate)
                        {
                                i += 4;
                                if(i >= 1001)
                                        l_bReachEnd = true;

                        }
                        else if(pIntegers[i - 2] - '0' <= l_iEliminate)
                        {
                                i += 3;
                                if(i >= 1001)
                                        l_bReachEnd = true;
                        }
                        else if(pIntegers[i - 3] - '0' <= l_iEliminate)
                        {
                                i += 2;
                                if(i >= 1001)
                                        l_bReachEnd = true;
                        }
                        else if(pIntegers[i - 4] - '0' <= l_iEliminate)
                        {
                                i += 1;
                                if(i >= 1001)
                                        l_bReachEnd = true;
                        }
                        else
                        {
                                if(i >= 1000)
                                        l_bReachEnd = true;

                                //none of previous 5 is 0
                                //check for any sum is greater than 9*9*9*9*1
                                int l_iProduct = (pIntegers[i] - '0') * (pIntegers[i-1] - '0') * (pIntegers[i-2] - '0') * (pIntegers[i-3] - '0') * (pIntegers[i-4] - '0');
                                std::cout << "now check for m = " << m << " and current product = " << l_iProduct << " with current max product as " << l_iMaxProduct << std::endl;
                                if(l_iProduct > l_iMaxProduct)
                                        l_iMaxProduct = l_iProduct;

                                if(l_iProduct > l_iarMaxWithIdx[m])
                                {
                                        //we can safely eliminate 1s now
                                        std::cout << "current continuous values are " << pIntegers[i-4] << pIntegers[i-3] << pIntegers[i-2] << pIntegers[i-1] << pIntegers[i] << std::endl;
                                        l_iEliminate += 1;
                                        i = 20 * 51;
                                }

                                i += 1;
                        }
                }
                if(l_bReachEnd)
                {
                        //no need to search again, we find the maximum
                        std::cout << "reach end before checking for " << (m+1) << ", max product  = " << l_iMaxProduct << std::endl;
                        break;
                }
        }

        std::cout << "max product  = " << l_iMaxProduct << std::endl;

        return 0;

$ time a.out
now check for m = 0, Eliminate = -1 and max product 0 and current max product as 1
now check for m = 1, Eliminate = 0 and max product 6561 and current max product as 1
now check for m = 1 and current product = 882 with current max product as 1
now check for m = 1 and current product = 126 with current max product as 882
now check for m = 1 and current product = 294 with current max product as 882
now check for m = 1 and current product = 1764 with current max product as 882
now check for m = 1 and current product = 1470 with current max product as 1764
now check for m = 1 and current product = 630 with current max product as 1764
now check for m = 1 and current product = 630 with current max product as 1764
now check for m = 1 and current product = 270 with current max product as 1764
now check for m = 1 and current product = 135 with current max product as 1764
now check for m = 1 and current product = 432 with current max product as 1764
now check for m = 1 and current product = 648 with current max product as 1764
now check for m = 1 and current product = 648 with current max product as 1764
now check for m = 1 and current product = 324 with current max product as 1764
now check for m = 1 and current product = 180 with current max product as 1764
now check for m = 1 and current product = 180 with current max product as 1764
now check for m = 1 and current product = 20 with current max product as 1764
now check for m = 1 and current product = 90 with current max product as 1764
now check for m = 1 and current product = 270 with current max product as 1764
now check for m = 1 and current product = 378 with current max product as 1764
now check for m = 1 and current product = 1512 with current max product as 1764
now check for m = 1 and current product = 6048 with current max product as 1764
now check for m = 1 and current product = 1344 with current max product as 6048
now check for m = 1 and current product = 1344 with current max product as 6048
now check for m = 1 and current product = 960 with current max product as 6048
now check for m = 1 and current product = 1680 with current max product as 6048
now check for m = 1 and current product = 1680 with current max product as 6048
now check for m = 1 and current product = 5880 with current max product as 6048
now check for m = 1 and current product = 3920 with current max product as 6048
now check for m = 1 and current product = 1568 with current max product as 6048
now check for m = 1 and current product = 672 with current max product as 6048
now check for m = 1 and current product = 840 with current max product as 6048
now check for m = 1 and current product = 600 with current max product as 6048
now check for m = 1 and current product = 450 with current max product as 6048
now check for m = 1 and current product = 900 with current max product as 6048
now check for m = 1 and current product = 2700 with current max product as 6048
now check for m = 1 and current product = 540 with current max product as 6048
now check for m = 1 and current product = 972 with current max product as 6048
now check for m = 1 and current product = 1296 with current max product as 6048
now check for m = 1 and current product = 2916 with current max product as 6048
now check for m = 1 and current product = 972 with current max product as 6048
now check for m = 1 and current product = 3888 with current max product as 6048
now check for m = 1 and current product = 3888 with current max product as 6048
now check for m = 1 and current product = 5832 with current max product as 6048
now check for m = 1 and current product = 5832 with current max product as 6048
now check for m = 1 and current product = 15552 with current max product as 6048
current continuous values are 49698
now check for m = 2, Eliminate = 1 and max product 13122 and current max product as 15552
now check for m = 3, Eliminate = 2 and max product 19683 and current max product as 15552
now check for m = 3 and current product = 6048 with current max product as 15552
now check for m = 3 and current product = 5880 with current max product as 15552
now check for m = 3 and current product = 3920 with current max product as 15552
now check for m = 3 and current product = 900 with current max product as 15552
now check for m = 3 and current product = 2700 with current max product as 15552
now check for m = 3 and current product = 3888 with current max product as 15552
now check for m = 3 and current product = 3888 with current max product as 15552
now check for m = 3 and current product = 5832 with current max product as 15552
now check for m = 3 and current product = 5832 with current max product as 15552
now check for m = 3 and current product = 15552 with current max product as 15552
now check for m = 3 and current product = 11664 with current max product as 15552
now check for m = 3 and current product = 6480 with current max product as 15552
now check for m = 3 and current product = 7560 with current max product as 15552
now check for m = 3 and current product = 7560 with current max product as 15552
now check for m = 3 and current product = 13824 with current max product as 15552
now check for m = 3 and current product = 12096 with current max product as 15552
now check for m = 3 and current product = 12096 with current max product as 15552
now check for m = 3 and current product = 16128 with current max product as 15552
now check for m = 3 and current product = 8960 with current max product as 16128
now check for m = 3 and current product = 3840 with current max product as 16128
now check for m = 3 and current product = 3840 with current max product as 16128
now check for m = 3 and current product = 5760 with current max product as 16128
now check for m = 3 and current product = 11664 with current max product as 16128
now check for m = 3 and current product = 6480 with current max product as 16128
now check for m = 3 and current product = 6480 with current max product as 16128
now check for m = 3 and current product = 3600 with current max product as 16128
now check for m = 3 and current product = 8100 with current max product as 16128
now check for m = 3 and current product = 4500 with current max product as 16128
now check for m = 3 and current product = 6615 with current max product as 16128
now check for m = 3 and current product = 7560 with current max product as 16128
now check for m = 3 and current product = 7560 with current max product as 16128
now check for m = 3 and current product = 3240 with current max product as 16128
now check for m = 3 and current product = 12096 with current max product as 16128
now check for m = 3 and current product = 14112 with current max product as 16128
now check for m = 3 and current product = 2100 with current max product as 16128
now check for m = 3 and current product = 3150 with current max product as 16128
now check for m = 3 and current product = 6300 with current max product as 16128
now check for m = 3 and current product = 10080 with current max product as 16128
now check for m = 3 and current product = 18144 with current max product as 16128
now check for m = 3 and current product = 15552 with current max product as 18144
now check for m = 3 and current product = 15552 with current max product as 18144
now check for m = 3 and current product = 10368 with current max product as 18144
now check for m = 3 and current product = 10368 with current max product as 18144
now check for m = 3 and current product = 10368 with current max product as 18144
now check for m = 3 and current product = 8640 with current max product as 18144
now check for m = 3 and current product = 7776 with current max product as 18144
now check for m = 3 and current product = 810 with current max product as 18144
now check for m = 3 and current product = 1536 with current max product as 18144
now check for m = 3 and current product = 2048 with current max product as 18144
now check for m = 3 and current product = 3072 with current max product as 18144
now check for m = 3 and current product = 4608 with current max product as 18144
now check for m = 3 and current product = 4608 with current max product as 18144
now check for m = 3 and current product = 5760 with current max product as 18144
now check for m = 3 and current product = 6048 with current max product as 18144
now check for m = 3 and current product = 6048 with current max product as 18144
now check for m = 3 and current product = 5400 with current max product as 18144
now check for m = 3 and current product = 15552 with current max product as 18144
now check for m = 3 and current product = 15552 with current max product as 18144
now check for m = 3 and current product = 13608 with current max product as 18144
now check for m = 3 and current product = 40824 with current max product as 18144
current continuous values are 99879
now check for m = 4, Eliminate = 3 and max product 26244 and current max product as 40824
now check for m = 5, Eliminate = 4 and max product 32805 and current max product as 40824
now check for m = 6, Eliminate = 5 and max product 39366 and current max product as 40824
now check for m = 7, Eliminate = 6 and max product 45927 and current max product as 40824
now check for m = 7 and current product = 40824 with current max product as 40824
now check for m = 7 and current product = 31752 with current max product as 40824
now check for m = 7 and current product = 31752 with current max product as 40824
now check for m = 7 and current product = 24696 with current max product as 40824
now check for m = 7 and current product = 28224 with current max product as 40824
now check for m = 7 and current product = 28224 with current max product as 40824
now check for m = 7 and current product = 31752 with current max product as 40824
reach end before checking for 8, max product = 40824
max product = 40824
0.000u 0.001s 0:00.00 0.0% 0+0k 0+0io 0pf+0w

Monday, June 25, 2012

The future of institutional trading - slower trading through cloud based services?

"A venue that questions the need for speed is a new ATV called PDQ run by ex GETCO CEO Keith Ross. PDQ say they have developed new ways to allow the less fleet-footed to compete on a level playing field. They are aiming to persuade traders and institutions that their new trading systems offers a leg up by actually decelerating the execution of trades."

Cloud for Algo Trading?

It is kind of interesting topic for brainstorming. We are aiming to achieve faster speed with hardware acceleration. however some are already thinking to achieve better performance by slowing down the executions. However will it really make sense? Both strategies have same logic, a slower one can get better executions?

Wednesday, June 20, 2012

Open Source Project Planner - OpenProj

Nowadays open source OpenProj becomes more and more popular for project management. How to use it efficiently and utilize its features to the maximum to help a better plan which is still adapt to changes along the path. (http://en.wikipedia.org/wiki/OpenProj)

OpenProj

here is a very good starting point to go through an online training material like http://www.scribd.com/doc/37439251/110/Understanding-the-Critical-Path

Wednesday, May 16, 2012

基于业务设计技术架构（兼谈12306性能问题）by 陈勇

敏捷开发产品管理系列之八：基于业务设计技术架构（兼谈12306性能问题），陈勇从业务员的角度分析怎么解决春节火车订票系统12306崩溃的问题。很有启发性。比方他提到的排队设计，确实是解决问题的好办法之一。

一个项目不是只需要技术人员，从另外一个角度，往往可以起到事半功倍的效果。以下是他的文章的链接。
http://blog.csdn.net/cheny_com/article/details/7569724

Monday, May 7, 2012

Why Scalar Optimization Is Important by Richard Friedman

In this interesting article, Richard mentioned a very simple yet important Amdahl's Law. (http://en.wikipedia.org/wiki/Amdahl's_law#Parallelization)

Below is the formula.

S = T/((1-f)T+fT/N) = 1/(1-f+f/N)

He gave an example that If we can get half the executable code to run in parallel, and the other half running in one processor, what speedup can we expect on a 64–processor system?

Well, that gives a speedup S = 1.97 only!

The basic point here is that we should take the scalar latency optimization seriously when tuning the whole system. However we should not be misled by author to under estimate the importance of parallel as well for it is also important factor for another axis of the system performance measurement - throughput. Without high throughput support, low latency can be screwed while system load is heavy.

Original post of Richard is at http://blogs.sun.com/rchrd/entry/why_scalar_optimization_is_important, however it cannot be accessed or found in WWW any more. It is now collected in book "The Developer's Edge", which is an interesting book for reading.

The Developer's Edge

It is very generous that authors have published this book freely online at http://www.oracle.com/technetwork/server-storage/archive/r11-005-sys-edge-archive-495362.pdf

One of the author Darryl's blog address is http://www.darrylgove.com/ . You can always find the correct download link there.

Sunday, April 29, 2012

The Cost of Mutexes By Darryl Gove

sometimes we wonder what will be the cost of a mutex lock and unlock while we are concerned about the system performance at micro-seconds level.

Darryl Gove has done some interesting experiments in sun Solaris systems. The same idea can be applied to other OS like Linux as well.

His blog is used to be http://blogs.sun.com/d/entry/the_cost_of_mutexes and now it is moved to https://blogs.oracle.com/d/entry/the_cost_of_mutexes

His conclusion is that Mutex is 3 times slower than atomic operations.

And there is another point to note that he tested the cost of a normal function call comparing to inline functions, which shows 10ns required for a normal function call. This cost must be less and less with new generations of CPU, RAM and compilers.

Here I just quote what is really interested to me, you can check his full post by above links.

Excl.     Incl.      Name  
User CPU  User CPU         
 sec.      sec.       
3.973     3.973      
1.341     3.973      count
1.331     1.331      mutex_unlock
0.781     0.781      mutex_lock_impl
0.490     0.490      atomic_add_32
0.030     0.030      mutex_lock

It shows clearly that using of mutex, time is spent on an atomic_add_32, mutex_lock_impl and mutex_unlock. As mutex_lock_impl and mutex_unlock assembly code is very similar to atomic_add_32, it is no surprise that mutex will take around 3x time than atomic_add.

Below are the assembly codes:

time spent for the atomic_add_32 call:

   0.010     0.010              [?]    2ecb8:  ld          [%o0], %o2
   0.040     0.040              [?]    2ecbc:  add         %o2, %o1, %o3
   0.010     0.010              [?]    2ecc0:  cas         [%o0] , %o2, %o3
## 0.370     0.370              [?]    2ecc4:  cmp         %o2, %o3
   0.        0.                 [?]    2ecc8:  bne,a,pn    %icc,0x2ecbc
   0.        0.                 [?]    2eccc:  mov         %o3, %o2
   0.050     0.050              [?]    2ecd0:  retl        
   0.010     0.010              [?]    2ecd4:  add         %o2, %o1, %o0

time spent for mutex_unlock and mutext_lock_impl is similar:

   0.        0.                 [?]    beff8:  mov         %o1, %o3
   0.020     0.020              [?]    beffc:  cas         [%o0] , %o2, %o3
## 0.560     0.560              [?]    bf000:  cmp         %o2, %o3
   0.        0.                 [?]    bf004:  bne,a       0xbeff8
   0.        0.                 [?]    bf008:  mov         %o3, %o2

Tuesday, March 27, 2012

Error - boost::unordered_map is not derived from type A

When we use STL or boost template classes like boost::unordered_map together with our own template class, as below:

#include "boost/unordered_map.hpp"#include <string>
template < typename T >
class A
{
typedef boost::unordered_map< std:;string, T* > FtObjsTypeMap;}

When we compile above class definition with gcc, following compilation error will be reported:

A.h:7: error: type âboost::unordered_map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, T*, boost::hash<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char> >, T*> > >â is not derived from type âA<T>â

The reason is that This because boost::unordered_map< std:;string, T* > is dependent on the template parameter T of the class template A To enable correct parsing of the template without having to make any assumptions about possible specializations of any other templates, the language rules require you to indicate which dependent names denote types by using the typename keyword.

Following class A definition will pass the compilation:

#include "boost/unordered_map.hpp"#include <string>
template < typename T >
class A
{
typedef typename boost::unordered_map< std:;string, T* > FtObjsTypeMap;}

The reason of the error is found at http://stackoverflow.com/questions/2841757/c-template-is-not-derived-from-type for my own recording and its mix usage with typedef, I posted my version of code here for future reference.

Saturday, March 17, 2012

Fibonacci number calculation algorithms

There are many blogs and websites provided excellent solutions for nth Fibonacci number calculation.

Fibonacci number

Different algorithms varies from O(n2) to O(n) to O(logn).

details can be found in below examples:
http://www.geeksforgeeks.org/archives/10120
http://www.haskell.org/haskellwiki/The_Fibonacci_sequence

From the wikipedia, http://en.wikipedia.org/wiki/Fibonacci_number , we know that nth Fibonacci number satisfy this math equations:

As you can see that if we keep a careful pre-calculated Fibonacci numbers in memory, a (m+n)th Fibonacci number can be calculated with combinations of mth and nth Fibonacci numbers. If mth and nth Fibonacci numbers are not in the pre-calculated map, we can further search down the chain.

I feel that if we can have a carefully selected array of pre-calculated Fibonacci numbers, we can easily achieve at worst O(logn) while at best O(1) performance without sacrifice of accuracy as following formula does:

a close estimation, accuracy becomes worse while n grows bigger

As a common approach, preparation of certain information as part of the data structure designed can speed up performance exponentially. This approach is also part of our Unix/Linux culture that we prefer complex data structures than complex algorithms in source codes.

Saturday, February 11, 2012

Bios Advanced CPU Power Saving Mode

CPU Power Saving will impact HFT performance?

As we know that all CPU has its power saving mode enabled as default in your BIOS. You might wonder whether it will have impact on the performance of your High Frequency Trading systems and how?

Let us look at a standard /etc/cpuinfo output from linux OS.

/etc/cpuinfo

Look at above screenshot carefully, you will see that Model Name of this CPU is Intel Core2 CPU with 2.4GHz. However when you check a few lines below - cpu MHz is only 1.6GHz (1596.00MHz). Why is there any difference here? it is all thanks to CPU Power Saving Mode. When your CPU is job free, it will enter power saving mode to run in a slower speed to save energy. This is actually smart strategy to be environmental friendly and end user will never notice the performance difference of normal programmes, which means no sacrifice of customer satisfactions.

However CPU power saving mode will hit HFT performance badly when we are in the scale of 10 to 20 micro-seconds measurements. While some of your threads/process are waiting for triggering events, mostly market data ticks, CPU, which is executing those threads/process, will enter power saving mode. Once market tick comes in, CPU will take longer time to wake up those threads/process and then take time to speed up to execute them. It results in around 10 micro-seconds degradation of performance, which is 50% to 100% decrease of your system's performance. The recommendation is that disable your server's CPU Power Saving Mode during market hours and shut down your servers while they are not in service.

How to disable in Bios the Advanced CPU Power Saving Mode? You can follow below steps:

Bios Advanced CPU Power Saving Mode Settings - Step 1

Bios Settings - Step 2

BIOS Settings Step 3

Friday, January 27, 2012

Performance comparison among std::map, boost::unordered_map and TRIE data structure

We have implemented our own map with TRIE data structure and compared its performance with STL map and boost unordered_map, which is a hash map.

TRIE Data Structure

To know more about TRIE data structure, please go to wikipedia page: http://en.wikipedia.org/wiki/Trie

Without Compiler Optimization and Total 10K random strings:

TRIE MAP: 83170
INT STL MAP: 336640
STR STL MAP: 885470

With -O2 Compiler Optimization and Total 10K random strings:

TRIE MAP: 16647
INT STL MAP: 69325
STR STL MAP: 440632

As you can see,

· TRIE Map is approx. 4 times faster than stl map with int key for searches – with or without optimizations.

· TRIE Map is approx. 25 times faster than stl map with string key for searches – with optimization. Without optimization it is only10 times faster.

With -O2 Compiler Optimization and Max String length < 10 characters and Total 10K random strings

TRIE MAP: 36950

INT HASH MAP: 27369

STR HASH MAP: 107488

With -O2 Compiler Optimization and Max String 20 < length < 30 characters and Total 240K random strings:

TRIE MAP: 3451134

INT HASH MAP: 143514

STR HASH MAP: 1465260

TRIE Map works very well for symbols with length < 10 characters, which even beat the performance of boost::unordered_map.

However its performance degraded while symbol’s length reach around 20 to 30 characters, where boost::unordered_map becomes faster.

And in both cases, integer searching even in maps are much efficient, Around 4 times faster than string search when string length < 10 characters, while 10 times faster than string when string length is between 20 chars and 30 chars.

Monday, January 9, 2012

Priority Inversion in Mars Probe Pathfinder in 1997

On 4 July 1997, after a seven-month voyage, the spacecraft landed on Mars. A few days into the mission, the spacecraft began experiencing system resets, which was caused by Priority Inversion design issues.

To know more about Priority Inversion - http://en.wikipedia.org/wiki/Priority_inversion
So we should never feel nervous when there are bugs reported for our software, even Mars project can have such critical bug as well!

There are a few solutions to the Priority-inversion Problem, all of which involve increasing the priority of tasks during their accesses to shared resources. The variation lies in when to increase priority and to what value. Assume that a task A gets a shared resource R. Except for PI, A’s priority is increased to the priority ceiling when A acquires R. The difference lies in the way the priority ceiling is computed for R.

For NPCS, the priority ceiling is equal to highest priority of all tasks in the system.
For CP and PC, the priority ceiling is equal to the highest priority of all tasks requiring R. The difference is in allowing or denying access to R.
For MB-CP, the priority ceiling is equal to the priority ceiling of the monitor, which contains the critical section of R.
Assume that a task A holds R. In PI, whenever a higher priority task B requests R, A inherits the priority of B and B is blocked.

Thursday, January 5, 2012

Improve Cache Performance with high locality of reference

if users randomly and uniformly access data throughout the master data space a cache is useless, and in fact, it may actually degrade data access time. A cache is effective only if users maintain a high locality of references, that is, data references are spread over tiny areas of the data space, at least over a limited span of time.

L1 and L2 Cache

To be more specific, data to be cached must be at minimum as possible and to be accessed sequentially as much as possible. For example use a small array to store the checking conditions separately if only small portion of the conditions check will return true for next processing stage. If majority of the condition check will return true for next processing stage, the array should contain those extra data needed as well as condition variables.