Coding for Speed – Part 3 - Optimizing your code
Warning: If you were wondering why I took a break in between part 2 and 3 of this series, its because this one is really long. Hopefully you still take the time out to read and learn something new.
So, you’ve been monitoring your code to evaluate where it can be optimised and determined that optimisation, is indeed worth the effort. Now, you need to start sitting down and figuring out exactly how this can be done.
Whether writing code from scratch or trying to improve existing code, optimising your code is definitely a tricky prospect, but that doesn’t mean that there aren’t a few tricks to be aware of that could help you make some performance gains in your code. While these examples don’t apply equally to every solution, language or compiler, its important concepts nonetheless that could help to improve the performance of your code.
Before I get into some pointers to look out for though, I would like to also state that not everything is applicable to every situation or compiler and the intent here is just to give you things to look out for and hopefully provide ideas for how to make your code more optimised. The most important thing is to find the most appropriate solution that meets what you require. Some other important pointers to consider are:
You could optimize your code for performance using all possible techniques, but this might generate a bigger file with bigger memory footprint.
You might have two different optimization goals that might sometimes conflict with each other. For example, to optimize the code for performance might conflict with optimizing the code for less memory footprint and size. You might have to find a balance.
Performance optimization is a never-ending process. Your code might never be fully optimized. There is always more room for improvement to make your code run faster.
Sometimes you can use certain programming tricks to make code run faster at the expense of not following best practices such as coding standards, etc. Try to avoid implementing cheap tricks to make your code run faster.
How do you calculate true optimisation?
While I speak about a lot of things you can consider to make your code faster, the truth is that true optimisation is difficult to calculate. There are a lot of different methods and formulas that can assist with this, which many compilers may also provide, but for the sake of complicating the article I won’t go into these and will rather keep to the basics.
It’s difficult to measure the speed of your code because each line of code doesn’t translate into an actual CPU operation, where decision trees, loops and memory management all play a bigger part of ensuring your code is optimal. There are also external factors like hardware (cache, CPU or GPU types) or for the web your different rendering engines, which also play a part in how your code will perform and if you want to properly optimise your code for these – you need to get into some assembler or lower level coding if you so desire.
Keeping all this in mind though, these are some big things to consider in the practice of writing optimised code:
Using the appropriate algorithm
Speed is not simply about the easiest way of solving a problem or often even the shortest but needs to consider CPU processing. Ideally, what you want to do when optimising any algorithm is figure out which decision tree or branch logic will require the least number of options to work through – or alternatively least CPU time.
Consider the following scenario: We have an interval for x [-100…100] and interval for y [-100…100]. Now in these two intervals, we are looking for a maximum of the function (x*x + y*y)/(y*y + b).
This is a function of two variables: x and y. There is one more constant which could be different and a user will enter it. This constant b is always greater than 0 and also lesser than 1000.
In this example, I don’t want to make use of any prebuilt math functions, though I would be interested to see how optimised some of the prebuilt functions we rely on in our languages are.
Example code (in C++):
#define LEFT_MARGIN_FOR_X -100.0
#define RIGHT_MARGIN_FOR_X 100.0
#define LEFT_MARGIN_FOR_Y -100.0
#define RIGHT_MARGIN_FOR_Y 100.0
using namespace std;
//Get the constant value
cout<<"Enter the constant value b>0"<<endl;
cout<<"b->"; double dB; cin>>dB;
if(dB<=0) return EXIT_FAILURE;
if(dB>1000) return EXIT_FAILURE;
//This is the potential maximum value of the function
//and all other values could be bigger or smaller
double dMaximumValue = (LEFT_MARGIN_FOR_X*LEFT_MARGIN_FOR_X+LEFT_MARGIN_FOR_Y*LEFT_MARGIN_FOR_Y)/ (LEFT_MARGIN_FOR_Y*LEFT_MARGIN_FOR_Y+dB);
double dMaximumX = LEFT_MARGIN_FOR_X;
double dMaximumY = LEFT_MARGIN_FOR_Y;
for(double dX=LEFT_MARGIN_FOR_X; dX<=RIGHT_MARGIN_FOR_X; dX+=1.0)
for(double dY=LEFT_MARGIN_FOR_Y; dY<=RIGHT_MARGIN_FOR_Y; dY+=1.0)
cout<<"Maximum value of the function is="<< dMaximumValue<<endl;
cout<<"Value for x="<<dMaximumX<<endl
<<"Value for y="<<dMaximumY<<endl;
Now, if we analyse the code more carefully, we notice that the part for dX*dX is calculated more times than it should, in this case, it is calculated 200 times and this is a waste of CPU time.
There are several solutions to solve this, but perhaps the most basic would be to create one variable (dX_Squer = dX*dX) and calculate after the first for repetition. This could then be used in all calculations afterwards. You just need to add one more bracket.
There are few more optimizations that can be made in the above code, but I’ll leave these for you to figure out.
Optimize Your Code for Memory
Now we will look at how you could optimize your code from point of memory consumption.
Let us take a simple example. Let us try to swap two values in the memory, which is done in many sorting algorithms.
Some people like to think of this as two people sitting on two chairs and adding one more chair as a temporary holder for one of them during the swap.
int nFirstOne =1, nSecondOne=2;
int nTemp = nFirstOne;
nFirstOne = nSecondOne;
nSecondOne = nTemp;
This is nice but actually saves extra variables in memory.
This could be done without nTemp like this:
int nFirsOne = 3, nSecondOne = 7;
nFirstOne += nSecondOne;
nSecondOne = nFirstOne ? nSecondOne;
nFirstOne -= nSecondOne;
In some cases, you could have large objects in memory that need to swap their places. So, what could you do? Instead of coping to many memory locations you could use their addresses and instead of replacing the memory locations you could just change their address.
Problems with Functions
While we utilize functions because it makes our code more modular, maintainable and scalable, if we are not careful, functions can create some performance bottlenecks.
For example, functions don’t work well in loops:
for(int i=1; i<=10; ++i)
It certainly makes your coding shorter, but repeatedly calling a function to do a similar thing 10 times is unnecessary expenditure on your memory. To implement this better, you should rather implement the repetition directly in your function. This would require less processing.
Next thing to consider is inline functions. And there are other things we can look at which do it better. For instance, if they are small, you can make use of macros. This way you benefit from point of speed, from point of better organization, and as well as reusability.
When passing a big object to a function, you could use pointers or references. Prefer to use references because they would create the code that is way easier to read.
If you are not worried about changing the value that is passed to the function, use references. If you use an object that is constant, it could be useful to use const, which will save some time.
While recursion is extremely helpful in certain specific scenarios, in general, it will generate a slow performing code. If possible, try to avoid recursion, when you don’t need to use it to solve your problem.
map[i].visited = 0;
map[i].visited = 0;
Loops serve a purpose, but you want to reduce your reliance on them and rather only attempt it if it is needed a few times with several operations needed within it. Otherwise, if you need to iteratively sort through something, using another type of sorting algorithm to reduce your processing time.
Data Structure Optimization
Much like most things, data has a big impact on our code performance and the way we structure the data we need to use in our code will play a big part in enhancing the speed of our code.
If you keep your data in the list, you could very easily create the program that will outperform one with the array we have mentioned. Sometimes, if you save your data in some form of tree you could create a program that will perform faster than the one without adequate data structure.
Be careful when using data structure. Sometimes a problem could be solved without keeping all elements of an array or using any data structure at all.
Binary Search or Sequential Search
One of the common tasks we need to do when we program is to search for some value in some data structure. Yes, it is the basis for hash tables, multi-level hash tables, etc.
If you are trying to find one number in an array of numbers you could have two strategies.
The first strategy is very simple. You have your array and value you are looking for. From the beginning of the array you start to look for the value and if you find it you stop the search, and if you don’t find the value you will be at the end of the array. There are many improvements to this strategy.
The second strategy requires the array to be sorted. If an array is not sorted you will not get the results that you wish for. If the array is sorted you split it in two half. In the first half, the elements of an array are smaller than the middle one in another half the elements are bigger than the middle one. If you get yourself in the situation that two markers are not situated the way they should you know that you don’t have the value you have been looking for.
If you sort elements of an array you will lose some time, but if you invest in that you could benefit from faster binary search.
This is one of a number of situations where you would need to understand the problem well and act according to the best possible situation based on your specific scenario.
The array is one of the most basic data structures that occupy some space in memory for its elements.
To understand how these optimizations work, you should be aware of the structure of the array. The name of an array is a constant pointer that points at the first element of an array. This means that you could use pointers and pointer arithmetic.
for(int i=0; i<n; i++) nArray[i]=nSomeValue;
Instead of the above code, the following is better:
for(int* ptrInt = nArray; ptrInt< nArray+n; ptrInt++) *ptrInt=nSomeValue;
The reason for this is in the operations with pointers. In the above example, we have a pointer to the int data type that takes the address from the name of the array. In this case, it is nArray, and we increase that address for one element, and the pointer is moved toward the end of the array for the size of the int data type.
If you have used double, your compiler would know how far it should move the address.
It is harder to read code this way, but it will increase the speed of your program. In other words, when you don’t use a better algorithm, but your program still runs faster, the increased speed might be due to better syntax that will generate faster code.
If you use a matrix, and you have the chance to approach the elements of matrix row by row or in some other manner you should always pick to go row after the row in your matrix. The matrix is an array of arrays it will be stored in memory row after the row, so the most natural way to do approach the array members is to go row by row.
Avoid initialization of large portions of memory with some element. If you could not avoid this type of situation, consider memset or similar commands.
Most basic operations like +=, -=, and *=, when applied on basic data types could slow down the program as well. To be sure you will need to know how it gets transformed into assembler on your computer.
One interesting idea is to replace postfix increment and decrement with their prefix versions.
Sometimes you could try to use operators >> or << instead of multiplication or division, but be very careful, you could end up with bad mistake this way, and then to fix it you could add some range estimations and that will be way slower than the original code you have started with.
Bit operators and tricks that go with them could increase the speed of program, but you should be very careful because you could end up with machine dependant code and that is something to avoid.
Using tables versus recalculating
Tables are often easier to work with and the simplest solution to code, but don’t scale well. Remember that in recalculating, you have the potential of using parallelism, and incremental calculation with the right formulations. Tables that are too large will not fit in your cache and hence may be very slow to access and cannot be optimised further. Much like data structures above – tables should be used with caution.
Using smaller data types are faster than larger ones
The original reason int was put into coding languages was so that the fastest data type on each platform remained abstracted away from the programmer himself. On modern 32 and 64-bit platforms, small data types like chars and shorts actually incur extra overhead when converting to and from the default machine word sized data type.
On the other hand, one must be wary of cache usage. Using packed data (and in this vein, small structure fields) for large data objects may pay larger dividends in global cache coherence, than local algorithmic optimization issues.
Use powers of two for multidimensional arrays
The advantage of using powers of two for all but the leftmost array size is when accessing the array. Ordinarily, the compiled code would have to compute a multiply to get the address of an indexed element from a multidimensional array, but most compilers will replace a constant multiply with a shift if it can. Shifts are ordinarily much faster than multiplies.
Data type considerations
Often to conserve on space you will be tempted to mix integer data types; chars for small counters, shorts for slightly larger counters and only use longs or ints when you really have to. While this may seem to make sense from a space utilization point of view, most CPUs have to end up wasting precious cycles to convert from one data type to another, especially when preserving sign.
y = x;
int x, y;
y = x;
The above list is in no way exhaustive, nor complete, but hopefully will help you to think about optimisation in your code a little more. Similarly, continue to play around with your different compilers and coding languages to determine which ones may work best for you and find that perfect balance between form and function. Writing fast code is important, but can slow down your productivity and you want to make sure it’s worth your while to do.
However, if you practice these tricks and make them a habit, writing optimised code can become more natural to you. And like most other things, speed is definitely sexier and who wouldn’t want to make their code sexier?