//todo: think of better title: optimization

Showing posts with label optimization. Show all posts

Saturday, March 31, 2007

The attack of the forEach loops?

Many library writers are (finally) tuning into library performance and speed optimization. DOM query libraries are getting pretty good, but I still find that most libraries have made the very grave mistake of making a forEach-like structure available.

Sure, Firefox has a fast native implementation, but for the great majority of browsers out there, implementing forEach in javascript is a huge performance hit. And not only that, today I found out that even the plain old for loop used by many of these implementations is slow as well.

Here's my benchmark code and results:

Code

var bm=function(action,load) {
 for (var i=0;i<load;i++) action+=action+action;
 var t0=now();
 eval(action);
 var tf=now();
 return tf-t0;
};
var div=document.getElementById("output");//this is my output div, just add one to the page.
var a="var a=[0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9],b=function() {return this;};"
var load=6;
var test = function() {
 var resultsets="";
 for (var i=0;i<15;i++) {
  var results=[];
  results.push(bm(a+"for (var i=0,l=a.length;i<l;i++) b();",load));//for
  results.push(bm(a+"var i=a.length;while (--i!==0) b();",load));//while backwards
  results.push(bm(a+"var i=-1,l=a.length-1;while (++i!==l) b();",load));//while forward
  resultsets+=results+"<br>";
 }
 div.innerHTML=resultsets;
};

Results

note: I'm just posting the results from my final algorithms, tested in Opera.

for	while backwards	while forward
156	78	78
78	72	79
93	63	94
78	78	78
78	78	78
78	63	78
93	63	78
94	78	78
78	78	78
79	78	78
78	78	78
94	62	79
93	63	94
78	78	78
78	78	78
1326	1078	1204

The last column shows the sum of all times. The forward while loop performed 9.2% better than the for loop, and the backwards while loop performed 18.7% better.

Breakdown

Things that I noticed while testing and trying different looping algorithms:

The number of declared variables matter. If possible, use only one.
Using array.length in the exit condition instead of assigning it to a variable first (i.e. var length = array.length) slows things down a lot.
The operator in the exit condition matters. The === and !== operators are faster than <, > and variations.
Math operators for the incrementation don't seem to affect the speed of the loop. --i takes the same time as ++i.

Final thoughts

One last phenomenon that I think is worth mentioning on this whole forEach talk: many developers have the tendency of "latching" onto familiar practices. What this means is that updating existing functionality gets done using a structured programming approach. So, it's not uncommon to see things like this:

//the old version
array.forEach(doSomething);
//becomes
array.forEach(function(i) {doSomething(i,someNewParam)});

Notice the double function call? In browsers without native support for Array.forEach, this also needs to internally call a Function.call() for every iteration. Definitely not cheap. Use it in the core of a nifty popular library for maximum damage amplification.

Bottom line: the forEach pattern is redundant and expensive. And now, with the talks of centralizing javascript distribution, (like Yahoo is doing with YUI) the code footprint argument is becoming obsolete too.

Note to self: my getElementsByClassName function needs to be updated now =P

Thursday, March 15, 2007

RegExp good practices

After not finding anything from a Google search on "RegExp good practices", I figured someone should write something about it. Here's what I think are important tips when writing code with regular expressions:

Avoid them if possible - regexp bugs are hard to trace
Be careful with .* and negatives ([^...]) - matching too much is the leading cause of regexp bugs
Comment regular expressions - the longer it is, the more ellaborate you should be
Never ever regex document.body.innerHTML!

1) The main reason why I recommend staying away from regular expressions is that they are difficult to test, especially when used to handle input.

2) More often than not, developers don't test all cases. What happens if there are french characters? Line breaks? Some may argue that there are testing frameworks out there to help, but let's face it, not everyone uses them, and even if you do, complex regular expressions can still sometimes get away with obscure unusual character-sequence matches that only a regexp guru would be able to pick up.

3) All code should, of course, be commented, but reality is far from perfect. The least you as a developer should do is comment regular expressions, simply because they are so cryptic by nature. I find that "translating" the regexp into plain english and then stating what it is supposed to accomplish is reasonable.

4) I hack a lot while experimenting, and I'll tell you this first hand: Regular expressions are for small strings only! If your project absolutely requires regex'ing through paragraphs and paragraphs of content, you're probably neglecting your server-side capabilities, and heading towards a glacially slow dead-end.

Last thoughts

I'm mostly talking with javascript in mind here, but I think it's safe to say that these regexp rules apply in any environment. If you have any to add, feel free to post a comment.

Monday, March 12, 2007

getElementsByClassName speed optimization

From my experience in coding javascript, I came to realize that as cool as bleeding-edge javascript libraries and full CSS3 and XPath support are, they are fairly slow, particularly when it comes to DOM manipulation.

John Resig recently did a neat benchmark test on different techniques for implementing the getElementsByClassName method (in light of the recent news that Firefox 3 will have its own native implementation). While it gives a good overall idea of what algorithms should be avoided, it would certainly be nicer to just have a copy-and-paste solution that runs the fastest way possible in all major browsers.

Most libraries have pretty good implementations, but I have yet to see one that is truly optimized for speed. Here's my variation, written with the sole goal of being the fastest possible getElementsByClassName() script:

var getElementsByClassName=function(className,parentNode) {
 var results = [];
 //if (document.getElementsByClassName) document.getElementsByClassName(className,parentNode);//this is the code for FF3. Prototype 1.5 doesn't play nice with this code, though.
 if (document.evaluate) {
  var query = document.evaluate(".//*[contains(concat(' ', @class, ' '), ' " + className + " ')]", parentNode || document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
  for (var i = 0, length = query.snapshotLength; i < length; i++) results.push(query.snapshotItem(i));
 }
 else {
  var nodes = (parentNode || document).getElementsByTagName("*");
  for (var i = 0, j = 0, length = nodes.length; i < length; i++) {
   var nodes_i=nodes[i];
   if ((" "+nodes_i+" ").indexOf(" "+className+" ") > -1) results[j++] = nodes_i;
  }
 }
 return results;
};

Benchmarking

For my test, I created a page with 200 sibling divs, out of which, half had class names. Each test queried the DOM 30 times, 1 second after page load, to minimize interference from network-related threads.

Benchmark results for library equivalents were fairly similar in Firefox (all of them scored somewhere near 190-200ms), but there were large performance differences in Internet Explorer 6. Prototype's implementation was 3 times slower than the script above.

Compared to CSS selector query engines like Prototype's $$() or JQuery, the script above performed between 10 to 20 times better, depending on the browser being tested. (In other words, $$(".my_class") is a big no-no).

The limitations of the script

It only takes spaces as class separators. No tabs or line feeds. This is the main reason why the script is faster in browsers that don't support document.evaluate
It assumes a browser, and doesn't check whether the "document" global exists. Rhino would always need to specify the root element anyways.

What makes it fast

No expensive regular expressions
document.evaluate delegates all the expensive DOM parsing to the underlying, compiled-code engine, where available.
Using variables instead of object members cuts down a lot of overhead. A lot of people don't believe me when I say this stuff is important until they actually see the numbers. Just today I got the javascript execution time of an e-commerce site down from 970ms to 180ms simply by fixing a few for loops.

Last thoughts

I personally don't see why anyone would want to separate class names with a tab. Line feeds are even worse since they break the indentation flow and make HTML virtually impossible to read. Sure it's part of the specs, but I'd much rather use a fast and sensible subset.