Saturday, March 31, 2007

The attack of the forEach loops?

Many library writers are (finally) tuning into library performance and speed optimization. DOM query libraries are getting pretty good, but I still find that most libraries have made the very grave mistake of making a forEach-like structure available.

Sure, Firefox has a fast native implementation, but for the great majority of browsers out there, implementing forEach in javascript is a huge performance hit. And not only that, today I found out that even the plain old for loop used by many of these implementations is slow as well.

Here's my benchmark code and results:

Code

var bm=function(action,load) {
 for (var i=0;i<load;i++) action+=action+action;
 var t0=now();
 eval(action);
 var tf=now();
 return tf-t0;
};
var div=document.getElementById("output");//this is my output div, just add one to the page.
var a="var a=[0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9],b=function() {return this;};"
var load=6;
var test = function() {
 var resultsets="";
 for (var i=0;i<15;i++) {
  var results=[];
  results.push(bm(a+"for (var i=0,l=a.length;i<l;i++) b();",load));//for
  results.push(bm(a+"var i=a.length;while (--i!==0) b();",load));//while backwards
  results.push(bm(a+"var i=-1,l=a.length-1;while (++i!==l) b();",load));//while forward
  resultsets+=results+"<br>";
 }
 div.innerHTML=resultsets;
};

Results

note: I'm just posting the results from my final algorithms, tested in Opera.

forwhile backwardswhile forward
1567878
787279
936394
787878
787878
786378
936378
947878
787878
797878
787878
946279
936394
787878
787878
132610781204

The last column shows the sum of all times. The forward while loop performed 9.2% better than the for loop, and the backwards while loop performed 18.7% better.

Breakdown

Things that I noticed while testing and trying different looping algorithms:

  • The number of declared variables matter. If possible, use only one.
  • Using array.length in the exit condition instead of assigning it to a variable first (i.e. var length = array.length) slows things down a lot.
  • The operator in the exit condition matters. The === and !== operators are faster than <, > and variations.
  • Math operators for the incrementation don't seem to affect the speed of the loop. --i takes the same time as ++i.

Final thoughts

One last phenomenon that I think is worth mentioning on this whole forEach talk: many developers have the tendency of "latching" onto familiar practices. What this means is that updating existing functionality gets done using a structured programming approach. So, it's not uncommon to see things like this:

//the old version
array.forEach(doSomething);
//becomes
array.forEach(function(i) {doSomething(i,someNewParam)});

Notice the double function call? In browsers without native support for Array.forEach, this also needs to internally call a Function.call() for every iteration. Definitely not cheap. Use it in the core of a nifty popular library for maximum damage amplification.

Bottom line: the forEach pattern is redundant and expensive. And now, with the talks of centralizing javascript distribution, (like Yahoo is doing with YUI) the code footprint argument is becoming obsolete too.

Note to self: my getElementsByClassName function needs to be updated now =P

Saturday, March 24, 2007

Javascript OOP

Today I was trying to write a new "Class" module to be able to easily create an OOP patterns in Javascript. Actually, I had some 10 or 20 different variations. Most lacked one feature or another, or implemented things awkwardly, or simply became too big.

My goal (quite an ambitious one) was to create something as feature-packed as Dean Edwards' Base and at the same, as small as Kevin Lindsay's pattern. And be easily documentable.

After a whole day of writing and rewriting variations, I ended up with the same script I had written a while ago (except cleaner looking).

I was about done with Class patterns, but on my way to the shower, I had a "brilliant" idea: Simply append a string to the name of inherited members. Like so:

var Class=function() {};
Class.Extend=function(Class) {
 var ThisProto=this.prototype, SubClass=Class.Constructor||ThisProto.Constructor||function() {};
 SubClass.Extend=this.Extend, SubClass.prototype=Class;
 for (var Member in ThisProto) SubClass.prototype["undefined"==typeof Class[Member]?Member:Member+"Base"]=ThisProto[Member];
 return SubClass;
};

Yes, pretty stupid I know, but hey it's 7 lines of code and it keeps the correct scope in base methods. Here's a demonstration:

var Animal=Class.Extend({
 Constructor:function() {
  var Who="World";//private
  this.Who=function(Value) {return Who;}//privileged
 },
 Greeting:"Hello",//public
 Say:function() {alert(this.Greeting+" "+this.Who());}//public
});
var Dog=Animal.Extend({
 Say:function() {
  this.SayBase();//calling the base method
  alert("Woof Woof");
 }
});
var dog=new Dog()
dog.Say()//Hello World
 //Woof Woof

Monday, March 19, 2007

Location.href doesn't seem to work? Use "return false;"

Sometimes, people write stuff like this:

<a href="javascript:void(0)" onclick="doSomething(); location.href='newPage';">Click me</a>

and wonder why IE is being stupid and not going to the page we're telling it to go. To get around the problem DON'T use a setTimeout. It's bad practice. Just add a "return false" to your onclick like so:

onclick="doSomething(); location.href='newPage'; return false;" 

Of course, it's even better to avoid writing javascript in a html file in the first place, but I'm simplifying for the sake of explaining the idea.

Saturday, March 17, 2007

Amusing accident

So today I was adding content to a javascript tutorial written by a co-worker. I wrote and tested quite a bit of code before I noticed that in one line, my co-worker had written a URL. It wasn't in a string, nor a regular expression, nor it was commented out. I assume he meant to comment it out, but it was there in its own line between two other functions, kinda like so:

function example1() {/*...*/}
http://www.domain.com/page/1/lorem/ipsum/
function example2() {/*...*/}

It's amusing how he accidentally wrote perfectly valid javascript code and probably isn't even aware how that line is actually a label (used for goto statements) followed by a comment.

Thursday, March 15, 2007

RegExp good practices

After not finding anything from a Google search on "RegExp good practices", I figured someone should write something about it. Here's what I think are important tips when writing code with regular expressions:

  1. Avoid them if possible - regexp bugs are hard to trace
  2. Be careful with .* and negatives ([^...]) - matching too much is the leading cause of regexp bugs
  3. Comment regular expressions - the longer it is, the more ellaborate you should be
  4. Never ever regex document.body.innerHTML!

1) The main reason why I recommend staying away from regular expressions is that they are difficult to test, especially when used to handle input.

2) More often than not, developers don't test all cases. What happens if there are french characters? Line breaks? Some may argue that there are testing frameworks out there to help, but let's face it, not everyone uses them, and even if you do, complex regular expressions can still sometimes get away with obscure unusual character-sequence matches that only a regexp guru would be able to pick up.

3) All code should, of course, be commented, but reality is far from perfect. The least you as a developer should do is comment regular expressions, simply because they are so cryptic by nature. I find that "translating" the regexp into plain english and then stating what it is supposed to accomplish is reasonable.

4) I hack a lot while experimenting, and I'll tell you this first hand: Regular expressions are for small strings only! If your project absolutely requires regex'ing through paragraphs and paragraphs of content, you're probably neglecting your server-side capabilities, and heading towards a glacially slow dead-end.

Last thoughts

I'm mostly talking with javascript in mind here, but I think it's safe to say that these regexp rules apply in any environment. If you have any to add, feel free to post a comment.

Tuesday, March 13, 2007

Flex, CSS and code separation

Ok, so I've been looking into Flex these past few weeks. From a quick glance, it looked promising: XML markup to declare components, the ability to extend component via ActionScript 3, and a nice data binding scheme.

While I respect the efforts the Flex team made into creating a nice application development platform, I found that using it was far more restricting that I would like, particularly when it came to styling applications.

The CSS support that Flex provides is horrible, to put it nicely. It seems that the Flex development team never ever built a single CSS-based layout, or any post-1995 layout for that matter.

The official way of styling text (directly from the docs) involves having 2 nested nodes, with <FONT> tags in a CDATA section to define basic things like font size.

Flex does allow us to put some styles in a separate stylesheet file, but I wouldn't call it CSS per se. It provides mechanisms that are not part of W3C's specs, and it fails miserably at rendering styles from something as simple as an id selector. Class selectors with underscores even throw compiler errors!

Not only that, basic rules like left, top are not supported in CSS (they need to be inline in the MXML file). Even more basic things like text-align are not supported at all in some components! The closest attempt at fixing it (that I could find) is a huge 30 line monster. There must be some really dumb aspect that I am overlooking; it really shouldn't have to be that complicated.

One other thing worth mentioning is that the extra compiling cycle can really slow down production, especially when running into so many CSS support issues. A fairly simple application takes on average 5 seconds to compile. Considering also the learning curve associated with any OOP framework, 5 seconds of compile time is definitely not to be taken lightly.

Last thoughts

Flex is a very powerful tool and as such, very dangerous. Whereas in the XHTML/CSS circles, there has been extensive discussions on semantic markup, style separation, and unobtrusive javascript, the Flex community simply doesn't have mature resources to help new developers avoid mingling structure, styles and functionality into untraceable memory-hogging rat's nests.

Monday, March 12, 2007

getElementsByClassName speed optimization

From my experience in coding javascript, I came to realize that as cool as bleeding-edge javascript libraries and full CSS3 and XPath support are, they are fairly slow, particularly when it comes to DOM manipulation.

John Resig recently did a neat benchmark test on different techniques for implementing the getElementsByClassName method (in light of the recent news that Firefox 3 will have its own native implementation). While it gives a good overall idea of what algorithms should be avoided, it would certainly be nicer to just have a copy-and-paste solution that runs the fastest way possible in all major browsers.

Most libraries have pretty good implementations, but I have yet to see one that is truly optimized for speed. Here's my variation, written with the sole goal of being the fastest possible getElementsByClassName() script:

var getElementsByClassName=function(className,parentNode) {
 var results = [];
 //if (document.getElementsByClassName) document.getElementsByClassName(className,parentNode);//this is the code for FF3. Prototype 1.5 doesn't play nice with this code, though.
 if (document.evaluate) {
  var query = document.evaluate(".//*[contains(concat(' ', @class, ' '), ' " + className + " ')]", parentNode || document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
  for (var i = 0, length = query.snapshotLength; i < length; i++) results.push(query.snapshotItem(i));
 }
 else {
  var nodes = (parentNode || document).getElementsByTagName("*");
  for (var i = 0, j = 0, length = nodes.length; i < length; i++) {
   var nodes_i=nodes[i];
   if ((" "+nodes_i+" ").indexOf(" "+className+" ") > -1) results[j++] = nodes_i;
  }
 }
 return results;
};

Benchmarking

For my test, I created a page with 200 sibling divs, out of which, half had class names. Each test queried the DOM 30 times, 1 second after page load, to minimize interference from network-related threads.

Benchmark results for library equivalents were fairly similar in Firefox (all of them scored somewhere near 190-200ms), but there were large performance differences in Internet Explorer 6. Prototype's implementation was 3 times slower than the script above.

Compared to CSS selector query engines like Prototype's $$() or JQuery, the script above performed between 10 to 20 times better, depending on the browser being tested. (In other words, $$(".my_class") is a big no-no).

The limitations of the script

  • It only takes spaces as class separators. No tabs or line feeds. This is the main reason why the script is faster in browsers that don't support document.evaluate
  • It assumes a browser, and doesn't check whether the "document" global exists. Rhino would always need to specify the root element anyways.

What makes it fast

  • No expensive regular expressions
  • document.evaluate delegates all the expensive DOM parsing to the underlying, compiled-code engine, where available.
  • Using variables instead of object members cuts down a lot of overhead. A lot of people don't believe me when I say this stuff is important until they actually see the numbers. Just today I got the javascript execution time of an e-commerce site down from 970ms to 180ms simply by fixing a few for loops.

Last thoughts

I personally don't see why anyone would want to separate class names with a tab. Line feeds are even worse since they break the indentation flow and make HTML virtually impossible to read. Sure it's part of the specs, but I'd much rather use a fast and sensible subset.