Al's Code Shop

Thursday, April 21, 2011

Sort Algorithms as Folk Dances

They've "implemented" a few choice sort algorithms as dances at Sapientia University in Hungary. There's a YouTube channel for it!

Awesome.

A Few Things I Wish Java Had

As I switch back and forth between Java and C# there are a few things I wish Java had that C# does. Almost all of my gripes are related to syntactic sugar, but I think they are nice to have.

Nullable types - I think they're cleaner than using Object wherever a null reference might come into play. I know that they are still boxed under the hood in .Net (into Nullable<T> struct), so the performance is probably similar, but I think they are nice to have. May have a future post here...
Lambdas - again, really just syntactic sugar for anonymous functions and can get quite ugly, but these are nice to have the majority of the time. (I suppose anonymous methods might be the start for Java. Java 9?)
Unsigned types - can o' worms? Perhaps. See below for a quote from Gosling
Implicitly typed variables - At first, I thought that var was not the greatest idea. But now, I love it. So easy and clean and totally type safe.
Generic containers that accept value types - Hashtable<Integer, Integer>. Really?
Better collection initializers - C# has this down.

On the unsigned types front... Having signed bytes doesn't make any sense to me. Here's a quote from James Gosling on the topic:

Gosling: For me as a language designer, which I don't really count myself as these days, what "simple" really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn't -- and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up.

I'm all for simplicity, but signed bytes seem over the top. Maybe he should've had all signed types except bytes or have signed and unsigned bytes. I totally agree with him on the complexity of signed/unsigned arithmetic in C as well, but it's not that hard to grasp (or lookup).

Don't get me wrong - Java is nice, these are just a few things that I think C# did really well.

Wednesday, March 16, 2011

Faster Swap - XOR or Temporary Variable?

Jerry Pisk asks -- Which is faster for swapping integer values – the old XOR “trick” or just using a temp variable?

Jerry opines that the temp var method should be faster - I agreed, but thought it might be interesting to dig into this a little.

There’s a fair amount of anecdotal evidence on the internets about this topic already, but I wanted to test the theory out in both C# and in C++ (MSVC) for myself.

The algorithm I used to compare went like this: create two large-ish arrays, fill them with random, unsigned 32-bit ints and then reverse the arrays in place – one using temp storage and the other using XOR.

The crux of the code looks like this:

Swap

while (i < j) {

temp = array[i];

array[i] = array[j];

array[j] = temp;

}

XOR

while (i < j) {

array[i] ^= array[j];

array[j] ^= array[i];

array[i] ^= array[j];

}

My mean timings over a few thousand iterations were identical for both languages on my machine – ~11ms for Temp and ~17ms for XOR for 5 million swap operations.

I was interested to see what the compilers did to the high-level XORs and, in the end, they did essentially the same thing:

.Net

xor dword ptr [edx],eax

C++

xor DWORD PTR [eax+edx*4], esi

Note: I didn’t see the XCHG instruction in either language for either algorithm.

So we see that the temp variable swap technique is significantly faster than the XOR swap technique (in these languages on my machine, etc.).

Wednesday, March 2, 2011

Some HTML5 Fun

I’ve been playing around with HTML5 and with the <canvas> tag in particular. It’s pretty neat – especially how quickly you can put something together.

I’d been looking at Conway's Game of Life and thought it’d be a fun way to dig into something slightly more complex than just drawing lines on the canvas tag (boring! :).

I pretty quickly had something up and running and found the game fun to watch. I chose to use a random start state, but this could be easily tweaked to use some of the well-known start states which produce cool effects (see the Wikipedia article for some of those).

I’m planning on Open Sourcing it shortly (it’s nothing awesome, but I thought I’d share it anyway).

Here’s a screen shot of it “in action”:

I'll follow up shortly with its new home.

Tuesday, February 15, 2011

Protocol-Relative URLs

These are pretty cool. Basically, you don’t specify your scheme when adding a URI to an HTML tag which points to content for your page. Instead, you let the browser determine which protocol was used when requesting the page and, in turn, it uses that protocol for the initial request for additional content. This will typically avoid the dreaded “This page contains both secure and nonsecure items” dialog in IE:

It also simplifies your HTML and JavaScript – you don’t need to know which scheme the browser is using and have stuff like:

var scheme = document.location.protocol;

//some code...

someImage.src = scheme + "//alscodeshop.com/images/pic.png";

Instead, you’ll just have:

someImage.src = "//alscodeshop.com/images/pic.png";

or:

By the way, you can use this in all browsers - it’s not IE-specific.

I learned this from Paul Irish (he credits others) and he has a couple of really minor caveats regarding using these...

Wednesday, February 9, 2011

STL list - Random Access

I was working with an unordered list in C++ and needed to get the n^thelement, but STL doesn’t offer random access and I knew I was going to end up writing a loop anyway. And, Boost doesn’t provide random access for list collections either.

So I started thinking about this a little. C# offers index operators to access the List<T> class and that's pretty nice to have. So I wrote my own that looked something like this:

template <class T> class RAList : public list<T>

{

public:

T& operator [](unsigned int n)

{

iterator it = begin();

for(; n > 0; --n)

{

++it;

}

return *it;

}

};

You could have a const version and add error checking/handling if you really felt like it (my version gets what the Microsoft STL implementation provides which is essentially boundary condition handling in debug builds via the dereference operator for the iterator). You could also just as easily add it to the singly-linked list slist<T> as well. You could also start looping from the back of the list if you knew the element you were looking for was nearer the end.

Anyway, now, I can do stuff like this:

RAList<A> raList;

raList.insert(raList.end(), A("One"));

raList.insert(raList.end(), A("Two"));

A a = raList[1];

That’s a little better.

Of course, this is an O(n) operation and often a list is used if you have a decent-sized, variable-length collection, so it can take some time to find that n^thelement. However, you'd have to loop anyway to find it, so no loss there.

C# does provide this functionality for List<T>, but it’s a gimme because List<T> in the CLI is backed with an array! That’s a topic for a future post, however. And, Java, like the STL does not provide random access to implementations of List (ArrayList and others do offer it, however, via the get(i) method, but not via an indexer).

Now, if C++ had extension methods… I guess that’s a topic for a future post too.

Thursday, February 3, 2011

Reference Types vs. Value Types - Static Fields Edition

A common misconception among .Net developers is that the difference between value types and reference types is that value types are always stored on the stack and reference types are stored on the heap.

In fact, the compiler determines where to store variables based on the lifetime of the object. Long-lived objects are stored on the heap, and short-lived objects are likely to be stored on the stack (or in a register). As well, reference types are, of course, passed by reference and value types are passed by value (that is, copied). Eric Lippert has a great post which covers the specifics in detail http://blogs.msdn.com/b/ericlippert/archive/2010/09/30/the-truth-about-value-types.aspx

How the types are actually stored in memory is an implementation detail and one which will likely never matter to most developers.

A case study on this topic I find interesting is static fields. There is one copy of a static field for all instances of a type http://msdn.microsoft.com/en-us/library/79b3xss3.aspx and their lifetime is tied to the lifetime of their domain (they are long-lived).

So, going back to the misconception, if we have a value-typed static field, how could all instances access that same variable if it was stored on the stack? I suppose that the system could copy the variable to each stack frame as it was created, but that would be inefficient – there would have to be a “master copy” of the variable stored on the heap, and quite ugly.

Instead, static fields are stored on the heap - even if they are value types.

The CLR creates so-called “loader heaps” which live for the lifetime of an application domain. Static fields are stored in the one of the loader heaps called the HighFrequencyHeap. Loader heaps are not garbage collected; this is essential since static fields must have the same lifetime as the domain.

Here is some sample code which has a class with value-typed static field, a value-typed non-static field and a value-typed local variable.

class A

{

public static uint fieldA;

public uint fieldB;

}

class Program

{

static void Main(string[] args)

{

var a = new A();

a.fieldB = 2;

A.fieldA = 2;

uint localC = 2;

}

Let’s look under the hood at what happens during the three assignments in Main(). This is the actual assembler code from the JIT compiler as available in Visual Studio – not the CIL.

a.fieldB = 2;

00000055 mov eax,dword ptr [ebp-40h]

this moves a pointer to our non-static field on the stack to the register eax. “ebp” points to the base of the stack for this function. The stack grows downward on x86 machines (like mine), so offsets are negative.

00000058 mov dword ptr [eax+4],2

this moves the value 2 to the stack location pointed to by eax

A.fieldA = 2;

0000005f mov dword ptr ds:[00303368h],2

Now, we're going to work with our static field. This moves the value 2 to the location pointed to by ds:[00303368h]. “ds” is the data segment register which represents an offset into memory which is not part of the stack (it is, in fact, the heap). Let's examine the heap a little more closely here.

If we run the SOS debugging extension in Visual Studio and type
!eeheap -loader
in the immediate window, we get the following info about the HighFrequencyHeap:

Domain 1: 00632d18
LowFrequencyHeap: 00300000(2000:2000) Size: 0x2000 (8192) bytes.
HighFrequencyHeap: 00302000(8000:2000) Size: 0x2000 (8192) bytes.

We can see that our static field (at address 00303368h - from our assembly code) is inside the boundaries of the HighFrequencyHeap (00302000h --> 00304000h). Here we can clearly see a value type that is stored on the heap!

uint localC = 2;

00000069 mov dword ptr [ebp-44h],2

this moves the value 2 to the stack location ebp-44h effectively combining the two statements in the first example into one, but the result is the same. We can also see that this is only 4 bytes away from the first example variable on the stack.

So, we see in this example that the storage location for a value type is not always the stack. In fact, there is a set of criteria the compiler uses to determine where a variable is stored.