Wednesday, April 04, 2007

Method calls on value types and boxing

It was DevWeek 2005 and I was in a session with Jeff Richter about low-level .NET things when I foolishly asked him a question. He looked at me as if I had asked the dumbest question ever asked by anyone, so I decided not to follow up, even though I didn't feel he'd really given me the answer I was looking for.

So what was the question? Well, he was describing boxing of value types and how it can cause performance problems, so it's best avoided where possible, even though it's not always clear when boxing is occurring. I asked whether boxing needed to occur for every method call on a value type. On reflection I'm not sure this was a particularly dumb question, but I've never fully got to the bottom of it, mainly because it's never really been much of an issue for me.

But here's my take on it, which may or may not be accurate. Boxing is only going to happen when the method being called is one inherited from the base object (such as the virtual ToString or GetHashCode) that the value type doesn't override itself. If value types could be inherited from, boxing might also be required even when the value type did override the base method (assuming boxing is required to get at the virtual method table). But they can't be, so the discussion is kind of irrelevant. This may well be why value types can't be inherited from, but this is all frankly getting way too complicated for me to understand, so I'll quickly move on.

Anyway, to illustrate the point, here's a little test C# application.

using System;

namespace ConsoleApplication2
{
  struct ValTypeTest
  {
    int val;

    public ValTypeTest(int val)
    {
      this.val=val;
    }

    public override string ToString()
    {
      return val.ToString();
    }
  }

  class Class1
  {
    [STAThread]
    static void Main(string[] args)
    {
      ValTypeTest thing = new ValTypeTest(34);
      Console.WriteLine(thing.ToString());
      Console.WriteLine(thing.GetHashCode());

      int number = 34;
      Console.WriteLine(number.ToString());
      Console.WriteLine(number.GetHashCode());

      Console.ReadLine();
    }
  }
}

If you look at the IL code for this in Reflector using .NET 1.1 (I'll explain why I'm using .NET 1.1 shortly), you'll see this:

.method private hidebysig static void Main(string[] args) cil managed
{
    .custom instance void [mscorlib]System.STAThreadAttribute::.ctor()
    .entrypoint
    .maxstack 2
    .locals init (
        [0] valuetype ConsoleApplication2.ValTypeTest thing,
        [1] int32 number)
    L_0000: ldloca.s thing
    L_0002: ldc.i4.s 0x22
    L_0004: call instance void ConsoleApplication2.ValTypeTest::.ctor(int32)
    L_0009: ldloca.s thing
    L_000b: call instance string ConsoleApplication2.ValTypeTest::ToString()
    L_0010: call void [mscorlib]System.Console::WriteLine(string)
    L_0015: ldloc.0 
    L_0016: box ConsoleApplication2.ValTypeTest
    L_001b: callvirt instance int32 [mscorlib]System.ValueType::GetHashCode()
    L_0020: call void [mscorlib]System.Console::WriteLine(int32)
    L_0025: ldc.i4.s 0x22
    L_0027: stloc.1 
    L_0028: ldloca.s number
    L_002a: call instance string [mscorlib]System.Int32::ToString()
    L_002f: call void [mscorlib]System.Console::WriteLine(string)
    L_0034: ldloca.s number
    L_0036: call instance int32 [mscorlib]System.Int32::GetHashCode()
    L_003b: call void [mscorlib]System.Console::WriteLine(int32)
    L_0040: call string [mscorlib]System.Console::ReadLine()
    L_0045: pop 
    L_0046: ret 
}

As you can see, the call to GetHashCode causes the value type to be boxed (the box instruction at L_0016), whereas the call to ToString doesn't, because ToString has been overridden and GetHashCode hasn't. But if we look at the IL code in .NET 2, it looks like this:

.method private hidebysig static void Main(string[] args) cil managed
{
    .custom instance void [mscorlib]System.STAThreadAttribute::.ctor()
    .entrypoint
    .maxstack 2
    .locals init (
        [0] valuetype ConsoleApplication2.ValTypeTest thing,
        [1] int32 number)
    L_0000: nop 
    L_0001: ldloca.s thing
    L_0003: ldc.i4.s 0x22
    L_0005: call instance void ConsoleApplication2.ValTypeTest::.ctor(int32)
    L_000a: nop 
    L_000b: ldloca.s thing
    L_000d: constrained ConsoleApplication2.ValTypeTest
    L_0013: callvirt instance string [mscorlib]System.Object::ToString()
    L_0018: call void [mscorlib]System.Console::WriteLine(string)
    L_001d: nop 
    L_001e: ldloca.s thing
    L_0020: constrained ConsoleApplication2.ValTypeTest
    L_0026: callvirt instance int32 [mscorlib]System.Object::GetHashCode()
    L_002b: call void [mscorlib]System.Console::WriteLine(int32)
    L_0030: nop 
    L_0031: ldc.i4.s 0x22
    L_0033: stloc.1 
    L_0034: ldloca.s number
    L_0036: call instance string [mscorlib]System.Int32::ToString()
    L_003b: call void [mscorlib]System.Console::WriteLine(string)
    L_0040: nop 
    L_0041: ldloca.s number
    L_0043: call instance int32 [mscorlib]System.Int32::GetHashCode()
    L_0048: call void [mscorlib]System.Console::WriteLine(int32)
    L_004d: nop 
    L_004e: call string [mscorlib]System.Console::ReadLine()
    L_0053: pop 
    L_0054: ret 
}

Now it's no longer clear from the IL whether boxing occurs or not, because both calls use the IL constrained prefix in front of callvirt. With this prefix the decision is left to the runtime: if the value type implements the method itself, it is called directly on the value with no boxing; if it doesn't, the value is boxed and the call goes to the base implementation. It would appear this prefix was added for a variety of reasons, but one of them is to help with binary compatibility, so if a value type later adds or removes an override for a virtual method, the calling app will still work without any changes. The downside is that boxing is even harder to spot than it was before.
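Since the IL no longer gives it away, one way to convince yourself that an overridden method really runs on the value itself rather than on a boxed copy is to make the override change a field and see which copy changed. This is only a rough sketch: Tally and ConstrainedDemo are made-up names for illustration, and a ToString that mutates its struct is something you'd never write in real code.

```csharp
using System;

// Hypothetical struct for illustration only: the ToString override
// deliberately mutates a field so we can tell which copy it ran on.
struct Tally
{
    public int Calls;

    public override string ToString()
    {
        Calls++;
        return Calls.ToString();
    }
}

static class ConstrainedDemo
{
    // Tally overrides ToString, so each call runs against the local
    // variable itself and no boxed copy is involved.
    public static int CallsAfterDirectUse()
    {
        Tally t = new Tally();
        t.ToString();
        t.ToString();
        return t.Calls; // both increments landed on t
    }

    // Here the value is boxed explicitly first, so the override runs
    // against the heap copy and the original is left alone.
    public static int CallsAfterBoxedUse()
    {
        Tally t = new Tally();
        object boxed = t;  // box: copies t onto the heap
        boxed.ToString();  // increments the boxed copy only
        return t.Calls;    // t itself was never touched
    }
}
```

ConstrainedDemo.CallsAfterDirectUse() returns 2 and ConstrainedDemo.CallsAfterBoxedUse() returns 0. The same trick can't be used to catch the boxing in a call to a method the struct doesn't override, because the base implementations don't modify anything.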

That said, worrying about boxing is often not really worth the trouble. It smells of premature optimization and in most cases isn't likely to cause problems. Still, it does suggest that if you're writing your own value types, you're probably going to want to override most of object's base methods, particularly GetHashCode, which is used in quite a lot of places.
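As a minimal sketch of what that might look like, here is the test struct from earlier with the remaining base methods overridden. Implementing IEquatable<ValTypeTest> is my own addition rather than something the test application above needed; I've included it on the assumption that you want equality checks without boxing too, since Equals(object) has to box its argument.

```csharp
using System;

struct ValTypeTest : IEquatable<ValTypeTest>
{
    private readonly int val;

    public ValTypeTest(int val)
    {
        this.val = val;
    }

    // Strongly typed comparison: the argument is passed by value, not boxed.
    public bool Equals(ValTypeTest other)
    {
        return val == other.val;
    }

    // Overrides the base implementation; the argument arrives already boxed.
    public override bool Equals(object obj)
    {
        return obj is ValTypeTest && Equals((ValTypeTest)obj);
    }

    // Must agree with Equals: equal values give equal hash codes.
    public override int GetHashCode()
    {
        return val.GetHashCode();
    }

    public override string ToString()
    {
        return val.ToString();
    }
}
```

With these in place, calls to ToString, GetHashCode and Equals on a ValTypeTest all resolve to the struct's own implementations, and generic collections such as Dictionary<TKey, TValue> will pick up the IEquatable<T> version of Equals through their default comparer.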

2 comments:

Anonymous said...

Hello. I've read the MSDN article about boxing and the constrained opcode, but in my opinion it was not explained clearly there. You provided a really nice explanation of this subtle difference, thank you.
Yevgen

Anonymous said...

Actually, if you read Mr Richter's book CLR via C#, it's explained there quite decently. Brilliant book - big Richter fan.