Sunday, June 29, 2014

Land Registry May 2014 data

I’ve uploaded the Land Registry data for May 2014 to doogal.co.uk. For all the talk of a house price bubble, the Land Registry data (arguably the most accurate of all the house price indices) doesn’t seem to show much movement at all over the last few months. Of course it’s a different story in London. There was a story about the housing insanity in Hackney a few months ago, and looking at the sales in East London does show prices shooting up over the last year.

Saturday, May 31, 2014

Land Registry house price data for April 2014

The Land Registry house price data for April 2014 is now up on my site.

Tuesday, May 27, 2014

ONS postcode data for May 2014 uploaded to doogal.co.uk

I’ve just uploaded the latest postcode data from the ONS to my site. There are over 2.5 million postcodes in there, alive and dead. My data checks suggest everything is in order, but let me know if you find a problem.

Monday, May 19, 2014

Downloading JavaScript-generated data

I have a number of web pages that generate some data in text areas using JavaScript. The only way users could download this data was to copy and paste the contents of these text areas, but I wanted to add a download button to simplify the process. The problem is that there is no reliable way to do this in JavaScript alone. The only client-side solutions I’ve seen either require Flash or are not supported in all browsers.

So I came up with a slightly hacky and inefficient solution. The basic idea is to post the data to the server and get the server to return it to the client as a download. The HTML looks like this:

      <form action="download.php" method="post">
        <div>
          <input type="hidden" name="fileName" value="locations.csv" />
          <input type="submit" value="Download" />
        </div>
        <textarea id="geocodedPostcodes" style="width:100%;" rows="20" name="data"></textarea>
      </form>

All that is needed is a hidden field that tells the server-side script what the download file name should be and a text area with a name of “data”.

The server-side script is pretty simple. It looks like this:

<?php
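  // strip any path and quote characters from the requested file name
  $fileName = str_replace('"', '', basename($_POST["fileName"]));

  // serve the response as a download rather than displaying it in the browser
  header('Content-Type: application/octet-stream');
  header('Content-Disposition: attachment; filename="' . $fileName . '"');

  // echo the posted data straight back
  print($_POST["data"]);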
?>

All it does is get the requested file name and echo back the data.

It seems a bit crazy (and a waste of bandwidth) that this is the only way to achieve such a seemingly simple task, but that looks to be the case. I’d be happy to be proved wrong.
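
For browsers that do support it, one of the client-side approaches that isn’t available everywhere can be sketched roughly as follows. This is only a sketch, assuming a browser that understands both data: URIs and the download attribute on links. The names buildDataUri and attachDownload are hypothetical, made up for this example rather than taken from any library.

```javascript
// Build a data: URI containing the text, so the browser can save it
// without a round trip to the server.
function buildDataUri(text, mimeType) {
  return 'data:' + mimeType + ';charset=utf-8,' + encodeURIComponent(text);
}

// Hypothetical helper: when the link is clicked, point it at the current
// contents of the text area. Assumes the browser honours <a download>.
function attachDownload(link, textArea, fileName) {
  link.addEventListener('click', function () {
    link.setAttribute('download', fileName);
    link.setAttribute('href', buildDataUri(textArea.value, 'text/csv'));
  });
}
```

On a page like the one above, this might be wired up with something like attachDownload(someLink, document.getElementById('geocodedPostcodes'), 'locations.csv'), where someLink is an ordinary anchor element. Browsers without support for the download attribute will ignore the file name, so the server-side version is still the safer fallback.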

Sunday, May 11, 2014

Help me go on a bike ride

Last year I saw the various Ride London rides on the telly and rolling through Kingston, and fancied doing it myself. Riding round London and Surrey on traffic-free roads is very appealing compared to the usual stop-start, take-your-life-in-your-hands experience of cycling round these parts. So the first chance I had, I entered the ballot for the Ride London-Surrey. And in January I heard I’d missed out on getting a place.

There was one more option. Sign up with a charity and raise some money and get a guaranteed place. So I decided to try and help out Cancer Research. Why Cancer Research? Primarily because cancer affects so many people at all stages of life but also, on a personal level, one of my partner’s best friends lost her life to cancer a couple of years ago, before she reached the age of 40.

I’ve set up a page for donations, added a link from my website and been amazed by the number of people who don’t know me who’ve already donated. If you’ve found this blog or my website useful, or are just feeling generous, then please consider donating some money. I will certainly appreciate it, as will Cancer Research.

Wednesday, April 30, 2014

Land Registry March 2014 data uploaded

I’ve uploaded the Land Registry house price data for March 2014 to my website. Now that most, if not all, of the sales data for 2013 has come in, it’s plain to see that sales volumes were up in 2013 and prices continue to drift upwards.

Friday, April 18, 2014

The perils of micro-optimisations

A debate has been raging on my website over the use of StringBuilder.AppendFormat in my exception logger code. OK, raging is something of an exaggeration, there have been two comments in two years. But the point made by two people is that rather than

error.AppendLine("Application: " + Application.ProductName);

I should be using

error.AppendFormat("Application: {0}\n", Application.ProductName);

The argument is that this avoids string concatenation, which is considered bad for performance. My main reason for not doing anything about this is that I’m lazy, but also the whole point of this code is that it only runs when an exception is thrown, which is hopefully a pretty rare event, so performance is not a major concern.

But then I wondered what the difference in performance between these two approaches actually is. So I wrote a little test application that looks like this.

    // needs: using System; using System.Diagnostics; using System.Text; using System.Windows.Forms;
    static void Main(string[] args)
    {
      for (int j = 0; j < 10; j++)
      {
        // try using AppendLine
        Console.WriteLine("AppendLine");
        StringBuilder error = new StringBuilder();
        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < 1000000; i++)
        {
          error.AppendLine("Application: " + Application.ProductName);
        }
        sw.Stop();
        Console.WriteLine(sw.ElapsedMilliseconds);

        // try using AppendFormat
        Console.WriteLine("AppendFormat");
        error.Clear();

        sw.Restart();
        for (int i = 0; i < 1000000; i++)
        {
          error.AppendFormat("Application: {0}\n", Application.ProductName);
        }
        sw.Stop();
        Console.WriteLine(sw.ElapsedMilliseconds);
      }

      Console.ReadKey();
    }

The results from this app in milliseconds are as follows (reformatted for clarity):

AppendLine 307 315 321 372 394 370 289 298 300 296
AppendFormat 366 360 362 471 353 359 354 365 365 350

So which is quicker? Well it looks like AppendLine might be marginally quicker. But, much more importantly, who the feck cares? We are repeating each operation 1 million times and the time to execute is still less than half a second. Maybe you can pick holes in my test application, but again I would ask who the feck cares? Either approach is really fast.

And this is the main problem with trying to optimise this kind of stuff. We can spend huge amounts of time figuring out if one approach is quicker than another, but a lot of the time it doesn’t matter. Either the code runs quick enough using any sensible approach, or it’s hit so infrequently that even a really poor implementation will work.

Of course we should consider performance whilst writing code, but we should only use particular approaches when we know they are going to produce more performant code. A good example is the StringBuilder class. We can be pretty sure this is going to be better than using string concatenation, otherwise it wouldn’t exist in the first place. That said, if you’re concatenating two strings I really wouldn’t worry about it.

But the key to writing efficient code is to understand what is slow on a computer. Network operations are slow. Disk access is slow. Because of that, anything that requires large amounts of memory (meaning virtual memory i.e. disk access) is slow. Twiddling bits in memory is quick. Fast code is achieved by avoiding the slow stuff and not worrying about the quick stuff.

And once you’ve written your code and found it doesn’t run as quick as you’d hoped, don’t jump in and replace calls to AppendLine with calls to AppendFormat. Profile your application! Every time I profile an application, I’m amazed at the causes of the performance bottleneck. It’s rarely where I thought it would be.

If you don’t have a profiler, use poor man’s profiling. There are also free profilers available. I quite liked the Eqatec Profiler, which seems to be available from various download sites, although it’s no longer available from Eqatec. But whatever you do, don’t get into Cargo Cult Programming.