Sunday, June 13, 2010

Importing the Code-Point dataset into SQL Server

Many moons ago, I imported my own postcode dataset into SQL Server but ever since the Ordnance Survey have made their Code-Point dataset available for free, I’ve been meaning to import that, since it’s likely their data will be more correct than my own.

It comes in a series of CSV files and contains the postcode, easting, northing and various other bits of data. Unfortunately the documentation is somewhat lacking so it’s not entirely clear what all the other data is. But those three bits of data are all I was after.

So I wrote a little console application to import the data, that looks like this

using System;
using System.Data.SqlClient;
using System.IO;
using DotNetCoords;
using LumenWorks.Framework.IO.Csv;

namespace ImportCodepoint
{
  class Program
  {
    static void Main(string[] args)
    {
      string[] files = Directory.GetFiles(@"C:\Users\Doogal\Downloads\codepo_gb\Code-Point Open\data\CSV");
      foreach (string file in files)
      {
        Console.WriteLine("Importing file " + Path.GetFileName(file));
        ReadFile(file);
      }

    }

    private static void ReadFile(string file)
    {
      using (StreamReader reader = new StreamReader(file))
      {
        CsvReader csvReader = new CsvReader(reader, false);
        using (SqlConnection conn = new SqlConnection(
          "Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=Postcodes;Data Source=DOOGAL-LAPTOP\\SQLEXPRESS"))
        {
          conn.Open();
          foreach (string[] data in csvReader)
          {
            string postcode = data[0];
            double easting = double.Parse(data[10]);
            double northing = double.Parse(data[11]);

            // convert easting/northing to lat/long
            OSRef osRef = new OSRef(easting, northing);
            LatLng latLng = osRef.ToLatLng();
            latLng.ToWGS84();

            using (SqlCommand command = conn.CreateCommand())
            {
              command.CommandText = string.Format("INSERT INTO Postcodes (Postcode, Coordinates) " +
                "VALUES ('{0}', geography::STPointFromText('POINT({2} {1})', 4326))",
                data[0], latLng.Latitude, latLng.Longitude);
              command.ExecuteNonQuery();
            }
          }
        }
      }
    }
  }
}

A couple of points, I’ve used this CSV parser, rather than re-inventing the wheel. It would be pretty easy to do this parsing myself, since only a few fields are quoted and no fields contain any special characters, but I’ve been pretty happy with this CSV parser in the past so saw no point re-inventing the wheel.

I’ve also used my .Net coordinates library to convert GB OS references to latitude/longitude, but obviously you can store the postcodes using eastings and northings if you prefer.

The final thing to do is to check that the imported data looks sensible. The easiest way to do this is to query the table in SQL Server Management Studio and look at the spatial results. Unfortunately this is limited to 5000 results, so I used this little bit of SQL (based on this blog post) to select a subset of the data to make sure the general shape looked correct. Obviously this doesn’t check all the data but did build some confidence that what I’d done was correct.

SELECT 
   EXPR2.*
FROM 
   (
   SELECT 
      Postcode, Coordinates,
      RowNumber,
      (EXPR1.RowNumber % 400) AS ROW_MOD
   FROM
      (
      SELECT 
         Postcode, Coordinates, 
         ROW_NUMBER() OVER ( ORDER BY Postcode ) AS 'RowNumber'
      FROM 
         Postcodes 
      ) EXPR1
   ) EXPR2
WHERE 
   EXPR2.ROW_MOD = 0

6 comments:

Graham Morgan said...

Great post Doogal. It inspired me to check out PowerShell to see how the same thing could be scripted.
Cheers, Graham

Jordon Pilling said...

This is awesome & will save me a lot of time! the OS documention on this makes me cry. But can i ask one question, How acurate is the conversion from OS Northings / Eastings to WGS84 LatLongs? We'd like to use the resulting lat longs for route planning in MS MapPoint - accurate enough for that?

Chris Bell said...

The conversion should be accurate enough, never had a problem with it myself. The only issue you might have is using just a postcode for route planning. In some places a single postcode might cover quite a large area.

Jordon Pilling said...

Cheers Chris, we knocked up a vb version of your importer last night and did some spot checks, seems to be perfect for what we want. (we basically want to use them when MapPoint cant GeoCode a postcode itself we'll use this OS data as a 2nd source of information - in this scenario an approximate location on the route is better than sticking it at the bottom of the run sheet as we currently do). Thanks again!

Tim Almond said...

Thanks. The importer is great.

DotNetSi said...

Hey, thanks for posting this - Great code, works a treat!