Importing the Code-Point dataset into SQL Server
Many moons ago, I imported my own postcode dataset into SQL Server but ever since the Ordnance Survey have made their Code-Point dataset available for free, I’ve been meaning to import that, since it’s likely their data will be more correct than my own.
It comes in a series of CSV files and contains the postcode, easting, northing and various other bits of data. Unfortunately the documentation is somewhat lacking so it’s not entirely clear what all the other data is. But those three bits of data are all I was after.
So I wrote a little console application to import the data, that looks like this
using System; using System.Data.SqlClient; using System.IO; using DotNetCoords; using LumenWorks.Framework.IO.Csv; namespace ImportCodepoint { class Program { static void Main(string[] args) { string[] files = Directory.GetFiles(@"C:\Users\Doogal\Downloads\codepo_gb\Code-Point Open\data\CSV"); foreach (string file in files) { Console.WriteLine("Importing file " + Path.GetFileName(file)); ReadFile(file); } } private static void ReadFile(string file) { using (StreamReader reader = new StreamReader(file)) { CsvReader csvReader = new CsvReader(reader, false); using (SqlConnection conn = new SqlConnection( "Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=Postcodes;Data Source=DOOGAL-LAPTOP\\SQLEXPRESS")) { conn.Open(); foreach (string[] data in csvReader) { string postcode = data[0]; double easting = double.Parse(data[10]); double northing = double.Parse(data[11]); // convert easting/northing to lat/long OSRef osRef = new OSRef(easting, northing); LatLng latLng = osRef.ToLatLng(); latLng.ToWGS84(); using (SqlCommand command = conn.CreateCommand()) { command.CommandText = string.Format("INSERT INTO Postcodes (Postcode, Coordinates) " + "VALUES ('{0}', geography::STPointFromText('POINT({2} {1})', 4326))", data[0], latLng.Latitude, latLng.Longitude); command.ExecuteNonQuery(); } } } } } } }
A couple of points, I’ve used this CSV parser, rather than re-inventing the wheel. It would be pretty easy to do this parsing myself, since only a few fields are quoted and no fields contain any special characters, but I’ve been pretty happy with this CSV parser in the past so saw no point re-inventing the wheel.
I’ve also used my .Net coordinates library to convert GB OS references to latitude/longitude, but obviously you can store the postcodes using eastings and northings if you prefer.
The final thing to do is to check that the imported data looks sensible. The easiest way to do this is to query the table in SQL Server Management Studio and look at the spatial results. Unfortunately this is limited to 5000 results, so I used this little bit of SQL (based on this blog post) to select a subset of the data to make sure the general shape looked correct. Obviously this doesn’t check all the data but did build some confidence that what I’d done was correct.
SELECT EXPR2.* FROM ( SELECT Postcode, Coordinates, RowNumber, (EXPR1.RowNumber % 400) AS ROW_MOD FROM ( SELECT Postcode, Coordinates, ROW_NUMBER() OVER ( ORDER BY Postcode ) AS 'RowNumber' FROM Postcodes ) EXPR1 ) EXPR2 WHERE EXPR2.ROW_MOD = 0

1 comments:
Great post Doogal. It inspired me to check out PowerShell to see how the same thing could be scripted.
Cheers, Graham
Post a Comment