Friday, December 30, 2016

Property sale data November 2016

I’ve uploaded the latest house price data from the Land Registry for England and Wales to my website. I’ve also made some changes to the way annual changes are calculated. This means the annual change figure is much less volatile, but it also means changes in the direction of travel are slower to appear. The current annual change is 3.5% but it is reducing each month. If the current trend continues, the annual change will go negative in about 9 months’ time.

UK stations updated with 2016 usage data

I’ve uploaded the UK station usage data for 2016 to my site. I’ve also made a few minor tweaks to the layout which hopefully make the data more useful.

Monday, November 28, 2016

Land Registry house price data for October 2016

I’ve uploaded the October 2016 Land Registry house price data to my website. Prices are just about managing to maintain a positive annual increase. Although houses are continuing their upward march, there seems to have been a mini collapse in the price of flats, which I’m guessing is due to the Buy To Let tax changes.

Friday, October 28, 2016

Friday, September 30, 2016

England and Wales house price data August 2016

As I type this, my server is churning through the latest Land Registry data for August 2016. The top level data has been imported and it suggests prices continue to glide ever upwards. The only difference seems to be that the annual change is getting smaller: it’s been around 5% for the last few years but it’s now around 2%.

If you look back at the data for earlier this year, you can now clearly see the spike in sales in March just before the Buy To Let Stamp Duty hike. Sales now seem to have returned to the levels seen after the financial crisis.

Wednesday, August 31, 2016

Tuesday, August 23, 2016

Wednesday, August 17, 2016

Google Maps Distance Matrix may not be what you’re after

For many years I’ve been using the Google Maps APIs on my website. They’ve been fun to use and until recently the licensing has been very unrestrictive. If an API returned a response saying you’d gone over the query limit, you could just wait for a second or so and try again, and generally it would work. So with some use of setTimeout, it was possible to build reasonably scalable apps that cost nothing.
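The wait-and-retry approach can be sketched roughly as follows. This is a Python illustration rather than the setTimeout-based JavaScript the site used, and OverQueryLimitError, retry_after_delay and the call argument are all hypothetical names, not part of any Google API.

```python
import time


class OverQueryLimitError(Exception):
    """Hypothetical error raised when a service says the query limit was exceeded."""


def retry_after_delay(call, max_attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Run call(); if it reports the query limit, wait and try again.

    Gives up and re-raises the error once max_attempts attempts have failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except OverQueryLimitError:
            if attempt == max_attempts:
                raise
            sleep(delay_seconds)  # wait a second or so before the next attempt
```

This only helps with soft, short-lived limits. Against a hard daily limit every retry fails until the counter resets.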

It’s looking like those days are coming to an end. The various APIs are starting to introduce hard limits on their usage. Once you’re over the limit, that’s it until the counter resets at the start of the next day. I first hit this with my use of the Directions API in my Driving Distances page. I can’t say I’m too happy with the way it was introduced. The Google API Console had given no previous indication of my usage of the API, but on the same day as they started displaying the usage report, the hard limit was also introduced. Since my site was way over the limit, that page fell over almost immediately.

So I had a Baldrick cunning plan. I’d swap out the Directions API for the Distance Matrix API. This is exactly the kind of application this API was designed for. Unfortunately I failed to read the usage limits correctly and after I uploaded the new code, the page fell over in a heap again after a few hours. It turns out the usage limits apply to the elements passed to the Distance Matrix API, not the number of requests. So a 10 by 10 matrix counts as 100 towards the free daily limit of 2,500 elements, not 1 as I had assumed. Given that this API provides less information and has fewer options than the Directions API but has the exact same usage limits, this is rather disappointing!
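To make the element counting concrete, here is a small Python sketch. The function names are made up for illustration, and the 2,500 figure is simply the free limit quoted above.

```python
FREE_DAILY_ELEMENTS = 2500  # free limit quoted in the post


def matrix_elements(origins, destinations):
    """Each origin/destination pair counts as one element."""
    return origins * destinations


def free_requests_per_day(origins, destinations, free_elements=FREE_DAILY_ELEMENTS):
    """Whole requests of this size that fit inside the free limit."""
    return free_elements // matrix_elements(origins, destinations)
```

So a 10 by 10 request uses 100 elements, and only 25 such requests fit in a day, rather than the 2,500 you might expect if each request counted once.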

I am trying to figure out the best way forward now. I could start to pay for extra requests, but since a 100 by 100 matrix would cost $5, the costs could mount up quickly. I can put a maximum daily cost on the account so I don’t have to pay enormous amounts if someone overuses the page, but this could lead to the page becoming unavailable again.
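As a rough sketch of how the costs scale, the Python below works backwards from the one price mentioned above ($5 for a 100 by 100 matrix, so 10,000 elements for 500 cents). The per-element rate is derived from that figure and is an assumption, not a quoted price list, and any free allowance is ignored.

```python
ELEMENTS_PER_CENT = 20  # assumption: 10,000 elements / 500 cents, from the $5 figure


def cost_in_dollars(elements):
    """Cost of a number of matrix elements at the derived rate."""
    return elements / ELEMENTS_PER_CENT / 100


def elements_within_daily_cap(cap_dollars):
    """How many elements a maximum daily spend would buy at the derived rate."""
    cap_cents = int(round(cap_dollars * 100))
    return cap_cents * ELEMENTS_PER_CENT
```

On these assumptions a $10 daily cap buys 20,000 elements, which is just two 100 by 100 matrices before the page becomes unavailable again.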

I suspect the outcome will be me removing the page, or at least no longer linking to it from the rest of the site. I make a tidy sum from Google AdSense advertising on the site, but I fear this may just be the start, as more and more APIs start to introduce a hard limit and I don’t particularly want to pay that money back to Google every month to pay for their APIs. It was fun whilst it lasted I guess!

Saturday, July 09, 2016

textareas slow in Chrome

So there’s a page on my site, https://www.doogal.co.uk/BatchGeocoding.php, that someone complained about. Specifically, they complained that if they tried to geocode 3,000 postcodes, it was terribly slow. I tried it myself and experienced the same problem. When geocoding postcodes, the page uses my own internal database, so it should suffer none of the throttling issues of Google Maps. No worries, I thought. I can reproduce the problem, which is generally the biggest hurdle, so fixing it should be straightforward.

So I fired up Chrome’s profiler and found… absolutely nothing… None of the delays were in my code. So I tried Microsoft Edge and it was super quick. I pretty much gave up at that point and suggested the user tried MS Edge.

Three weeks later I had a look with fresh eyes. And something popped up from the recesses of my mind, spellcheck="false". I vaguely remembered that setting that attribute on a textarea in the past had improved performance and once again, this fixed the issue. A single geocode was previously taking a second; now all 3,000 took a couple of minutes. This may be a bug in Chrome or maybe spellchecking is a very CPU intensive process. Either way, turning it off makes everything better.

As always this is just a reminder for me and maybe it will be useful to someone passing through.

Tuesday, May 31, 2016

Land Registry sales data for April 2016

I’ve uploaded the latest Land Registry data to my site (although the postcode level summary data is still being generated). The annual change has dropped to 1%, which may well be the first effects of the BTL tax changes recently introduced.

Wednesday, May 25, 2016

UK postcode data for May 2016

I’ve uploaded the latest UK postcode data to my website. Well, nearly all of it. Northern Irish postcodes are still from 2008 since the nice folk at NISRA still release their data under a restrictive license. I assumed at some point they would come into line with the rest of the UK and provide the postcode data with a liberal license, but it doesn’t seem to be happening. Since the data I have is now very old (from a time when the data was released with a liberal license), I am considering removing it from the site. Let me know if you find it useful and I’ll keep it online. And maybe send a polite request to NISRA to open up their data…

Sunday, April 24, 2016

doogal.co.uk now using https

For a while now, Google has been trying to get everyone to move their sites over to https. There are lots of valid reasons to do this, although the majority of sites don’t really need it.

The carrot of improved rankings hadn’t prompted me to make the change but Chrome 50 removed support for geolocation on pages not served over https, and I use geolocation in a number of places. So the site was broken in Chrome 50. And it was that stick that motivated me to make the switch.

One reason I’d held off from using https was the cost of a certificate. But things have moved on and it’s now possible to grab a certificate for nothing from Let’s Encrypt. It’s pretty simple to acquire and install a certificate using these instructions.

So the switch has been made. For most users of the site, nothing should have changed, unless I’ve broken something (let me know!). Comments are currently being migrated, so you may find some comments aren’t where they should be. If you are grabbing data from the site directly you may need to change the URL you use from http:// to https://. 

Sunday, April 10, 2016

Sampling Strava again

I thought I’d repeat the experiment conducted by Mark Slavonia last year to see how the Strava usage numbers stack up now.

First I needed to know the number of signed up users of Strava. This is pretty straightforward: head off to https://www.strava.com/athletes/6161562 and keep increasing the number at the end until Strava says it can’t find a user. Last March there were 8.2 million users, now there are about 14.4 million, not a bad increase for just over a year.

Next I wanted to capture the active users and the premium users. Since I’m a techy, I can automate this process using the Strava API and a .NET wrapper around it. So I decided to sample 1 in every 10,000 users, giving me about 1,440 sample users which should give the results a reasonable accuracy.

After pulling down that data, the first thing I noticed was that 47 of my requests had returned ‘Not Found’ errors. In fact, most of these were grouped together, suggesting Strava decided to restart their numbering with larger IDs at some point. So the total number of users is probably just shy of 14 million.

Premium Users

Of the 1,393 users I had left, 28 were Premium users, so approximately 2% of all users. This figure is pretty close to last year’s figure so I’m happy to believe it. That equates to 280,000 premium users or $16.6 million in revenue for Strava.
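For a feel of how accurate a sample of this size is, here is a rough Python sketch of the extrapolation with a 95% margin of error. It assumes the sampled IDs behave like a simple random sample and uses the normal approximation, neither of which is checked here, and it takes 14 million as the total user count. The function name is made up for illustration.

```python
import math


def estimate_from_sample(hits, sample_size, population, z=1.96):
    """Scale a sample proportion up to the population, with a rough 95% interval."""
    share = hits / sample_size
    # Normal-approximation margin of error for a proportion.
    margin = z * math.sqrt(share * (1 - share) / sample_size)
    return {
        "share": share,
        "low": max(0.0, share - margin),
        "high": min(1.0, share + margin),
        "estimated_total": round(share * population),
    }


premium = estimate_from_sample(28, 1393, 14_000_000)
```

On these assumptions the Premium share comes out at about 2%, but the interval runs from roughly 1.3% to 2.7%, so the 280,000 figure is best read as a ballpark.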

As an aside, I’m a Premium member, not because Strava offers particularly compelling features for Premium users but mainly to show my support for a website that is exceedingly useful and fun. I suspect Strava could differentiate between free and Premium a lot more to increase the percentage of Premium users. Take a look at all the functionality available at veloviewer.com.

Active Users

The Strava API doesn’t let me get activity data for other users, so I’m not able to find out how active users are directly. But it does provide an Updated field, which I’m hoping gets updated when a user uploads an activity (the Strava API docs are a little vague on this point). Using last year’s definition of an active user being someone who has done something in the last 24 days, how many active users are there? I found 181 users where that Updated field was in the last 24 days. That’s about 13%, or 1.8 million active users. The percentage is again fairly close to last year’s figure so I’m happy to go with it.

Gender

I found 806 men, 270 women and 317 blanks. Ignoring the blanks, that’s almost exactly a 75% / 25% split between men and women.

By Country

I think my sample is too small to draw accurate conclusions from the home countries of Strava users, but let’s play with the numbers anyway. 463 had blank entries for the country which leaves 930 users with a country specified. I’ve removed countries with fewer than 5 users in my sample and then adjusted for population. Below I’ve highlighted the countries where more than 2% of the population have signed up for Strava. It seems like there is massive potential to increase usage in many countries, although that may depend on whether there is a culture of recreational running and riding in these countries (China and India being the obvious biggest potential markets). And little old Blighty, the UK, is top of the pile. Go UK!

Country | Population | Sampled users | Approximate Strava users | % of population
Australia | 24,309,330 | 35 | 527,000 | 2.2
Austria | 8,569,633 | 5 | 75,000 | 0.9
Belgium | 11,371,928 | 6 | 90,000 | 0.8
Brazil | 209,567,920 | 74 | 1,114,000 | 0.5
Canada | 36,286,378 | 25 | 376,000 | 1.0
Chile | 18,131,850 | 6 | 90,000 | 0.5
Colombia | 48,654,392 | 9 | 135,000 | 0.3
Denmark | 5,690,750 | 5 | 75,000 | 1.3
France | 64,668,129 | 35 | 527,000 | 0.8
Germany | 80,682,351 | 18 | 271,000 | 0.3
India | 1,326,801,576 | 5 | 75,000 | 0.0
Ireland | 4,713,993 | 9 | 135,000 | 2.9
Italy | 59,801,004 | 37 | 557,000 | 0.9
Japan | 126,323,715 | 9 | 135,000 | 0.1
Mexico | 128,632,004 | 6 | 90,000 | 0.0
Netherlands | 16,979,729 | 32 | 482,000 | 2.8
New Zealand | 4,565,185 | 9 | 135,000 | 3.0
Philippines | 102,250,133 | 8 | 120,000 | 0.1
Poland | 38,593,161 | 5 | 75,000 | 0.2
Portugal | 10,304,434 | 16 | 241,000 | 2.3
Romania | 19,372,734 | 6 | 90,000 | 0.5
Russia | 143,439,832 | 12 | 181,000 | 0.1
South Africa | 54,978,907 | 16 | 241,000 | 0.4
South Korea | 50,503,933 | 14 | 211,000 | 0.4
Spain | 46,064,604 | 46 | 692,000 | 1.5
Switzerland | 8,379,477 | 7 | 105,000 | 1.3
Taiwan | 23,395,600 | 14 | 211,000 | 0.9
Ukraine | 44,624,373 | 6 | 90,000 | 0.2
UK | 65,111,143 | 153 | 2,303,000 | 3.5
US | 324,118,787 | 233 | 3,508,000 | 1.1
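The user estimates in the table appear to come from scaling each country’s sampled users by 14,000,000 / 930, so that users with a blank country are spread proportionally across the rest. That scaling is inferred from the numbers rather than stated anywhere, so treat this Python sketch and its function names as an assumption.

```python
TOTAL_USERS = 14_000_000   # "just shy of 14 million" from the post
USERS_WITH_COUNTRY = 930   # sampled users who specified a country


def estimate_country_users(sampled_users):
    """Scale a country's sampled users up to all users, rounded to the nearest thousand."""
    estimate = sampled_users * TOTAL_USERS / USERS_WITH_COUNTRY
    return int(round(estimate, -3))


def percent_of_population(strava_users, population):
    """Estimated Strava users as a percentage of the population, to one decimal place."""
    return round(100 * strava_users / population, 1)
```

For the UK row, 153 sampled users scale up to 2,303,000, which is 3.5% of a population of 65,111,143.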

Wednesday, April 06, 2016

Strava segments by year

I thought I’d do a little analysis of the data I’m slowly collecting on my Strava segment explorer. Every segment has a numeric ID. These started at 1* and each new segment gets an ID one greater than the most recent segment. So it’s quite easy to figure out how many segments are getting created over time.

So since I don’t have a huge dataset to analyse, let’s see how many segments had been created at the end of every year.
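Because IDs are sequential, the highest ID seen by the end of a year is roughly the running total of segments, and the difference between consecutive years gives the number created in each year. Here is a minimal Python sketch of that idea. The function name is made up and the figures in the usage note are invented purely for illustration, not real Strava data.

```python
def segments_created_per_year(highest_id_by_year):
    """Turn 'highest segment ID at the end of each year' into per-year counts.

    highest_id_by_year maps a year to the highest segment ID seen by then.
    """
    created = {}
    previous_total = 0
    for year in sorted(highest_id_by_year):
        created[year] = highest_id_by_year[year] - previous_total
        previous_total = highest_id_by_year[year]
    return created
```

With invented totals of 100, 250 and 400 for three consecutive years, this gives 100, 150 and 150 segments created in each year.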

[Chart: total number of Strava segments created by the end of each year]

So what’s this tell us? It shows the total number of segments created at the end of every year and it looks like since 2011, the number of segments created every year has remained fairly constant. I guess the interesting question would be whether creation of segments can be used as some kind of proxy for usage of Strava since Strava keep this information confidential? I think the answer to that is probably no. A new user in an area already chock-full of segments probably isn’t going to feel the need to create more, although they may create a few personal ones (home to work etc). Long term users probably already have all the segments they need. A better approach would be to repeat this study from last year.

But I guess it does show Strava is still being actively used by its users. Beyond that it’s hard to say anything definitive.

*Not entirely true, the lowest ID I’ve found is 96, but I imagine the ID of the first segment ever created was 1.

Monday, February 29, 2016

UK postcodes with altitude data

After much faffing around, I have finally added altitude data to my postcode data. Altitudes should be included with all the various CSV downloads. Hopefully these are reasonably accurate, but let me know if you find anything obviously wrong.

Saturday, February 27, 2016

UK house price data January 2016

I’ve uploaded the latest Land Registry data to my site. The source file is a lot bigger than usual and seems to contain quite a lot of old sales.

Prices are rising even faster and the number of sales is on the increase.

Friday, February 26, 2016

February 2016 UK postcode data

I’ve uploaded the latest ONS postcode data to my website. I have run my usual sanity checks and it all seems OK but let me know if you spot any problems.

Wednesday, February 03, 2016

Cycling speed variations with time of year

I’ve always assumed summer is the time of year when I will cycle the fastest. My ride data seems to back this up, with my average summer riding speed higher than my winter speed, but there are so many variables it is difficult to be definitive about it. For instance, during the summer I’m more likely to head out of the city and ride on some roads without too much traffic and fewer red lights, so I manage to maintain a higher average speed (although another variable to throw in is I tend to go up bigger hills where I’m slower!). Also I ride more during the summer so my overall fitness is probably higher. So maybe the difference in speed is not weather or temperature related.

But I was still surprised to read someone suggesting they go faster in the winter. The only time this happens to me is when I’m out during a windy winter day and catch a nice tailwind.

But I thought I’d check the Strava data from my site. Which months have the most KOMs? Checking that removes at least one variable: different roads. First I had to update my site to store the KOM date for each segment, then I had to grab some data. I chose segments in the UK, removing those shorter than 0.5km (too easily messed up by dubious GPS data) and those with fewer than 100 riders (not competitive enough), and this is what I got.

[Chart: number of KOMs by month]

This is a fairly small sample but it certainly suggests the summer months are the best time to grab a KOM, which also suggests the summer is the fastest time of the year for cycling. But then a thought crossed my mind: people tend to cycle more during the summer months, so perhaps it’s unsurprising that most KOMs are achieved then. So my conclusion is that I still have no idea and more research is required. And there are too many variables…

For my own recollection, this is the SQL I used to grab the data:

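-- Number of KOMs set in each calendar month, UK segments only
SELECT MONTH(KOMDate), COUNT(*)
FROM segments
WHERE Country = 'United Kingdom'
  AND Distance > 500          -- skip segments shorter than 0.5km
  AND TriesCount > 100        -- skip segments with too few riders
  AND KOMDate IS NOT NULL     -- skip segments with no KOM date stored
GROUP BY MONTH(KOMDate)
ORDER BY MONTH(KOMDate);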
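For anyone wanting to try the same query without the site’s database, here is a rough Python sketch that runs an equivalent against an in-memory SQLite table. SQLite has no MONTH() function, so strftime('%m', ...) is swapped in. The column types and the handful of rows are made up for illustration; only the column names come from the query above.

```python
import sqlite3


def koms_by_month(connection):
    """Count KOMs per calendar month for UK segments that pass the filters."""
    query = """
        SELECT CAST(strftime('%m', KOMDate) AS INTEGER) AS kom_month, COUNT(*)
        FROM segments
        WHERE Country = 'United Kingdom'
          AND Distance > 500
          AND TriesCount > 100
          AND KOMDate IS NOT NULL
        GROUP BY kom_month
        ORDER BY kom_month
    """
    return connection.execute(query).fetchall()


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE segments (Country TEXT, Distance REAL, TriesCount INTEGER, KOMDate TEXT)"
)
# Invented rows, purely to exercise each filter in the query.
connection.executemany(
    "INSERT INTO segments VALUES (?, ?, ?, ?)",
    [
        ("United Kingdom", 1200, 350, "2015-07-14"),
        ("United Kingdom", 800, 150, "2015-07-30"),
        ("United Kingdom", 2500, 900, "2015-01-03"),
        ("United Kingdom", 300, 500, "2015-08-01"),  # too short
        ("United Kingdom", 900, 40, "2015-06-10"),   # too few riders
        ("France", 1500, 400, "2015-07-01"),         # not in the UK
        ("United Kingdom", 1000, 200, None),         # no KOM date
    ],
)
rows = koms_by_month(connection)
```

Each row of the result is a (month number, KOM count) pair, and months with no qualifying KOMs are simply absent.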