Sunday, April 24, 2016

doogal.co.uk now using https

For a while now, Google has been trying to get everyone to move their sites over to https. There are lots of valid reasons to do this, although the majority of sites don’t really need it.

The carrot of improved rankings hadn’t prompted me to make the change, but Chrome 50 removed support for the geolocation API on pages that aren’t served over https, and I use geolocation in a number of places. So the site was broken in Chrome 50, and it was that stick that motivated me to make the switch.

One reason I’d held off from using https was the cost of a certificate. But things have moved on and it’s now possible to grab a certificate for nothing from Let’s Encrypt. It’s pretty simple to acquire and install a certificate using these instructions.

So the switch has been made. For most users of the site, nothing should have changed, unless I’ve broken something (let me know!). Comments are currently being migrated, so you may find some comments aren’t where they should be. If you are grabbing data from the site directly you may need to change the URL you use from http:// to https://. 
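If you fetch data from the site in a script, updating the URL is usually a one-line change. As a minimal sketch, the helper below (a hypothetical name, not part of the site) swaps the scheme on any http URL and leaves everything else untouched; the example path is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url: str) -> str:
    """Return the same URL with an http scheme upgraded to https."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


# Hypothetical download URL, used only to show the rewrite:
print(to_https("http://www.doogal.co.uk/example.csv?area=SW1"))
```

Most HTTP clients will also follow a server-side redirect from http to https, but updating the stored URL avoids the extra round trip.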

Sunday, April 10, 2016

Sampling Strava again

I thought I’d repeat the experiment conducted by Mark Slavonia last year to see how the Strava usage numbers stack up now.
First I needed to know the number of signed-up users of Strava. This is pretty straightforward: head off to https://www.strava.com/athletes/6161562 and keep increasing the number at the end until Strava says it can’t find a user. Last March there were 8.2 million users; now there are about 14.4 million, not a bad increase for just over a year.
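Stepping the number up by hand works, but a galloping search (keep doubling the ID until it fails, then binary search between the last hit and the first miss) finds the boundary in a few dozen lookups. A minimal sketch, assuming a hypothetical `exists(athlete_id)` predicate that you would implement by requesting the athlete page and checking for a "not found" response. It also assumes IDs are contiguous; as the gaps described below show, that is only roughly true, so if there are holes it may stop at the edge of one and is worth spot-checking.

```python
from typing import Callable


def highest_existing_id(exists: Callable[[int], bool], start: int = 1) -> int:
    """Return the largest ID for which exists(ID) is True.

    Assumes every ID up to the answer exists and every ID above it does not.
    `exists` is a hypothetical predicate, e.g. one that fetches /athletes/<id>.
    """
    if start < 1 or not exists(start):
        raise ValueError("start must be a positive ID that exists")
    lo, hi = start, start * 2
    # Gallop: keep doubling the upper bound until it lands on a missing ID.
    while exists(hi):
        lo, hi = hi, hi * 2
    # Binary search: exists(lo) is True and exists(hi) is False throughout.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if exists(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Usage with a stand-in predicate instead of real HTTP requests:
print(highest_existing_id(lambda i: i <= 14_400_000, start=6_161_562))  # 14400000
```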
Next I wanted to capture the active users and the premium users. Since I’m a techy, I can automate this process using the Strava API and a .NET wrapper around it. So I decided to sample 1 in every 10,000 users, giving me about 1,440 sample users which should give the results a reasonable accuracy.
After pulling down that data, the first thing I noticed was that 47 of my requests had returned ‘Not Found’ errors. In fact, most of these were grouped together, suggesting Strava decided to restart their numbering with larger IDs at some point. So the total number of users is probably just shy of 14 million.
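The arithmetic behind the sample and the adjusted total can be sketched as follows. This assumes the 1-in-10,000 sample took every 10,000th ID (the post doesn't say whether the IDs were evenly spaced or random), and the function names are illustrative only. Scaling the highest ID by the share of sampled IDs that actually existed gives the "just shy of 14 million" figure.

```python
def systematic_sample_ids(max_id: int, step: int = 10_000) -> list[int]:
    """Every `step`-th ID from `step` up to `max_id` (a 1-in-`step` sample)."""
    return list(range(step, max_id + 1, step))


def estimate_total_users(max_id: int, sampled: int, not_found: int) -> float:
    """Scale the highest ID by the share of sampled IDs that turned out to exist."""
    return max_id * (sampled - not_found) / sampled


ids = systematic_sample_ids(14_400_000)
print(len(ids))                                        # 1440
print(estimate_total_users(14_400_000, len(ids), 47))  # 13930000.0
```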

Premium Users

Of the 1,393 users I had left, 28 were Premium users, so approximately 2% of all users. This figure is pretty close to last year’s, so I’m happy to believe it. That equates to 280,000 Premium users, or $16.6 million in revenue for Strava.
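With only 28 Premium users in the sample, it is worth seeing how much uncertainty sits around that 2%. A minimal sketch using the simple normal-approximation (Wald) interval, which is my choice here rather than anything in the original analysis (a Wilson interval behaves better for small counts). It takes the adjusted total of about 13.93 million users (14.4 million × 1,393 / 1,440).

```python
import math


def proportion_with_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Sample proportion with a normal-approximation (Wald) 95% interval."""
    p = hits / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


TOTAL_USERS = 13_930_000  # 14.4 million x 1,393 / 1,440
p, low, high = proportion_with_interval(28, 1393)
print(f"{p:.1%} premium, roughly {low * TOTAL_USERS:,.0f} to {high * TOTAL_USERS:,.0f} users")
```

The point estimate works out at exactly 280,000 Premium users, but the interval runs from roughly 1.3% to 2.7%, so the sample agrees with last year’s figure without pinning it down much more tightly than that.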
As an aside, I’m a Premium member, not because Strava offers particularly compelling features for Premium users but mainly to show my support for a website that is exceedingly useful and fun. I suspect Strava could differentiate between free and Premium a lot more to increase the percentage of Premium users; take a look at all the functionality available at veloviewer.com.

Active Users

The Strava API doesn’t let me get activity data for other users, so I’m not able to find out how active users are directly. But it does provide an Updated field, which I’m hoping gets updated when a user uploads an activity (the Strava API docs are a little vague on this point). Using last year’s definition of an active user being someone who has done something in the last 24 days, how many active users are there? I found 181 users where that Updated field was in the last 24 days. That’s about 13%, or 1.8 million active users. The percentage is again fairly close to last year’s figure so I’m happy to go with it.
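The active-user check boils down to a date-window filter on the Updated field. A minimal sketch, assuming you already have each sampled user’s Updated timestamp as a `datetime` (or `None` when missing); it inherits the post’s assumption that Updated changes when an activity is uploaded, and the sample values are hypothetical.

```python
from datetime import datetime, timedelta


def count_active(updated_times, now: datetime, window_days: int = 24) -> int:
    """Count users whose Updated timestamp falls within the last `window_days` days.

    Treats Updated as a proxy for 'last uploaded an activity' (an assumption).
    Missing (None) values are ignored.
    """
    cutoff = now - timedelta(days=window_days)
    return sum(1 for t in updated_times if t is not None and t >= cutoff)


# Hypothetical Updated values for four sampled users:
sample = [datetime(2016, 4, 1), datetime(2016, 3, 1), None, datetime(2016, 3, 17)]
print(count_active(sample, now=datetime(2016, 4, 10)))  # 2

# Scaling the post's 181 active users out of 1,393 to roughly 13.93 million:
print(round(181 / 1393 * 13_930_000))
```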

Gender

I found 806 men, 270 women and 317 blanks. Ignoring the blanks, that’s almost exactly a 75% / 25% split between men and women.

By Country

I think my sample is too small to draw accurate conclusions about the home countries of Strava users, but let’s play with the numbers anyway. 463 had blank entries for the country, which leaves 930 users with a country specified. I’ve removed countries with fewer than 5 users in my sample and then adjusted for population. Countries where more than 2% of the population have signed up for Strava are marked with an asterisk (*). It seems like there is massive potential to increase usage in many countries, although that may depend on whether there is a culture of recreational running and riding in these countries (China and India being the obvious biggest potential markets). And little old Blighty, the UK, is top of the pile. Go UK!

| Country | Population | Sampled users | Approximate Strava users | % of population |
| --- | ---: | ---: | ---: | ---: |
| Australia* | 24,309,330 | 35 | 527,000 | 2.2 |
| Austria | 8,569,633 | 5 | 75,000 | 0.9 |
| Belgium | 11,371,928 | 6 | 90,000 | 0.8 |
| Brazil | 209,567,920 | 74 | 1,114,000 | 0.5 |
| Canada | 36,286,378 | 25 | 376,000 | 1.0 |
| Chile | 18,131,850 | 6 | 90,000 | 0.5 |
| Colombia | 48,654,392 | 9 | 135,000 | 0.3 |
| Denmark | 5,690,750 | 5 | 75,000 | 1.3 |
| France | 64,668,129 | 35 | 527,000 | 0.8 |
| Germany | 80,682,351 | 18 | 271,000 | 0.3 |
| India | 1,326,801,576 | 5 | 75,000 | 0.0 |
| Ireland* | 4,713,993 | 9 | 135,000 | 2.9 |
| Italy | 59,801,004 | 37 | 557,000 | 0.9 |
| Japan | 126,323,715 | 9 | 135,000 | 0.1 |
| Mexico | 128,632,004 | 6 | 90,000 | 0.1 |
| Netherlands* | 16,979,729 | 32 | 482,000 | 2.8 |
| New Zealand* | 4,565,185 | 9 | 135,000 | 3.0 |
| Philippines | 102,250,133 | 8 | 120,000 | 0.1 |
| Poland | 38,593,161 | 5 | 75,000 | 0.2 |
| Portugal* | 10,304,434 | 16 | 241,000 | 2.3 |
| Romania | 19,372,734 | 6 | 90,000 | 0.5 |
| Russia | 143,439,832 | 12 | 181,000 | 0.1 |
| South Africa | 54,978,907 | 16 | 241,000 | 0.4 |
| South Korea | 50,503,933 | 14 | 211,000 | 0.4 |
| Spain | 46,064,604 | 46 | 692,000 | 1.5 |
| Switzerland | 8,379,477 | 7 | 105,000 | 1.3 |
| Taiwan | 23,395,600 | 14 | 211,000 | 0.9 |
| Ukraine | 44,624,373 | 6 | 90,000 | 0.2 |
| UK* | 65,111,143 | 153 | 2,303,000 | 3.5 |
| US | 324,118,787 | 233 | 3,508,000 | 1.1 |
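The post doesn’t spell out how the "Approximate Strava users" column was derived, but scaling each country’s share of the 930 users who gave a country up to 14 million users, rounded to the nearest thousand, reproduces the figures in the table. A minimal sketch of that reconstruction, assuming users who left the country blank are spread across countries in the same proportions as everyone else:

```python
TOTAL_USERS = 14_000_000   # total the table's figures appear to be scaled to
USERS_WITH_COUNTRY = 930   # sampled users who gave a country


def approx_users(sampled: int) -> int:
    """Scale a country's sample count to the whole user base, nearest 1,000."""
    return int(round(sampled / USERS_WITH_COUNTRY * TOTAL_USERS, -3))


def percent_of_population(sampled: int, population: int) -> float:
    """Approximate Strava users as a percentage of population, to 1 decimal place."""
    return round(approx_users(sampled) / population * 100, 1)


print(approx_users(153), percent_of_population(153, 65_111_143))  # 2303000 3.5
```

With only 5 to 10 sampled users in many rows, each of those rows carries a very large error margin, which is why the ranking below the top few countries shouldn’t be taken too seriously.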

Wednesday, April 06, 2016

Strava segments by year

I thought I’d do a little analysis of the data I’m slowly collecting on my Strava segment explorer. Every segment has a numeric ID and these started at 1* and each new segment gets an ID of one greater than the most recent segment. So it’s quite easy to figure out how many segments are getting created over time.
Since I don’t have a huge dataset to analyse, let’s see how many segments had been created by the end of each year.
[Chart: total number of Strava segments created by the end of each year]
So what does this tell us? It shows the total number of segments created by the end of each year, and it looks like since 2011 the number of segments created each year has remained fairly constant. I guess the interesting question is whether segment creation can be used as some kind of proxy for usage of Strava, since Strava keeps that information confidential. I think the answer is probably no. A new user in an area already chock-full of segments probably isn’t going to feel the need to create more, although they may create a few personal ones (home to work etc.). Long-term users probably already have all the segments they need. A better approach would be to repeat this study from last year.
But I guess it does show Strava is still being actively used by its users, beyond that it’s hard to say anything definitive.
*Not entirely true: the lowest ID I’ve found is 96, but I imagine the ID of the first segment ever created was 1.
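Because segment IDs are handed out sequentially, the highest ID seen by the end of a year approximates the cumulative number of segments, and the difference between consecutive years gives the number created that year. A minimal sketch, assuming you have (creation date, segment ID) pairs for the segments you have collected; the observations below are hypothetical, and with a sparse crawl the highest ID seen is only a lower bound on the true year-end total.

```python
from collections import defaultdict
from datetime import date


def segments_created_per_year(observations):
    """Estimate cumulative and per-year segment counts from (created_on, segment_id) pairs."""
    max_id_by_year = defaultdict(int)
    for created_on, segment_id in observations:
        year = created_on.year
        max_id_by_year[year] = max(max_id_by_year[year], segment_id)

    cumulative, per_year = {}, {}
    running = previous = 0
    for year in sorted(max_id_by_year):
        # Cumulative totals can only grow, even if a year's sample is sparse.
        running = max(running, max_id_by_year[year])
        cumulative[year] = running
        per_year[year] = running - previous
        previous = running
    return cumulative, per_year


# Hypothetical observations from a segment crawl:
obs = [(date(2010, 5, 1), 100), (date(2010, 12, 1), 500),
       (date(2011, 3, 1), 800), (date(2011, 11, 1), 1500)]
print(segments_created_per_year(obs))  # ({2010: 500, 2011: 1500}, {2010: 500, 2011: 1000})
```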

Monday, February 29, 2016

UK postcodes with altitude data

After much faffing around, I have finally added altitude data to my postcode data. Altitudes should be included with all the various CSV downloads. Hopefully these are reasonably accurate, but let me know if you find anything obviously wrong.

Saturday, February 27, 2016

UK house price data January 2016

I’ve uploaded the latest Land Registry data to my site. The source file is a lot bigger than usual and seems to contain quite a lot of old sales.

Prices are rising even faster and the number of sales is on the increase.

Friday, February 26, 2016

February 2016 UK postcode data

I’ve uploaded the latest ONS postcode data to my website. I have run my usual sanity checks and it all seems OK, but let me know if you spot any problems.

Wednesday, February 03, 2016

Cycling speed variations with time of year

I’ve always assumed summer is the time of year when I cycle fastest. My ride data seems to back this up, with my average summer riding speed higher than my winter speed, but there are so many variables that it is difficult to be definitive. For instance, during the summer I’m more likely to head out of the city and ride on roads with less traffic and fewer red lights, so I manage to maintain a higher average speed (although another variable to throw in is that I tend to go up bigger hills, where I’m slower!). Also, I ride more during the summer, so my overall fitness is probably higher. So maybe the difference in speed is not related to weather or temperature at all.

But I was still surprised to read someone suggesting they go faster in the winter. The only time this happens to me is when I’m out during a windy winter day and catch a nice tailwind.

But I thought I’d check the Strava data from my site. Which months have the most KOMs? Checking that removes at least one variable: different roads. First I had to update my site to store the KOM date for each segment, then I had to grab some data. I chose segments in the UK, removing those shorter than 0.5 km (too easily messed up by dubious GPS data) and those with fewer than 100 riders (not competitive enough), and this is what I got.

[Chart: number of KOMs set in each month of the year, UK segments]

This is a fairly small sample, but it certainly suggests the summer months are the best time to grab a KOM, which also suggests summer is the fastest time of year for cycling. But then a thought crossed my mind: people tend to cycle more during the summer months, so perhaps it’s unsurprising that most KOMs are achieved then. So my conclusion is that I still have no idea and more research is required. And there are too many variables…
For my own reference, this is the SQL I used to grab the data:
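-- Count UK KOMs by the month in which they were set
SELECT MONTH(KOMDate), COUNT(*) FROM segments
WHERE Country = 'United Kingdom'
  AND Distance > 500        -- metres, i.e. longer than 0.5 km
  AND TriesCount > 100      -- only reasonably competitive segments
  AND KOMDate IS NOT NULL   -- skip segments with no stored KOM date
GROUP BY MONTH(KOMDate)
ORDER BY MONTH(KOMDate)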
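To try the query without the site’s database, here is a minimal sketch that runs the same filter and grouping against an in-memory SQLite table. The rows are hypothetical, and because SQLite has no `MONTH()` function (the original query uses the MySQL / SQL Server form), `strftime('%m', ...)` stands in for it.

```python
import sqlite3

# SQLite has no MONTH(), so strftime('%m', ...) cast to an integer replaces it.
KOM_BY_MONTH_SQL = """
    SELECT CAST(strftime('%m', KOMDate) AS INTEGER) AS month, COUNT(*)
    FROM segments
    WHERE Country = 'United Kingdom'
      AND Distance > 500
      AND TriesCount > 100
      AND KOMDate IS NOT NULL
    GROUP BY month
    ORDER BY month
"""


def build_demo_db() -> sqlite3.Connection:
    """In-memory table with hypothetical rows using the same column names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE segments (Country TEXT, Distance REAL, TriesCount INTEGER, KOMDate TEXT)"
    )
    conn.executemany(
        "INSERT INTO segments VALUES (?, ?, ?, ?)",
        [
            ("United Kingdom", 1200, 350, "2015-06-14"),
            ("United Kingdom", 800, 150, "2015-06-02"),
            ("United Kingdom", 300, 900, "2015-07-01"),   # too short
            ("United Kingdom", 2000, 50, "2015-07-01"),   # too few tries
            ("France", 5000, 1000, "2015-07-20"),         # outside the UK
            ("United Kingdom", 1500, 500, "2015-01-09"),
            ("United Kingdom", 1500, 500, None),          # no KOM date stored
        ],
    )
    return conn


print(build_demo_db().execute(KOM_BY_MONTH_SQL).fetchall())  # [(1, 1), (6, 2)]
```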

Monday, February 01, 2016

Thursday, December 31, 2015

UK house prices November 2015

You can now view and download UK house price data for November 2015 from my website. Not much to report: prices continue on their ever-upward trajectory and transactions appear to be creeping upwards.

Tuesday, November 24, 2015

Postcode data for November 2015

I’ve uploaded the latest ONS postcode data for November 2015 to my website, all 2,554,806 of them. I’ve run my usual checks but let me know if you spot anything that looks incorrect.

Sunday, November 22, 2015

Fixing Strava elevation data

For some reason, Strava actually trust the data that comes from their users. More specifically, they use the elevation data from the user’s ride when the user creates a segment. From a technical point of view this is definitely the easiest thing to do, but unfortunately GPS devices do occasionally lose their mind, so the data can be a mess. This can lead to garbage segment data, like this. A glance at the elevation profile makes it obvious that something is amiss. This dodgy data then means any derived data is also dubious, such as the climb category and the VAM numbers. The KOM rider on this particular segment has a VAM of 9,992, which is over 5 times what a drugged-up Lance Armstrong could achieve. Even my average VAM on category 4 climbs is over 1,000, which suggests I could make a good fist of keeping up with a bunch of professional cyclists. Which I couldn’t. Ever.
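VAM (velocità ascensionale media) is simply vertical metres climbed per hour, so it scales linearly with the segment’s elevation gain: if a GPS glitch inflates the climb tenfold, the VAM of every effort on that segment is inflated tenfold too. A minimal sketch of the general formula with made-up numbers; Strava’s own calculation may differ in detail.

```python
def vam(elevation_gain_m: float, elapsed_s: float) -> float:
    """VAM: vertical metres climbed per hour."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return elevation_gain_m * 3600 / elapsed_s


# A hypothetical 100 m climb ridden in 6 minutes:
print(vam(100, 360))    # 1000.0
# The same effort, but with a GPS glitch that reports ten times the climbing:
print(vam(1000, 360))   # 10000.0
```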
In an ideal world, Strava would fix up these dodgy segments in some way. One fix would be to average out all the elevation data from every rider who has ridden a segment. Alternatively, they could use the elevation data from one of the mapping services. Finally, they could make it easier to report bad data.
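The first option, averaging everyone’s elevation data, needs each rider’s profile resampled onto common distance points before the values can be combined. A minimal sketch, assuming each ride is given as a list of distances along the segment (metres, ascending) with matching elevations; the function names and data are illustrative, not anything Strava provides. Swapping the mean for `statistics.median` would make the result more robust to a single wildly wrong track.

```python
from bisect import bisect_left


def interp(xs, ys, x):
    """Linearly interpolate y at x (xs ascending); clamps outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # xs[i - 1] < x <= xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def average_profile(rides, segment_length_m, step_m=10.0):
    """Mean elevation of all rides at evenly spaced distances along the segment."""
    n = int(segment_length_m // step_m) + 1
    grid = [i * step_m for i in range(n)]
    elevations = [sum(interp(d, e, x) for d, e in rides) / len(rides) for x in grid]
    return grid, elevations


# Two hypothetical rides over a 100 m segment, sampled at different points:
rides = [([0, 100], [10, 20]), ([0, 50, 100], [12, 17, 22])]
print(average_profile(rides, 100, step_m=50))  # ([0.0, 50.0, 100.0], [11.0, 16.0, 21.0])
```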
So whilst we wait for Strava to fix this issue, I thought I’d have a play with the second option. My Strava segment search tool can now display segments itself, as well as link to them on Strava. This is what the example segment looks like. It uses Google Maps to calculate the elevation of the segment and adds that to the elevation profile, along with calculated statistics.
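For anyone wanting to do something similar outside a browser, Google also offers the Elevation API as an HTTP web service that samples elevations along a path. I’m assuming that service here; a web page would more likely use the Maps JavaScript API’s elevation service, and I don’t know which the tool above uses. A minimal sketch that builds the request, extracts the elevations from the parsed JSON response and sums the climbing; `YOUR_KEY` is a placeholder and the coordinates are made up.

```python
from urllib.parse import urlencode

# Google Maps Elevation API web service endpoint (an API key is required).
ELEVATION_ENDPOINT = "https://maps.googleapis.com/maps/api/elevation/json"


def elevation_request_url(points, samples, api_key):
    """Build a request for `samples` evenly spaced elevations along a path."""
    path = "|".join(f"{lat},{lng}" for lat, lng in points)
    query = urlencode({"path": path, "samples": samples, "key": api_key})
    return f"{ELEVATION_ENDPOINT}?{query}"


def elevations_from_response(payload):
    """Pull the elevation values (metres) out of the parsed JSON response."""
    if payload.get("status") != "OK":
        raise RuntimeError(f"Elevation API error: {payload.get('status')}")
    return [result["elevation"] for result in payload["results"]]


def total_ascent(elevations):
    """Sum of all the uphill steps along the profile, in metres."""
    return sum(max(b - a, 0.0) for a, b in zip(elevations, elevations[1:]))


print(elevation_request_url([(51.5, -0.1), (51.6, -0.2)], 50, "YOUR_KEY"))
print(total_ascent([10, 15, 12, 20]))  # 13.0
```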

Monday, November 09, 2015

UK stations data

As a prelude to some other work I might one day get round to, I’ve uploaded a list of UK train stations to my website. It comes in CSV and KML flavours, with the KML highlighting the busiest stations (mostly in the South East, as if you need to ask).

Thursday, October 29, 2015

UK property sale data for September 2015

It’s the end of the month so it’s time to upload the latest Land Registry property data to my website. The data crunching is still in progress but I’m off on holiday for a few days so can’t wait for it to complete before posting here (calculating all the various averages can take quite some time). Predictably enough, the data shows house prices continuing their upward march.

One sale this month caught my eye. Flat 4, 19 Terrapin Road was the first flat we bought, back in 1999. It’s just changed hands again, for a cool £530,000. So in 16 years, the price has increased over fivefold… Just one example of the insanity of the London housing market.

Tuesday, September 29, 2015

UK house prices August 2015

I’ve uploaded the latest house sale data from the Land Registry to my website. Prices seem to be ticking up at an increasing rate, while the number of sales is not changing much. One would imagine that without an increase in volumes, prices can’t remain at their current high level. But I could have said the same thing for the past 7 years…

Saturday, August 29, 2015

UK Property Sales July 2015

I have uploaded the latest Land Registry house price data for July 2015 to my site. Prices continue to rise moderately and sales continue to be at a low level.

Wednesday, August 26, 2015

August 2015 UK postcode data

I’ve uploaded the latest UK postcode data to my website. It now contains 2,551,959 postcodes, including live and terminated postcodes. I’ve run sanity checks on the data and all appears well but let me know if you spot any problems.

Friday, August 07, 2015

How to top a Strava segment leaderboard

Strava is all about the segments, and bragging rights are gained by being top of the leaderboard for a segment. But for those of us living in areas with many other cyclists, we are very unlikely to be fast enough to top most of the local segments, even if we get pushed along by a massive tailwind. Here’s an example near me, with nearly 50,000 riders attempting it over half a million times. Several races have passed through, so the leaderboard includes a number of professional cyclists and I’m never going to get anywhere near the top (since you asked, I’m at about 10,000 currently).
But we all want our own KOM/QOM, so what to do?

Find an obscure segment

Head over to my Strava segment search tool and zoom in and pan around a bit. You should see quite a few more segments than you’ll find with Strava’s own search. They’ll generally be less popular segments and hence more likely to have beatable times. 

Create your own obscure segment

I have a couple of KOMs for my rides to and from work. These are fairly meaningless, since I’m the only person to have ridden one of the segments and the other has only been ridden by one other rider. But if that’s enough to make you feel you’ve made it as a rider, go ahead and create your own segment. For this to work, you need to decrease the chance of anyone else riding your segment, so stick to obscure roads, make the segment fairly long and choose a route that nobody would ever normally follow. This is a brilliant example; one day I’ll get round to riding it to see if I can top the leaderboard.

Keep riding

A while back I headed out on a ride, going down some roads I haven’t explored before. On getting home I discovered I’d topped a leaderboard without even trying. Admittedly only 10 other people have ridden the segment but it still counts!

Cheat

There are websites that will take the output from your bike computer and shift it around so it appears you went faster than you did. No, really. It’s obvious chasing after Strava KOMs is a fairly pointless activity, but cheating to do it has to be the most ridiculous thing ever.
But what about inadvertently cheating? Whilst on a ride, my GPS went a bit haywire and for a few minutes I was at the top of a leaderboard. I guess the algorithms at Strava spotted the mistake (or the former holder of the KOM) and I was demoted pretty quickly. But what about this one? Everybody on the first page of that leaderboard is averaging over 78mph, which is very impressive for a hilly segment round Richmond Park. But if you look at the actual rides for those amazing times, none of them bear any relation to the actual segment, they are just in the same general area. Figure out how that bug works and you could be topping lots of leaderboards.
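Even a crude sanity check would catch the Richmond Park leaderboard: compute each effort’s average speed from the segment distance and elapsed time, and flag anything faster than a plausible cycling speed. A minimal sketch; the 50 mph threshold and the effort records are hypothetical, and this would not catch the underlying bug itself (GPS tracks that don’t follow the segment), which would need the ride’s route to be matched against the segment’s.

```python
METRES_PER_MILE = 1609.344


def average_speed_mph(distance_m: float, elapsed_s: float) -> float:
    """Average speed in miles per hour."""
    return distance_m / elapsed_s * 3600 / METRES_PER_MILE


def flag_implausible(efforts, max_mph: float = 50.0):
    """Return the efforts whose average speed is above `max_mph`."""
    return [e for e in efforts if average_speed_mph(e["distance_m"], e["elapsed_s"]) > max_mph]


# Hypothetical efforts on a 2 km segment:
efforts = [
    {"id": 1, "distance_m": 2000, "elapsed_s": 57},    # about 78.5 mph
    {"id": 2, "distance_m": 2000, "elapsed_s": 240},   # about 18.6 mph
]
print([e["id"] for e in flag_implausible(efforts)])  # [1]
```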