Saturday, January 21, 2012

Do-it-yourself inbound link alerts

Embedded Analytics provides a nice service that emails you whenever somebody clicks on a new link to your site. I’ve been signed up for a while, and it’s interesting to see who has linked to my site. But last week I received an email informing me that my site had so many inbound links that I would have to start paying for the service. To be fair, the amount they were going to charge wasn’t a lot, but I couldn’t really justify spending money on something that is essentially just a way for me to waste a bit of time. I also figured I could probably do the same thing myself through the Google Analytics API, since this is what Embedded Analytics uses.

I’m assuming that Embedded Analytics uses the traffic source recorded for visitors to your site to spot new links. The downside is that it won’t spot links that have been added but never clicked on, but these generally won’t be that interesting, since they are presumably on low-traffic sites.

Implementing this requires a few steps. Pull the referral data out of Google Analytics and store it somewhere (DB, XML file, whatever). Then, the next time we pull the data out of Google, check the returned data for URLs that aren’t already in the store and send a notification listing them. Embedded Analytics goes a step further and checks that the links are valid and that the pages containing them are reachable from the web. I was only really interested in the first part of this solution, so I have written a piece of code to pull out the URLs using the Google Data API for .NET. The rest of the work is left as an exercise for the reader!

using System;
using Google.GData.Analytics;

namespace GoogleAnalytics
{
  class Program
  {
    static void Main(string[] args)
    {
      // Authenticate against Google Analytics; replace the placeholders with real credentials.
      AnalyticsService service = new AnalyticsService("DoogalAnalytics");
      service.setUserCredentials("email", "password");

      // Ask for visits over the last month, broken down by referring source and path.
      DataQuery pageViewQuery = new DataQuery("https://www.google.com/analytics/feeds/data");
      pageViewQuery.Ids = "ga:202885"; // profile ID; substitute your own
      pageViewQuery.Metrics = "ga:visits";
      pageViewQuery.Dimensions = "ga:source,ga:referralPath";
      pageViewQuery.Sort = "ga:source,ga:referralPath";
      pageViewQuery.GAStartDate = DateTime.Now.AddMonths(-1).ToString("yyyy-MM-dd");
      pageViewQuery.GAEndDate = DateTime.Now.ToString("yyyy-MM-dd");

      DataFeed feed = service.Query(pageViewQuery);
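      for (int i = 0; i < feed.Entries.Count; i++)
      {
        DataEntry pvEntry = (DataEntry)feed.Entries[i];
        string host = pvEntry.Dimensions[0].Value;
        string path = pvEntry.Dimensions[1].Value;

        // Direct and search traffic has no referring page; Analytics reports
        // its path as "(not set)", so skip those rows.
        if (path == "(not set)") continue;

        Console.WriteLine("http://" + host + path);
      }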

      Console.ReadLine();
    }
  }
}
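
For anyone who does want to attempt the exercise, here is a rough sketch of the "store and compare" step. It is only one way of doing it and makes some assumptions of its own: the LinkTracker class and FindNewLinks method are hypothetical names (they are not part of the Google Data API), and the store is just a plain text file with one URL per line rather than a database.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper; not part of the Google Data API.
public static class LinkTracker
{
    // Returns the URLs in 'current' that are not yet recorded in the file at
    // 'storePath', then appends them so they are not reported again.
    public static List<string> FindNewLinks(IEnumerable<string> current, string storePath)
    {
        // Load previously seen URLs; a missing file means this is the first run.
        var known = new HashSet<string>(
            File.Exists(storePath) ? File.ReadAllLines(storePath) : new string[0]);

        // HashSet.Add returns false for anything already present, so each
        // unseen URL is kept exactly once, even if it repeats in 'current'.
        List<string> fresh = current.Where(url => known.Add(url)).ToList();

        // Remember the new URLs for the next run.
        File.AppendAllLines(storePath, fresh);
        return fresh;
    }
}
```

To use it, collect the "http://" + host + path strings from the loop above into a list instead of writing them to the console, pass that list to LinkTracker.FindNewLinks along with a file name of your choosing, and email yourself whatever comes back. Note that the first run will report every link as new, since the store starts out empty.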

1 comment:

Mark Schenkel said...

Chris,

Mark here from EmbeddedAnalytics. First off, your overall assessment of how InboundLinkAlerts works is right on. We basically create a data warehouse of your existing site links by querying the ga:source;ga:referralpath combination. Then we periodically (at a user-specified interval) query this same combination over the last 24 hours and compare those links to those in the warehouse. You are correct that once we find a new one we do a verification to see if it is public facing. And you are absolutely correct that links that are just "dangling", ones no one has ever clicked on, will not trigger an alert (whereas a crawler might discover them).

We are trying to target people with non-technical backgrounds or those who don't want to dive into the API. And we are also trying to add additional features:

- We are now parsing the referring links and classifying them according to popular platforms (e.g. Blogger, WordPress, Disqus comments, Pinterest).

- We are building in features that let you suppress links coming from specific domains. For example, if you know there is a site that is constantly linking to your site, and these links really are of no interest to you, you can set a filter against the domain.

- We recently built in controls for redirects. Often blogging sites will move articles to archive sections, and this can trigger a new alert. Now when we detect a link from a domain, we compare it against all other links that have been discovered from that domain to see if it is just being redirected to a new page.

Overall the service has been well received. For those viewing this who don't want to dive into the Google Analytics API, but want a pretty dependable service to alert you of new links to your site, I encourage you to check out InboundLinkAlerts from EmbeddedAnalytics. It is very inexpensive ($4.95-$19.95 per year depending on the size of the site).