Adam Howitt's Blog

Jun 10, 2005

Improving RSS stats

Duo Consulting has just launched its redesigned site, and we have integrated Raymond Camden's BlogCFC into it (Ray: goodies to follow as a thank-you). The first question that came up after launch was: how do we know how many unique RSS subscribers we have, and how do we know whether they click through on a link?

Good question. Web stats are voodoo at best and grossly misunderstood at worst. With that in mind, I have adapted BlogCFC so that it can answer the following:

   1. How many unique subscribers do we have?
   2. How long has each subscriber been receiving the feed?
   3. How many times does the average subscriber click through to the site from a post?

The answers to these questions are good metrics for measuring the impact of your RSS feed on traffic. They are also a good way to gauge how your readers feel about the topics you cover.

Okay, so how do you modify the blog? The strategy is to give each visitor to the website a unique key, and what better key than a datetime stamp right down to the millisecond? The only flaw is the tiny chance that two people sign up in exactly the same millisecond and end up sharing a key. The next step is to detect that key when a request for the RSS feed comes in. If the key is present, append it to every link in the feed. You now have a trail that threads each subscriber's feed pulls and click-throughs together, so you can observe trends by querying your web logs. Some stats packages (LiveStats, WebTrends) will also let you create custom reports to monitor these things.
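For readers who want to experiment outside ColdFusion, the key scheme can be sketched in Python. The sketch mirrors the CFML masks used in the pod below: yyddmm followed by HHmmssl, where l is milliseconds. The helper names make_user_key and tag_link are hypothetical and are not part of BlogCFC.

```python
from datetime import datetime


def make_user_key(now: datetime) -> str:
    """Build a user key from a timestamp, down to the millisecond.

    Mirrors the CFML masks: dateFormat 'yyddmm' (two-digit year, day,
    month) followed by timeFormat 'HHmmssl' (24-hour hours, minutes,
    seconds, milliseconds).
    """
    millis = now.microsecond // 1000
    return now.strftime("%y%d%m%H%M%S") + f"{millis:03d}"


def tag_link(url: str, key: str) -> str:
    """Append the user key to a link, using '&' if a query string already exists."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}uk={key}"


key = make_user_key(datetime(2005, 6, 10, 13, 35, 45, 111000))
print(key)                                 # 051006133545111
print(tag_link("index.cfm?entry=1", key))  # index.cfm?entry=1&uk=051006133545111
```

The key records when it was issued, which is what makes question 2 answerable without storing anything extra. A random suffix (for example from Python's uuid module) would remove the same-millisecond collision, but the key would then no longer carry that date.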

The Code

First, edit includes/pods/rss.cfm and replace the current RSS href links with this snippet:

<!--- Build one user key per request: date plus time down to the millisecond --->
<cfif not isdefined("request.uk")>
    <cfset request.uk = dateformat(now(),'yyddmm')&timeformat(now(),'HHmmssl')>
</cfif>
<a href="rss.cfm?mode=short&amp;uk=#request.uk#">
    #application.resourceBundle.getResource("shortmode")#
</a> /
<a href="rss.cfm?mode=full&amp;uk=#request.uk#">
    #application.resourceBundle.getResource("fullmode")#
</a><br />
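The pod change comes down to building two feed URLs that share one per-request key. Below is a Python sketch of the same idea. The name feed_links is hypothetical, and the relative rss.cfm path comes from the snippet above. The sketch builds the URLs raw and then HTML-escapes them, which is why the pod writes &amp; inside the href attributes.

```python
import html
from urllib.parse import urlencode


def feed_links(user_key: str) -> dict:
    """Return the short- and full-mode feed URLs, both tagged with one key.

    As with request.uk in the pod, the key is generated once per page
    request and shared by both links. Whichever mode the visitor
    subscribes to, the feed URL carries the same key.
    """
    return {
        mode: "rss.cfm?" + urlencode({"mode": mode, "uk": user_key})
        for mode in ("short", "full")
    }


links = feed_links("051006133545111")
print(links["short"])              # rss.cfm?mode=short&uk=051006133545111
print(html.escape(links["full"]))  # rss.cfm?mode=full&amp;uk=051006133545111
```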
The second change is to the generateRSS method in blog.cfc. Somewhere before the XML generation, add the following snippet:
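<!--- URL appender: if a user key is passed, add it to every item URL to help track RSS feed usage/conversions --->
<!--- Default to an empty string so the urlAppender references in the XML builder still work when no key is passed --->
<cfset urlAppender = "">
<cfif structKeyExists(arguments.params,"uk")>
    <cfset urlAppender = "&amp;uk=" & urlEncodedFormat(arguments.params.uk)>
</cfif>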
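The XML builder below references urlAppender on every request, so the appender needs an empty default for feed requests that arrive without a key. The suffix logic can be sketched in Python as follows. The name url_appender is hypothetical. Percent-encoding the key is an extra precaution in this sketch, since a hand-edited uk would otherwise be copied verbatim into the XML.

```python
from urllib.parse import quote


def url_appender(params: dict) -> str:
    """Suffix appended to every item URL in the generated feed.

    Returns "" when no "uk" was passed, so untagged feeds still render.
    Otherwise returns "&amp;uk=<key>". The ampersand is pre-escaped because
    the suffix is written straight into XML, and the key is percent-encoded
    so a hand-edited uk cannot inject markup.
    """
    if "uk" not in params:
        return ""
    return "&amp;uk=" + quote(str(params["uk"]), safe="")


print(repr(url_appender({})))                   # ''
print(url_appender({"uk": "051006133545111"}))  # &amp;uk=051006133545111
print(url_appender({"uk": "<x>"}))              # &amp;uk=%3Cx%3E
```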
Then skip down to the XML builder itself and update it as follows:
...
<items>
    <rdf:Seq>
        <cfloop query="articles">
            <rdf:li rdf:resource="#instance.blogURL#?#xmlFormat("#instance.blogItemURLPrefix##id#")##urlAppender#" />
        </cfloop>
    </rdf:Seq>
</items>

</channel>
</cfoutput>
</cfsavecontent>

<cfsavecontent variable="items">
<cfloop query="articles">
    <cfset dateStr = dateFormat(posted,"yyyy-mm-dd")>
    <cfset dateStr = dateStr & "T" & timeFormat(posted,"HH:mm:ss") & "-" & numberFormat(z.utcHourOffset,"00") & ":00">
    <cfoutput>
        <item rdf:about="#instance.blogURL#?#xmlFormat("#instance.blogItemURLPrefix##id#")##urlAppender#">
            <title>#xmlFormat(title)#</title>
            <description>
            <cfif arguments.mode is "short" and len(body) gte arguments.excerpt> #xmlFormat(left(body,arguments.excerpt))#...
            <cfelse>#xmlFormat(body)#
            </cfif><cfif len(morebody)> [More]</cfif>
            </description>
            <link>#instance.blogURL#?#xmlFormat("#instance.blogItemURLPrefix##id#")##urlAppender#</link>
            <dc:date>#dateStr#</dc:date>
            <dc:subject>#categoryNames#</dc:subject>
        </item>
...
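The escaping in the XML builder is easy to get wrong. Here is a Python sketch of how one link element is assembled and what an aggregator reads back after parsing it. The helper item_link_xml is hypothetical. Its parameters blog_url and item_prefix stand in for instance.blogURL and instance.blogItemURLPrefix, and the sample values reuse the example URL from the comments below. As in the original, only the prefix-plus-id part is escaped, and the appender arrives already escaped as &amp;uk=...

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def item_link_xml(blog_url: str, item_prefix: str, entry_id: int, appender: str) -> str:
    """Render one <link> element the way the XML builder does.

    As in the CFML, only the prefix-plus-id part is XML-escaped; the
    appender is expected to arrive pre-escaped ("&amp;uk=..." or "").
    """
    return f"<link>{blog_url}?{escape(item_prefix + str(entry_id))}{appender}</link>"


link_xml = item_link_xml("http://www.xyz.com/index.cfm", "entry=", 1, "&amp;uk=061205133545111")
print(link_xml)
# <link>http://www.xyz.com/index.cfm?entry=1&amp;uk=061205133545111</link>

# An aggregator parsing the feed reads the URL back with a plain '&'.
print(ET.fromstring(link_xml).text)
# http://www.xyz.com/index.cfm?entry=1&uk=061205133545111
```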
The last change is to the root rss.cfm file. After the params variable is constructed, add this snippet:
<cfif isDefined("url.uk")>
    <cfset params.uk = url.uk>
</cfif>
Then reinitialize your blog with ?reinit=1. If you hover over your RSS feed links, you should now see a unique user key (uk) at the end of each one.
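With keyed links in place, the three questions at the top of the post can be answered from the web server's access logs. The Python sketch below makes several assumptions:

- The log has already been reduced to (timestamp, request path) pairs.
- Requests to rss.cfm count as feed pulls, and any other request carrying a uk counts as a click-through.
- Each key is decoded with the yyddmm + HHmmssl layout from the pod, and all keys were issued between 2000 and 2099.

The function names and sample data are hypothetical. The numbers are only meaningful if each subscriber really holds a distinct key.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import parse_qs, urlsplit


def key_to_datetime(key: str) -> datetime:
    """Decode a yyddmm + HHmmssl key into the moment it was issued.

    Assumes a two-digit year in 2000-2099.
    """
    return datetime(
        2000 + int(key[0:2]),    # yy
        int(key[4:6]),           # mm (month)
        int(key[2:4]),           # dd
        int(key[6:8]),           # HH
        int(key[8:10]),          # mm (minutes)
        int(key[10:12]),         # ss
        int(key[12:15]) * 1000,  # l (milliseconds) as microseconds
    )


def feed_stats(hits):
    """Summarise (timestamp, request_path) pairs from an access log.

    Returns (unique subscribers, days each key has been pulling the feed,
    average click-throughs per subscriber).
    """
    last_pull = {}      # uk -> most recent feed pull
    clicks = Counter()  # uk -> click-throughs to the site
    for when, path in hits:
        parts = urlsplit(path)
        uk = parse_qs(parts.query).get("uk", [None])[0]
        if uk is None:
            continue  # untagged request, not attributable to a subscriber
        if parts.path.endswith("rss.cfm"):
            last_pull[uk] = max(when, last_pull.get(uk, when))
        else:
            clicks[uk] += 1
    subscribers = len(last_pull)
    tenure_days = {uk: (t - key_to_datetime(uk)).days for uk, t in last_pull.items()}
    # Average over feed subscribers only; keyed clicks from non-subscribers are ignored.
    total = sum(clicks[uk] for uk in last_pull)
    avg_clicks = total / subscribers if subscribers else 0.0
    return subscribers, tenure_days, avg_clicks


sample_hits = [
    (datetime(2005, 6, 20, 8, 0), "/rss.cfm?mode=full&uk=051006133545111"),
    (datetime(2005, 6, 30, 8, 0), "/rss.cfm?mode=full&uk=051006133545111"),
    (datetime(2005, 6, 30, 9, 0), "/index.cfm?entry=1&uk=051006133545111"),
    (datetime(2005, 6, 30, 9, 5), "/index.cfm?entry=2&uk=051006133545111"),
    (datetime(2005, 6, 25, 8, 0), "/rss.cfm?mode=short&uk=052006090000000"),
    (datetime(2005, 6, 25, 8, 0), "/index.cfm?entry=1"),
]
print(feed_stats(sample_hits))
# (2, {'051006133545111': 19, '052006090000000': 4}, 1.0)
```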

Comments
  1. One thing to keep in mind is that if your subscribers use services like Bloglines, Bloglines is going to download your feed for each user, so you're going to use some more bandwidth.

    One idea I have had for this is to include an invisible tracking image in your RSS feed, kind of like the way it's done with email. I think FeedBurner might do this.

  2. Adam: It's even worse than Pete suggests. There are very good reasons that everyone isn't doing this.

    (1) If your feed URLs are different each time a client autodiscovers them, then you can end up with Feedster, Technorati, PubSub, Bloglines, and so on requesting dozens of versions of your feed, over and over. You're basically setting up a DoS of your own box.

    (2) Much worse than that, you're adding your time token to each item's permalink. Yeow! Those permalinks are an essential part of RSS apps' duplicate tracking mechanisms... if your token changes every time an aggregator visits, every one of your entries will be imported into the database as if they were new. Permalinks are sacred... don't mess with 'em.

    (3) Near as I can tell, the current code is kinda broken anyway. I just requested your index page in two different browsers on two different machines, and got the same token every time.

  3. Roger, thanks for the comments. I disagree that it is as big a problem as you make out. There is certainly a problem with your point 3. It is due to the scopeCache mechanism used by BlogCFC, so I know I can look at that. Thanks for the heads up.

    To address the other two points. (1) Good point that there could be multiple versions of the feed, but I'm not sure I understand the depth of the issue. Does Technorati index every page, like Google, looking for unique feeds? I can think of a couple of ways around this. One is a session variable that keeps the unique visitor key constant for a single visitor. Another is checking the user_agent CGI variable to detect one of these exceptions.

    (2) Once you subscribe to my feed, you use a single URL to access it. If that URL has a uk in it, you will always use the same uk. The same goes for the aggregators: they subscribe to a single URL for the feed, so they will always see the same format for the item URLs. An example: you visit the site and subscribe to my feed. The URL you subscribe to is http://www.xyz.com/rss.cfm?uk=061205133545111. When your aggregator or feed reader checks that feed, every URL carries the uk you subscribed with. For example, the first post might be http://www.xyz.com/index.cfm?entry=1&uk=061205133545111. A different user would have the same URL but with a different uk. All the articles a single user sees in the feed will have a consistent URL on each subsequent pull, so the reader can still determine whether the feed has changed, as ever.

    The only danger I see is lost efficiency. Something like Bloglines only has to index my site once for 1000 different users, and now I have 1000 requests for the page instead of 1. Bloglines may get pissy because they have 1000 copies of my site feed, but my web server can handle 1000 requests if need be. This code is about analyzing my site to understand its value and maintain quality.

    Caching is a valuable device but it comes at the cost of being able to measure the success of your site. Pete suggested an invisible tracking image like FeedBurner but it is still a request which has to be acknowledged and logged by the server.

    If bandwidth isn't your biggest priority, but understanding the reach of your message and your readership is, then I would contend that this approach is still valid.

  4. Adam: I've been thinking about your goals, and I think I can recommend a solution that will get you what you want without hosing up anything else.

    Just switch your feeds over to RSS 2.0. Put your uk-enabled URI into the link element, and the non-uk version into the guid element.

    Poof... problem solved. Modern aggregators will always try to use guid for identification purposes, and you'll still get your link tracking.

  5. Roger, good thoughts! I'll take a look at that this weekend. I appreciate your interest in the problem.

    Adam

  6. I'd suggest you consider Roger's idea. Meanwhile, one quick way to handle the caching is to just move scopeCache INSIDE the layout calls. Then only the meat of the page is cached. You lose some of the caching gains, but your layout (where the RSS link is contained) can be dynamic.

    Looking forward to those gifts. ;)
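Roger's RSS 2.0 suggestion can be sketched in Python with the standard library's ElementTree. The keyed URL goes into link and the plain permalink into guid (with isPermaLink="true"). Two subscribers with different keys then still see the same guid for the same entry. The helper rss2_item and the subscriber names are hypothetical, and the URLs reuse the example from Adam's reply. The sketch cannot verify how strictly any given aggregator keys on guid; that assumption comes from Roger's comment.

```python
import xml.etree.ElementTree as ET


def rss2_item(title: str, permalink: str, user_key: str = "") -> ET.Element:
    """Build an RSS 2.0 <item>: keyed URL in <link>, plain permalink in <guid>."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    # Only the <link> carries the subscriber's key, for click-through tracking.
    separator = "&" if "?" in permalink else "?"
    link = permalink + (f"{separator}uk={user_key}" if user_key else "")
    ET.SubElement(item, "link").text = link
    # The guid never changes, so duplicate detection is unaffected by the key.
    ET.SubElement(item, "guid", isPermaLink="true").text = permalink
    return item


url = "http://www.xyz.com/index.cfm?entry=1"
alice = rss2_item("Improving RSS stats", url, "061205133545111")
bob = rss2_item("Improving RSS stats", url, "061205140000000")

print(ET.tostring(alice, encoding="unicode"))
# Different links, one identity: a guid-keyed aggregator sees a single entry.
print({item.find("guid").text for item in (alice, bob)})
```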
