When last I posted about Skepticator, it was a raw, fresh-faced little website with a few command-line apps behind it pulling in content. It had something like 64 feeds aggregated, and was ticking along slowly with about 1000 posts to look through. But there was no easy way to actually dig for content. It was a very linear experience, which was fine if you just wanted the latest skeptical commentary.
But that's not why I started it. So, since I last posted, I've been working away like mad to make Skepticator much, much cooler.
- Warning: Massive techy nerdy geek-out follows. If you're not a geek, you might succumb to sudden narcolepsy.
One of the first things I made sure was available was a master RSS feed of all the aggregated content. That wasn't too hard: it leverages some of the same code behind the scenes as the main page, so it's almost as simple as a visual redesign, replacing the HTML with some XML. Shortly after that came Twitter OAuth login, which allows users with a Twitter account to tweet interesting articles right from the page. I was relatively pleased with that one, particularly the short-URL generation code, which I wrote while moderately drunk. When you click to tweet the article, it does this:
- Shows a textbox with the article title
- Uses Ajax to call out to a back-end web service with the article URL
- The web service takes that article URL, sends it to is.gd and gets back a shortened URL
- This is passed back from the web service to Ajax and pasted into the tweet box
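The web-service step in that list can be sketched roughly like this. It's a minimal Python sketch, not the actual ASP.NET code behind Skepticator, and it assumes is.gd's plain-text API (`https://is.gd/create.php?format=simple&url=...`, which answers with just the short URL). The `fetch` parameter is a hypothetical hook so the HTTP call can be swapped out.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed is.gd endpoint for the "simple" (plain-text) response format
ISGD_ENDPOINT = "https://is.gd/create.php"


def build_shorten_request(article_url: str) -> str:
    """Build the is.gd request URL for an article URL (the article URL is fully escaped)."""
    query = urlencode({"format": "simple", "url": article_url})
    return f"{ISGD_ENDPOINT}?{query}"


def default_fetch(url: str) -> str:
    """Perform a real HTTP GET and return the response body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def shorten_url(article_url: str, fetch=default_fetch) -> str:
    """Return the short URL the web service would hand back to the Ajax call.

    `fetch` is a hypothetical injection point so tests need no network.
    """
    short = fetch(build_shorten_request(article_url)).strip()
    if not short.startswith("http"):
        # Anything that doesn't look like a URL is treated as a failure
        raise ValueError(f"URL shortening failed: {short}")
    return short
```

Escaping the article URL matters here: an article link with its own `?` and `&` would otherwise get mangled into the is.gd querystring.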
All very well and cute, but the site was still lacking that most essential of features, and the one it really needed from the start: Search.
I tried out a couple of approaches to doing this well. Initially, I wrote a relatively simple SQL query-based keyword search interface. This didn't give very good results, so I resurrected a technique I'd used on SydneyPubGuide.net a few years back, which is to parse the search query, figure out whether it needs AND, OR or whatever, remove noise words and build a custom SQL query.
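A minimal sketch of that parse-and-build approach, in Python rather than the original ASP.NET. The noise-word list and the `Articles` table with its `Title`, `Url` and `Body` columns are hypothetical. It shows the three moves described above: drop noise words, honour an explicit `OR`, default to `AND`, and emit a parameterised query instead of pasting user input into SQL.

```python
import re

# Hypothetical noise-word list; a real one would be much longer.
NOISE_WORDS = {"a", "an", "the", "of", "and", "to", "in", "is"}


def build_search_sql(query: str):
    """Turn free text into (sql, params) for a keyword search.

    Terms are ANDed by default; an explicit OR between two terms joins
    them with OR instead. SQL's own precedence (AND before OR) applies.
    Returns (None, []) if nothing but noise words remain.
    """
    tokens = re.findall(r"\w+", query.lower())  # punctuation is discarded
    terms, joins = [], []
    pending_join = "AND"
    for token in tokens:
        if token == "or":
            pending_join = "OR"
            continue
        if token in NOISE_WORDS:
            continue
        if terms:
            joins.append(pending_join)
        terms.append(token)
        pending_join = "AND"
    if not terms:
        return None, []

    clauses, params = [], []
    for term in terms:
        clauses.append("(Title LIKE ? OR Body LIKE ?)")
        params.extend([f"%{term}%", f"%{term}%"])

    where = clauses[0]
    for join, clause in zip(joins, clauses[1:]):
        where += f" {join} {clause}"
    return f"SELECT Title, Url FROM Articles WHERE {where}", params
```

As the next paragraph says, this kind of matching has no notion of relevance: a row either matches or it doesn't.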
It still wasn't quite up to snuff, since relevance was a bit of a problem, and again I wanted to rid myself of the linear experience.
So in a brainwave, I decided to leverage Microsoft Search Server 2008.
Yes, I provide consulting services on this product. Email me for a quote ;-)
MSS2008 is a SharePoint-family product, and I'm a SharePoint consultant by day, so that should have been easy. It's also a freebie.
In fact, it was a little less than easy, since I hadn't planned on using it from the start. Had I built the original site with SharePoint in mind, I would have used a set of SharePoint lists as the main data store, and pulled data from there. As it was, I built Skepticator on an ASP.NET/SQL platform, with no SharePoint layer at all.
Hmm. What to do?
Well, the first thing I needed to do was to create a new SharePoint site and get the Skepticator data into SharePoint somehow. MSS can search websites, but that wasn't the experience I was looking for; I needed my Skepticator data in a SharePoint list. This I did with an extension to Skepticator's back-end roving robot, Extracticator. Now, instead of just pulling and scanning RSS feeds, Extracticator also scans a new SharePoint list and adds new feed items to it as it goes. The database and SharePoint list are therefore kept in sync*. The list itself lives in a new WSS site on the Skepticator web server, accessible only via 127.0.0.1 with a host header name.
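The sync step might look roughly like this. It's a sketch under loudly stated assumptions: `ListStore` is a hypothetical in-memory stand-in for both the SQL table and the SharePoint list (the real thing would go through SharePoint's own API), and items are assumed to be keyed by their link URL.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FeedItem:
    url: str    # assumed to uniquely identify an item
    title: str


@dataclass
class ListStore:
    """Hypothetical stand-in for the SQL table or the SharePoint list."""
    items: dict = field(default_factory=dict)

    def keys(self) -> set:
        return set(self.items)  # a copy, so callers can't mutate the store

    def add(self, item: FeedItem) -> None:
        self.items[item.url] = item


def sync_new_items(scanned, database: ListStore, sp_list: ListStore) -> int:
    """Add each scanned item to whichever store lacks it.

    Returns how many items were newly added to the SharePoint list.
    """
    db_keys, list_keys = database.keys(), sp_list.keys()
    added_to_list = 0
    for item in scanned:
        if item.url not in db_keys:
            database.add(item)
            db_keys.add(item.url)
        if item.url not in list_keys:
            sp_list.add(item)
            list_keys.add(item.url)
            added_to_list += 1
    return added_to_list
```

Checking each store independently means a list that falls behind (say, after a failed write) catches up on the next pass rather than staying out of sync.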
So, now I have a SharePoint list, which MSS indexes, and which I can search from SharePoint. How do I search it from Skepticator?
Well, MSS (and its big brother, MOSS) provides a web service called search.asmx which you can use to query a SharePoint search index without actually needing a SharePoint interface. So, once I'd added some managed properties to ensure I could search the list on all appropriate properties, I got to work on Skepticator's /search.aspx page, which calls out to this web service, retrieves the data and displays it in a manner familiar to site users. This, to be honest, was the easiest and most pleasant part of the process, since search.asmx has a method (QueryEx) which returns a .NET DataSet that can just be databound to an asp:Repeater control. Job done.
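For anyone wanting to call search.asmx themselves: its query methods take an XML "query packet" string. The Python sketch below only builds that packet. The element names, the `Revision="1000"` attribute and `type="STRING"` (keyword syntax) are assumptions based on the `urn:Microsoft.Search.Query` schema and should be checked against the SharePoint SDK; the SOAP call, authentication and DataSet binding are left out.

```python
import xml.etree.ElementTree as ET

QUERY_NS = "urn:Microsoft.Search.Query"  # assumed QueryPacket namespace


def build_query_packet(terms: str, start_at: int = 1, count: int = 20,
                       language: str = "en-US") -> str:
    """Build a keyword-syntax query packet string for search.asmx."""
    # Setting xmlns as a plain attribute puts every child in the default namespace
    packet = ET.Element("QueryPacket", {"xmlns": QUERY_NS, "Revision": "1000"})
    query = ET.SubElement(packet, "Query")
    formats = ET.SubElement(query, "SupportedFormats")
    ET.SubElement(formats, "Format").text = "urn:Microsoft.Search.Response.Document.Document"
    context = ET.SubElement(query, "Context")
    # type="STRING" is assumed to select plain keyword syntax
    query_text = ET.SubElement(context, "QueryText", {"language": language, "type": "STRING"})
    query_text.text = terms  # ElementTree escapes &, < and > for us
    # Range controls paging: 1-based start position and page size
    rng = ET.SubElement(query, "Range")
    ET.SubElement(rng, "StartAt").text = str(start_at)
    ET.SubElement(rng, "Count").text = str(count)
    return ET.tostring(packet, encoding="unicode")
```

Building the packet with an XML library rather than string concatenation means a search for something like `bands & holograms` can't produce malformed XML.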
That done, I had a working, relevance-weighted, syntax-aware, enterprise-level search interface all ready to go.
But I still wasn't satisfied. I didn't just want to access search via querystring variables like this:
It works, sure, but it's ugly, it's not SEO-friendly and there's no cool value.
So I wrote an ASP.NET HttpHandler, which does some URL rewriting for me. Now instead of the big ol' nasty URL above, you can use the prettier:
Which is much nicer, gives the site an "infinite pages" effect, lets search engines give me a little more lovin' and is generally easier to use.
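The rewriting idea, reduced to its core, is to map a friendly path onto the querystring form the search page already understands. In this Python sketch the `/search/<terms>` pattern, hyphens as word separators and the `q` parameter are all hypothetical; in ASP.NET the same logic would live inside the handler rather than in standalone functions.

```python
from urllib.parse import quote_plus, unquote_plus

SEARCH_PREFIX = "/search/"  # hypothetical friendly-URL prefix


def rewrite_search_path(path: str):
    """Rewrite e.g. /search/power-balance to /search.aspx?q=power+balance.

    Returns None for paths the rewriter should leave alone.
    """
    if not path.startswith(SEARCH_PREFIX):
        return None
    slug = path[len(SEARCH_PREFIX):].strip("/")
    if not slug:
        return None
    # Decode any %-escapes, then treat hyphens as word separators
    terms = unquote_plus(slug).replace("-", " ")
    return "/search.aspx?q=" + quote_plus(terms)


def friendly_search_path(terms: str) -> str:
    """The other direction: build the pretty URL for a set of search terms."""
    return SEARCH_PREFIX + quote_plus("-".join(terms.lower().split()))
```

One trade-off of using hyphens as separators: a search term that genuinely contains a hyphen loses it on the round trip.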
But I'm still not done. Oh no!
I want these search results to be available to anyone with a bit of programming nous or a decent RSS reader. So I've added a link to every search page that lets you get the results of your chosen search as RSS.
Yes, folks, you can watch a chosen topic by subscribing to its RSS feed. Want to stay up to date on what skeptics are currently saying about Power Balance?
Want to see what everyone thinks of Dave The Happy Singer?
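Turning a result set into RSS is mostly a change of output format, much like the master feed described earlier. Here's a minimal RSS 2.0 sketch in Python; it assumes each result is a dict with hypothetical `title`, `link` and `published` keys (a timezone-aware datetime), which may not match Skepticator's real fields.

```python
import xml.etree.ElementTree as ET
from email.utils import format_datetime


def results_to_rss(query: str, feed_link: str, results) -> str:
    """Render search results as an RSS 2.0 document string."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Skepticator search: {query}"
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = f"Latest items matching '{query}'"
    for result in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = result["title"]
        ET.SubElement(item, "link").text = result["link"]
        # Using the link as a permalink GUID lets readers spot items they've already seen
        ET.SubElement(item, "guid", {"isPermaLink": "true"}).text = result["link"]
        # RSS 2.0 expects RFC 822-style dates, which format_datetime produces
        ET.SubElement(item, "pubDate").text = format_datetime(result["published"])
    return ET.tostring(rss, encoding="unicode")
```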
These feeds are also free of ads, since they're meant to be consumed by anyone out there who wants them.
Yes, that's right. Consume this data. Use it on your own sites. Do with it as you please.
All I ask is that you be polite and cache a copy locally, just as Skepticator itself does with the feeds it scans, rather than hitting the site every time you need the data (which can end up being a fairly high load).
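On the consumer side, "polite" can be as simple as a time-based cache in front of the feed fetch. A minimal Python sketch: the 30-minute default TTL is an arbitrary assumption, and `fetch` and `clock` are hypothetical injection points for testing, not part of any Skepticator API.

```python
import time
from urllib.request import urlopen


def http_get(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urlopen(url, timeout=10) as response:
        return response.read()


class FeedCache:
    """Serve each feed from a local copy, refetching at most once per TTL."""

    def __init__(self, ttl_seconds: float = 30 * 60, fetch=http_get, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.fetch = fetch
        self.clock = clock
        self._entries = {}  # url -> (fetched_at, body)

    def get(self, url: str) -> bytes:
        now = self.clock()
        cached = self._entries.get(url)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # still fresh: no request hits the server
        body = self.fetch(url)
        self._entries[url] = (now, body)
        return body
```

Your code then calls `cache.get(feed_url)` as often as it likes, and the server sees at most one request per feed per TTL window.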
So, what's next?
Oh, and I'm being interviewed by the "Skeptically Speaking" podcast on Monday morning about the Skepticator. Other podcasts will possibly follow (Hint hint, Mr Saunders).
And before I sign off, follow The Skepticator on Twitter for updates and join the fan page on Facebook.
* Eventually, the SQL database will be removed and all the data migrated in toto to SharePoint, once I have a few free days. This data duplication is not exactly best practice and must, eventually, die.