Improvements to Skepticator

When last I posted about Skepticator, it was a raw, fresh-faced little website with a few command-line apps behind it drawing content. It had something like 64 feeds aggregated, and was ticking along slowly with about 1000 posts to look through. But there was no easy way to actually dig for content. It was a very linear experience, and if you wanted just the latest Skeptical commentary, that was fine.

But that's not why I started it. So, since I last posted, I've been working away like mad to make Skepticator much, much cooler.

 - Warning: Massive techy nerdy geek-out follows. If you're not a geek, you might succumb to sudden narcolepsy.

One of the first things I made sure was available was a master RSS feed of all the content aggregated. That wasn't too hard - it leverages some of the same code behind the scenes as the main page, so it's almost as simple as a visual redesign, replacing the HTML with some XML. Shortly after that came Twitter OAuth login, which allows users with a Twitter account to tweet interesting articles right from the page. I was relatively pleased with that one, particularly the ShortURL generation code, written while moderately drunk. When you click to tweet the article, it does this:

  • Shows a textbox with the article title
  • Uses Ajax to call out to a back-end web service with the article URL
  • The web service takes that Article URL and sends it to is.gd, and gets back a shortened URL
  • This is passed back from the web service to Ajax and pasted into the tweet box
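The four steps above can be sketched in a few lines. This is only an illustration in Python, not the site's actual ASP.NET code: the is.gd endpoint and its parameters, the 140-character trimming rule and both function names are assumptions made for the example.

```python
from urllib.parse import urlencode

# Assumption: is.gd's "simple" create endpoint; the original site may have
# called a different URL.
ISGD_ENDPOINT = "https://is.gd/create.php"


def build_shorten_request(article_url: str) -> str:
    """Return the GET URL the back-end service would request from the shortener."""
    return ISGD_ENDPOINT + "?" + urlencode({"format": "simple", "url": article_url})


def compose_tweet(title: str, short_url: str, limit: int = 140) -> str:
    """Build 'title short_url', trimming the title so the result fits the limit."""
    room = limit - len(short_url) - 1  # one space between title and link
    if room < 4:
        # No sensible space left for a title; send the link on its own.
        return short_url[:limit]
    if len(title) > room:
        title = title[: room - 3].rstrip() + "..."
    return f"{title} {short_url}"
```

The web service would fetch the URL from `build_shorten_request` and hand the response body back to the page, which then drops the output of `compose_tweet` into the tweet box.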

All very well and cute, but the site was still lacking that most essential of features, and the one it really needed from the start: Search.

I tried out a couple of approaches to find the best way of doing this. Initially, I wrote a relatively simple SQL query-based keyword search interface. This didn't give very good results, so I resurrected a technique I'd used on SydneyPubGuide.net a few years back, which is to parse the search query, figure out if it needs AND, OR or whatever, remove noise words and build a custom SQL query.
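For the curious, the parse-and-build idea looks roughly like this. It is a minimal Python sketch rather than the original code: the noise-word list, the `Posts` table and its `Id`, `Title`, `Url` and `Body` columns are hypothetical, and uppercase AND/OR are treated as operators while everything else is a search term.

```python
import re

# Hypothetical noise-word list; a real one would be longer.
NOISE_WORDS = {"a", "an", "the", "of", "in", "on", "to", "is", "and", "or"}
# A token is either a "quoted phrase" or a run of non-space characters.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def build_search_sql(query, table="Posts"):
    """Turn a user query into a parameterised SELECT plus its parameter list."""
    clauses, params = [], []
    pending = None  # operator seen since the last term, if any
    for phrase, word in TOKEN.findall(query):
        if word in ("AND", "OR"):
            pending = word
            continue
        term = phrase or word
        if not phrase and term.lower() in NOISE_WORDS:
            continue  # drop noise words, but keep them inside quoted phrases
        if clauses:
            clauses.append(pending or "AND")  # AND is the default joiner
        clauses.append("(Title LIKE ? OR Body LIKE ?)")
        params += [f"%{term}%"] * 2
        pending = None
    where = " ".join(clauses) or "1 = 0"  # nothing searchable: match no rows
    return f"SELECT Id, Title, Url FROM {table} WHERE {where}", params
```

The terms travel as parameters rather than being pasted into the SQL string, which keeps user input out of the query text. The sketch does not escape `%` or `_` inside terms, and mixed AND/OR queries simply follow SQL's own precedence.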

It still wasn't quite up to snuff, since relevance was a bit of a problem, and again I wanted to rid myself of the linear experience.

So in a brainwave, I decided to leverage Microsoft Search Server 2008.

Yes, I provide consulting services on this product. Email me for a quote ;-)

MSS2008 is a SharePoint family product, and I'm a SharePoint consultant by day, so that should have been easy. It's also a freebie.

In fact, it was a little less than easy, since I hadn't planned on using it from the start. Had I built the original site with SharePoint in mind, I would have used a set of SharePoint lists as the main data store, and pulled data from there. As it was, I built Skepticator on an ASP.NET/SQL platform, with no SharePoint layer at all.

Hmm. What to do?

Well, the first thing I needed to do was to create a new SharePoint site and get the Skepticator data into SharePoint somehow. MSS can search websites, but that wasn't the experience I was looking for. It was necessary to get my Skepticator data into a SharePoint list. This I did with an extension to Skepticator's backend roving robot, Extracticator. Now, instead of just pulling and scanning RSS feeds, Extracticator also scans a new SharePoint list and adds new feed items to it as it goes. The database and SharePoint list are therefore kept in sync*. The list itself lives in a new WSS site on the Skepticator web server, accessible only via 127.0.0.1 with a host header name.
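The sync step boils down to "add whatever the list doesn't have yet". Here is a small Python sketch of that logic only. It assumes items are matched by URL, and it uses a plain callable where the real robot would talk to SharePoint; `sync_new_items`, `list_urls` and `add_to_list` are names made up for the illustration.

```python
def sync_new_items(db_items, list_urls, add_to_list):
    """Add each database item whose URL is not yet in the list.

    db_items    -- iterable of dicts with at least a "url" key
    list_urls   -- set of URLs already present in the list (updated in place)
    add_to_list -- callable that writes one item to the list (a stand-in here)
    Returns the number of items added.
    """
    added = 0
    for item in db_items:
        if item["url"] in list_urls:
            continue  # already in the list, nothing to do
        add_to_list(item)
        list_urls.add(item["url"])  # also guards against duplicates within db_items
        added += 1
    return added
```

Because the function only ever adds missing items, running it repeatedly is harmless, which suits a robot that wakes up on a schedule.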

So, now I have a SharePoint list, which MSS indexes, and which I can search from SharePoint. How do I search it from Skepticator?

Well, MSS (and its big brother, MOSS) provides a web service called search.asmx which you can use to query a SharePoint search index without actually needing a SharePoint interface. So, once I'd added some managed properties to ensure I could search the list on all appropriate properties, I got to work on Skepticator's /search.aspx page, which would call out to this web service, retrieve the data and display it in a manner familiar to site users. This, to be honest, was the easiest and most pleasant part of the process, since search.asmx has a method which returns .NET DataSet objects, which can just be databound to an asp:repeater object. Job done.
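The web service takes its query as a small XML document. The Python sketch below builds one, going by my recollection of the Microsoft.Search.Query schema: treat the element names, the attributes and the omission of optional parts such as supported formats as assumptions to check against the product documentation. `build_query_packet` is a hypothetical helper, not part of any SharePoint API.

```python
import xml.etree.ElementTree as ET

# Namespace of the query packet (assumed from the Microsoft.Search.Query schema).
NS = "urn:Microsoft.Search.Query"


def build_query_packet(text, start_at=1, count=10):
    """Return a keyword-query packet as an XML string."""
    ET.register_namespace("", NS)  # serialise with a default namespace, no prefix

    def tag(name):
        return f"{{{NS}}}{name}"

    packet = ET.Element(tag("QueryPacket"))
    query = ET.SubElement(packet, tag("Query"))
    context = ET.SubElement(query, tag("Context"))
    query_text = ET.SubElement(
        context, tag("QueryText"), {"type": "STRING", "language": "en-US"}
    )
    query_text.text = text  # ElementTree escapes &, < and > on output
    result_range = ET.SubElement(query, tag("Range"))
    ET.SubElement(result_range, tag("StartAt")).text = str(start_at)
    ET.SubElement(result_range, tag("Count")).text = str(count)
    return ET.tostring(packet, encoding="unicode")
```

The resulting string would be passed to the service's query method inside a SOAP call. Paging is just a matter of moving `start_at` along by `count` for each page.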

That done, I had a working, relevance-weighted, syntax-aware, enterprise-level search interface all ready to go.

But I still wasn't satisfied. I didn't just want to access search via querystring variables like this:

http://skepticator.com/search.aspx?q=Rebecca%20Watson&p=1

It works, sure, but it's ugly, it's not SEO-friendly and there's no cool value.

So I wrote an ASP.NET HTTPHandler, which does some URL rewriting for me. Now instead of the big ol' nasty URL above, you can use the much prettier:

http://skepticator.com/search/Homeopathy

or

http://skepticator.com/search/Scientology

or

http://skepticator.com/search/Rebecca%20Watson

Which is much nicer, and gives me an "infinite pages" effect to the site, allows search engines to give me a little more lovin' and is generally easier to use.
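The rewriting itself is a simple mapping from the pretty path to the query-string form. The Python sketch below shows the idea rather than the actual HTTPHandler: the optional trailing page segment (as in /search/Homeopathy/2) is an assumption of mine, and `rewrite_search_path` is a made-up name.

```python
from urllib.parse import quote, unquote


def rewrite_search_path(path):
    """Map /search/<term>[/<page>] to /search.aspx?q=<term>&p=<page>.

    Returns None when the path is not a pretty search URL, so the caller
    can let the request through untouched.
    """
    parts = path.strip("/").split("/")
    if len(parts) not in (2, 3) or parts[0].lower() != "search" or not parts[1]:
        return None
    page = 1
    if len(parts) == 3:
        if not parts[2].isdecimal():
            return None
        page = int(parts[2])
    term = unquote(parts[1])  # decode, then re-encode safely for the query string
    return f"/search.aspx?q={quote(term, safe='')}&p={page}"
```

With this mapping, /search/Rebecca%20Watson ends up at the same page as the long URL shown earlier, and anything that isn't a search path falls through unchanged.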

But I'm still not done. Oh no!

I want these search results to be available to anyone with a bit of programming nous or a decent RSS reader. So therefore I've added a link to every search page which allows you to get the results of your chosen search as RSS.

Yes, folks, you can watch a chosen topic by subscribing to its RSS feed. Want to stay up to date on what skeptics are currently saying about Power Balance?

Try http://skepticator.com/searchrss.aspx?q=power%20balance

Want to see what everyone thinks of Dave The Happy Singer?

Try http://skepticator.com/searchrss.aspx?q=happy%20singer

These feeds are also free of ads, since they're meant to be consumed by anyone out there who wants them.
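Turning search results into a feed is mostly a matter of wrapping them in RSS 2.0 markup. Here is a minimal Python sketch, assuming each result is a dict with `title` and `url` keys; the channel wording and the `results_to_rss` name are illustrative, not the real searchrss.aspx.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def results_to_rss(query, results, site="http://skepticator.com"):
    """Render search results as an RSS 2.0 document (returned as a string)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = f"Skepticator search: {query}"
    ET.SubElement(channel, "link").text = f"{site}/search/{quote(query, safe='')}"
    ET.SubElement(channel, "description").text = f"Latest posts matching '{query}'"
    for result in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = result["title"]
        ET.SubElement(item, "link").text = result["url"]
    return ET.tostring(rss, encoding="unicode")
```

Any feed reader that understands RSS 2.0 should be able to subscribe to output shaped like this.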

Yes, that's right. Consume this data. Use it on your own sites. Do with it as you please. 

All I ask is that you maintain politeness and cache a copy locally, just like Skepticator itself does with the feeds it scans, rather than hitting up the data every time you need it (which can end up being a fairly high load).
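If you're wondering what polite caching might look like, here is one way to do it in Python. It is a sketch under assumptions: the 15-minute lifetime is an arbitrary choice, `FeedCache` is a made-up class, and `fetch` is whatever function you already use to download a URL.

```python
import time


class FeedCache:
    """Keep a local copy of each feed and refetch only after it has expired."""

    def __init__(self, fetch, ttl_seconds=900, clock=time.monotonic):
        self._fetch = fetch      # callable: url -> feed body
        self._ttl = ttl_seconds  # how long a cached copy stays fresh
        self._clock = clock      # injectable so the cache can be tested
        self._store = {}         # url -> (time fetched, body)

    def get(self, url):
        now = self._clock()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no request leaves the machine
        body = self._fetch(url)
        self._store[url] = (now, body)
        return body
```

Wrap your downloader once, for example `cache = FeedCache(my_download_function)`, and call `cache.get(...)` wherever you would have fetched the feed directly. Repeat calls inside the lifetime are served from memory.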

So, what's next?

Well, I'm thinking I might work on some ways to make these feeds easier to consume for non-technical users. Perhaps a WordPress plugin, a JavaScript-based feed widget, or a simple one-line means of including the data. Maybe I'll also start building an API, using web services. Whatever I do, I want to ensure that Skepticator does something good for the skeptical community, even if it's just the core mission of providing easier access to skeptical writing.

Oh, and I'm being interviewed by the "Skeptically Speaking" podcast on Monday morning about the Skepticator. Other podcasts will possibly follow (Hint hint, Mr Saunders).

And before I sign off, follow The Skepticator on Twitter for updates and join the fan page on Facebook.

 

* Eventually, the SQL database will be removed and all the data migrated in toto to SharePoint. Once I have a few free days. This data duplication is not exactly best practice and must, eventually, die.

posted @ Saturday, March 27, 2010 9:25 PM

 
 
 

Comments on this entry:

# re: Improvements to Skepticator

Left by Davo at 3/28/2010 2:31 PM
It would be nice to be able to access it via JSON :) If, as you say, you're building an API, it would have lower overhead than XML.

# re: Improvements to Skepticator

Left by Jason at 3/28/2010 2:36 PM
JSON could be doable. Might need to study the standard(s) before I dive in too deeply. At least with Web Services I know the standards, and I can do SOAP or Simple XML

# re: Improvements to Skepticator

Left by David peabody at 4/19/2010 8:32 AM
I just wanted to check how you get your blog added to the main feed that everyone can see. Rather than just appearing in the searches. Thanks for your time, You can get back to me at the blog or my email. Keep up with your work on your blog and skepticator. They are both excellent.
Comments have been closed on this topic.