Getting RSS from a site that doesn't offer RSS using Dapper and Yahoo Pipes

I was asked if it was possible to get RSS from a site that doesn't offer RSS.

One site whose content I was interested in was "Community & Networks Connection" - it aggregates lots of "community and collaboration software" news.

Although the site does offer RSS feeds, the news in the feed looks like this below: all of the articles are chunked into daily digests, which forces you to click through to the site and means no individual headline ever catches your eye.


Of course, it would be possible to screen-scrape the data from the site and republish it as RSS, maybe using a scripting language or the excellent ScraperWiki tool, but I really wanted something that anyone could use... in seconds.
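For comparison, the scripting route might look something like this minimal Python sketch. It assumes, purely for illustration, that each headline on the page is a link inside an `<h2>` tag; the real page's markup may well differ, and the names `HeadlineParser` and `html_to_rss` are made up for this example. Fetching the page itself is left out.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class HeadlineParser(HTMLParser):
    """Collects (title, link) pairs from <a> tags found inside <h2> tags.

    The <h2>/<a> layout is an assumption, not the real site's markup.
    """

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_h2 = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
        elif tag == "a" and self._in_h2:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while we are inside a headline link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append(("".join(self._text).strip(), self._href))
            self._href = None
        elif tag == "h2":
            self._in_h2 = False


def html_to_rss(html, feed_title, feed_link):
    """Turn the scraped headlines into a bare-bones RSS 2.0 document."""
    parser = HeadlineParser()
    parser.feed(html)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Scraped headlines"
    for title, link in parser.items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")
```

Even a sketch this small needs somewhere to run on a schedule and someone to fix it when the page layout changes, which is exactly the overhead I wanted to avoid.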


Dapper To The Rescue

I began by visiting Dapper, a tool that lets you point and click to select which bits of a page you want to scrape. My first step was to click on the images of the news articles at the top.



After a little fiddling, you can choose whether you want that data as RSS, as CSV or even as a Google Map. (It really does take some fiddling and pruning to work out what to do here, but it does work once you've got your head around it. Dapper is an astonishingly wonderful tool; I've never seen anything that does what it does with such elegance.)

I could have added my new RSS feed to my RSS reader at this point, but I actually made another Dapp to get the articles lower down the page. That left me with two RSS feeds, which I didn't really want.

One of the "dapps" I created is here:
http://open.dapper.net/dapp-howto-use.php?dappName=CommunitiesandNetworkConnectionDapperVersion2



Yahoo Pipes To The Rescue

Yahoo Pipes is a wonderful visual tool for "piping" together different information sources and republishing the result. The pipe I created (shown below) takes the two RSS feeds from Dapper (at the top), joins them together (Union), strips out any duplicates (Unique) and lastly filters out any junk posts.



The RSS feed that Yahoo Pipes creates is here:
http://pipes.yahoo.com/pipes/pipe.run?_id=10c40fa02b113c58042af74deead0c1a&_render=rss
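If you prefer reading code to reading a diagram, the three steps of the pipe (Union, Unique, Filter) can be sketched roughly like this in Python. This is not how Yahoo Pipes works internally, just an illustration of the logic. It assumes each feed item is a dict with `title` and `link` keys, that duplicates are spotted by link, and that "junk" means a title containing a blocked word; the function names and the blocked-word list are made up for the example.

```python
def union(*feeds):
    """Concatenate several lists of feed items, like the Union module."""
    merged = []
    for feed in feeds:
        merged.extend(feed)
    return merged


def unique(items, key="link"):
    """Keep the first item seen for each value of `key`, like Unique."""
    seen = set()
    result = []
    for item in items:
        value = item.get(key)
        if value not in seen:
            seen.add(value)
            result.append(item)
    return result


def filter_junk(items, blocked_words):
    """Drop items whose title contains any blocked word (case-insensitive)."""
    blocked = [word.lower() for word in blocked_words]
    return [
        item for item in items
        if not any(word in item.get("title", "").lower() for word in blocked)
    ]


# Hypothetical items standing in for the two Dapper feeds.
top_of_page = [
    {"title": "New wiki release", "link": "http://example.com/wiki"},
    {"title": "Sponsored: buy now", "link": "http://example.com/ad"},
]
lower_down = [
    {"title": "New wiki release", "link": "http://example.com/wiki"},
    {"title": "Forum software roundup", "link": "http://example.com/forums"},
]

combined = filter_junk(unique(union(top_of_page, lower_down)), ["sponsored"])
```

The order matters a little: removing duplicates before filtering means the filter only has to look at each article once.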

And it looks a bit like this:


After a few minutes of configuration with point-and-click tools, I can now keep in touch with the news from the site from my news reader.

Comments

  1. It probably takes you away from 'something anyone can use', but Yahoo Pipes also lets you create a feed by fetching data using XPath.

    Thanks for reminding me about Dapper

  2. XPath is a bit of a brain ache... I've only used it by copying and pasting other people's examples. Thanks for that though.

    Also, there's no way to, er, "pipe" input variables to Yahoo Pipes, is there? That'd be handy... a sort of http://pipes.yahoo.com/pipe/f6gfge7geu6sdsfd?day=Tuesday

  3. Hi Tom & Martin. Have you had any problems with Dapper misbehaving? I started using it a while ago, but got frustrated with Dapps failing regularly. Tom, re Input variables in Yahoo Pipes, I'm not sure it's exactly what you're after, but the "User inputs" allow you to publish the pipe and then people can enter their own search into it too. Here's an example: http://pipes.yahoo.com/pipes/pipe.info?_id=qLeMq8782xG2oyVwCB2yXQ

  4. I hadn't used Dapper for a LLLOOOOONNNNG time, and re-discovered it last week. I must admit, it did use to have a high quirk factor, but this time seemed to work fine. I haven't tested it though.

    For simple jobs I think it's worth persevering with, especially if the alternative is .... "So, now let me teach you regular expressions or XPath, and then we can take a look at cron" :-)

    ps. I didn't mean user inputs, I meant URL based inputs... thanks though.

  5. I'll have to give it another go. I did like what it could do when it worked. That and Yahoo pipes suit me better than other programming methods.

    I see what you mean about the url inputs - yes, that would be very useful.

