With 'Dapper', you can treat the web like a database

What is Dapper? Well, according to its creators, Dapper is...

... a service that allows you to extract and use information from any website on the Internet. For those familiar with web services, you can think of Dapper as an API maker. For the rest of you, Dapper allows you to build web applications and mashups using data from any website without any programming.

This means that you can treat the web like a database - well, almost. You could also call it a glorified 'screen scraper' with an Ajax front end and a service to back it up.

A long time ago, I was inspired by an article with a similar theme. The title of the article was '.NET Web Data ToolKit'. You can find it here.

If you compare the two, Dapper publishes the data in standard, easily consumable formats like XML, RSS and JSON. The 'Web Data ToolKit' (WDT) is more programmatic, and its results are returned as ADO.NET data sets. The author, Tony Loton, describes WDT as...

...a toolkit that allows you to SELECT nuggets of information FROM live web pages as though selecting from tables in a relational database, using industry-standard SQL. Imagine if you could capture the results into an ADO.NET DataTable, allowing you to treat web data like any other data source.
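To make the "SELECT from a web page" idea concrete, here is a rough sketch in Python. It is not WDT's API (WDT is a .NET toolkit); it only illustrates the concept using the standard library. The sample page, the `quotes` table name and the `load_table` helper are all made up for illustration, and a real page would need far more forgiving parsing.

```python
import sqlite3
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def load_table(html, db, table_name):
    """Hypothetical helper: first row becomes column names, the rest become rows."""
    parser = TableRows()
    parser.feed(html)
    header, *body = parser.rows
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    db.execute(f'CREATE TABLE "{table_name}" ({cols})')
    db.executemany(f'INSERT INTO "{table_name}" VALUES ({marks})', body)


# Made-up page content standing in for a live web page.
SAMPLE_PAGE = """
<table>
<tr><th>symbol</th><th>price</th></tr>
<tr><td>AAA</td><td>10.5</td></tr>
<tr><td>BBB</td><td>22.0</td></tr>
</table>
"""

db = sqlite3.connect(":memory:")
load_table(SAMPLE_PAGE, db, "quotes")
result = db.execute(
    "SELECT symbol FROM quotes WHERE CAST(price AS REAL) > 20"
).fetchall()
print(result)
```

Scraped cells arrive as text, which is why the query casts `price` before comparing it. Any real toolkit of this kind has to decide where that type conversion happens.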

Though this is a powerful idea, it has some drawbacks. To start with, we are dealing with someone else's data - it is, after all, screen scraping! If you are OK with that and want to take it to the next level - say, take a set of web sites and mash them up to create totally new web services - then you stumble upon the next drawback: the lack of a workflow.

In most non-trivial use cases, the website exposes its data only after taking the user through a sequence of steps. An example sequence could be a login, followed by a click on a link (say, my account name), which then takes me to my data (say, my account details). So to get to the 'real' data, the user has to follow multiple steps, and some of those steps could involve an HTTP 'POST' and not just a simple 'GET'. The service should therefore support executing a workflow of steps. At each step, we should be able to control the actual HTTP data being sent to the server, and also be able to specify an alternate control path if things go wrong. So support for a workflow engine is critical.
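A minimal sketch of what such a workflow could look like, in Python. Neither Dapper nor WDT is known to offer this; the `Step` fields, the `run_workflow` function and the `example.test` URLs are assumptions invented for illustration. Each step names its HTTP method and form data, and may name an alternate step to jump to on failure. The transport is passed in as a function, so the demo below can use a fake one instead of a real site.

```python
import http.cookiejar
import urllib.error
import urllib.parse
import urllib.request
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Step:
    name: str
    method: str                        # "GET" or "POST"
    url: str
    form: Dict[str, str] = field(default_factory=dict)
    on_failure: Optional[str] = None   # step to jump to if this one fails


def make_http_send():
    """Real transport: one opener with a cookie jar, so a login carries over."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))

    def send(method, url, form):
        data = urllib.parse.urlencode(form).encode() if method == "POST" else None
        req = urllib.request.Request(url, data=data, method=method)
        try:
            with opener.open(req) as resp:
                return resp.status, resp.read().decode("utf-8", "replace")
        except urllib.error.HTTPError as err:
            return err.code, ""

    return send


def run_workflow(steps: List[Step], send, max_hops: int = 20):
    """Run steps in order; on a non-2xx status follow on_failure, else give up."""
    by_name = {s.name: s for s in steps}
    order = [s.name for s in steps]
    trace, current = [], order[0]
    for _ in range(max_hops):          # guard against endless retry loops
        step = by_name[current]
        status, body = send(step.method, step.url, step.form)
        trace.append((step.name, status))
        if 200 <= status < 300:
            nxt = order.index(current) + 1
            if nxt == len(order):
                return trace, body     # last step succeeded: this is the data
            current = order[nxt]
        elif step.on_failure:
            current = step.on_failure
        else:
            raise RuntimeError(f"step {step.name!r} failed with {status}")
    raise RuntimeError("workflow did not finish within max_hops")


def make_fake_send():
    """Fake transport for the demo: the account page fails once with 401."""
    calls = {"account": 0}

    def send(method, url, form):
        if url.endswith("/account"):
            calls["account"] += 1
            if calls["account"] == 1:
                return 401, ""
        return 200, f"{method} {url}"

    return send


steps = [
    Step("login", "POST", "https://example.test/login",
         form={"user": "me", "password": "secret"}),
    Step("account", "GET", "https://example.test/account", on_failure="login"),
    Step("details", "GET", "https://example.test/details"),
]
trace, body = run_workflow(steps, make_fake_send())
print(trace)
print(body)
```

Here the failed account step falls back to the login step and the run then continues from there. Swapping `make_fake_send()` for `make_http_send()` would send the same steps over real HTTP, with the cookie jar keeping the session between steps.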

The next problem with Dapper is privacy. I cannot use Dapper to create a web service that gets to my personal data (say, my bank details), because I would have to share my login credentials with the Dapper service. So what we really need is a toolkit that I can embed in my own application - similar to the Web Data ToolKit.