Web Scraping and Data Extraction Made Easy

As of 2011, there were over 5 million terabytes of data on the internet. That is enough to fill more than 5 million home computers to capacity, and the figure doubles every 5 years.

All this information is accessible to all of us, and most of it is free. Unfortunately, the way this data is presented is not particularly well suited for a business to work with. A Google search will display 10 to 100 results, a YellowPages results page will show 30 results, and an eBay results page will show 25 to 200 results. The data is presented in a way that makes it easy for an average user to navigate and look around, but not easy for a business or organization to store, analyze and process.

This is where web scraping comes in handy. I spent months googling for a solution to my data extraction needs. I found a few companies offering web scraping services, but at ridiculously high rates. I also checked some freelancing sites and found professionals dedicated to this. Prices were better, but still a little high for something that a computer program could do. I'm more of a do-it-yourself kind of person anyway. So how about some web scraping software?

Although there are several out there, Helium Scraper is perhaps the easiest to use, yet most powerful, one I have found. It's relatively new, so you might not have heard of it. When I first tried it, I was actually quite disappointed by how elementary the main screen looked. But after following the basic tutorial that is included with it, and playing around a little, I managed to set it up to extract data that would have been impossible to extract with any other web scraper I had tried before.

This is how it works, in a nutshell:

First, you create some items called kinds. These are the way you tell Helium Scraper what is what. Basically, you highlight a few elements in a page and say "these are phone numbers" or "these are links" or "these are whatever". Helium Scraper then finds a pattern among the highlighted elements and uses it to recognize everything else on the page that counts as "phone numbers", "links" or "whatever".
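
To make the idea of "finding a pattern" concrete, here is a minimal Python sketch. It is not Helium Scraper's actual algorithm, which the tool does not expose; it only assumes that a pattern can be built from the traits the highlighted examples share. The Element class, infer_kind and select are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A simplified stand-in for a page element (hypothetical model)."""
    tag: str
    classes: frozenset
    text: str


def infer_kind(examples):
    """Build a pattern from the traits that every highlighted example shares."""
    tags = {e.tag for e in examples}
    if len(tags) != 1:
        raise ValueError("examples must share one tag in this simple sketch")
    # Keep only the CSS classes present on all of the examples.
    shared = frozenset.intersection(*(e.classes for e in examples))
    return tags.pop(), shared


def select(kind, elements):
    """Return every element on the page that matches the inferred pattern."""
    tag, shared = kind
    return [e for e in elements if e.tag == tag and shared <= e.classes]


# A made-up page: three phone numbers, one link, one business name.
page = [
    Element("span", frozenset({"phone", "bold"}), "555-0100"),
    Element("span", frozenset({"phone"}), "555-0101"),
    Element("span", frozenset({"phone", "small"}), "555-0102"),
    Element("a", frozenset({"link"}), "Next page"),
    Element("span", frozenset({"name"}), "Acme Plumbing"),
]

# "Highlight" the first two phone numbers, then let the pattern find the rest.
phone_kind = infer_kind(page[:2])
phones = [e.text for e in select(phone_kind, page)]
print(phones)
```

Highlighting only two examples is enough here for the third phone number to be picked up as well, which is the behavior described above.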

Next, you create the actions you want Helium Scraper to perform with the kinds you just created. Here you can automate just about any action you would normally perform in a browser, such as clicking or navigating through links, plus, of course, extracting data. Actions are organized as an intuitive tree. For instance, you would add an "Extract" and a "Navigate" action inside a "Repeat" action to have Helium Scraper repeatedly extract information from a search results page and then navigate to the next page.
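
The Repeat / Extract / Navigate tree can be sketched in a few lines of Python. This is only an illustration of the idea, not how Helium Scraper is implemented: the Browser class is a hypothetical fake that walks over in-memory pages instead of real web pages, and the rule that Repeat stops as soon as a child action fails is an assumption made for this sketch.

```python
class Browser:
    """A fake browser over in-memory result pages (hypothetical, no network)."""

    def __init__(self, pages):
        self.pages = pages
        self.index = 0

    @property
    def current(self):
        return self.pages[self.index]


class Extract:
    """Copy every result on the current page into a shared list."""

    def __init__(self, rows):
        self.rows = rows

    def run(self, browser):
        self.rows.extend(browser.current)
        return True


class Navigate:
    """Move to the next page; report failure when there is none."""

    def run(self, browser):
        if browser.index + 1 >= len(browser.pages):
            return False
        browser.index += 1
        return True


class Repeat:
    """Run the child actions in order, over and over, until one fails."""

    def __init__(self, *children):
        self.children = children

    def run(self, browser):
        while all(child.run(browser) for child in self.children):
            pass
        return True


# Three made-up result pages.
browser = Browser([["result 1", "result 2"], ["result 3"], ["result 4", "result 5"]])
rows = []
Repeat(Extract(rows), Navigate()).run(browser)
print(rows)
```

Because Extract sits before Navigate inside Repeat, the last page is still extracted before the missing "next page" ends the loop.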

Even though Helium Scraper doesn't require any programming skills, you could greatly benefit from some JavaScript knowledge. I'm not a computer programmer myself, but with a little googling, I've managed to set it up to perform more complicated tasks, such as automatically filling in and submitting forms, simulating user selections in combo boxes, and processing the results before they are extracted to the database.
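
As an illustration of "processing the results before they are extracted to the database", here is a small sketch of the general idea. Inside Helium Scraper this kind of step would be written in JavaScript; the version below is plain Python using the standard sqlite3 module, and it does not use Helium Scraper's scripting interface. The 10-digit phone format, the normalize_phone helper and the phones table are all assumptions made for the example.

```python
import re
import sqlite3


def normalize_phone(raw):
    """Keep digits only, then format 10-digit numbers as XXX-XXX-XXXX."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        return None  # not a phone number under this sketch's assumption
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


# Made-up raw values, as they might come off a page.
scraped = ["(555) 010-0199", "555.010.0142", "call us!"]

# Clean each value first, and drop anything that is not a phone number.
cleaned = [p for p in map(normalize_phone, scraped) if p is not None]

# Store only the cleaned values in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phones (number TEXT)")
conn.executemany("INSERT INTO phones VALUES (?)", [(p,) for p in cleaned])
stored = [row[0] for row in conn.execute("SELECT number FROM phones ORDER BY rowid")]
print(stored)
```

Cleaning the values before they are stored means the database only ever holds one consistent format, which makes later analysis much simpler.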

If you want to give Helium Scraper a try, just go to http://www.heliumscraper.com and download the free trial. I'm sure you won't be disappointed.

