Harvest the web
http://www.outwit.com/
If you desperately need...
- a list of all the design studios in London,
- your dream job ads in an Excel file every day,
- hundreds of photos of your favorite movie star,
- all available PDF files about Semantic P2P...
And if you are tired of scrolling down Web
pages, scanning text and compulsively
clicking, cutting and pasting for hours:
Here is the first beta release of
OutWit Hub, your Web Collection Engine.
Our mission is to provide the Web community with a simple Web automation environment (finally) allowing everyone to harvest data elements, documents or media from virtually any public (and legal) source of content. The technology is open and will eventually provide an API and wizards to build simple and efficient tools.
The OutWit Platform is composed of a kernel that contains a large library of data recognition and extraction functions, around which an unlimited number of original extensions —called outfits— can be developed, using the kernel's features for specific applications.
An outfit is a small extension with its own user interface, features, scripts and directory of Web sources. Some outfits will be developed by us, but most, hopefully, will be developed by our users who have a specific need or passion.
Our first outfit, OutWit Hub, is a multi-purpose development & showcase application, in which we have gathered the largest possible number of features, hoping to cover a large spectrum of needs. The Hub will keep evolving and should become a very useful tool for advanced users. However, the real objective of this technology is to build simple, straight-to-the-point applications:
1. a simple tool to collect images
2. a simple tool to find a job
3. a simple tool to follow the news in handball
4. a simple tool to ... (this is why we will never succeed without you.)
At this point, the applications are countless; only you can steer us in the right direction(s).
The OutWit team, May 2008
Sometimes I think it's not worth it to 'harvest' webpages; keeping a webpage versioned, revised and updated is more and more popular. ;D
Sometimes, pages go missing from the internet, though - which sucks. For some kinds of information (research, source code snippets, technical info, reverse-engineering related matters, ...) keeping a local copy can be very nice.
Just save URL to del.icio.us and check it out later.
Is the Web evolving from user-generated to user-refreshed? -electronixtar (June 11, 2008, 10:47 PM)
http://www.outwit.com/products/firstRun.php?version=0.7.0.183&id={5fb1186a-3398-4c47-b579-0f2eee222ad1} -cmpm (June 13, 2008, 07:16 PM)
So, what is OutWit, in a word? OutWit is a Web collection engine for everyone. It runs on your Windows, MacOS or Linux machine and allows you to browse through and easily grab information, images, contacts or files from the Internet, in a few clicks. The question, when looking for anything on the Internet, is two-fold: find the pertinent data and make it usable for your purposes. Both processes can prove extremely time-consuming and both can be vastly improved using OutWit Hub. Originally conceived for researchers and data managers, the program is bringing Web scraping tools to everyone for both business and personal use. Just browse the Web for pages that include the information you are looking for; OutWit will scan the pages to recognize the data structure and format it into tables, allowing you to rate it and easily export it to files, spreadsheets or databases for later use.
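For anyone curious what "scan a page and format it into a table" can look like in its simplest form, here is a rough Python sketch using only the standard library. It is not how OutWit works internally (nothing above describes that); the `LinkCollector` class, the `links_to_csv` function and the sample page are all made up for illustration. It just pulls every link out of an HTML string and writes the results as CSV rows that a spreadsheet can open.

```python
import csv
import io
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished (href, text) pairs
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open anchor.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.rows.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_to_csv(html):
    """Return CSV text with a header row and one row per link in the HTML."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["url", "text"])
    writer.writerows(parser.rows)
    return out.getvalue()


# Made-up sample page; a real page would be downloaded or saved locally first.
SAMPLE_PAGE = """
<ul>
  <li><a href="http://example.com/report.pdf">Annual report</a></li>
  <li><a href="http://example.com/photo.jpg">Photo</a></li>
</ul>
"""

if __name__ == "__main__":
    print(links_to_csv(SAMPLE_PAGE))
```

A real collection tool has to cope with much messier pages (tables, repeated layouts, images, contact details), so treat this only as the smallest version of the browse, extract, export loop described above.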