UrlSnooper / Cannot find network adapter
« on: November 26, 2015, 07:49 AM »
I installed the latest update and tried to run UrlSnooper 2, and it says it cannot find a network adapter.
So... what are you doing, anyway?-ayryq (June 15, 2015, 07:26 AM)
Go to http://www.pedersonfuneralhome.com/obituaries/ObitSearchList/1 (and increment the final number)-ayryq (June 13, 2015, 07:18 PM)
The last page number is also stored within the source:
Code: HTML
<input type="hidden" id="totPages" value="83" />
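For what it's worth, a userscript could read that value directly by its id; a minimal, untested sketch (it assumes the hidden input is present on every search page):
Code: Javascript
// Read the last page number from the hidden "totPages" input shown above
var totPagesInput = document.getElementById("totPages");
var maxPage = totPagesInput ? parseInt(totPagesInput.value, 10) : 1;  // fall back to 1 if it's missing
console.log("Last page: " + maxPage);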
This could probably be done with a GreaseMonkey script that cycles through each page grabbing the links and, at the end, displays a page with all of them, which could then be saved using Save Page As ...
Just messing around, this is a heavily modified site scraper from http://blog.nparashuram.com/2009/08/screen-scraping-with-javascript-firebug.html
Currently it will start at the URL @ayryq mentioned above and load every page until the last one (requires GreaseMonkey, naturally), at a rate of about one every 3 seconds. It also grabs all the URLs from each page, but as I haven't worked out how to store them yet, they get overwritten at each page load.
Code: Javascript
// ==UserScript==
// @name       Get The Deadites
// @namespace  http://blog.nparashuram.com/2009/08/screen-scraping-with-javascript-firebug.html
// @include    http://www.pedersonfuneralhome.com/obituaries/ObitSearchList/*
// ==/UserScript==

/*
 * Much modified from the original script for a specific site
 */

function loadNextPage(){
    var url = "http://www.pedersonfuneralhome.com/obituaries/ObitSearchList/";
    var num = parseInt(document.location.href.substring(document.location.href.lastIndexOf("/") + 1));
    if (isNaN(num)) {
        num = 1;
    }
    // If the counter exceeds the max number of pages we need to stop loading pages otherwise we go energizer bunny.
    if (num < maxPage) {
        document.location = url + (num + 1);
//  } else {
        // Reached last page, need to read LocalStore using JSON.parse
        // Create document with URLs retrieved from LocalStore and open in browser, user can then use Save Page As ...
    }
}

function start(newlyDeads){
    // Need to get previous entries from LocalStore (if exists)
//  var oldDeads = localStorage.getItem('obits');
//  if (typeof oldDeads === undefined) {
        // No previous data so just store the new stuff
//      localStorage.setItem('obits', JSON.stringify(newlyDeads));
//  } else {
        // Convert to object using JSON.parse
//      var tmpDeads = JSON.parse('oldDeads');
        // Merge oldDeads and newlyDeads - new merged object stored in first object argument passed
//      m(tmpDeads, newlyDeads);
        // Save back to LocalStore using JSON.stringify
//      localStorage.setItem('obits', JSON.stringify(tmpDeads));
//  }

    /*
     * Dont run a loop, better to run a timeout sort of a function.
     * Will not put load on the server
     */
    var timerHandler = window.setInterval(function(){
        window.clearInterval(timerHandler);
        window.setTimeout(loadNextPage, 2000);
    }, 1000); // this is the time taken for your next page to load
}

// https://gist.github.com/3rd-Eden/988478
// function m(a,b,c){for(c in b)b.hasOwnProperty(c)&&((typeof a[c])[0]=='o'?m(a[c],b[c]):a[c]=b[c])}

var maxPage;
var records = document.getElementsByTagName("A");     // Grab all Anchors within page
//delete records[12];                                 // Need to delete "Next" anchor from object (property 13)
var inputs = document.getElementsByTagName("INPUT");  // Grab all the INPUT elements
maxPage = inputs[2].value;                            // Maximum pages is the value of third INPUT tag
start(records);
The comments within the code are what I think should happen, but I haven't tested it yet (mainly because I can't code in Javascript ... but I'm perfectly capable of hitting it with a sledgehammer until it does what I want ... or I give up :P ).
Someone who actually does know Javascript could probably fill in the big blank areas in record time.-4wd (June 13, 2015, 09:12 PM)
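For anyone who wants to have a go at those blank areas, here is a rough, untested sketch of what the LocalStore part could look like. It keeps the 'obits' key from the comments above; the helper names saveLinks/showAllLinks and the idea of rewriting the final page's body are just assumptions, not tested code:
Code: Javascript
// Untested sketch: store each page's links, then dump them all on the last page.
// The key name 'obits' comes from the comments in the script above; everything else is assumed.

function saveLinks(newAnchors) {
    // Convert the live anchor collection into a plain array of hrefs
    var links = [];
    for (var i = 0; i < newAnchors.length; i++) {
        links.push(newAnchors[i].href);
    }
    // Merge with whatever earlier pages already stored
    var stored = localStorage.getItem('obits');
    var all = stored ? JSON.parse(stored) : [];
    localStorage.setItem('obits', JSON.stringify(all.concat(links)));
}

function showAllLinks() {
    // Reached the last page: rebuild the document from everything collected so far
    var all = JSON.parse(localStorage.getItem('obits') || '[]');
    var html = all.map(function (url) {
        return '<a href="' + url + '">' + url + '</a><br>';
    }).join('\n');
    document.body.innerHTML = html;   // user can now use Save Page As ...
    localStorage.removeItem('obits'); // clean up for the next run
}
saveLinks(records) would replace the commented-out block at the top of start(), and showAllLinks() would go in the commented-out else branch of loadNextPage(). Again, untested, so expect some sledgehammering.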
Well, there are a few programs designed to "spider" a page and download all linked pages, images, etc.
One well-known one is "Teleport Pro", but there are others.-mouser (June 13, 2015, 05:52 PM)