Author Topic: IDEA: Copy text of web page while following links  (Read 4034 times)
kab122
Supporting Member
**
Posts: 5


« on: June 05, 2006, 06:46:20 AM »

When reading an article on the web, you often have to click
Next >
Next >
Next > Next > Next > Next > Next >

Bleh.

I would love a macro which would copy all the text on a page, append it to a text file, and then click "next" for me. It would repeat copying/clicking until "next" is not found. Don't need the images, just the text.

Sometimes the link is "next page" sometimes "continue to page 2" etc. Perhaps the macro could prompt for the target text, or use the currently highlighted text.
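
For anyone who would rather script this than drive the browser, here is a rough Python sketch of the same loop: find the link whose visible text contains the target text (searching from the bottom of the page), follow it, and stop when no such link is found. The names `find_next_link` and `follow_next_links`, the `fetch` callable, and the `max_pages` guard are made up for illustration, and the sketch assumes the "next" control is an ordinary `<a href>` link rather than a button or script.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects (href, visible text) pairs for every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs, in page order
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so "Next   >" and "Next >" compare equal.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def find_next_link(html, target_text):
    """Return the href of the last link whose text contains target_text, or None."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    wanted = target_text.lower()
    for href, text in reversed(collector.links):  # search from the bottom up
        if wanted in text.lower():
            return href
    return None


def follow_next_links(start_url, fetch, target_text, max_pages=50):
    """Yield (url, html) per page, following the target link until it is missing.

    fetch is any callable that takes a URL and returns the page's HTML as a string.
    max_pages and the seen-set are safety guards against loops (an assumption,
    not something the original request asks for).
    """
    url, seen = start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        href = find_next_link(html, target_text)
        url = urljoin(url, href) if href else None
```

A real `fetch` could be as simple as `lambda u: urllib.request.urlopen(u).read().decode("utf-8", "replace")`; passing it in as a parameter keeps the link-following logic testable without touching the network.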

Thanks!
Logged
kab122
Supporting Member
**
Posts: 5


« Reply #1 on: June 07, 2006, 06:31:24 AM »

No takers, eh?

With much mucking about, I wrote one myself. Any review comments are welcome.

; CONFIG: CHOOSE YOUR HOTKEY ( # = Winkey )
f_Hotkey = #q
f_TargetText = Next
f_Pause = Y
f_outfile = harvest.txt

; END OF CONFIGURATION SECTION

; -----------------------------------
; Documentation:
; Selects all text on web page,
; copies it to file,
; searches web page for "next" button and clicks it,
; repeat.

; -----------------------------------
; Entry script:

#SingleInstance  ; Needed since the hotkey is dynamically created.

Hotkey, %f_Hotkey%, f_Doit

MsgBox, Ready to start, hotkey is "%f_Hotkey%"

return

; -----------------------------------
; Hotkey script:

f_Doit:

WinGetActiveStats, Title, Width, Height, X, Y

; -- Header

InputBox, f_TargetText, Search Key, Enter text linking to next page (searching from bottom of page).
if ErrorLevel <> 0
    return

f_Pause = Y
MsgBox, 4, , Pause per screen?
IfMsgBox, No
    f_Pause = N
; Otherwise, the user picked yes.

FileDelete %f_outfile%

FileAppend Harvest of window:  , %f_outfile%
FileAppend %Title% `n, %f_outfile%

; -- Body
Loop  ; Since no number is specified with it, this is an infinite loop unless "break" or "return" is encountered inside.
{
    ; select all
    Send, {CTRLDOWN}a{CTRLUP}
    Sleep, 50

    ; copy
    clipboard =  ; Empty the clipboard first so ClipWait can tell when the copy has arrived.
    Send, {CTRLDOWN}c{CTRLUP}
    ClipWait, 2  ; Wait up to 2 seconds for the text rather than relying on a fixed 50 ms sleep.

    ; save
    FileAppend `n, %f_outfile%
    FileAppend ----- `n, %f_outfile%
    FileAppend %clipboard%, %f_outfile%

    ; find
    Send, {CTRLDOWN}f{CTRLUP}
    WinWait, Find,
    IfWinNotActive, Find, , WinActivate, Find,
    WinWaitActive, Find,

    ; WinWait, Microsoft Internet Explorer,

    ;Target string:
    Send, %f_TargetText%
    Sleep, 50

    Send, {ALTDOWN}u{ALTUP}{ENTER}
    Sleep, 50

    Send, {ESC}
    Sleep, 50

    ; WinWait, autohotkey - Google Search - Microsoft Internet Explorer,
    ; IfWinNotActive, autohotkey - Google Search - Microsoft Internet Explorer, , WinActivate, autohotkey - Google Search - Microsoft Internet Explorer,
    ; WinWaitActive, autohotkey - Google Search - Microsoft Internet Explorer,

    Send, {TAB}{ENTER}
    Sleep, 250

    ; Stop if search displays not found dialog
    IfWinExist, Microsoft Internet Explorer
    {
        WinActivate
        break
    }

    If f_Pause = N
    {
        WinWaitActive, %Title%,
        Sleep, 1500
    }
    else
    {
        ; Let them see bottom of page
        Send, {CTRLDOWN}{END}{CTRLUP}
        MsgBox, 4, , Would you like to continue?
        IfMsgBox, No
            break
    }
}
; end loop

FileAppend `n, %f_outfile%
FileAppend ----- `n, %f_outfile%
FileAppend EOF `n, %f_outfile%

MsgBox Done.
;return
ExitApp
Logged
mouser
First Author
Administrator
*****
Posts: 33,590



« Reply #2 on: June 07, 2006, 09:50:42 AM »

wow that's pretty cool - i'm going to give it a try.
maybe a nice function to add (if possible) is to print each page?
Logged
hitmark
Participant
*
Posts: 6


« Reply #3 on: June 15, 2006, 08:44:21 PM »

does it also copy layout tags? or will it just convert the whole page into a "big" txt file?

what about an article with many pictures? or maybe frames or similar that add stuff to different sides of the main text?

and i take it that it will only work with IE...
Logged
jity2
Charter Member
***
Posts: 69

« Reply #4 on: July 30, 2006, 05:50:26 PM »

Hello,
In case you haven't tried it: first try printing the text using the print link inside the webpage article (not the EDIT/Print option of your browser). It often gathers all the text at once! This works for many articles found in online journals.
Hope this helps :)
Jity
Logged
kab122
Supporting Member
**
Posts: 5


« Reply #5 on: August 03, 2006, 02:08:28 PM »

jity2

Ya, I know exactly what you are describing. Wish it were that easy.
I'm thinking they must get $ for the ads on every page I visit :P

Besides, I just needed the text; don't want to print all the images/ads.

I did this to print off recaps of Lost and House from:
http://www.televisionwithoutpity.com
The writers are absolutely great, lots of yuks for me during my daily commute.
Logged
jity2
Charter Member
***
Posts: 69

« Reply #6 on: August 03, 2006, 03:27:28 PM »

Hi Kab,

Now with the website, I see! ;-)
I have no direct and easy answer! Maybe someone can do a script with Perl (just a guess!)?
Maybe it could also gather all the page links in the HTML at once and keep only the text?

Example:
page 1
http://www.televisionwith.../articles/content/a12215/
page 2
http://www.televisionwith...ntent/a12215/index-1.html
Page 3
http://www.televisionwith...ntent/a12215/index-2.html
...
page 13
http://www.televisionwith...tent/a12215/index-13.html
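
A rough sketch of that idea in Python rather than Perl, for illustration only. It assumes the pattern suggested by the example above (the article URL itself, then `index-1.html`, `index-2.html`, and so on in the same folder) and it uses `example.com` as a placeholder host because the real URLs are truncated here. The helper names `page_urls`, `html_to_text`, and `harvest` are invented for this sketch, and the `-----` separator simply mirrors the one used in the AutoHotkey script earlier in the thread.

```python
from html.parser import HTMLParser


def page_urls(base_url, last_index):
    """Return the article URL followed by index-1.html .. index-<last_index>.html."""
    if not base_url.endswith("/"):
        base_url += "/"
    return [base_url] + [f"{base_url}index-{n}.html" for n in range(1, last_index + 1)]


class TextOnly(HTMLParser):
    """Keeps the visible text of a page and drops tags, scripts, and styles."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []       # non-empty text fragments, in page order
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text fragments of an HTML string, one per line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def harvest(urls, fetch):
    """Fetch each URL with the supplied callable and join the page texts.

    fetch is any callable that takes a URL and returns HTML as a string.
    """
    return "\n-----\n".join(html_to_text(fetch(url)) for url in urls)
```

To run it against a live site, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8", "replace")`. This only strips markup; it makes no attempt to separate the article body from menus or ads, so the output would still need trimming.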



Here is an idea that would work: I would use the shareware "Macro Magic".
(Manually select the number of times the macro will repeat.)
Record the macro (mouse gestures plus clicks): click on the page, press CTRL+A, copy the text, switch to all_text.txt (created and already opened before the macro runs), paste the text, go back to the HTML page, click on the page, scroll down, and click "next" with the mouse.

This should work. ;-)
See ya ;-)
Jity


Logged