Topic: IDEA: Copy text of web page while following links

kab122

  • Supporting Member
  • Joined in 2006
  • Posts: 5
IDEA: Copy text of web page while following links
« on: June 05, 2006, 06:46 AM »
When reading an article on the web, you often have to click
Next >
Next >
Next > Next > Next > Next > Next >

Bleh.

I would love a macro which would copy all the text on a page, append it to a text file, and then click "next" for me. It would repeat copying/clicking until "next" is not found. Don't need the images, just the text.

Sometimes the link is "next page", sometimes "continue to page 2", etc. Perhaps the macro could prompt for the target text, or use the currently highlighted text.

Thanks!
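The request above boils down to a loop: get a page, keep only its text, append it to a file, look for a link whose text contains the target ("Next", "next page", "continue to page 2", ...), follow it, and stop when there is none. For comparison with a GUI macro, here is a minimal Python sketch of that loop that reads the page HTML directly instead of driving the browser. It assumes the pages can be downloaded with plain `urllib` (no login, cookies or JavaScript needed) and are UTF-8. `PageParser`, `harvest`, `fetch_url`, `get_page` and `max_pages` are hypothetical names, and the visited-URL check and `max_pages` cap are added safeguards, not part of the original request.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Splits one HTML page into visible text and (href, link text) pairs."""

    SKIP = {"script", "style"}  # contents of these tags are not page text
    BLOCK = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so "&gt;" arrives as ">"
        self.text, self.links = [], []
        self._skip, self._href, self._link_text = 0, None, []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip += 1
        elif tag in self.BLOCK:
            self.text.append("\n")  # keep paragraphs on separate lines
        elif tag == "a":
            self._href, self._link_text = dict(attrs).get("href"), []

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip = max(0, self._skip - 1)
        elif tag in self.BLOCK:
            self.text.append("\n")
        elif tag == "a" and self._href is not None:
            label = " ".join("".join(self._link_text).split())
            self.links.append((self._href, label))
            self._href = None

    def handle_data(self, data):
        if self._skip:
            return
        self.text.append(data)
        if self._href is not None:
            self._link_text.append(data)


def fetch_url(url):
    """Download a page; assumes UTF-8 and no login or JavaScript needed."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def harvest(start_url, target_text="Next", get_page=fetch_url, max_pages=50):
    """Follow 'next' links from start_url and return all page text as one string."""
    out = [f"Harvest of: {start_url}\n"]
    url, seen = start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = PageParser()
        page.feed(get_page(url))
        page.close()
        lines = (" ".join(line.split()) for line in "".join(page.text).splitlines())
        out.append("\n-----\n" + "\n".join(line for line in lines if line) + "\n")
        # Search from the bottom of the page up, as the script in Reply #1 does.
        wanted = target_text.lower()
        url = next((urljoin(url, href) for href, label in reversed(page.links)
                    if wanted in label.lower()), None)
    out.append("\n-----\nEOF\n")
    return "".join(out)
```

For example: `with open("harvest.txt", "w", encoding="utf-8") as f: f.write(harvest(start_url, "next page"))`, where `start_url` is the address of the first page. Matching on the link text rather than on a fixed URL pattern is what lets one script cope with "Next >", "next page" and "continue to page 2" alike.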

kab122

  • Supporting Member
  • Joined in 2006
  • Posts: 5
Re: IDEA: Copy text of web page while following links
« Reply #1 on: June 07, 2006, 06:31 AM »
No takers, eh?

With much mucking about, I wrote one myself. Any review comments are welcome.

; CONFIG: CHOOSE YOUR HOTKEY ( # = Winkey )
f_Hotkey = #q
f_TargetText = Next
f_Pause = Y
f_outfile = harvest.txt

; END OF CONFIGURATION SECTION

; -----------------------------------
; Documentation:
; Selects all text on web page,
; copies it to file,
; searches web page for "next" button and clicks it,
; repeat.

; -----------------------------------
; Entry script:

#SingleInstance  ; Needed since the hotkey is dynamically created.

Hotkey, %f_Hotkey%, f_Doit

MsgBox, Ready to start, hotkey is "%f_Hotkey%"

return

; -----------------------------------
; Hotkey script:

f_Doit:

WinGetActiveStats, Title, Width, Height, X, Y

; -- Header

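; Offer the configured (or last used) target text as the default answer.
InputBox, f_TargetText, Search Key, Enter text linking to next page (searching from bottom of page)., , , , , , , , %f_TargetText%
if ErrorLevel <> 0
return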

f_Pause = Y
MsgBox, 4, , Pause per screen?
IfMsgBox, No
f_Pause = N
; Otherwise, the user picked yes.

FileDelete %f_outfile%

FileAppend, Harvest of window: %Title%`n, %f_outfile%

; -- Body
Loop  ; Since no number is specified with it, this is an infinite loop unless "break" or "return" is encountered inside.
{

Send, {CTRLDOWN}a{CTRLUP}
Sleep, 50

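; copy: empty the clipboard first and wait (up to 2 s) for the new text,
; so a slow copy doesn't append the previous page's text again
Clipboard =
Send, {CTRLDOWN}c{CTRLUP}
ClipWait, 2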

; save
FileAppend `n, %f_outfile%
FileAppend ----- `n, %f_outfile%
FileAppend %clipboard%, %f_outfile%

; find
Send, {CTRLDOWN}f{CTRLUP}
WinWait, Find,
IfWinNotActive, Find, , WinActivate, Find,
WinWaitActive, Find,

; WinWait, Microsoft Internet Explorer,

; Target string (SendRaw types characters such as + ! ^ # literally)
SendRaw, %f_TargetText%
Sleep, 50

Send, {ALTDOWN}u{ALTUP}{ENTER}
Sleep, 50

Send, {ESC}
Sleep, 50

; WinWait, autohotkey - Google Search - Microsoft Internet Explorer,
; IfWinNotActive, autohotkey - Google Search - Microsoft Internet Explorer, , WinActivate, autohotkey - Google Search - Microsoft Internet Explorer,
; WinWaitActive, autohotkey - Google Search - Microsoft Internet Explorer,

Send, {TAB}{ENTER}
Sleep, 250

; Stop if search displays not found dialog
IfWinExist, Microsoft Internet Explorer
{
WinActivate
break
}

If f_Pause = N
{
WinWaitActive, %Title%,
Sleep, 1500
}
else
{
; Let them see bottom of page
Send, {CTRLDOWN}{END}{CTRLUP}
MsgBox, 4, , Would you like to continue?
IfMsgBox, No
break
}
}
; end loop

FileAppend `n, %f_outfile%
FileAppend ----- `n, %f_outfile%
FileAppend EOF `n, %f_outfile%

MsgBox Done.
;return
ExitApp

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • Posts: 40,914
Re: IDEA: Copy text of web page while following links
« Reply #2 on: June 07, 2006, 09:50 AM »
Wow, that's pretty cool. I'm going to give it a try.
Maybe a nice function to add (if possible) would be printing each page?

hitmark

  • Participant
  • Joined in 2006
  • Posts: 6
Re: IDEA: Copy text of web page while following links
« Reply #3 on: June 15, 2006, 08:44 PM »
Does it also copy layout tags, or will it just convert the whole page into one "big" txt file?

What about an article with many pictures, or frames or similar that add stuff to different sides of the main text?

And I take it that it will only work with IE...

jity2

  • Charter Member
  • Joined in 2006
  • Posts: 126
Re: IDEA: Copy text of web page while following links
« Reply #4 on: July 30, 2006, 05:50 PM »
Hello,
In case you haven't tried it: first try printing the text using the print link inside the article itself (not the EDIT/Print option of your browser). It often gathers all the text at once! This works for many articles in online journals.
Hope this helps, :)
Jity

kab122

  • Supporting Member
  • Joined in 2006
  • Posts: 5
Re: IDEA: Copy text of web page while following links
« Reply #5 on: August 03, 2006, 02:08 PM »
jity2

Ya, I know exactly what you are describing. Wish it were that easy.
I'm thinking they must get $ for the ads on every page I visit  :P

Besides, I just needed the text; don't want to print all the images/ads.

I did this to print off recaps of Lost and House from:
http://www.televisionwithoutpity.com
The writers are absolutely great, lots of yuks for me during my daily commute.

jity2

  • Charter Member
  • Joined in 2006
  • Posts: 126
Re: IDEA: Copy text of web page while following links
« Reply #6 on: August 03, 2006, 03:27 PM »
Hi Kab,

Now, with the website, I see! ;-)
I have no direct and easy answer! Maybe someone could write a script in Perl (just a guess!)?
Maybe also gather all the links at once and keep only the text?

Example:
page 1
http://www.televisio...cles/content/a12215/
page 2
http://www.televisio.../a12215/index-1.html
page 3
http://www.televisio.../a12215/index-2.html
...
page 13
http://www.televisio...a12215/index-13.html
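The "gather all links at once" idea can be sketched without a macro: since the article's pages follow a numbered pattern, the URLs can be generated up front and each page reduced to its text. Below is a minimal Python sketch under stated assumptions: the base URL is left to you (the addresses above are truncated), and the `index-(k-1).html` mapping is inferred only from the page 2 and page 3 examples (the page 13 entry as listed does not follow that offset), so check it against the real links. `page_urls`, `TextOnly`, `html_to_text` and `harvest_numbered` are hypothetical names; pages are assumed to be UTF-8 and downloadable without a login.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def page_urls(base, count):
    """Page 1 is the base URL; page k (k >= 2) is assumed to be index-(k-1).html."""
    return [base] + [f"{base}index-{k - 1}.html" for k in range(2, count + 1)]


class TextOnly(HTMLParser):
    """Keeps the visible text of a page; drops tags and <script>/<style> contents."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")  # start block-level elements on a new line

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip = max(0, self._skip - 1)
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Return the page text with whitespace collapsed and blank lines removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def harvest_numbered(base, count, outfile="all_text.txt"):
    """Download every page in the numbered series and save only the text."""
    with open(outfile, "w", encoding="utf-8") as out:
        for url in page_urls(base, count):
            with urlopen(url) as resp:
                html = resp.read().decode("utf-8", errors="replace")
            out.write(f"----- {url}\n{html_to_text(html)}\n\n")
```

Usage would be `harvest_numbered(BASE_URL, 13)`, where `BASE_URL` is the full address of page 1 ending in `/`. Unlike a recorded mouse macro, this does not depend on screen layout or on where the "next" link happens to sit.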



Here is an idea that would work: I would use the shareware "Macro Magic".
(manually set the number of times the macro will repeat)
Record the macro (mouse gestures plus clicks): click on the page, press CTRL+A, copy the text, switch to all_text.txt (created and opened before the macro runs), paste the text, go back to the web page, click on the page, scroll down, and click "next" with the mouse.

This should work. ;-)
See ya ;-)
Jity