DonationCoder.com Software > Finished Programs

DONE: Extracting All Image Links from a Booru Website

skwire:
Here's an AutoHotkey example showing how to download all the images from the example site you gave:


--- Code: Autohotkey ---
;http://mspabooru.com/index.php?page=post&s=view&id=166035
sSourceURL := "http://mspabooru.com/index.php?page=post&s=view&id="
nPageStart := 1
nPageEnd   := 161151

; Create directory to dump images to.
FileCreateDir, % A_ScriptDir . "\images"

Loop, % nPageEnd
{
    If ( A_Index < nPageStart ) ; Skip pages before the start page.
    {
        Continue
    }
    Else
    {
        ; Update tray icon tooltip.
        Menu, Tray, Tip, % "Processing URL number: " . A_Index

        ; Download HTML page.
        URLDownloadToFile, % sSourceURL . A_Index, % A_ScriptDir . "\temp.html"

        ; Read in HTML source.
        FileRead, myData, % A_ScriptDir . "\temp.html"

        ; Parse HTML source for image URLs.
        Loop, Parse, myData, `n, `r
        {
            ; Match image URLs.
            If ( RegExMatch( A_LoopField, "(https?:\/\/.*\.(?:png|jpg|jpeg|gif))", Match ) )
            {
                ; Crack the URL into its parts.
                SplitPath, % Match1, OutFileName, OutDir, OutExtension, OutNameNoExt, OutDrive

                ; Skip any images with "thumbnail" in the URL.
                If ! InStr( Match1, "thumbnail" )
                {
                    ; Download the image.
                    URLDownloadToFile, % Match1, % A_ScriptDir . "\images\" . OutFileName
                }
            }
        }
    }
}
MsgBox, Done!

rayman3003:
Here's an AutoHotkey example showing how to download all the images from the example site you gave:
-skwire (August 05, 2017, 05:12 PM)
--- End quote ---

Thank you very much. It works like a charm.

But unfortunately, it's sooooooo slow. It leeched 2MB in 2 minutes!

But with download managers, I could grab at least 15MB in 2 minutes! Download managers can download files simultaneously, so I get more speed with them. That's why I prefer a list of image links over leeching the images with a non-downloader tool (like AutoHotkey).

Speed is very important for me in this case (like I said in the first post), because I want to leech more than 800,000 images from three different booru websites.  :-[

But again, thank you for your time.  :Thmbsup:

Ath:
As said, it's an example of how to get a lot of files from that particular website. Each page has to be downloaded to extract the actual download URL for its image.

You might want to do some work on it, as downloading 800,000 files into a single directory isn't something Windows is very fond of :o
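One way to avoid a single huge directory is to spread the files over numbered subfolders. The following is only a sketch: the GetBucketDir function, the nFilesPerDir value of 1000, and the folder naming are all assumptions for illustration, and it relies on Format(), which needs AutoHotkey v1.1.17 or later.

```autohotkey
; Hypothetical helper, not part of the script posted in this thread.
nFilesPerDir := 1000    ; Assumed bucket size; tune to taste.

; Quick check: show which folder page 1001 would land in.
MsgBox, % GetBucketDir( 1001, nFilesPerDir )
ExitApp

; Map a page ID to a numbered subfolder under "images".
GetBucketDir( nPageId, nPerDir )
{
    ; Pages 1..nPerDir share bucket 0, the next nPerDir pages bucket 1, etc.
    nBucket := Floor( ( nPageId - 1 ) / nPerDir )
    Return A_ScriptDir . "\images\" . Format( "{:04d}", nBucket )
}
```

In the download loop, the idea would be to call GetBucketDir( A_Index, nFilesPerDir ), pass the result to FileCreateDir, and use it as the target folder in place of the fixed "\images\" path.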
Splitting the from/to range across several scripts lets you run them in parallel (one script per CPU core seems reasonable), and this way you can run several sites in parallel too. But you might get your IP banned from the server for hammering the site with that many requests :-\
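A rough sketch of how one copy of the script could take its range from the command line, so the same file can be started several times with different ranges. It assumes AutoHotkey v1.1.27 or later (for A_Args). The script name, the per-range file names, and the Sleep delay are assumptions, not something from the scripts in this thread. Each instance uses its own temp file and link list so that parallel copies don't overwrite each other's temp.html.

```autohotkey
; Example launch (hypothetical script name):
;   AutoHotkey.exe GetLinks.ahk 1 40000
;   AutoHotkey.exe GetLinks.ahk 40001 80000
sSourceURL := "http://mspabooru.com/index.php?page=post&s=view&id="
nPageStart := A_Args[1] ? A_Args[1] : 1        ; Default: first page.
nPageEnd   := A_Args[2] ? A_Args[2] : 161151   ; Default: last page from the thread.

; Per-instance file names so parallel copies don't clash.
sTempFile := A_ScriptDir . "\temp_" . nPageStart . ".html"
sLinkFile := A_ScriptDir . "\ImageLinks_" . nPageStart . ".txt"

Loop, % nPageEnd - nPageStart + 1
{
    ; Translate the loop counter into the actual page ID.
    nPage := nPageStart + A_Index - 1

    ; Download and read the HTML page.
    URLDownloadToFile, % sSourceURL . nPage, % sTempFile
    FileRead, myData, % sTempFile

    ; Collect image links, skipping thumbnails and resized variants.
    Loop, Parse, myData, `n, `r
    {
        If ( RegExMatch( A_LoopField, "(https?:\/\/.*\.(?:png|jpg|jpeg|gif))", Match ) )
        {
            If ! ( InStr( Match1, "thumbnail" ) OR InStr( Match1, "width" ) )
            {
                FileAppend, % Match1 . "`r`n", % sLinkFile
            }
        }
    }

    ; Assumed politeness delay to reduce the risk of getting banned.
    Sleep, 250
}
MsgBox, Done!
```

The separate ImageLinks_*.txt files can be concatenated afterwards before feeding them to a download manager.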
A possible speed improvement could be to set the tray icon tooltip only for every 10th page or so instead of for each page, so you still have a notion of progress. If Windows displays the tooltip each time it's set, that's usually quite slow. (NB: Haven't tested it myself.)
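The less frequent tooltip update could look roughly like this. Mod() is a built-in AutoHotkey function; nTipEvery is a made-up variable name, and the loop count of 100 just stands in for the real page range.

```autohotkey
nTipEvery := 10    ; Assumed interval: refresh the tooltip every 10th page.

Loop, 100
{
    ; Only touch the tray tooltip on every nTipEvery-th iteration.
    If ( Mod( A_Index, nTipEvery ) = 0 )
    {
        Menu, Tray, Tip, % "Processing URL number: " . A_Index
    }

    ; ... download and parse page number A_Index here ...
}
```

Whether this makes a measurable difference is untested; the page downloads are likely to dominate the run time either way.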

IainB:
Might be worth looking at the references to "image download" in the thread Re: Firefox Extensions: Your favorite or most useful

Also see:
...Does it really work? Wow. It hasn't been updated for 11 months, so I assumed...
__________________
-Curt (March 12, 2014, 08:47 AM)
--- End quote ---
It most decidedly does work, and you can crawl any website, gathering specific file types.
For example, from the Mozilla FoxySpider Add-on page:
____________________________
About this Add-on
With FoxySpider you can:

* Get all photos from an entire website
* Get all video clips from an entire website
* Get all audio files from an entire website
* Well, actually get any file type you want from an entire website

FoxySpider can be used to create a thumbnail gallery containing links to rich media files of any file types you are interested in. It can also crawl deep to any level on a website and display the applicable files it found in the same gallery. FoxySpider is useful for different media content pages (music, video, images, documents), thumbnail gallery post (TGP) sites, podcasts. You can narrow and expand the search to support exactly what you want.
Once the thumbnail gallery is created you can view, download or share (on Facebook and Twitter) every file that was fetched by FoxySpider.
____________________________
-IainB (March 12, 2014, 04:12 PM)
--- End quote ---

skwire:
But unfortunately, it's sooooooo slow. It leeched 2MB in 2 minutes!
-rayman3003 (August 06, 2017, 04:36 AM)
--- End quote ---

This is probably due to your location and ISP speed, since I was able to grab 30+ MB of images in two minutes even with just that single-threaded code. Here's a modification that simply creates a text file of image links. As Ath mentioned, you can comment out line 17 (the Menu, Tray, Tip line) for a tiny speedup, but I don't think it's going to make a noticeable difference.


--- Code: Autohotkey ---
;http://mspabooru.com/index.php?page=post&s=view&id=166035
sSourceURL := "http://mspabooru.com/index.php?page=post&s=view&id="
nPageStart := 1
nPageEnd   := 161151

FileCreateDir, % A_ScriptDir . "\images"

Loop, % nPageEnd
{
    If ( A_Index < nPageStart ) ; Skip pages before the start page.
    {
        Continue
    }
    Else
    {
        ; Update tray icon tooltip.
        Menu, Tray, Tip, % "Processing URL number: " . A_Index

        ; Download HTML page.
        URLDownloadToFile, % sSourceURL . A_Index, % A_ScriptDir . "\temp.html"

        ; Read in HTML source.
        FileRead, myData, % A_ScriptDir . "\temp.html"

        ; Parse HTML source for image URLs.
        Loop, Parse, myData, `n, `r
        {
            ; Match image URLs.
            If ( RegExMatch( A_LoopField, "(https?:\/\/.*\.(?:png|jpg|jpeg|gif))", Match ) )
            {
                ; Crack the URL into its parts.
                SplitPath, % Match1, OutFileName, OutDir, OutExtension, OutNameNoExt, OutDrive

                ; Skip any images with "thumbnail" or "width" in the URL.
                If ! ( InStr( Match1, "thumbnail" ) OR InStr( Match1, "width" ) )
                {
                    ; Download the image.
                    ; URLDownloadToFile, % Match1, % A_ScriptDir . "\images\" . OutFileName

                    ; Create a list of links.
                    FileAppend, % Match1 . "`r`n", % A_ScriptDir . "\ImageLinks.txt"
                }
            }
        }
    }
}
MsgBox, Done!
