Author Topic: which file format is more accurate to save webpages? (Read 17122 times)

kalos · « **on:** April 08, 2013, 04:53 PM »

hello!

which file format is more accurate to save webpages?

one that will losslessly save the whole webpage and its related files, e.g. videos, scripts, etc., so that I can archive the webpage in fully functional form, offline

thanks!

rgdot · « **Reply #1 on:** April 08, 2013, 05:54 PM »

Why format? I think you should be thinking of a tool that downloads the pages you need well. I use HTTrack, I think it does a good job http://www.httrack.com/

SKA · « **Reply #2 on:** April 09, 2013, 02:45 AM »

MHT is a single-file format supported by most browsers (including Opera)
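
For context, an .mht (MHTML) file is essentially a MIME multipart/related message that bundles the HTML with the resources it references. The following is only a minimal sketch of that idea using Python's standard `email` package; the function name, page content, image bytes and URLs are made up for illustration, and browsers write more headers than this.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(page_url, html, images):
    """Bundle an HTML string and its images into one MHTML document.

    `images` maps an absolute image URL to (raw bytes, subtype), e.g. "png".
    All names and URLs used with this function are hypothetical placeholders.
    """
    root = MIMEMultipart("related", type="text/html")
    root["Subject"] = "Saved page"

    # The first part is the page itself; Content-Location records its URL.
    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = page_url
    root.attach(page)

    # Each resource becomes its own part, keyed by the URL the page refers to.
    for url, (data, subtype) in images.items():
        part = MIMEImage(data, _subtype=subtype)
        part["Content-Location"] = url
        root.attach(part)

    return root.as_string()
```

Anything the page loads that is not bundled as a part (streamed video, content fetched later by scripts) will still be missing when the file is opened offline.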

Ska

kalos · « **Reply #3 on:** April 10, 2013, 06:59 AM »

as you can see here there are numerous:

which should I choose?

tomos · « **Reply #4 on:** April 10, 2013, 07:33 AM »

Kalos, you are asking different questions ...
Rgdot answered your first question
-

which file format is more accurate to save webpages?

one that will losslessly save the whole webpage and its related files, e.g. videos, scripts, etc., so that I can archive the webpage in fully functional form, offline
-kalos (April 08, 2013, 04:53 PM)

no one filetype can save "the webpage and its related files, eg videos, scripts, etc".

Re your second question - what exactly do you want to save?
And a question for you:
how come you have all those options in your Firefox 'save as' dialogue and I only have three?


kalos · « **Reply #5 on:** April 10, 2013, 07:38 AM »

It's not some specific webpage I want to save. I am just trying to find the best way to save webpages. I am sure we all have saved a webpage in order to view it offline, and it ended up with parts missing!

The extra options are from plugins.

tomos · « **Reply #6 on:** April 10, 2013, 08:10 AM »

Ahh... a misunderstanding there - I didn't mean what specific webpage, I meant what exactly do you want to save:
in your first post you seem to want videos etc etc

if you want everything, rgdot has answered your question.
I suspect you won't get much help with all your 'save as' options unless you describe the plugins/extensions that offer them. Or you might be better off asking the makers of those extensions.

kalos · « **Reply #7 on:** April 10, 2013, 11:02 AM »

yes, I want everything (that's what "etc" was for)

rgdot's recommendation is very common, but my experience with HTTrack was poor (admittedly long ago)

Teleport used to do a much better job, but again:
1) it is somewhat cumbersome for everyday use, since I have to manually copy the URL from the web browser into the other (HTTrack) program (OK, I would just need some kind of AHK script to automate it)
2) it is not as simple as it should be, because there are lots of settings and it does not offer the WYSIWYG of the browser
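
The copy-the-URL step in point 1 could also be scripted outside the browser. Below is a rough sketch in Python that only builds the command line for the `httrack` command-line tool; it assumes `httrack` is installed and on the PATH, and the `-O` (output folder) and `-rN` (mirror depth) options are as recalled from the HTTrack manual, so check `httrack --help` before relying on them. The URL and folder name are placeholders.

```python
import shlex


def httrack_command(url, output_dir, depth=None):
    """Return the argument list for mirroring `url` with the httrack CLI."""
    args = ["httrack", url, "-O", output_dir]
    if depth is not None:
        # Assumed flag: -rN limits the mirror to N levels of links.
        args.append(f"-r{depth}")
    return args


cmd = httrack_command("http://www.example.com/", "mirror", depth=2)
# Print the command for inspection; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when the URL contains characters such as `&` or `?`.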

skwire · « **Reply #8 on:** April 10, 2013, 11:19 AM »

one that will losslessly save the whole webpage and its related files, e.g. videos, scripts, etc.,
-kalos (April 08, 2013, 04:53 PM)

kalos, you need to realise that most videos (and other embedded types of content) are, for lack of a better term, hidden when it comes to saving webpages. I'm not familiar with httrack so maybe I'm completely wrong here. Maybe rgdot can let us know if the application saves videos and such.

kalos · « **Reply #9 on:** April 10, 2013, 11:34 AM »

to be honest, I am not mainly interested in videos, but in JavaScript popups, Flash content, and such
but yes, it's up to the web developer: if he decides to hide something, he will do it no matter what (via JS, etc.)

kalos · « **Reply #10 on:** April 10, 2013, 12:11 PM »

tomos,

are you using Win7? if yes, which theme/shell? it looks very nice

rgdot · « **Reply #11 on:** April 10, 2013, 01:01 PM »

Depends on what you expect something like HTTrack to do. Not just HTTrack but other software (and even file formats) of its kind.

Quick example of what I use it for:

I make a site for a small restaurant and I take my laptop to show them the 'beta' or draft. This is a small restaurant, and on site they only have a connection for payment processing but no actual internet access(!)
HTTrack saves everything, including embedded PDFs, slideshows, etc., but the place where I embedded a Google map shows up as a placeholder; YouTube embeds do the same, I think.

EDIT: I think my point is if you are seeking to download all content, including videos, I don't think one piece of software or file will do. Video downloader that works well (or at all) will be separate from downloading content.
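
To see in advance which parts of a page are likely to end up as placeholders, one option is to list the embeds that point at other hosts. This is a small sketch using only Python's standard library; the class name, the choice of tags and the example hosts are assumptions for illustration, not a description of how HTTrack decides what to fetch.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class EmbedFinder(HTMLParser):
    """Collect embedded resources that are hosted on a different site."""

    # Tag -> attribute holding the embedded resource (an illustrative subset).
    TAG_ATTRS = {"iframe": "src", "embed": "src", "object": "data",
                 "video": "src", "source": "src"}

    def __init__(self, page_host):
        super().__init__()
        self.page_host = page_host
        self.external = []

    def handle_starttag(self, tag, attrs):
        wanted = self.TAG_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                host = urlparse(value).netloc
                # Relative URLs have no host and belong to the page's own site.
                if host and host != self.page_host:
                    self.external.append((tag, value))


finder = EmbedFinder("restaurant.example")
finder.feed('<iframe src="https://maps.example.com/embed?x=1"></iframe>'
            '<embed src="/menu.pdf">')
print(finder.external)
```

In this made-up page the map iframe is reported as external, while the PDF served from the same site is not.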

tomos · « **Reply #12 on:** April 10, 2013, 04:11 PM »

tomos,

are you using Win7? if yes, which theme/shell? it looks very nice
-kalos (April 10, 2013, 12:11 PM)

yes, windows 7 -
I'm not sure what caught your eye there, because you can't really see the theme above (what made you notice it?)
I'm using a clear-glass Aero theme by Eóin (theme link - deviantart), but I modified it myself afterwards. I can't remember exactly: I usually make the greyness within windows lighter if possible, but I can't really remember what I did here. (Note: it uses different symbols for minimise/close/etc.)

kalos · « **Reply #13 on:** April 10, 2013, 04:15 PM »

it was the greyness, thanks!

tomos · « **Reply #14 on:** April 10, 2013, 04:35 PM »

Might be just a case of changing "window colour" to grey and making it light (vary intensity).


So,
what *are* those Firefox extensions anyway?

kalos · « **Reply #15 on:** April 10, 2013, 04:40 PM »

they are supposed to offer an alternative to the default way of saving webpages, which saves the webpage as an HTML file and an accompanying folder

the advantages are portability, compatibility, organizing single files, etc

I am not sure, but they may offer an advantage in saving a webpage "more completely", so that when you open it offline, you can view it exactly as if online

tomos · « **Reply #16 on:** April 10, 2013, 04:50 PM »

names? links?

(trying to get info out of you is hard work ;-) )

kalos · « **Reply #17 on:** April 10, 2013, 04:57 PM »

well I have installed these two:
https://addons.mozil...zilla-archive-format
https://addons.mozil.../firefox/addon/unmht

tomos · « **Reply #18 on:** April 10, 2013, 05:17 PM »

Thanks kalos

Curt · « **Reply #19 on:** April 11, 2013, 05:34 PM »

well I have installed : https://addons.mozil...zilla-archive-format
-kalos (April 10, 2013, 04:57 PM)

Are you not satisfied with this MAF addon?

About this Add-on
This extension enhances the way web pages are saved on your computer.

It provides the following advantages over the built-in save system:
— The saved pages are faithful to the original (exact save)
-MAF

Is the problem that the homepage doesn't say "this setting gives the best quality"?

Actually it is saying that you need even one more addon, if you want to use MAF to enhance quality:

Save Complete, by Stephen Augenstein, Not available for Firefox 18+

The Save Complete extension will integrate with Mozilla Archive Format, but must be enabled from the internal configuration settings.

This extension replaces the system used by the browser to save complete web pages. The new system correctly handles style sheets referencing image files, that otherwise would not be saved, causing some pages to appear differently.

About this Add-on
As more and more sites use CSS, Firefox's built-in complete save becomes increasingly less effective, as it doesn't support stylesheets. This extension fixes this, and saves the complete page, including all images, stylesheets, flash, javascript, and anything else associated with the document, even imported stylesheets and images referenced in the stylesheet files.
-Save Complete

In every other way, MAF is not for better quality, but for smaller files!
If better quality is all you want, you will need to stop using MAF etc.
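
The stylesheet problem described in the Save Complete blurb comes down to following `url(...)` references inside CSS files. As a rough illustration (not the add-on's actual code), this Python sketch pulls those references out of a stylesheet and resolves them against the stylesheet's own address; the regular expression is deliberately simple and the example URLs are invented.

```python
import re
from urllib.parse import urljoin

# Matches url(...) with optional single or double quotes around the target.
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def css_resource_urls(css_text, stylesheet_url):
    """Return absolute URLs of the resources a stylesheet refers to."""
    found = []
    for match in CSS_URL.finditer(css_text):
        target = match.group(2).strip()
        # Inline data: URIs are already embedded, so there is nothing to fetch.
        if target and not target.startswith("data:"):
            found.append(urljoin(stylesheet_url, target))
    return found


css = ('body { background: url("../img/bg.png"); } '
       '.icon { background: url(icon.gif) }')
print(css_resource_urls(css, "http://example.com/css/site.css"))
```

Note that relative paths are resolved against the stylesheet, not the page, which is one reason a saver that only scans the HTML misses these images.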

kyrathaba · « **Reply #20 on:** April 13, 2013, 04:48 PM »

My father is always wanting to save complete webpages of sites he comes across. Sometimes he has okay luck with "webpage complete" option in the Save As dialog options. But, some caveats from my own past experience: for relatively small webpages -- say pages that consist of only a handful of supporting files (.js, .gif, .png, etc), it works pretty well, although dad is so technically challenged that he's not sure where he saved the files to, and then he forgets that he has to keep the subfolder of support files in the same directory with the HTML page (dad is 78). For progressively more "involved" pages, doesn't work as well.
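
For the "lost subfolder" problem, a small helper that always moves the HTML file and its support folder together might help. This is only a sketch: it assumes the browser named the folder after the page with a `_files` suffix (which is what I believe Firefox does, but check your own saves), and the function name is made up.

```python
import shutil
from pathlib import Path


def move_saved_page(html_file, dest_dir, suffix="_files"):
    """Move a saved page and its companion resource folder together."""
    html_file = Path(html_file)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Assumed naming: "page.html" is accompanied by a folder "page_files".
    companion = html_file.with_name(html_file.stem + suffix)

    moved = [Path(shutil.move(str(html_file), str(dest_dir / html_file.name)))]
    if companion.is_dir():
        moved.append(Path(shutil.move(str(companion),
                                      str(dest_dir / companion.name))))
    return moved
```

Keeping the two side by side matters because the saved HTML refers to its images and scripts by a path relative to its own location.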

Over the years, I've found HTTrack to work well. The catch is that it has lots of options (how many levels deep do you want to recurse in pulling files from a website? How deeply do you want to plunge to extract links that may be buried way down in a website's hierarchy?). HTTrack works pretty well if you're patient (it can take hours to download a complete website) and if you configure things reasonably before you press "Go".
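
To make the "how many levels deep" setting concrete, here is a toy sketch of a depth-limited crawl in Python. It walks a made-up, in-memory link graph instead of fetching real pages, and it is not HTTrack's actual algorithm; it just shows how a depth limit decides which pages get pulled.

```python
from collections import deque


def pages_within_depth(links, start, max_depth):
    """Breadth-first walk of a link graph, stopping at `max_depth` hops.

    `links` maps each page to the pages it links to (a hypothetical stand-in
    for downloading a page and extracting its links).
    Returns a dict of page -> number of hops from `start`.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if seen[page] == max_depth:
            continue  # Do not follow links beyond the depth limit.
        for target in links.get(page, []):
            if target not in seen:
                seen[target] = seen[page] + 1
                queue.append(target)
    return seen


site = {
    "/": ["/menu", "/about"],
    "/menu": ["/menu/lunch"],
    "/menu/lunch": ["/menu/lunch/specials"],
}
print(pages_within_depth(site, "/", 1))
```

With a limit of 1, only the start page and the pages it links to directly are collected; each extra level can multiply the number of pages, which is why deep mirrors take hours.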

There have been products that try to take some of the complexity out by letting you capture a web page in its entirety (images, links, text), such as Surfulater and Evernote Web Clipper, but these are using, I think, database backends and not separating a page out into its constituent parts.

So, what sort of webpages are you saving: from personal websites, business websites, or perhaps hobbyist sites? (Dad wants all these gun-enthusiast pages/sites saved in their entirety, then he buries them three sub-folders deep on his ... Desktop --> C:\Users\Paul\Desktop\Saved Pages\Guns\GunDigest\Sept08\... you get the idea; and forgets them for all eternity.)

So give us your short- and long-term objectives in saving these, and also whether they're always just pages, or sometimes entire sites or sections thereof.

Scott_Y · « **Reply #21 on:** August 09, 2013, 01:30 PM »

... which file format is more accurate to save webpages?
-kalos (April 08, 2013, 04:53 PM)

You might try the webpage note type in the Pro version of RightNote, which saves entire web pages as an internal note. You can then organize the notes in a tree hierarchy, tag them, search by content or tags, etc. https://www.donation...ex.php?topic=35733.0

kyrathaba · « **Reply #22 on:** August 09, 2013, 01:51 PM »

Thanks, Scott_Y. I'll pass that along.