Author Topic: Interesting compression ratio difference between file compression tools.  (Read 11353 times)

IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 7,544
  • @Slartibartfarst
(The text from the image below has been copied into the spoiler below the image.)

[image: 15_523x515_6D3FCE79.png]

Spoiler
Notes on compression ratio difference between ZIP and 7-ZIP.
I tend to use the standard Windows ZIP (the built-in Windows compression tool) to compress old documents into an archived state where I still need those documents to remain indexed/searchable.  I use an iFilter to enable WDS (Windows Desktop Search) to search within .ZIP files.
Otherwise I tend to use 7-ZIP, which generally has much better file compression.
The HP Support utilities program for the new (refurbished) HP Pavilion I am currently setting up uses several directories of information.  One is a folder holding the User Guides (C:\Program Files\HP\Documentation\platform_guides\ug).
The guides are all PDF documents.
There were 38 PDF files (one file for each of the 38 different languages catered for), each typically of about 2.2MB in size (with some occasional variation).
Wherever possible, I try to avoid littering a PC's hard drive with document files that are not required, because:
a.  They take up client device space on a finite volume - space that could probably be better left empty for something more useful.
b.  They add to backup CPU load and duration, and to backup storage requirements.
c.  They take up client CPU time - as I have WDS set to index document files (including PDF files).

I initially considered deleting them, but then decided against it.  I reckoned that, if they were all in a single compressed file, then they would not take up too much space and would require fewer resources as a single file on backup (handling multiple files also takes more time).  So I decided to compress them into a single file.

I only needed the English version - which was a 2.1MB file named 824463-001.pdf (the suffix -001 is apparently the language ID code used by HP for discriminating between languages; I am not familiar with whatever codification method they use for this ID).  So, I selected the 37 "unwanted" (non-English) PDF files in the directory, and xplorer² showed me that they were 85.3MB in total volume.

Using Send To, I intended to send them to a 7-ZIP (7z) compressed file - i.e., rather than ZIP, as I didn't need WDS to search/index the documents - but I was a little preoccupied with my 6y/o son, who wanted me to help him with something, and, by mistake, I sent them as a selected group to a ZIP archive.
That resulted in a compressed ZIP file of 61.1MB.
The compression saving was ((85.3-61.1)/85.3)*100 = 28.37%.  I then realised my mistake, and at the same time observed that this didn't look like a very significant compression ratio (85.3:61.1).
Curious to see the comparison, I then sent the same files to a 7-ZIP (7z) compressed file.
That resulted in a compressed 7z file of 26.4MB.  The compression saving was ((85.3-26.4)/85.3)*100 = 69.05%.  So, 7-ZIP's saving was 69.1 - 28.4 = 40.7 percentage points greater for the same set of documents.

This was a timely reminder to me of how much more efficient 7-ZIP's compression algorithm is than that of the standard ZIP.
Obviously compression rates will vary depending on the types of file being compressed, but I had forgotten that the differential between ZIP and 7-ZIP could be that significant!
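For reference, here is a minimal Python sketch of the saving calculation used above (the figures are the ones quoted; the function name is just illustrative):

def saving_pct(original_mb: float, compressed_mb: float) -> float:
    """Percentage of space saved by compression."""
    return (original_mb - compressed_mb) / original_mb * 100

zip_saving = saving_pct(85.3, 61.1)        # ~28.4% for the Explorer ZIP
sevenzip_saving = saving_pct(85.3, 26.4)   # ~69.1% for the 7z archive
print(f"ZIP: {zip_saving:.1f}%, 7z: {sevenzip_saving:.1f}%, "
      f"difference: {sevenzip_saving - zip_saving:.1f} percentage points")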


xtabber

  • Supporting Member
  • Joined in 2007
  • Posts: 618
The ZIP routine built into Windows Explorer is optimized for speed rather than compression and there is no way to change the settings.

Nearly all standalone compression programs, including 7-Zip, will allow you to create standard ZIP files with a far greater level of compression than you will get from Windows Explorer.  ZIP files at maximum compression usually won't be much bigger than 7z files and are much faster to create, while remaining readable by Windows.
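By way of illustration only (not how Explorer or any particular tool does it internally), a minimal Python sketch of creating a standard ZIP at the strongest Deflate level; the path is the one mentioned earlier in the thread, and Python 3.7+ is assumed for the compresslevel parameter:

import glob
import os
import zipfile

files = glob.glob(r"C:\Program Files\HP\Documentation\platform_guides\ug\*.pdf")

# ZIP_DEFLATED with compresslevel=9 asks for the strongest (slowest) Deflate
# effort; the result is still an ordinary ZIP that any ZIP reader can open.
with zipfile.ZipFile("guides_max.zip", "w",
                     compression=zipfile.ZIP_DEFLATED,
                     compresslevel=9) as zf:
    for path in files:
        zf.write(path, arcname=os.path.basename(path))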

Shades

  • Member
  • Joined in 2006
  • Posts: 2,939
When archiving big dump files from the Oracle databases I maintain, 7-zip is the best compressor - but not with the default settings. The compression level in 7-zip format can also be adjusted. While this can be time-consuming, I need to pull those archives through a slow(er) internet connection, so the extra time I lose on archiving and unpacking pales in comparison with the time I would otherwise spend transferring these files.

Two things:
1. 7-zip comes with a GUI application, but also with a version for scripts (command line). When using both with the most extreme 7-zip (LZMA) compression setting, it would make sense to expect similarly sized archives. Not true: the script version compresses significantly better. After compressing 1.6GByte of executables, DLLs, images, HTML and other text-based scripts, I end up with an archive of around 280MByte when created with the 7-zip GUI version. With the 7-zip script version the resulting archive varies between 180 and 190MByte.
2. Especially with big(ger) data files, you will notice an improvement in compression speed and resulting archive size when you set 7-zip to ultra and change the dictionary size to 32MByte and the word size to 256. Those settings are not the default, but they make quite a big difference in my case. Playing with these settings can have both positive and negative effects on compression time and archive size, but it does pay off to experiment a bit (see the sketch after this list).
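A minimal sketch of invoking the command-line 7-Zip with settings along those lines (assumptions: 7z is on the PATH, the file names are illustrative, and the switch values should be checked against the 7-Zip documentation for your version):

import subprocess

# "a" adds files to an archive; -t7z selects the 7z (LZMA) format.
subprocess.run(
    ["7z", "a", "-t7z",
     "-mx=9",        # ultra compression level
     "-md=32m",      # 32 MByte dictionary size
     "-mfb=256",     # word size (fast bytes) of 256
     "dumps.7z",     # output archive (illustrative name)
     "dump_*.dmp"],  # input files (illustrative pattern)
    check=True,
)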

All in all, 7-zip compresses way better than zip does for me. And as I don't have a need for Windows to index my archives, I gain a lot more storage space this way. However, in case you need to have archive content indexed by Windows, as xtabber states, there could be something interesting here at this link. There you can download a piece of software that can replace the Microsoft Zip functionality with the raw power and options of 7-zip, directly from within the Windows Explorer. No, not an extra context menu item, really replace MS zip with 7-zip. That way you have the best of both worlds.

xtabber

  • Supporting Member
  • Joined in 2007
  • Posts: 618
Raymond.cc had a fairly thorough comparison of archivers a few years ago.  There are a couple that create even smaller archives than 7-Zip, but take even longer to do it.

The problem for me is that formats like 7z may save space and are definitely worthwhile for transmitting large amounts of data, but they are just too slow for my everyday use.  7-Zip is also very good (and fast) for creating ZIP archives and I occasionally use it for that purpose, but my regular archiver is WinRAR because it has a very good GUI with a lot of options, and is much faster at extracting from archives, which I do more often than creating them.

IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 7,544
  • @Slartibartfarst
@Shades and @xtabber: Thanks for the interesting comments. I hadn't considered that 7-ZIP might be able to make significantly smaller ZIP files, so I tried it out with the same set of documents and it produced a .ZIP file of 61.0MB - only slightly smaller than the 61.1MB that the standard system ZIP created.
So then I tried the "ultra" ZIP compression setting in 7-ZIP and it output 60.6MB - again, not a big difference.
One could probably play around with this all day, but I suspect that at the end of it one would be unlikely to have made a particularly significant dent in it.

Come to think of it, I are now confuzzled, as I do not see how the same ZIP standard algorithm is being used if the compressed sizes of the same files can differ as significantly as is being suggested (above), merely from using one ZIP tool rather than another, so I must be missing something there. I thought that was why there are several different compression tools, each using their own peculiar algorithms for different standards of compression.
However, even if one might be able to make a significantly smaller .ZIP file of the documents, could WDS still open and read/index the contents? I'd have to test that to be sure.
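One quick sanity check short of a full WDS test would be to confirm that the smaller archive still uses the standard Deflate method, since that is what ordinary ZIP readers (and, presumably, the iFilter) expect. A minimal Python sketch, with an illustrative file name:

import zipfile

with zipfile.ZipFile("guides_max.zip") as zf:
    for info in zf.infolist():
        deflated = info.compress_type == zipfile.ZIP_DEFLATED
        print(f"{info.filename}: compression method {info.compress_type} "
              f"({'standard Deflate' if deflated else 'non-standard'})")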

Target

  • Honorary Member
  • Joined in 2006
  • Posts: 1,832
Try it with something other than PDFs.

My experience indicates that PDFs are not typically compressible, or at least not significantly so...
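A quick way to check that claim on a given set of files is to compress each one individually and look at the ratio. A minimal Python sketch (standard library only; the directory is the one mentioned earlier, and note that per-file compression will not show the cross-file savings a solid archive can get):

import glob
import lzma
import os

for path in glob.glob(r"C:\Program Files\HP\Documentation\platform_guides\ug\*.pdf"):
    with open(path, "rb") as f:
        data = f.read()
    packed = lzma.compress(data, preset=9)
    saved = (1 - len(packed) / len(data)) * 100
    print(f"{os.path.basename(path)}: {len(data)} -> {len(packed)} bytes ({saved:.1f}% saved)")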

IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 7,544
  • @Slartibartfarst
^^That may well be the case, but it's only PDF files that I want to zip up!    :huh:

Shades

  • Member
  • Joined in 2006
  • Posts: 2,939
At the risk of sounding like a fanboy, here goes:
I manage a total of 28 Oracle DBs and 5 MS-SQL DBs which I regularly make dump files from. Most are around 100GByte in size, but some are close to 1TByte. So you can hopefully understand that the rate of compression really matters - not only for storage, but also for transfer speed; 7-zip is much better than zip for my use case.

The dump files I create are a maximum of 4GByte in size, and 7-zip often reduces those files to 200MByte. Zip doesn't even come close. Rar is quite a lot better than zip, but it can't match the brute force of 7-zip either. Besides, I have all this scripted, so it is really a matter of turning it on at night and being greeted in the morning by archived (and backed-up) dump files, including MD5 hashes for verification.
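The verification step could look something like this minimal Python sketch (file names are illustrative; it just records an MD5 hash per archive so the files can be re-checked after transfer):

import glob
import hashlib

def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives don't need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

with open("checksums.md5", "w") as out:
    for archive in sorted(glob.glob("*.7z")):
        out.write(f"{md5_of(archive)}  {archive}\n")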

Target is right about PDFs. The same can be said for .xlsx, .docx etc., which are in essence already zipped when you save those types of documents. But 7-zip often reduces text-based log files of several gigabytes in size to only several megabytes. And it's hardly slower than zip when doing so.

Ah well, to each their own.  :)

Mark0

  • Charter Honorary Member
  • Joined in 2005
  • Posts: 652
Lots of compressors compared here: http://www.squeezechart.com/

Stoic Joker

  • Honorary Member
  • Joined in 2008
  • Posts: 6,649
^^That may well be the case, but it's only PDF files that I want to zip up!    :huh:

Have you considered optimizing them first? I've gotten exceptional results using NXPowerLite on PDFs to reduce their size by upwards of 70% without appreciable quality loss. After that kind of treatment it shouldn't matter much what you zip them with.

IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 7,544
  • @Slartibartfarst
No. I hadn't. I had thought that was what archive compression tools were for. Maybe I was mistuken.
I had not wanted lossy compression.

Stoic Joker

  • Honorary Member
  • Joined in 2008
  • Posts: 6,649
No. I hadn't. I had thought that was what archive compression tools were for. Maybe I was mistuken.
I had not wanted lossy compression.

Understandable, but what's being lost in the optimization process is both adjustable and, in the default settings, virtually imperceptible. But I'm guessing there is a great deal of cruft in the PDF format. Part of it is the boilerplate header trash that froths on about how cool Adobe is, and the rest is duplicated formatting and object-description code.

The example that sold me on the program was a 35MB (in-house created) sales brochure that one of the staff was trying to stuff through our mail server. I ran it through NX and it gave me a 3MB file that looked (and printed) identical to the bloated original.

I bought the program immediately thereafter.

xtabber

  • Supporting Member
  • Joined in 2007
  • Posts: 618
Understandable, but what's being lost in the optimization process is both adjustable and, in the default settings, virtually imperceptible. But I'm guessing there is a great deal of cruft in the PDF format. Part of it is the boilerplate header trash that froths on about how cool Adobe is, and the rest is duplicated formatting and object-description code.

The example that sold me on the program was a 35MB (in-house created) sales brochure that one of the staff was trying to stuff through our mail server. I ran it through NX and it gave me a 3MB file that looked (and printed) identical to the bloated original.
I'd guess that the main problem with your sales brochure was large JPEG image files embedded within it and that most of the size gain came from reducing the resolution of those.

PDF editors like Adobe Acrobat and PDFXchange Editor provide optimization tools that give you full control over the size and compatibility of a PDF file.  How you handle the many different options depends on what you want to do with the PDF.

Mark0

  • Charter Honorary Member
  • Joined in 2005
  • Posts: 652
A great saving of space from newer archivers comes from the so-called "solid" mode: basically each file is processed one after another as if they were a single big stream, so that it's possible to exploit similarities among different files.
Probably the first widely used solid compressor in the DOS/Windows world was RAR. The same result can be obtained by first creating a non-compressed archive of the files, and then compressing the result (like the usual tar+gzip). Of course the compressor needs to support a window of adequate size.

In the case of many PDFs with the same document in different languages, that means that a solid archiver can probably identify the pictures as the same blobs of data, so that they will be stored only one time.
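As a rough illustration of the difference, a minimal Python sketch (standard library only; the file pattern is illustrative) comparing per-file compression against the concatenate-then-compress, tar+gzip-style "solid" approach described above:

import io
import lzma
import tarfile
from pathlib import Path

files = sorted(Path(".").glob("*.pdf"))

# Non-solid: compress each file independently and sum the sizes.
non_solid = sum(len(lzma.compress(p.read_bytes(), preset=9)) for p in files)

# Solid: pack everything into one uncompressed tar stream, then compress once,
# so data repeated across files can be matched (given a big enough window).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for p in files:
        tar.add(str(p))
solid = len(lzma.compress(buf.getvalue(), preset=9))

print(f"non-solid total: {non_solid} bytes, solid: {solid} bytes")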

https://en.wikipedia...ki/Solid_compression

wraith808

  • Supporting Member
  • Joined in 2006
  • Posts: 11,190
Understandable, but what's being lost in the optimization process is both adjustable and, in the default settings, virtually imperceptible. But I'm guessing there is a great deal of cruft in the PDF format. Part of it is the boilerplate header trash that froths on about how cool Adobe is, and the rest is duplicated formatting and object-description code.

The example that sold me on the program was a 35MB (in-house created) sales brochure that one of the staff was trying to stuff through our mail server. I ran it through NX and it gave me a 3MB file that looked (and printed) identical to the bloated original.
I'd guess that the main problem with your sales brochure was large JPEG image files embedded within it and that most of the size gain came from reducing the resolution of those.

PDF editors like Adobe Acrobat and PDFXchange Editor provide optimization tools that give you full control over the size and compatibility of a PDF file.  How you handle the many different options depends on what you want to do with the PDF.

Many people don't do that, however, and you can be stuck with the results if you don't have an editor and the time to check the settings.  Having something do that for you is well worth it.  I had someone accidentally distribute a book unoptimized.  A bit later they redistributed the optimized version, but this would have saved me from having to open a 256MB PDF in the meantime.

Stoic Joker

  • Honorary Member
  • Joined in 2008
  • Posts: 6,649
^Quite True :Thmbsup:

IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 7,544
  • @Slartibartfarst
Updated comparison. The 7-ZIP algorithm looks to be the best by a mile.
[image: 20_240x274_CF6FAB52.png]

MilesAhead

  • Supporting Member
  • Joined in 2009
  • Posts: 7,736
One could probably play around with this all day, but I suspect that at the end of it one would be unlikely to have made a particularly significant dent in it.


I suspect the more efficient compression Shades is talking about is with the .7z format.  The other thing would be the type of data compressed.  Text compresses quite a bit, whereas the binary data that Shades is dealing with is difficult to compress.  Text tends to have patterns susceptible to compression.  In English, anyway, you tend to have a lot of 'e's and not many 'z's, so you can probably denote an 'e' with only a couple of bits and perhaps even use more than a byte for a 'z' without losing too much, etc.
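That intuition can be made concrete by looking at symbol frequencies: the (order-0) Shannon entropy of English text is well below 8 bits per byte, while already-compressed data sits close to 8. A minimal Python sketch, with illustrative file names:

import math
from collections import Counter

def bits_per_byte(data: bytes) -> float:
    """Order-0 Shannon entropy: the average cost per byte if each byte value
    were coded purely according to how often it occurs."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

text = open("some_log_file.txt", "rb").read()   # plain text: well under 8 bits/byte
packed = open("archive.7z", "rb").read()        # compressed data: close to 8 bits/byte
print(f"text: {bits_per_byte(text):.2f} bits/byte, 7z: {bits_per_byte(packed):.2f} bits/byte")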

Stuff like graphics image files often end up bigger since they are a compressed format to begin with.

Shades

  • Member
  • Joined in 2006
  • Posts: 2,939
Anyone interested can get a decent, albeit simple explanation about compression in the BBC documentary: 'The joy of data'.

@40hz:
Even if you are not interested in the subject, the woman who does the presentation perfectly fits in your niche... ;)