Interesting compression ratio difference between file compression tools.

IainB:
(The text from the image below has been copied into the spoiler below the image.)



Spoiler: Notes on compression ratio difference between ZIP and 7-ZIP.
I tend to use the standard Windows ZIP (using the built-in Windows compression tool) to compress old documents into an archive state, but where I still need to have those documents indexed/searchable.  I use an iFilter to enable WDS (Windows Desktop Search) to search within .ZIP files.
Otherwise I tend to use 7-ZIP, which generally has much better file compression.
The HP Support utilities program for the new (refurbished) HP Pavilion I am currently setting up uses several directories of information.  One is a folder holding the User Guides (C:\Program Files\HP\Documentation\platform_guides\ug).
The guides are all PDF documents.
There were 38 PDF files (one file for each of the 38 different languages catered for), each typically of about 2.2MB in size (with some occasional variation).
Wherever possible, I try to avoid littering a PC's hard drive with document files that are not required, because:
a.  They take up client device space on a finite volume - space that could probably be better left empty for something more useful.
b.  They can add to backup process CPU resources and duration and backup storage requirements.
c.  They take up client CPU time - as I have WDS set to index document files (including PDF files).

I initially considered deleting them, but then decided against it.  I reckoned that, if they were all in a single compressed file, then they would not take up too much space and would require fewer resources as a single file on backup (multiple file handling also takes more time).  So I decided to compress them into a single file.

I only needed the English version, which was a 2.1MB file named 824463-001.pdf (the suffix -001 is apparently the language ID code used by HP for discriminating between languages; I am not familiar with whatever codification method they use for this ID). So, I selected the 37 "unwanted" (non-English) PDF files in the directory, and xplorer² showed me that they were 85.3MB in total volume.

Using Send To, I intended to send them to a 7-ZIP (7z) compressed file (i.e., rather than ZIP, as I didn't need WDS to search/index the documents), but I was a little preoccupied with my 6y/o son, who wanted me to help him with something, and, by mistake, I sent them as a selected group to a ZIP archive.
That resulted in a compressed ZIP file of 61.1MB.
The compression saving was ((85.3-61.1)/85.3)*100=28.3705%. I then realised my mistake and, at the same time, observed that this didn't look like a very significant compression ratio (85.3:61.1).
Curious to see the comparison, I then sent the same files to a 7-ZIP (7z) compressed file.
That resulted in a compressed 7z file of 26.4MB.  The compression saving was ((85.3-26.4)/85.3)*100=69.0504%. So, 7-ZIP's compression saving was 69.1-28.4=40.7 percentage points greater for the same set of documents.
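
For anyone who wants to re-run the arithmetic above, here is a minimal Python sketch. The function and constant names are only illustrative; the sizes are the ones quoted in this post.

```python
def saving_percent(original_mb: float, compressed_mb: float) -> float:
    """Space saved by compression, as a percentage of the original size."""
    return (original_mb - compressed_mb) / original_mb * 100


ORIGINAL_MB = 85.3  # the 37 non-English PDF files
ZIP_MB = 61.1       # archive from the built-in Windows ZIP
SEVENZ_MB = 26.4    # archive from 7-ZIP (7z)

zip_saving = saving_percent(ORIGINAL_MB, ZIP_MB)
sevenz_saving = saving_percent(ORIGINAL_MB, SEVENZ_MB)

print(f"ZIP saving: {zip_saving:.4f}%")
print(f"7z saving:  {sevenz_saving:.4f}%")
# The gap between two percentages is measured in percentage points.
print(f"Difference: {sevenz_saving - zip_saving:.1f} percentage points")
# How much smaller the 7z archive is relative to the ZIP archive itself.
print(f"7z vs ZIP:  {saving_percent(ZIP_MB, SEVENZ_MB):.1f}% smaller")
```

The last line is a different comparison from the first three: it measures the 7z archive against the ZIP archive rather than against the uncompressed files.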

This was a timely reminder to me of how much more efficient 7-ZIP's compression algorithm is than that of the standard ZIP.
Obviously, compression rates can vary depending on the types of file being compressed, but I had forgotten that the differential between ZIP and 7-ZIP could be that significant!

xtabber:
The ZIP routine built into Windows Explorer is optimized for speed rather than compression and there is no way to change the settings.

Nearly all standalone compression programs, including 7-Zip, will allow you to create standard ZIP files with a far greater level of compression than you will get from Windows Explorer.  ZIP files at maximum compression usually won't be much bigger than 7z files and are much faster to create, while remaining readable by Windows.
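
As a rough illustration of the point about compression levels, the sketch below writes the same data into two standard ZIP archives using Python's zipfile module, once at DEFLATE level 1 (fast) and once at level 9 (maximum). This is only a stand-in: it does not reproduce the Windows Explorer or 7-Zip encoders, the helper names are made up for this example, and the sample data is synthetic text. Inputs that are already compressed internally, as many PDFs are, would be expected to show a much smaller spread.

```python
import io
import random
import zipfile


def make_sample_text(n_words: int = 50_000, seed: int = 0) -> bytes:
    """Build repeatable, text-like sample data (a stand-in for real documents)."""
    rng = random.Random(seed)
    vocab = [b"archive", b"compress", b"deflate", b"window", b"search",
             b"index", b"document", b"backup", b"folder", b"guide"]
    return b" ".join(rng.choice(vocab) for _ in range(n_words))


def make_zip(data: bytes, level: int) -> bytes:
    """Return the bytes of a ZIP archive holding `data`, DEFLATE-compressed at `level`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=level) as zf:
        zf.writestr("sample.txt", data)
    return buf.getvalue()


def read_zip(archive: bytes) -> bytes:
    """Extract the single member written by make_zip()."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read("sample.txt")


data = make_sample_text()
for level in (1, 6, 9):
    archive = make_zip(data, level)
    print(f"level {level}: {len(archive)} bytes (input {len(data)} bytes)")
```

Both archives use the same ZIP format and unpack to identical content; only the effort the encoder spends searching for matches differs.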

Shades:
When archiving big dump files from the Oracle databases I maintain, 7-zip is the best compressor. But not with the default settings. The compression level in 7-zip format can also be adjusted. While this can be time consuming, I need to pull those archives through a slow(er) internet connection. The extra time I lose on archiving and unpacking pales in comparison with the time I need to spend on transferring these files.

Two things:
1. 7-zip comes with a GUI application, but also with a version for scripts. When using both applications with the most extreme 7-zip (LZMA) compression setting, it would make sense to expect similar-sized archives. Not so: the script version compresses significantly better. After compressing 1.6GByte of executables, DLLs, images, HTML and other text-based scripts, I end up with an archive of around 280MByte when it is created with the 7-zip GUI version. With the 7-zip script version, the resulting archive varies between 180 and 190MByte.
2. Especially with big(ger) data files, you will notice an improvement in compression speed and resulting archive size when you set 7-zip to ultra and change the dictionary size to 32MByte and the word size to 256. Those settings are not the default, but make quite a big difference in my case. Playing with these settings can have both positive and negative effects on compression time and archive size, but it does pay off to experiment a bit.
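
To give a feel for what tuning the dictionary and word size means, here is a small sketch using Python's standard lzma module. It is an analogy, not the 7-zip tool itself: it writes an .xz stream with the LZMA2 filter rather than a .7z archive, and it assumes that `dict_size` and `nice_len` roughly correspond to 7-zip's "dictionary size" and "word size". The function name and the sample data are invented for the example.

```python
import lzma


def compress_lzma2(data: bytes, dict_size: int = 32 * 1024 * 1024,
                   nice_len: int = 256) -> bytes:
    """Compress with LZMA2 at the top preset, overriding dictionary and match length."""
    filters = [{
        "id": lzma.FILTER_LZMA2,
        "preset": 9 | lzma.PRESET_EXTREME,  # defaults for any option not set below
        "dict_size": dict_size,             # assumed analogue of "dictionary size"
        "nice_len": nice_len,               # assumed analogue of "word size"
    }]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


# Synthetic, highly repetitive sample; a real database dump will behave differently.
sample = b"INSERT INTO demo VALUES (42, 'example row');\n" * 20_000

default_blob = lzma.compress(sample)
# A 1 MiB dictionary keeps this demo light on memory; pass 32 MiB for the setting above.
tuned_blob = compress_lzma2(sample, dict_size=1024 * 1024)

print(f"input:   {len(sample)} bytes")
print(f"default: {len(default_blob)} bytes")
print(f"tuned:   {len(tuned_blob)} bytes")
```

Encoder memory grows with the dictionary size, so the 32MByte setting can need a few hundred MByte of RAM while compressing; whether it pays off depends on the data.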

All in all, 7-zip compresses way better than zip does for me. And as I don't have a need for Windows to index my archives, I gain a lot more storage space this way. However, in case you need to have archive content indexed by Windows, as xtabber states, there could be something interesting here at this link. There you can download a piece of software that can replace the Microsoft Zip functionality with the raw power and options of 7-zip, directly from within the Windows Explorer. No, not an extra context menu item, really replace MS zip with 7-zip. That way you have the best of both worlds.

xtabber:
Raymond.cc had a fairly thorough comparison of archivers a few years ago.  There are a couple that create even smaller archives than 7-Zip, but take even longer to do it.

The problem for me is that formats like 7z may save space and are definitely worthwhile for transmitting large amounts of data, but they are just too slow for my everyday use.  7-Zip is also very good (and fast) for creating ZIP archives and I occasionally use it for that purpose, but my regular archiver is WinRAR because it has a very good GUI with a lot of options, and is much faster at extracting from archives, which I do more often than creating them.

IainB:
@Shades and @xtabber: Thanks for the interesting comments. I hadn't considered that 7-ZIP might be able to make significantly smaller ZIP files, so I tried it out with the same set of documents and it produced a .ZIP file of 61.0MB, which is only slightly smaller than the 61.1MB file that the standard system ZIP created.
So then I tried the "ultra" ZIP compression setting in 7-ZIP and it output 60.6MB - again, not a big difference.
One could probably play around with this all day, but I suspect that at the end of it one would be unlikely to have made a particularly significant dent in it.

Come to think of it, I are now confuzzled as I do not see how the same ZIP standard algorithm is being used if the compressed sizes of the same files would differ as significantly as is being suggested (above), merely from using one ZIP tool or another ZIP tool, so I must be missing something there. I thought that was why there are several different compression tools, each using their own peculiar algorithms for different standards of compression.
However, even if one might be able to make a significantly smaller .ZIP file of the documents, could WDS still open and read/index the contents? I'd have to test that to be sure.
