
Summary

The main goal of this implementation is to provide an example of a compression standard programmed directly for the Free Pascal Compiler, making use of some of its well-designed extensions.
Introduction
To avoid any misunderstanding, the author would like to emphasize that this implementation is not meant to replace or compete with the original Gzip/Gunzip utilities written by Jean-Loup Gailly and Mark Adler. It has been programmed from scratch, specifically using the Free Pascal Compiler (FPC) as the development tool. Two rather irreconcilable objectives have driven this implementation: readability of the code and good performance in terms of execution time.

The compression standard used by Gzip/Gunzip is described in three documents: RFC 1950, RFC 1951 and RFC 1952. These, together with a couple of additional documents found on the site www.gzip.org, laid the basis for the present implementation.
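The division of labour between the RFCs can be made concrete. RFC 1952 defines a gzip member as a 10-byte header (magic bytes 0x1f 0x8b, compression method 8 for deflate, a flag byte, modification time, extra flags and OS code), optional fields selected by the flag bits, a raw deflate stream as specified in RFC 1951, and an 8-byte trailer holding the CRC-32 and the uncompressed length modulo 2^32. RFC 1950 describes the related zlib wrapper, which gzip files do not use. The following Python sketch is not part of GZfpc: it parses a single-member gzip file and, as an assumption made for brevity, delegates the deflate stream to Python's zlib module. The function name `parse_gzip_member` is hypothetical.

```python
import gzip
import io
import struct
import zlib

# Bits of the FLG byte (RFC 1952, section 2.3.1)
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 0x01, 0x02, 0x04, 0x08, 0x10


def parse_gzip_member(data: bytes) -> dict:
    """Parse one gzip member: header fields, inflated payload, verified trailer."""
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack("<BBBBIBB", data[:10])
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip member (bad magic bytes)")
    if cm != 8:
        raise ValueError("compression method is not deflate")
    pos = 10
    if flg & FEXTRA:  # 2-byte little-endian length, then that many bytes
        (xlen,) = struct.unpack("<H", data[pos:pos + 2])
        pos += 2 + xlen
    name = None
    if flg & FNAME:  # zero-terminated original file name (ISO 8859-1)
        end = data.index(b"\x00", pos)
        name = data[pos:end].decode("latin-1")
        pos = end + 1
    if flg & FCOMMENT:  # zero-terminated comment, skipped here
        pos = data.index(b"\x00", pos) + 1
    if flg & FHCRC:  # 2-byte header CRC, not checked in this sketch
        pos += 2

    # Everything after the header is a raw RFC 1951 deflate stream;
    # negative wbits tells zlib not to expect a zlib or gzip wrapper.
    inflater = zlib.decompressobj(-15)
    payload = inflater.decompress(data[pos:]) + inflater.flush()
    if not inflater.eof:
        raise ValueError("truncated deflate stream")
    trailer = inflater.unused_data[:8]
    if len(trailer) < 8:
        raise ValueError("missing trailer")
    crc, isize = struct.unpack("<II", trailer)
    if crc != zlib.crc32(payload):
        raise ValueError("CRC-32 mismatch")
    if isize != len(payload) % 2**32:
        raise ValueError("ISIZE mismatch")
    return {"flags": flg, "mtime": mtime, "xfl": xfl, "os": os_code,
            "name": name, "payload": payload}


if __name__ == "__main__":
    # Build a member carrying an FNAME field with the standard gzip module.
    buf = io.BytesIO()
    with gzip.GzipFile(filename="demo.txt", mode="wb", fileobj=buf, mtime=0) as f:
        f.write(b"to be or not to be " * 50)
    info = parse_gzip_member(buf.getvalue())
    print(info["name"], info["mtime"], len(info["payload"]))
```

Keeping the container parsing apart from the deflate decoding mirrors the way the RFCs themselves separate the two layers.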
What is the Free Pascal Compiler?  
The Free Pascal Compiler (FPC) is a 32-bit Pascal compiler available for various processors and operating systems. This implementation of Gzip/Gunzip was built with FPC version 1.06 for Windows. For more details, visit the site www.freepascal.org. The source code was written with the Scintilla editor, although it should be noted that FPC now comes with its own IDE. Calling it a Pascal compiler is somewhat reductive, since it incorporates syntax constructs and interpretations that were introduced only in descendant languages such as Modula-2 and Borland's Turbo Pascal. As far as the author knows, FPC is the richest development tool to be found in the large family of languages descended from Pascal. Particularly welcome features include procedure and operator overloading and the extended interpretation of pointers, to name just a few.
How the code is organised  
This implementation of Gzip/Gunzip is contained in a package named GZfpc, which can be downloaded from this site and is distributed under the GNU General Public License. Although FPC has no problem with long file names, all unit file names are truncated to eight characters. The compression program (Gzip) is kept separate from the decompression program (Gunzip) because this makes it easier to test code variants and to debug in general. All the constants that control the operation of the two programs are declared in the unit GZformat.

Throughout the units there are statements controlled by the conditional TEST; they are useful for debugging, checking or displaying essential steps of the compression/decompression process. Several units contain validation code, called from their initialisation section, which can be activated by defining the conditional VALIDATION; it may be useful for testing a unit's behaviour on different processors and/or operating systems.
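To illustrate the kind of self-check a VALIDATION conditional can switch on, here is a Python sketch, not taken from GZfpc, of a table-driven CRC-32 (the checksum stored in every gzip trailer) that verifies itself when the module is loaded, using the standard check value 0xCBF43926 for the ASCII string "123456789". Python has no conditional compilation, so the module-level flag `VALIDATION` merely stands in for FPC's `{$IFDEF VALIDATION}`; the names `make_crc_table`, `crc32` and `_validate` are hypothetical.

```python
VALIDATION = True  # stand-in for compiling with {$DEFINE VALIDATION} in FPC

POLY = 0xEDB88320  # reflected CRC-32 polynomial used by gzip (RFC 1952)


def make_crc_table() -> list:
    """Precompute the CRC of every possible byte value (256 entries)."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = make_crc_table()


def crc32(data: bytes, crc: int = 0) -> int:
    """Update a running CRC-32 with `data`; start from crc=0."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def _validate() -> None:
    # Published CRC-32 check value for the nine ASCII digits "123456789".
    if crc32(b"123456789") != 0xCBF43926:
        raise RuntimeError("CRC-32 self-test failed on this platform")
    # Feeding the data in two pieces must give the same result as one call.
    if crc32(b"6789", crc32(b"12345")) != 0xCBF43926:
        raise RuntimeError("incremental CRC-32 self-test failed")


if VALIDATION:
    _validate()  # runs once on import, like code in a unit's initialisation section
```

A check of this kind costs almost nothing at start-up and catches platform-dependent surprises, such as integer width or shift behaviour, before any file is processed.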
Compiling with variations  
Many units contain conditionals that control variants or activate routines displaying statistical information (see the comments inside the units). The idea is to let the reader experiment a little with the code and see the effect of certain choices. Please note that not all combinations of conditionals have been checked, so compilation errors may arise.

The core unit of the compression part is the Deflation unit, whereas the CircularBuffer unit is the centerpiece of decompression. Defining the conditional CHRONO at compile time causes the execution time to be displayed, and defining METER displays a progress bar during execution.
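Why a circular buffer sits at the heart of decompression: RFC 1951 encodes repeated strings as (length, distance) pairs, with distances of at most 32768 bytes, so the decoder only has to remember the last 32 KiB of output. The sketch below shows one common way to organise such a window as a circular buffer. It is written in Python with hypothetical names and is not a transcription of the CircularBuffer unit, whose internals are not described here. Note that the copy proceeds byte by byte, because a match may overlap the bytes it is producing (distance smaller than length).

```python
WINDOW_SIZE = 32768  # RFC 1951: back-reference distances never exceed 32 KiB


class CircularWindow:
    """Hypothetical sliding-window buffer for an inflater (illustration only)."""

    def __init__(self, size: int = WINDOW_SIZE):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.mask = size - 1        # index wrap-around with a cheap AND
        self.buf = bytearray(size)
        self.pos = 0                # total number of bytes written so far
        self.out = bytearray()      # decoded output (a real decoder would flush it)

    def put(self, byte: int) -> None:
        """Append one literal byte to the window and to the output."""
        self.buf[self.pos & self.mask] = byte
        self.pos += 1
        self.out.append(byte)

    def copy_match(self, length: int, distance: int) -> None:
        """Re-emit `length` bytes starting `distance` bytes back in the output."""
        if not 1 <= distance <= min(self.pos, self.size):
            raise ValueError("distance reaches outside the window")
        for _ in range(length):
            # One byte at a time: when distance < length, the bytes just
            # written are read again, which is how deflate encodes runs.
            self.put(self.buf[(self.pos - distance) & self.mask])


if __name__ == "__main__":
    w = CircularWindow()
    for b in b"ab":
        w.put(b)
    w.copy_match(5, 2)  # overlapping copy repeats "ab"
    print(w.out.decode("ascii"))  # prints abababa
```

Choosing a power-of-two window size lets the wrap-around be a bit mask instead of a modulo or a comparison, which matters in the innermost loop of a decoder.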
How does the compression compare?
Gzip and Gunzip have been tested on a varied corpus of files totalling about 470 MB and found to work properly. The compression achieved does not differ significantly from that of the original program; the small differences stem mainly from the different way in which the decision to start a new compressed block is taken.

For example, the Calgary Corpus contained in 'text.compression.corpus.tar' is compressed to 1'065'300 bytes, whereas GZIP.EXE 1.2.4 compresses it to 1'064'912 bytes, a difference of 388 bytes (about 0.04%). As could be expected, the execution time of this compression program is longer (5.9 s against 2.7 s for GZIP 1.2.4 on the author's notebook, roughly 2.2 times slower) owing to several factors, among them the non-optimal code generated by FPC and the code-readability constraint.
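A corpus test of the kind described above boils down to a round trip over every file: compress, decompress, compare with the original, and add up the sizes so that totals such as the Calgary Corpus figures can be compared. Below is a minimal Python sketch; by default it uses Python's own gzip module as a stand-in, and the function name `check_corpus` is hypothetical. To test a reimplementation such as GZfpc, one would pass wrapper functions that run its executables (for instance through Python's subprocess module), and to check interoperability one would compress with one implementation and decompress with the other.

```python
import gzip
import sys
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Tuple


def check_corpus(
    files: Iterable[Path],
    compress: Callable[[bytes], bytes] = gzip.compress,
    decompress: Callable[[bytes], bytes] = gzip.decompress,
) -> Tuple[int, int]:
    """Round-trip each file; return (total original bytes, total compressed bytes)."""
    total_in = total_out = 0
    for path in files:
        original = path.read_bytes()
        packed = compress(original)
        if decompress(packed) != original:
            raise AssertionError(f"round trip failed for {path}")
        total_in += len(original)
        total_out += len(packed)
    return total_in, total_out


if __name__ == "__main__":
    # Usage: python check_corpus.py [DIRECTORY]; every regular file below it is tested.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(tmp)
        if len(sys.argv) == 1:  # no directory given: build a tiny demo corpus
            (root / "text.txt").write_bytes(b"the quick brown fox " * 200)
            (root / "binary.bin").write_bytes(bytes(range(256)) * 4)
        files = [p for p in sorted(root.rglob("*")) if p.is_file()]
        n_in, n_out = check_corpus(files)
        ratio = n_out / n_in if n_in else 0.0
        print(f"{len(files)} files, {n_in} -> {n_out} bytes ({ratio:.1%})")
```

Passing the compressor and decompressor as parameters keeps the harness independent of any particular implementation, so the same corpus can be run against several programs and the totals compared directly.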
Conclusions
This implementation works well, but a look at the source code will show that only some of the features offered by FPC have been used, and by certain standards the code may look somewhat old-fashioned. This apparent limitation resulted from the concern that code making fuller use of those features might not perform well enough. As a matter of fact, it has marginally obscured the clarity of the code, which was originally the main objective pursued. A different approach using the object paradigm is under consideration by the author.

Sometimes the only way to grasp the fine details of a rather complex task like data compression is to code it from scratch yourself. Sometimes, when you follow a path that many others have followed, you may well discover an interesting new variant for achieving the same thing. Sometimes you may even achieve a significant breakthrough, setting a new milestone along the way. The author would be glad if this implementation could contribute an alternative insight into the Gzip/Gunzip compression/decompression process.

Louis JEAN-RICHARD

March 2003

References
  1. Free Pascal - Home Page
  2. Internet RFC/FYI/STD/BCP Archives
  3. The gzip home page
  4. DataCompression.info
  5. GZfpc.zip with sources of the GZip/Gunzip reimplementation
  6. Scintilla free source code editing component
  7. The Calgary Corpus