Data compression is a vital technique for reducing file sizes, which optimizes storage and network transfers by saving disk space and bandwidth. In Linux systems, the gzip algorithm—using the .gz file extension—is a common tool for lossless compression, ensuring that no data is lost during compression or decompression. While compressing large files often yields significant size reductions, smaller files may not benefit due to metadata overhead. Go’s standard library provides robust support for file compression and decompression, particularly through the compress/gzip package, which leverages io.Reader and io.Writer interfaces for flexible data handling. Implementing CLI tools in Go to compress and decompress files involves opening input/output files, utilizing gzip’s Writer and Reader types, and handling error cases such as incorrect file extensions or potential issues like zip bombs in untrusted inputs. Practical enhancements—like customizable output file names, size limits to prevent resource exhaustion, and control over whether to keep or remove originals—can further improve such tools for real-world use.
When it comes to working with both files and data on your system, at some point you'll encounter a concept called data compression. Data compression involves the use of compression algorithms in order to reduce the size of data, be it stored in memory or on disk through a file.
By doing so, it allows for a number of key benefits, such as:
- Being able to send data across the network in an optimised way
- Reducing bandwidth and therefore saving time
- Reducing the amount of disk space taken up on your file system when you archive data
Working with Gzip Files
In Linux, the most common form of file compression you'll encounter is files ending in the `.gz` extension, which have been compressed using the gzip algorithm.
Example: Examining a Compressed File
For example, if we take a look at my terminal and quickly run an `ls -lah`:

```shell
ls -lah
```
Here you can see I have a file called `dockerfile.gz`, which is a file that has been compressed using the gzip algorithm. If I go ahead and print out the contents of this file:

```shell
cat dockerfile.gz
```

You can see it contains a bunch of garbage. Although this isn't actually garbage: it's the ASCII representation of binary data. However, it's not exactly human readable. You'll also notice that this file's size is 1.5 kilobytes.
Uncompressing Files
If I go ahead and uncompress this file using the `gunzip` command:

```shell
gunzip dockerfile.gz
```
This time, if I print the contents of this directory, you can see that my `dockerfile.gz` file no longer exists and has been replaced with a standard `dockerfile`, which is 3.3 kilobytes in size.

```shell
cat dockerfile
```

Now the contents are readable in human form.
Compressing Files
As well as being able to unzip files using the `gunzip` command on Unix systems, we can also zip files using the `gzip` command:

```shell
gzip -k dockerfile
```

I'm going to pass in the `-k` flag, which tells gzip to keep the original file. This time, if I print the contents of my directory, you can see I have both the `dockerfile` and the `dockerfile.gz`. The latter is the zipped version that ends in the `.gz` extension, standing for gzip.
Additionally, you can see that the dockerfile was 3.3 kilobytes beforehand, and after compression it's 1.5 kilobytes: about half the size it was before.
Important Note: File Size Matters
One thing to note about file compression: if I go ahead and copy a file from the counter project:

```shell
cp words.txt .
ls -lah words.txt
```

You can see the `words.txt` is only 24 bytes. If I go ahead and use gzip on `words.txt`:

```shell
gzip -k words.txt
ls -lah
```
You can see this time the file size has actually increased. This is because smaller files don't compress well: the compression algorithm has to add metadata to the file (such as the header and checksum) in order for decompression to work, and for tiny inputs that overhead outweighs any savings.
Key Point: Typically, you don't gain any benefit from compressing smaller files. Compression is worth saving for files that are a bit larger.
Lossless vs Lossy Compression
One thing you'll notice is that whenever we compress and decompress files, we don't lose any data. For instance, compare the dockerfile we compressed with the dockerfile we uncompressed:

```shell
mv dockerfile dockerfile_original
gunzip dockerfile.gz
diff dockerfile dockerfile_original
```

You'll see there's no difference between the two. This is because gzip is known as a lossless compression algorithm, meaning that no data is lost when it compresses or decompresses, which for text and data is very useful.
That being said, there are some situations where you'll want lossy compression, where perfect data fidelity doesn't matter too much. These are things such as videos and images, where a slight loss of data doesn't noticeably impact the overall outcome.
Note: For the rest of this video, we're going to be focusing on gzip, which is known as a lossless algorithm, so no data is lost in the compression or decompression operations.
Implementing Gzip in Go
Being able to reduce the size of a file is very useful for a number of operations when working with computers. As I mentioned before, especially when it comes to:
- Sending data over the network (reduces bandwidth and saves time)
- Archiving data on your file system (allows smaller files which can easily be uncompressed later)
Therefore, in this video, we're going to take a look at how we can actually do this when it comes to our own Go code.
Go's Compress Package
To do so in Go is actually rather simple, as the Go standard library provides the `compress` package, which contains a number of different compression algorithms underneath it, such as:

- `bzip2` - implements bzip2 decompression
- `flate` - implements the DEFLATE compression algorithm
- `gzip` - the one we're going to be using
- `lzw` and `zlib` as well
In our case, we're just going to focus on the gzip compression algorithm, and we're going to do so to implement our own gzip compression tool.
Project Setup
I have a sample project that I've already set up, which you can clone down yourself using the link in the description below. This CLI tool provides two subcommands using the same approach that we saw in the last lesson:
- `compress` - calls the compress function
- `decompress` - calls the decompress function

Both of these functions accept a filename as an argument, which comes from `os.Args[1]`.
Let's test the current setup:
```shell
go build
./compress compress foo.txt
```

Here you can see that we need to implement the compressing of a file `foo.txt`, with the file being passed in as the second argument.
Implementing the Compress Function
First, beginning with the compress function. In order to begin, we're going to need a file to compress. For this video, I'm going to use the `lots_of_words.txt` file that we saw back at the beginning of the course, which was just short of one gigabyte in size.
Step 1: Import Required Packages
```go
import (
    "compress/gzip"
    "fmt"
    "io"
    "log"
    "os"
)
```
Step 2: Open the Input File
```go
func compress(fileName string) {
    // Open the file for reading
    inFile, err := os.Open(fileName)
    if err != nil {
        log.Fatalf("Error opening input file: %v", err)
    }
    defer inFile.Close() // ABC: Always Be Closing
}
```
Step 3: Create the Output File
```go
// Create the output file name with .gz extension
zippedFileName := fmt.Sprintf("%s.gz", fileName)

// Create or truncate the output file
outFile, err := os.Create(zippedFileName)
if err != nil {
    log.Fatalf("Error creating output file: %v", err)
}
defer outFile.Close()
```
Step 4: Create the Gzip Writer
The `gzip` package provides the `Writer` type, which is an `io.WriteCloser`. Writes to the writer are compressed and written to the underlying writer.
```go
// Create a new gzip writer
wr := gzip.NewWriter(outFile)
defer wr.Close() // IMPORTANT: Must close to write correct headers
```
Important: It is the caller's responsibility to call `Close()` on the writer when done. This is really important when working with the gzip package: for the file to be written out completely, with the correct gzip headers, the close method must be called, because writes may be buffered and not flushed until then.
Step 5: Copy Data
```go
// Copy data from input file to gzip writer
_, err = io.Copy(wr, inFile)
if err != nil {
    log.Fatalf("Error copying data: %v", err)
}
```
Testing the Compress Function
```shell
go build
./compress compress lots_of_words.txt
```
This operation should take a bit of time. When I ran this using the gzip command on the actual CLI, it took about one minute in total, although I suspect Go might be a little quicker.

After 34 seconds...
```shell
ls -lah
```
Now if I go ahead and print out the contents of the directory, you can see we have the `lots_of_words.txt.gz`, which is now 469 megabytes: 46% smaller than the original file size.
Let's verify the compression worked correctly:
```shell
# Backup the original
mv lots_of_words.txt lots_of_words.txt.back

# Decompress using system gzip
gunzip -k lots_of_words.txt.gz

# Compare the files
diff lots_of_words.txt lots_of_words.txt.back
```
Great! No difference between the two files, and they're the same size as well.
Implementing the Decompress Function
The decompress function is going to be very similar to the compress function, just with the logic inverted.
Step 1: Validate File Extension
```go
import (
    "path/filepath"
    "strings"
)
```

```go
func decompress(fileName string) {
    // Check that the file has a .gz extension
    ext := filepath.Ext(fileName)
    if ext != ".gz" {
        log.Fatalf("Expected .gz extension")
    }
}
```
Step 2: Create Output Filename
```go
// Remove the .gz extension for the output filename
outFileName := strings.TrimSuffix(fileName, ".gz")

// Or more explicitly:
// outFileName := strings.TrimSuffix(fileName, ext)
```
Step 3: Open Input File and Create Gzip Reader
```go
// Open input file
inFile, err := os.Open(fileName)
if err != nil {
    log.Fatalf("Error opening input file: %v", err)
}
defer inFile.Close()

// Create gzip reader
r, err := gzip.NewReader(inFile)
if err != nil {
    log.Fatalf("Error creating gzip reader: %v", err)
}
defer r.Close()
```
Step 4: Create Output File with Safe Flags
Instead of using `os.Create()`, which will overwrite existing files, it's better to use `os.OpenFile()` with specific flags:
```go
// Create the output file with safe flags
outFile, err := os.OpenFile(outFileName,
    os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0644)
if err != nil {
    log.Fatalf("Error creating output file: %v", err)
}
defer outFile.Close()
```
The flags used:

- `os.O_WRONLY` - write only
- `os.O_CREATE` - create the file if it doesn't exist
- `os.O_EXCL` - fail if the file already exists (prevents overwriting)
Step 5: Copy Decompressed Data
```go
// Copy decompressed data to the output file
_, err = io.Copy(outFile, r)
if err != nil {
    log.Fatalf("Error copying data: %v", err)
}
```
Security Consideration: Zip Bombs
When it comes to copying data from a zipped file, especially in production systems or when reading files from untrusted sources, there's the potential of receiving what's known as a zip bomb.
A zip bomb is a malicious archive designed to crash or disable the system that reads it, either through an extreme compression ratio or through recursively nested archives. A very small file, the classic example being around 42 kilobytes of compressed data, can expand to something like 4.5 petabytes when unzipped, causing your system to run out of available disk space or memory.
For production systems, use `io.CopyN()` instead, which stops after a fixed number of bytes:

```go
// Safer approach for untrusted files
maxBytes := int64(100 * 1024 * 1024) // 100MB limit
_, err = io.CopyN(outFile, r, maxBytes)
```

Note that `io.CopyN` returns `io.EOF` when the source ends before the limit, which in this case is the expected outcome rather than a failure.
Testing the Decompress Function
```shell
go build

./compress decompress lots_of_words.txt.back
# Error: Expected .gz extension (good - our guard works!)

./compress decompress lots_of_words.txt.gz
```

After about 5-6 seconds, we should have a `lots_of_words.txt` file:

```shell
ls -lah
diff lots_of_words.txt lots_of_words.txt.back
# No difference - success!

# Try to decompress again
./compress decompress lots_of_words.txt.gz
# Error: file already exists (our safety flags work!)
```
Key Takeaways
We've managed to take a look at how we can use the `compress/gzip` package to perform compression and decompression of both files and data. Because the gzip package makes use of the `io.Writer` and `io.Reader` interfaces under the hood, you can use these compression types with any data stream you like, such as:
- Files (as we've seen already)
- TCP sockets
- HTTP request/response bodies
- Standard in-memory data (such as `bytes.Buffer` or `io.Pipe`)
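To illustrate the stream-based point, here's a small sketch of a gzip round trip that never touches a file, using an in-memory `bytes.Buffer` as the underlying stream (the `gzipRoundTrip` helper is my own illustration):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipRoundTrip compresses data into an in-memory buffer and
// decompresses it again, showing that gzip works with any
// io.Writer / io.Reader pair, not just files.
func gzipRoundTrip(data []byte) ([]byte, error) {
	var buf bytes.Buffer

	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // flush and write the footer
		return nil, err
	}

	zr, err := gzip.NewReader(&buf)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	out, err := gzipRoundTrip([]byte("stream me"))
	fmt.Println(string(out), err) // stream me <nil>
}
```

The same `gzip.NewWriter` / `gzip.NewReader` calls would work unchanged over a TCP connection or an HTTP body, since those also satisfy `io.Writer` and `io.Reader`.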
Homework Tasks
As mentioned throughout this lesson, there are a few tasks to perform as homework to improve the functionality of this code:
Task 1: Add `-o` Flag for Output Filename

Add the `-o` flag to both the compress and decompress functions to specify custom output filenames.

Example:

```shell
./compress compress -o words.txt.gz lots_of_words.txt
./compress decompress -o words.txt lots_of_words.txt.gz
```

Make sure to validate that the compression output ends in `.gz`.
Task 2: Implement Size Limits with `io.CopyN`

Change the `io.Copy` call to use `io.CopyN`, specifying a maximum number of bytes. Test with a lower size (like 100 bytes) against the `lots_of_words.txt` file.

Add a `-l` flag to specify the size limit in bytes. For an advanced challenge, support size suffixes like "2g" for 2 gigabytes.
Task 3: Automatic File Deletion with `-k` Flag

Add automatic deletion of input files (similar to the standard `gzip` command's behaviour) and implement the `-k` flag to keep original files when needed.
Examples:

```shell
# Standard behavior - deletes input file
./compress compress lots_of_words.txt

# Keep original file
./compress compress -k lots_of_words.txt
```

Use the `os.Remove()` function for file deletion.
Once you've completed these tasks, you'll be ready to move on to the next lesson!
Add `-o` Flag for Output Filename
Add the `-o` flag to both the compress and decompress functions to allow users to specify custom output filenames instead of using the default naming convention.
Requirements:

- Implement `-o` flag support for both compress and decompress commands
- Validate that compression output ends in the `.gz` extension
- Handle cases where the output file already exists
Example Usage:

```shell
./compress compress -o words.txt.gz lots_of_words.txt
./compress decompress -o words.txt lots_of_words.txt.gz
```
Implementation Notes:

- Use Go's `flag` package to parse the `-o` option
- Maintain existing functionality when the `-o` flag is not provided
- Ensure proper error handling for invalid output filenames
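One possible shape for the argument parsing, using a `flag.FlagSet` per subcommand. This is a sketch of one approach, not the course's reference solution; the function name and defaulting behaviour are my own choices:

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseCompressArgs parses the arguments that follow the
// "compress" subcommand, returning the input filename and the
// output filename (defaulting to <input>.gz when -o is absent).
func parseCompressArgs(args []string) (inFile, outFile string, err error) {
	fs := flag.NewFlagSet("compress", flag.ContinueOnError)
	out := fs.String("o", "", "output filename (defaults to <input>.gz)")
	if err := fs.Parse(args); err != nil {
		return "", "", err
	}
	inFile = fs.Arg(0) // first positional argument after the flags
	outFile = *out
	if outFile == "" {
		outFile = inFile + ".gz" // keep the existing default behaviour
	}
	if !strings.HasSuffix(outFile, ".gz") {
		return "", "", fmt.Errorf("output file must end in .gz")
	}
	return inFile, outFile, nil
}

func main() {
	in, out, err := parseCompressArgs(
		[]string{"-o", "words.txt.gz", "lots_of_words.txt"})
	fmt.Println(in, out, err)
}
```

Note that Go's `flag` package stops parsing at the first non-flag argument, so the `-o` flag must appear before the input filename, as in the example usage above.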
Implement Size Limits with `io.CopyN`
Replace the `io.Copy` call with `io.CopyN` to specify a maximum number of bytes that can be processed, protecting against zip bombs and controlling resource usage.
Requirements:

- Change `io.Copy` to `io.CopyN` in both compress and decompress functions
- Add a `-l` flag to specify the size limit in bytes
- Test with a lower size (like 100 bytes) against the `lots_of_words.txt` file
- Advanced Challenge: Support size suffixes like "2g" for 2 gigabytes or "100m" for 100 megabytes
Example Usage:

```shell
./compress compress -l 1048576 lots_of_words.txt   # 1MB limit
./compress decompress -l 2g compressed_file.gz     # 2GB limit (advanced)
```
Implementation Notes:

- Use `io.CopyN(dst, src, n)` instead of `io.Copy(dst, src)`
- Handle the case where the limit is reached before all data is processed
- For the advanced challenge, parse size suffixes (k, m, g) and convert them to bytes
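For the advanced challenge, a suffix parser could look something like this sketch. The function name and the choice of binary units (k = 1024, and so on) are my own assumptions about the exercise:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts strings like "100", "512k", "100m" or "2g"
// into a byte count, using binary units (k=1024, m=1024^2, g=1024^3).
func parseSize(s string) (int64, error) {
	s = strings.ToLower(strings.TrimSpace(s))
	mult := int64(1)
	switch {
	case strings.HasSuffix(s, "k"):
		mult, s = 1<<10, strings.TrimSuffix(s, "k")
	case strings.HasSuffix(s, "m"):
		mult, s = 1<<20, strings.TrimSuffix(s, "m")
	case strings.HasSuffix(s, "g"):
		mult, s = 1<<30, strings.TrimSuffix(s, "g")
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", s, err)
	}
	return n * mult, nil
}

func main() {
	n, _ := parseSize("2g")
	fmt.Println(n) // 2147483648
}
```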
Automatic File Deletion with `-k` Flag
Implement automatic deletion of input files (matching the standard `gzip` command's behaviour) and add the `-k` flag to keep original files when needed.
Requirements:

- By default, delete the input file after successful compression/decompression
- Add a `-k` flag to keep the original file
- Only delete files if the operation completes successfully
- Use the `os.Remove()` function for file deletion
Example Usage:

```shell
# Standard behavior - deletes input file after compression
./compress compress lots_of_words.txt

# Keep original file
./compress compress -k lots_of_words.txt

# Decompress and delete the .gz file
./compress decompress lots_of_words.txt.gz

# Decompress and keep the .gz file
./compress decompress -k lots_of_words.txt.gz
```
Implementation Notes:
- Ensure file deletion only occurs after successful operations
- Handle permission errors gracefully when attempting to delete files
- Consider using defer statements for cleanup in error scenarios
- Test with files that have different permissions to ensure robust error handling