Data compression is a vital technique for reducing file sizes, which optimizes storage and network transfers by saving disk space and bandwidth. In Linux systems, the gzip algorithm—using the .gz file extension—is a common tool for lossless compression, ensuring that no data is lost during compression or decompression. While compressing large files often yields significant size reductions, smaller files may not benefit due to metadata overhead. Go’s standard library provides robust support for file compression and decompression, particularly through the compress/gzip package, which leverages io.Reader and io.Writer interfaces for flexible data handling. Implementing CLI tools in Go to compress and decompress files involves opening input/output files, utilizing gzip’s Writer and Reader types, and handling error cases such as incorrect file extensions or potential issues like zip bombs in untrusted inputs. Practical enhancements—like customizable output file names, size limits to prevent resource exhaustion, and control over whether to keep or remove originals—can further improve such tools for real-world use.
When it comes to working with both files and data on your system, at some point you'll encounter a concept called data compression. Data compression involves the use of compression algorithms in order to reduce the size of data, be it stored in memory or on disk through a file.
By doing so, it allows for a number of key benefits, such as:
- Being able to send data across the network in an optimised way
- Reducing bandwidth and therefore saving time
- Reducing the amount of disk space taken up on your file system when you archive data
Working with Gzip Files
In Linux, the most common form of file compression you'll encounter is files ending in the `.gz` extension, which have been compressed using the gzip algorithm.
Example: Examining a Compressed File
For example, if we take a look at my terminal and quickly run an `ls -lah`:

```shell
ls -lah
```
Here you can see I have a file called `dockerfile.gz`, which is a file that has been compressed using the gzip algorithm. If I go ahead and print out the contents of this file:

```shell
cat dockerfile.gz
```

You can see it contains a bunch of garbage. Although this isn't actually garbage: it's the ASCII representation of binary data. However, it's not exactly human readable. You'll also notice that this file's size is 1.5 kilobytes.
Uncompressing Files
If I go ahead and uncompress this file using the `gunzip` command:

```shell
gunzip dockerfile.gz
```
This time, if I print the contents of this directory, you can see that my `dockerfile.gz` file no longer exists and has been replaced with a standard `dockerfile`, which is 3.3 kilobytes in size.

```shell
cat dockerfile
```

Now the contents are readable in human form.
Compressing Files
As well as being able to unzip files using the `gunzip` command on Unix systems, we can also zip files using the `gzip` command:

```shell
gzip -k dockerfile
```

I'm going to pass in the `-k` flag, which tells gzip to keep the original file. This time, if I print the contents of my directory, you can see I have both the `dockerfile` and the `dockerfile.gz`. The latter is the zipped version that ends in the `.gz` extension, standing for gzip.
Additionally, you can see that the dockerfile was 3.3 kilobytes beforehand, and after compression it's 1.5 kilobytes: about half the size it was before.
Important Note: File Size Matters
One thing to note about file compression: if I go ahead and copy a file from the counter project:

```shell
cp words.txt .
ls -lah words.txt
```

You can see the `words.txt` is only 24 bytes. If I go ahead and use gzip on `words.txt`:

```shell
gzip -k words.txt
ls -lah
```
You can see this time the file size has actually increased. This is because smaller files don't compress well: the compression algorithm has to add metadata to the file (such as the header and checksum) in order for decompression to work, and for tiny inputs that overhead outweighs any savings.
Key Point: Typically, you don't gain any benefit from compressing smaller files. Compression is worth saving for files that are a bit larger.
Lossless vs Lossy Compression
One thing you'll notice is that whenever we compress and decompress files, we don't lose any data. For instance, compare the dockerfile we compressed with the dockerfile we uncompressed:

```shell
mv dockerfile dockerfile_original
gunzip dockerfile.gz
diff dockerfile dockerfile_original
```

You'll see there's no difference between the two. This is because gzip is known as a lossless compression algorithm, meaning that no data is lost when it compresses or decompresses, which for text and data is very useful.
That being said, there are some situations where you'll want lossy compression, where perfect data fidelity doesn't matter too much. These are things such as videos and images, where a slight loss of data doesn't noticeably impact the overall outcome.
Note: For the rest of this video, we're going to be focusing on gzip, which is known as a lossless algorithm, so no data is lost in the compression or decompression operations.
Implementing Gzip in Go
Being able to reduce the size of a file is very useful for a number of operations when working with computers. As I mentioned before, especially when it comes to:
- Sending data over the network (reduces bandwidth and saves time)
- Archiving data on your file system (allows smaller files which can easily be uncompressed later)
Therefore, in this video, we're going to take a look at how we can actually do this when it comes to our own Go code.
Go's Compress Package
To do so in Go is actually rather simple, as the Go standard library provides the `compress` package, which contains a number of different compression algorithms underneath it, such as:

- `bzip2` - implements bzip2 decompression
- `flate` - implements the DEFLATE compression algorithm
- `gzip` - the one we're going to be using
- `lzw` and `zlib` as well
In our case, we're just going to focus on the gzip compression algorithm, and we're going to do so to implement our own gzip compression tool.
Project Setup
I have a sample project that I've already set up, which you can clone down yourself using the link in the description below. This CLI tool provides two subcommands using the same approach that we saw in the last lesson:
- `compress` - calls the compress function
- `decompress` - calls the decompress function

Both of these functions accept a filename as an argument, which comes from `os.Args[1]`.
Let's test the current setup:
```shell
go build
./compress compress foo.txt
```

Here you can see that we need to implement the compressing of a file `foo.txt`, with the file being passed in as the second argument.
Implementing the Compress Function
First, beginning with the compress function. In order to begin, we're going to need a file to compress. For this video, I'm going to use the `lots_of_words.txt` file that we saw back at the beginning of the course, which was just short of one gigabyte in size.
Step 1: Import Required Packages
```go
import (
    "compress/gzip"
    "fmt"
    "io"
    "log"
    "os"
)
```
Step 2: Open the Input File
```go
func compress(fileName string) {
    // Open the file for reading
    inFile, err := os.Open(fileName)
    if err != nil {
        log.Fatalf("Error opening input file: %v", err)
    }
    defer inFile.Close() // ABC: Always Be Closing
}
```
Step 3: Create the Output File
```go
// Create the output file name with .gz extension
zippedFileName := fmt.Sprintf("%s.gz", fileName)

// Create or truncate the output file
outFile, err := os.Create(zippedFileName)
if err != nil {
    log.Fatalf("Error creating output file: %v", err)
}
defer outFile.Close()
```
Step 4: Create the Gzip Writer
The `gzip` package provides the `Writer` type, which is an `io.WriteCloser`. Writes to the writer are compressed and written to the underlying writer.
```go
// Create a new gzip writer
wr := gzip.NewWriter(outFile)
defer wr.Close() // IMPORTANT: Must close to write correct headers
```
Important: It is the caller's responsibility to call `Close()` on the writer when done. This is really important when working with the gzip package: for the file to be written out completely, with the correct gzip headers, the close method must be called, because writes may be buffered and not flushed until then.
Step 5: Copy Data
```go
// Copy data from input file to gzip writer
_, err = io.Copy(wr, inFile)
if err != nil {
    log.Fatalf("Error copying data: %v", err)
}
```
Testing the Compress Function
```shell
go build
./compress compress lots_of_words.txt
```
This operation should take a bit of time. When I ran this using the gzip command on the actual CLI, it took about one minute in total, although I suspect Go might be a little quicker.

After 34 seconds...
```shell
ls -lah
```
Now if I go ahead and print out the contents of the directory, you can see we have the `lots_of_words.txt.gz`, which is now 469 megabytes: 46% smaller than the original file size.
Let's verify the compression worked correctly:
```shell
# Backup the original
mv lots_of_words.txt lots_of_words.txt.back

# Decompress using system gzip
gunzip -k lots_of_words.txt.gz

# Compare the files
diff lots_of_words.txt lots_of_words.txt.back
```
Great! No difference between the two files, and they're the same size as well.
Implementing the Decompress Function
The decompress function is going to be very similar to the compress function, just with the logic inverted.
Step 1: Validate File Extension
```go
import (
    "path/filepath"
    "strings"
)
```

```go
func decompress(fileName string) {
    // Check that the file has a .gz extension
    ext := filepath.Ext(fileName)
    if ext != ".gz" {
        log.Fatalf("Expected .gz extension")
    }
}
```
Step 2: Create Output Filename
```go
// Remove the .gz extension for the output filename
outFileName := strings.TrimSuffix(fileName, ".gz")

// Or more explicitly:
// outFileName := strings.TrimSuffix(fileName, ext)
```
Step 3: Open Input File and Create Gzip Reader
```go
// Open input file
inFile, err := os.Open(fileName)
if err != nil {
    log.Fatalf("Error opening input file: %v", err)
}
defer inFile.Close()

// Create gzip reader
r, err := gzip.NewReader(inFile)
if err != nil {
    log.Fatalf("Error creating gzip reader: %v", err)
}
defer r.Close()
```
Step 4: Create Output File with Safe Flags
Instead of using `os.Create()`, which will overwrite existing files, it's better to use `os.OpenFile()` with specific flags:
```go
// Create the output file with safe flags
outFile, err := os.OpenFile(outFileName,
    os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0644)
if err != nil {
    log.Fatalf("Error creating output file: %v", err)
}
defer outFile.Close()
```
The flags used:

- `os.O_WRONLY` - write only
- `os.O_CREATE` - create the file if it doesn't exist
- `os.O_EXCL` - fail if the file already exists (prevents overwriting)
Step 5: Copy Decompressed Data
```go
// Copy decompressed data to the output file
_, err = io.Copy(outFile, r)
if err != nil {
    log.Fatalf("Error copying data: %v", err)
}
```
Security Consideration: Zip Bombs
When it comes to copying data from a zipped file, especially in production systems or when reading files from untrusted sources, there's the potential of receiving what's known as a zip bomb.
A zip bomb is a malicious archive designed to crash or disable the system that reads it, either through an extreme compression ratio or through recursively nested archives. A very small file, the classic example being around 42 kilobytes of compressed data, can expand to something like 4.5 petabytes when unzipped, causing your system to run out of available disk space or memory.
For production systems, use `io.CopyN()` instead, which stops after a fixed number of bytes:

```go
// Safer approach for untrusted files
maxBytes := int64(100 * 1024 * 1024) // 100MB limit
_, err = io.CopyN(outFile, r, maxBytes)
```

Note that `io.CopyN` returns `io.EOF` when the source ends before the limit, which in this case is the expected outcome rather than a failure.
Testing the Decompress Function
```shell
go build

./compress decompress lots_of_words.txt.back
# Error: Expected .gz extension (good - our guard works!)

./compress decompress lots_of_words.txt.gz
```

After about 5-6 seconds, we should have a `lots_of_words.txt` file:

```shell
ls -lah
diff lots_of_words.txt lots_of_words.txt.back
# No difference - success!

# Try to decompress again
./compress decompress lots_of_words.txt.gz
# Error: file already exists (our safety flags work!)
```
Key Takeaways
We've managed to take a look at how we can use the `compress/gzip` package to perform compression and decompression of both files and data. Because the gzip package makes use of the `io.Writer` and `io.Reader` interfaces under the hood, you can use these compression types with any data stream you like, such as:
- Files (as we've seen already)
- TCP sockets
- HTTP request/response bodies
- Standard in-memory data (such as `bytes.Buffer` or `io.Pipe`)
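To illustrate the stream-based point, here's a small sketch of a gzip round trip that never touches a file, using an in-memory `bytes.Buffer` as the underlying stream (the `gzipRoundTrip` helper is my own illustration):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipRoundTrip compresses data into an in-memory buffer and
// decompresses it again, showing that gzip works with any
// io.Writer / io.Reader pair, not just files.
func gzipRoundTrip(data []byte) ([]byte, error) {
	var buf bytes.Buffer

	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // flush and write the footer
		return nil, err
	}

	zr, err := gzip.NewReader(&buf)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	out, err := gzipRoundTrip([]byte("stream me"))
	fmt.Println(string(out), err) // stream me <nil>
}
```

The same `gzip.NewWriter` / `gzip.NewReader` calls would work unchanged over a TCP connection or an HTTP body, since those also satisfy `io.Writer` and `io.Reader`.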
Homework Tasks
As mentioned throughout this lesson, there are a few tasks to perform as homework to improve the functionality of this code:
Task 1: Add `-o` Flag for Output Filename

Add the `-o` flag to both the compress and decompress functions to specify custom output filenames.

Example:

```shell
./compress compress -o words.txt.gz lots_of_words.txt
./compress decompress -o words.txt lots_of_words.txt.gz
```

Make sure to validate that the compression output ends in `.gz`.
Task 2: Implement Size Limits with `io.CopyN`

Change the `io.Copy` call to use `io.CopyN`, specifying a maximum number of bytes. Test with a lower size (like 100 bytes) against the `lots_of_words.txt` file.

Add a `-l` flag to specify the size limit in bytes. For an advanced challenge, support size suffixes like "2g" for 2 gigabytes.
Task 3: Automatic File Deletion with `-k` Flag

Add automatic deletion of input files (similar to the standard `gzip` command's behaviour) and implement the `-k` flag to keep original files when needed.
Examples:

```shell
# Standard behavior - deletes input file
./compress compress lots_of_words.txt

# Keep original file
./compress compress -k lots_of_words.txt
```

Use the `os.Remove()` function for file deletion.
Once you've completed these tasks, you'll be ready to move on to the next lesson!
Add `-o` Flag for Output Filename
Add the `-o` flag to both the compress and decompress functions to allow users to specify custom output filenames instead of using the default naming convention.
Requirements:

- Implement `-o` flag support for both compress and decompress commands
- Validate that compression output ends in the `.gz` extension
- Handle cases where the output file already exists
Example Usage:

```shell
./compress compress -o words.txt.gz lots_of_words.txt
./compress decompress -o words.txt lots_of_words.txt.gz
```
Implementation Notes:

- Use Go's `flag` package to parse the `-o` option
- Maintain existing functionality when the `-o` flag is not provided
- Ensure proper error handling for invalid output filenames
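One possible shape for the argument parsing, using a `flag.FlagSet` per subcommand. This is a sketch of one approach, not the course's reference solution; the function name and defaulting behaviour are my own choices:

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseCompressArgs parses the arguments that follow the
// "compress" subcommand, returning the input filename and the
// output filename (defaulting to <input>.gz when -o is absent).
func parseCompressArgs(args []string) (inFile, outFile string, err error) {
	fs := flag.NewFlagSet("compress", flag.ContinueOnError)
	out := fs.String("o", "", "output filename (defaults to <input>.gz)")
	if err := fs.Parse(args); err != nil {
		return "", "", err
	}
	inFile = fs.Arg(0) // first positional argument after the flags
	outFile = *out
	if outFile == "" {
		outFile = inFile + ".gz" // keep the existing default behaviour
	}
	if !strings.HasSuffix(outFile, ".gz") {
		return "", "", fmt.Errorf("output file must end in .gz")
	}
	return inFile, outFile, nil
}

func main() {
	in, out, err := parseCompressArgs(
		[]string{"-o", "words.txt.gz", "lots_of_words.txt"})
	fmt.Println(in, out, err)
}
```

Note that Go's `flag` package stops parsing at the first non-flag argument, so the `-o` flag must appear before the input filename, as in the example usage above.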
Implement Size Limits with `io.CopyN`
Replace the `io.Copy` call with `io.CopyN` to specify a maximum number of bytes that can be processed, protecting against zip bombs and controlling resource usage.
Requirements:

- Change `io.Copy` to `io.CopyN` in both compress and decompress functions
- Add a `-l` flag to specify the size limit in bytes
- Test with a lower size (like 100 bytes) against the `lots_of_words.txt` file
- Advanced Challenge: Support size suffixes like "2g" for 2 gigabytes or "100m" for 100 megabytes
Example Usage:

```shell
./compress compress -l 1048576 lots_of_words.txt   # 1MB limit
./compress decompress -l 2g compressed_file.gz     # 2GB limit (advanced)
```
Implementation Notes:

- Use `io.CopyN(dst, src, n)` instead of `io.Copy(dst, src)`
- Handle the case where the limit is reached before all data is processed
- For the advanced challenge, parse size suffixes (k, m, g) and convert them to bytes
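For the advanced challenge, a suffix parser could look something like this sketch. The function name and the choice of binary units (k = 1024, and so on) are my own assumptions about the exercise:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts strings like "100", "512k", "100m" or "2g"
// into a byte count, using binary units (k=1024, m=1024^2, g=1024^3).
func parseSize(s string) (int64, error) {
	s = strings.ToLower(strings.TrimSpace(s))
	mult := int64(1)
	switch {
	case strings.HasSuffix(s, "k"):
		mult, s = 1<<10, strings.TrimSuffix(s, "k")
	case strings.HasSuffix(s, "m"):
		mult, s = 1<<20, strings.TrimSuffix(s, "m")
	case strings.HasSuffix(s, "g"):
		mult, s = 1<<30, strings.TrimSuffix(s, "g")
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", s, err)
	}
	return n * mult, nil
}

func main() {
	n, _ := parseSize("2g")
	fmt.Println(n) // 2147483648
}
```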
Automatic File Deletion with `-k` Flag
Implement automatic deletion of input files (matching the standard `gzip` command's behaviour) and add the `-k` flag to keep original files when needed.
Requirements:

- By default, delete the input file after successful compression/decompression
- Add a `-k` flag to keep the original file
- Only delete files if the operation completes successfully
- Use the `os.Remove()` function for file deletion
Example Usage:

```shell
# Standard behavior - deletes input file after compression
./compress compress lots_of_words.txt

# Keep original file
./compress compress -k lots_of_words.txt

# Decompress and delete the .gz file
./compress decompress lots_of_words.txt.gz

# Decompress and keep the .gz file
./compress decompress -k lots_of_words.txt.gz
```
Implementation Notes:
- Ensure file deletion only occurs after successful operations
- Handle permission errors gracefully when attempting to delete files
- Consider using defer statements for cleanup in error scenarios
- Test with files that have different permissions to ensure robust error handling