Command Line Applications in Go


Powerful Command Line Applications

Regular expressions

Regular expressions (regex) are powerful tools for searching, matching, and manipulating patterns within text strings. Instead of manually iterating through characters to count spaces or words, regex expresses these tasks as concise patterns, and the same techniques apply across many programming languages. The lesson demonstrates how to use Go's regexp package to implement a straightforward word counting algorithm. It covers defining patterns for word and whitespace detection, using capture groups to extract specific data such as email domains, and replacing text that matches a pattern. Because regex syntax is a large topic, the lesson recommends consulting dedicated resources to master it, and emphasizes the versatility and efficiency regex brings to string handling.

Back when we were implementing our word counter at the start of the course, we wrote our own simple algorithm that detected whenever a word ended and a space began, and used that to count the number of words in a text string.

Whilst we ended up implementing that algorithm with the io.Reader type so that it could handle files of any size, we initially did it with a string (or a slice of bytes), iterating through each character, or rune, to determine whether it was a whitespace character.

The Problem with Custom Algorithms

When it comes to strings and patterns, implementing your own algorithm is a perfectly viable option, but there is often a simpler and more reusable approach.

Enter Regular Expressions

That approach is to use regular expressions, or regex for short, which let you:

Find patterns in text
Match specific patterns
Replace parts of a string

Regular expressions perform pattern matching on text, and the patterns themselves are also written as text.
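As a quick taste of those three operations, here is a minimal sketch using Go's standard regexp package on a made-up sentence. The pattern `\d+` (one or more digits) is explained later in the lesson.

```go
package main

import (
	"fmt"
	"regexp"
)

// digits matches one or more consecutive digits.
var digits = regexp.MustCompile(`\d+`)

func main() {
	text := "Order 66 shipped in 3 boxes"

	// Matching: does the pattern occur anywhere in the text?
	fmt.Println(digits.MatchString(text)) // true

	// Finding: all non-overlapping matches, left to right.
	fmt.Println(digits.FindAllString(text, -1)) // [66 3]

	// Replacing: substitute every match with new text.
	fmt.Println(digits.ReplaceAllString(text, "N")) // Order N shipped in N boxes
}
```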

Example: Counting Words with Regex

Let's say we want to be able to count the number of words in the following text string: "1 2 3 4 5" (similar to what we were doing before).

Step 1: Understanding Regex Patterns

Before we implement this in Go, let's use regex101.com to define what our regex should be. This website is incredibly useful when working with regex, as it allows you to specify different flavors of regular expressions, including:

PCRE (Perl Compatible Regular Expressions)
PCRE2
Python
Golang ← We want this one!

Testing Our Pattern

Test String: 1 2 3 4 5

Goal: Match each word (should return 5 matches)

Basic Word Character Matching

Looking at the reference guide, we have:

\w - Any word character: an ASCII letter, digit, or underscore (in Go, equivalent to [0-9A-Za-z_])


\w

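Result: \w matches a single character at a time, so it produces one match per character rather than per word. On "1 2 3 4 5" that still happens to give 5 matches, because every word is one character long, but on a string with longer words, such as "one two three four five", it gives 19 matches (one per letter) - not what we want!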

Adding Quantifiers

We need the + quantifier, which means "one or more":

+ - One or more of the preceding element (a character, character class, or group)


\w+

Result: Now we get exactly 5 matches - one for each word! ✅
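The same comparison can be reproduced in Go. This sketch assumes the multi-letter test string "one two three four five", which makes the difference between the two patterns visible:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	singleWordChar = regexp.MustCompile(`\w`)  // exactly one word character
	wordRun        = regexp.MustCompile(`\w+`) // one or more word characters
)

func main() {
	text := "one two three four five"
	fmt.Println(len(singleWordChar.FindAllString(text, -1))) // 19: one match per letter
	fmt.Println(len(wordRun.FindAllString(text, -1)))        // 5: one match per word
}
```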

Implementing Regex in Go

Now let's implement this word counting algorithm using Go's regexp package.

Basic Setup

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Test string
    text := "1 2 3 4 5"
    
    // Count words using regex
    wordCount := countWords(text)
    fmt.Printf("Word count: %d\n", wordCount)
}

func countWords(text string) int {
    // Define regex pattern
    re := regexp.MustCompile(`\w+`)
    
    // Find all matches
    matches := re.FindAllString(text, -1)
    
    return len(matches)
}

Understanding the Code

1. Importing the Package

import "regexp"

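The regexp package implements regular expression search in Go. It uses RE2 syntax, which is largely the same as Perl's and Python's but omits features such as backreferences and lookarounds; in exchange, matching is guaranteed to run in time linear in the size of the input.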

2. Compiling the Pattern

re := regexp.MustCompile(`\w+`)

Key Points:

MustCompile() - Like Compile() but panics if the expression cannot be parsed
Raw strings (backticks) - Avoid escaping issues with backslashes
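Package variable - Compile once, use many times for performance (countWords above recompiles on every call, which is fine for a small demo but wasteful when called repeatedly)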
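When the pattern is not a constant, for example when it arrives as a command-line argument, regexp.Compile is usually the better fit, since it returns an error instead of panicking. A minimal sketch; compilePattern is a hypothetical helper. The last pattern uses a lookahead, which Go's RE2-based syntax does not support, so it is rejected at compile time rather than at match time:

```go
package main

import (
	"fmt"
	"regexp"
)

// compilePattern wraps regexp.Compile, which reports bad patterns as an error.
func compilePattern(pattern string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return re, nil
}

func main() {
	// A valid pattern, an unbalanced parenthesis, and an unsupported lookahead.
	for _, p := range []string{`\w+`, `(\w+`, `foo(?=bar)`} {
		if _, err := compilePattern(p); err != nil {
			fmt.Println("error:", err)
		} else {
			fmt.Println("ok:", p)
		}
	}
}
```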

3. Raw Strings vs Regular Strings

❌ Problem with interpreted (double-quoted) strings:

re := regexp.MustCompile("\\w+")  // Every backslash must be doubled; "\w+" alone does not compile, as \w is not a Go escape sequence

✅ Solution with raw strings:

re := regexp.MustCompile(`\w+`)   // No escaping needed

4. Finding Matches

matches := re.FindAllString(text, -1)

Parameters:

text - The string to search in
-1 - Return all matches (a non-negative n returns at most n matches)

Returns: []string containing the matches, or nil if there are none
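A short sketch of how the second argument behaves, using the same `\w+` pattern on a hypothetical input:

```go
package main

import (
	"fmt"
	"regexp"
)

var word = regexp.MustCompile(`\w+`)

func main() {
	text := "one two three four five"

	fmt.Println(word.FindAllString(text, -1)) // [one two three four five]
	fmt.Println(word.FindAllString(text, 2))  // [one two]: at most 2 matches

	// With no matches the result is a nil slice, so len() is still safe to call.
	none := word.FindAllString("   ", -1)
	fmt.Println(none == nil, len(none)) // true 0
}
```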

Testing the Implementation


go run main.go
# Output: Word count: 5

More Pattern Examples

Counting Whitespace Characters

Let's modify our example to count spaces instead of words:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := "1 2 3 4 5"
    
    // Count whitespace characters
    re := regexp.MustCompile(`\s`)  // \s matches any whitespace
    matches := re.FindAllString(text, -1)
    
    fmt.Printf("Whitespace count: %d\n", len(matches))
}

Output: Whitespace count: 4

Common Character Classes

Pattern	Description	Example Matches
`\w`	Word characters	`a`, `B`, `3`, `_`
`\W`	Non-word characters	space, `!`, `@`, `-`
`\s`	Whitespace	space, tab, newline
`\S`	Non-whitespace	`a`, `1`, `!`
`\d`	Digits	`0`, `1`, `9`
`\D`	Non-digits	`a`, `!`, space
`.`	Any character except newline	`a`, `1`, space, `!`
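The sketch below exercises a few of these classes on a made-up string. Two details worth knowing in Go: \w, \d, and \s are ASCII-only (so \d matches only 0-9), and the (?s) flag makes . match newlines too.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	digitRuns    = regexp.MustCompile(`\d+`)     // runs of digits
	nonSpaceRuns = regexp.MustCompile(`\S+`)     // runs of non-whitespace
	dotNoNewline = regexp.MustCompile(`a.b`)     // . does not cross a newline by default
	dotAll       = regexp.MustCompile(`(?s)a.b`) // the s flag lets . match \n as well
)

func main() {
	text := "Go 1.22 is out!"
	fmt.Println(digitRuns.FindAllString(text, -1))    // [1 22]
	fmt.Println(nonSpaceRuns.FindAllString(text, -1)) // [Go 1.22 is out!]

	fmt.Println(dotNoNewline.MatchString("a\nb")) // false
	fmt.Println(dotAll.MatchString("a\nb"))       // true
}
```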

Testing with Complex Text

text := "Hello   World\t\nTest  123"
re := regexp.MustCompile(`\s`)
matches := re.FindAllString(text, -1)
fmt.Printf("Whitespace count: %d\n", len(matches))
// Output: Whitespace count: 7 (5 spaces, 1 tab, 1 newline)
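Adding the + quantifier to \s groups consecutive whitespace into a single match, which is often what you want when the gaps between words, rather than individual characters, are what matter. A sketch on the same text:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	oneSpace  = regexp.MustCompile(`\s`)  // each whitespace character separately
	spaceRuns = regexp.MustCompile(`\s+`) // each run of consecutive whitespace
)

func main() {
	text := "Hello   World\t\nTest  123"
	fmt.Println(len(oneSpace.FindAllString(text, -1)))  // 7
	fmt.Println(len(spaceRuns.FindAllString(text, -1))) // 3: "   ", "\t\n", "  "
}
```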

Capture Groups

Regular expressions become really powerful when you use capture groups to extract specific parts of a match.

Example: Extracting Email Parts

package main

import (
    "fmt"
    "regexp"
)

func main() {
    email := "foo@bar.com"
    
    // Define regex with capture groups
    re := regexp.MustCompile(`(.*)@(.*)`)
    
    // Find submatches
    matches := re.FindStringSubmatch(email)
    
    if len(matches) >= 3 {
        fmt.Printf("Full match: %s\n", matches[0])
        fmt.Printf("Username: %s\n", matches[1])
        fmt.Printf("Domain: %s\n", matches[2])
    }
}

Output:

Full match: foo@bar.com
Username: foo
Domain: bar.com

Understanding Capture Groups

Regex Pattern: `(.*)@(.*)`

(.*) - First capture group: Match any characters before @
@ - Literal: Match the @ symbol
(.*) - Second capture group: Match any characters after @
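One subtlety of (.*): the * is greedy, so it grabs as much as it can. On a made-up input containing two @ signs, the first group swallows everything up to the last @. A common fix, sketched below, is to exclude @ from the groups with the negated class [^@] and to anchor the pattern so the whole input must match:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	greedy = regexp.MustCompile(`(.*)@(.*)`)
	// [^@]+ means "one or more characters that are not @"; ^ and $ anchor the whole input.
	strict = regexp.MustCompile(`^([^@]+)@([^@]+)$`)
)

func main() {
	fmt.Printf("%q\n", greedy.FindStringSubmatch("a@b@c"))      // ["a@b@c" "a@b" "c"]
	fmt.Printf("%q\n", strict.FindStringSubmatch("a@b@c"))      // [] (nil: rejected)
	fmt.Printf("%q\n", strict.FindStringSubmatch("foo@bar.com")) // ["foo@bar.com" "foo" "bar.com"]
}
```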

Match Results Array

matches[0] - Full match: The entire matched string
matches[1] - Group 1: Content of first capture group
matches[2] - Group 2: Content of second capture group
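Indexing groups by number gets fragile as patterns grow. Go also supports named groups with the (?P<name>...) syntax, and Regexp.SubexpIndex (available since Go 1.15) looks up a group's index by name, returning -1 for unknown names. A minimal sketch using a simple email pattern:

```go
package main

import (
	"fmt"
	"regexp"
)

var emailParts = regexp.MustCompile(`^(?P<user>[^@]+)@(?P<domain>[^@]+)$`)

func main() {
	m := emailParts.FindStringSubmatch("foo@bar.com")
	if m == nil {
		fmt.Println("no match")
		return
	}
	// Look the groups up by name instead of hard-coding 1 and 2.
	fmt.Println("User:", m[emailParts.SubexpIndex("user")])     // User: foo
	fmt.Println("Domain:", m[emailParts.SubexpIndex("domain")]) // Domain: bar.com
}
```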

More Realistic Email Example

package main

import (
    "fmt"
    "regexp"
)

func parseEmail(email string) {
    // More specific email pattern
    re := regexp.MustCompile(`([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})`)
    
    matches := re.FindStringSubmatch(email)
    
    if len(matches) >= 3 {
        username := matches[1]
        domain := matches[2]
        
        fmt.Printf("User: %s\n", username)
        fmt.Printf("Domain: %s\n", domain)
    } else {
        fmt.Println("Invalid email format")
    }
}

func main() {
    parseEmail("test@dreamsofcode.io")
    // Output:
    // User: test
    // Domain: dreamsofcode.io
    
    parseEmail("invalid-email")
    // Output: Invalid email format
}
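To pull every address out of a longer piece of text, FindAllStringSubmatch returns one []string per match, each laid out like the single-match result above (full match first, then the groups). A sketch reusing the same pattern; the input text and addresses are made up:

```go
package main

import (
	"fmt"
	"regexp"
)

var emailRegex = regexp.MustCompile(`([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})`)

func main() {
	// carol@localhost has no dot-separated top-level domain, so it is skipped.
	text := "Ask alice@example.com or bob@test.org, not carol@localhost."
	for _, m := range emailRegex.FindAllStringSubmatch(text, -1) {
		fmt.Printf("user=%s domain=%s\n", m[1], m[2])
	}
	// user=alice domain=example.com
	// user=bob domain=test.org
}
```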

String Replacement

Regular expressions can also be used to perform sophisticated text replacement.

Example: Replacing Development Emails

package main

import (
    "fmt"
    "regexp"
)

func main() {
    emails := []string{
        "[email protected]",
        "[email protected]", 
        "[email protected]",
        "[email protected]",
    }
    
    // Pattern to match emails that start with "dev@" (^ anchors the match to the start)
    re := regexp.MustCompile(`^dev@.*`)
    
    for _, email := range emails {
        // Replace dev emails with test email
        newEmail := re.ReplaceAllString(email, "[email protected]")
        fmt.Printf("Original: %s -> Replaced: %s\n", email, newEmail)
    }
}

Output:

Original: [email protected] -> Replaced: [email protected]
Original: [email protected] -> Replaced: [email protected]
Original: [email protected] -> Replaced: [email protected]
Original: [email protected] -> Replaced: [email protected]

Replacement Methods

Method	Description	Use Case
`ReplaceAllString(src, repl string)`	Replace with a template: `$1`, `${1}`, `${name}` expand to capture groups	Simple and group-based replacements
`ReplaceAllStringFunc(src string, repl func(string) string)`	Replace each full match with the function's result	Dynamic replacements
`ReplaceAllLiteralString(src, repl string)`	Replace with `repl` exactly as written (no `$` expansion)	When the replacement contains a literal `$`

Advanced Replacement with Capture Groups

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := "Contact us at [email protected] or [email protected]"
    
    // Pattern with capture groups
    re := regexp.MustCompile(`(\w+)@company\.com`)
    
    // Replace using capture groups in replacement string
    result := re.ReplaceAllString(text, "${1}@newcompany.io")
    
    fmt.Println("Original:", text)
    fmt.Println("Modified:", result)
}

Output:

Original: Contact us at [email protected] or [email protected]
Modified: Contact us at [email protected] or [email protected]
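The braces in ${1} matter whenever the group number is followed by a letter, digit, or underscore. Go reads a $name reference greedily, so $2_ refers to a group named "2_", which does not exist and expands to an empty string. A sketch with a hypothetical input:

```go
package main

import (
	"fmt"
	"regexp"
)

var userDomain = regexp.MustCompile(`(\w+)@(\w+)\.com`)

func main() {
	s := "bob@acme.com"
	fmt.Println(userDomain.ReplaceAllString(s, "$2_$1"))     // bob ($2_ expands to "")
	fmt.Println(userDomain.ReplaceAllString(s, "${2}_${1}")) // acme_bob
}
```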

Performance Considerations

Compile Once, Use Many Times

package main

import (
    "fmt"
    "regexp"
)

// ✅ Good: Compile once as package variable
var emailRegex = regexp.MustCompile(`([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})`)

func validateEmail(email string) bool {
    return emailRegex.MatchString(email)
}

func parseEmails(emails []string) {
    for _, email := range emails {
        if validateEmail(email) {
            matches := emailRegex.FindStringSubmatch(email)
            fmt.Printf("Valid: %s (user: %s, domain: %s)\n", 
                email, matches[1], matches[2])
        } else {
            fmt.Printf("Invalid: %s\n", email)
        }
    }
}

func main() {
    emails := []string{
        "[email protected]",
        "[email protected]", 
        "invalid-email",
        "user@domain",
    }
    
    parseEmails(emails)
}

❌ Anti-pattern: Compiling in Loops

// Don't do this!
func badValidateEmails(emails []string) {
    for _, email := range emails {
        // ❌ Compiling regex in every iteration
        re := regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)
        if re.MatchString(email) {
            fmt.Printf("Valid: %s\n", email)
        }
    }
}
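Sharing one compiled pattern also works across goroutines: the regexp documentation states that a Regexp is safe for concurrent use by multiple goroutines, except for configuration methods such as Longest. A minimal sketch with hypothetical inputs; countValid is an illustrative name, and atomic.Int64 assumes Go 1.19 or later.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
	"sync/atomic"
)

// Compiled once, shared by every goroutine below.
var validEmail = regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)

// countValid checks each address in its own goroutine.
func countValid(emails []string) int64 {
	var valid atomic.Int64
	var wg sync.WaitGroup
	for _, e := range emails {
		wg.Add(1)
		go func(email string) {
			defer wg.Done()
			if validEmail.MatchString(email) {
				valid.Add(1)
			}
		}(e)
	}
	wg.Wait()
	return valid.Load()
}

func main() {
	emails := []string{"alice@example.com", "bob@test.org", "invalid-email", "user@domain"}
	fmt.Println("valid:", countValid(emails)) // valid: 2
}
```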

Common Regex Patterns

1. Validation Patterns

// Email validation
var emailRegex = regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)

// Phone number (US format)
var phoneRegex = regexp.MustCompile(`^\(\d{3}\) \d{3}-\d{4}$`)

// URL validation
var urlRegex = regexp.MustCompile(`^https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(/.*)?$`)

// IPv4-shaped string (checks the format only; octets above 255 still match)
var ipRegex = regexp.MustCompile(`^(\d{1,3}\.){3}\d{1,3}$`)
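The ^ and $ anchors are what make these validation patterns work, because MatchString reports whether the pattern matches anywhere in the string. The sketch below demonstrates that, and also shows that ipRegex only checks the shape of an address. For real IP validation, parsing with netip.ParseAddr from net/netip is more reliable than a regex.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	unanchored = regexp.MustCompile(`\d{3}`)
	anchored   = regexp.MustCompile(`^\d{3}$`)
	ipRegex    = regexp.MustCompile(`^(\d{1,3}\.){3}\d{1,3}$`)
)

func main() {
	fmt.Println(unanchored.MatchString("abc12345")) // true: "123" occurs inside
	fmt.Println(anchored.MatchString("abc12345"))   // false: the whole string must be 3 digits
	fmt.Println(anchored.MatchString("123"))        // true

	// Shape-only check: 999 is not a valid octet, but the pattern accepts it.
	fmt.Println(ipRegex.MatchString("999.999.999.999")) // true
}
```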

2. Extraction Patterns

// Extract hashtags from text
var hashtagRegex = regexp.MustCompile(`#\w+`)

// Extract mentions from text  
var mentionRegex = regexp.MustCompile(`@\w+`)

// Extract numbers from text
var numberRegex = regexp.MustCompile(`\d+`)

// Extract dates (MM/DD/YYYY)
var dateRegex = regexp.MustCompile(`\d{1,2}/\d{1,2}/\d{4}`)
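Applied to a made-up social media post, these extraction patterns behave as follows. Note that mentionRegex is naive: it would also match the @domain part of an email address.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	hashtagRegex = regexp.MustCompile(`#\w+`)
	mentionRegex = regexp.MustCompile(`@\w+`)
	dateRegex    = regexp.MustCompile(`\d{1,2}/\d{1,2}/\d{4}`)
)

func main() {
	post := "Thanks @gopher for the #golang #regex talk on 3/14/2024!"
	fmt.Println(hashtagRegex.FindAllString(post, -1)) // [#golang #regex]
	fmt.Println(mentionRegex.FindAllString(post, -1)) // [@gopher]
	fmt.Println(dateRegex.FindAllString(post, -1))    // [3/14/2024]
}
```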

3. Text Processing Patterns

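// Remove extra whitespace (collapse each run into a single space)
var whitespaceRegex = regexp.MustCompile(`\s+`)
text = whitespaceRegex.ReplaceAllString(text, " ")

// Remove HTML tags (a rough heuristic, not a substitute for an HTML parser)
var htmlRegex = regexp.MustCompile(`<[^>]*>`)
text = htmlRegex.ReplaceAllString(text, "")

// Convert camelCase to snake_case: insert "_" at each lower-to-upper boundary, then lowercase.
// The braces in ${1} are required: "$1_" would be read as a group named "1_".
var camelCaseRegex = regexp.MustCompile(`([a-z])([A-Z])`)
snakeCase := strings.ToLower(camelCaseRegex.ReplaceAllString(text, "${1}_${2}"))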
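A runnable sketch wrapping the same three patterns in small helper functions; normalizeSpaces, stripTags, and toSnakeCase are hypothetical names. The snake_case conversion only splits at lowercase-to-uppercase boundaries, so an acronym such as HTTPServer comes out as httpserver.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	whitespaceRegex = regexp.MustCompile(`\s+`)
	htmlRegex       = regexp.MustCompile(`<[^>]*>`)
	camelCaseRegex  = regexp.MustCompile(`([a-z])([A-Z])`)
)

// normalizeSpaces collapses each run of whitespace into one space and trims the ends.
func normalizeSpaces(s string) string {
	return strings.TrimSpace(whitespaceRegex.ReplaceAllString(s, " "))
}

// stripTags removes anything shaped like an HTML tag. It is a rough heuristic:
// a ">" inside a quoted attribute value, for example, ends the "tag" early.
func stripTags(s string) string {
	return htmlRegex.ReplaceAllString(s, "")
}

// toSnakeCase inserts "_" at each lower-to-upper boundary, then lowercases.
func toSnakeCase(s string) string {
	return strings.ToLower(camelCaseRegex.ReplaceAllString(s, "${1}_${2}"))
}

func main() {
	fmt.Printf("%q\n", normalizeSpaces("  too   many\t\nspaces ")) // "too many spaces"
	fmt.Println(stripTags("<p>Hello <b>World</b></p>"))            // Hello World
	fmt.Println(toSnakeCase("parseLogEntry"))                       // parse_log_entry
}
```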

Regex Methods Reference

Finding Methods

Method	Returns	Description
`MatchString(s)`	`bool`	Does the string match?
`FindString(s)`	`string`	First match
`FindAllString(s, n)`	`[]string`	All matches (n = -1 for all)
`FindStringSubmatch(s)`	`[]string`	First match + capture groups
`FindAllStringSubmatch(s, n)`	`[][]string`	All matches + capture groups
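One detail the table does not show is how each method reports "no match": FindString returns an empty string, which looks the same as a match that is itself empty, while the Submatch and All variants return nil. When the difference matters, FindStringIndex (which returns nil when there is no match) resolves it. A sketch with made-up inputs:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	optionalDigits = regexp.MustCompile(`\d*`)    // can match the empty string
	letters        = regexp.MustCompile(`[a-z]+`) // needs at least one letter
)

func main() {
	// "abc" contains no digits, yet \d* matches the empty string at index 0.
	fmt.Printf("%q %v\n", optionalDigits.FindString("abc"), optionalDigits.FindStringIndex("abc")) // "" [0 0]

	// "123" contains no letters, so there is no match at all.
	fmt.Printf("%q %v\n", letters.FindString("123"), letters.FindStringIndex("123") == nil) // "" true
	fmt.Println(letters.FindStringSubmatch("123") == nil)                                   // true
}
```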

Replacement Methods

Method	Description
`ReplaceAllString(src, repl)`	Replace all matches with a template string (expands `$1`, `${name}`)
`ReplaceAllStringFunc(src, func)`	Replace all matches using function
`ReplaceAllLiteralString(src, repl)`	Replace all matches with literal string

Complete Example: Log Parser

Here's a practical example that combines multiple regex concepts:

package main

import (
    "fmt"
    "regexp"
)

// Log entry patterns
var (
    logRegex = regexp.MustCompile(`^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)$`)
    ipRegex  = regexp.MustCompile(`\b(\d{1,3}\.){3}\d{1,3}\b`)
    urlRegex = regexp.MustCompile(`"[A-Z]+\s+([^\s"]+)`)
)

type LogEntry struct {
    Date    string
    Time    string
    Level   string
    Message string
    IPs     []string
    URLs    []string
}

func parseLogEntry(line string) (*LogEntry, error) {
    matches := logRegex.FindStringSubmatch(line)
    if len(matches) < 5 {
        return nil, fmt.Errorf("invalid log format")
    }
    
    entry := &LogEntry{
        Date:    matches[1],
        Time:    matches[2],
        Level:   matches[3],
        Message: matches[4],
    }
    
    // Extract IP addresses
    entry.IPs = ipRegex.FindAllString(entry.Message, -1)
    
    // Extract URLs
    urlMatches := urlRegex.FindAllStringSubmatch(entry.Message, -1)
    for _, match := range urlMatches {
        if len(match) > 1 {
            entry.URLs = append(entry.URLs, match[1])
        }
    }
    
    return entry, nil
}

func main() {
    logLines := []string{
        `2024-01-15 10:30:45 [INFO] User 192.168.1.100 accessed "GET /api/users"`,
        `2024-01-15 10:31:02 [ERROR] Failed login from 10.0.0.5 for "POST /auth/login"`,
        `2024-01-15 10:31:15 [WARN] Rate limit exceeded for 172.16.0.1 on "GET /api/data"`,
    }
    
    for i, line := range logLines {
        entry, err := parseLogEntry(line)
        if err != nil {
            fmt.Printf("Error parsing line %d: %v\n", i+1, err)
            continue
        }
        
        fmt.Printf("Entry %d:\n", i+1)
        fmt.Printf("  Date: %s, Time: %s\n", entry.Date, entry.Time)
        fmt.Printf("  Level: %s\n", entry.Level)
        fmt.Printf("  Message: %s\n", entry.Message)
        fmt.Printf("  IPs: %v\n", entry.IPs)
        fmt.Printf("  URLs: %v\n", entry.URLs)
        fmt.Println()
    }
}

Homework Assignments

Now that we've looked at how to use regular expressions, let me set some homework to practice:

Assignment 1: Count Newline Characters

Create a regular expression that can match newline characters, similar to our countLines function from earlier.

Test String: "1\n2\n3\n4\n5" (the numbers 1 to 5, each on its own line)

Expected Result: 4 newline characters

Hint: Use \n to match newline characters.

package main

import (
    "fmt"
    "regexp"
)

func countNewlines(text string) int {
    // TODO: Implement using regex
    // Pattern should match newline characters
    
    return 0 // Replace with actual implementation
}

func main() {
    text := "1\n2\n3\n4\n5"
    count := countNewlines(text)
    fmt.Printf("Newline count: %d\n", count) // Should print: 4
}

Assignment 2: Count Uppercase Letters

Write a regular expression that can count the number of uppercase letters (A-Z).

Test String: "Hello World! This Has 6 Uppercase Letters"

Expected Result: 6 uppercase letters

Hint: Use character ranges with [A-Z] to match uppercase letters.

package main

import (
    "fmt"
    "regexp"
)

func countUppercase(text string) int {
    // TODO: Implement using regex
    // Pattern should match uppercase letters A-Z
    
    return 0 // Replace with actual implementation
}

func main() {
    text := "Hello World! This Has 6 Uppercase Letters"
    count := countUppercase(text)
    fmt.Printf("Uppercase count: %d\n", count) // Should print: 6
}

Bonus Assignment: Extract Phone Numbers

Create a regex that can extract phone numbers in the format (XXX) XXX-XXXX:

func extractPhoneNumbers(text string) []string {
    // TODO: Implement pattern for (XXX) XXX-XXXX format
    // Example: "(555) 123-4567"
    
    return nil
}

func main() {
    text := "Call me at (555) 123-4567 or (999) 888-7777 for more info"
    phones := extractPhoneNumbers(text)
    fmt.Printf("Phone numbers: %v\n", phones)
    // Should print: [(555) 123-4567 (999) 888-7777]
}

Testing Your Solutions

Create test cases to verify your regex patterns work correctly:

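// main_test.go: test files must end in _test.go and share the package under test
package main

import "testing"

func TestCountNewlines(t *testing.T) {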
    tests := []struct {
        input    string
        expected int
    }{
        {"1\n2\n3", 2},
        {"no newlines", 0},
        {"one\nline", 1},
        {"\n\n\n", 3},
    }
    
    for _, test := range tests {
        result := countNewlines(test.input)
        if result != test.expected {
            t.Errorf("countNewlines(%q) = %d, want %d", 
                test.input, result, test.expected)
        }
    }
}

Summary

Regular expressions are a powerful tool for pattern matching and text processing:

Pattern Matching - Find specific patterns in text
Capture Groups - Extract parts of matches using parentheses
String Replacement - Replace matched patterns with new text
Performance - Compile patterns once, use many times
Versatility - Works across many programming languages

Key Takeaways:

Use raw strings (backticks) to avoid escaping issues
Compile once as package variables for performance
Capture groups enable powerful text extraction
Regular expressions are a large topic - consider dedicated study
Useful for validation, extraction, and text processing

Once you've completed the homework assignments, you should have a good understanding of how to use regular expressions in Go and be ready to move on to the next lesson.

Additional Resources

regex101.com - Interactive regex testing and learning
Regular Expressions books - For deep understanding of regex patterns
Go regexp documentation - Complete method reference
Common regex patterns - Cheat sheets for typical use cases

Regular expressions can seem daunting at first, especially as the syntax is rather abstract, but they're useful in so many places that I really recommend studying them further!

