Regular expressions (regex) let developers search, match, and manipulate patterns within text strings. Instead of manually iterating through characters to count spaces or words, a regex describes the pattern concisely, and the same technique carries over to most programming languages. This lesson uses Go's regexp package to build a straightforward word counting algorithm, then covers defining patterns for word and whitespace detection, using capture groups to extract specific data such as email domains, and replacing text that matches a pattern. It closes with a recommendation to consult dedicated resources for mastering regex syntax, given how versatile regex is for string handling.
Back when we were implementing our word counter at the start of the course, we wrote our own simple algorithm that detected whenever a word crossed over into a space, and used that to count the number of words in a text string.
Whilst we ended up implementing that algorithm with the io.Reader type so that it could handle files of any size, we initially worked with a string (or a slice of bytes) and iterated through each character, or rune, to determine whether or not it was a whitespace character.
The Problem with Custom Algorithms
When it comes to strings and patterns, implementing your own algorithm is a perfectly viable option, but there tends to be a better approach: one that is shorter to write and easier to adapt when the pattern changes.
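For comparison, here is a minimal sketch of the kind of hand-rolled counter described above, next to a one-pattern regex version. The function names `countWordsManual` and `countWordsRegex` are hypothetical, and the course's own implementation may differ. The two agree on ASCII input; `unicode.IsSpace` also recognises some non-ASCII spaces that Go's `\S` would treat as word characters.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode"
)

// countWordsManual (hypothetical) walks the runes and counts every
// non-space rune that follows whitespace or the start of the text.
func countWordsManual(text string) int {
	count := 0
	inWord := false
	for _, r := range text {
		if unicode.IsSpace(r) {
			inWord = false
		} else if !inWord {
			inWord = true
			count++
		}
	}
	return count
}

// nonSpaceRun matches one or more consecutive non-whitespace characters.
var nonSpaceRun = regexp.MustCompile(`\S+`)

// countWordsRegex (hypothetical) expresses the same idea as a single pattern.
func countWordsRegex(text string) int {
	return len(nonSpaceRun.FindAllString(text, -1))
}

func main() {
	text := "one two  three\tfour\nfive"
	fmt.Println(countWordsManual(text), countWordsRegex(text)) // 5 5
}
```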
Enter Regular Expressions
That approach is to use regular expressions, or regex for short, which allow you to perform:
- Finding patterns in text
- Matching specific patterns
- Replacing parts of a string
Regular expressions are a way of performing pattern matching on text, and the patterns themselves are written as text.
Example: Counting Words with Regex
Let's say we want to be able to count the number of words in the following text string, similar to what we were doing before: "1 2 3 4 5"
Step 1: Understanding Regex Patterns
Before we implement this in Go, let's use regex101.com to define what our regex should be. This website is incredibly useful when working with regex, as it allows you to specify different flavors of regular expressions, including:
- PCRE (Perl Compatible Regular Expressions)
- PCRE2
- Python
- Golang ← We want this one!
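One difference worth knowing before testing patterns against other flavors: Go's regexp package accepts RE2 syntax, which guarantees matching in time linear in the size of the input and therefore leaves out features such as backreferences and lookarounds. The sketch below (`compiles` is a hypothetical helper) shows `regexp.Compile` rejecting both, so a pattern that works under PCRE on regex101.com can still fail in Go.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles (hypothetical helper) reports whether Go's regexp accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`\w+`,        // ordinary pattern: accepted
		`(a)\1`,      // backreference: rejected by RE2 syntax
		`foo(?=bar)`, // lookahead: rejected as well
	}
	for _, p := range patterns {
		fmt.Printf("%-12s compiles: %v\n", p, compiles(p))
	}
}
```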
Testing Our Pattern
Test String: 1 2 3 4 5
Goal: Match each word (should return 5 matches)
Basic Word Character Matching
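Looking at the reference guide, we have:
- `\w` - Any word character (letters, digits, underscore; ASCII-only in Go)

\w

Result: This matches each individual word character rather than whole words. With our single-digit test string it still happens to give 5 matches, but with multi-character words such as "one two three four five" it gives 19 matches (one per letter) - not what we want!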
Adding Quantifiers
We need the `+` quantifier, which means "one or more":
- `+` - One or more of the preceding character
\w+
Result: Now we get exactly 5 matches - one for each word! ✅
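To see the difference the `+` quantifier makes, this sketch runs both patterns over the single-digit test string and over a string of longer words (the second string is an illustrative assumption). With one-character words the counts coincide; with longer words, `\w` counts letters while `\w+` counts words.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	wordChar = regexp.MustCompile(`\w`)  // a single word character
	wordRun  = regexp.MustCompile(`\w+`) // one or more word characters in a row
)

func main() {
	for _, text := range []string{"1 2 3 4 5", "one two three four five"} {
		chars := len(wordChar.FindAllString(text, -1))
		words := len(wordRun.FindAllString(text, -1))
		fmt.Printf("%q: \\w=%d, \\w+=%d\n", text, chars, words)
	}
	// "1 2 3 4 5": \w=5, \w+=5
	// "one two three four five": \w=19, \w+=5
}
```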
Implementing Regex in Go
Now let's implement this word counting algorithm using Go's `regexp` package.
Basic Setup
```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Test string
	text := "1 2 3 4 5"

	// Count words using regex
	wordCount := countWords(text)
	fmt.Printf("Word count: %d\n", wordCount)
}

func countWords(text string) int {
	// Define regex pattern
	re := regexp.MustCompile(`\w+`)

	// Find all matches
	matches := re.FindAllString(text, -1)
	return len(matches)
}
```
Understanding the Code
1. Importing the Package
```go
import "regexp"
```
The `regexp` package implements regular expression search in Go.
2. Compiling the Pattern
```go
re := regexp.MustCompile(`\w+`)
```
Key Points:
- `MustCompile()` - Like `Compile()`, but panics if the expression cannot be parsed
- Raw strings (backticks) - Avoid escaping issues with backslashes
- Package variable - Compile once, use many times for performance
3. Raw Strings vs Regular Strings
❌ Problem with regular strings:
```go
re := regexp.MustCompile("\\w+") // Need to escape the backslash
```
✅ Solution with raw strings:
```go
re := regexp.MustCompile(`\w+`) // No escaping needed
```
4. Finding Matches
```go
matches := re.FindAllString(text, -1)
```
Parameters:
- `text` - The string to search in
- `-1` - Return all matches (a non-negative number n returns at most n matches)

Returns: a `[]string` containing all matches, or `nil` if there are none
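A quick sketch of how the second argument behaves, using a package-level variable named `word` for illustration: `-1` returns everything, a positive n caps the result, and both n = 0 and a string with no matches yield `nil`.

```go
package main

import (
	"fmt"
	"regexp"
)

var word = regexp.MustCompile(`\w+`)

func main() {
	text := "1 2 3 4 5"
	fmt.Println(word.FindAllString(text, -1))         // [1 2 3 4 5]
	fmt.Println(word.FindAllString(text, 2))          // [1 2]
	fmt.Println(word.FindAllString(text, 0) == nil)   // true: n == 0 returns nil
	fmt.Println(word.FindAllString("   ", -1) == nil) // true: no matches returns nil
}
```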
Testing the Implementation
```shell
go run main.go
# Output: Word count: 5
```
More Pattern Examples
Counting Whitespace Characters
Let's modify our example to count spaces instead of words:
```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	text := "1 2 3 4 5"

	// Count whitespace characters
	re := regexp.MustCompile(`\s`) // \s matches any whitespace
	matches := re.FindAllString(text, -1)
	fmt.Printf("Whitespace count: %d\n", len(matches))
}
```
Output: Whitespace count: 4
Common Character Classes
| Pattern | Description | Example Matches |
|---|---|---|
| `\w` | Word characters | `a`, `B`, `3`, `_` |
| `\W` | Non-word characters | space, `!`, `@`, `-` |
| `\s` | Whitespace | space, tab, newline |
| `\S` | Non-whitespace | `a`, `1`, `!` |
| `\d` | Digits | `0`, `1`, `9` |
| `\D` | Non-digits | `a`, `!`, space |
| `.` | Any character except newline | `a`, `1`, space, `!` |
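One Go-specific detail: the Perl-style classes above are ASCII-only in Go (`\w` is `[0-9A-Za-z_]`, `\d` is `[0-9]`), so accented letters are not word characters. This sketch contrasts `\w+` with a Unicode-aware class built from `\p{L}` (letters) and `\p{N}` (numbers); the variable names are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	asciiWord   = regexp.MustCompile(`\w+`)            // \w is [0-9A-Za-z_] in Go
	unicodeWord = regexp.MustCompile(`[\p{L}\p{N}_]+`) // Unicode letters, numbers, underscore
)

func main() {
	text := "caf\u00e9 na\u00efve" // "café naïve"
	fmt.Println(asciiWord.FindAllString(text, -1))   // [caf na ve]
	fmt.Println(unicodeWord.FindAllString(text, -1)) // [café naïve]
}
```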
Testing with Complex Text
```go
text := "Hello World\t\nTest 123"
re := regexp.MustCompile(`\s`)
matches := re.FindAllString(text, -1)
fmt.Printf("Whitespace count: %d\n", len(matches))
// Output: Whitespace count: 4 (two spaces, one tab, one newline)
```
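Counting single whitespace characters and counting runs of whitespace give different answers when whitespace characters are adjacent, as with the tab followed by a newline in this string. A minimal sketch comparing `\s` with `\s+` on the same text:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	oneSpace  = regexp.MustCompile(`\s`)  // each whitespace character separately
	spaceRuns = regexp.MustCompile(`\s+`) // each run of consecutive whitespace
)

func main() {
	text := "Hello World\t\nTest 123"
	fmt.Println(len(oneSpace.FindAllString(text, -1)))  // 4: ' ', '\t', '\n', ' '
	fmt.Println(len(spaceRuns.FindAllString(text, -1))) // 3: " ", "\t\n", " "
}
```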
Capture Groups
Regular expressions become really powerful when you use capture groups to extract specific parts of a match.
Example: Extracting Email Parts
```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	email := "foo@bar.com"

	// Define regex with capture groups
	re := regexp.MustCompile(`(.*)@(.*)`)

	// Find submatches
	matches := re.FindStringSubmatch(email)
	if len(matches) >= 3 {
		fmt.Printf("Full match: %s\n", matches[0])
		fmt.Printf("Username: %s\n", matches[1])
		fmt.Printf("Domain: %s\n", matches[2])
	}
}
```
Output:
```
Full match: foo@bar.com
Username: foo
Domain: bar.com
```
Understanding Capture Groups
Regex Pattern: `(.*)@(.*)`
- `(.*)` - First capture group: Match any characters before `@`
- `@` - Literal: Match the `@` symbol
- `(.*)` - Second capture group: Match any characters after `@`
Match Results Array
- `matches[0]` - Full match: The entire matched string
- `matches[1]` - Group 1: Content of the first capture group
- `matches[2]` - Group 2: Content of the second capture group
More Realistic Email Example
```go
package main

import (
	"fmt"
	"regexp"
)

func parseEmail(email string) {
	// More specific email pattern
	re := regexp.MustCompile(`([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})`)
	matches := re.FindStringSubmatch(email)
	if len(matches) >= 3 {
		username := matches[1]
		domain := matches[2]
		fmt.Printf("User: %s\n", username)
		fmt.Printf("Domain: %s\n", domain)
	} else {
		fmt.Println("Invalid email format")
	}
}

func main() {
	parseEmail("test@dreamsofcode.io")
	// Output:
	// User: test
	// Domain: dreamsofcode.io

	parseEmail("invalid-email")
	// Output: Invalid email format
}
```
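To pull several addresses out of a longer piece of text, the same pattern used in `parseEmail` can be applied with `FindAllStringSubmatch`, which returns one `[]string` per match (full match first, then the groups). The addresses in this sketch are made-up examples, and `emailInText` is an illustrative name.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as parseEmail, used to find every address in a string.
var emailInText = regexp.MustCompile(`([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})`)

func main() {
	text := "Write to alice@example.com or bob.smith@mail.example.org."
	for _, m := range emailInText.FindAllStringSubmatch(text, -1) {
		// m[0] is the whole address, m[1] the user, m[2] the domain.
		fmt.Printf("user=%s domain=%s\n", m[1], m[2])
	}
	// user=alice domain=example.com
	// user=bob.smith domain=mail.example.org
}
```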
String Replacement
Regular expressions can also be used to perform sophisticated text replacement.
Example: Replacing Development Emails
```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	emails := []string{
		"[email protected]",
		"[email protected]",
		"[email protected]",
		"[email protected]",
	}

	// Pattern matching "dev@" and everything after it. It is not anchored,
	// so it matches "dev@" anywhere in the string, not only at the start.
	re := regexp.MustCompile(`dev@.*`)

	for _, email := range emails {
		// Replace dev emails with test email
		newEmail := re.ReplaceAllString(email, "[email protected]")
		fmt.Printf("Original: %s -> Replaced: %s\n", email, newEmail)
	}
}
```
Output:
```
Original: [email protected] -> Replaced: [email protected]
Original: [email protected] -> Replaced: [email protected]
Original: [email protected] -> Replaced: [email protected]
Original: [email protected] -> Replaced: [email protected]
```
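Because `dev@.*` is not anchored, `ReplaceAllString` also rewrites addresses that merely contain "dev@", replacing only the matched tail. This sketch uses hypothetical example.com addresses and a hypothetical replacement address to compare the pattern with an anchored `^dev@.*` variant.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	unanchored = regexp.MustCompile(`dev@.*`)  // "dev@" anywhere in the string
	anchored   = regexp.MustCompile(`^dev@.*`) // only when the string starts with "dev@"
)

func main() {
	// Hypothetical addresses for illustration only.
	emails := []string{"dev@example.com", "alice@example.com", "webdev@example.com"}
	const replacement = "test@example.com"

	for _, e := range emails {
		fmt.Printf("%s -> unanchored: %s, anchored: %s\n", e,
			unanchored.ReplaceAllString(e, replacement),
			anchored.ReplaceAllString(e, replacement))
	}
	// webdev@example.com becomes "webtest@example.com" with the unanchored
	// pattern, but is left unchanged by the anchored one.
}
```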
Replacement Methods
| Method | Description | Use Case |
|---|---|---|
| `ReplaceAllString(src, repl string)` | Replace with a template string (`$1`, `${name}` are expanded) | Simple replacements, including capture-group references |
| `ReplaceAllStringFunc(src string, repl func(string) string)` | Replace using a function that receives each match | Dynamic replacements |
| `ReplaceAllLiteralString(src, repl string)` | Replace with a literal string (no `$` expansion) | When the replacement contains `$` |
Advanced Replacement with Capture Groups
```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	text := "Contact us at [email protected] or [email protected]"

	// Pattern with capture groups
	re := regexp.MustCompile(`(\w+)@company\.com`)

	// Replace using capture groups in replacement string
	result := re.ReplaceAllString(text, "${1}@newcompany.io")

	fmt.Println("Original:", text)
	fmt.Println("Modified:", result)
}
```
Output:
```
Original: Contact us at [email protected] or [email protected]
Modified: Contact us at [email protected] or [email protected]
```
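One pitfall with template replacements: in the `$name` form Go takes the longest run of letters, digits and underscores as the name, so `$1_old` means `${1_old}`, a group that does not exist and therefore expands to an empty string. This sketch with a hypothetical address shows why the braces in `${1}` are worth using.

```go
package main

import (
	"fmt"
	"regexp"
)

var companyEmail = regexp.MustCompile(`(\w+)@company\.com`)

func main() {
	src := "bob@company.com" // hypothetical address

	// $1_old is read as ${1_old}: no such group, so it expands to "".
	fmt.Printf("%q\n", companyEmail.ReplaceAllString(src, "$1_old@newcompany.io")) // "@newcompany.io"

	// Braces end the group name explicitly.
	fmt.Printf("%q\n", companyEmail.ReplaceAllString(src, "${1}_old@newcompany.io")) // "bob_old@newcompany.io"
}
```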
Performance Considerations
Compile Once, Use Many Times
```go
package main

import (
	"fmt"
	"regexp"
)

// ✅ Good: Compile once as package variable
var emailRegex = regexp.MustCompile(`([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})`)

func validateEmail(email string) bool {
	return emailRegex.MatchString(email)
}

func parseEmails(emails []string) {
	for _, email := range emails {
		if validateEmail(email) {
			matches := emailRegex.FindStringSubmatch(email)
			fmt.Printf("Valid: %s (user: %s, domain: %s)\n",
				email, matches[1], matches[2])
		} else {
			fmt.Printf("Invalid: %s\n", email)
		}
	}
}

func main() {
	emails := []string{
		"[email protected]",
		"[email protected]",
		"invalid-email",
		"user@domain",
	}
	parseEmails(emails)
}
```
❌ Anti-pattern: Compiling in Loops
```go
// Don't do this!
func badValidateEmails(emails []string) {
	for _, email := range emails {
		// ❌ Compiling regex in every iteration
		re := regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)
		if re.MatchString(email) {
			fmt.Printf("Valid: %s\n", email)
		}
	}
}
```
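Compiling once also works across goroutines, because a compiled `*regexp.Regexp` is safe for concurrent use. A minimal sketch, with a hypothetical `countAll` helper, that shares one package-level pattern between goroutines:

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// Compiled once; a *regexp.Regexp is safe for concurrent use by goroutines.
var wordRegex = regexp.MustCompile(`\w+`)

// countAll (hypothetical) counts words in each text concurrently,
// reusing the shared regex. Each goroutine writes only its own index.
func countAll(texts []string) []int {
	counts := make([]int, len(texts))
	var wg sync.WaitGroup
	for i, t := range texts {
		wg.Add(1)
		go func(i int, t string) {
			defer wg.Done()
			counts[i] = len(wordRegex.FindAllString(t, -1))
		}(i, t)
	}
	wg.Wait()
	return counts
}

func main() {
	fmt.Println(countAll([]string{"1 2 3 4 5", "one two", ""})) // [5 2 0]
}
```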
Common Regex Patterns
1. Validation Patterns
```go
// Email validation
var emailRegex = regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)

// Phone number (US format)
var phoneRegex = regexp.MustCompile(`^\(\d{3}\) \d{3}-\d{4}$`)

// URL validation
var urlRegex = regexp.MustCompile(`^https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(/.*)?$`)

// IPv4 address (checks the shape only, not that each octet is 0-255)
var ipRegex = regexp.MustCompile(`^(\d{1,3}\.){3}\d{1,3}$`)
```
2. Extraction Patterns
```go
// Extract hashtags from text
var hashtagRegex = regexp.MustCompile(`#\w+`)

// Extract mentions from text
var mentionRegex = regexp.MustCompile(`@\w+`)

// Extract numbers from text
var numberRegex = regexp.MustCompile(`\d+`)

// Extract dates (MM/DD/YYYY)
var dateRegex = regexp.MustCompile(`\d{1,2}/\d{1,2}/\d{4}`)
```
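Extraction patterns are typically paired with `FindAllString`. A sketch using the hashtag and mention patterns on a made-up post; note that `@\w+` also picks up the domain of an email address, so real code may need a stricter pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	hashtagRegex = regexp.MustCompile(`#\w+`)
	mentionRegex = regexp.MustCompile(`@\w+`)
)

func main() {
	post := "Loving #golang and #regex, thanks @gopher! Mail me: ann@example.com"
	fmt.Println(hashtagRegex.FindAllString(post, -1)) // [#golang #regex]

	// False positive: the domain part of the email also matches @\w+.
	fmt.Println(mentionRegex.FindAllString(post, -1)) // [@gopher @example]
}
```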
3. Text Processing Patterns
```go
// Remove extra whitespace
var whitespaceRegex = regexp.MustCompile(`\s+`)
text = whitespaceRegex.ReplaceAllString(text, " ")

// Remove HTML tags (naive: not a substitute for an HTML parser)
var htmlRegex = regexp.MustCompile(`<[^>]*>`)
text = htmlRegex.ReplaceAllString(text, "")

// Convert camelCase to snake_case (also needs the "strings" package)
var camelCaseRegex = regexp.MustCompile(`([a-z])([A-Z])`)
snakeCase := strings.ToLower(camelCaseRegex.ReplaceAllString(text, "${1}_${2}"))
```
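A self-contained sketch that wraps the three processing patterns in small functions (the function names are hypothetical). It lowercases the snake_case result, because the capture-group replacement alone would leave the capitals in place: `myVariableName` would become `my_Variable_Name`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	whitespaceRegex = regexp.MustCompile(`\s+`)
	htmlRegex       = regexp.MustCompile(`<[^>]*>`)
	camelCaseRegex  = regexp.MustCompile(`([a-z])([A-Z])`)
)

// collapseSpace replaces every run of whitespace with a single space.
func collapseSpace(s string) string { return whitespaceRegex.ReplaceAllString(s, " ") }

// stripTags removes anything that looks like a tag (naive, not an HTML parser).
func stripTags(s string) string { return htmlRegex.ReplaceAllString(s, "") }

// toSnakeCase inserts "_" at lower-to-upper boundaries, then lowercases.
func toSnakeCase(s string) string {
	return strings.ToLower(camelCaseRegex.ReplaceAllString(s, "${1}_${2}"))
}

func main() {
	fmt.Printf("%q\n", collapseSpace("a   b\t\nc"))            // "a b c"
	fmt.Printf("%q\n", stripTags("<p>Hello <b>World</b></p>")) // "Hello World"
	fmt.Printf("%q\n", toSnakeCase("myVariableName"))          // "my_variable_name"
}
```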
Regex Methods Reference
Finding Methods
| Method | Returns | Description |
|---|---|---|
| `MatchString(s)` | `bool` | Does the string match? |
| `FindString(s)` | `string` | First match |
| `FindAllString(s, n)` | `[]string` | All matches (n = -1 for all) |
| `FindStringSubmatch(s)` | `[]string` | First match + capture groups |
| `FindAllStringSubmatch(s, n)` | `[][]string` | All matches + capture groups |
Replacement Methods
| Method | Description |
|---|---|
| `ReplaceAllString(src, repl)` | Replace all matches with a template string (`$1`, `${name}` are expanded) |
| `ReplaceAllStringFunc(src, func)` | Replace all matches using a function |
| `ReplaceAllLiteralString(src, repl)` | Replace all matches with a literal string (no `$` expansion) |
Complete Example: Log Parser
Here's a practical example that combines multiple regex concepts:
```go
package main

import (
	"fmt"
	"regexp"
)

// Log entry patterns
var (
	logRegex = regexp.MustCompile(`^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)$`)
	ipRegex  = regexp.MustCompile(`\b(\d{1,3}\.){3}\d{1,3}\b`)
	urlRegex = regexp.MustCompile(`"[A-Z]+\s+([^\s"]+)`)
)

type LogEntry struct {
	Date    string
	Time    string
	Level   string
	Message string
	IPs     []string
	URLs    []string
}

func parseLogEntry(line string) (*LogEntry, error) {
	matches := logRegex.FindStringSubmatch(line)
	if len(matches) < 5 {
		return nil, fmt.Errorf("invalid log format")
	}

	entry := &LogEntry{
		Date:    matches[1],
		Time:    matches[2],
		Level:   matches[3],
		Message: matches[4],
	}

	// Extract IP addresses
	entry.IPs = ipRegex.FindAllString(entry.Message, -1)

	// Extract URLs (the path after the HTTP method inside quotes)
	urlMatches := urlRegex.FindAllStringSubmatch(entry.Message, -1)
	for _, match := range urlMatches {
		if len(match) > 1 {
			entry.URLs = append(entry.URLs, match[1])
		}
	}

	return entry, nil
}

func main() {
	logLines := []string{
		`2024-01-15 10:30:45 [INFO] User 192.168.1.100 accessed "GET /api/users"`,
		`2024-01-15 10:31:02 [ERROR] Failed login from 10.0.0.5 for "POST /auth/login"`,
		`2024-01-15 10:31:15 [WARN] Rate limit exceeded for 172.16.0.1 on "GET /api/data"`,
	}

	for i, line := range logLines {
		entry, err := parseLogEntry(line)
		if err != nil {
			fmt.Printf("Error parsing line %d: %v\n", i+1, err)
			continue
		}

		fmt.Printf("Entry %d:\n", i+1)
		fmt.Printf("  Date: %s, Time: %s\n", entry.Date, entry.Time)
		fmt.Printf("  Level: %s\n", entry.Level)
		fmt.Printf("  Message: %s\n", entry.Message)
		fmt.Printf("  IPs: %v\n", entry.IPs)
		fmt.Printf("  URLs: %v\n", entry.URLs)
		fmt.Println()
	}
}
```
Homework Assignments
Now that we've looked at how to use regular expressions, let me set some homework to practice:
Assignment 1: Count Newline Characters
Create a regular expression that can match newline characters, similar to our `countLines` function from earlier.

Test String:
```
1
2
3
4
5
```
Expected Result: 4 newline characters

Hint: Use `\n` to match newline characters.
```go
package main

import (
	"fmt"
	"regexp"
)

func countNewlines(text string) int {
	// TODO: Implement using regex
	// Pattern should match newline characters
	return 0 // Replace with actual implementation
}

func main() {
	text := "1\n2\n3\n4\n5"
	count := countNewlines(text)
	fmt.Printf("Newline count: %d\n", count) // Should print: 4
}
```
Assignment 2: Count Uppercase Letters
Write a regular expression that can count the number of uppercase letters (A-Z).

Test String: "Hello World! This Has 6 Uppercase Letters"

Expected Result: 6 uppercase letters (H, W, T, H, U, L)

Hint: Use character ranges with `[A-Z]` to match uppercase letters.
```go
package main

import (
	"fmt"
	"regexp"
)

func countUppercase(text string) int {
	// TODO: Implement using regex
	// Pattern should match uppercase letters A-Z
	return 0 // Replace with actual implementation
}

func main() {
	text := "Hello World! This Has 6 Uppercase Letters"
	count := countUppercase(text)
	fmt.Printf("Uppercase count: %d\n", count) // Should print: 6
}
```
Bonus Assignment: Extract Phone Numbers
Create a regex that can extract phone numbers in the format `(XXX) XXX-XXXX`:
```go
func extractPhoneNumbers(text string) []string {
	// TODO: Implement pattern for (XXX) XXX-XXXX format
	// Example: "(555) 123-4567"
	return nil
}

func main() {
	text := "Call me at (555) 123-4567 or (999) 888-7777 for more info"
	phones := extractPhoneNumbers(text)
	fmt.Printf("Phone numbers: %v\n", phones)
	// Should print: [(555) 123-4567 (999) 888-7777]
}
```
Testing Your Solutions
Create test cases (for example in a `main_test.go` file, run with `go test`) to verify your regex patterns work correctly:
```go
package main

import "testing"

func TestCountNewlines(t *testing.T) {
	tests := []struct {
		input    string
		expected int
	}{
		{"1\n2\n3", 2},
		{"no newlines", 0},
		{"one\nline", 1},
		{"\n\n\n", 3},
	}

	for _, test := range tests {
		result := countNewlines(test.input)
		if result != test.expected {
			t.Errorf("countNewlines(%q) = %d, want %d",
				test.input, result, test.expected)
		}
	}
}
```
Summary
Regular expressions are a powerful tool for pattern matching and text processing:
- Pattern Matching - Find specific patterns in text
- Capture Groups - Extract parts of matches using parentheses
- String Replacement - Replace matched patterns with new text
- Performance - Compile patterns once, use many times
- Versatility - Works across many programming languages
Key Takeaways:
- Use raw strings (backticks) to avoid escaping issues
- Compile once as package variables for performance
- Capture groups enable powerful text extraction
- Regular expressions are a large topic - consider dedicated study
- Useful for validation, extraction, and text processing
Once you've completed the homework assignments, you should have a good understanding of how to use regular expressions in Go and be ready to move on to the next lesson.
Additional Resources
- regex101.com - Interactive regex testing and learning
- Regular Expressions books - For deep understanding of regex patterns
- Go regexp documentation - Complete method reference
- Common regex patterns - Cheat sheets for typical use cases
Regular expressions can seem daunting at first, especially as the syntax is rather abstract, but they're useful in so many places that I really recommend studying them further!