Day 10: Automating Log Analysis with a Bash Script - Log Analyzer and Report Generator

Welcome to Day 10 of my #90DaysOfDevOps challenge!

Today, I’ll walk you through a task where I created a Bash script to automate the process of analyzing server log files. This script helps system administrators identify critical events, count errors, and generate a daily summary report. The task also includes archiving processed log files for future reference.


Task Overview

As a DevOps enthusiast, I know that log analysis plays an important role in maintaining system health and troubleshooting issues. In this challenge, I automated log file analysis with a Bash script that:

  • Counts total error messages.

  • Finds critical events based on specific keywords (e.g., “CRITICAL”).

  • Identifies the top 5 most common error messages.

  • Generates a summary report and archives the log file.

Let’s dive into how I tackled this challenge!


1. Script Overview

The script log_analyzer.sh performs the following steps:

  • Validates the input to ensure a log file path is provided.

  • Counts the total number of lines and errors in the log file.

  • Finds and displays critical events containing the keyword “CRITICAL”.

  • Identifies the top 5 most common error messages in the log file.

  • Generates a summary report and saves it to a .txt file.

  • Archives the processed log file to a dedicated directory for future reference.

Here’s the core of the script:


#!/bin/bash

<<Info
Author       : Amitabh Soni
Date         : 5/12/2024
Description  : This script analyzes a log file and generates a summary report. It counts errors, finds critical events,
                  identifies the top 5 most common errors, and creates a summary report. Optionally, it archives the log file.
Usage        : ./log_analyzer.sh <log_file_path>
Info

# Function to check input arguments
validate_input() {
   if [[ $# -ne 1 ]]; then
      echo "Usage: $0 <log_file_path>"
      exit 1
   fi

   log_file=$1

   if [[ ! -f $log_file || ! -r $log_file ]]; then
      echo "Error: File '$log_file' does not exist or is not readable."
      exit 1
   fi
}

# Function to count total lines in the log file
count_total_lines() {
   wc -l < "$log_file"
}

# Function to count total errors
count_total_errors() {
   grep -i -c "error" "$log_file"
}

# Function to find critical events
find_critical_events() {
   awk '/CRITICAL/ {print NR, $0}' "$log_file"
}

# Function to find top 5 most common error messages
find_top_errors() {
   grep -i "error" "$log_file" | \
   awk '{$1=$2=$3=""; print $0}' | \
   sort | uniq -c | sort -nr | head -n 5
}

# Function to generate the summary report
generate_report() {
   echo "Log Analysis Report" > "$report_file"
   echo "--------------------" >> "$report_file"
   echo "Date of Analysis: $(date +'%Y-%m-%d_%H:%M:%S')" >> "$report_file"
   echo "Log File: $log_file" >> "$report_file"
   echo "Total Lines Processed: $total_lines" >> "$report_file"
   echo "Total Error Count: $error_count" >> "$report_file"
   echo "Critical Events:" >> "$report_file"
   find_critical_events >> "$report_file"
   echo "Top 5 Most Common Error Messages:" >> "$report_file"
   find_top_errors >> "$report_file"
}

# Function to archive the log file
archive_log_file() {
   mkdir -p "$archive_dir"
   mv "$log_file" "$archive_dir/" && \
   echo "Log file has been archived to $archive_dir/"
}

# Main script execution starts here
validate_input "$@"

log_file=$1

# Extract the base name of the log file (without the path) for unique report naming
log_file_basename=$(basename "$log_file" | sed 's/\..*$//') # Removes the extension for cleaner naming
timestamp=$(date +'%Y%m%d_%H%M%S')                          # Generate a timestamp
report_file="log_analysis_${log_file_basename}_${timestamp}.txt" # Unique report file name
archive_dir="./archived_logs"

total_lines=$(count_total_lines)
error_count=$(count_total_errors)

generate_report
archive_log_file

echo "Log analysis completed. Report saved to $report_file."

2. Key Functions in the Script

1. validate_input

This function ensures the user provides the log file as an argument, and that the file exists and is readable.
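
For a quick sanity check, running the script with no argument or with a missing file should print the messages defined in this function (the output below is reproduced from the echo statements in the script above):

./log_analyzer.sh
Usage: ./log_analyzer.sh <log_file_path>

./log_analyzer.sh missing.log
Error: File 'missing.log' does not exist or is not readable.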

2. count_total_lines

This function counts the total number of lines in the log file.

3. count_total_errors

It counts the number of lines in the log file that contain the keyword “error” (matched case-insensitively), which helps track error messages.
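
Note that grep -c counts matching lines, so a line containing “error” twice is still counted once. If you ever need per-occurrence counts instead, a minimal alternative (not part of the script above) is:

grep -i -o "error" "$log_file" | wc -l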

4. find_critical_events

It searches for critical events using the keyword “CRITICAL” and prints them along with the line number.
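
Because awk prints NR before each matching line, every critical event appears together with its line number. On a hypothetical log line, the output would look like:

142 2024-12-05 10:15:02 CRITICAL Disk usage above 95% on /dev/sda1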

5. find_top_errors

This function finds the top 5 most common error messages by:

  • Extracting error-related lines.

  • Stripping out the first three fields (timestamp, log level, etc.).

  • Sorting and counting unique occurrences, as shown in the worked example below.
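
To make the pipeline concrete, assume three hypothetical log lines in a “date time LEVEL message” format:

2024-12-05 10:15:01 ERROR Connection refused
2024-12-05 10:16:42 ERROR Connection refused
2024-12-05 10:17:03 ERROR Disk full

awk blanks the first three fields so only the message text remains, and sort | uniq -c | sort -nr | head -n 5 would then report roughly:

      2    Connection refused
      1    Disk full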

6. generate_report

This function creates a report in a text file containing the items below (a sample layout follows the list):

  • The date of analysis

  • The log file name

  • Total lines and errors

  • Critical events

  • Top 5 errors
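
Based on the echo statements in generate_report, the report follows this layout (the counts shown here are illustrative, not actual results):

Log Analysis Report
--------------------
Date of Analysis: 2024-12-05_18:30:45
Log File: Zookeeper_2k.log
Total Lines Processed: 2000
Total Error Count: 84
Critical Events:
...
Top 5 Most Common Error Messages:
...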

7. archive_log_file

After the analysis, the script moves the processed log file into an archive directory.
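
One caveat with mv: if two runs archive log files with the same name, the newer file silently overwrites the older one in ./archived_logs. A minimal variant (my own tweak, not in the script above) that timestamps the archived copy avoids this:

# Variant: append a timestamp so repeated archives of a same-named log don't overwrite each other
archive_log_file() {
   mkdir -p "$archive_dir"
   mv "$log_file" "$archive_dir/$(basename "$log_file").$(date +'%Y%m%d_%H%M%S')" && \
   echo "Log file has been archived to $archive_dir/"
}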


3. Testing the Script

I tested the script using a sample log file (Zookeeper_2k.log from the LogHub collection) to simulate a real-world scenario. The following steps were performed:

Step 1: Download the Sample Log File

wget https://raw.githubusercontent.com/logpai/loghub/master/Zookeeper/Zookeeper_2k.log

This command fetches a sample ZooKeeper log file from the loghub repository on GitHub. You can also test the script with your own logs.

Step 2: Run the Script

./log_analyzer.sh Zookeeper_2k.log

This runs the analysis, which generates a report and archives the log file.
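
Based on the echo statements in the script, the console output should look roughly like the following (the timestamp in the report name will differ on your machine):

Log file has been archived to ./archived_logs/
Log analysis completed. Report saved to log_analysis_Zookeeper_2k_20241205_183045.txt.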

Step 3: Check the Output

  • Log analysis report: A .txt file is generated with all the necessary details.

  • Archived log file: The processed log file is moved to the ./archived_logs directory.


4. Conclusion

Automating log analysis can save a lot of time, especially in large environments where logs are generated in high volumes. By using this simple Bash script, you can easily analyze logs, track errors, identify critical events, and generate reports.

Results

Here are a few snapshots of the output:

  • Error count:

  • Critical events:

    • Since the sample log file does not contain the keyword "CRITICAL", no critical events are shown here.
  • Top 5 errors:

  • Summary report:

  • Archived log file:

This was a fun and challenging task, and it’s another step forward in my #90DaysOfDevOps journey. I hope this helps you automate log file analysis and reporting in your DevOps workflows.

See the complete code and updates on my GitHub repository.