AwkApacheLogs

From KallestadWiki

Parsing Apache Logs with awk, sed, and cut

A few extraordinarily quick command-line utilities can provide great summary information from Apache log files. Here are a few sample commands I'm using:

Page Views

This gives a rough idea of the number of page views:

 egrep -vc '(\.gif|\.jpg|\.ico)' logfilename
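
The egrep above tests the whole log line, so a page request whose referer happens to contain ".gif" is also dropped. As a rough sketch of a stricter variant, the function below tests only the request path. It assumes the combined log format, where the path is field 7; count_page_views is a name made up for this example, and the extension list can be extended to suit.

```shell
# Hypothetical helper: count requests whose path does not end in one of
# the image extensions used above.
# Assumption: combined log format, so the request path is field 7.
count_page_views() {
    awk '$7 !~ /\.(gif|jpg|ico)([?]|$)/ { n++ } END { print n + 0 }' "$1"
}

# Usage: count_page_views logfilename
```

The ([?]|$) part lets a path such as /logo.gif?v=2 count as an image while leaving /gifts.html alone.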

First and Last Entries

This is good for determining the date range covered by the log file:

 head -1 logfilename; tail -1 logfilename
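
If only the timestamps are of interest, the two lines can be trimmed down with awk. This is a minimal sketch that assumes the combined log format, where the timestamp and its timezone offset are fields 4 and 5; log_date_range is a hypothetical name.

```shell
# Hypothetical helper: print only the timestamp of the first and last entry.
# Assumption: combined log format, timestamp in fields 4 and 5,
# e.g. [10/Oct/2000:13:55:36 -0700]
log_date_range() {
    { head -n 1 "$1"; tail -n 1 "$1"; } | awk '{ print $4, $5 }'
}

# Usage: log_date_range logfilename
```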

Unique Visitors

Like page views above, this is only a guesstimate. It counts distinct client addresses (the first field) and doesn't account for robots, etc.

 awk '{print $1}' logfilename | sort | uniq | wc -l
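
For a slightly less inflated number, obvious robots can be skipped before counting. The sketch below is a crude heuristic, not a real bot detector: it assumes that robots identify themselves with "bot", "crawl", or "spider" somewhere in the log line (usually the user agent), which many do not. count_unique_visitors is a hypothetical name.

```shell
# Hypothetical helper: count distinct client addresses (field 1) in one pass,
# skipping lines that look like robot traffic.
# Assumption: robots mention "bot", "crawl", or "spider" in the log line.
count_unique_visitors() {
    awk 'tolower($0) !~ /bot|crawl|spider/ && !seen[$1]++ { n++ }
         END { print n + 0 }' "$1"
}

# Usage: count_unique_visitors logfilename
```

Because the match runs against the whole line, a page path such as /robots.txt also causes that request to be skipped; restricting the test to the user agent field would be more precise.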

Top 5 Search Terms

This is a convoluted set of pipes, but essentially this is how it works:

  1. search the log file for google, msn, or yahoo
  2. extract the referrer field
  3. take those results and further filter them to referrals from google, msn, or yahoo
  4. grab everything after the first 'q' (keeping up to four q-delimited fields, so terms that contain a 'q' mostly survive)
  5. trim that at the first ampersand
  6. get rid of that pesky equals sign
  7. convert the URI into human readable text
  8. convert +'s to spaces
  9. sort the results, grab uniques, sort and print the counts
  10. spit out just the top 5

 egrep '(google|msn|yahoo)' logfilename \
 | awk '{print $11}' \
 | egrep '(google|msn|yahoo)' \
 | cut -dq -f2,3,4,5 \
 | cut -d\& -f1 \
 | sed 's/=/ /' \
 | sed "$( printf 's/%%%02X/\\x%02X/ig;' $( ( seq 38 255 ; seq 0 37 ) | sed p ) )" \
 | sed 's/+/ /g' \
 | sort \
 | uniq -c \
 | sort -rn \
 | head -5
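
To see what the cut and sed stages do, it helps to push one referer through them by hand. The sketch below reproduces steps 4, 5, 6, and 8 only (the percent-decoding step is left out), and the referer URL is a made-up sample rather than real log data. extract_search_term is a hypothetical name.

```shell
# Hypothetical helper: steps 4, 5, 6, and 8 of the pipeline above,
# reading referers on stdin. The percent-decoding step is omitted here.
extract_search_term() {
    cut -dq -f2,3,4,5 | cut -d'&' -f1 | sed 's/=/ /' | sed 's/+/ /g'
}

# Made-up sample referer, quoted the way it appears in field 11.
printf '%s\n' '"http://www.google.com/search?q=apache+log+awk&hl=en"' \
    | extract_search_term
# prints " apache log awk" (with a leading space left by the '=' replacement)
```

Since sort and uniq -c treat every line the same way, the leading space does not affect the counts.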

I ran this on a 2.1 GB log file (modified to output the top 30 terms), and it took just over 4 minutes to run. Your log file format may require a change on the second line of the pipeline: 11 is the field number of the referer in my logs. To check, just run:

 egrep '(google|msn|yahoo)' logfilename \
 | head -1 \
 | awk '{print $11}'

and check the output. Try a few different values in place of $11 until you see a good referer. Still stumped? Your first matches may not have a referer; try head -10 or head -20 to see if you can find one.
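
Instead of guessing values one at a time, every whitespace-separated field of the first line can be printed together with its number. This is a small sketch; show_fields is a hypothetical name, and the field numbers it prints are the same ones awk uses for $1, $2, and so on.

```shell
# Hypothetical helper: number every whitespace-separated field of the
# first log line, so the referer's field number can be read off directly.
show_fields() {
    head -n 1 "$1" | awk '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

# Usage: show_fields logfilename
# To look at the first search-engine hit instead of the first line:
#   egrep '(google|msn|yahoo)' logfilename | show_fields /dev/stdin
```

The field that starts with "http:// is the referer; use its number in place of 11.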
