AwkApacheLogs
From KallestadWiki
Parsing Apache Logs with awk, sed, and cut
There are a few extraordinarily quick command-line utilities that can provide great summary information from Apache log files. Here are a few sample commands I'm using:
Page Views
This gives a rough idea of the number of page views:
egrep -vc '(\.gif|\.jpg|\.ico)' logfilename
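The command above matches the extensions anywhere in the line and is case sensitive. A slightly stricter variant, sketched below, checks only the request path and ignores case. It assumes the combined log format, where the path is field 7, and the three log lines are made-up sample data.

```shell
# Hypothetical sample in combined log format (made-up data).
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - [10/Oct/2010:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
192.0.2.1 - - [10/Oct/2010:13:55:37 -0700] "GET /logo.GIF HTTP/1.1" 200 512 "http://example.com/index.html" "Mozilla/5.0"
192.0.2.2 - - [10/Oct/2010:13:56:01 -0700] "GET /about.html HTTP/1.1" 200 1400 "-" "Mozilla/5.0"
EOF

# $7 is the request path (an assumption about the log format).
# -i ignores case, -v inverts the match, -c counts the remaining lines.
page_views=$(awk '{print $7}' "$log" | grep -Eivc '\.(gif|jpg|ico)(\?|$)')
echo "$page_views"

rm -f "$log"
```

Here only `/logo.GIF` is filtered out, so the two HTML requests are counted.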
First and Last Entries
This is good for determining the date range covered in the log file.
head -1 logfilename; tail -1 logfilename
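If you only want the timestamps rather than the full lines, something like the following may work. It is a sketch that assumes the combined log format, where field 4 holds the timestamp with a leading bracket; the sample lines are made up.

```shell
# Hypothetical two-line sample (made-up data).
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - [10/Oct/2010:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
192.0.2.2 - - [11/Oct/2010:09:12:44 -0700] "GET /about.html HTTP/1.1" 200 1400 "-" "Mozilla/5.0"
EOF

# substr($4, 2) drops the leading "[" from the timestamp field.
first=$(head -n 1 "$log" | awk '{print substr($4, 2)}')
last=$(tail -n 1 "$log" | awk '{print substr($4, 2)}')
echo "$first to $last"

rm -f "$log"
```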
Unique Visitors
Like page views above, this is only a guesstimate. It doesn't account for robots, etc.
awk '{print $1}' logfilename | sort -u | wc -l
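The same count can probably be had in a single awk pass, which skips the sort. This is a sketch of an alternative, not the method above: it keeps every distinct address in memory, and the sample log is made-up data.

```shell
# Hypothetical sample: three requests from two client addresses (made-up data).
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - [10/Oct/2010:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
192.0.2.1 - - [10/Oct/2010:13:55:40 -0700] "GET /about.html HTTP/1.1" 200 1400 "-" "Mozilla/5.0"
192.0.2.2 - - [10/Oct/2010:13:56:01 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
EOF

# seen[$1]++ is 0 (false) the first time an address appears, so n counts distinct addresses.
# n+0 prints 0 instead of an empty string for an empty log.
unique=$(awk '!seen[$1]++ {n++} END {print n+0}' "$log")
echo "$unique"

rm -f "$log"
```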
Top 5 Search Terms
This is a convoluted set of pipes, but essentially this is how it works:
- search the log file for google, msn, or yahoo
- extract the referrer field
- take those results and further filter them to referrals from google, msn, or yahoo
- grab everything after the first 'q' (cut fields 2 through 5, so the search text may itself contain up to three more 'q's)
- trim that at the first ampersand
- get rid of that pesky equals sign
- convert the URI into human readable text
- convert +'s to spaces
- sort the results, grab uniques, sort and print the counts
- spit out just the top 5
egrep '(google|msn|yahoo)' logfilename \
| awk '{print $11}' \
| egrep '(google|msn|yahoo)' \
| cut -dq -f2,3,4,5 \
| cut -d\& -f1 \
| sed 's/=/ /' \
| sed "$( printf 's/%%%02X/\\x%02X/ig;' $( ( seq 38 255 ; seq 0 37 ) | sed p ) )" \
| sed 's/+/ /g' \
| sort \
| uniq -c \
| sort -rn \
| head -5
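The URI-decoding step is the least obvious part, so here is a small sketch of what it generates. The printf call emits one sed substitution per byte value, and `sed p` doubles each number because the format consumes two arguments per command. Listing 0 through 37 last presumably makes sure `%25` (the percent sign itself) is decoded at the very end, so nothing gets decoded twice. The example below builds commands for only two byte values and assumes GNU sed, since the `\x` escapes and the `i` flag are not portable; the input string is made up.

```shell
# Build substitution commands for two byte values only: 32 (space) and 58 (colon).
script=$(printf 's/%%%02X/\\x%02X/ig;' $( printf '%s\n' 32 58 | sed p ))
echo "$script"

# Apply the generated script to a made-up encoded string.
# The i flag lets the lowercase "%3a" match the generated "%3A".
decoded=$(echo 'awk%20tips%3a' | sed "$script")
echo "$decoded"
```

The first echo shows the generated script, `s/%20/\x20/ig;s/%3A/\x3A/ig;`, and the second shows the decoded text.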
I ran this on a 2.1 GB log file (modified to print the top 30 terms), and it took just over 4 minutes. Your log file format may require a change to the second line of the pipeline: 11 is the position of the referer field in my logs. To check, just run
egrep '(google|msn|yahoo)' logfilename \
| head -1 \
| awk '{print $11}'
and check the output. Try a few different values in place of $11 until you see a good referer. Still stumped? Maybe the first matching lines don't have a referer; try head -10 or head -20 to see if you can find one.
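Rather than guessing field numbers one at a time, you could number every field of a matching line and look for the one holding a URL. The sketch below does that for a single made-up line in the combined log format. The second awk call is a rough heuristic that assumes the referer is the first field starting with a quoted http or https URL.

```shell
# Hypothetical log line (made-up data) with a search-engine referer.
line='192.0.2.1 - - [10/Oct/2010:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "http://www.google.com/search?q=awk+tips&hl=en" "Mozilla/5.0"'

# Print every whitespace-separated field with its number.
fields=$(printf '%s\n' "$line" | awk '{for (i = 1; i <= NF; i++) print i ": " $i}')
printf '%s\n' "$fields"

# Heuristic: report the first field that begins with a quoted URL.
referer_field=$(printf '%s\n' "$line" \
  | awk '{for (i = 1; i <= NF; i++) if ($i ~ /^"https?:\/\//) {print i; exit}}')
echo "$referer_field"
```

For this sample line the heuristic reports field 11, which matches the `$11` used above.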
