Apache Log Analysis

While there are some fine programs for analyzing Apache logs, you can extract some basic information with a few bash commands. These are real-life examples I used after putting some pictures I took online.
Those pictures were taken at the traditional “Narrensuppe” (Jester’s Soup), which kicks off the [carnival] activities in our town. I put them online because many people asked me if they could have them. So now I want to know if they got them… :)


How many different computers read my site?

This number is only roughly the number of ‘people’ visiting the site. There might be several people looking at the photos on one screen, but they are only counted once. On the other hand, people on dial-up might get counted more than once if they get a different IP address each time they connect. So, while the exact number is hard to measure, this is still interesting:

cat /var/log/apache2/narrensuppe.krone-neuenburg.de-access_log | cut  -d- -f1 | sort | uniq | wc -l

‘cat /var/log/apache2/narrensuppe.krone-neuenburg.de-access_log’ reads the logfile and starts the pipe.
‘cut -d- -f1’ splits the line at “-” signs (“-d-”) and takes the first part (“field”, “-f1”), which happens to be the IP address of the visitor.
‘sort’ sorts the IP addresses, and ‘uniq’ suppresses identical lines.
Finally, ‘wc -l’ counts the number of lines. (Some 100 visitors after 3 days. Not bad for a crowd of 130 Jesters!)

p15156159:~ # cat /var/log/apache2/narrensuppe.krone-neuenburg.de-access_log | cut  -d- -f1 | sort | uniq | wc -l
    100
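
A slightly more defensive variant, as a sketch only: splitting at “-” cuts a client name short if Apache logs hostnames instead of IP addresses and the hostname contains a hyphen. Taking the first whitespace-separated field with awk avoids that. The log lines and the temporary file below are made up for illustration; point `logfile` at your real log instead.

```shell
# Hypothetical sample data standing in for the real access log.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
192.168.141.114 - - [06/Feb/2005:15:57:30 +0100] "GET /NSuppe2005-252.jpg HTTP/1.1" 200 980809
192.168.141.114 - - [06/Feb/2005:15:58:02 +0100] "GET /NSuppe2005-245.jpg HTTP/1.1" 200 870112
dial-up-17.example.net - - [06/Feb/2005:16:01:11 +0100] "GET /NSuppe2005-245.jpg HTTP/1.1" 200 870112
EOF

# The first whitespace-separated field is the client.
# sort -u sorts and drops duplicate lines in one step.
visitors=$(awk '{print $1}' "$logfile" | sort -u | wc -l)
echo "$visitors"

rm -f "$logfile"
```

With this sample the count is 2, because the two requests from 192.168.141.114 collapse into one line.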


Which is the top-downloaded picture?

Ok, apache-mirror, tell me who is the most beautiful Jester here!

cat /var/log/apache2/narrensuppe.krone-neuenburg.de-access_log | grep "GET /NSuppe" | cut -d] -f2 | cut -d/ -f2 | cut -d' ' -f1 | sort |uniq -c | sort

‘cat /var/log/apache2/narrensuppe.krone-neuenburg.de-access_log’ reads the logfile and starts the pipe.
This is what such a line looks like:

  192.168.141.114 - - [06/Feb/2005:15:57:30 +0100] "GET /NSuppe2005-252.jpg HTTP/1.1" 200 980809 "http://narrensuppe.krone-neuenburg.de/1.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; DigExt)"

‘grep "GET /NSuppe"’ filters the lines, so only relevant requests are counted. (Thumbnails are in a different directory, so they don’t match.)
Now we have to extract the filename of the JPG. Some sed wizard would probably do this in one command, but I don’t know sed, so I have to use cut.
I want to get to the first slash (“/”) after the date. ‘cut -d] -f2’ keeps what follows the first “]”, which removes the date together with its slashes. ‘cut -d/ -f2’ then keeps the text between the first and the second of the slashes that are still left.
Come to think of it, I could have done this in one ‘cut -d/ -f4’, but it’s too late now :)

Now, this is left of the original line:

  NSuppe2005-252.jpg HTTP

‘cut -d' ' -f1’ removes everything after the first blank. I had to quote the blank. Now we are on the home stretch:
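
To check the extraction without touching the real log, the chain of cuts can be tried on a single line. This is just a sketch: the variable names `line`, `long` and `short` are made up here, and the sample line is the one shown above with the browser string left off.

```shell
# The sample line from above, without the trailing user-agent string.
line='192.168.141.114 - - [06/Feb/2005:15:57:30 +0100] "GET /NSuppe2005-252.jpg HTTP/1.1" 200 980809 "http://narrensuppe.krone-neuenburg.de/1.html"'

# The three cuts used in the pipeline.
long=$(printf '%s\n' "$line" | cut -d] -f2 | cut -d/ -f2 | cut -d' ' -f1)

# The shortcut: the file name starts the fourth "/"-separated field.
short=$(printf '%s\n' "$line" | cut -d/ -f4 | cut -d' ' -f1)

echo "$long"
echo "$short"
```

Both commands print NSuppe2005-252.jpg, so the single ‘cut -d/ -f4’ can stand in for the first two cuts, at least for lines shaped like this one.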

‘sort | uniq -c | sort’: The first sort prepares the list for the following uniq, which only merges identical lines that are next to each other. The -c option tells uniq to count the occurrences of each line (here: each image) and print the number before the line. The second sort takes care of the ranking.

p15156159:~ # cat /var/log/apache2/narrensuppe.krone-neuenburg.de-access_log | grep "GET /NSuppe" | cut -d] -f2 | cut -d/ -f2 | cut -d' ' -f1 | sort | uniq -c | sort
      1 NSuppe2005-268.jpg
      1 NSuppe2005-270.jpg
 [...]
      8 NSuppe2005-287.jpg
      8 NSuppe2005-294.jpg
      8 NSuppe2005-400.jpg
     10 NSuppe2005-247.jpg
     11 NSuppe2005-248.jpg
     11 NSuppe2005-252.jpg
     12 NSuppe2005-263.jpg
     20 NSuppe2005-245.jpg
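
A possible refinement of the ranking step, offered as a sketch rather than a correction: a plain sort compares the lines as text, so the order relies on uniq -c padding the counts to the same width, and it may come out differently depending on the locale. ‘sort -rn’ compares the counts as numbers and puts the biggest first, and ‘head -n 3’ keeps only the top three. The log lines and the temporary file are invented for this example.

```shell
# Hypothetical sample data standing in for the real access log.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
192.168.141.114 - - [06/Feb/2005:15:57:30 +0100] "GET /NSuppe2005-245.jpg HTTP/1.1" 200 870112
192.168.141.115 - - [06/Feb/2005:15:58:30 +0100] "GET /NSuppe2005-245.jpg HTTP/1.1" 200 870112
192.168.141.116 - - [06/Feb/2005:15:59:30 +0100] "GET /NSuppe2005-245.jpg HTTP/1.1" 200 870112
192.168.141.114 - - [06/Feb/2005:16:00:30 +0100] "GET /NSuppe2005-252.jpg HTTP/1.1" 200 980809
192.168.141.115 - - [06/Feb/2005:16:01:30 +0100] "GET /NSuppe2005-252.jpg HTTP/1.1" 200 980809
192.168.141.116 - - [06/Feb/2005:16:02:30 +0100] "GET /NSuppe2005-268.jpg HTTP/1.1" 200 910344
EOF

# Same extraction as before (with the -f4 shortcut), then a numeric,
# descending sort on the counts and a cut-off after three entries.
ranking=$(grep "GET /NSuppe" "$logfile" | cut -d/ -f4 | cut -d' ' -f1 \
  | sort | uniq -c | sort -rn | head -n 3)
echo "$ranking"

# The file name on the first line is the most requested picture.
winner=$(echo "$ranking" | head -n 1 | awk '{print $2}')
echo "$winner"

rm -f "$logfile"
```

In this made-up log NSuppe2005-245.jpg wins with three requests.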

Created by stwaidele

