Understanding a Web server log file (Part 2)
In the first installment of this article, we discussed the log file format and described the information a log file contains. In this part we analyze an actual log file so that you can see what that information looks like in practice.
Analyzing a typical log file
The following is an extract from a log file generated by Microsoft IIS 4.0. As the #Fields line at the top of the file shows, it records only a subset of the fields defined in the W3C extended log file format.
#Software: Microsoft Internet Information Server 4.0
#Version: 1.0
#Date: 2000-01-01 23:08:51
#Fields: date time c-ip cs-username cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs(User-Agent) cs(Cookie) cs(Referer)
2000-01-01 23:08:51 206.175.107.206 - GET - 200 8476 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+DigExt) - -
2000-01-01 23:08:51 206.175.107.206 - GET / - 200 8476 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+DigExt) - -
2000-01-01 23:09:15 206.175.107.206 - GET /hotjob.asp - 200 644 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+DigExt) ASPSESSIONIDGQQGGGOY=LMODMDLCPDPKDBLOPAPNOBNJ http://www.systemtwo.com/
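If you want to work with entries like these in a script, the following Python sketch shows one possible approach. It reads the #Fields directive and pairs each value on a data line with its field name. It assumes, as in this IIS extract, that values are separated by whitespace and that no value contains a space (IIS writes the spaces in the User-Agent string as +). It is not a full W3C parser: it does not handle quoted strings, and it skips any line whose value count does not match the #Fields line. The first data line of the extract is one example, since it has one value fewer than declared. The function name parse_w3c_log is our own, and the sample reproduces the extract with the byte-count field written as sc-bytes.

```python
def parse_w3c_log(lines):
    """Parse W3C extended log lines into a list of dicts keyed by field name."""
    fields = None
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives: #Software, #Version, #Date, #Fields, ...
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        if fields is None:
            continue  # no #Fields directive seen yet
        values = line.split()
        if len(values) != len(fields):
            continue  # malformed line: value count does not match #Fields
        entries.append(dict(zip(fields, values)))
    return entries


SAMPLE = """\
#Software: Microsoft Internet Information Server 4.0
#Version: 1.0
#Date: 2000-01-01 23:08:51
#Fields: date time c-ip cs-username cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs(User-Agent) cs(Cookie) cs(Referer)
2000-01-01 23:08:51 206.175.107.206 - GET - 200 8476 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+DigExt) - -
2000-01-01 23:08:51 206.175.107.206 - GET / - 200 8476 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+DigExt) - -
2000-01-01 23:09:15 206.175.107.206 - GET /hotjob.asp - 200 644 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+DigExt) ASPSESSIONIDGQQGGGOY=LMODMDLCPDPKDBLOPAPNOBNJ http://www.systemtwo.com/
"""

entries = parse_w3c_log(SAMPLE.splitlines())
for e in entries:
    print(e["time"], e["cs-method"], e["cs-uri-stem"], e["sc-status"])
```

Because each entry is a dictionary, later steps can refer to fields by their W3C names (for example e["cs(Referer)"]) rather than by column position.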
Here is a single line from this log, formatted for ease of viewing:
date: 2000-01-01
time: 23:09:15
c-ip: 206.175.107.206
cs-username: -
cs-method: GET
cs-uri-stem: /hotjob.asp
cs-uri-query: -
sc-status: 200
sc-bytes: 644
cs(User-Agent): Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+DigExt)
cs(Cookie): ASPSESSIONIDGQQGGGOY=LMODMDLCPDPKDBLOPAPNOBNJ
cs(Referer): http://www.systemtwo.com/
You could interpret this entry as follows. On January 1st, 2000, at 11:09pm (GMT), a user on a computer with the IP address 206.175.107.206 (c-ip), who was not authenticated (cs-username = -), asked to GET the file /hotjob.asp (cs-uri-stem). No query string was passed (cs-uri-query = -), and the server returned the page successfully (sc-status = 200), sending 644 bytes (sc-bytes). The user was running the Internet Explorer 5 browser on Windows 98 (cs(User-Agent)). The browser returned the session cookie that the ASP application had allocated to identify this visitor (cs(Cookie)). Finally, the Referer field shows that the user followed a link from http://www.systemtwo.com/ to request this page.
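The same reading can be automated. The Python sketch below turns one entry, given as a dictionary that maps W3C field names to values, into a sentence. The describe function is a hypothetical helper, not part of any library. Three assumptions are built in: date and time are in GMT/UTC, as the W3C extended format specifies; a cs-username of - means an anonymous user; and every + in the User-Agent stands for a space. That last one is how IIS encodes spaces, but it would also change any genuine plus sign.

```python
from datetime import datetime, timezone


def describe(entry):
    """Turn one parsed W3C log entry into a plain-English sentence."""
    # W3C extended logs record date and time in GMT (UTC).
    when = datetime.strptime(
        entry["date"] + " " + entry["time"], "%Y-%m-%d %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    user = entry["cs-username"]
    who = "an anonymous user" if user == "-" else f"user {user}"
    # IIS writes spaces in the User-Agent as '+'; turn them back into spaces.
    agent = entry["cs(User-Agent)"].replace("+", " ")
    text = (
        f"At {when:%Y-%m-%d %H:%M} UTC, {who} at {entry['c-ip']} "
        f"sent {entry['cs-method']} {entry['cs-uri-stem']}; "
        f"the server returned status {entry['sc-status']} "
        f"and {entry['sc-bytes']} bytes. Browser: {agent}."
    )
    if entry["cs(Referer)"] != "-":
        text += f" Referred from {entry['cs(Referer)']}."
    return text


# The entry discussed above, keyed by the field names from the #Fields line.
entry = {
    "date": "2000-01-01", "time": "23:09:15", "c-ip": "206.175.107.206",
    "cs-username": "-", "cs-method": "GET", "cs-uri-stem": "/hotjob.asp",
    "cs-uri-query": "-", "sc-status": "200", "sc-bytes": "644",
    "cs(User-Agent)": "Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+DigExt)",
    "cs(Cookie)": "ASPSESSIONIDGQQGGGOY=LMODMDLCPDPKDBLOPAPNOBNJ",
    "cs(Referer)": "http://www.systemtwo.com/",
}
print(describe(entry))
```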
What initially appeared to be a collection of meaningless data turns out to describe a visitor's activity on a web site in specific detail. Read together, the entries show the same IP address requesting the site's root page (/) at 23:08:51 and then /hotjob.asp 24 seconds later. Because the full extended log format offers significantly more fields than this log records, you could use it to follow a visitor's journey through your web site in even greater detail.
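To illustrate how a visitor's path can be reconstructed, the sketch below groups requests by client IP address and lists the pages each address requested, in time order. Treating c-ip as a visitor identifier is a simplifying assumption. Several users behind one proxy can share an address, and one user's address can change, so the session ID in cs(Cookie) may be a better key when it is present. The FIELDS list is copied from this log's #Fields line, and visitor_paths is a hypothetical name.

```python
from collections import defaultdict

# Field order copied from this log's #Fields line.
FIELDS = ["date", "time", "c-ip", "cs-username", "cs-method", "cs-uri-stem",
          "cs-uri-query", "sc-status", "sc-bytes", "cs(User-Agent)",
          "cs(Cookie)", "cs(Referer)"]


def visitor_paths(lines, fields=FIELDS):
    """Group requests by client IP and list each visitor's pages in time order."""
    requests = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed lines
        entry = dict(zip(fields, values))
        stamp = entry["date"] + " " + entry["time"]
        requests[entry["c-ip"]].append((stamp, entry["cs-uri-stem"]))
    # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text;
    # sorting on the timestamp alone keeps log order for identical times.
    return {ip: [page for _, page in sorted(reqs, key=lambda r: r[0])]
            for ip, reqs in requests.items()}


LOG_LINES = [
    "2000-01-01 23:08:51 206.175.107.206 - GET / - 200 8476 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+DigExt) - -",
    "2000-01-01 23:09:15 206.175.107.206 - GET /hotjob.asp - 200 644 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+DigExt) ASPSESSIONIDGQQGGGOY=LMODMDLCPDPKDBLOPAPNOBNJ http://www.systemtwo.com/",
]

paths = visitor_paths(LOG_LINES)
for ip, pages in paths.items():
    print(ip, " -> ".join(pages))
```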
Be warned, though, that manual analysis such as this is not for the faint-hearted or the time-strapped. Because the server creates a log file entry for every request, including each graphic on a page, even a relatively small site can generate log files containing tens or even hundreds of thousands of lines each day.
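Analysis software copes with this volume by reading the log one line at a time and aggregating as it goes. As a rough illustration, the sketch below counts all well-formed requests and then counts page requests per URI stem, leaving out graphics. Several pieces are made up for illustration: the page_hits name, the IMAGE_EXTENSIONS tuple, and the sample lines, including the two .gif requests and the trimmed #Fields line. The function also assumes the log's #Fields line includes cs-uri-stem.

```python
from collections import Counter

# Hypothetical list of extensions treated as graphics rather than pages.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def page_hits(lines):
    """Stream through log lines; return (total requests, Counter of page hits)."""
    fields = None
    stem_pos = None
    total = 0
    hits = Counter()
    for line in lines:
        if line.startswith("#Fields:"):
            fields = line.split()[1:]  # field names follow the directive
            stem_pos = fields.index("cs-uri-stem")
            continue
        if line.startswith("#") or not line.strip() or fields is None:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed lines
        total += 1
        stem = values[stem_pos]
        if stem.lower().endswith(IMAGE_EXTENSIONS):
            continue  # a graphic, not a page view
        hits[stem] += 1
    return total, hits


# Hypothetical sample: the .gif requests are invented for illustration.
SAMPLE = [
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status",
    "2000-01-01 23:08:51 206.175.107.206 GET / 200",
    "2000-01-01 23:08:52 206.175.107.206 GET /images/logo.gif 200",
    "2000-01-01 23:08:52 206.175.107.206 GET /images/banner.gif 200",
    "2000-01-01 23:09:15 206.175.107.206 GET /hotjob.asp 200",
]

total, hits = page_hits(SAMPLE)
print(total, "requests;", sum(hits.values()), "page views")
print(hits.most_common())
```

Because the function only iterates over its input, you could pass it an open file object instead of a list, so memory use stays small however large the log grows.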
To analyze your web site's traffic quickly and accurately, we recommend using a standalone web site analysis package such as Funnel Web. With such a program, you can easily specify which log file fields to analyze and how the resulting reports should be presented (tabular, graphical, etc.).
