Understanding a web server log file (Part 1)
Introduction
Several years ago, webmasters (people who understand the technology behind a web site) would have been in charge of analyzing a web site's traffic. With the proliferation of easy-to-use web site building tools, most people can now create and operate their own site. Unless these people know about log files, they may not realize that those files contain important information which can be incorporated into their day-to-day business. A basic understanding of how log files are structured helps site owners better understand their web site's traffic.
What is a log file?
A log file is simply a text file created by the server hosting your web site, containing a single entry for each request made to the server. There are numerous servers available, but the majority of web sites use Apache, IIS or Netscape.
There is a set of recommended standards, although there is no compulsion on server developers to abide by these. Each type of server's log files contains a different set of information. You should contact your server developer to determine what information is available for analysis in your log files. For the purpose of this article we will concentrate on the W3C extended format since this covers the largest range of possibilities. Most of the other formats are subsets of this format.
W3C Extended Format
The W3C extended format is specified by the W3C (World Wide Web Consortium, www.w3.org). It is a customizable ASCII format (ASCII being a standard text encoding that most hardware and software developers adhere to) that provides a variety of different fields.
It offers users the option of specifying the fields of interest and eliminating fields not required, thereby offering some degree of control over the size of the log file. The full details of specific fields and their meanings follow. The field labels are typical of IIS servers.
- Date - date - The date on which the request was made.
- Time - time - The time at which the request was made. Recorded as UTC (Greenwich Mean Time).
- Client IP Address - c-ip - The IP address of the client making the request e.g. 203.103.44.20.
- User Name - cs-username - The name of the authenticated user making the request. This is only recorded if an authentication system is in place. If there is no authentication the user is 'anonymous' and is represented in the log file by a hyphen.
- Service Name & Instance Number - s-sitename - The Internet service and instance number that the client accessed on the server.
- Server Name - s-computername - The name of the server that made the log entry.
- Server IP - s-ip - The IP address of the server that made the log entry.
- Method - cs-method - The type of action that was requested by the client (for example, a POST).
- URI Stem - cs-uri-stem - The actual resource accessed (for example, index.html).
- URI Query - cs-uri-query - The query that the client was trying to perform (if any).
- HTTP Status - sc-status - The HTTP status code of the action. See below for HTTP status code details.
- Win32 Status - sc-win32-status - The Win32 status code of the action.
- Bytes Sent - sc-bytes - The actual number of bytes sent by the server.
- Bytes received - cs-bytes - The actual number of bytes received by the server.
- Server Port - s-port - The port number the client is connected to.
- Time Taken - time-taken - The length of time taken to perform the requested action.
- Protocol Version - cs-version - The actual protocol (HTTP or FTP) and version used by the client to make the request.
- User Agent - cs(User-Agent) - The browser used by the client.
- Cookie - cs(Cookie) - The content of the cookie either sent or received (if any).
- Referrer - cs(Referer) - The page that linked the user to the requested resource, that is, the previous site visited when the user followed a direct link.
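The field list above can be put to work with a few lines of code. The following is a minimal sketch, not a complete parser: it assumes the log begins with a `#Fields:` directive naming the recorded fields (as W3C extended logs normally do), that values are separated by spaces, and that no value contains an embedded space. The sample log, its values and the function name `parse_w3c_log` are invented for illustration.

```python
from datetime import datetime, timezone

# Hypothetical sample in W3C extended format; all values are invented.
SAMPLE_LOG = """\
#Version: 1.0
#Fields: date time c-ip cs-method cs-uri-stem cs-uri-query sc-status sc-bytes
2024-01-15 10:30:00 203.103.44.20 GET /index.html - 200 5120
2024-01-15 10:30:02 203.103.44.20 GET /missing.html - 404 312
"""


def parse_w3c_log(text):
    """Return one dict per log entry, keyed by the names in the #Fields directive."""
    fields = []
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive lines start with '#'; only #Fields is needed here.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip lines that do not match the declared fields
        entry = dict(zip(fields, values))
        if "date" in entry and "time" in entry:
            # Date and time are recorded in UTC, so attach that timezone.
            entry["timestamp"] = datetime.strptime(
                entry["date"] + " " + entry["time"], "%Y-%m-%d %H:%M:%S"
            ).replace(tzinfo=timezone.utc)
        entries.append(entry)
    return entries


entries = parse_w3c_log(SAMPLE_LOG)
for e in entries:
    print(e["timestamp"].isoformat(), e["c-ip"], e["cs-uri-stem"], e["sc-status"])
```

Because the field names are read from the log itself, the same sketch works whichever subset of fields the server has been configured to record. A hyphen in a value, as in the cs-uri-query column here, means the field was empty for that request.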
Notes:
1. Fields are separated by spaces.
2. Time is recorded in UTC (Greenwich Mean Time).
As the field list shows, the amount of information that can be obtained about the traffic on your web site depends directly on your server settings. Essentially, if you want detailed traffic analysis you should ensure your server captures as much information as possible.
Server status codes
Your log files also record the outcome of each request using a standard status code. Some of the common status codes are listed below, some of which you will probably recognize. Further detail can be obtained from www.w3.org.
Status codes
Successful Codes (2xx)
OK (200)
Created (201)
Accepted (202)
No content (204)
Partial content (206)
Redirection Codes (3xx)
Moved permanently (301)
Found, a temporary redirect (302)
Not modified (304)
Client Error Codes (4xx)
Bad Request (400)
Unauthorized, authentication required (401)
Forbidden (403)
Not Found (404)
Request timeout (408)
Server Error Codes (5xx)
Internal Server Error (500)
Not Implemented (501)
Bad Gateway (502)
Service unavailable (503)
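Since the first digit of a status code identifies its group, a log's status values can be summarized without listing every individual code. The following is a minimal sketch of that idea; the names `STATUS_CLASSES`, `status_class` and `summarize`, and the sample codes, are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Group labels keyed by the first digit of the status code.
STATUS_CLASSES = {
    "2": "Successful",
    "3": "Redirection",
    "4": "Client Error",
    "5": "Server Error",
}


def status_class(code):
    """Map a status code (string or int) to its group label."""
    return STATUS_CLASSES.get(str(code)[:1], "Other")


def summarize(codes):
    """Count how many status codes fall into each group."""
    return Counter(status_class(c) for c in codes)


# Invented sample of sc-status values taken from a log.
sample_codes = ["200", "200", "304", "404", "500", "206"]
for label, count in summarize(sample_codes).items():
    print(label, count)
```

A high proportion of client or server errors in such a summary is usually the first sign that broken links or server problems deserve a closer look.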
