Log Rhythms
The Access Log
Let's see what's lurking inside that log.
For this walk-through of a typical set of logs, I'm
assuming your Apache server is configured to use the
Common Log Format (CLF), the default in a fresh Apache installation. Your
httpd.conf file
should contain the following configuration directive:
CustomLog logs/access_log common
Now find your access log. Its location
depends on your layout preferences and installation method.
The Apache 1.3.9 RPM installation under Red Hat 6.1 places logs in an
/etc/httpd/logs directory. The source and binary installs
typically use /usr/local/apache/logs/access_log. The default
filename under Windows is access.log.
Let's zoom in on one fairly representative line in a log:
123.45.678.90 - - [07/Mar/2000:14:27:12 -0800] "GET /mypage.html HTTP/1.1" 200 10369
123.45.678.90
|
The visitor's IP address. If you particularly need the visitor's
host name, read the Apache documentation on the
HostNameLookups
directive.
|
- -
|
The first of the two dashes is a placeholder for something called
ident, a less trustworthy form of client
identification. That's about all I'll say on this; for further
information, see Apache's
IdentityCheck
directive.
The second dash is a placeholder for the user name supplied
by a visitor if required to log in to gain access to a
password-protected
section of the web site. Say, for example, I restricted access
to a private directory on my server to only myself.
Upon visiting http://www.memyselfandi.net/private,
I'd have to log in (say, as the user "me") to gain access to
that directory's contents. Thereafter, all my requests for
items in that directory are logged with me in place of
the dash.
|
[07/Mar/2000:14:27:12 -0800]
|
The date, time, and time zone offset of the request, here
7 March 2000 at 14:27:12, eight hours behind UTC.
|
GET /mypage.html
|
The visitor's request, in this case the mypage.html
document in the web server's document root.
You'll often
see requests consisting only of a slash, GET /,
or composed of a directory path and ending in only a slash,
GET /some/path/. This denotes a request for the
default document within the server's document root or along
some directory path. So, if your default
DirectoryIndex
is index.html, every request for /
results in the return of that directory's index.html
document to the visitor's browser. If no DirectoryIndex
document exists in the requested directory, the browser will
display either a listing of the files in that directory or a
"Forbidden" message, depending on whether the Indexes option
is enabled in your
Options
directive. The
IndexOptions
and
FancyIndexing
settings only control how such a listing looks.
|
HTTP/1.1
|
The protocol the browser used for the request, in this case
HTTP version 1.1. The older HTTP/1.0 is still very common.
|
200
|
The
HTTP status code
returned as part of the response to the visitor's browser.
200 signifies "OK" -- request fulfilled.
A common error you might have come across in your Web travels
is "404 Not Found," indicating that the
request does not match anything on the server. A code
of "304 Not Modified" says that the content
has not changed since the visitor last fetched it. In other words,
you've visited before and already have the latest copy of this
content in your browser's cache, so, for efficiency's sake,
the content is not resent.
|
10369
|
The number of bytes returned to the visitor, excluding headers
(status codes and the like). Where no content is sent, as with a 304
Not Modified status (see above), this value is the usual
- placeholder.
|
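As a rough illustration of the field breakdown above, here is a minimal Python sketch that splits one Common Log Format line into its parts with a regular expression. The pattern, the parse_clf function, and the field names are my own assumptions for illustration, not anything Apache provides. The sketch assumes well-formed lines with no embedded quotes in the request, and it treats the - byte-count placeholder as zero.

```python
import re

# Hypothetical pattern for one Common Log Format line (an assumption for
# illustration; it expects well-formed input with no quotes inside the request).
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_clf(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A "-" byte count (for example on a 304 response) is treated as 0 bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    # The request is normally "METHOD path PROTOCOL"; leave it whole otherwise.
    parts = fields["request"].split()
    if len(parts) == 3:
        fields["method"], fields["path"], fields["protocol"] = parts
    return fields


# The representative line examined above.
sample = ('123.45.678.90 - - [07/Mar/2000:14:27:12 -0800] '
          '"GET /mypage.html HTTP/1.1" 200 10369')
entry = parse_clf(sample)
print(entry["host"], entry["path"], entry["status"], entry["size"])
```

Returning None for lines that do not match, rather than raising an error, lets a caller skip the occasional malformed entry while looping over a whole log file.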
Logging in Apache (version 1.2 and later) is handled by the Apache module
mod_log_config,
which enables you to customize how your logs look and work. Your
httpd.conf file contains some popular log formats to get you started:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
Each log format starts out with the
LogFormat
directive, followed by a string of tokens that describe how each line of
the log file should look, and ending with a nickname given to the format.
The mod_log_config documentation has a comprehensive list of tokens and
their meanings. How you want your logs displayed, and how many files you
want them split into, is up to you. Some site authors keep separate
referrer and agent logs.
I prefer to use the "combined" log format and keep everything in one
place.
Let's say I wish to use "common" log format, but also want to keep track
of who is linking to my site. I could just use "combined" format, but
I don't really care what type of browser (agent) my visitor is using.
Instead, I'll create a new LogFormat directive like so:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\"" commonish
Now that I've defined my preferred log format, I need to tell Apache to
use it, with a CustomLog directive that names my "commonish" format:
CustomLog logs/commonish_log commonish
where logs/commonish_log is the path to my log file relative
to my
ServerRoot.
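To show what the extra Referer field buys you, here is a minimal Python sketch that tallies who is linking to the site from lines written in the "commonish" format. The pattern, the count_referers function, and the sample lines (including their referring URLs) are invented for illustration. The sketch assumes well-formed lines and ignores entries whose Referer is the - placeholder.

```python
import re
from collections import Counter

# Hypothetical pattern for the "commonish" format: the common fields
# followed by one quoted Referer value (an assumption for illustration).
COMMONISH_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-) '
    r'"(?P<referer>[^"]*)"$'
)


def count_referers(lines):
    """Tally Referer values, skipping non-matching lines and "-" entries."""
    counts = Counter()
    for line in lines:
        match = COMMONISH_PATTERN.match(line.strip())
        if match and match.group("referer") != "-":
            counts[match.group("referer")] += 1
    return counts


# Made-up sample lines; the referring URLs are invented.
prefix = '123.45.678.90 - - [07/Mar/2000:14:27:12 -0800] '
sample_lines = [
    prefix + '"GET /mypage.html HTTP/1.1" 200 10369 "http://www.example.com/links.html"',
    prefix + '"GET /other.html HTTP/1.1" 200 512 "http://www.example.com/links.html"',
    prefix + '"GET / HTTP/1.0" 304 - "-"',
    prefix + '"GET /mypage.html HTTP/1.1" 200 10369 "http://www.example.org/"',
]

referer_counts = count_referers(sample_lines)
for referer, hits in referer_counts.most_common():
    print(hits, referer)
```

Because count_referers accepts any iterable of lines, an open file object for the log could be passed in place of the sample list.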
You can actually skip the LogFormat directive and include
your preferred log format string in place of the nickname in your
CustomLog directive -- it's up to you.
We've only just scratched the surface of log customization. For much
more, be sure to read the
detailed mod_log_config documentation.