Cluster Analysis
Web servers, for all their strengths, have limitations. Even a modern computer running a stable operating system over a fast Internet connection cannot, on its own, cope with the sheer volume of traffic that today's big-name web sites are fortunate enough to receive.
For typical web sites with limited graphics, a single server can comfortably handle a fairly large number of simultaneous users. But for large corporate sites like Yahoo, AltaVista and CNET, a single machine would be nowhere near enough.
However, a domain name (like yahoo.com) typically directs visitors to a single public IP address, so a problem arises: how can many machines share the work that arrives at one address? This problem is solved by "load balancing" servers.
The load balancer runs on the computer to which the organization's primary IP address has been assigned. This computer is connected over an internal network to one or more web servers, each of which holds a complete copy of the web site.
When a visitor arrives at the site's "front door", the load balancer routes the visitor to whichever server is handling the fewest users at that moment. When the same visitor requests another page on the site, the load balancer again determines which server is best able to fulfill the request.
A group of web servers working together in this way is called a cluster, and each cluster can contain up to 32 individual servers. This gives system administrators a rather elegant way to cope with rising traffic: when their current servers become overloaded, they can simply "plug in" more computers to take some of the load off the existing ones.
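The routing rule described above, sending each request to the server with the fewest active users, is commonly called "least connections" balancing. The sketch below is a minimal, in-memory illustration of that rule and of adding servers up to the 32-server limit mentioned above. It assumes the balancer keeps its own count of active users per server; the class and method names (`LoadBalancer`, `add_server`, `route`, `finish`) are hypothetical and are not taken from any particular load balancing product.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch: class names and in-memory counters are illustrative
# assumptions, not the API of any real load balancing product.
MAX_SERVERS = 32  # cluster size limit stated in the text


@dataclass
class Server:
    name: str
    active_users: int = 0  # requests this server is currently handling


@dataclass
class LoadBalancer:
    servers: List[Server] = field(default_factory=list)

    def add_server(self, name: str) -> Server:
        """'Plug in' another web server, up to the cluster size limit."""
        if len(self.servers) >= MAX_SERVERS:
            raise RuntimeError(f"cluster already holds {MAX_SERVERS} servers")
        server = Server(name)
        self.servers.append(server)
        return server

    def route(self) -> Server:
        """Send a request to the server handling the fewest users right now."""
        if not self.servers:
            raise RuntimeError("no servers in the cluster")
        target = min(self.servers, key=lambda s: s.active_users)
        target.active_users += 1
        return target

    def finish(self, server: Server) -> None:
        """Mark a request on `server` as complete, freeing one slot."""
        server.active_users = max(0, server.active_users - 1)


if __name__ == "__main__":
    lb = LoadBalancer()
    for name in ("web1", "web2", "web3"):
        lb.add_server(name)
    # Each page request is routed independently, as described above.
    print([lb.route().name for _ in range(5)])
    # prints ['web1', 'web2', 'web3', 'web1', 'web2']
```

Ties go to the first server in the list, because `min` returns the first minimal element. A production balancer would also need health checks and thread-safe counters, which this sketch leaves out.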
As useful as clustering is for handling large amounts of traffic, it poses a particular problem when it comes to producing accurate site usage statistics.
Because a clustered site runs across multiple web servers, there is no unified log file containing complete data for each visitor to the site. Each server's log file contains only the requests handled by that particular machine, so conventional web log analysis software that processes each log on its own would count a single user session as several sessions.
Certain web site analysis products, such as Funnel Web Enterprise, can track users by session across multiple clustered servers. In essence, the software recombines the log data so that it "thinks" it is reading the logs of a single site hosted on a single server. The result is a comprehensive and accurate report on site usage where there would otherwise be only a raw hit count.
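Funnel Web's internal method is not described in this article, so the following is only a minimal sketch of the general idea: interleave each server's log into one time-ordered stream, then group hits into sessions per visitor. It assumes each server's log is already sorted by time, that a visitor can be identified by a single ID (such as an IP address), and that 30 minutes of inactivity ends a session. The record format and the function names (`count_sessions`, `merge_logs`) are hypothetical. Counting each server's log separately reproduces the inflated session count described above.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

# Hypothetical log record: (timestamp in seconds, visitor id, requested path).
LogEntry = Tuple[int, str, str]

SESSION_TIMEOUT = 30 * 60  # assumed: 30 minutes of inactivity ends a session


def count_sessions(entries: Iterable[LogEntry]) -> int:
    """Count sessions in a time-ordered stream of log entries."""
    last_seen: Dict[str, int] = {}
    sessions = 0
    for timestamp, visitor, _path in entries:
        previous = last_seen.get(visitor)
        if previous is None or timestamp - previous > SESSION_TIMEOUT:
            sessions += 1  # first hit, or the visitor was idle too long
        last_seen[visitor] = timestamp
    return sessions


def merge_logs(logs: List[List[LogEntry]]) -> List[LogEntry]:
    """Interleave per-server logs (each sorted by time) into one stream."""
    return list(heapq.merge(*logs, key=lambda entry: entry[0]))


if __name__ == "__main__":
    # One visitor whose page requests were spread across two servers.
    server_a = [(0, "10.0.0.7", "/"), (120, "10.0.0.7", "/products")]
    server_b = [(60, "10.0.0.7", "/about"), (180, "10.0.0.7", "/contact")]

    separate = sum(count_sessions(log) for log in (server_a, server_b))
    combined = count_sessions(merge_logs([server_a, server_b]))
    print(separate)  # 2: the single visit is counted once per server
    print(combined)  # 1: the merged log shows one continuous session
```

Merging by timestamp assumes the servers' clocks are synchronized; if they drift apart, one visitor's hits can be misordered and a session may be split incorrectly.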
To use Funnel Web Enterprise's clustering feature, simply open the program's settings interface, choose "Virtual" from the left-hand scroll menu, select the "Clustering" tab and then click the checkbox to turn clustering on.
