| Author |
Message |
FWnewbie
Joined: 06 Jul 2006 Posts: 6
|
Posted: Thu Oct 26, 2006 2:01 pm Post subject: Crawlers and Bots |
|
|
I'm getting a lot of traffic from crawlers and bots (e.g. msnbot.msn.com, crawler100.ask.com). I need these excluded from my data since I want to report on user traffic only. Does anyone use filters to exclude the bots, or is there another way to go about this?
Thanks |
|
|
 |
dapease Site Admin
Joined: 31 Jan 2005 Posts: 57
|
Posted: Fri Oct 27, 2006 9:50 am Post subject: |
|
|
Well, there was a thread about MSN bots specifically: "Robot additions not being recognised". But I have found that a particularly good place to start is making sure your FWASettings.txt file is current. Since development stopped several years ago, there are quite a few search engines and bots that FWA doesn't recognize.
Luckily you can get a really good version from Dan Stouts at Manufactured Environments.
Try that first and see how it does, then we can worry about adding more bots not in the list.
dapease _________________ One person not willing to let a good product fade away. |
|
|
 |
FWnewbie
Joined: 06 Jul 2006 Posts: 6
|
Posted: Mon Nov 06, 2006 5:02 pm Post subject: |
|
|
After updating the FWASettings.txt file, FWA has picked up more robots, but it is still missing many. In the visitor report, msnbot, googlebot and a few others still dominate. I also find it odd that although FWA recognizes googlebot as a bot, it still lets it into the visitor report as "crawl-66-249-66-109.googlebot.com"; the same goes for msnbot.
Any suggestions? |
|
|
 |
dapease Site Admin
Joined: 31 Jan 2005 Posts: 57
|
Posted: Mon Nov 06, 2006 8:41 pm Post subject: |
|
|
Would it be possible to see a sample of your log file? I am curious to know whether those host names are stored in the log file itself or come from your DNS lookup. It is possible that the names in your logs do not match the names listed in the settings file.
|
|
 |
FWnewbie
Joined: 06 Jul 2006 Posts: 6
|
Posted: Wed Nov 08, 2006 3:59 pm Post subject: |
|
|
Below is a sample straight from my log files for an msnbot.
2006-09-10 15:26:30 192.168.192.52 GET /AM/Template.cfm Section=Practice_Resources&Template=/CM/ContentDisplay.cfm&ContentID=7101 80 - 207.46.98.55 msnbot/1.0+(+http://search.msn.com/msnbot.htm) 200 0 0
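For anyone following along, here is a minimal Python sketch of one way to pre-filter lines like the one above before handing the log to an analyzer. It assumes the fields are space-delimited in the order date, time, server IP, method, URI stem, URI query, port, username, client IP, user agent, status, substatus, win32 status. That order is inferred from the sample only (the log's own "#Fields:" header is not shown), and `ASSUMED_FIELDS`, `BOT_MARKERS`, `parse_line` and `is_bot` are illustrative names, not part of FWA.

```python
# Assumed field order, inferred from the sample line. The log's own
# "#Fields:" header is not shown in the thread, so treat this as a guess.
ASSUMED_FIELDS = [
    "date", "time", "s-ip", "cs-method", "cs-uri-stem", "cs-uri-query",
    "s-port", "cs-username", "c-ip", "cs(User-Agent)",
    "sc-status", "sc-substatus", "sc-win32-status",
]

# Illustrative substrings only; extend to match the bots in your own logs.
BOT_MARKERS = ("msnbot", "googlebot", "crawler", "slurp")


def parse_line(line):
    """Split one space-delimited log line into a dict, or return None."""
    if line.startswith("#"):
        return None  # directive/header line, not a request
    parts = line.split()
    if len(parts) != len(ASSUMED_FIELDS):
        return None  # blank line or a layout this sketch does not expect
    return dict(zip(ASSUMED_FIELDS, parts))


def is_bot(record):
    """True if the user-agent field contains any known bot marker."""
    agent = record["cs(User-Agent)"].lower()
    return any(marker in agent for marker in BOT_MARKERS)


SAMPLE = (
    "2006-09-10 15:26:30 192.168.192.52 GET /AM/Template.cfm "
    "Section=Practice_Resources&Template=/CM/ContentDisplay.cfm&ContentID=7101 "
    "80 - 207.46.98.55 msnbot/1.0+(+http://search.msn.com/msnbot.htm) 200 0 0"
)

record = parse_line(SAMPLE)
if record is not None and is_bot(record):
    print("bot request from", record["c-ip"])
```

Matching on the user-agent field rather than the host name sidesteps the DNS question, since the agent string is written by the server regardless of whether lookups are enabled.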
Thanks |
|
|
 |
dapease Site Admin
Joined: 31 Jan 2005 Posts: 57
|
Posted: Wed Nov 08, 2006 4:19 pm Post subject: |
|
|
Ah.
I think I can see why you may be getting odd reports when you run FWA.
What web server are you using to serve your content? The reason I ask is that your log file does not appear to be in Common Log Format, the format that FWA prefers. Presented with other formats, it tries its best (it does understand IIS5 and older, for example), but it doesn't always make the best choices, and I think that might be causing your frustration here.
Check this thread: "No referrals and systems/browsers but they ARE there".
Though it is not directly related to your issue, I go over the Common Log Format there, which may be of help to you. The way I see it, you either need to change the format in which your web server saves its logs, OR you need to define your log format manually when running FWA. Honestly, though, I have never had much luck with the latter.
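If changing the server's log settings is not an option, a third route is to rewrite the existing log into Common Log Format before analysis. The Python sketch below does that for the 13-field layout seen in the sample line earlier in this thread. Several things here are assumptions rather than facts from the log: the field order, that timestamps are UTC, the `HTTP/1.0` protocol placeholder (the sample records no protocol version), and `-` for the byte count (the sample records none). `to_common_log` is a hypothetical helper name, and whether FWA handles the output better is untested.

```python
from datetime import datetime

# English month abbreviations, spelled out so the output does not depend
# on the machine's locale settings.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def to_common_log(line):
    """Rewrite one 13-field log line as a Common Log Format line.

    Returns None for header lines or lines with an unexpected field count.
    """
    parts = line.split()
    if line.startswith("#") or len(parts) != 13:
        return None
    (date, time, _s_ip, method, stem, query, _port,
     user, c_ip, _agent, status, _sub, _win32) = parts

    stamp = datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M:%S")
    # Assumption: the log is written in UTC, hence the fixed +0000 offset.
    clf_time = "%02d/%s/%04d:%s +0000" % (
        stamp.day, MONTHS[stamp.month - 1], stamp.year,
        stamp.strftime("%H:%M:%S"))

    # Rejoin stem and query; "-" means the request had no query string.
    path = stem if query == "-" else stem + "?" + query

    # "HTTP/1.0" and the trailing "-" (bytes) are placeholders, since the
    # sample line carries neither value.
    return '%s - %s [%s] "%s %s HTTP/1.0" %s -' % (
        c_ip, user, clf_time, method, path, status)


sample = (
    "2006-09-10 15:26:30 192.168.192.52 GET /AM/Template.cfm "
    "Section=Practice_Resources&Template=/CM/ContentDisplay.cfm&ContentID=7101 "
    "80 - 207.46.98.55 msnbot/1.0+(+http://search.msn.com/msnbot.htm) 200 0 0"
)
print(to_common_log(sample))
```

Note that plain Common Log Format has no user-agent field, so the converted line no longer says "msnbot". Bot requests would need to be dropped before conversion, or identified afterwards by host or IP.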
|
|
 |
|