Opened 5 years ago

Closed 5 years ago

#958 closed Bug (fixed)

bing-bot not filtered out

Reported by: sbrf Owned by:
Priority: normal Milestone: Piwik 0.4.4
Component: Core Keywords:
Cc: Sensitive: no

Description

As described in http://forum.piwik.org/index.php?showtopic=1451&hl= , bing-bot is being recognized as a normal user or as a link from an external page.

In my log I can find these entries for example:

access.log:65.55.110.25 - - [26/Aug/2009:22:51:27 +0000] "GET /piwik/piwik.php?idsite=1&url=http%3A%2F%2Fwww.xxx%2F&res=800x600&h=15&m=52&s=41&cookie=1&urlref=http%3A%2F%2Fwww.bing.com%2Fsearch%3Fq%3Denertrag&rand=0.8916126511009081&pdf=0&qt=0&realp=0&wma=0&dir=0&fla=0&java=0&gears=0&ag=0&action_name=XXX HTTP/1.1" 200 43 "http://www.xxx" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729)"
access.log:65.55.109.173 - - [27/Aug/2009:00:22:28 +0000] "GET /piwik/piwik.php?idsite=1&url=http%3A%2F%2Fwww.xxx2F&res=800x600&h=17&m=23&s=43&cookie=1&urlref=http%3A%2F%2Fwww.bing.com%2Fsearch%3Fq%3Dwebseite&rand=0.2069609280545549&pdf=0&qt=0&realp=0&wma=0&dir=0&fla=0&java=0&gears=0&ag=0&action_name=XXX HTTP/1.1" 200 43 "http://www.xxx" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)"
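To make the relevant fields in these entries easier to see, here is a minimal Python sketch that pulls the client IP, the decoded urlref parameter, and the user agent out of one such line. The function name parse_tracker_hit and the regular expression are illustrative assumptions, not part of Piwik, and the sample line is a shortened copy of the first entry above.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Combined log format, with an optional "file:" prefix as added by grep.
LOG_PATTERN = re.compile(
    r'^(?:[^:\s]+:)?(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Shortened copy of the first log entry (most query parameters dropped).
SAMPLE = (
    'access.log:65.55.110.25 - - [26/Aug/2009:22:51:27 +0000] '
    '"GET /piwik/piwik.php?idsite=1&url=http%3A%2F%2Fwww.xxx%2F'
    '&urlref=http%3A%2F%2Fwww.bing.com%2Fsearch%3Fq%3Denertrag HTTP/1.1" '
    '200 43 "http://www.xxx" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2)"'
)


def parse_tracker_hit(line):
    """Return the client IP, decoded urlref and user agent, or None if no match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    # parse_qs percent-decodes the values of the tracker request's query string.
    query = parse_qs(urlsplit(match.group("target")).query)
    return {
        "ip": match.group("ip"),
        "urlref": query.get("urlref", [""])[0],
        "user_agent": match.group("agent"),
    }


hit = parse_tracker_hit(SAMPLE)
print(hit["ip"], hit["urlref"])
```

Both entries come from 65.55.x.x addresses, carry a bing.com search URL as urlref, and send an ordinary MSIE user agent, so nothing in the user agent string identifies the request as a bot.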

Change History (4)

comment:1 Changed 5 years ago by vipsoft (robocoder)

  • Milestone set to 1 - Piwik 0.4.4

comment:2 Changed 5 years ago by vipsoft (robocoder)

  • Resolution set to worksforme
  • Status changed from new to closed

handleNewVisit() looks ok.

From what I can piece together, it appears that the Bing bot may have cookies enabled. So, if Piwik logged the bot before the software update, subsequent visits may not be treated as new visits (and thus are not caught by the filter). Since cookies expire in 30 days (the Piwik default in global.ini.php), this should correct itself.
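A rough Python model of that reasoning, for illustration only. This is not Piwik's code: record_hit, BOT_IPS, and the cookie handling are hypothetical simplifications, and the address in BOT_IPS is just one of the IPs from the reported log.

```python
# Hypothetical hardcoded bot address, taken from the log in the description.
BOT_IPS = {"65.55.110.25"}


def record_hit(ip, visitor_cookie, known_visitors):
    """Classify a hit as 'returning visit', 'excluded' or 'new visit'."""
    if visitor_cookie is not None and visitor_cookie in known_visitors:
        # A known cookie attaches the hit to an existing visitor, so the
        # new-visit path (and the bot filter inside it) is never reached.
        return "returning visit"
    if ip in BOT_IPS:
        # The filter only runs when the hit would start a new visit.
        return "excluded"
    if visitor_cookie is not None:
        known_visitors.add(visitor_cookie)
    return "new visit"


# The bot was logged before the filter existed, so its cookie is already known.
known = {"cookie-set-before-update"}
print(record_hit("65.55.110.25", "cookie-set-before-update", known))
print(record_hit("65.55.110.25", None, known))
```

Under these assumptions the first call is counted as a returning visit despite the matching IP, and only a hit without the old cookie reaches the filter, which is why the problem would disappear once the cookie expires.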

However, it looks like Microsoft is now further cloaking the bot by removing the Referer field from the HTTP request. I suggest we add the Microsoft IP to #43 and then remove the hardcoded IP from Visit.php.

Reference: http://www.bing.com/community/forums/t/648805.aspx
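As a sketch of what an IP-based exclusion list could look like in place of a single hardcoded address, here is a small Python example using the standard ipaddress module. The names EXCLUDED_NETWORKS and is_excluded are hypothetical, and 65.55.0.0/16 is an assumption chosen only because both logged addresses fall inside it; it is not a verified list of Bing crawler ranges.

```python
import ipaddress

# Hypothetical exclusion list; the range is an assumption for illustration.
EXCLUDED_NETWORKS = [ipaddress.ip_network("65.55.0.0/16")]


def is_excluded(ip):
    """Return True if the address falls inside any excluded network."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed addresses are not treated as excluded.
        return False
    return any(address in network for network in EXCLUDED_NETWORKS)


for candidate in ("65.55.110.25", "65.55.109.173", "192.0.2.1"):
    print(candidate, is_excluded(candidate))
```

Matching on network ranges rather than exact addresses covers both IPs seen in the log with one entry, and it does not depend on the Referer field being present.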

comment:3 Changed 5 years ago by vipsoft (robocoder)

  • Resolution worksforme deleted
  • Status changed from closed to reopened

Re-opening since we have to deal with googlebot.

comment:4 Changed 5 years ago by vipsoft (robocoder)

  • Resolution set to fixed
  • Status changed from reopened to closed

In [1470], fixes #918 and #958 - Filter out Googlebot and Bing bot
