Opened 5 years ago

Closed 5 years ago

#918 closed Bug (fixed)

User from unknown with provider googlebot

Reported by: prasi
Owned by:
Priority: normal
Milestone: Piwik 0.4.4
Component: Core
Keywords: googlebot
Cc:
Sensitive: no

Description

I very often see a visitor from the country "unknown" with the provider "googlebot", a resolution of 1024 x 1024, the browser Mozilla 5.0, and an unknown operating system.

I see this on a few sites.
Maybe it's a new version of Googlebot?

Change History (9)

comment:1 Changed 5 years ago by vipsoft (robocoder)

  • Summary changed from User from unkown with provider googlebot to User from unknown with provider googlebot

Can you check your web server log and give us a User Agent string? Sounds like Google's version of the Bing spambot.

comment:2 Changed 5 years ago by prasi

"Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)"

comment:3 Changed 5 years ago by vipsoft (robocoder)

Reference: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=80553

Maybe it's time to add an example bot tracking plugin to move bot-specific detection logic out of Visit.php...

comment:4 Changed 5 years ago by vipsoft (robocoder)

Can you provide a few lines from your web server's access log showing the Googlebot requests? Thanks.

comment:5 Changed 5 years ago by prasi

66.249.71.35 - - [09/Sep/2009:13:11:04 +0200] "GET /ro/tag/englisch/ HTTP/1.1" 200 10033 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - [09/Sep/2009:13:11:42 +0200] "GET /wp-content/plugins/simple-ajax-shoutbox/ajax_shoutbox_process.php?1252281600 HTTP/1.1" 200 83 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - [09/Sep/2009:13:02:29 +0200] "GET /sk/tag/linux/ HTTP/1.1" 200 10899 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

83.64.31.37 - - [09/Sep/2009:13:04:01 +0200] "POST /wp-cron.php?doing_wp_cron HTTP/1.0" 200 - "-" "WordPress/2.8.4; http://blog.prasi.at"

66.249.71.35 - - [09/Sep/2009:13:04:00 +0200] "GET /tag/englisch/&rurl=translate.google.com&lang=de&usg=ALkJrhja1EyzL9WcZVjz7LgKxhfVOVrJEw HTTP/1.1" 302 20 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - [09/Sep/2009:13:04:10 +0200] "GET /2009/06/20/jailbreak-iphone-os-3-0-ist-ab-sofort-verfugbar/ HTTP/1.1" 200 10777 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - [09/Sep/2009:13:06:16 +0200] "GET /tag/nova-rock/&rurl=translate.google.com&lang=de&usg=ALkJrhi6maYH0aia7iAfuV7rpgHyGmOUMA HTTP/1.1" 302 20 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - [09/Sep/2009:13:08:17 +0200] "GET /tag/magento/&rurl=translate.google.com&lang=de&usg=ALkJrhgLB2Y3EEXHn82mkWp80wufYKHKwA HTTP/1.1" 302 20 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - [09/Sep/2009:13:10:30 +0200] "GET /tag/usb-stick/&rurl=translate.google.com&lang=de&usg=ALkJrhigtRYLjyjLlIUykrlJ13sz7ofdNw HTTP/1.1" 302 20 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - [09/Sep/2009:13:12:36 +0200] "GET /en/about-me/ HTTP/1.1" 200 8398 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - [09/Sep/2009:13:12:43 +0200] "GET /2009/06/23/left-4-dead-patch-diese-woche/&rurl=translate.google.com&lang=de&usg=ALkJrhhG8ZQLkAD2-GicJfebbgFaFc7bng HTTP/1.1" 302 20 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - [09/Sep/2009:13:14:55 +0200] "GET /tag/schweden/&rurl=translate.google.com&lang=de&usg=ALkJrhhPiufQvYU-FDqR0mTbZW8qKcqViA HTTP/1.1" 302 20 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

comment:6 Changed 5 years ago by matt (mattab)

A DNS lookup of the visitor host is done in the provider plugin when it is enabled.

Technically, Piwik should not need this DNS lookup to behave correctly. The lookup should always be optional, since it can cause performance issues when DNS latency goes up.
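
To illustrate the point about keeping the lookup optional, here is a rough sketch in Python (not Piwik's PHP code). The names `provider_from_hostname`, `lookup_provider` and `reverse_dns` are made up for this example. It assumes the provider is simply the second-to-last label of the reverse-DNS hostname, which would explain "googlebot" for hosts such as crawl-66-249-71-35.googlebot.com, but which is too naive for suffixes like co.uk.

```python
import socket
from typing import Callable, Optional


def provider_from_hostname(hostname: str) -> str:
    """Return the second-to-last DNS label, e.g. 'googlebot' (naive assumption)."""
    labels = hostname.rstrip(".").lower().split(".")
    return labels[-2] if len(labels) >= 2 else "unknown"


def reverse_dns(ip: str) -> str:
    """Blocking reverse lookup; this is the call that suffers from DNS latency."""
    return socket.gethostbyaddr(ip)[0]


def lookup_provider(ip: str, resolver: Optional[Callable[[str], str]] = None) -> str:
    """Resolve the provider only when a resolver is supplied.

    With resolver=None no DNS request is made at all, so tracking
    never waits on the network.
    """
    if resolver is None:
        return "unknown"
    try:
        return provider_from_hostname(resolver(ip))
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError
        return "unknown"
```

Calling `lookup_provider(ip)` skips the lookup entirely, while `lookup_provider(ip, reverse_dns)` opts in to it.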

comment:7 Changed 5 years ago by vipsoft (robocoder)

prasi: do you have one showing Googlebot fetching piwik.php?

matt: *nod* a similar latency issue arises with the honeypot suggestion in #653

comment:8 Changed 5 years ago by prasi

66.249.71.210 - - [09/Sep/2009:13:13:44 +0200] "GET /robots.txt HTTP/1.1" 200 21 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.210 - - [09/Sep/2009:13:13:44 +0200] "GET /piwik.php?idsite=1&url=http%3A%2F%2Fblog.prasi.at%2F&res=1024x1024&h=3&m=51&s=21&cookie=1&urlref=&rand=0.278324234&pdf=0&qt=0&realp=0&wma=0&dir=0&fla=0&java=0&gears=0&ag=0&action_name= HTTP/1.1" 200 43 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
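
For anyone who wants to pull such hits out of their own access log, a small Python sketch follows. It assumes the Apache combined log format shown above; `LOG_RE` and `parse_tracking_hit` are names invented for this example and are not part of Piwik.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Request line, status, size, referrer and user agent of a combined-format log line.
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def parse_tracking_hit(line):
    """Return path, reported resolution and user agent, or None if the line does not match."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    parts = urlsplit(match.group("target"))
    params = parse_qs(parts.query, keep_blank_values=True)
    return {
        "path": parts.path,
        "res": params.get("res", [""])[0],
        "user_agent": match.group("ua"),
    }
```

Applied to the piwik.php line above, this should report the path /piwik.php, the resolution 1024x1024 and the Googlebot user agent, which matches what shows up in the visitor log.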

comment:9 Changed 5 years ago by vipsoft (robocoder)

  • Resolution set to fixed
  • Status changed from new to closed

In [1470], fixes #918 and #958 - Filter out Googlebot and Bing bot
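
For readers who cannot upgrade yet, the general idea of filtering by User-Agent can be sketched as follows. This is an illustration in Python, not the code from [1470]. The marker list and the function name `is_known_bot` are assumptions made for this example, and a User-Agent header can be spoofed, so this is only a first-line filter.

```python
# Lowercase substrings that identify the crawlers discussed in this ticket.
# The exact list is an assumption for this sketch.
BOT_MARKERS = ("googlebot", "bingbot", "msnbot")


def is_known_bot(user_agent: str) -> bool:
    """Return True if the User-Agent contains one of the known bot markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)
```

A tracker would call `is_known_bot()` on the request's User-Agent before recording the visit and skip the hit when it returns True.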
