User from unknown with provider googlebot #918

Closed
anonymous-matomo-user opened this issue Aug 11, 2009 · 9 comments
Labels
Bug For errors / faults / flaws / inconsistencies etc.
Milestone

Comments

@anonymous-matomo-user

I'm very often getting a visitor from the country "unknown" with the provider "googlebot", resolution 1024 x 1024, browser Mozilla 5.0 and an unknown operating system.

I'm seeing this on several sites.
Maybe it's a new version of Googlebot?
Keywords: googlebot

@robocoder
Contributor

Can you check your web server log and give us a User Agent string? Sounds like Google's version of the Bing spambot.

@anonymous-matomo-user
Author

"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

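One plausible explanation for the "Browser: Mozilla 5.0, unknown OS" entry: Googlebot names itself only inside the parenthesised comment of this User-Agent, so a parser that reads just the leading product token sees Mozilla/5.0 and finds no operating system. The Python sketch below separates top-level product tokens from comments and looks for a token of the form Namebot/version inside the comments. The helpers `split_user_agent` and `find_bot` are hypothetical illustrations, not Piwik code.

```python
from __future__ import annotations

import re

# A product token looks like "Name/Version", e.g. "Mozilla/5.0" or "Googlebot/2.1".
PRODUCT_RE = re.compile(r"([A-Za-z][\w.-]*)/([\w.]+)")


def split_user_agent(ua: str) -> tuple[list[tuple[str, str]], list[str]]:
    """Split a User-Agent into top-level product tokens and parenthesised comments."""
    comments = re.findall(r"\(([^)]*)\)", ua)
    outside = re.sub(r"\([^)]*\)", " ", ua)  # drop comments before scanning products
    return PRODUCT_RE.findall(outside), comments


def find_bot(ua: str) -> tuple[str, str] | None:
    """Return (name, version) of a 'Namebot/x.y' token found inside a comment."""
    _, comments = split_user_agent(ua)
    for comment in comments:
        for part in comment.split(";"):
            m = PRODUCT_RE.fullmatch(part.strip())
            if m and m.group(1).lower().endswith("bot"):
                return m.group(1), m.group(2)
    return None
```

For the string reported above, the only top-level product is `("Mozilla", "5.0")`, while `find_bot` recovers `("Googlebot", "2.1")` from the comment.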
@robocoder
Contributor

Reference: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=80553

Maybe it's time to add an example bot tracking plugin to move bot-specific detection logic out of Visit.php...
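The Google page referenced above describes verifying a crawler through DNS: reverse-resolve the requesting IP, check that the hostname is under googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. Below is a minimal Python sketch of that check, the kind of logic that could sit in a separate bot-detection plugin rather than in Visit.php. The name `is_verified_googlebot` and its injectable `reverse`/`forward` resolvers are hypothetical, and only IPv4 forward results are compared.

```python
import socket
from typing import Callable

GOOGLE_DOMAINS = (".googlebot.com", ".google.com")


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = lambda ip: socket.gethostbyaddr(ip)[0],
    forward: Callable[[str], list[str]] = lambda host: socket.gethostbyname_ex(host)[2],
) -> bool:
    """Reverse-then-forward DNS check for an IP claiming to be Googlebot."""
    try:
        host = reverse(ip)
    except OSError:  # socket.herror / gaierror: no PTR record, cannot verify
        return False
    if not host.rstrip(".").endswith(GOOGLE_DOMAINS):
        return False  # PTR record points outside Google's domains
    try:
        addresses = forward(host)
    except OSError:
        return False
    # The forward lookup must round-trip to the original IP; otherwise the
    # PTR record could simply be forged by whoever controls the IP's reverse zone.
    return ip in addresses
```

Both lookups add DNS latency to every verified visit, so in practice results would need to be cached per IP.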

@robocoder
Contributor

Can you provide a few lines from your web server's access log showing the Googlebot requests? Thanks.

@anonymous-matomo-user
Author

66.249.71.35 - - +0200 "GET /ro/tag/englisch/ HTTP/1.1" 200 10033 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - +0200 "GET /wp-content/plugins/simple-ajax-shoutbox/ajax_shoutbox_process.php?1252281600 HTTP/1.1" 200 83 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - +0200 "GET /sk/tag/linux/ HTTP/1.1" 200 10899 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
83.64.31.37 - - +0200 "POST /wp-cron.php?doing_wp_cron HTTP/1.0" 200 - "-" "WordPress/2.8.4; http://blog.prasi.at"

66.249.71.35 - - +0200 "GET /tag/englisch/&rurl=translate.google.com&lang=de&usg=ALkJrhja1EyzL9WcZVjz7LgKxhfVOVrJEw HTTP/1.1" 302 20 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - +0200 "GET /2009/06/20/jailbreak-iphone-os-3-0-ist-ab-sofort-verfugbar/ HTTP/1.1" 200 10777 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - +0200 "GET /tag/nova-rock/&rurl=translate.google.com&lang=de&usg=ALkJrhi6maYH0aia7iAfuV7rpgHyGmOUMA HTTP/1.1" 302 20 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - +0200 "GET /tag/magento/&rurl=translate.google.com&lang=de&usg=ALkJrhgLB2Y3EEXHn82mkWp80wufYKHKwA HTTP/1.1" 302 20 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - +0200 "GET /tag/usb-stick/&rurl=translate.google.com&lang=de&usg=ALkJrhigtRYLjyjLlIUykrlJ13sz7ofdNw HTTP/1.1" 302 20 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - +0200 "GET /en/about-me/ HTTP/1.1" 200 8398 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - +0200 "GET /2009/06/23/left-4-dead-patch-diese-woche/&rurl=translate.google.com&lang=de&usg=ALkJrhhG8ZQLkAD2-GicJfebbgFaFc7bng HTTP/1.1" 302 20 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.35 - - +0200 "GET /tag/schweden/&rurl=translate.google.com&lang=de&usg=ALkJrhhPiufQvYU-FDqR0mTbZW8qKcqViA HTTP/1.1" 302 20 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
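To pick Googlebot's requests out of an access log like the excerpt above, a small parser helps. This Python sketch assumes a combined-log-style layout; because the excerpt only retains the "+0200" part of each timestamp, the time field is matched loosely. `googlebot_hits` is a hypothetical helper that counts Googlebot requests per (IP, status) pair.

```python
import re
from collections import Counter

# Combined-log-style line: ip ident user time "request" status size "referer" "agent".
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ (?P<time>.*?) "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def googlebot_hits(lines):
    """Count Googlebot requests per (ip, status) from access-log lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and "Googlebot" in m.group("agent"):
            counts[(m.group("ip"), m.group("status"))] += 1
    return counts
```

Lines from other clients, such as the WordPress cron request, are parsed but not counted.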

@mattab
Member

mattab commented Sep 12, 2009

A DNS lookup of the visitor host is done in the provider plugin when it is enabled.

Technically, proper Piwik behavior should not depend on this DNS lookup; it should always be optional, as it can cause performance issues if DNS latency goes up.

@robocoder
Contributor

prasi: do you have one showing Googlebot fetching piwik.php?

matt: nod; a similar latency issue also arises with the honeypot suggestion in #653.

@anonymous-matomo-user
Author

66.249.71.210 - - +0200 "GET /robots.txt HTTP/1.1" 200 21 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.71.210 - - +0200 "GET /piwik.php?idsite=1&url=http%3A%2F%2Fblog.prasi.at%2F&res=1024x1024&h=3&m=51&s=21&cookie=1&urlref=&rand=0.278324234&pdf=0&qt=0&realp=0&wma=0&dir=0&fla=0&java=0&gears=0&ag=0&action_name= HTTP/1.1" 200 43 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
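This request accounts for the values in the original report: the `res` parameter carries 1024x1024, the resolution that shows up in the statistics, and every plugin flag (`pdf`, `qt`, `realp`, `wma`, `dir`, `fla`, `java`, `gears`, `ag`) is 0. A short Python sketch decoding the query string with `urllib.parse`; `tracker_params` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit


def tracker_params(request_target: str) -> dict:
    """Return the single-valued query parameters of a piwik.php request."""
    query = urlsplit(request_target).query
    # keep_blank_values keeps empty fields such as urlref= and action_name=
    return {k: v[0] for k, v in parse_qs(query, keep_blank_values=True).items()}
```

Applied to the request path above, `res` decodes to `"1024x1024"` and `url` to `"http://blog.prasi.at/"`.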

@robocoder
Contributor

In [1470], fixes #918 and #958 - Filter out Googlebot and Bing bot
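The changeset itself is not reproduced in this thread. As an illustration only, a User-Agent filter of this kind might look like the Python sketch below; the marker strings (`googlebot`, `bingbot`, `msnbot`) and the function names are assumptions, not the code from [1470].

```python
# Hypothetical marker list; the actual strings used in [1470] are not shown here.
BOT_UA_MARKERS = ("googlebot", "bingbot", "msnbot")


def is_known_bot(user_agent: str) -> bool:
    """Case-insensitive substring match against known crawler markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_UA_MARKERS)


def should_track(user_agent: str) -> bool:
    """Record the visit only if the User-Agent does not look like a known bot."""
    return not is_known_bot(user_agent)
```

Substring matching is cheap but trusts a header any client can spoof; the reverse/forward DNS verification described on Google's reference page is stronger but brings back the DNS latency concern raised earlier.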

This issue was closed.
3 participants