The <noscript> image call doesn't currently record any visit, but it could #653

Closed
mattab opened this issue Apr 13, 2009 · 9 comments
Labels
Critical: indicates that the severity of an issue is very critical and the issue has a very high priority.
Enhancement: for new feature suggestions that enhance Matomo's capabilities or add a new report, new API, etc.

@mattab
Member

mattab commented Apr 13, 2009

Currently the Piwik tracking code includes a <noscript> image tag, which could be used to record visits from people who do not have JavaScript enabled.

Some work is required to:
- filter out search engine bots
- filter out spam bots
- filter out all other types of bots

Of course this could also be used to log bots and show them in a specific Piwik report “Bot activity”.

The initial design decision was not to record any visitor without JavaScript, as it is a lot of work to ensure that the data coming from JavaScript-disabled devices is accurate and not bot-initiated.

To record a visit without JS, you must call:

```
piwik.php?idsite=$ID_SITE
&rec=1
&action_name=$ACTION_NAME
```
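
As a rough illustration, the request above could be assembled server-side as follows. This is only a minimal Python sketch: the helper name `build_tracking_url`, the installation URL and the example values are assumptions made for illustration; only the `idsite`, `rec` and `action_name` parameters come from the call shown above.

```python
from urllib.parse import urlencode


def build_tracking_url(base_url: str, id_site: int, action_name: str) -> str:
    """Build a piwik.php request URL for recording a visit without JavaScript."""
    params = {
        "idsite": id_site,           # $ID_SITE in the call above
        "rec": 1,                    # rec=1, as in the call above
        "action_name": action_name,  # $ACTION_NAME in the call above
    }
    # urlencode escapes spaces and other reserved characters in the values.
    return f"{base_url}?{urlencode(params)}"


# Hypothetical installation URL and action name.
url = build_tracking_url("http://example.org/piwik/piwik.php", 1, "Home page")
print(url)
```

The resulting URL could then be used as the image source inside the <noscript> tag, or requested directly by a server-side script.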

See also PUSH API without Javascript #134
Keywords: bots noscript

@anonymous-matomo-user

To me this is a major issue, as non-JavaScript users still give us valuable information. I would love to see this implemented before 1.0.

Suggestion 1

Couldn't the http:BL by Project Honeypot be used to filter out any bots? They offer an API to identify Search Engines, Spammers and other bots by IP address.

Piwik could work like this:

  • JavaScript enabled:
    • Count users as usual.
  • JavaScript disabled:
    • Discard all users that are known search engine bots (by User-Agent).
    • Check the IP of all remaining users against http:BL; discard if it is a known bot, count otherwise.

This way traffic for the blacklist server would be kept low. I still think every Piwik installation would need their own API key, though.
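
The decision flow of this suggestion could be sketched roughly as below. It is not Piwik code: the function name, the short User-Agent list and the `is_blacklisted` callback are all assumptions for illustration, and the callback merely stands in for whatever http:BL lookup an installation would perform with its own API key.

```python
from typing import Callable

# Hypothetical, deliberately short list of User-Agent substrings
# for known search engine bots.
KNOWN_BOT_AGENTS = ("googlebot", "msnbot", "slurp")


def should_count_visit(
    js_enabled: bool,
    user_agent: str,
    ip: str,
    is_blacklisted: Callable[[str], bool],
) -> bool:
    """Return True if the visit should be counted under Suggestion 1."""
    if js_enabled:
        return True  # JavaScript enabled: count users as usual
    agent = user_agent.lower()
    if any(bot in agent for bot in KNOWN_BOT_AGENTS):
        return False  # known search engine bot, no blacklist lookup needed
    # Only the remaining visitors trigger a lookup, which keeps traffic
    # to the blacklist server low.
    return not is_blacklisted(ip)


# Stand-in for a real blacklist lookup: a fixed set of example IPs.
BAD_IPS = {"203.0.113.7"}


def fake_lookup(ip: str) -> bool:
    return ip in BAD_IPS


print(should_count_visit(False, "Lynx/2.8", "198.51.100.1", fake_lookup))
```

Passing the lookup in as a callback keeps the filtering logic independent of any particular blacklist service.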

Suggestion 2

Piwik should include its own tiny honeypot. The <noscript> tag should include a link that is invisible to the user and that has rel=nofollow.

<a href="http://domain/piwik/honeypot.php" rel="nofollow">&nbsp;</a>

Only malicious crawlers will follow this link, so Piwik can exclude their IPs from tracking. Known, well-behaving search bots can still be identified by User-Agent. This way, most bots will probably get identified.
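
A minimal sketch of the bookkeeping this would need, assuming an in-memory store: the class and method names are hypothetical, and a real implementation would have to persist the trapped IPs somewhere.

```python
class HoneypotFilter:
    """Remember IPs that requested the honeypot URL and exclude them from tracking."""

    def __init__(self) -> None:
        self._trapped_ips = set()

    def record_honeypot_hit(self, ip: str) -> None:
        # Called when the honeypot URL is requested. Only crawlers that
        # ignore rel=nofollow are expected to end up here.
        self._trapped_ips.add(ip)

    def should_track(self, ip: str) -> bool:
        # Visits from trapped IPs are excluded from tracking.
        return ip not in self._trapped_ips


honeypot = HoneypotFilter()
honeypot.record_honeypot_hit("203.0.113.7")  # example IP that followed the link
print(honeypot.should_track("203.0.113.7"))
print(honeypot.should_track("198.51.100.1"))
```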

@anonymous-matomo-user

Replying to matt:
I want this feature too. Users are not the only interesting visitors; I also want to see which bots crawl my site.

@philmck

philmck commented Jan 13, 2010

Can I add my vote for this as well? We're missing out on visits from many mobile phone users and disabled people using screen readers, for example, because they don't have JavaScript. There are also legitimate reasons for disabling JavaScript in a normal browser. I agree we need to separate out the bots somehow for the statistics, but really that's a separate issue. I'd like the option of counting all visitors, even if that includes bots.

@anonymous-matomo-user

The code

<a href="http://domain/piwik/honeypot.php" rel="nofollow">&nbsp;</a>

will be visible to blind persons using screen-reader software. It would be better to code this as

<a href="http://domain/piwik/honeypot.php" rel="nofollow" style="display:none;">&nbsp;</a>

which will also hide it from the screen readers.

Hope this helps,
Charles Belov
SFMTA Webmaster
www.sfmta.com/webmaster

@robocoder
Contributor

Charles: that's not our tracking code. Piwik's tracking code doesn't contain an anchor link (honeypot or otherwise).

@robocoder
Contributor

re: comment:3 - The idea behind the noscript tag is to track Javascript-disabled visitors. We'll provide a hook here so third-party plugins can implement suggestion 2.

@mattab
Member Author

mattab commented Mar 18, 2010

In order to report search engine bot activity, we could reuse some of the GPL code from http://www.crawltrack.net/, which is a PHP bot tracker tool. The logic could sit in a Piwik plugin. There could be a new sub-tab that reports activity for each bot seen during the selected date range.

Bots would be identified by user agent and/or IP address; see e.g. the list at crawltrack: http://www.crawltrack.net/crawlerlist.php

Additional features could include:

  • give the ratio of bot vs. human activity on the website (what percentage of traffic comes from bots vs. humans)
  • for a given bot on a given day, list all pages crawled
  • list bot crawling frequency in a new column (next to Visits, Page views, etc.), e.g. Google may crawl one page every 10 s while other bots crawl one page every minute
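
The three figures listed above could be derived from raw hits along these lines. This is a sketch under stated assumptions, not CrawlTrack or Piwik code: the signature table, the function names and the `(timestamp, user_agent, url)` hit format are all hypothetical.

```python
from collections import defaultdict

# Hypothetical signature list: User-Agent substring -> bot name.
BOT_SIGNATURES = {"googlebot": "Googlebot", "msnbot": "Msnbot", "slurp": "Slurp"}


def identify_bot(user_agent):
    """Return the bot name for a User-Agent, or None for a presumed human."""
    agent = user_agent.lower()
    for needle, name in BOT_SIGNATURES.items():
        if needle in agent:
            return name
    return None


def bot_report(hits):
    """Summarise hits given as (timestamp_seconds, user_agent, url) tuples."""
    pages = defaultdict(list)  # bot name -> URLs crawled, in input order
    times = defaultdict(list)  # bot name -> request timestamps
    bot_hits = 0
    for timestamp, user_agent, url in hits:
        bot = identify_bot(user_agent)
        if bot is None:
            continue
        bot_hits += 1
        pages[bot].append(url)
        times[bot].append(timestamp)
    # Average number of seconds between consecutive requests, per bot.
    intervals = {}
    for bot, stamps in times.items():
        stamps.sort()
        if len(stamps) > 1:
            intervals[bot] = (stamps[-1] - stamps[0]) / (len(stamps) - 1)
    return {
        "bot_ratio": bot_hits / len(hits) if hits else 0.0,
        "pages_by_bot": dict(pages),
        "avg_interval_seconds": intervals,
    }


hits = [
    (0, "Googlebot/2.1", "/"),
    (10, "Googlebot/2.1", "/about"),
    (12, "Mozilla/5.0 Firefox", "/"),
    (20, "Googlebot/2.1", "/contact"),
]
report = bot_report(hits)
print(report["bot_ratio"])             # 0.75
print(report["avg_interval_seconds"])  # {'Googlebot': 10.0}
```

Matching on User-Agent substrings alone is easy to spoof, which is why the comment above also mentions IP addresses as a second signal.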

@anonymous-matomo-user

I think it would also be interesting to track robots, for example on big sites.
With this feature you could see how many bots are scraping your site. It would also make sense to see Googlebot, Msnbot and maybe Slurp (Yahoo).

But this should be tracked in a separate table by a dedicated plugin, something like Live Bots ;-)

In my tool http://www.spider-trap.de/en_index.html I ban a lot of bad bots. Maybe Piwik could notify the webmaster when a bot is crawling.

@mattab
Member Author

mattab commented Jul 29, 2010

The Tracking API has been released. It can help track visitors without JavaScript, and can even track visits from mobile apps, desktop apps and more.

http://piwik.org/docs/tracking-api/

This issue was closed.
4 participants