Implement a default Referrer spam blacklist #2268

Closed
mattab opened this issue Apr 5, 2011 · 20 comments
Labels
Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change.

@mattab
Member

mattab commented Apr 5, 2011

It looks like ###.com is referrer-spamming us. Time to add a referrer blacklist...

http://demo.piwik.org/index.php?module=CoreHome&action=index&idSite=7&period=day&date=yesterday#module=Referers&action=getSearchEnginesAndKeywords&idSite=7&period=day&date=yesterday

@robocoder
Contributor

I looked at the logs, and this appears to be a result of http://forum.piwik.org/read.php?2,74422. I've changed the 'X' to '#'. Hopefully, Google refreshes its cache soon.

@mattab
Member Author

mattab commented Apr 6, 2011

I did the search but didn't see the Piwik forums in the results, which is why I thought this was spam. Did you see the Piwik forums in the results? This is funny, hehe.

@mattab
Member Author

mattab commented Apr 8, 2011

OK, it appears it's indeed not spam (don't look at today's keywords, that's just wrong...)

@robocoder
Contributor

LOL

@mattab
Member Author

mattab commented Apr 17, 2011

Reopening, I found this list of referrer spam which we could reuse... when this becomes a problem.

@anonymous-matomo-user

See also
http://forum.piwik.org/read.php?2,75066,75102

Short summary:
A lot of fake visits (http://en.wikipedia.org/wiki/Referrer_spam) are making the statistics almost unusable. We do not publish our stats; nonetheless, reading the web stats internally is now a pain because of so much false data.

A solution would be to allow an admin to create/update a blacklist (maybe a regex list) of referrer spam sites, which are simply ignored (not counted at all) by Piwik, similar to the IP blacklist.
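
A minimal Python sketch of the admin-maintained regex blacklist proposed in this comment. It is not Piwik code: the pattern list, the example domains, the visit record shape, and the function names (`compile_blacklist`, `is_blacklisted`, `count_visits`) are all hypothetical, and it assumes each pattern is matched against the referrer host only.

```python
import re
from urllib.parse import urlparse

# Hypothetical admin-maintained patterns (not a real spam list).
# Each pattern is searched in the lowercase referrer host.
SPAM_PATTERNS = [
    r"(^|\.)spam-example\.com$",
    r"(^|\.)cheap-seo-[a-z0-9-]+\.example$",
]


def compile_blacklist(patterns):
    """Compile the admin's patterns once, so each visit check is cheap."""
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def is_blacklisted(referrer_url, blacklist):
    """True if the referrer host matches any blacklist pattern."""
    host = urlparse(referrer_url).hostname or ""
    return any(rx.search(host) for rx in blacklist)


def count_visits(visits, blacklist):
    """Count only visits whose referrer is not blacklisted."""
    return sum(1 for v in visits if not is_blacklisted(v["referrer"], blacklist))
```

Anchoring the patterns on the host (rather than searching the whole URL) is a design choice made here to avoid excluding legitimate pages that merely mention a spam domain in their path or query.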

@anonymous-matomo-user

I get a lot of referrer spam too. As I have very few visitors, these spammers account for roughly 90% of the unique visitors. Apart from visitors coming from Facebook.com, filtering out all referrers that do not contain any URL path, query or fragment would do the trick in my case.

Having a central repository would be a nice solution, even if it could be abused. Maybe this is something that some Piwik hosters could implement.
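
The heuristic from this comment (treat referrers with no path, query or fragment as spam, except for trusted hosts such as Facebook) could be sketched in Python as follows. The names `TRUSTED_HOSTS`, `is_bare_homepage` and `looks_like_spam` are hypothetical, and stripping a leading `www.` is an assumption made for the comparison. The rule is aggressive and would also drop legitimate homepage links, so it probably only suits sites with a traffic profile like the one described.

```python
from urllib.parse import urlparse

# Hosts whose bare-homepage referrers are still trusted
# (assumption: only Facebook, as mentioned in the comment).
TRUSTED_HOSTS = {"facebook.com"}


def is_bare_homepage(referrer_url):
    """True if the URL has no path (beyond "/"), no query and no fragment."""
    parts = urlparse(referrer_url)
    return parts.path in ("", "/") and not parts.query and not parts.fragment


def looks_like_spam(referrer_url):
    """Flag referrers that point at a bare home page of an untrusted host."""
    host = urlparse(referrer_url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    if not host:
        return False  # no referrer at all (direct entry), not spam by this rule
    if host in TRUSTED_HOSTS:
        return False
    return is_bare_homepage(referrer_url)
```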

@mattab
Member Author

mattab commented Mar 18, 2014

Looks like the guys at semalt.com are hard at work spamming the world. We should really blacklist all visits in Piwik from this domain, and put the list in the config file so we can add new websites to block later.

@mattab
Member Author

mattab commented Apr 28, 2014

semalt is acting up again and causing user frustration, e.g. in the forums. Let's kill it!

@mattab
Member Author

mattab commented Apr 28, 2014

In b1fe857: Fixes #2268 Implement a default Referrer spam blacklist to block semalt.com spammer. (can be extended to other spammers on request!)

; All Visits with a Referrer URL host set to one of these will be excluded.
; If you find new spam entries in Referrers>Websites, please report them here: #2268
; List of known Referrer Spammers, ie. bot visits that set a fake Referrer field:
referrer_urls_spam = "semalt.com"
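
For illustration, here is a rough Python sketch of how a host-based check driven by this setting might behave. It is not the code from the commit: the function names are hypothetical, and both the comma-separated list format and the subdomain (suffix) matching are assumptions rather than documented Piwik behaviour.

```python
from urllib.parse import urlparse


def parse_spam_hosts(config_value):
    """Split the config value into lowercase host names.

    Assumption: multiple hosts are separated by commas.
    """
    return [h.strip().lower() for h in config_value.split(",") if h.strip()]


def referrer_host(referrer):
    """Extract the host, tolerating referrers written without a scheme."""
    if "://" not in referrer:
        referrer = "http://" + referrer
    return (urlparse(referrer).hostname or "").lower()


def is_referrer_spam(referrer, spam_hosts):
    """True if the referrer host equals a spam host or is a subdomain of one."""
    host = referrer_host(referrer)
    return any(host == s or host.endswith("." + s) for s in spam_hosts)


spam_hosts = parse_spam_hosts("semalt.com")
print(is_referrer_spam("http://www.semalt.com/crawler.php", spam_hosts))
```

Matching on the parsed host with a leading-dot suffix check, instead of a plain substring search, keeps an unrelated domain such as `notsemalt.com` from being excluded by accident.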

@anonymous-matomo-user

Sites we track are being spammed by

semalt.com
www.semalt.com
semalt.com/crawler.php
www.semalt.com/crawler.php

@mattab
Member Author

mattab commented Apr 28, 2014

Good, all these visits will now be excluded!

@anonymous-matomo-user

I would like to point out that neither the forums, nor the issue tracker software, nor just about any other publicly accessible website allow lists of spam sites to be posted.

@mattab
Member Author

mattab commented May 1, 2014

Please write them without http:// in front, or put them on pastebin.com and post the link here.

@anonymous-matomo-user

I get the following message when trying to post here:

Submission rejected as potential spam (Akismet says content is spam, Content contained these blacklisted patterns: ...)

Here is a pastebin of the last week:

http://pastebin.com/mpqCp5rZ

@mattab
Member Author

mattab commented May 1, 2014

Pastebin is good.

Could you please let me know how you created this list?

@anonymous-matomo-user

I have a low-traffic site that the spammers seem to love. All of the legitimate referrers come from Wikipedia, search engines, and several forums that I recognize, so when looking at the reports it is easy to see which referrers are spam. Also, the spammers' referrer links are nearly always just home pages, whereas legitimate links look something like www.example.com/index.php?123-Topic-Title, or something slightly different when not a forum, but almost never a home page.

Another telltale sign is that ALL of the spam comes from Ukraine and the Russian Federation. Sometimes I wonder if the war there is being fought over who will be the final spam king (bad joke). I have observed several IP addresses that each generate many different fake visits.
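
The last observation in this comment (a few IP addresses producing visits with many different referrers) can be turned into a simple check. The Python sketch below assumes a hypothetical log of `(ip, referrer_url)` pairs and a hypothetical threshold `min_hosts`; it is only one way to approximate the manual review described here, not how the pastebin list was actually built.

```python
from collections import defaultdict
from urllib.parse import urlparse


def suspicious_ips(visits, min_hosts=3):
    """Map each IP seen with at least min_hosts distinct referrer hosts
    to the sorted list of those hosts.

    visits: iterable of (ip, referrer_url) pairs (hypothetical log format).
    """
    hosts_by_ip = defaultdict(set)
    for ip, referrer in visits:
        host = urlparse(referrer).hostname
        if host:  # skip visits without a usable referrer
            hosts_by_ip[ip].add(host)
    return {
        ip: sorted(hosts)
        for ip, hosts in hosts_by_ip.items()
        if len(hosts) >= min_hosts
    }
```

The hosts returned for a flagged IP are candidates for a blacklist, but they would still need a manual look, since a shared proxy or office network can also produce many distinct referrers.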

@anonymous-matomo-user

I have had a spate of visits from semalt today on a couple of my sites, using the following referrer:

semalt.semalt.com/crawler.php?

@mattab
Member Author

mattab commented May 6, 2014

I've created a new ticket: #5099 Extend list of known Referrer Spammers

Please, let's continue the discussion about annoying referrer spammers in ticket #5099. We'll find a way to win!

@mattab
Member Author

mattab commented May 11, 2014

In acb1bc2: Actually call the Referrer Spam check.
Fixes #2268 Refs #5099

@mattab mattab added this to the 2.x - The Great Piwik 2.x Backlog milestone Jul 8, 2014
@mattab mattab self-assigned this Jul 8, 2014
sabl0r pushed a commit to sabl0r/piwik that referenced this issue Sep 23, 2014
…block semalt.com spammer. (can be extended to other spammers on request!)

; All Visits with a Referrer URL host set to one of these will be excluded.
; If you find new spam entries in Referrers>Websites, please report them here: http://dev.piwik.org/trac/ticket/2268
; List of known Referrer Spammers, ie. bot visits that set a fake Referrer field:
referrer_urls_spam = "semalt.com"
sabl0r pushed a commit to sabl0r/piwik that referenced this issue Sep 23, 2014
This issue was closed.