Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#1916 closed Bug (fixed)

Edge case: each page is a new visit

Reported by: matt Owned by: matt
Priority: critical Milestone: Piwik 1.2
Component: Core Keywords:
Cc: Zorro, awesome, ts77 Sensitive: no

Description

When the cookie is somehow read-only, old timestamps are read and a new visit is generated on every pageview for these buggy requests. This might be caused by an Adblock-type extension that blocks writes to the cookie but still passes it along with the request.

<?php

$host = "piwik-domain.com";

$request = "GET /piwik.php?idsite=2&rec=1&url=http%3A%2F%2Fwww.domain.de%2F&res=1280x1024&h=7&m=57&s=51&cookie=1&urlref=http%3A%2F%2Fwww.domain.de%2F&rand=0.6439636907182041&pdf=1&qt=1&realp=0&wma=1&dir=0&fla=1&java=1&gears=0&ag=1&action_name=Some%20Action HTTP/1.1
Host: $host
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 3.5.30729)
Connection: close
Referer: http://www.refdomain.de/somepage
Cookie: piwik_visitor=[INSERT COOKIE DATA]

";

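$fsock_fp = fsockopen($host, 80, $errno, $errstr, 10);
if ($fsock_fp === false)
{
    // fsockopen() returns false on failure; stop instead of writing to an invalid handle
    die("Connection failed: $errstr ($errno)");
}
fwrite($fsock_fp, $request);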

echo '<pre>';
echo $request;
while (!feof($fsock_fp))
{
    echo fgets($fsock_fp, 128);
}
echo '</pre>';

fclose($fsock_fp);

?> 

Change History (14)

comment:1 Changed 3 years ago by matt (mattab)

One possible solution is to consolidate visits at the beginning of archiving, by deleting all visits from the same visitor that fall within the same 30-minute range.
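A minimal sketch of what such a consolidation pass could look like, written against plain arrays rather than the real visit log. The function name `consolidateVisits`, the `visitor` and `time` keys, and the choice to measure the 30 minutes from the visitor's previous visit row are all assumptions for illustration, not Piwik code.

```php
<?php
// Hypothetical helper, not part of Piwik: drops visits that start within
// $window seconds of the same visitor's previous visit row.
function consolidateVisits(array $visits, int $window = 1800): array
{
    // Sort by start time so each visit is compared with the preceding one.
    usort($visits, function ($a, $b) {
        return $a['time'] <=> $b['time'];
    });

    $lastSeen = []; // visitor ID => start time of that visitor's latest row
    $kept = [];
    foreach ($visits as $visit) {
        $id = $visit['visitor'];
        $isDuplicate = isset($lastSeen[$id])
            && $visit['time'] - $lastSeen[$id] <= $window;
        if (!$isDuplicate) {
            $kept[] = $visit;
        }
        // Track every row, kept or not, so a long chain of bogus visits
        // collapses into the first one.
        $lastSeen[$id] = $visit['time'];
    }
    return $kept;
}
```

With three rows for one visitor at 0, 60 and 120 seconds, only the first would be kept; a row for the same visitor more than 30 minutes after the previous one would count as a genuine new visit.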

comment:2 Changed 3 years ago by vipsoft (robocoder)

We should be able to fix this in #409.

comment:3 Changed 3 years ago by matt (mattab)

  • Cc Zorro awesome ts77 added
  • Owner set to matt
  • Priority changed from major to critical

comment:4 follow-up: Changed 3 years ago by vipsoft (robocoder)

It's possible this is caused by bots (e.g., web scrapers). On the initial request, the bot saves cookies to its cookie jar, and on subsequent requests, sends the cookies without updating the cookie jar.

Another possibility is that the Tracker has gotten slower, and that this is a duplicate of #1108, experiencing the race condition where:

  • user browses page A, sending cookie X1
  • before server can respond with updated cookie X2, user browses page B, resending cookie X1

We can mitigate this by calling $this->end() before Piwik_Common::runScheduledTasks().

When we implement #409, we'll only be sending idcookie, so $this->end() can be called even sooner, e.g., as soon as we've confirmed it's a returning visitor. (This will also improve perceived tracker responsiveness.)
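The race described above can be reproduced with a small model. This is a simplified sketch, not the tracker's actual code: it assumes the server decides "new visit" purely from a last-action timestamp carried in the cookie, and the names `isNewVisitFromCookie`, `countNewVisits` and `cookie_last_action` are made up for the example.

```php
<?php
// Simplified model, not Piwik's tracker: the server trusts the last-action
// timestamp stored in the cookie it receives.
const VISIT_TIMEOUT = 1800; // 30 minutes, in seconds

function isNewVisitFromCookie(int $cookieLastAction, int $now): bool
{
    return $now - $cookieLastAction > VISIT_TIMEOUT;
}

function countNewVisits(array $requests): int
{
    $newVisits = 0;
    foreach ($requests as $request) {
        if (isNewVisitFromCookie($request['cookie_last_action'], $request['time'])) {
            $newVisits++;
        }
    }
    return $newVisits;
}

// Cookie X1 was last updated at t=0. Pages A and B are requested at
// t=7200 and t=7201, both before the browser has stored cookie X2.
$requests = [
    ['page' => 'A', 'time' => 7200, 'cookie_last_action' => 0],
    ['page' => 'B', 'time' => 7201, 'cookie_last_action' => 0],
];
echo countNewVisits($requests), "\n"; // 2: each pageview opens its own visit
```

If page B had carried the updated cookie (last action at t=7200), only one new visit would be counted, which is why answering the first request sooner narrows the window.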

comment:5 Changed 3 years ago by awesome

Do you still have problems reproducing this issue?
I am willing to give you ssh access to my server to analyse this live on an affected machine.

comment:6 Changed 3 years ago by matt (mattab)

awesome, I can replicate it, so it's OK. Stay tuned...

vipsoft, I'm going to force the tracker to check the cookie value on each request. This adds overhead compared to the current algorithm, but that's the price to pay for accuracy when bad data is coming in.

Then we'll be pretty close to having a 1st party cookie only, since the code will be based on the unique ID.

comment:7 Changed 3 years ago by matt (mattab)

Could also be triggered in use case:

  • go to homepage,
  • before Piwik loads (and with a more than 30min old piwik cookie)...
  • ... middle click and open many other pages

Each Piwik request will record a page view with the old cookie until the new one is set in the browser's cookie jar.

comment:8 Changed 3 years ago by matt (mattab)

  • Resolution set to fixed
  • Status changed from new to closed

(In [3634]) Fixes #1916

The tracker now always checks the DB to see whether we saw the visitor earlier. The cookie also becomes much smaller.
Renamed the setting enable_detect_unique_visitor_using_settings to trust_visitors_cookies, as the logic is different; it should only be enabled on an intranet where the IP is the same for all users.
This will also help get the 1st party cookie implemented. Refs #409
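To illustrate why a server-side lookup sidesteps the stale cookie, here is a rough model. It is not the code from the changeset: the class `VisitLog` and its in-memory array are stand-ins for the real visit log table, and the visitor ID is assumed to be the only thing read from the cookie.

```php
<?php
// Simplified model, not the actual changeset: an in-memory array stands in
// for the visit log table, keyed by a hypothetical visitor ID.
final class VisitLog
{
    private const VISIT_TIMEOUT = 1800; // 30 minutes, in seconds

    /** @var array<string,int> visitor ID => time of last recorded action */
    private $lastAction = [];

    /** Records a pageview and returns true if it started a new visit. */
    public function track(string $visitorId, int $now): bool
    {
        $isNew = !isset($this->lastAction[$visitorId])
            || $now - $this->lastAction[$visitorId] > self::VISIT_TIMEOUT;
        // The server's own record is updated on every request, so a cookie
        // the browser failed to update cannot reset the clock.
        $this->lastAction[$visitorId] = $now;
        return $isNew;
    }
}

$log = new VisitLog();
var_dump($log->track('X', 7200)); // bool(true): first pageview opens a visit
var_dump($log->track('X', 7201)); // bool(false): same visit, whatever the cookie says
```

The trade-off is the one mentioned in comment:6: one extra lookup per request in exchange for not depending on timestamps the client may fail to update.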

comment:9 Changed 3 years ago by awesome

Can you provide a patch or will you release a new update soon?

comment:10 Changed 3 years ago by matt (mattab)

Please try the new beta at: http://builds.piwik.org/piwik-1.1.2b1.zip

let me know if it fixes the issue completely :)

comment:11 Changed 3 years ago by awesome

Thanks matt.

I installed the version, let's see what happens. I will report back later today on whether it worked.

FYI: I got a JS Alert when I first opened the page :)

There is no/bad markup for form tag

Dunno if this has something to do with Piwik. However it just appeared once, now it's gone even on page reload.

comment:12 Changed 3 years ago by awesome

Matt: Seems to work like a charm with 1.1.2b1! Great work, thanks for your fast help.

I guess the wrong counts cannot be undone in db, right? So my daily (doesn't really matter) but also weekly and monthly data is not usable for analysis anymore!?

Or might there be a way to re-parse the data of the last day?

comment:13 in reply to: ↑ 4 Changed 3 years ago by webapp

Replying to vipsoft:

Another possibility is that the Tracker has gotten slower, and that this is a duplicate of #1108, experiencing the race condition where:

  • user browses page A, sending cookie X1
  • before server can respond with updated cookie X2, user browses page B, resending cookie X1

What happened to us yesterday and today supports this hypothesis:
After upgrading to 1.1, the issue (visit miscount) appeared. Maybe the tracker code got slower, because our Piwik server load did indeed increase.
After upgrading to 1.1.2b1, the issue disappeared.
The issue caused the most severe spikes on sites with the most returning visitors, and on sites with a high number of actions or a high action frequency (tracked ajax requests, for example).

I second awesome's question: is there a way to rebuild visits and repair yesterday's stats? (We can code something and contribute it if you give us some hints.)

comment:14 Changed 3 years ago by matt (mattab)

I haven't tested (WARNING) but a query like this might work:

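-- Assumes idvisit is the auto-increment primary key of piwik_log_visit.
-- The inner derived table works around MySQL refusing to select from
-- the table that is being deleted from.
DELETE FROM piwik_log_visit
WHERE visit_server_date = '$THE_DATE'
AND idvisit NOT IN (
    SELECT first_idvisit FROM (
        SELECT MIN(idvisit) AS first_idvisit
        FROM piwik_log_visit
        WHERE visit_server_date = '$THE_DATE'
        GROUP BY visitor_idcookie
    ) AS first_visits
)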

This will delete all visits from each visitor beyond their first visit on $THE_DATE, and therefore keep only one visit per visitor on that day.

Please test on a test dataset before applying to your real one (or use on a copy of the table)
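One way to check the outcome on a test dataset before deleting anything is a dry run in PHP. This is only a sketch: `visitsToDelete` is a hypothetical helper, and it assumes each row is an array with an `idvisit` key (taken to be the table's numeric ID) and a `visitor_idcookie` key, fetched for a single day.

```php
<?php
// Dry-run helper (hypothetical, not part of Piwik): given visit rows for one
// day, returns the IDs of every visit except each visitor's earliest one.
function visitsToDelete(array $rows): array
{
    $firstVisit = []; // visitor_idcookie => lowest idvisit seen
    foreach ($rows as $row) {
        $cookie = $row['visitor_idcookie'];
        if (!isset($firstVisit[$cookie]) || $row['idvisit'] < $firstVisit[$cookie]) {
            $firstVisit[$cookie] = $row['idvisit'];
        }
    }

    $toDelete = [];
    foreach ($rows as $row) {
        if ($row['idvisit'] !== $firstVisit[$row['visitor_idcookie']]) {
            $toDelete[] = $row['idvisit'];
        }
    }
    return $toDelete;
}
```

Comparing the returned IDs with the rows the SQL query removes from a copy of the table gives a quick sanity check that exactly one visit per visitor survives.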
