Opened 5 years ago

Closed 3 years ago

Last modified 3 months ago

#409 closed New feature (fixed)

Implement first party cookie in Piwik

Reported by: matt Owned by: matt
Priority: critical Milestone: Piwik 1.2
Component: Core Keywords: scalability, cookie, 1st party cookie
Cc: daniel.blanco@… Sensitive: no

Description (last modified by matt)

Currently Piwik is using several third party cookies. We want Piwik to create, by default, 1st party cookies only. This is mainly for privacy reasons, but also for better accuracy in counting unique visitors (1st party cookies are more often accepted and less often deleted by users).

This ticket is a requirement for #134 and #1984

Change History (92)

comment:1 Changed 5 years ago by matt (mattab)

  • Milestone changed from DigitalVibes to Stable release

comment:2 Changed 5 years ago by matt (mattab)

  • Keywords scalability cookie 1st party cookie added
  • Summary changed from creating one cookie per website is not scalable > one cookie per piwik install to creating one cookie per website doesn't scale > one cookie per piwik install

comment:3 Changed 5 years ago by matt (mattab)

  • Description modified

comment:4 Changed 5 years ago by matt (mattab)

  • Description modified
  • Sensitive unset
  • Summary changed from creating one cookie per website doesn't scale > one cookie per piwik install to creating one cookie per website doesn't scale > need server side cookie persisted data store

comment:5 Changed 4 years ago by matt (mattab)

  • Description modified

comment:6 Changed 4 years ago by matt (mattab)

  • Description modified

comment:7 Changed 4 years ago by matt (mattab)

  • Description modified

comment:8 Changed 4 years ago by brutuscat

+1 for this

Any news? We have Piwik deployed to track widget views (LOTS of hits from different domains) and we are forced to increase the header size limit in Apache...

comment:9 Changed 4 years ago by ts77

Same issue here. I have already had to increase the allowed header size in nginx twice, with just a couple thousand sites.

comment:10 Changed 4 years ago by matt (mattab)

This is planned to be fixed before Piwik 1.0, which means in the next 2 months. If you can help with implementation or testing, please let us know. This is definitely a high priority issue.

comment:11 Changed 4 years ago by ts77

I would love to help with testing

comment:12 Changed 4 years ago by matt (mattab)

  • Milestone changed from 4 - Piwik 1.0 - Stable release to 1 - Piwik 0.7 - DigitalVibes

comment:13 Changed 4 years ago by matt (mattab)

  • Priority changed from major to critical

We should do the quick fix solution for 1.0, ensuring we store the data of the most recent websites, up to a reasonable limit (1kb?). If a cookie takes 200 bytes on average, we could still store 5 sites instead of failing as it does now.

We could then do the scalable long term solution post 1.0.

comment:14 Changed 4 years ago by matt (mattab)

The goal would be to slightly update the Cookie mechanism in Tracker to have it store a total max of 1kb, discarding older tracking cookies.
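A minimal sketch of the quick fix from comments 13 and 14, assuming the tracker cookie holds one entry per site, ordered from least to most recently used. `capCookieEntries`, `byteLength` and the `idsite=value&...` serialization are hypothetical, not Piwik's actual cookie format; sizes are counted in UTF-8 bytes because cookie limits are byte limits.

```javascript
// Hypothetical: UTF-8 byte length of a string.
function byteLength(str) {
  return new TextEncoder().encode(str).length;
}

// Hypothetical sketch: drop the least recently used site entries until the
// serialized cookie fits in maxBytes.
// entries: array of { idsite, value }, oldest first.
function capCookieEntries(entries, maxBytes) {
  const kept = entries.slice();
  const serialize = (list) => list.map((e) => e.idsite + '=' + e.value).join('&');
  while (kept.length > 0 && byteLength(serialize(kept)) > maxBytes) {
    kept.shift(); // discard the oldest tracking entry
  }
  return { entries: kept, cookie: serialize(kept) };
}
```

With six 200-character entries the serialized cookie is 1217 bytes, so the oldest entry is dropped and five sites fit in 1014 bytes, in line with the estimate in comment 13.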

comment:15 Changed 4 years ago by vipsoft (robocoder)

The long term solution should also look at the race condition in #1107 and the multi-site "ignore" cookie in #1376.

comment:16 Changed 4 years ago by matt (mattab)

  • Owner set to matt

I will implement the quick fix.

comment:17 Changed 4 years ago by matt (mattab)

(In [2777]) Refs #409

  • Quick fixes: ensuring tracking cookies never exceed 1k. It was surprisingly simple to implement, nice...
  • also adding small test failure script in misc/

comment:18 Changed 4 years ago by matt (mattab)

  • Milestone changed from 1 - Piwik 0.7 - DigitalVibes to Features requests - after Piwik 1.0

comment:19 Changed 3 years ago by zjuul

Any news on when Piwik is going to support 1st party cookies?

3rd party cookies are less well accepted, not only by browsers but also by people.
I think switching over will be good for stats, for Piwik PR and for Piwik acceptance.

thanks!

comment:20 Changed 3 years ago by nightwolf67

comment:21 Changed 3 years ago by matt (mattab)

  • Milestone changed from Features requests to Piwik 1.x

comment:22 follow-up: Changed 3 years ago by matt (mattab)

When implemented, we should also have the PiwikTracker API class set the 1st party cookie forwarded from the Piwik server response.

comment:23 Changed 3 years ago by vipsoft (robocoder)

In [3544], I added core/Tracker/Cookie.php to encapsulate the ignore_cookie. But it too suffers from the third-party cookie issue.

comment:24 in reply to: ↑ 22 Changed 3 years ago by vipsoft (robocoder)

Replying to matt:

When implemented, we should also have the PiwikTracker API class set the 1st party cookie forwarded from the Piwik server response.

The first-party "cookie" will actually be a UUID (not necessarily rfc4122 compliant) generated by piwik.js and passed to piwik.php via a new parameter. Any allowed third-party cookies will continue to be signed and sent via the Cookie: header.

The tracker session table will map first and third party visitor id_cookies (plus idsite to act as indices) to rows that contain the former cookie store.

comment:25 Changed 3 years ago by vipsoft (robocoder)

  • Milestone changed from 1.x - Piwik 1.x to 1.2 - Piwik 1.2

comment:26 Changed 3 years ago by matt (mattab)

  • Description modified

comment:27 Changed 3 years ago by matt (mattab)

Use cases for this feature:

  • User tracks one main domain name
    • standard use case, there is only one set of cookies
  • User tracks domain name AND many subdomains within one Piwik website
    • cookies are shared across all subdomains, via a call to setCookieDomain()
  • User tracks domain name in one Piwik website, and other subdomains in other Piwik websites
    • cookies are NOT shared across subdomains when setCookieDomain() is not called
  • User tracks one domain name under several Piwik websites (i.e. separate sections in separate Piwik websites)
    • cookies are NOT shared if setCookiePath() was called with the path to set the cookie to. Similar to GA
  • User tracks one domain name, but specific pages are different Piwik websites - for example when tracking a 'user page' on a social network type website. If the URL is not in a sub-directory, then first party cookies will be shared across all websites. If we had cookies for each page, then we would quickly overflow the cookie limit (assuming visitors view many user pages). This use case is not supported in Piwik.
  • User tracks several domain names inside one Piwik website - This use case is not covered in this proposal: cookies will NOT be shared across domains. This is what GA's _setAllowLinker feature does, but we are OK not implementing this at this stage.

Requirements piwik.js

  • New cookie _pk_id
    • Valid 2 years after the latest page view
    • Contains a 64-bit UUID, generated when the cookie is created. How to build a good random UUID? Keeping the first 16 hex characters of an MD5 digest would work well (16 characters are needed, not 8, since it is a hex string: 16 hex characters = 64 bits)
    • Contains the cookie creation timestamp, in UTC seconds: Math.round(new Date().getTime() / 1000). This will be used to process 'Days to conversion' for goal conversions.
    • Contains visits count, initially 1 (updated when _pk_ses is created)
    • Contains timestamp of last page view of the last visit before this visit. This is used to process "Days since last visit" #583 and "Days to purchase" #2031
  • New cookie _pk_ses
    • Valid 30 minutes after the latest page view
    • Contains no data
    • Every time _pk_ses is created, increase _pk_id visits counter by 1. This will be used to report "Visits to conversion"
  • New cookie _pk_ref
    • Valid 6 months, from date of creation.
    • Contains ref URL, truncated at 1024 bytes
    • Contains time at which ref URL was set
    • The referer URL set in this cookie depends on first/last referer attribution. Also, a direct entry will always be overwritten by non direct referers. Pseudo code:
        IF the visit is new (i.e. there was no _pk_ses cookie when track* was initially called)
        AND there is a referer URL whose domain is not the current domain, or any subdomain set in setDomainNames
        AND (_pk_ref is empty // if the _pk_ref cookie is not set, we always set it
             OR setConversionAttributionFirstReferer == false // _pk_ref is already set, but we overwrite it since we want to attribute the last known referer
             OR _pk_ref is set AND hostname of _pk_ref URL is the current domain, OR any subdomain // the stored referer no longer fits the spec. This could happen if _pk_ref was set earlier and the website was later updated to call setDomainNames(..); we want to repair visitors' cookie data in this case.
             )
      THEN update _pk_ref with the current referer URL, truncated at 1k
      
    • To test a URL hostname, we can simply use JS .indexOf as it will do the job nicely and be easier to maintain than parsing URLs properly
  • All new cookies must be as space efficient as possible, ie.
    • no named index for array-like cookies; just use a '.' separator for values
    • record as little info as possible, and always truncate user-supplied data
  • All 1st party cookies are sent along with each request to piwik.php
    • &_id=UUID_IN_PK_ID
    • &_idts=UUID_CREATED_TIMESTAMP_IN_PK_ID
    • &_idvc=VISITS_COUNT_IN_PK_ID
    • &_idn=1
      • If _pk_id was created on this page, set _idn=1, otherwise set _idn=0. The name means idnew, i.e. 'new visitor' (as opposed to 'returning visitor')
    • &_ref=ENCODED_URL_IN_CONTENT_PK_REF
    • &_ses=1
    • &_viewts=TIMESTAMP_OF_LAST_PAGE_VIEW_OF_LAST_VISIT
    • &_refts=TIMESTAMP_OF_REFERRAL_URL
  • Cookies should work on 'localhost' or 'intranet' host names (but JS cookies need a proper domain name to be set)
  • API
    • setCookieDomain() - '.example.org' to set to all subdomains as well
    • setCookieNamePrefix() - to change _pk to something else
    • setCookiePath() - sets the path on which to set the cookie. Useful to track a specific section of a website separately from the main website (unique visitors, referer attributions, etc.).
    • setVisitorCookieTimeout() - to change the default of 2 years
    • setConversionAttributionFirstReferer() - by default, we attribute the last referer set for a visit (as if setConversionAttributionFirstReferer(false) were called in the constructor), but if called by the user, we attribute a conversion to the first referer set in a past visit
    • getVisitorId() - returns the 16 characters ID from the cookie (without the visit count & other info)
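The _pk_id requirements above can be sketched as serialize/parse helpers plus a per-page-view update. This is a hypothetical sketch: the field order (uuid.createTs.visitCount.currentVisitTs.lastVisitTs) is taken from the example cookie value quoted in comment 49, the "current visit" timestamp follows comment 41, and the UUID uses Math.random() instead of the MD5 idea in the spec, so it is not suitable where unpredictability matters. An empty last field (rather than the string 'undefined') marks a visitor with no previous visit.

```javascript
// Hypothetical: 16 hex characters = 64 bits. Math.random() stands in for the
// MD5-based approach in the spec; it is not cryptographically strong.
function generateUuid() {
  let hex = '';
  for (let i = 0; i < 16; i++) {
    hex += Math.floor(Math.random() * 16).toString(16);
  }
  return hex;
}

// UTC seconds, as in the spec.
function nowSeconds() {
  return Math.round(new Date().getTime() / 1000);
}

// No named keys: values are joined with '.' to save space.
// Array.join() turns a null lastVisitTs into an empty field.
function serializeIdCookie(id) {
  return [id.uuid, id.createTs, id.visitCount, id.currentVisitTs, id.lastVisitTs].join('.');
}

function parseIdCookie(value) {
  const parts = value ? value.split('.') : [];
  if (parts.length !== 5) {
    return null; // missing or malformed cookie: the caller creates a new one
  }
  return {
    uuid: parts[0],
    createTs: parseInt(parts[1], 10),
    visitCount: parseInt(parts[2], 10),
    currentVisitTs: parseInt(parts[3], 10),
    lastVisitTs: parts[4] === '' ? null : parseInt(parts[4], 10),
  };
}

// Called on each page view; hasSessionCookie tells whether _pk_ses existed.
function updateOnPageView(value, hasSessionCookie, now = nowSeconds()) {
  let id = parseIdCookie(value);
  const isNewVisitor = id === null;
  if (isNewVisitor) {
    id = { uuid: generateUuid(), createTs: now, visitCount: 1, currentVisitTs: now, lastVisitTs: null };
  } else if (!hasSessionCookie) {
    // No _pk_ses: a new visit starts. Count it, and remember the last
    // page view of the previous visit (for "Days since last visit").
    id.visitCount += 1;
    id.lastVisitTs = id.currentVisitTs;
  }
  id.currentVisitTs = now;
  return { id, isNewVisitor, cookie: serializeIdCookie(id) };
}
```

Keeping the values positional and '.'-separated keeps the cookie small; the trade-off is that any change to the field layout has to be handled by the parser, as the length check above does for malformed values.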

Requirements piwik.php

  • Update code to get the various new parameters and use them in Tracker
  • Allow to use third party cookies with a setting. If enabled, Piwik will use 1st party AND 3rd party cookies. [Tracker] use_third_party_cookies = 0 by default
  • add log_conversion.days_to_conversion that counts days to conversion trusting the js timestamp (better than nothing)
  • add log_conversion.visits_to_conversion that counts visits until conversion
  • delete from schema log_conversion.referer_idvisit since it is unused
  • Add new report in Piwik "Days to Conversion"
  • Add new report in Piwik "Visits to Conversion"
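The two new log_conversion fields boil down to simple arithmetic on values the browser sends (_idts and _idvc). The tracker itself is PHP; this JavaScript sketch only illustrates the calculation, and rounding down to whole days and the fallbacks for missing values are assumptions.

```javascript
const SECONDS_PER_DAY = 86400;

// Whole days between the first visit (cookie creation, _idts) and the
// conversion, trusting the client timestamp as the requirement accepts.
function daysToConversion(idts, conversionTs) {
  if (!idts || conversionTs < idts) {
    return 0; // missing timestamp or client clock skew
  }
  return Math.floor((conversionTs - idts) / SECONDS_PER_DAY);
}

// Visits until conversion is the visit counter carried in _idvc.
function visitsToConversion(idvc) {
  const n = parseInt(idvc, 10);
  return Number.isNaN(n) || n < 1 ? 1 : n;
}
```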

Documentation:

  • Add doc of new public JS API functions in the JS doc

Ideas for V2

  • Set the 1st party cookies in PiwikTracker so that this is consistent with piwik.js
  • A concern is that cookie jar size will be potentially large because of _pk_ref containing full ref URL, ie. around 1k (since we truncate URL at 1k). A fix for this would be to do a basic parsing of the referer in piwik.js (like GA and other WA tools do). For example parsing keywords of top 50 search engines. I think we don't need to do this in V1 since it is really too much effort / QA, but worth keeping in mind for a future improvement.

comment:28 Changed 3 years ago by matt (mattab)

  • Description modified

comment:29 Changed 3 years ago by matt (mattab)

Also I think the piwik_ignore cookie should stay 3rd party (and signed), to avoid abuse.

comment:30 Changed 3 years ago by matt (mattab)

(In [3634]) Fixes #1916

Now always checking in the DB if we saw the visitor earlier. The cookie also becomes much smaller.
Renamed the setting enable_detect_unique_visitor_using_settings to trust_visitors_cookies, as the logic is different; it should only be enabled on intranets where all users share the same IP.
This will also help getting 1st party cookie implemented Refs #409

comment:31 Changed 3 years ago by matt (mattab)

  • Summary changed from creating one cookie per website doesn't scale > need server side cookie persisted data store to Implement first party cookie in Piwik

comment:32 Changed 3 years ago by matt (mattab)

Also we need to think about subdomains tracking and first party cookies. How does GA handle this for example? see for reference: http://www.roirevolution.com/blog/2011/01/google_analytics_subdomain_tracking.php

and http://www.dannytalk.com/how-to-track-sub-domains-cross-domains-in-google-analytics/

comment:33 Changed 3 years ago by matt (mattab)

comment:35 Changed 3 years ago by matt (mattab)

  • Description modified

comment:36 Changed 3 years ago by matt (mattab)

  • Type changed from Bug to New feature

comment:37 Changed 3 years ago by vipsoft (robocoder)

matt: do you still want this one? It doesn't appear in the request. Managing this on the client also requires keeping track of the timestamp of the most recent page view of the current visit.

  • Contains timestamp of last page view of the last visit before this visit. This is used to process "Days since last visit" #583 and "Days to purchase" #2031

comment:38 Changed 3 years ago by vipsoft (robocoder)

Because of this condition:

AND there is a referer URL whose domain is not the current domain, or any subdomain set in setDomainNames

_pk_ref will never contain a referer for the current domain or subdomain; so, this expression will never be true:

OR _pk_ref is set AND hostname of _pk_ref URL is the current domain, OR any subdomain
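The _pk_ref update rule from comment 27 can be sketched as follows. The third branch only fires when a previously stored referrer no longer counts as external, e.g. after the site starts calling setDomainNames(). This is a hypothetical sketch: `siteHosts` stands for the current domain plus the setDomainNames() list, the refCookie shape { url, ts } is assumed, and the host is extracted with cheap string operations in the spirit of the spec's indexOf suggestion rather than full URL parsing.

```javascript
const MAX_REF_LENGTH = 1024;

// Cheap host extraction: text between '://' and the next '/', '?', '#' or ':'.
function hostOf(url) {
  const start = url.indexOf('://');
  if (start === -1) return '';
  const rest = url.substring(start + 3);
  const end = rest.search(/[/?#:]/);
  return (end === -1 ? rest : rest.substring(0, end)).toLowerCase();
}

// True if host is one of the site's hosts or a subdomain of one.
function isSiteHost(host, siteHosts) {
  return siteHosts.some((d) => host === d || host.endsWith('.' + d));
}

// refCookie: { url, ts } or null. Returns the value _pk_ref should hold.
function nextRefCookie(refCookie, referrerUrl, hasSessionCookie, siteHosts, firstReferrer, now) {
  const isNewVisit = !hasSessionCookie;
  const isExternal = referrerUrl !== '' && !isSiteHost(hostOf(referrerUrl), siteHosts);
  const storedIsInternal = refCookie !== null && isSiteHost(hostOf(refCookie.url), siteHosts);
  if (isNewVisit && isExternal && (refCookie === null || !firstReferrer || storedIsInternal)) {
    return { url: referrerUrl.substring(0, MAX_REF_LENGTH), ts: now };
  }
  return refCookie; // keep the existing value (possibly null)
}
```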

comment:39 Changed 3 years ago by vipsoft (robocoder)

re: comment:38 - oops, I didn't scroll all the way to the right to read your comment; got it

The timestamp in comment:37 is still an open question.

Also, you mention that _pk_ref "Contains time at which ref URL was set", but this timestamp doesn't appear in the request either. (If I store this in the cookie, I need to change the delimiter, as the referrer may contain '.')

comment:40 Changed 3 years ago by matt (mattab)

Managing this on the client also requires keeping track of the timestamp of the most recent page view of the current visit.

OK that's right, this timestamp can also be saved in the cookie (ie. _pk_ses cookie?)

"Contains time at which ref URL was set", but this timestamp doesn't appear in the request either. (If I store this in the cookie, I need to change the delimeter, as the referrer may contain '.')

What do you mean by "it doesn't appear in the request"? I mean, _pk_ref must contain the URL as well as the client timestamp of when the cookie was last updated with a ref URL.

Thx

comment:41 Changed 3 years ago by vipsoft (robocoder)

I mean your specification doesn't show any parameters in the request to piwik.php for these timestamps.

&_viewts=TIMESTAMP_OF_LAST_PAGE_VIEW_OF_LAST_VISIT
&_refts=TIMESTAMP_OF_REFERRAL

If I understand _pk_ses correctly, the timestamp of the most recent page view (cvts) would have to instead be stored in _pk_id.

comment:42 Changed 3 years ago by matt (mattab)

Indeed, I have now updated the request to add these 2 timestamps.

Also, I'm not sure what I meant by &_ses=1 in the URL... maybe this is not useful.

comment:43 Changed 3 years ago by vipsoft (robocoder)

Maybe if _ses=0, the server should use third-party cookies?

comment:44 Changed 3 years ago by vipsoft (robocoder)

(In [3783]) refs #409 - first party cookies

  • API changes:
    • added: setCookieNamePrefix(cookieNamePrefix)
    • added: setCookieDomain(domain)
    • added: setCookiePath(path)
    • added: setVisitorCookieTimeout(timeout) - defaults to 2 years since last page view
    • added: setSessionCookieTimeout(timeout) - defaults to 30 minutes since last activity
    • added: setReferralCookieTimeout(timeout) - defaults to 6 months from the first visit
    • added: setConversionAttributionFirstReferer(enable)
    • added: getVisitorId()
      • for asynchronous tracking, use:
        	var visitorId;
        
        	_paq.push(function () {
        		visitorId = this.getVisitorId();
        	});
        
  • Cookie notes:
    • The default cookie path is '/'. This might be viewed as a potentially insecure default because it allows cookies to be shared across directories on the same domain. (Again, see the social network example.) This is, unfortunately, a necessity. If we leave the path blank, the behaviour is undefined (i.e., browser or browser-version dependent). For example, earlier versions of Firefox would default to '/'; later versions default to the origin path.
    • I was hoping to avoid this, but I added a hash to the cookie content similar to GA's setAllowHash(). This is needed for two reasons:
      1. Cookies are uniquely identified by the tuple (key,domain,path). Hashing only the domain is a bug. (See "social network website" use case.)
      2. There's a long-standing cookie+subdomain bug in Firefox (Gecko) dating back to 1.0 that leaks cookies from "example.com" (not ".example.com") to "xyz.example.com". @see https://bugzilla.mozilla.org/show_bug.cgi?id=363872
  • changed internal setCookie() method to take expiry time in milliseconds (was days)
  • removed internal dropCookie() method as it was never used
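Point 1 above (cookies are keyed by name, domain and path) suggests deriving the cookie-name suffix from both domain and path. A sketch, assuming an 8-hex-character suffix shaped like the PREFIXid.1fffd42e name quoted in comment 49; it uses 32-bit FNV-1a, a small non-cryptographic hash, as a dependency-free stand-in, so the suffixes will not match the ones piwik.js actually produces. `getCookieName` here is a hypothetical helper.

```javascript
// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
function fnv1a32(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Hypothetical: prefix + base name + '.' + hash of (domain + path), so two
// trackers on the same domain but different paths get distinct cookies.
function getCookieName(prefix, baseName, domain, path) {
  return prefix + baseName + '.' + fnv1a32(domain + path);
}
```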

@todo Missing unit tests and cross browser testing

refs #739 - piwik.js improvements

  • jslint 2011-01-09
  • new unit tests (integrated jslint, is_a functions, sha1(), utf8_encode(), etc)
  • use ECMAScript String.substring() instead of non-standard (although widely supported) String.substr()
  • implement domainFixup() so "example.com" and "example.com." are equivalent
  • API changes:
    • added: killFrame() - a frame buster
    • added: redirectFile( url ) - redirect if browsing off-line, aka file: buster; url is where to redirect to
    • added: setHeartBeatTimer( delay ) - send heart beat 'delay' milliseconds after initial trackPageView(); set to 0 to disable
    • removed: piwik_log() - legacy tracking code; see trackLink()
    • removed: piwik_track() - legacy tracking code; see trackPageView()
    • removed: setDownloadClass() - deprecated; see setDownloadClasses()
    • removed: setLinkClass() - deprecated; see setLinkClasses()
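The note above only says domainFixup() makes "example.com" and "example.com." equivalent. A minimal sketch of that normalization; lower-casing is an extra assumption (host names compare case-insensitively), and the real function may do more or less.

```javascript
// Normalize a domain so that "example.com." and "example.com" compare equal.
function domainFixup(domain) {
  let d = domain.toLowerCase();
  // Strip the trailing root '.' of a fully qualified name.
  if (d.charAt(d.length - 1) === '.') {
    d = d.slice(0, -1);
  }
  return d;
}
```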

refs #752 - track middle mouse button clicks (via mousedown+mouseup pseudo-click handler); defaults to tracking true "clicks"

  • API changes:
    • modified: addListener( element, enablePseudoClickHandler = false )
    • modified: enableLinkTracking( enablePseudoClickHandler = false )

refs #1984 - custom variables vs custom data

@todo These are just stubs.

  • API changes:
    • added: setCustomVar(slotId, key, value, opt_scope) - scope is 1 (visitor), 2 (session), 3 (page)
    • added: getCustomVar(slotId)
    • added: deleteCustomVar(slotId)
  • API changes for consistency:
    • added: setCustomVar(slotId, obj, opt_scope)
    • added: setCustomData(key, value)
    • for the equivalent of deleteCustomData(), use:
          tracker.setCustomData(null);
      

comment:45 Changed 3 years ago by vipsoft (robocoder)

(In [3784]) refs #409 - use getCookieName() in hasCookies() test

comment:46 Changed 3 years ago by vipsoft (robocoder)

  • Resolution set to fixed
  • Status changed from new to closed

Mark as fixed. Future commits to #1984.

comment:47 Changed 3 years ago by matt (mattab)

  • Resolution fixed deleted
  • Status changed from closed to reopened

I still have to do some work :)

  • Requirements piwik.php
  • Integration testing

Also,

comment:48 Changed 3 years ago by vipsoft (robocoder)

ok. on my todo list.

comment:49 Changed 3 years ago by matt (mattab)

JS code review

  • great commit, the Piwik JS API is now excellent and full-featured.

Questions/feedback

  • Are ref URLs encoded by default? in the cases where: it comes from the browser itself, OR when it was set via setReferrerUrl ?
  • If ref URL can contain a space (ie. sometimes not encoded), it will record a bogus cookie - should ref.split(' '); be ref.split(' ', limit = 1) ?
  • Referrer url doesn't seem to be truncated at 1k, important for keeping cookie space in control
  • Running the new JS for the first time, I see in the http request:
    _ref	undefined
    _refts	undefined
    _viewts	undefined
    
    I think these should be set only when they have a value
  • can all cookie timeout methods take seconds as input? This is less risky (if a user enters the timeout in seconds but the method expects milliseconds, things will break), and also more consistent/user friendly
  • getVisitorId() returns undefined (visitorId not set)
  • I looked at the cookie after some testing, and noticed the last field of 'id' cookie is undefined: PREFIXid.1fffd42e=fb6f5c3ec259b00e.1295573291.1.1295573291.undefined;
  • I don't think we need enableServerCookies(): enabling 3rd party cookies will be done in server side via config setting, will the client side have a use?
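The `_ref undefined` values in the request come from appending variables that were never set. A sketch of building the first party parameters from comment 27 while skipping empty ones; `buildCookieParams` is a hypothetical helper, not the piwik.js function. Note that 0 (e.g. _idn=0) is a real value and is kept.

```javascript
// Append &name=value only for values that are actually set.
function buildCookieParams(values) {
  const order = ['_id', '_idts', '_idvc', '_idn', '_ref', '_refts', '_viewts'];
  let query = '';
  for (const name of order) {
    const value = values[name];
    if (value === undefined || value === null || value === '') {
      continue; // e.g. no referrer yet, or no previous visit
    }
    query += '&' + name + '=' + encodeURIComponent(String(value));
  }
  return query;
}
```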

Pending items, as well as docs

  • pending unit tests covering new functions and as much code coverage as possible
  • pending the run of these unit tests on most browsers to check errors are not triggered (most important) and check that cookies / requests are set correctly (to avoid an error such as #1962)

comment:50 follow-up: Changed 3 years ago by matt (mattab)

  • For compatibility with https pages, the cookie secure flag should be set automatically based on the current URL protocol (in setCookie())
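A sketch of composing the cookie string with the secure flag derived from the page protocol. The expiry is in milliseconds, as comment 44 describes for the internal setCookie(); `buildCookie` and passing the protocol explicitly (in a browser, window.location.protocol) are assumptions made so the function can be tested outside a page.

```javascript
// Hypothetical: build the string a browser would receive via document.cookie.
function buildCookie(name, value, msToExpire, path, domain, protocol) {
  let cookie = name + '=' + encodeURIComponent(value);
  if (msToExpire) {
    const expiry = new Date(Date.now() + msToExpire);
    cookie += '; expires=' + expiry.toUTCString();
  }
  cookie += '; path=' + (path || '/');
  if (domain) {
    cookie += '; domain=' + domain;
  }
  // Secure only when the current page is served over https.
  if (protocol === 'https:') {
    cookie += '; secure';
  }
  return cookie;
}
```

One side effect to keep in mind: a cookie written with the secure flag on an https page is neither readable from nor sent to plain http pages of the same site.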

comment:51 in reply to: ↑ 50 ; follow-up: Changed 3 years ago by vipsoft (robocoder)

Replying to matt:

  • Are ref URLs encoded by default? in the cases where: it comes from the browser itself, OR when it was set via setReferrerUrl ?

Browser-dependent. We have to encode it in case it isn't.

  • If ref URL can contain a space (ie. sometimes not encoded), it will record a bogus cookie - should ref.split(' '); be ref.split(' ', limit = 1) ?

Good point. I've changed it to use limit=1 and '.' as a separator (consistent with id).

  • Referrer url doesn't seem to be truncated at 1k, important for keeping cookie space in control

No, it isn't. The spec is 4K. The actual limit is browser dependent, and also subject to server configuration limits.

  • Running the new JS for the first time, I see in the http request:

I'll fix that.

  • can all cookie timeout methods take seconds as input? This is less risky (if a user enters the timeout in seconds but the method expects milliseconds, things will break), and also more consistent/user friendly

This is for consistency with G.

  • getVisitorId() returns undefined (visitorId not set)

I'll fix that.

  • I looked at the cookie after some testing, and noticed the last field of 'id' cookie is undefined: PREFIXid.1fffd42e=fb6f5c3ec259b00e.1295573291.1.1295573291.undefined;

Same bug as running JS for the first time.

  • I don't think we need enableServerCookies(): enabling 3rd party cookies will be done in server side via config setting, will the client side have a use?

Another analytics product offers a thirdParty setting via JS. Removed for now.

Replying to matt:

  • For compatibility with https pages, the cookie secure flag should be set automatically based on the current URL protocol (in setCookie())

Ok.

comment:52 Changed 3 years ago by vipsoft (robocoder)

(In [3789]) refs #409 - remove enableServerCookies(); fix bugs found in matt's review

comment:53 Changed 3 years ago by vipsoft (robocoder)

(In [3794]) refs #409 - set secure flag in cookies per comment:51

comment:54 Changed 3 years ago by vipsoft (robocoder)

(In [3797]) refs #409 - rename setConversionAttributionFirstReferer to setConversionAttributionFirstReferrer for correctness/consistency, i.e., referrer/referral

comment:55 Changed 3 years ago by vipsoft (robocoder)

(In [3814]) refs #409 - reorg js unit tests

comment:56 in reply to: ↑ description Changed 3 years ago by patioheater12

comment:57 Changed 3 years ago by vipsoft (robocoder)

(In [3817]) refs #409 - added setDoNotTrack(bool); updated jslint to 2011-01-26

comment:58 Changed 3 years ago by vipsoft (robocoder)

(In [3818]) refs #409 - small optimization to r3817

comment:59 Changed 3 years ago by vipsoft (robocoder)

_ref is showing up undefined in my logs; I'll fix this and add some more unit tests (tomorrow?)

comment:60 in reply to: ↑ 51 Changed 3 years ago by matt (mattab)

Replying to vipsoft:

Replying to matt:

  • Are ref URLs encoded by default? in the cases where: it comes from the browser itself, OR when it was set via setReferrerUrl ?

Browser-dependent. We have to encode it in case it isn't.

OK, should JS ensure all URLs are encoded before working on them?

  • Referrer url doesn't seem to be truncated at 1k, important for keeping cookie space in control

No, it isn't. The spec is 4K. The actual limit is browser dependent, and also subject to server configuration limits.

A cookie that is too big is not desirable: it is sent with every HTTP request and slows down page loads, plus it could cause other problems with cookie space.

We must truncate at some length, maybe 2k?

  • can all cookie timeout methods take seconds as input? This is less risky (if a user enters the timeout in seconds but the method expects milliseconds, things will break), and also more consistent/user friendly

This is for consistency with G.

OK, I vote for using seconds as ms doesn't make sense in this case. Let's not follow GA API since it will cause user errors (and we have already a few differences anyway)

OK for other modifications, good stuff. Is there anything still open apart from the points above?

comment:61 follow-up: Changed 3 years ago by vipsoft (robocoder)

We already assume URLs are decoded when working on them. Values are decoded by getCookie; conversely, values are encoded by setCookie and sendRequest. I don't see any need to change this.

This isn't a problem that we need to solve. Users may want to be aware of potential limits, but they shouldn't be artificially constrained. Tracking requests are sent asynchronously, and shouldn't affect page load time. Loading piwik.js (minified at 14K), when it isn't in the cache, has more impact on page load times.

I'll change the API methods to expect seconds, but we should do so for all methods. For setLinkTrackingTimer() this will be a compat-buster.

As an observation, when Piwik is on the same domain as the site being tracked, first party cookies will be sent in the Cookie: header, in addition to being in the tracking request. Some ideas would be to (a) leave this as is, (b) add a method to disable first party cookies, or (c) detect when the site being tracked and tracker are on the same domain and in this case, shorten the request string by excluding the cookie values.

comment:62 Changed 3 years ago by vipsoft (robocoder)

(In [3846]) refs #409 - fix _ref=undefined bug caused by split('.', 1); also external API methods now expect seconds, and convert to milliseconds internally
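The change in r3846 (public methods take seconds, stored internally as milliseconds) follows a simple pattern. A hypothetical sketch: `createCookieConfig` is not the piwik.js structure, and the default values are assumptions matching the defaults listed in comment 44 (2 years approximated as 730 days, 6 months as 182.5 days).

```javascript
function createCookieConfig() {
  // Internal values are in milliseconds (assumed defaults).
  const config = {
    visitorMs: 63072000 * 1000, // ~2 years
    sessionMs: 1800 * 1000, // 30 minutes
    referralMs: 15768000 * 1000, // ~6 months
  };
  return {
    // Public setters take seconds and convert once, here.
    setVisitorCookieTimeout(seconds) { config.visitorMs = seconds * 1000; },
    setSessionCookieTimeout(seconds) { config.sessionMs = seconds * 1000; },
    setReferralCookieTimeout(seconds) { config.referralMs = seconds * 1000; },
    getConfig() { return Object.assign({}, config); },
  };
}
```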

comment:63 in reply to: ↑ 61 Changed 3 years ago by vipsoft (robocoder)

Replying to vipsoft:

I'll change the API methods to expect seconds, but we should do so for all methods. For setLinkTrackingTimer() this will be a compat-buster.

Done.

As an observation, when Piwik is on the same domain as the site being tracked, first party cookies will be sent in the Cookie: header, in addition to being in the tracking request. Some ideas would be to (a) leave this as is, (b) add a method to disable first party cookies, or (c) detect when the site being tracked and tracker are on the same domain and in this case, shorten the request string by excluding the cookie values.

The problem with (c) is that the cookies are unsigned, so the server discards the value.

comment:64 Changed 3 years ago by vipsoft (robocoder)

(d) detect when the site being tracked and tracker are on the same domain, and in this case, automatically disable first party cookies
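Option (d) comes down to comparing the tracker host with the page host. A hypothetical sketch using the standard URL constructor (available in browsers and Node); relative tracker URLs are resolved against the page URL. It compares host names only, which matches how cookies ignore ports, and, as comment 65 notes, custom variables would still have to be sent either way.

```javascript
// Returns false when piwik.php is served from the same host as the page,
// since the browser then already sends the cookies in the Cookie: header.
function useFirstPartyCookies(trackerUrl, pageUrl) {
  const trackerHost = new URL(trackerUrl, pageUrl).hostname;
  const pageHost = new URL(pageUrl).hostname;
  return trackerHost !== pageHost;
}
```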

comment:65 Changed 3 years ago by vipsoft (robocoder)

for (b) and (d), cvar would be an exception.

comment:66 Changed 3 years ago by vipsoft (robocoder)

fwiw I think the redundancy in the Cookie: header is a low priority -- it isn't a problem we need to solve now.

comment:67 Changed 3 years ago by matt (mattab)

setLinkTrackingTimer is fine in milliseconds, since it requires this precision (which is not needed/desired for cookie timeouts). We can clarify what parameter we expect in the documentation and in the parameter names. I vote for revert as introducing an API change in the documented method at this stage is not possible - thoughts?

My concern with cookie sizes was purely around slowing down the whole website experience, since 1st party cookies are sent in the Cookie headers. So with a 2k cookie, fetching 10 images and 5 other resources will cause an overhead of 2k * 15 = 30k of data transmitted over HTTP, which could result in a worse user experience. I still think we must truncate to 1 or 2k, but agreed that this should be documented, and maybe the limit could be changed via a new setConversionReferrerUrlTruncation() or something similar.
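The overhead arithmetic and the proposed truncation, sketched. `cookieOverheadBytes`, `truncateReferrerUrl` and the 1024-character default are hypothetical (the comment only floats a setter "or something similar"). Truncation is applied before the URL is encoded for the cookie so no percent-escape is cut in half, and a trailing lone high surrogate is dropped because encodeURIComponent throws a URIError on it.

```javascript
// Extra bytes sent when a cookie rides along on every request.
function cookieOverheadBytes(cookieBytes, requestCount) {
  return cookieBytes * requestCount;
}

// Truncate the unencoded URL to maxChars UTF-16 code units.
function truncateReferrerUrl(url, maxChars = 1024) {
  if (url.length <= maxChars) {
    return url;
  }
  let cut = url.substring(0, maxChars);
  const last = cut.charCodeAt(cut.length - 1);
  if (last >= 0xd800 && last <= 0xdbff) {
    cut = cut.substring(0, cut.length - 1); // don't keep half a surrogate pair
  }
  return cut;
}
```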

comment:68 Changed 3 years ago by vipsoft (robocoder)

(In [3852]) refs #409 - revert API change to setLinkTrackingTimer()

comment:69 Changed 3 years ago by vipsoft (robocoder)

Since the conversion referral URL is set (if needed) at the beginning of a new session and used (currently) at most once per visit, one idea would be to store this server side. This would minimize the cookie size and transmission overhead; the tradeoff is executing some extra (albeit infrequent) SQL on the server.

comment:70 Changed 3 years ago by vipsoft (robocoder)

There's also a small privacy/security issue with storing the referral URL in a cookie.

  • It's persistent (unlike document.referrer).
  • May be targeted by a browsing history hijack.
  • It could be used for competitive intelligence by third-parties. (e.g., Microsoft's Customer Experience Improvement Program)

comment:71 Changed 3 years ago by matt (mattab)

vipsoft, I updated my comment about the visitor log table new feature, see http://dev.piwik.org/trac/ticket/1434#comment:11 - I think it would be best to go this way in the future indeed. Just more overhead for more features :)

comment:72 Changed 3 years ago by vipsoft (robocoder)

Ok. Hopefully it won't take as long as it did this ticket... ;)

(The space/transmission overhead gets worse when there are multiple trackers on the same page, using different cookie name prefixes.)

comment:73 Changed 3 years ago by vipsoft (robocoder)

  • Status changed from reopened to new

comment:74 Changed 3 years ago by vipsoft (robocoder)

(In [3868]) refs #409 - add back legacy tracking; update jslint

comment:75 Changed 3 years ago by matt (mattab)

(In [3888]) Refs #409

  • Deprecated setting, moved to JS API instead
    ; if set to 0, any goal conversion will be credited to the most recent non-empty referer.
    ; when set to 1, the first ever referer used to reach the website will be used
    use_first_referer_to_determine_goal_referer = 0
    
  • New setting to allow using 3rd party cookies for visitor ID cookie only
    ; Piwik uses first party cookies by default. If set to 1, 
    ; the visit ID cookie will be set on the Piwik server domain as well
    ; this is useful when you want to do cross-website analysis
    use_third_party_cookies = 0
    
  • Tracker uses 1st party cookie values for Goals referrer attribution
  • removed log_conversion.referer_idvisit field, unused

comment:76 Changed 3 years ago by matt (mattab)

(In [3892]) Refs #409

  • Adding new metrics: Visit count, Days since first visit, Days since last visit, these are new fields in the table
  • The new Reports will be done in 1.3
  • Reading the timestamps and visit count from the 1st party cookie
  • Fixing tests that use the 1st party cookies (also added tests for the 3rd party cookie use case)
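The sketch below shows how the three new metrics can be derived from values kept in the first-party cookie. It assumes the cookie stores a visit counter plus Unix timestamps (in seconds) for the first and the previous visit. The field names and `visitMetrics` are hypothetical. Counting whole days with `Math.floor` is one possible convention.

```javascript
const SECONDS_PER_DAY = 86400;

// cookie: hypothetical parsed first-party cookie fields
//   previousVisitCount - visits recorded before this one
//   firstVisitTs, lastVisitTs - Unix timestamps in seconds
// nowTs: current Unix time in seconds
function visitMetrics(cookie, nowTs) {
  return {
    visitCount: cookie.previousVisitCount + 1, // include the current visit
    daysSinceFirstVisit: Math.floor((nowTs - cookie.firstVisitTs) / SECONDS_PER_DAY),
    daysSinceLastVisit: Math.floor((nowTs - cookie.lastVisitTs) / SECONDS_PER_DAY),
  };
}

const now = 1300000000;
const metrics = visitMetrics(
  { previousVisitCount: 4, firstVisitTs: now - 10 * SECONDS_PER_DAY - 5, lastVisitTs: now - 86399 },
  now
);
```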

comment:77 Changed 3 years ago by matt (mattab)

(In [3893]) Refs #409 Disabling getVisitorId() for now as it doesn't work when called before track* (the object should init the uuid member before getRequest())

It would be nice to have, though: it would make it trivial to get the visitorId from Piwik into other systems (Salesforce, form fills), and then also allow querying the Live! API to fetch data about this visitor.

comment:78 Changed 3 years ago by matt (mattab)

  • Resolution set to fixed
  • Status changed from new to closed

I think all outstanding points, apart from JS tests and JS Doc, are in trunk and working?

comment:79 Changed 3 years ago by vipsoft (robocoder)

We can't reliably retrieve an existing uuid until the cookie domain, path, and prefix are definite. If we pre-initialize it and then re-read the cookie each time the domain, path, or prefix is changed, the side effect is that the uuid may differ depending on when getVisitorId() is called.

I vote to either re-enable the as-implemented behaviour or remove this feature entirely.

comment:80 Changed 3 years ago by matt (mattab)

My idea was to have getVisitorId() call a loadIdCookie() or similar function that would only pre-load this cookie so we can read it. The user should call getVisitorId() after all setCookie* calls have been made, but shouldn't have to call it after track*, since the ID may be needed before the tracking request is sent (e.g., when submitting a form on the page that should carry the Piwik ID).
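Here is a minimal sketch of the approach in comment:80, using a simplified tracker object. `getVisitorId()` loads the ID cookie on demand, using whatever cookie prefix is configured at that moment, so it works before any track* call. Several details are hypothetical: `createTracker`, `loadIdCookie`, the cookie name layout, and the value format. The cookie string is passed in rather than read from `document.cookie` so that the sketch runs outside a browser.

```javascript
// Hypothetical, simplified tracker illustrating lazy loading of the ID cookie.
// cookieString stands in for document.cookie so this runs outside a browser.
function createTracker(siteId, cookieString) {
  let cookieNamePrefix = "_pk_";
  let visitorId = null; // not read until getVisitorId() is first called

  // Parse "name=value; name2=value2" into a Map.
  function parseCookies(str) {
    const jar = new Map();
    for (const part of str.split(";")) {
      const eq = part.indexOf("=");
      if (eq > 0) {
        jar.set(part.slice(0, eq).trim(), decodeURIComponent(part.slice(eq + 1).trim()));
      }
    }
    return jar;
  }

  // 16 random hex characters; a stand-in for the fingerprint-based uuid.
  function generateUuid() {
    let hex = "";
    for (let i = 0; i < 16; i++) {
      hex += Math.floor(Math.random() * 16).toString(16);
    }
    return hex;
  }

  // Read the ID cookie using the cookie settings in effect right now.
  // Assumed value layout: "<uuid>.<other fields>".
  function loadIdCookie() {
    const value = parseCookies(cookieString).get(cookieNamePrefix + "id." + siteId);
    return value ? value.split(".")[0] : generateUuid();
  }

  return {
    setCookieNamePrefix(prefix) {
      cookieNamePrefix = prefix;
    },
    // Works before any track* call; caches the result, so all setCookie*
    // calls must come first (the caveat raised in comment:79).
    getVisitorId() {
      if (visitorId === null) {
        visitorId = loadIdCookie();
      }
      return visitorId;
    },
  };
}
```

In this sketch, calling setCookieNamePrefix() after the first getVisitorId() does not change the returned ID. That is the ordering constraint comment:80 asks users to respect.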

comment:81 Changed 3 years ago by leo12

comment:82 Changed 3 years ago by leo12

comment:83 Changed 3 years ago by vipsoft (robocoder)

(In [3939]) refs #409:

  • always use Crockford's JSON module (renamed to JSON2) to work around broken "native implementations"
  • add JSON unit tests
  • revert [3893] and [3900]; rewrite getVisitorId() per comment:80
  • refactor browser feature detection for fingerprinting (used to generate uuid)
  • setDomains() now takes either '*.domain' or '.domain'
  • Safari emits "unsafe header" warnings for Content-Length and Connection in XHR POST requests
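The sketch below checks a host against the list passed to setDomains(). It assumes that both '*.example.com' and '.example.com' mean "example.com and any of its subdomains", and that a bare entry means an exact host match. `normalizeDomain` and `isSiteHost` are hypothetical helpers. This is not the piwik.js code.

```javascript
// Turn '*.example.com' or '.example.com' into '.example.com';
// leave exact hosts as-is. Lowercase for case-insensitive comparison.
function normalizeDomain(domain) {
  let d = domain.toLowerCase();
  if (d.startsWith("*.")) {
    d = d.slice(1); // '*.example.com' -> '.example.com'
  }
  return d;
}

// True if host is covered by any entry in the configured domain list.
function isSiteHost(host, domains) {
  const h = host.toLowerCase();
  return domains.map(normalizeDomain).some((d) => {
    if (d.startsWith(".")) {
      // Wildcard entry: match the bare domain or any subdomain of it.
      return h === d.slice(1) || h.endsWith(d);
    }
    return h === d;
  });
}
```

Matching on the leading dot (rather than a plain suffix) keeps unrelated hosts such as badexample.com from being treated as part of example.com.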

refs #1984:

  • partially revert [3882] in order for the unit tests to run
  • fix inconsistency in getCustomVariable() depending on whether it is loaded from memory or from a cookie

refs #2078 Webkit bug ("Failed to load resource") when link target is the current window/tab

  • requires further discussion because the workaround may not be desirable behavior, i.e.,
    if ((new RegExp('WebKit')).test(navigatorAlias.userAgent)
        && (!sourceElement.target.length || sourceElement.target === '_self')
        && linkType === 'link')
    {
        // open outlink in a new window
        sourceElement.target = '_blank';
    }
    

comment:84 Changed 3 years ago by vipsoft (robocoder)

(In [3960]) refs #409 - add site ID to cookie name; shorten domain hash to 16 bits (4 hexit characters)

This is a hybrid between the previous implementation and what I proposed.

  • Adding idsite to the cookie name means subdomains that track using different site IDs can still use/share subdomain cookies
  • Keeping the domain hash in the cookie name will make it easier in future to delete invalid cookies (integrity check)
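This is a sketch of the naming scheme from [3960]. Each name combines the prefix, the base name, the site ID, and a 16-bit hash (4 hex characters) of the cookie domain and path. Some details are assumptions for illustration: the `_pk_` prefix and the `.`-separated layout. The hash here is 32-bit FNV-1a truncated to 16 bits, chosen only because it fits in a few self-contained lines. The commit message does not say which hash function piwik.js uses.

```javascript
// 32-bit FNV-1a over the string's UTF-16 code units (a swapped-in hash,
// not necessarily the one piwik.js uses).
function fnv1a32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// 16-bit hash (4 hex characters) of the cookie domain + path.
function domainHash(cookieDomain, cookiePath) {
  const h = fnv1a32(cookieDomain + (cookiePath || "/"));
  return (h & 0xffff).toString(16).padStart(4, "0");
}

// Hypothetical layout: <prefix><baseName>.<siteId>.<domainHash>
function cookieName(prefix, baseName, siteId, cookieDomain, cookiePath) {
  return prefix + baseName + "." + siteId + "." + domainHash(cookieDomain, cookiePath);
}
```

Because the site ID is part of the name, a page that tracks N site IDs on the same domain sets N copies of each cookie.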

Decided not to automatically set the cookie domain for www.example.com to .example.com: the convenience introduces side effects, and I have a feeling it would be more trouble than it's worth. I will continue to leave it to the user to set the cookie domain explicitly. Users should be advised to redirect example.com to www.example.com (or vice versa) to:

a) avoid separate cookies for the two domains, and
b) improve SEO. (Google for "seo www vs no www".)
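The cookie domain-matching rule explains why example.com and www.example.com end up with separate cookies by default. The sketch below is a simplified reading of RFC 6265 (sections 5.1.3 and 5.2.3). It ignores IP addresses and public-suffix checks, and `cookieSentTo` is a hypothetical helper.

```javascript
// Would a cookie be sent to requestHost?
// cookieDomain: value of the Domain attribute, or null for a host-only cookie.
// setByHost: host that set the cookie (used for host-only cookies).
function cookieSentTo(requestHost, cookieDomain, setByHost) {
  const host = requestHost.toLowerCase();
  if (cookieDomain === null) {
    // No Domain attribute: host-only cookie, exact match required.
    return host === setByHost.toLowerCase();
  }
  // RFC 6265 ignores a leading dot in the Domain attribute.
  const domain = cookieDomain.toLowerCase().replace(/^\./, "");
  return host === domain || host.endsWith("." + domain);
}
```

Suppose no cookie domain is set explicitly, and a visitor first lands on example.com and later on www.example.com. That visitor gets two independent sets of cookies, and so two visitor IDs. Either setting the cookie domain to .example.com or redirecting one host to the other avoids the split.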

comment:85 Changed 3 years ago by sam_gabriel

Revision [3960] is breaking a lot of things on our sites. We track multiple site IDs on the same domain name: on each request we call trackPageView twice, once for each site ID. The new mechanism of adding the site ID to the cookie name is causing the request headers to overflow the server buffers, which leads to numerous errors on our server.

I believe the goal of this change was to be able to track different site IDs on subdomains. But if an application requires that, it should do so itself by calling trackPageView two or three times.

The current implementation leads to an ever-growing number of cookies as the user moves from one site to the next, which is exactly what is happening on our side.
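The sketch below is a back-of-the-envelope estimate of the growth described above. Two inputs are assumptions chosen for illustration: the number of cookies per site ID and the bytes per cookie. The 8190-byte limit is Apache's default `LimitRequestFieldSize`. This is the kind of header limit that earlier comments in this ticket report having to raise.

```javascript
// Rough size of the Cookie request header when every tracked site ID adds
// its own set of cookies. All inputs are assumptions, not measurements.
function cookieHeaderBytes(siteCount, cookiesPerSite, bytesPerCookie) {
  const pairs = siteCount * cookiesPerSite;
  if (pairs === 0) {
    return 0;
  }
  // "Cookie: " + each name=value pair + "; " between consecutive pairs.
  return "Cookie: ".length + pairs * bytesPerCookie + (pairs - 1) * 2;
}

// Largest number of site IDs whose cookies still fit under limitBytes.
function maxSitesUnder(limitBytes, cookiesPerSite, bytesPerCookie) {
  if (cookiesPerSite <= 0 || bytesPerCookie <= 0) {
    throw new Error("cookiesPerSite and bytesPerCookie must be positive");
  }
  let n = 0;
  while (cookieHeaderBytes(n + 1, cookiesPerSite, bytesPerCookie) <= limitBytes) {
    n++;
  }
  return n;
}

// Example assumptions: 4 cookies per site ID, ~60 bytes each,
// against Apache's default LimitRequestFieldSize of 8190 bytes.
const sitesThatFit = maxSitesUnder(8190, 4, 60);
```

Under these assumptions, cookies for a few dozen site IDs on one domain are enough to exceed the limit. Growth is linear in the number of site IDs the visitor has encountered.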

comment:86 follow-up: Changed 3 years ago by vipsoft (robocoder)

Sam: it depends on how you use piwik.js. (FYI, the reason for the hash is mentioned in comment:44.)

Are you using two site IDs across the entire site?

Or many more? E.g., one site-wide, and another that varies depending on some area of the website? In that scenario, you should use setCookiePath().

Can you see if the TrackSiteByUrl plugin can be adapted for your environment?

comment:87 in reply to: ↑ 86 Changed 3 years ago by sam_gabriel

I looked at comment:44; the only relevant part I can see is the social network example. Unfortunately, I couldn't find a description of that use case.

In our setup, we use the same URL for all the various sites, so cookie paths won't work.

Regarding site IDs: we have one site ID that is site-wide and another one per client. We have thousands of clients that we track. We developed our own plugin that creates the site during our account creation process, retrieves the site ID, and embeds it into the DB for tracking.

One thing to note: if you think about it abstractly, you have one user, one browser. Can that user really have multiple identities, referral URLs, etc., based on the site ID?! I think this fix is trying to do something Piwik shouldn't be responsible for.

On a side note, referral URLs can be monstrous in size, and adding them to cookies can be a real pain on the server. If you already have them in the DB, keyed by visitor ID/site ID, should they really be in the cookie as well?

comment:88 follow-up: Changed 3 years ago by vipsoft (robocoder)

The hash addresses the subdomain cookie leak problem in Firefox.

Each tracker instance can point to a different Piwik server. If you're using cookie domains and/or paths, then it is possible for the cookie contents to be different.

comment:89 in reply to: ↑ 88 Changed 3 years ago by sam_gabriel

But if that is the case, then you could create the hash based on the Piwik domain URL instead of the site ID.

I still don't understand how adding the site IDs fixes the Firefox issue. Wouldn't the cookies still leak to the subdomains?

Regarding different site ID values for subdomains: I might be wrong here, but if there are two cookies with the same name, one of them set for the subdomain, and you visit the subdomain, wouldn't the browser return the one that has the subdomain set?

comment:90 Changed 3 years ago by vipsoft (robocoder)

The hash is computed only over the cookie domain and path.

In any case, I think you're focusing too much on the hash.

The bigger picture is that your visitors are amassing many large cookies. What you expect/want is one client-side cookie, with server-side storage for the bulk of the cookie contents, that can somehow be mapped to one or more tracking site IDs. This wasn't part of the scope of this ticket, so it isn't something Piwik does right now. I'll create a new ticket for this feature request and we'll figure it out from there.

comment:91 Changed 3 years ago by matt (mattab)

See the new ticket at #2680

comment:92 Changed 3 months ago by matt (mattab)

See also #2211 piwik.js: Cross domain tracking
