Opened 4 years ago

Closed 3 years ago

#1736 closed New feature (fixed)

Segmentation

Reported by: vipsoft Owned by: matt
Priority: critical Milestone: Piwik 1.2
Component: Core Keywords:
Cc: Sensitive: no

Description (last modified by matt)

The goal is to implement segmentation in Piwik. First a simpler version allowing pre-chosen segments and reports, then evolve into a more open segmentation model similar to what GA offers.

Segmentation: API

All get* functions returning analytics reports will get a new parameter $segment. This parameter, by default empty, means we return data for all visitors (current behavior).

When set, e.g. $segment = "country==fr", the returned report is restricted to visits matching this value.

We must define and document a list of available dimensions to query, eg.:

  • country (match to location_country)
  • keyword (match to referer_keyword)
  • customVarName1 (match to custom_name_1 see #1984 )
  • etc.
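As a rough illustration of the dimension list above, a single `dimension==value` condition could be parsed and mapped to its log column along these lines. This is a Python sketch for brevity, not Piwik's PHP code: `SEGMENT_COLUMNS` and `parse_condition` are hypothetical names, and only the three dimension-to-column pairs are taken from this ticket.

```python
# Hypothetical mapping; the three pairs come from the list in this ticket.
SEGMENT_COLUMNS = {
    "country": "location_country",
    "keyword": "referer_keyword",
    "customVarName1": "custom_name_1",
}


def parse_condition(segment):
    """Split 'dimension==value' into (sql_column, value).

    An empty segment returns None, meaning "all visitors".
    """
    if not segment:
        return None
    dimension, sep, value = segment.partition("==")
    if not sep or not value:
        raise ValueError("expected 'dimension==value', got %r" % segment)
    if dimension not in SEGMENT_COLUMNS:
        raise ValueError("unknown dimension: %r" % dimension)
    return SEGMENT_COLUMNS[dimension], value
```

Rejecting unknown dimensions up front keeps user input out of the column position of any SQL built later; only the value would ever need to be bound as a parameter.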

Later we can imagine extending the syntax to support AND and OR, e.g. $segment = "customName1==loggedIn,customValue1==yes", or even adding operators such as 'contains' and 'is not'.

Segmentation: Archiving

Archiving reports for segments means: filter on these segments, then aggregate.

The queries that perform the Piwik archiving have to be updated:

WHERE ....
AND segment1 = $value
GROUP BY label, segment1

For example, with segment=country==fr, the query for the best browsers would become: SELECT count ... WHERE ... AND location_country = 'fr' ..... GROUP BY visitor_browser, location_country
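The "filter, then aggregate" change can be sketched as a small query builder. This is a Python illustration under stated assumptions, not the actual archiving code: the function name `build_archiving_query`, the table name `log_visit`, the `idsite` filter and the `nb_visits` alias are all hypothetical, while `visitor_browser` and `location_country` come from the example above.

```python
def build_archiving_query(label_column, idsite, segment=None):
    """Return (sql, params) for a per-label visit count.

    `segment` is an optional (column, value) pair. Column names must come
    from a trusted list; only values are passed as bound parameters.
    """
    where = ["idsite = %s"]
    params = [idsite]
    group_by = [label_column]
    if segment is not None:
        column, value = segment
        where.append(column + " = %s")   # AND segment1 = $value
        params.append(value)
        group_by.append(column)          # GROUP BY label, segment1
    sql = (
        "SELECT " + label_column + " AS label, COUNT(*) AS nb_visits"
        " FROM log_visit"
        " WHERE " + " AND ".join(where)
        + " GROUP BY " + ", ".join(group_by)
    )
    return sql, params
```

Called without a segment, the builder produces the unsegmented query, which matches the "empty by default" behaviour described for the API parameter.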

Stored archives must also record the segment query, in addition to site, date1 and date2. Do we need a new key in the archive tables? Or this could probably be encoded in the archive.name attribute somehow (one more reason to keep CONCAT(custom names + values) short).
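One possible way to encode the segment in the archive name, shown only as a sketch of the second option: append a fixed-length hash of the segment string, so the name stays short however long the segment is. The function `archive_name`, the `_` separator and the choice of MD5 are assumptions made for this example, not a description of what Piwik stores.

```python
import hashlib


def archive_name(base_name, segment=""):
    """Return an archive name that is unique per segment string.

    With no segment the name is unchanged, so existing archives keep
    their current names.
    """
    if not segment:
        return base_name
    # A 32-character hex digest keeps the name length bounded.
    digest = hashlib.md5(segment.encode("utf-8")).hexdigest()
    return base_name + "_" + digest
```

The trade-off is that the segment cannot be recovered from the name alone; a separate key in the archive tables would keep it readable at the cost of a schema change.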

Segmentation: Known segments control list

To keep things fast for 'known segments', we can let the user create a list of the segments they are interested in.

The user would also provide a list of reports to pre-process for each segment, rather than pre-processing all reports.

For example:
'keyword==Piwik' known valid segment

=> array('Referer.getKeywords','Page.getPageUrls', .. ) reports forced to be pre-processed

In this case Piwik will, during archiving, process only these reports and nothing more. If the user later wants access to more reports, and the logs are still available, they can change the list; the change affects reports going forward.

In V1, if a requested segment/report pair wasn't pre-processed, archiving will return no data.
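The control list and the V1 "no data" rule could look roughly like this. It is a Python sketch with hypothetical names (`KNOWN_SEGMENTS`, `is_preprocessed`, `get_report`, and the `load_archive` callback); the segment and report names are the ones used in the example above.

```python
# Hypothetical control list: segment -> reports to pre-process.
KNOWN_SEGMENTS = {
    "keyword==Piwik": ["Referer.getKeywords", "Page.getPageUrls"],
}


def is_preprocessed(segment, report):
    """True if the segment/report pair is archived ahead of time."""
    if not segment:
        return True  # the unsegmented report is always archived
    return report in KNOWN_SEGMENTS.get(segment, [])


def get_report(segment, report, load_archive):
    """Return archived rows, or an empty list for unknown pairs (V1 rule)."""
    if not is_preprocessed(segment, report):
        return []
    return load_archive(segment, report)
```

For instance, `get_report("keyword==Piwik", "Referer.getKeywords", loader)` would call `loader`, while a request for a report outside the list returns an empty result without touching the logs.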

New setting

  • Enable segmentation from the API (and UI). Segmentation could hurt performance on Piwik installs that allow anonymous access (e.g. the Piwik demo), so it is disabled by default.
  • Enable segmentation preprocessed reports control list (see above).
  • Update the code to handle multiple segmentation constraints AND and OR, which would allow any kind of segmentation based on visit attributes

Performance Impact

Possibly huge, depending on the data set. Real-time processing is likely to be slow with MySQL, so pre-processing reports is highly recommended (by defining a list of known reports, see above).

Feature requests
See other ticket #2092

Please submit any feedback or questions.

Change History (17)

comment:1 Changed 3 years ago by matt (mattab)

  • Description modified (diff)
  • Summary changed from Segmentation to Custom reports

comment:2 Changed 3 years ago by matt (mattab)

  • Summary changed from Custom reports to Segmentation

comment:3 Changed 3 years ago by matt (mattab)

  • Description modified (diff)
  • Milestone changed from Feature requests to 1.x - Piwik 1.x
  • Owner set to matt

comment:4 Changed 3 years ago by matt (mattab)

  • Description modified (diff)

comment:5 Changed 3 years ago by matt (mattab)

  • Milestone changed from 1.x - Piwik 1.x to 1.2 Piwik 1.2

I am going to work on Phase 1: API / Archiving modification to allow custom segment querying.

Phase 2 (post 1.2 release) will include UI modifications to create, edit, delete and visualize segments.

comment:6 Changed 3 years ago by matt (mattab)

(In [3858]) Cosmetic changes/refactoring preparing for code reuse for Segmentation Refs #1736

comment:7 Changed 3 years ago by matt (mattab)

  • Priority changed from normal to critical

comment:8 Changed 3 years ago by matt (mattab)

(In [3869]) Various code changes to prepare for Segmentation refs #1736

  • API function variables names starting with _ will not be displayed in the API documentation page
  • Properly setting Request array in the use case where a controller calls an API which itself calls another API
  • Referer-> Referrer rename in variable names


comment:9 Changed 3 years ago by matt (mattab)

(In [3870]) Refs #1736

  • API functions returning data now have a new optional 'segment' parameter. segment can define a Visitor segment dynamically that will be applied to the report. For example, &segment=country==FR;actions>=3 (AND, OR supported. Only == and != supported currently, but easy to add more)
  • For API requests with a segment parameter, the reports will now be processed on the fly, and only the requested plugin report will be archived.
  • All plugins now define the 'segments', with a name, category, SQL field, filter, etc.
  • Simplifying archiving code a bit
  • Fixes #2069 Exit rate computation
  • New widget: lists the Top Keywords for a page URL, Widgets for a website only. Maybe later we could create a widget category "For your site"?
    • This widget is pretty cool SEO wise, but maybe the PHP snippet should do caching (not so good hitting the API on each page view... but why not?)
  • still to do!
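For readers unfamiliar with the segment syntax, a compound string such as `country==FR;actions>=3` could be parsed as sketched below. This is a Python illustration, not the committed PHP code. It assumes `;` separates AND-ed groups and `,` separates OR-ed conditions within a group, and it accepts a few comparison operators so that the example string parses; the operator set the commit actually supports is the one stated above. `parse_segment` is a hypothetical name.

```python
import re

# dimension, operator, value; two-character operators are tried first.
_CONDITION = re.compile(r"^(\w+)(==|!=|>=|<=|>|<)(.+)$")


def parse_segment(segment):
    """Parse a segment string into AND-ed groups of OR-ed conditions.

    Returns a list of groups; each group is a list of
    (dimension, operator, value) tuples. An empty string returns [].
    """
    if not segment:
        return []
    groups = []
    for and_part in segment.split(";"):      # assumed AND separator
        group = []
        for or_part in and_part.split(","):  # assumed OR separator
            match = _CONDITION.match(or_part)
            if match is None:
                raise ValueError("invalid condition: %r" % or_part)
            group.append(match.groups())
        groups.append(group)
    return groups
```

This simple split does not handle values that themselves contain `;` or `,`; a real parser would need an escaping or URL-encoding rule for those.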

comment:10 Changed 3 years ago by matt (mattab)

(In [3871]) Refs #1736 Sorting segments list to prevent random order test fail
Renaming one segment

comment:11 Changed 3 years ago by matt (mattab)

(In [3873]) Refs #1736 Adding new setting to disable Segmentation for Anonymous user, as a preventive measure

comment:12 Changed 3 years ago by matt (mattab)

(In [3874]) Refs #1736 Adding new setting to force the list of Segments to process during cron execution.

TODO:

  • Test the archive.sh script in a real setup
  • Write the added logic for the windows script

Example in config.ini.php

[Segments]
; Pre-process the visitor types segment
Segments[]="visitorType==new"
Segments[]="country==IN"
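To show how such a section might be consumed, here is a minimal Python sketch that collects the `Segments[]` entries from INI text. Piwik itself reads its configuration in PHP; the function name `read_forced_segments` and the hand-written line parser are assumptions made for this example only.

```python
def read_forced_segments(ini_text):
    """Return the Segments[] values found in the [Segments] section."""
    segments = []
    in_section = False
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        if line.startswith("["):
            in_section = (line == "[Segments]")
            continue
        if in_section and line.startswith("Segments[]"):
            _, _, value = line.partition("=")
            segments.append(value.strip().strip('"'))
    return segments
```

Each returned string is a segment that the cron archiving run would pre-process, in the order listed in the file.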

comment:13 Changed 3 years ago by matt (mattab)

(In [3875]) Refs #1736 Only showing the widget "Top Keywords for Page" when segmentation is enabled (ie. if anonymous user, check setting)

comment:14 Changed 3 years ago by matt (mattab)

  • Description modified (diff)

comment:16 Changed 3 years ago by matt (mattab)

(In [3879]) Refs #1736

  • Adding integration tests
  • Fixing preFetch blob bug found

comment:17 Changed 3 years ago by matt (mattab)

  • Resolution set to fixed
  • Status changed from new to closed

V1 implemented, see #2092 for the next iteration
