Opened 20 months ago

Closed 3 months ago

#3330 closed New feature (fixed)

"Ranking Query" to decrease archiving memory consumption and transfered data between MySQL and PHP

Reported by: EZdesign Owned by: capedfuzz
Priority: major Milestone: 2.x - The Great Piwik 2.x Backlog
Component: Performance Keywords:
Cc: Sensitive: no

Description

Based on the discussions in #766, this ticket is the place for code that limits the number of results returned from MySQL.

The idea is to create a class that wraps an arbitrary SQL query with more SQL that limits the number of results while grouping the rest to "Others" and allows for some more fancy things that can be configured via an instance of the ranking query class.

Change History (26)

comment:1 Changed 20 months ago by EZdesign (BeezyT)

(In [6803]) refs #3330 ranking query

  • Piwik_RankingQuery encapsulates the logic to apply the limit + grouping of others to an arbitrary select query
  • queryActionsByDimension() and queryVisitsByDimension() in Piwik_ArchiveProcessing_Day get new parameters to use the ranking query

comment:2 Changed 20 months ago by EZdesign (BeezyT)

(In [6804]) refs #3330 properties for RankingQuery.php

comment:3 Changed 20 months ago by EZdesign (BeezyT)

(In [6805]) refs #3330 PHPUnit tests for RankingQuery

comment:4 Changed 20 months ago by EZdesign (BeezyT)

(In [6812]) refs #3330 Transitions

  • Archiving code that doesn't hook into the actual archiving
  • API code that fakes an archive processing instance and calls archiving
  • PHPUnit test

comment:5 Changed 20 months ago by EZdesign (BeezyT)

The previous commit belongs to #3332. Too many commits for one day... ;-)

comment:6 Changed 20 months ago by matt (mattab)

  • Milestone changed from 1.8.x - Piwik 1.8.x to 1.8.4 - Piwik 1.8.4

comment:7 Changed 20 months ago by capedfuzz (diosmosis)

(In [6966]) Refs #3330, #766 add integration test for blob limiting behavior present in certain plugins. Also, fix three bugs:

  • CustomVariables plugin does not sort by visit before applying truncation to reports.
  • ReplaceSummaryLabel should be recursive so subtables w/ limited rows show 'Others' as label.
  • Pass user agent & language explicitly when doing bulk tracking in PiwikTracker.

comment:8 Changed 20 months ago by matt (mattab)

Timo, amazing work on this feature!! :-)

This will make a huge difference in piwik performance and memory usage. Congrats...

Benaka, thanks for adding the integration tests & the recursive fix. Looking forward to seeing the RankingQuery code used in all plugins and archiving calls! :-)

comment:9 Changed 20 months ago by peterb (peterbo)

Great commits Timo! - I'll test it within a lower memory environment in the next weeks. Judging from the source code, this should be a huge step for the memory consumption problems!

I'll post the results of the testing in this ticket.

comment:10 Changed 20 months ago by matt (mattab)

@Peter, wait for the next RC which Benaka is working on to add RankingQuery to all archiving. However it will be limited to 50,000 so unless your DB has more than 50k unique URLs you won't see much improvement...

comment:11 Changed 20 months ago by capedfuzz (diosmosis)

(In [6976]) Refs #3330, #766 improved BlobReportLimitingTest.

comment:12 Changed 20 months ago by capedfuzz (diosmosis)

(In [6977]) Refs #3330, fix small bug in BlobReportLimitingTest.

comment:13 Changed 19 months ago by capedfuzz (diosmosis)

(In [6980]) Refs #3330, #766, #3227 use RankingQuery and truncate tables as they are created in Actions plugin. Also modified phpunit integration testing mechanism so all API calls are tested and outputted before a test case throws.

comment:14 Changed 19 months ago by capedfuzz (diosmosis)

(In [6981]) Refs #3330, add missing test results.

comment:15 Changed 19 months ago by capedfuzz (diosmosis)

(In [6986]) Refs #3330, make sure ranking query limit is >= datatable_archiving_maximum_rows*_actions options & refactor a little.

comment:16 Changed 19 months ago by matt (mattab)

(In [7065]) Refs #3330 Refactoring of the Actions.php and moving code to: Archiving and ArchivingHelper

comment:17 Changed 19 months ago by capedfuzz (diosmosis)

(In [7080]) Refs #3330, #766 refactor truncate-as-you-go approach used in Actions. Added maximumAllowedRows member and walkPath method to DataTable and some other utility methods to DataTable_Row types.

comment:18 Changed 19 months ago by capedfuzz (diosmosis)

(In [7083]) Refs #3330, #766 remove unnecessary methods and rename one function for clarity.

comment:19 follow-up: Changed 18 months ago by matt (mattab)

  • Owner set to capedfuzz

@capedfuzz, can you please list, what you think is the remaining work in this ticket, in order to apply the code to all other reports where it makes sense? Or do we only apply RQ to Actions report, for now, which seems acceptable too.

Should we however create a ticket to port "Update in place" to other plugins archiving as this would save memory?

comment:20 in reply to: ↑ 19 Changed 18 months ago by capedfuzz (diosmosis)

Replying to matt:

@capedfuzz, can you please list, what you think is the remaining work in this ticket, in order to apply the code to all other reports where it makes sense? Or do we only apply RQ to Actions report, for now, which seems acceptable too.

It can be applied fairly easily to custom variables, maybe not referers, however I'm not sure it's really necessary, since archiving is pretty fast w/o it. Also, since truncating in place can result in large speedups, that should be done first. I think if RQ needs to be added, it can wait.

Should we however create a ticket to port "Update in place" to other plugins archiving as this would save memory?

Another ticket sounds like a good idea. It would take a lot of refactoring to do it in some places. Also, for CustomVariables + Referers, the way they perform queries will have to change (which means testing for performance degradation).

comment:21 Changed 18 months ago by EZdesign (BeezyT)

It would work for referrers. If you want, take a look at the queryExternalReferrers-method in Transitions.

comment:22 Changed 18 months ago by matt (mattab)

  • Milestone changed from 1.9.2 - Piwik 1.9.2 to 1.x - Piwik 1.x

Decreasing priority for now.

@capedfuzz would you mind creating a ticket for applying Update-in-place to all reports in 1.9.x roadmap (normal priority)?

comment:23 Changed 18 months ago by mauser (zawadzinski)

Setting
'archiving_ranking_query_row_limit' to 0 (no limit) causes archiving process not to process Actions.

The following errors shows up during archiving:

Notice:</strong> <i>Undefined index: type</i> in <b>public_html/piwik/plugins/Actions/ArchivingHelper.php</b> on line <b>49</b>
<br /><br />Backtrace --&gt;<div style="font-family:Courier;font-size:10pt"><br />
#0  Piwik_ErrorHandler(...) called at [public_html/piwik/plugins/Actions/ArchivingHelper.php:49]<br />
#1  Piwik_Actions_ArchivingHelper::updateActionsTableWithRowQuery(...) called at [public_html/piwik/plugins/Actions/Archiving.php:478]<br />
#2  Piwik_Actions_Archiving-&gt;archiveDayQueryProcess(...) called at [public_html/piwik/plugins/Actions/Archiving.php:282]<br />
#3  Piwik_Actions_Archiving-&gt;archiveDayActionsTime(...) called at [public_html/piwik/plugins/Actions/Archiving.php:100]<br />
#4  Piwik_Actions_Archiving-&gt;archiveDay(...) called at [public_html/piwik/plugins/Actions/Actions.php:592
...

var_dump'ing $row at plugins/Actions/ArchivingHelper.php:36, shows:

array(6) {
  ["idaction"]=>
  string(7) "1792657"
  [17]=>
  string(1) "1"
  [19]=>
  string(1) "1"
  [20]=>
  string(1) "3"
  [21]=>
  string(3) "279"
  [22]=>
  string(1) "0"
}

so we have a problem with translation of keys in the row.

Last edited 18 months ago by mauser (previous) (diff)

comment:24 Changed 18 months ago by matt (mattab)

Thanks for the report, I created a ticket at: #3482

comment:25 Changed 18 months ago by matt (mattab)

  • Component changed from Core to Performance

comment:26 Changed 3 months ago by matt (mattab)

  • Resolution set to fixed
  • Status changed from new to closed

This feature is done, we may improve it later or reuse RankingQuery in other archivers.

Note: See TracTickets for help on using tickets.