Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#2944 closed Bug (fixed)

Performance regression in tracker for high traffic website

Reported by: ghosts Owned by:
Priority: major Milestone: 1.7.x - Piwik 1.7.1
Component: Core Keywords:
Cc: Sensitive: no

Description

i have also a problem with very high load.

on my mysql server i see lots of querys like the following one, and each one is running over 1 minute ....
SELECT idvisitor,

visit_last_action_time,
visit_first_action_time,
idvisit,
visit_exit_idaction_url,
visit_exit_idaction_name,
visitor_returning,
visitor_days_since_first,
visitor_days_since_order,
location_country,
location_continent,
referer_name,
referer_keyword,
referer_type,
case when idvisitor = 'W?O????' then 1 else 0 end AS priority,
visitor_count_visits,
visit_goal_buyer


, custom_var_k1, custom_var_v1,
custom_var_k2, custom_var_v2,
custom_var_k3, custom_var_v3,
custom_var_k4, custom_var_v4,
custom_var_k5, custom_var_v5
FROM piwik_log_visit WHERE visit_last_action_time >= '2012-02-16 13:39:17'

AND idsite = '3' AND (idvisitor = 'W?O????' OR config_id = '?j?_6?')

ORDER BY priority DESC, visit_last_action_time DESC
LIMIT 1

i have broken it down to the fact that according to an explain there are thousands of estimated rows:

* 1. row *

id: 1

select_type: SIMPLE

table: piwik_log_visit

type: ref

possible_keys: index_idsite_config_datetime,index_idsite_datetime,index_idsite_idvisitor

key: index_idsite_datetime

key_len: 4

ref: const

rows: 145600

Extra: Using where; Using filesort

the complete piwik_log_visit table have 781971 entries in total

i played a little bit with the select statement and ended up with an interresting observation, after i told the select to ignore all indexes, the statement runs in 0,8 seconds "IGNORE INDEX (index_idsite_idvisitor,index_idsite_datetime,index_idsite_config_datetime)" which is a more acceptable time and got my server again up and running, but still is not optimal.

i don't understand why this query much better without indexes as with the given ones ...

I have found a changeset that might have changed the behavior to this: r5531

i think one of the problems is that with the OR in the WHERE part the indexes didn't match as good and the second is that for the ordering on priority the server will have examine each row for the case on idvisitors, which will take its time with that much estimatet rows ... :(

Change History (8)

comment:1 Changed 2 years ago by matt (mattab)

  • Summary changed from High CPU Load because of long sortin in log_visit table to Performance regression in tracker for high traffic website

Thanks for the report. there is a "quick fix" in the forum thread at: http://forum.piwik.org/read.php?2,85727

But I'm not sure how to fix the issue keeping the current logic, while not issuing a slow SQL query.

comment:2 Changed 2 years ago by ghosts

thank you for the hotfix.
As i wrote, an IGNORE off the current indexes had worked for me ( the query time went under 1 second runtime ) and your query logic don't wasn't changed this way.
i'm sorry that i didn't post this diff on ticket create ... :(

--- ./core/Tracker/Visit.php 2012-02-15 03:42:31.000000000 +0100
+++ ./core/Tracker/Visit.php 2012-02-17 16:04:02.000000000 +0100
@@ -986,7 +986,7 @@

visit_goal_buyer
$selectCustomVariables

FROM ".Piwik_Common::prefixTable('log_visit').

  • " WHERE ".$where."

+ " WHERE ".$where." IGNORE INDEX (index_idsite_idvisitor,index_idsite_datetime,index_idsite_config_datetime)

ORDER BY priority DESC, visit_last_action_time DESC
LIMIT 1";

$visitRow = Piwik_Tracker::getDatabase()->fetch($sql, $bindSql);

in my simple mind it should be a qestion of indexing ...
unfortunately i'm not as good with mysql and index logick, so i wasn't able to create an index that works better, otherwise i had postet it willingly :)

comment:3 Changed 2 years ago by matt (mattab)

(In [5886]) Refs #2944
Applying the so called "hot fix" (since I can't find a better solution for now):

  • Removing the ORDER BY priority which was resulting in a filesort which is horribly slow
  • This solution is correct in terms of logic
  • However it is not "optimal", because we don't necessarily "reuse" the optimal past visit, but at least we won't create artificial visit. No need to reopen #2785
  • Now looking back only 1 hour instead of 24 hours. Less rows to look through.

Note: after the fix, it is still not perfect performance wise:

  • The query does AND (idvisitor = ? OR config_id = ?)

which results in matching potentially thousands of rows, then mysql has to look through these and sort by time DESC.
The sorting does not use the index since it can't.

  • Maybe we should issue a UNION query, the first one matching WHERE visitor_id = ? The second one matching WHERE config_id = ? I think that would work better, since both queries could each use an INDEX.

comment:4 Changed 2 years ago by matt (mattab)

(In [5887]) Refs #2944

Fixing build: Since we now look back only 1 hour, stats will be slightly less accurate in terms of recognizing visitors.

In this integration test for example, there is no first party cookie support, so we are not able to match the visit 1 hour later to the previous visit, so the visitor appears as "new".
this is a reasonnable tradeoff for performance...
In this case if first party cookies were supported it would match the visitor and appear as "returning" since the "_idvc" parameter would be found (indicating number of visits)

comment:5 Changed 2 years ago by matt (mattab)

  • Resolution set to fixed
  • Status changed from new to closed

(In [5892]) Fixes #2944

  • Now issuing the UNION query so that each subquery uses its own index, for fast and optimal performance, while also providing optimal visitor matching algorithm results.
  • Reverting [5889] and [5887] and the "hot fix" in [5886]

PLEASE EVERYONE LISTENING would you mind testing the new "core/Tracker/Visit.php" file simply replacing your existing file?
You can grab it from: http://dev.piwik.org/svn/trunk/core/Tracker/Visit.php

It would be great if you could confirm what my tests show, that the code is now very fast. I don't have access to an enormous Piwik as you might have, so your test is very appreciated!!
Thanks!

comment:6 Changed 2 years ago by matt (mattab)

(In [5897]) Refs #2944
This sort is slow at least (showing up as slow query on the demo)... Still testing, I'm not 100% sure the code is working well

comment:7 Changed 2 years ago by matt (mattab)

(In [5898]) Refs #2944

Aggressively only looking in the past for how long a visit can be. This is more efficient than looking further back.

Note: the build will fail, i will commit test fixes separately

comment:8 Changed 2 years ago by matt (mattab)

It looks like it is working fine now.

@ghosts can you please test the new file and confirm it is very fast on your setup too?

Note: See TracTickets for help on using tickets.