Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Visitor tracking becomes really slow on large installation (v1.2) #2220

Closed
anonymous-matomo-user opened this issue Mar 23, 2011 · 12 comments
Closed
Labels
Bug For errors / faults / flaws / inconsistencies etc. Critical Indicates the severity of an issue is very critical and the issue has a very high priority.
Milestone

Comments

@anonymous-matomo-user
Copy link

We are using piwik for a quite large installation (2.5 - 3 million visitors per month). Since the update to 1.2 we have severe problems with the performance. The problem seems to be a select statement. Here is an example from our slow query log:

# Query_time: 3  Lock_time: 0  Rows_sent: 0  Rows_examined: 99728
SELECT [...]
                                FROM piwik_log_visit WHERE visit_last_action_time >= '2011-03-21 22:01:07'
                                        AND idsite = '1'
                                        AND config_id = '...'
                                ORDER BY visit_last_action_time DESC
                                LIMIT 1;

The problems seems to be that about 100k rows have to be examined to find a single row, though indexes are properly set and used.

To keep performance impact low, we are currently moving data from log_visitor which is older than yesterday to an archive table. At the moment we have about 250.000 entries in the table. Before the archiving, we hav 20 million with tracking started in april 2009.

@anonymous-matomo-user
Copy link
Author

a first info I would be interested in is if it is safe to move data to an archive table or if it affects the accuracy of the reports.

@robocoder
Copy link
Contributor

Reports are affected if raw data (e.g., piwik_log_visit) is moved/purged before archives have been generated for the period (e.g., unique visitors for the past month or week).

Are there a lot of config_id collisions?

Is order important in the index? e.g.,

 INDEX index_idsite_datetime_config (idsite, visit_last_action_time, config_id)

vs

 INDEX index_idsite_config_datetime (idsite, config_id, visit_last_action_time)

@robocoder
Copy link
Contributor

To answer my own question, yes, order of columns is important in the index. http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

We'll add the latter index (above), and see if there other queries that depend on the former index. (If not, we can remove that one.)

@robocoder
Copy link
Contributor

It looks like the archiving queries benefit from the former index, but can probably be simplified by removing the config_id column. So index_idsite_datetime_config would be replaced by these indexes:

INDEX index_idsite_datetime (idsite, visit_last_action_time)

INDEX index_idsite_config_datetime (idsite, config_id, visit_last_action_time)

@mattab
Copy link
Member

mattab commented Mar 24, 2011

vipsoft I think the same as #2220

indeed order is very important in INDEXes, sorry I missed this issue with new schema in 1.2...

pdfforge, would be able to test these INDEXes by any chance? It would be great to test on your high traffic Piwik the following

  • Shut down apache to stop tracking - or set in the config file:
[Tracker] 
record_statistics           = 0
  • Run the following query to set the correct INDEXes
ALTER TABLE piwik_log_visit 
DROP INDEX index_idsite_datetime_config,
ADD INDEX index_idsite_config_datetime (idsite, config_id, visit_last_action_time),
ADD INDEX index_idsite_datetime (idsite, visit_last_action_time)

I think this should fix the issue.

However it is still not ideal since the INDEX index_idsite_config_datetime will be rather large, but this will not be an issue once we can implement log purge #5

pddfforge, please let us know if you can test these changes, thanks!

@anonymous-matomo-user
Copy link
Author

It made a slight improvement. Before the update, mysql had 150% CPU-Utilization and caused a load average of 2.5. Now it is 0.7% and 0.2 util.

that's really significant... Thank you very much!

I will keep watching how it develops over the day, but it looks very good.

@robocoder
Copy link
Contributor

That looks like an idle box. Did you re-enable tracking after the index change, i.e., record_statistics=1 ?

@anonymous-matomo-user
Copy link
Author

yes, I re-enabled tracking and the cpu usage really is very low. The above query now only checks one row instead of 100k. Most of the time is now spent with the php processes.

The data also look good (100k visitors yesterday spread over the whole day in server time).

@mattab
Copy link
Member

mattab commented Mar 25, 2011

Thx for the follow up. Glad it's all working fine again!

@mattab
Copy link
Member

mattab commented Mar 26, 2011

pdfforge do you still experience very good performance since the INDEX change? there is no other problem?

@mattab
Copy link
Member

mattab commented Mar 27, 2011

(In [4204]) Fixes #2220

  • Dropping unused INDEX index_idsite_idvisit (the primary key will do just as good, no need for this INDEX)
  • Dropping bad INDEX & adding proper one, as per discussion in ticket

@anonymous-matomo-user
Copy link
Author

@matt: yes, the performance is still excellent with cpu load around 0.3 oder 0.4. Thank you very much for this quick fix

@anonymous-matomo-user anonymous-matomo-user added this to the Piwik 1.3 milestone Jul 8, 2014
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For errors / faults / flaws / inconsistencies etc. Critical Indicates the severity of an issue is very critical and the issue has a very high priority.
Projects
None yet
Development

No branches or pull requests

3 participants