Scheduled PDF Reports very slow when thousands of websites in Piwik #1981

Closed
cbay opened this issue Jan 7, 2011 · 8 comments
Labels
Bug For errors / faults / flaws / inconsistencies etc.
Milestone

Comments

@cbay
Contributor

cbay commented Jan 7, 2011

My Piwik installation has around 15000 sites and users. Calling the tracker took around 70ms in 1.0 (I test in command line, using curl). After upgrading to 1.1.1, the very same request takes nearly 800ms.

I have both my 1.0 and 1.1.1 installations running in parallel (using the same database and the same config file), so it's perfectly reproducible.

I've tried disabling the Live plugin, it doesn't change anything.

I've enabled the slow query log; my 1.0 install logs zero slow queries, while the 1.1.1 install logs this one:

SELECT * FROM piwik_site WHERE idsite IN (1, 63, 64, 65, 66, 67, 68, 69, 70, 71, ...)

(thousands of IDs). This query isn't that slow (0.07), but I suspect Piwik 1.1.1, for some reason, iterates over all sites to do something.

MySQL CPU usage has tripled and PHP processes are using 100% CPU, so PHP is definitely the limiting factor.

@julienmoumne
Member

Must be because of https://github.com/piwik/piwik/blob/master/plugins/PDFReports/PDFReports.php#L45

Can you try disabling the PDF report plugin?

@mattab
Member

mattab commented Jan 7, 2011

Yes this is probably the cause, however this should only be triggered via the browser if you did not enable cron archiving.
Are you using cron archiving, and did you disable 'trigger archiving via browser' in the admin settings?

Also, reading the code of runScheduledTasks(), it is designed to run at most once at a time, but maybe it fails to enforce this condition.
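
For illustration only, one common way to enforce an "at most one run at a time" condition is an atomic insert on a unique key. The sketch below uses Python and SQLite rather than Piwik's PHP code; the `task_lock` table and all function names are hypothetical, and this is not a description of how runScheduledTasks() is actually implemented.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database holding the hypothetical lock table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task_lock (name TEXT PRIMARY KEY)")
    return conn


def try_acquire(conn: sqlite3.Connection, name: str) -> bool:
    """Return True if this caller obtained the lock, False if it is already held."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO task_lock (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        # The primary key already exists: another run holds the lock.
        return False


def release(conn: sqlite3.Connection, name: str) -> None:
    """Drop the lock row so that a later run can start."""
    with conn:
        conn.execute("DELETE FROM task_lock WHERE name = ?", (name,))


def run_scheduled_tasks(conn: sqlite3.Connection, tasks) -> bool:
    """Run the tasks unless another run is in progress; report whether they ran."""
    if not try_acquire(conn, "scheduled_tasks"):
        return False
    try:
        for task in tasks:
            task()
        return True
    finally:
        release(conn, "scheduled_tasks")
```

A request that finds the lock taken returns immediately instead of repeating the work, which is the property that matters when every tracker hit can trigger the scheduler.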

@mattab
Member

mattab commented Jan 7, 2011

Julien, if you want to replicate, you can use the script in misc/test_cookies_GenerateHundredsWebsitesAndVisits.php to generate thousands of websites and replicate the issue.

@cbay
Contributor Author

cbay commented Jan 7, 2011

Thanks, disabling the PDF report plugin indeed solves the issue.

I am not using cron archiving because it's too slow, unfortunately (we're processing a huge number of requests every day, and even an 8-core dedicated server with SSD can't cope with that). Since many of our Piwik users never log in to Piwik, triggering archiving via the browser avoids archiving all the sites whose statistics are never viewed.

I'll probably write a Piwik plugin that stores the date of the last login for each user, and use that information to run archiving via cron only for users who have logged in recently.
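
A rough sketch of the selection step such a plugin would need, written in Python for brevity rather than as an actual Piwik plugin. The two dictionaries, the function name and the 30-day window are all assumptions made for the example.

```python
from datetime import date, timedelta


def sites_to_archive(last_login, user_sites, today, max_age_days=30):
    """Return the sorted site IDs belonging to users who logged in recently.

    last_login: dict mapping user login -> date of last login (hypothetical store)
    user_sites: dict mapping user login -> list of site IDs the user can view
    max_age_days: assumed cut-off; users idle for longer are skipped
    """
    cutoff = today - timedelta(days=max_age_days)
    selected = set()
    for login, seen in last_login.items():
        if seen >= cutoff:
            selected.update(user_sites.get(login, []))
    return sorted(selected)
```

The cron job would then archive only the returned site IDs, leaving the rest to be archived on demand if their owner ever logs in.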

@julienmoumne
Member

Casting a vote for implementing the solution suggested in 1836#comment:7:

It is not the best performance wise (each call to runTasks() will result in selecting all websites and comparing all timezones), but it will work fine. To make it fast for a Piwik server with thousands of websites, we could for example record, on each website update, the earliest and latest timezone available in the _option table (but this will be a feature request when we hit problems, not yet the case :)
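
To make the caching idea in that quote concrete, here is a minimal Python sketch. It assumes, purely for simplicity, that each site's timezone is stored as a numeric UTC offset in hours, and it uses a plain dict as a stand-in for the _option table; the key names `min_utc_offset` and `max_utc_offset` are hypothetical.

```python
def update_site_timezone(sites, options, idsite, utc_offset_hours):
    """Record a site's UTC offset and refresh the cached extremes.

    sites: dict mapping idsite -> UTC offset in hours (assumed representation)
    options: dict standing in for the _option table
    """
    sites[idsite] = utc_offset_hours
    # The full scan happens only when a site is updated, not on every task run.
    options["min_utc_offset"] = min(sites.values())
    options["max_utc_offset"] = max(sites.values())


def timezone_bounds(options):
    """Cheap lookup that a scheduler could use instead of loading every site."""
    return options["min_utc_offset"], options["max_utc_offset"]
```

The trade-off is the usual one for a cache: a little more work on the rare write path in exchange for a constant-time read on the hot path.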

@mattab
Member

mattab commented Jan 7, 2011

Cyril can you please contact me by email, maybe I can use your dataset for performance testing and improving Archiving speed.

@cbay
Contributor Author

cbay commented Jan 7, 2011

OK.

@julienmoumne
Member

(In [3684]) fixes #1981, adding getUniqueSiteTimezones to SitesManager API
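
The changeset itself is not reproduced in this thread, so the following is only a guess at the general idea, sketched in Python with SQLite: ask the database for the distinct timezones instead of loading every site row. The function name mirrors the API method mentioned above, but the body, the `timezone` column and the `name` column are assumptions.

```python
import sqlite3


def get_unique_site_timezones(conn: sqlite3.Connection) -> list:
    """Return each timezone in use once, however many sites share it."""
    rows = conn.execute(
        "SELECT DISTINCT timezone FROM piwik_site ORDER BY timezone"
    )
    return [tz for (tz,) in rows]


def make_demo_db(n_sites: int, timezones: list) -> sqlite3.Connection:
    """Build an in-memory table with n_sites rows spread over the given timezones."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE piwik_site (idsite INTEGER PRIMARY KEY, name TEXT, timezone TEXT)"
    )
    conn.executemany(
        "INSERT INTO piwik_site (idsite, name, timezone) VALUES (?, ?, ?)",
        [
            (i, "site %d" % i, timezones[i % len(timezones)])
            for i in range(1, n_sites + 1)
        ],
    )
    conn.commit()
    return conn
```

With 15000 sites spread over a handful of timezones, the caller receives a handful of strings rather than 15000 full rows, so the per-request work in PHP no longer grows with the number of sites.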

@cbay cbay added this to the Piwik 1.2 milestone Jul 8, 2014
This issue was closed.

3 participants