New optimized archive.php script for faster archiving when there are hundreds/thousands of websites #2327
Comments
Maybe we could preselect the sites to archive, e.g., (implicitly, num_visits > 0).
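One possible shape for such a preselection, sketched here under the assumption that Piwik's `log_visit` table (with its `idsite` and `visit_last_action_time` columns) is queried directly and that a 24-hour lookback window is acceptable; the `Piwik_FetchAll` call and table prefix are from Piwik 1.x conventions and may differ in your install:

```php
<?php
// Sketch (not the actual patch): select only sites that had at least
// one visit in the last 24 hours, so archive.php can skip sites with
// no new data. Adjust the table prefix (piwik_) to your installation.
$lookback = date('Y-m-d H:i:s', time() - 86400);
$sql = "SELECT DISTINCT idsite
        FROM piwik_log_visit
        WHERE visit_last_action_time >= ?";
$sitesToArchive = Piwik_FetchAll($sql, array($lookback));
// Archiving would then loop over $sitesToArchive instead of all sites.
```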
(In [5087]) Refs #2327
(In [5090]) Refs #2327 TODO
(In [5095]) Refs #2327
(In [5098]) Refs #2327
(In [5101]) When running the archive.php script as CLI, and the Piwik files were upgraded, fail gracefully and report it as a critical error. Refs #2327
(In [5102]) Refs #2327
(In [5110]) Refs #2327
(In [5111]) Refs #2327
(In [5185]) Refs #2327
(In [5186]) Refs #2327 Last fix for noreply@localhost being used instead of the proper domain in the email From: header in scheduled tasks
As I said in http://forum.piwik.org/read.php?2,82544, archive.php uses the same timestamp for the last daily archive and for the period archives. I made a small patch (beware: I'm not an experienced programmer, so it's quite likely something is wrong in it).
OK, I tested my modification and it seems to work. I also added some lines to produce better output for scheduled-task execution (showing what is executed, like the old archive.sh did); feel free to use it if you need it.
Attachment:
May I ask why this script uses an HTTP call to do the actual archiving? Is there a way to just use CLI PHP for the actual processing? I don't really want to give my web-accessible PHP a memory limit of 2 GB and a max execution time of many minutes just for archiving with the new archive.php.
When running via the archive.php script (only in this case!), Piwik will try to increase the memory limit above normal. It is set by the config parameter minimum_memory_limit_when_archiving under the [General] section, 768M by default. It requires your PHP configuration to allow the memory limit to be raised at runtime.
You should change the execution timeout too ;-). Just as an example for the maximum execution timeout:
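The example itself appears to have been lost from the comment; a minimal sketch of what raising the execution timeout in PHP looks like (the value 0, meaning "no limit", is just an illustration; pick a finite value if you prefer a safety net):

```php
<?php
// Sketch: lift PHP's execution time limit before a long archiving run.
// set_time_limit(0) and ini_set('max_execution_time', 0) both disable
// the timeout; either is enough when PHP is not in safe mode.
ini_set('max_execution_time', 0);
set_time_limit(0);
```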
I really don't know how long that will take for that large site; it's the largest of my 6k sites, and the full processing (of all sites) with archive.sh takes 280 minutes each day. Couldn't there be a simple call to CLI PHP instead of a web call, maybe even with forking and running a couple of processes in parallel?
(In [5429]) Refs #2327 ts77, thanks for the tip. Please try this patch. Does it work afterwards?
I seem to get farther now but am hitting my memory limit again (and from the error message it seems to be my original memory limit from php.ini, 512M, and not the minimum archiving memory limit from Piwik, 768M). I even tried a test script to see whether ini_set for the memory limit takes effect for me: it does. Any ideas? I'm still in favor of an (alternative?) CLI version for large sites, and I'm worried that this will cause more support issues with larger sites.
What's the exact error message? In your config.ini.php, add minimum_memory_limit_when_archiving=1024 under [General]. We would prefer to keep it HTTP only, since that allows us to use multithreading easily, which makes the script much faster...
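For clarity, the suggested setting goes into config/config.ini.php as an ini entry (the comment text here is illustrative, the parameter name is the one discussed in this thread):

```ini
; config/config.ini.php
[General]
; Memory limit (in MB) that the archive.php script tries to set for itself.
; Only takes effect if PHP allows changing memory_limit at runtime.
minimum_memory_limit_when_archiving = 1024
```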
No dice, it's not taking effect, while the max execution timeout is working.
What can I do to debug it further?
I'm wondering if the error message is from a code path that is just NOT setting the memory limit? |
Ok, please try the patch:
What does it output now in archive.php run? Thanks for your tests! |
Thanks. I had similar code a few lines above, before the condition, and got no output for the error cases, but I'm going to try again with your patch and let you know.
If you get no output, try putting some debug code outside the IFs in this same function; maybe the code path isn't triggered? (which would explain your issue)
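A minimal sketch of such unconditional debug code, placed at the top of the function before any branches (error_log is used because it bypasses output buffering; the message format is just an example):

```php
<?php
// Debugging sketch: log unconditionally on function entry to confirm
// the code path is reached at all, and what memory limit is in effect.
error_log(sprintf(
    "[archive debug] entered %s, memory_limit=%s",
    __FUNCTION__,
    ini_get('memory_limit')
));
```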
saldsl, what problem is your patch trying to fix? Please explain.
(In [5467]) Refs #2327 Thanks for your tests; indeed one call was missing! Please check with this patch whether the script now executes.
Replying to matt:
It also adds some lines to list, at the end of the output, which scheduled jobs were executed (if any), rather than the current "executing scheduled jobs.... done!" output.
Far better now, thanks!
ts77 is it working on all sites after both patches? |
I just get two time-outs: 09:39:58 [6.21 Mb] Time elapsed: 315.415s Fatal error: 2 total errors during this script execution, please investigate and try and fix these errors. First error was: Got invalid response from API request: http://xxx/stats//index.php?module=API&method=VisitsSummary.getVisits&idSite=5&period=month&date=last2&format=php&token_auth=b81973cfe5c887599faf426971867e13&trigger=archivephp. Response was '<br />
Did you add the patches discussed in this thread?
Replying to matt:
Yes... the other changes are not strictly necessary.
I've changed this code to protected function lastRunKey($idsite, $period) and now I get many Notice: Undefined variable: period in xxx\piwik\misc\cron\archive.php on line 407. Is it more than a 2-line update? Have I missed something else?
That part is not relevant to your timeout issues anyway.
OK, I've taken the whole new file; will see how it runs tonight. Thanks.
It's two files, btw. ;)
Argh, thanks. I have patched Archive.php and Piwik.php and will report back any problems. Thanks, guys.
Success! No errors. Am I right in thinking that running it every 24 hours with -86400 will do the job? And with these changes, can it be executed more frequently without impact?
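A daily run could be wired up with a crontab fragment along these lines; note this is only a sketch, since the thread mentions archive.php's parameters were still positional at this point and were about to change before the 1.7 release, so check your version's documentation for the exact arguments:

```shell
# Sketch: run the Piwik archiver once a day at 00:05 as the web server
# user; paths and the argument list are illustrative assumptions.
5 0 * * * www-data /usr/bin/php /var/www/piwik/misc/cron/archive.php > /var/log/piwik-archive.log 2>&1
```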
Replying to matt:
Replying to saldsl:
This patch transforms the output string into an array to display the tasks performed on new lines (and removes the "tasks,output" in the first line).
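The described change amounts to something like the following sketch; the task names and variable names are illustrative, not the actual patch:

```php
<?php
// Sketch: collect the executed scheduled-task names in an array and
// print them one per line, instead of a single "tasks,output" string.
$executedTasks = array(
    'Piwik_CoreAdminHome.purgeOutdatedArchives', // illustrative names
    'Piwik_PDFReports.dailySchedule',
);
if (!empty($executedTasks)) {
    echo "Scheduled tasks executed:\n";
    echo implode("\n", $executedTasks) . "\n";
}
```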
Left TODO before 1.7 release:
Also, I will clean up the parameters and add named parameters. Currently it is a mess, since the parameters are not named and must be ordered, which is very confusing. So it will break backward compatibility for those of you who are already using this script, but it won't be that bad ;)
(In [5820]) Refs #2327
(In [5822]) Refs #2327
(In [5823]) Refs #2327
(In [5824]) Fixes #2327
Anyone listening here, testing archive.php from SVN trunk would be very much appreciated :)
(In [5860]) Refs #2327
I updated the documentation for the archive.php cron setup for Piwik -- if you have any suggestions, please comment here or on the form at the bottom of the page.
I am not sure whether I should comment here when the ticket is closed or start a new one, but we have hit quite a big problem. We have more than 3000 Piwik sites in one installation, with many sites getting 100-500k views a day. The archiving process for one site is not our biggest concern, but error handling is. When archiving runs after 00:00, all websites are processed; the problem is when an error occurs on any of those 3k+ sites. When archive.php hits a problem at siteID 2999, all sites are reprocessed on the next archive.php run, even when there are no new visits and everything was OK.
Hi John, why do errors occur on your websites? In general we try to prevent these errors, as they are usually caused by memory, CPU, or misconfiguration issues. If you have further requirements, let us know... or contact Piwik experts: http://piwik.org/consulting/#contact-consultant
If you run archive.sh with a lot of empty sites, it takes 200 ms per request on average. When archiving 1000 empty sites, for day/week/month/year periods, for N segments, that is already 800 * N seconds.
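The arithmetic behind that figure, as a quick sketch (all numbers are the ones stated above):

```php
<?php
// Back-of-the-envelope cost of archiving many empty sites.
$sites = 1000;
$periods = 4;             // day, week, month, year
$secondsPerRequest = 0.2; // ~200 ms average per request
$segments = 1;            // N segments multiplies the total
$total = $sites * $periods * $secondsPerRequest * $segments;
echo $total . " seconds\n"; // 800 * N seconds, i.e. 800 for N = 1
```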
The problem is that it then takes a long time to reach the websites that have traffic, since most sites don't have traffic.
I am not sure what the best solution is, but some ideas are:
archive.sh modifications:
Segments (see [Segments] in the config file for more info): we could only process the list of segments if there are some visits for the request without a segment (otherwise we know in advance there is no data for the segments).