Opened 3 years ago

Closed 2 years ago

Last modified 12 months ago

#2327 closed New feature (fixed)

New optimized archive.php script for faster and optimized archiving when hundreds/thousands of websites

Reported by: matt Owned by:
Priority: critical Milestone: 1.7 Piwik 1.7
Component: Core Keywords:
Cc: Sensitive: no

Description (last modified by matt)

If you run archive.sh with a lot of empty sites, it takes 200ms per request on average. When archiving 1000 empty sites, for day/week/month/year periods, for N segments, that is already: 800 * N seconds.
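
The estimate above can be reproduced with a few lines of arithmetic. This is a minimal Python sketch that uses only the figures quoted in the description; the function name and the sample values of N are illustrative and not part of Piwik.

```python
# Back-of-the-envelope cost of archiving empty sites, using the ticket's figures.
MS_PER_REQUEST = 200                          # 200 ms per request on average
PERIODS = ("day", "week", "month", "year")    # one request per period


def empty_site_archiving_seconds(num_sites, num_segments):
    """Seconds spent on requests for sites that have no traffic at all.

    Implements sites * periods * segments * time-per-request.
    """
    total_ms = num_sites * len(PERIODS) * num_segments * MS_PER_REQUEST
    return total_ms / 1000


if __name__ == "__main__":
    for n in (1, 3):  # hypothetical segment counts
        print(n, "segment(s):", empty_site_archiving_seconds(1000, n), "seconds")
```

With 1000 empty sites and a single segment this gives the 800 seconds quoted above, and the cost grows linearly with N.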

The problem is that it then takes a long time to reach the websites that do have traffic, because most sites have none.

I am not sure what the best solution is, but some ideas are:

  • profile the code and make an empty site archiving request faster (most of the time is spent in PHP, not SQL, so there is probably optimization there)

Archive.sh modifications:

  • could remember last time the archive.sh ran till the end, then run it the next time replacing "last52" with "last2" for example
  • could run it multithreaded, triggering archiving for multiple sites on each core #2563
  • could run, first, the websites that have traffic (requires modification in the SitesManager API or a new API to return sites "in order of importance")
  • we could run archiving only for websites that received some data since the last archiving run
  • when there are segments to pre-process (see [Segments] in config file for more info): we could process the list of segments only if the request without a segment returns some visits (otherwise we know in advance that there is no data for the segments)
  • could archive first, sites that have been queried via the API recently (add a new "set flag" in the API Proxy to say "this site data was requested")
  • ...?

Attachments (1)

archive_b.diff (3.9 KB) - added by saldsl 2 years ago.

Change History (60)

comment:1 Changed 3 years ago by matt (mattab)

  • Description modified (diff)

comment:2 Changed 3 years ago by matt (mattab)

  • Description modified (diff)

comment:3 Changed 3 years ago by matt (mattab)

  • Priority changed from normal to major

comment:4 Changed 3 years ago by matt (mattab)

  • Priority changed from major to critical

comment:5 Changed 3 years ago by vipsoft (robocoder)

Maybe we could preselect the sites to archive, e.g.,

SELECT idsite, num_visits FROM
    (
        SELECT idsite, COUNT(idvisit) AS num_visits FROM piwik_log_visit
            GROUP BY idsite
    ) AS t
    ORDER BY num_visits DESC;

(Implicitly, num_visits > 0.)
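
To see what this preselection returns, the query can be run against a toy table. This is a sketch only: it uses Python's built-in sqlite3 instead of MySQL, and the table contains just the two columns the query touches, with made-up rows.

```python
import sqlite3

# Toy stand-in for piwik_log_visit (assumption: only idvisit and idsite matter here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE piwik_log_visit (idvisit INTEGER PRIMARY KEY, idsite INTEGER)"
)
conn.executemany(
    "INSERT INTO piwik_log_visit (idsite) VALUES (?)",
    [(3,), (1,), (3,), (3,), (1,)],  # site 3: 3 visits, site 1: 2 visits, site 2: none
)

# Same query as in the comment above.
QUERY = """
SELECT idsite, num_visits FROM
    (
        SELECT idsite, COUNT(idvisit) AS num_visits FROM piwik_log_visit
            GROUP BY idsite
    ) AS t
    ORDER BY num_visits DESC
"""

sites_by_traffic = conn.execute(QUERY).fetchall()
print(sites_by_traffic)  # busiest site first; sites without visits never appear
```

Site 2 has no rows in the table, so it is absent from the result, which is the implicit num_visits > 0 filter.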

comment:6 Changed 3 years ago by matt (mattab)

  • Description modified (diff)

comment:7 Changed 3 years ago by matt (mattab)

(In [5087]) Refs #2327

  • Adding new archive.php optimized rewrite of archive.sh - see description below
  • Adding new API to return only active website ID with visits since $timestamp (which is used to get all sites with visits since last archive execution)
     * Description
     * This script is a much optimized rewrite of archive.sh in PHP 
     * allowing for more flexibility and better performance when Piwik tracks thousands of websites.
     * 
     * What this script does:
     * - Fetches Super User token_auth from config file
     * - Calls API to get the list of all websites Ids with new visits since the last archive.php successful run
     * - Calls API to get the list of segments to pre-process
     * The script then loops over these websites & segments and calls the API to pre-process these reports.
     * It does try to achieve Near real time for "daily" reports, processing them as often as possible.
     * 
     * Notes about the algorithm:
     * - To improve performance, API is called with date=last1 whenever possible, instead of last52 
     * - The script will only process (or re-process) reports for Current week / Current month  
     * 	 or Current year at most once per hour. 
     *   You can change this timeout as a parameter of the archive.php script.
     *   The concept is to archive daily report as often as possible, to stay near real time on "daily" reports,
     *   while allowing less real time weekly/monthly/yearly reporting. 
     */
    
    /**
     * TODO/Ideas
     * - Process first all period=day, then all other periods (less important)
     * - Ensure script can only run once at a time
     * - Add "report last processed X s ago" in UI grey box "About"
     * - piwik.org update FAQ / online doc
     * - check that, when run from crontab, it emails errors when there are any
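
The scheduling described in the notes above can be sketched roughly as follows. This is an illustrative Python sketch, not the archive.php code: the bookkeeping dictionary, function names and base URL are hypothetical, the day-first ordering follows the TODO idea, and the query parameters are the ones visible in the request URLs logged later in this ticket.

```python
from urllib.parse import urlencode

PERIOD_TIMEOUT = 3600  # week/month/year are processed "at most once per hour"


def plan_requests(site_ids, now, last_period_run, period_timeout=PERIOD_TIMEOUT):
    """Return (idsite, period) pairs to request, daily reports first.

    last_period_run maps idsite -> timestamp of the last week/month/year run
    (hypothetical bookkeeping; a missing entry means "never processed").
    """
    plan = [(idsite, "day") for idsite in site_ids]
    for idsite in site_ids:
        last = last_period_run.get(idsite)
        if last is None or now - last >= period_timeout:
            plan.extend((idsite, period) for period in ("week", "month", "year"))
    return plan


def archive_url(base_url, idsite, period, token_auth, date="last1"):
    """Build one pre-processing request URL (date=last1 instead of last52)."""
    query = urlencode({
        "module": "API", "method": "VisitsSummary.getVisits",
        "idSite": idsite, "period": period, "date": date,
        "format": "php", "token_auth": token_auth, "trigger": "archivephp",
    })
    return base_url + "/index.php?" + query


if __name__ == "__main__":
    # Site 1 had its periods processed 1000 s ago, site 2 never.
    for idsite, period in plan_requests([1, 2], now=10000, last_period_run={1: 9000}):
        print(archive_url("http://example.org", idsite, period, "TOKEN"))
```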
    

comment:8 Changed 3 years ago by matt (mattab)

(In [5090]) Refs #2327

  • Tested & fixed behavior for bad use cases: wrong piwik server, apache shutdown, mysql shutdown
  • Exceptions thrown during init() (DB connection errors, etc.) are now propagated properly when Piwik files are used internally and PIWIK_ENABLE_DISPATCH is off
  • Archiving requests from this archive.php script are now authorized even when "browser triggered archiving" is disabled: these HTTP requests still work when the Super User is authenticated and &trigger=archivephp is passed as a flag in the HTTP request
  • added help option which displays the doc
  • Ensures that script can only be executed from CLI

TODO

  • Check error handling in cron
  • Fix bug when initial run of the day and ignores inactive websites
  • Add sample output link

comment:9 Changed 3 years ago by matt (mattab)

(In [5095]) Refs #2327

Archive.php improvements

  • Added strong error handling for SQL/PHP/network errors, whether raised by the script itself or returned from the HTTP requests. If a critical error occurs during script execution, such as a wrong token_auth or a MySQL shutdown, a fatal error is thrown (a PHP error as well) and the script exits immediately. Non-critical errors are simply logged on screen as they occur; at the end, the script logs them all again as a summary, then exits and triggers a PHP error to ensure cron error handling and the email message are triggered.
  • Added summary error logs at end of script output + other improvements in the output metrics and messaging
  • Added flags (one for days and a different one for periods, per website) to record a website's archiving as successful and not re-trigger the HTTP request when not necessary. Flags are maintained via the piwik_option lookup table.
  • archive.php is now consistently using direct calls to some internal APIs (those that are not processing data) rather than calling over http

comment:10 Changed 3 years ago by matt (mattab)

(In [5098]) Refs #2327

  • Added parameter <reset|forceall>: you can either specify
    • reset: the script will run as if it was never executed before, therefore will trigger archiving on all websites with some traffic in the last 7 days.
    • forceall: the script will trigger archiving on all websites for all periods, sequentially

comment:11 Changed 3 years ago by matt (mattab)

(In [5101]) When the archive.php script runs from the CLI and the Piwik files have been upgraded, fail gracefully and report a critical error. Refs #2327

comment:12 Changed 3 years ago by matt (mattab)

(In [5102]) Refs #2327

  • Adding option forceall+reset, which closely imitates the current archive.sh behavior (still with the added bonus)
  • Fixes #1938: added segment in lock name. I have tested the code path but haven't actually verified that this improved performance

comment:13 Changed 3 years ago by matt (mattab)

(In [5110]) Refs #2327

  • Adding parameter to reset:

<reset[window_back_seconds]|forceall>: you can either specify

  • reset: the script will run as if it was never executed before, therefore will trigger archiving on all websites with some traffic in the last 7 days.

You can specify a number of seconds to use instead of the 7-day window; for example, call archive.php 1 reset 86400 to archive all reports for all sites that had visits in the last 24 hours.

  • Also hopefully fixing the client timeout error at 15s with the default file_get_contents. Now using Piwik_Http with a timeout of 300 seconds, which should leave enough time for websites to process.

comment:14 Changed 3 years ago by matt (mattab)

(In [5111]) Refs #2327

  • Now catching exception thrown by Piwik_Http and simply reporting as network error
  • Added one line output summary for easy grep

Example output:
[2011-08-14 10:11:12] [bc1a080f] [6.38 Mb] done: 2/2 100%, 16 v, 2 wtoday, 2 wperiods, 24 req, 24019 ms, no error


Example with error:
[2011-08-14 10:25:16] [c80dec0b] [6.39 Mb] done: 18/21 86%, 248 v, 18 wtoday, 18 wperiods, 216 req, 122250 ms, 9 errors. eg. 'Got invalid response f

e=1&period=month&date=last52&format=php&token_auth=0b809661490&trigger=archivephp. Response was ' '
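
Since the one-line summary is meant to be easy to grep, it is also easy to parse. The following Python sketch is written only against the two example lines above; the field names are taken from the labels in the log line itself, and the real output may vary between versions.

```python
import re

# Pattern derived from the example lines above (not from the archive.php source).
SUMMARY_RE = re.compile(
    r"done: (?P<done>\d+)/(?P<total>\d+) (?P<percent>\d+)%, "
    r"(?P<v>\d+) v, (?P<wtoday>\d+) wtoday, (?P<wperiods>\d+) wperiods, "
    r"(?P<req>\d+) req, (?P<ms>\d+) ms, "
    r"(?:no error|(?P<errors>\d+) errors?)"
)


def parse_summary(line):
    """Return the summary counters as integers, or None if the line does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    # "no error" leaves the errors group unmatched, so default it to 0.
    return {name: int(value) for name, value in match.groupdict(default="0").items()}


if __name__ == "__main__":
    ok_line = ("[2011-08-14 10:11:12] [bc1a080f] [6.38 Mb] done: 2/2 100%, 16 v, "
               "2 wtoday, 2 wperiods, 24 req, 24019 ms, no error")
    print(parse_summary(ok_line))
```

A monitoring job could, for instance, alert whenever the errors counter is non-zero or done is lower than total.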

comment:15 Changed 3 years ago by matt (mattab)

  • Summary changed from Archiving is slow when hundreds/thousands of empty sites to New optimized archive.php script, Archiving is slow when hundreds/thousands of empty sites,

comment:16 Changed 3 years ago by matt (mattab)

  • Milestone changed from 1.7 Piwik 1.7 to 1.6 Piwik 1.6

comment:17 Changed 3 years ago by matt (mattab)

(In [5185]) Refs #2327

  • BUG: one hour bug: Archiving was last executed without error 59 min 53s ago
  • BUG: noreply@localhost instead of proper domain in email from: in scheduled tasks

comment:18 Changed 3 years ago by matt (mattab)

(In [5186]) Refs #2327 last fix to noreply@localhost instead of proper domain in email from: in scheduled tasks

comment:19 Changed 2 years ago by saldsl

As I said in http://forum.piwik.org/read.php?2,82544 archive.php uses the same timestamp for the last daily archive and the last periods archive. I made a small patch (beware: I'm no experienced programmer, so very likely there is something wrong in it).

comment:20 Changed 2 years ago by saldsl

OK, I tested my modification and it seems to work. I also added some lines to create better output for scheduled tasks execution (showing what is executed, like the old archive.sh); feel free to use it if you need.

Changed 2 years ago by saldsl

comment:21 Changed 2 years ago by ts77

May I ask why this script uses a http call to do the actual archiving?
IMO this shares the same issues as auto archiving triggered by user access: it hits the same memory and time limits set for FastCGI or the web server, while CLI PHP usually has far higher limits and can at least be configured separately.

Is there a way to just use cli php for the actual processing?

I don't really want to give my webaccessible php a memory limit of 2gb and max execution time of many minutes just for archiving with the new archive.php.

comment:22 Changed 2 years ago by matt (mattab)

When running via the archive.php script (only in this case!), Piwik will try to increase memory more than normal. It is set by the config parameter under section [General] called minimum_memory_limit_when_archiving set to 768M by default.

It requires your PHP to allow running ini_set('memory_limit', $minimumMemoryLimit.'M')

comment:23 Changed 2 years ago by ts77

You should change the execution timeout too ;-).
In any case I'd say that the webserver or curl (does it have a timeout too?) timeout will kick in first.

Just as an example for the maximum execution timeout:

[2011-11-08 12:36:54] [05a7e3d7] [12.57 Mb] ERROR: Got invalid response from API request: http://et-test.xxxxx.de/index.php?module=API&method=VisitsSummary.getVisits&idSite=1&period=day&date=last52&format=php&token_auth=xxxxxxxx&trigger=archivephp. Response was '<br /> <b>Fatal error</b>:  Maximum execution time of 30 seconds exceeded in <b>/home/xxxxx/tracking-host-test/www/core/DataTable/Row.php</b> on line <b>247</b><br />'
[2011-11-08 12:36:54] [05a7e3d7] [12.56 Mb] WARNING: Empty or invalid response for website id 1, Time elapsed: 37.310s, skipping

I really don't know how long that will take for that large site; it's the largest of my 6k sites, and the full processing (of all sites) with archive.sh takes 280 minutes each day.
Maybe archive.php would be overall faster but it will hit more limits this way.

Couldn't there be a simple call to cli php instead of a web call, maybe even with forking and running a couple of processes in parallel?

comment:24 Changed 2 years ago by matt (mattab)

(In [5429]) Refs #2327 ts77, thanks for the tip. please try this patch. Does it work after?

comment:25 Changed 2 years ago by ts77

I seem to get farther now but am hitting my memory limit again (and from the error message it seems to be my original memory limit from the php.ini - 512M - and not the minimum archiving memory limit from Piwik - 768M). I even tried a test script to see if ini_set for the memory limit takes effect for me: it does.

Any ideas? I'm still in favor of an (alternative?) cli version for large sites and I'm worried that it will give more support issues with larger sites.
Couldn't there be a commandline switch to do a command line call to php instead of a curl request but otherwise keeping the logic the same?

comment:26 Changed 2 years ago by matt (mattab)

what's the exact error message?

in your config.ini.php add under [General] minimum_memory_limit_when_archiving=1024

we would prefer to keep it http only, since it allows to use the multithread easily which makes the script much faster...

comment:27 Changed 2 years ago by ts77

No dice, it's not taking effect, while the max execution timeout is working.

[2011-11-14 07:32:05] [07774e39] [12.75 Mb] Archived website id = 1, period = week, 4861859 visits, Time elapsed: 46.602s
[2011-11-14 07:32:29] [07774e39] [12.74 Mb] ERROR: Got invalid response from API request: http://et-test.xxx.de/index.php?module=API&method=VisitsSummary.getVisits&idSite=1&period=month&date=last52&format=php&token_auth=xxx&trigger=archivephp. Response was '<br /> <b>Fatal error</b>:  Allowed memory size of 536870912 bytes exhausted (tried to allocate 8208 bytes) in <b>/home/xxx/tracking-host-test/www/core/DataTable.php</b> on line <b>1022</b><br />


What can I do for debugging it further?
We can continue this by email if you like, you know the address ;-).

comment:28 Changed 2 years ago by ts77

I'm wondering if the error message is from a code path that is just NOT setting the memory limit?
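
One quick way to check this from the log alone is to convert the byte count in the fatal error back to megabytes and compare it with the configured minimum. This is a small Python sketch; the function name is illustrative, and the message format is the PHP one quoted in the log above.

```python
import re

# Matches PHP's out-of-memory fatal error as it appears in the log above.
MEMORY_ERROR_RE = re.compile(r"Allowed memory size of (\d+) bytes exhausted")


def limit_in_effect_mb(response):
    """Return the memory limit (in MB) reported by the fatal error, or None."""
    match = MEMORY_ERROR_RE.search(response)
    if match is None:
        return None
    return int(match.group(1)) // (1024 * 1024)


if __name__ == "__main__":
    response = ("<b>Fatal error</b>:  Allowed memory size of 536870912 bytes "
                "exhausted (tried to allocate 8208 bytes)")
    configured_minimum_mb = 1024  # minimum_memory_limit_when_archiving from comment:26
    limit = limit_in_effect_mb(response)
    print(limit, "MB in effect;", "raised" if limit >= configured_minimum_mb else "NOT raised")
```

536870912 bytes is exactly 512 MB, the php.ini value, so the request that failed was not running under the raised limit.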

comment:29 Changed 2 years ago by matt (mattab)

Ok, please try the patch:

Index: core/Piwik.php
===================================================================
--- core/Piwik.php	(revision 5455)
+++ core/Piwik.php	(working copy)
@@ -984,7 +984,10 @@
 			$minimumMemoryLimitWhenArchiving = Zend_Registry::get('config')->General->minimum_memory_limit_when_archiving;
 			if($memoryLimit < $minimumMemoryLimitWhenArchiving)
 			{
-				return self::setMemoryLimit($minimumMemoryLimitWhenArchiving);
+				$return = self::setMemoryLimit($minimumMemoryLimitWhenArchiving);
+				echo "Memory limit status:" . $return;
+				echo " - Current memory value: ". Piwik::getMemoryLimitValue() . "M";
+				return $return;				
 			}
 			return false;
 		}

What does it output now in archive.php run? Thanks for your tests!

comment:30 Changed 2 years ago by ts77

Thanks. I had a similar code some lines above, before the condition and got *no* output for the error cases but I'm gonna try again with your patch and let you know.

comment:31 Changed 2 years ago by matt (mattab)

If you get no output, try putting some debug code outside the IFs in this same function; maybe the code path isn't triggered (which would explain your issue)?

comment:32 follow-up: Changed 2 years ago by matt (mattab)

saldsl, what problem is your patch trying to fix? please explain

comment:33 Changed 2 years ago by matt (mattab)

(In [5467]) Refs #2327 Thanks for your tests, indeed one call was missing! please check with this patch if the script now executes ?

comment:34 in reply to: ↑ 32 Changed 2 years ago by saldsl

Replying to matt:

saldsl, what problem is your patch trying to fix? please explain

Archive.php is set to run the daily archiving "at most every 1800 seconds" (30 minutes) and the weekly/monthly/yearly archiving "at most every 6200 seconds" (103 minutes).
The problem is that both archiving operations check the last execution against the same timestamp. If cron runs every hour, the daily archiving (if executed) updates the timestamp every hour, so the weekly/monthly/yearly archiving is not run in the second hour because the last execution was less than 103 minutes ago. My patch creates two timestamps, one for the last execution of the daily archiving and one for the weekly/monthly/yearly archiving.

It also adds some lines so that the end of the output lists which scheduled jobs were executed (if any), rather than the current "executing scheduled jobs.... done!" output.
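
The effect of sharing one timestamp can be shown with a small simulation. This is a Python sketch of the behavior described above, not the archive.php code: the two intervals are the ones quoted, while the hourly cron step, the number of runs and all names are assumptions.

```python
DAY_INTERVAL = 1800      # daily archiving: at most every 1800 s
PERIOD_INTERVAL = 6200   # week/month/year archiving: at most every 6200 s
CRON_STEP = 3600         # assumption: cron starts the script once per hour


def period_run_times(separate_timestamps, cron_runs=6):
    """Return the times (seconds) at which week/month/year archiving runs."""
    last_run = {}
    # With a shared timestamp, periods are checked against the "day" entry.
    period_key = "period" if separate_timestamps else "day"
    period_runs = []
    for i in range(cron_runs):
        now = i * CRON_STEP
        day_due = "day" not in last_run or now - last_run["day"] >= DAY_INTERVAL
        period_due = (period_key not in last_run
                      or now - last_run[period_key] >= PERIOD_INTERVAL)
        if day_due:
            last_run["day"] = now
        if period_due:
            last_run[period_key] = now
            period_runs.append(now)
    return period_runs


if __name__ == "__main__":
    print("shared timestamp:   ", period_run_times(separate_timestamps=False))
    print("separate timestamps:", period_run_times(separate_timestamps=True))
```

With a shared timestamp the hourly daily run keeps resetting the clock, so after the first execution the period archiving never becomes due; with two timestamps it runs every second hour.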

comment:35 Changed 2 years ago by ts77

Far better now, thanks!
The unserialization warning is probably from the debugging output added earlier which I removed now after the first site.

[2011-11-23 09:08:30] [aafd7be9] [12.71 Mb] Starting Piwik reports archiving...

Notice: unserialize(): Error at offset 0 of 174 bytes in /home/xxx/tracking-host-test/www/misc/cron/archive.php on line 486

Warning: end(): Passed variable is not an array or object in /home/xxx/tracking-host-test/www/misc/cron/archive.php on line 487

Warning: array_sum(): The argument should be an array in /home/xxx/tracking-host-test/www/misc/cron/archive.php on line 500
[2011-11-23 09:09:30] [aafd7be9] [12.72 Mb] Archived website id = 1, period = day, Time elapsed: 60.022s

Notice: unserialize(): Error at offset 0 of 176 bytes in /home/xxx/tracking-host-test/www/misc/cron/archive.php on line 620
[2011-11-23 09:10:30] [aafd7be9] [12.73 Mb] Archived website id = 1, period = week, 0 visits, Time elapsed: 60.035s

Notice: unserialize(): Error at offset 0 of 176 bytes in /home/xxx/tracking-host-test/www/misc/cron/archive.php on line 620
[2011-11-23 09:11:30] [aafd7be9] [12.73 Mb] Archived website id = 1, period = month, 0 visits, Time elapsed: 60.027s

Notice: unserialize(): Error at offset 0 of 176 bytes in /home/xxx/tracking-host-test/www/misc/cron/archive.php on line 620
[2011-11-23 09:12:30] [aafd7be9] [12.73 Mb] Archived website id = 1, period = year, 0 visits, Time elapsed: 60.030s
[2011-11-23 09:12:30] [aafd7be9] [12.72 Mb] Archived website id = 1, today =  visits, 4 API requests, Time elapsed: 240.117s [1/568 done]

comment:36 Changed 2 years ago by matt (mattab)

ts77 is it working on all sites after both patches?

comment:37 follow-ups: Changed 2 years ago by matt (mattab)

saldsl thanks for your patch & explanations!
Am I right that the bug fix can be summarized to this one line change? Changeset [5470]

comment:38 Changed 2 years ago by ts77

Yeah, it has run through all sites now without error (just had to wait the hour it takes to run through them all ;)).

comment:39 Changed 2 years ago by budg1e

I just get two time-outs

[2011-11-23 09:39:58] [10d90004] [6.21 Mb] Time elapsed: 315.415s
[2011-11-23 09:39:58] [10d90004] [6.21 Mb] ---------------------------
[2011-11-23 09:39:58] [10d90004] [6.21 Mb] SCHEDULED TASKS
[2011-11-23 09:39:58] [10d90004] [6.21 Mb] Starting Scheduled tasks...
[2011-11-23 09:39:58] [10d90004] [6.21 Mb] done
[2011-11-23 09:39:58] [10d90004] [6.21 Mb] ---------------------------
[2011-11-23 09:39:58] [10d90004] [6.21 Mb] SUMMARY OF ERRORS
[2011-11-23 09:39:58] [10d90004] [6.21 Mb] Error: Got invalid response from API request: http://xxx/stats//index.php?module=API&method=VisitsSummary.getVisits&idSite=5&period=month&date=last2&format=php&token_auth=b81973cfe5c887599faf426971867e13&trigger=archivephp. Response was '<br /> <b>Fatal error</b>: Maximum execution time of 60 seconds exceeded in <b>xxx\piwik\core\DataTable.php</b> on line <b>1022</b><br />
[2011-11-23 09:39:58] [10d90004] [6.21 Mb] Error: Got invalid response from API request: http://xxx/stats//index.php?module=API&method=VisitsSummary.getVisits&idSite=5&period=year&date=last2&format=php&token_auth=b81973cfe5c887599faf426971867e13&trigger=archivephp. Response was
[2011-11-23 09:39:58] [10d90004] [6.21 Mb] 2 total errors during this script execution, please investigate and try and fix these errors
[2011-11-23 09:39:58] [10d90004] [6.21 Mb] ERROR: 2 total errors during this script execution, please investigate and try and fix these errors. First error was: Got invalid response from API request: http://xxx/stats//index.php?module=API&method=VisitsSummary.getVisits&idSite=5&period=month&date=last2&format=php&token_auth=b81973cfe5c887599faf426971867e13&trigger=archivephp. Response was '<br /> <b>Fatal error</b>: Maximum execution time of 60 seconds exceeded in <b>xxx\piwik\core\DataTable.php</b> on line <b>1022</b><br />

Fatal error: 2 total errors during this script execution, please investigate and try and fix these errors. First error was: Got invalid response from API request: http://xxx/stats//index.php?module=API&method=VisitsSummary.getVisits&idSite=5&period=month&date=last2&format=php&token_auth=b81973cfe5c887599faf426971867e13&trigger=archivephp. Response was '<br />
<b>Fatal error</b>: Maximum execution time of 60 seconds exceeded in <b>xxx\piwik\core\DataTable.php</b> on line <b>1022</b><br />

in xxx\piwik\misc\cron\archive.php on line 179

comment:40 Changed 2 years ago by ts77

did you add the patches discussed in this thread?

comment:41 in reply to: ↑ 37 Changed 2 years ago by saldsl

Replying to matt:

saldsl thanks for your patch & explanations!
Am I right that the bug fix can be summarized to this one line change? Changeset [5470]

Yes... the other changes are not strictly necessary.

comment:42 Changed 2 years ago by budg1e

I've changed this code to

protected function lastRunKey($idsite, $period)
{ return "lastRunArchive". $period ."_". $idsite; }

and now I get many

Notice: Undefined variable: period in xxx\piwik\misc\cron\archive.php on line 407

is it more than a 2 line update? Have I missed something else?

comment:43 Changed 2 years ago by ts77

That part is not relevant to your timeout issues anyway.
You need
http://dev.piwik.org/trac/changeset/5429
http://dev.piwik.org/trac/changeset/5467
to hopefully fix the timeout issues for you.

comment:44 Changed 2 years ago by budg1e

ok taken the whole new file -will see how it runs tonight..thanks

comment:45 Changed 2 years ago by ts77

Its two files btw. ;)

comment:46 Changed 2 years ago by budg1e

argghh -thanks, have patched Archive.php and Piwik.php

will report back any problems -thanks guys

comment:47 Changed 2 years ago by budg1e

Success! No errors. Am I right in thinking that running it every 24 hours with -86400 will do the job? With these changes, can it be executed more frequently without impact?

comment:48 in reply to: ↑ 37 ; follow-up: Changed 2 years ago by saldsl

Replying to matt:

saldsl thanks for your patch & explanations!
Am I right that the bug fix can be summarized to this one line change? Changeset [5470]

matt, is it possible to add some more output to the scheduled tasks part? With archive.sh the output showed what tasks were executed, but archive.php doesn't show anything. In my patch I also tried to extract the tasks performed and put the list in the output, maybe there's a cleaner way to achieve that...

comment:49 in reply to: ↑ 48 Changed 2 years ago by saldsl

Replying to saldsl:

Oops... I didn't see that you have already added this in rev 5474. Great!
To format the output of the task results better, may I propose this patch:

--- archive_orig.php	2011-11-26 16:17:59.229298023 +0100
+++ archive.php	2011-11-26 16:18:59.291871401 +0100
@@ -571,8 +571,18 @@
 		if($tasksOutput == "No data available")
 		{
 			$tasksOutput = " No task to run";
+			$this->log($tasksOutput);
 		}
-		$this->log($tasksOutput);
+		else
+		{
+			$tasksOutput = trim(str_replace("task,output","",$tasksOutput));
+			$tasksOutput = mb_split("\n",$tasksOutput);
+			foreach ($tasksOutput as $taskResult) 
+			{
+				      $this->log($taskResult);
+			}
+		}
+ 
 		$this->log("done");
 		
 	}

This patch turns the output string into an array so that the tasks performed are displayed on separate lines (and removes the "task,output" header from the first line).
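
The transformation the patch performs can be illustrated outside PHP. This is a Python sketch of the same string handling; the sample task lines are made up, and only the "No data available" case and the "task,output" header come from the patch.

```python
def task_lines(tasks_output):
    """Split the scheduled-tasks output into one log line per task."""
    if tasks_output == "No data available":
        return [" No task to run"]
    # Drop the "task,output" header, trim surrounding whitespace, split per line.
    cleaned = tasks_output.replace("task,output", "").strip()
    return cleaned.split("\n")


if __name__ == "__main__":
    sample = "task,output\nfirstTask,done\nsecondTask,done"  # hypothetical output
    for line in task_lines(sample):
        print(line)
```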

comment:50 Changed 2 years ago by matt (mattab)

Left TODO before 1.7 release:

  • archive.php should work without any argument by default (for ease of use)
    • detect the piwik URL automatically since we know it in piwik
  • Once an hour max, and on request: run archiving for previousN for websites whose days have just finished in the last 2 hours in their timezones
    • then uncomment "TODO when implemented full archiving"
  • Update documentation and this faq
    • The goal would be that all new piwik users use this script from 1.7 onwards
  • The script should send an email to super user every time it is finished IF there is an error. Otherwise, only send if --email-superuser-summary
  • Allow to trigger from non CLI when SU token_auth is specified

Also, I will clean up the parameters and add named parameters. Currently it is a mess, since the parameters are not named and must be given in order, which is very confusing. This will break backward compatibility for those of you who are already using this script, but it won't be that bad ;)

comment:51 Changed 2 years ago by matt (mattab)

(In [5820])
Work in progress

  • refactored code & rewrote the command line parameter handling code
  • renamed parameters & updated doc
  • auto detect piwik URL (and use HTTPS URL if force_ssl is set)
  • Do not display the memory usage in the log output, easier on the eyes

Refs #2327

comment:52 Changed 2 years ago by matt (mattab)

(In [5822]) Refs #2327

  • we now check all websites that were last processed on a different day, and will process those. This ensures that even websites with no visits recently, will still have the week/month/year archives kept up to date. Use case: visit on Jan 5th/6th. Then no visit. Processing on Feb 1st: before it would not trigger January monthly archive, because there was no visit since last script run. Now it will trigger monthly archiving.
  • Added new API in SitesManager to fetch all websites which are set to specified timezones +tests
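
The "last processed on a different day" check can be sketched as follows. This is a simplified Python illustration with hypothetical names: it compares calendar days in UTC, whereas the changeset above works with each website's timezone.

```python
from datetime import datetime, timezone


def sites_last_processed_on_another_day(last_processed, now):
    """Return ids of sites whose last archiving happened on a different UTC day.

    last_processed maps idsite -> Unix timestamp of the last successful run
    (hypothetical bookkeeping for this sketch).
    """
    today = datetime.fromtimestamp(now, tz=timezone.utc).date()
    return sorted(
        idsite for idsite, ts in last_processed.items()
        if datetime.fromtimestamp(ts, tz=timezone.utc).date() != today
    )


if __name__ == "__main__":
    now = datetime(2012, 2, 1, 0, 30, tzinfo=timezone.utc).timestamp()
    last_processed = {
        1: datetime(2012, 1, 31, 23, 0, tzinfo=timezone.utc).timestamp(),  # yesterday
        2: datetime(2012, 2, 1, 0, 5, tzinfo=timezone.utc).timestamp(),    # today
    }
    print(sites_last_processed_on_another_day(last_processed, now))
```

Sites returned by such a check get their week/month/year archives refreshed even if they received no visits since the last run, which covers the January use case described above.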

comment:53 Changed 2 years ago by matt (mattab)

(In [5823]) Refs #2327

  • archive.php can now be executed through the browser (i.e. "WEB CRON") if the Super User token_auth is passed as a parameter

This makes it possible to run this script on hosts / shared hosts where cron is not allowed but web cron is.

comment:54 Changed 2 years ago by matt (mattab)

  • Resolution set to fixed
  • Status changed from new to closed

(In [5824]) Fixes #2327

  • AFAIK this is fixed. BOOM! Will test on demo.

Anyone listening here, testing archive.php from SVN trunk would be very appreciated :)

comment:55 Changed 2 years ago by matt (mattab)

  • Summary changed from New optimized archive.php script, Archiving is slow when hundreds/thousands of empty sites, to New optimized archive.php script for faster and optimized archiving when hundreds/thousands of websites

comment:56 Changed 2 years ago by matt (mattab)

(In [5860]) Refs #2327

Fixing bug ensuring all periods are processed for low traffic websites

comment:57 Changed 2 years ago by matt (mattab)

I updated the documentation for archive.php script cron piwik setup -- if you have any suggestion please comment here or on the form at the bottom of the page.

comment:58 Changed 12 months ago by fjohn

I am not sure whether I should comment here when the ticket is closed or start a new one, but we hit quite a big problem.

We have more than 3000 Piwik sites in one installation, many of them with 100-500k views a day.

The archive process for one site is not our biggest concern, but error handling is.

When archive runs after 00:00, all websites are processed, but the problem arises when an error occurs on any of those 3k+ sites.

So when archive.php hits a problem at siteID 2999, all sites are reprocessed on the next archive.php run, even though there are no new visits and everything else was OK.

comment:59 Changed 12 months ago by matt (mattab)

Hi John, why do errors occur on your websites? In general we try to prevent these errors, as they are usually caused by memory/CPU/misconfiguration. If you have further requirements, let us know... or contact Piwik experts: http://piwik.org/consulting/#contact-consultant
