Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#1184 closed New feature (fixed)

Plugin API for Scheduled Tasks

Reported by: vipsoft Owned by: JulienM
Priority: normal Milestone: Piwik 0.8 - A Web Analytics platform
Component: Plugins Wishlist Keywords:
Cc: Sensitive: no

Description (last modified by matt)

Use one crontab entry to trigger Piwik archiving, daily report generation, bots, etc.

This plugin:

  • exposes a new hook for other plugins to register and run some scheduled processing when called
  • either provides a helper function that tells other plugins whether they should run, given a crontab-like schedule, e.g., isItTimeToRun('* * * * *'); or adds a method to Piwik_Plugin, e.g., getSchedule(), that returns a crontab-like schedule that can be evaluated
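
To make the first option concrete, here is a minimal sketch of what such a helper could look like. It is an illustration only, not Piwik code: the function cronFieldMatches() and the optional $timestamp parameter are assumptions, only "*", "*/n" and plain integers are supported, all five fields must match (real cron treats day-of-month and day-of-week differently), and times are evaluated in UTC.

```php
<?php
// Hypothetical helper, not part of Piwik: matches one cron field against a value.
// Supports only "*", "*/n" and plain integers.
function cronFieldMatches(string $field, int $value): bool
{
    if ($field === '*') {
        return true;
    }
    if (preg_match('/^\*\/(\d+)$/', $field, $m)) {
        $step = (int) $m[1];
        return $step > 0 && $value % $step === 0;
    }
    return ctype_digit($field) && (int) $field === $value;
}

// Hypothetical helper: true if the 5-field schedule matches the given time (UTC).
function isItTimeToRun(string $schedule, ?int $timestamp = null): bool
{
    $timestamp = $timestamp ?? time();
    $fields = preg_split('/\s+/', trim($schedule));
    if (count($fields) !== 5) {
        throw new InvalidArgumentException('Expected 5 space-separated fields');
    }
    // minute, hour, day of month, month, day of week (0 = Sunday)
    $current = array_map('intval', explode(' ', gmdate('i G j n w', $timestamp)));
    foreach ($fields as $i => $field) {
        if (!cronFieldMatches($field, $current[$i])) {
            return false;
        }
    }
    return true;
}
```

A plugin could then call, for example, isItTimeToRun('0 2 * * 1') to check for "Monday, 2AM UTC".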

Updates the UI Settings 'general settings'

  • Reports whether automatic maintenance is working or not
    • cron not detected in the last 24 hours but tracker maintenance triggered
    • cron is detected every 10s-1h?
    • Maintenance is not executed. Check that Piwik is tracking visitors.

This plugin is not #817.

Attachments (1)

piwik-dev1 (#1184).patch (61.4 KB) - added by JulienM 4 years ago.

Change History (23)

comment:1 Changed 4 years ago by matt (mattab)

See also #587, which could allow triggering these crontab-like tasks from piwik.php requests in case users don't set up automatic crontabs.

If an automatic crontab is set up (which Piwik can detect automatically), then crontab tasks are not triggered by piwik.php (see #587).

comment:2 Changed 4 years ago by matt (mattab)

I believe we should update the documentation and have the crontab fire more regularly, i.e. every 15 minutes, in case some plugins need to run tasks more frequently. The standard archiving task would only trigger after config.ini.php > time_before_today_archive_considered_outdated seconds.

comment:3 Changed 4 years ago by matt (mattab)

  • Description modified (diff)

#5 and #53 are feature candidates for this hook

comment:4 Changed 4 years ago by matt (mattab)

We need to think about the current archive.sh script and how it would be changed to accommodate this new hook (either call this plugin specifically, or change the way archive.sh works so that it calls this plugin, which would then trigger archiving?). Note that it might be better to leave archive.sh with the current "looping over websites and periods" approach and archive them separately, because triggering all archives at once would cause memory issues for Piwik installs with hundreds or thousands of websites.

comment:5 Changed 4 years ago by matt (mattab)

Also, do we need a system to enforce that such a task cannot be run twice at the same time (a software-level or DB-level lock mechanism)?
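
One possible software-level answer is an advisory file lock. The sketch below is only an illustration under assumptions: runExclusively() and the lock file path are hypothetical names, not Piwik code, and the ticket does not say which mechanism (if any) was chosen. It relies on PHP's flock() with LOCK_NB, so a second process returns immediately instead of waiting.

```php
<?php
// Hypothetical helper, not Piwik code: runs $task only if no other process holds the lock.
function runExclusively(string $lockFile, callable $task): bool
{
    // 'c' opens the file for writing and creates it if missing, without truncating it.
    $handle = fopen($lockFile, 'c');
    if ($handle === false) {
        return false;
    }
    // LOCK_NB makes flock() return immediately instead of waiting for the lock.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false; // another run is in progress
    }
    try {
        $task();
    } finally {
        // Always release the lock, even if the task throws.
        flock($handle, LOCK_UN);
        fclose($handle);
    }
    return true;
}
```

A file lock only protects processes on the same machine; installs spread over several servers would need a DB-level lock instead.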

comment:6 Changed 4 years ago by matt (mattab)

  • Description modified (diff)

comment:7 Changed 4 years ago by matt (mattab)

Sending email reports is also a candidate for this hook; see for example the PDF plugin, #71.

comment:8 Changed 4 years ago by matt (mattab)

  • Summary changed from Launcher - standard hook for launching multiple cron-based scripts to Add hook to launch multiple cron-based scripts (plugin define cheduled tasks)

comment:9 Changed 4 years ago by vipsoft (robocoder)

  • Summary changed from Add hook to launch multiple cron-based scripts (plugin define cheduled tasks) to Add hook to launch multiple cron-based scripts (plugin-defined scheduled tasks)

comment:10 Changed 4 years ago by matt (mattab)

Implementation proposal

  • new plugin TaskScheduler
  • exposes an API TaskScheduler.runTasks. This function triggers a hook, and existing plugins can register tasks to run.
    // pseudo code of function hooking on runTasks
    function runOptimizeTables($notification)
    {
       // run every Mondays at 2AM
       if( TaskScheduler.shouldRunTask( 'my task ID name', 'weekly' ))
       { 
            // execute task
       }
    }
    
  • Available schedules are hourly, daily, weekly, monthly

Note that we don't have minutes, because the smallest possible granularity is the hour. (Crontabs are set up to run once per hour and probably should never run more often.)

  • This new code should be built with the future feature in mind: #587. When #587 is implemented, setting up the crontab will be optional. This means that many small Piwik installs will not have the cron installed. Therefore we will use piwik.php (triggered once every page view) to trigger the equivalent of the cron job, to
    • trigger archiving
    • run scheduled tasks

The difference between running scheduled tasks via cron and via piwik.php is that piwik.php might trigger them more than once per hour (even though, for obvious performance reasons, not every request to piwik.php will trigger the scheduled tasks; only one randomly chosen request out of many will).

A solution to this issue is to plan schedules ahead of time (compute the time at which the task will run next). Then, when the task successfully runs, re-schedule it for next time (e.g. next week for a weekly task).

pseudo code

function shouldTaskRun( taskID, interval, [ minimumTimestamp ] )
  if(minimumTimestamp > time()) return false;

  schedule = Piwik_GetOption('schedule')
  shouldRunTask = false;
  if(isset(schedule[taskID]))
  {
    // task already scheduled, run only once scheduled_time has passed
    if(schedule[taskID]['scheduled_time'] <= time())
    {
       shouldRunTask = true;
    } 
  }
  else
  {
     // new task, always run once the first time cron runs
     shouldRunTask = true;
  }
  
  if(shouldRunTask)
  {
    // compute the next time at which the task should run
    nextScheduleTime = time() + (if hourly then 3600 elseif daily then 86400 etc.);
    schedule[taskID]['scheduled_time'] = nextScheduleTime;

    // record updated schedule in DB
    Piwik_SetOption('schedule', schedule);
  }

  return shouldRunTask;
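
A runnable PHP sketch of this check, for illustration only. It assumes the intended behaviour is that a task runs once its scheduled time has passed and that the next run is planned only when the task is allowed to run. To stay self-contained, the schedule is passed in as an array (standing in for the value stored through Piwik_GetOption / Piwik_SetOption) and the current time is a parameter; both are assumptions of this sketch, as is leaving out 'monthly', whose length is not a fixed number of seconds.

```php
<?php
// Sketch only: $schedule stands in for the option stored via Piwik_GetOption/Piwik_SetOption.
function shouldTaskRun(array &$schedule, string $taskId, string $interval, int $now, int $minimumTimestamp = 0): bool
{
    $seconds = array('hourly' => 3600, 'daily' => 86400, 'weekly' => 604800);
    if (!isset($seconds[$interval])) {
        throw new InvalidArgumentException("Unknown interval: $interval");
    }
    if ($minimumTimestamp > $now) {
        return false; // too early in the day
    }
    // A task that is already scheduled runs only once its scheduled time has passed.
    if (isset($schedule[$taskId]) && $schedule[$taskId]['scheduled_time'] > $now) {
        return false;
    }
    // New or due task: allow it to run and plan the next run.
    $schedule[$taskId] = array('scheduled_time' => $now + $seconds[$interval]);
    return true;
}
```

Called twice within the same hour for an 'hourly' task, the first call returns true and the second returns false, which is the property needed when piwik.php triggers the tasks more often than cron would.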

minimumTimestamp can be used to define exactly at what time of day tasks should run.

For example, if you want to run a daily job at 2AM, you would write in your plugin

if( TaskScheduler.shouldRunTask( 'my task ID name', 'daily', mktime(2,0,0,date('m'),date('d'),date('Y'))  ))

The first time the cron triggers after 2AM, this scheduled task will be allowed to run. shouldRunTask will then compute the next time it should run, which is 2AM the next day.

Edge case: if the cron didn't run before 5AM (for some reason), it will trigger the 2AM task at 5AM. However, you wouldn't want to schedule tomorrow's task at 5AM but at 2AM. You can use code such as

 now = time();
 interval = 86400; // for example
 nextScheduleTime = now + interval - ((now - minimumTimestamp) % interval);

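
As a worked example of this formula: with a run at 5AM, a minimumTimestamp of 2AM the same day and an interval of 86400 seconds, the offset (now - minimumTimestamp) is 10800, so the next run lands at 5AM + 86400 - 10800, i.e. 2AM the next day. The helper name nextAlignedRun() below is hypothetical, and the sketch uses UTC and assumes now >= minimumTimestamp.

```php
<?php
// Hypothetical helper illustrating the formula above; assumes $now >= $minimumTimestamp.
function nextAlignedRun(int $now, int $minimumTimestamp, int $interval): int
{
    return $now + $interval - (($now - $minimumTimestamp) % $interval);
}

$twoAm  = gmmktime(2, 0, 0, 6, 1, 2010); // intended time of day
$fiveAm = gmmktime(5, 0, 0, 6, 1, 2010); // when cron actually ran

$next = nextAlignedRun($fiveAm, $twoAm, 86400);
echo gmdate('Y-m-d H:i', $next), "\n"; // 2010-06-02 02:00
```
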
  • There is no need to update the UI in this ticket. We can do the UI updates at the same time as #587

Let me know if this makes sense, cheers

comment:11 Changed 4 years ago by matt (mattab)

Note: inspired by the WordPress implementation, see
http://phpxref.ftwr.co.uk/wordpress/nav.html?wp-includes/cron.php.html#wp_schedule_event

http://phpxref.ftwr.co.uk/wordpress/nav.html?wp-cron.php.html

While their implementation is overcomplicated, we can do the same thing in a few lines of code :)

comment:12 Changed 4 years ago by matt (mattab)

  • Owner set to JulienM

comment:13 Changed 4 years ago by JulienM (JulienMoumne)

I'm ok with the proposal except for one bit.

I would like the implementation to be more object oriented.

There would be a Piwik_ScheduledTask, a Piwik_ScheduledTime.

Instead of having :

function getListHooksRegistered()
{
	return array(
		'TaskScheduler.getScheduledTasks' => 'runOptimizeTables',
	);
}

function runOptimizeTables($notification)
{
   // run every Mondays at 2AM
   if( TaskScheduler.shouldRunTask( 'my task ID name', 'weekly' ))
   { 
        // execute task
   }
}

it would be

function getListHooksRegistered()
{
	return array(
		'TaskScheduler.getScheduledTasks' => 'getScheduledTasks',
	);
}

function getScheduledTasks($notification)
{
    $scheduledTasks = &$notification->getNotificationObject();

    $tableOptimisationScheduledTime = Piwik_ScheduledTime::factory('weekly');
    $tableOptimisationScheduledTime->setDay('monday');
    $tableOptimisationScheduledTime->setHour(13);
    $tableOptimisationScheduledTime->setMinute(20);

    $scheduledTasks[] = new Piwik_ScheduledTask('runOptimizeTables', $tableOptimisationScheduledTime);

}

function runOptimizeTables()
{
    // execute task
}

comment:14 Changed 4 years ago by matt (mattab)

  • setMinute() will not exist, as the smallest granularity is the hour (when using cron, so we should limit it globally to the hour).
  • To call the actual method 'runOptimizeTables' from the TaskScheduler plugin, you would need to know the plugin on which to call runOptimizeTables. You could pass the callback as array($this, 'runOptimizeTables') in the first parameter and use call_user_func
  • the internal task ID could then be get_class( $callback[0] ) . '.' . $callback[1] (i.e. Piwik_UserSettings.runSomeTasks)

proposal looks good to me!
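
The callback and task ID idea from the bullets above can be sketched as follows. Example_Plugin and getTaskId() are stand-in names invented for this illustration; only call_user_func and get_class are real PHP functions.

```php
<?php
// Stand-in class for illustration; the real plugin classes are not reproduced here.
class Example_Plugin
{
    public $optimized = false;

    public function runOptimizeTables()
    {
        $this->optimized = true;
    }
}

// Builds the internal task ID from an array($object, 'methodName') callback.
function getTaskId(array $callback): string
{
    return get_class($callback[0]) . '.' . $callback[1];
}

$plugin = new Example_Plugin();
$callback = array($plugin, 'runOptimizeTables');

echo getTaskId($callback), "\n"; // Example_Plugin.runOptimizeTables
call_user_func($callback);       // invokes $plugin->runOptimizeTables()
```

Passing the object rather than a method name alone is what lets the scheduler call the method without knowing which plugin registered it.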

comment:15 Changed 4 years ago by JulienM (JulienMoumne)

I have submitted a patch in which I decided to remove all modulo calculus in favor of easier to read and easier to maintain computations.

comment:16 Changed 4 years ago by matt (mattab)

  • Summary changed from Add hook to launch multiple cron-based scripts (plugin-defined scheduled tasks) to Plugin API for Scheduled Tasks

comment:17 Changed 4 years ago by matt (mattab)

  • Resolution set to fixed
  • Status changed from new to closed

(In [2648]) Fixes #1184 Great patch by Julien Moumne to add Scheduled Task API in Piwik

  • possibility to schedule daily/weekly/monthly tasks
  • tasks are executed via the crontab script for now (refs #1411 should be updated to trigger the tasks as well)
  • features the first use case: a monthly OPTIMIZE TABLE statement run on all Piwik archive tables (to defragment the space after we run the DELETE statements)
  • Next candidates: PDF reports by email, custom Alerts
  • comes with very serious unit testing

comment:18 Changed 4 years ago by matt (mattab)

(In [2697]) Refs #71

  • Adding PDF plugin, based on the submission from jeremy lavaux and Lyzun Oleksandr.
  • I rewrote nearly all code to comply with Piwik coding styles/guidelines/ etc. and also because it had to use the Metadata which wasn't yet created when the PDF code was initially written
  • Features customizable PDF reports (based on metadata reports), with description + scheduling (daily/weekly/monthly) and send to current user as well as additional emails listed
  • Added helper function Piwik::getCurrentUserEmail()
  • Refactored window modal popover into a helper method piwikHelper.windowModal (used to ask confirmation when deleting stuff) Refs #1490
  • Refactored the Goals CSS into generic CSS which can be reused and is reused for PDF UI

Refs #1184

  • The callback must pass $this instead of the class name as it

TODO

  • test scheduled tasks send email properly (email looks good + attachment works)
  • Add comment header in PDFReports files
  • Can we remove some files in TCPDF which adds a lot of space in the release (eg. some fonts in libs/fonts ?)
  • Test PDF with UTF8 characters

comment:19 Changed 4 years ago by matt (mattab)

(In [2737]) Refs #71

  • Scheduled PDF reports by email work as expected
    • fixed issue with current week used instead of last finished period,
    • fixed issue that all recipients were listed in the same TO: field, now sending one email per address.
    • Super user API methods will return all PDF reports by default, but UI now only displays PDF created by Super User.
  • Refs #1184 Better logging of which task was run and how long it took
  • The API call to run scheduled tasks must also be ported to Powershell refs #1411

comment:20 Changed 4 years ago by Beatgarantie

Is there a possibility to schedule PDF reports without using the crontab?
For me it would be nice to run reports e.g. when somebody logs in, because I do not have the possibility to create crontabs.

comment:21 Changed 4 years ago by matt (mattab)

Beatgarantie, scheduled reports should work without crontab in 0.7. Requests to the Tracker will trigger scheduled tasks hourly. See #587 - let me know if it works for you

comment:22 Changed 4 years ago by Beatgarantie

@matt: OK, I will test.

It would be nice to see the PDF template after switching to another tracked page via the website dropdown.