Ticket #5 (new Bug)
Tables log_ are not purged after archiving is done
| Reported by: | matt | Owned by: | |
|---|---|---|---|
| Priority: | critical | Milestone: | Features requests - after Piwik 1.0 |
| Component: | Core | Keywords: | |
| Cc: | Sensitive: | no |
Description (last modified by matt) (diff)
The information contained in the tables _log_* should be purged automatically and regularly. Keeping all time logs in a single table significantly slows down the stats logging process (mysql having to rebuild indices, selecting from this million rows table is time consuming, maintenance is hard, etc.).
The goal of this task is to provide automatic purge of the tracking logs, every day or every month and with optional backup in a yearly table (customized with UI settings).
Processing uniques over weeks / months without using logs In #409, we will implement a cookie store mechanism based on a mysql lookup table. This table will contain enough data (idvisitor, ip, idsite, date first visit, date last visit) to process unique visitors over a week or a month
Log purge execution The purge task would be triggered during the 'Maintenance process' (see #1184), and once a day maximum it would execute and try to purge logs for the day (or month) before.
- read settings (purge every day? every month?)
- ensure all archives have been processed for the day archives contained in the log tables. Weeks/months are not necessary to be processed as they are the sum of daily archives (and uniques can be processed using the cookie store rather than keeping the full visitor logs for the period)
- purge tracker for all websites at once
- backs up the logs into a yearly archive table
- INSERT INTO log_X_backups_2010 SELECT * FROM log_X
- one backup per purged table (log_visit, log_conversion, etc.)
- record the auto increment values (result of SELECT MAX() from log_X)
- purge logs (TRUNCATE TABLE)
- inserts last auto increment in each table
New Super User admin settings
- Purge traffic logs and process reports at least every day/month
- Backup traffic logs into a backup table containing all logs for the year (table is piwik_log_visit_backup2010)?
- Could also add admin setting for 'Keep visitor cookies for 1 year' (from #409)
A few interesting resources
- How to write efficient archiving and purging jobs in SQL http://www.xaprb.com/blog/2006/05/02/how-to-write-efficient-archiving-and-purging-jobs-in-sql/
- MK-archiver tool http://www.maatkit.org/doc/mk-archiver.htm
