Ticket #2328 (closed New feature: duplicate)
When visits or pages are recorded in the past via the Tracking API, reports are not re-created/deleted
| Reported by: | matt | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | 1.6 Piwik 1.6 |
| Component: | Core | Keywords: | |
| Cc: | | Sensitive: | no |
Description
The Tracking API is used more and more. A new use case has come up: when the function setForceVisitDateTime() is used to record a visit at a date in the past, and Piwik reports have already been processed for this date (and for the week/month/year containing this date), the reports are never re-processed. This causes discrepancies.
The main use cases for this problem are:
- Log import use case #703
- PayPal IPN Tracking use case, recording a conversion that happened a few hours ago #2222
- or any kind of "after the fact" tracking
We need a mechanism so that visits/pages inserted in the past 'flush' the affected past reports, which will then be re-processed at the next archiving run.
The challenge is to make this efficient.
A proposal for this:
- in the Tracker, if the request being tracked is in the past (before today's midnight), then some existing reports become out of date
- when this is the case, we store in the Option table all the unique "days" that have been tracked in the past
- Now, when a report is requested via the API, the API would check whether any dates in the past were recently tracked
- if there are such dates, loop over them and execute the query:
DELETE FROM archive WHERE date1 <= '$date' AND '$date' <= date2
on all archive tables.
- this should be optimized to only run on the archive tables that may contain such records (it is not good to loop over ALL archive tables for ALL dates to delete)
- it might perform better to run only one query per archive table, deleting records for all these dates in the past
- once done, reset the flag. If deleting one day at a time, delete the flag for each day once it is done.
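The proposal above can be sketched roughly as follows. This is only an illustration in Python with an in-memory SQLite table, not Piwik code: the `archive` table layout, the helper names `is_before_today` and `invalidate_days`, and the in-memory `pending` set (standing in for the value stored in the Option table) are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime, time

def is_before_today(visit_time, now):
    """True when the tracked request falls before today's midnight."""
    return visit_time < datetime.combine(now.date(), time.min)

def invalidate_days(conn, table, days):
    """Delete, in ONE query, every archive row whose [date1, date2]
    range contains one of `days`. Returns the number of rows deleted."""
    days = sorted(days)
    if not days:
        return 0
    # One "(date1 <= day AND day <= date2)" condition per pending day.
    clause = " OR ".join("(date1 <= ? AND ? <= date2)" for _ in days)
    params = [d.isoformat() for d in days for _ in range(2)]
    # `table` is assumed to be a trusted, internally generated name.
    cur = conn.execute(f"DELETE FROM {table} WHERE {clause}", params)
    return cur.rowcount

# Hypothetical archive table holding day, week and month reports.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE archive (name TEXT, date1 TEXT, date2 TEXT)")
conn.executemany("INSERT INTO archive VALUES (?, ?, ?)", [
    ("day",       "2011-04-10", "2011-04-10"),
    ("week",      "2011-04-04", "2011-04-10"),
    ("month",     "2011-04-01", "2011-04-30"),
    ("other_day", "2011-04-12", "2011-04-12"),
    ("may",       "2011-05-01", "2011-05-31"),
])

now = datetime(2011, 4, 15, 9, 30)
pending = set()  # stands in for the unique "days" kept in the Option table

# Tracker side: remember each unique day that was tracked in the past.
for visit in (datetime(2011, 4, 10, 23, 0), datetime(2011, 4, 15, 8, 0)):
    if is_before_today(visit, now):
        pending.add(visit.date())

# API side: flush the affected reports, then reset the flag.
deleted = invalidate_days(conn, "archive", pending)
pending.clear()
remaining = [r[0] for r in conn.execute("SELECT name FROM archive ORDER BY name")]
```

Here the visit backdated to 2011-04-10 removes the day, week and month reports covering that date, while the visit recorded today leaves everything untouched. Restricting the call to the archive tables that can actually contain the pending days is left out of the sketch.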
Note
- it is important that Piwik works when the Apache log import script is running while archiving is also running (be careful when handling the 'dates' flag: do not lose the information being saved by the Tracker while importing logs, while Archiving is reading/updating the flag as well)
I think this will work without adding much overhead?
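One possible way to avoid losing days written by the Tracker while Archiving is working on the flag is to keep one row per pending day and to remove only the days that were actually read. The sketch below assumes a hypothetical `pending_day` table (instead of a single serialized value in the Option table) and hypothetical helper names. It is an illustration, not the implementation.

```python
import sqlite3

def mark_day(conn, day):
    """Tracker side: flag a day as needing re-processing (idempotent)."""
    conn.execute("INSERT OR IGNORE INTO pending_day (day) VALUES (?)", (day,))

def read_days(conn):
    """Archiving side: snapshot the days currently flagged."""
    return [r[0] for r in conn.execute("SELECT day FROM pending_day ORDER BY day")]

def clear_days(conn, days):
    """Archiving side: clear ONLY the days from the snapshot, so days
    flagged by the Tracker in the meantime are kept for the next run."""
    conn.executemany("DELETE FROM pending_day WHERE day = ?", [(d,) for d in days])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pending_day (day TEXT PRIMARY KEY)")

mark_day(conn, "2011-04-10")      # log import tracks a past visit
snapshot = read_days(conn)        # archiving starts working on this day
mark_day(conn, "2011-04-12")      # import keeps running during archiving
clear_days(conn, snapshot)        # archiving finishes and clears its days

still_pending = read_days(conn)   # the day added meanwhile is still flagged
```

This does not cover a new visit arriving for a day that is already in the snapshot. Handling that case would need something more, for example a per-row counter or timestamp that is compared before the row is cleared.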
