Opened 3 years ago

Closed 2 years ago

Last modified 2 years ago

#2584 closed New feature (fixed)

Allow visits, pages and conversions (orders, cart updates) to be recorded for days in the past

Reported by: matt
Owned by: matt
Priority: normal
Milestone: 1.12.x - Piwik 1.12.x
Component: Core
Keywords:
Cc:
Sensitive: no

Description

We will sometimes have to push conversion data for days in the past.

Currently, such conversions are tracked, but the reports for those past days are not updated the next time they are requested.

When such conversions are recorded for a day in the past, we should set a flag that forces the affected report to be refreshed the next time it is requested, or the next time archiving runs.

Change History (10)

comment:1 Changed 3 years ago by matt (mattab)

This function would also work for "updating" orders in previous days (which would invalidate past reports)

comment:2 Changed 3 years ago by matt (mattab)

See also duplicate #2328

comment:3 Changed 3 years ago by matt (mattab)

  • Summary changed from Allow conversions (orders, cart updates) to be recorded for days in the past to Allow visits, pages and conversions (orders, cart updates) to be recorded for days in the past

comment:4 Changed 3 years ago by matt (mattab)

  • Milestone changed from 1.7 Piwik 1.7 to 1.6.x Piwik 1.6.x
  • Owner set to matt

comment:5 Changed 2 years ago by matt (mattab)

  • Priority changed from normal to critical

comment:6 Changed 2 years ago by matt (mattab)

  • If visitors are recorded before the creation date of the website, should we record them in the past? If so, then we should also update the "website.ts_created" field when this happens (note: make it efficient).
  • When registering orders with revenue=0, what happens? Must the SQL ignore these when archiving?
  • See #1052 - we should implement this. When cron is not set up, the "delete invalid date" logic should be triggered from the browser-triggered archiving. Otherwise, when cron is running:
    • only cron should trigger deletion of archives, and
    • only if it re-processes the archives just after the deletion of the outdated ones (prevent the "no data" bug!)
    • only if "delete old logs" was not run for this date (should keep a counter of max deleted log date and 'refuse' to invalidate old data? Output warning message in log output?)

comment:7 Changed 2 years ago by matt (mattab)

Why this ticket?

The AWStats/Urchin alternative script (#703) will push server log data to Piwik for days in the past. Sometimes users will replay logs from the last 3 months at once, or in several goes, processing dates and websites in random order.

For example:

  • Import logs for March 2012 for sites 1-1000.
  • Run archive.php
  • Data shows only since the day of install, but at least some data is shown
  • User tries to import logs from 2011 Aug to 2012 Feb
  • Run archive.php. The calendar does not show dates before March??
  • User finds out about updating ts_created in piwik_site table
  • Calendar now allows selecting older dates, but old data does not show??

The goal is to accommodate this use case in a user-friendly manner: Piwik should transparently force reprocessing for the websites/days/weeks/months/years where new data was inserted.

Implementation

  • This feature will work ONLY if the archive.php cron is set up
    • We implement it in archive.php for efficiency / decoupling. Doing the logic on each Piwik_Archive::build or similar would be too slow for a feature that will only be used by power users
  • Task implementation
    • Assumptions
      • idvisit only grows in log_visit (new visits always get a higher idvisit)
      • We only invalidate days that have NEW visits. If the changes only "update" one or several visitors (or orders, pageviews, etc.) without adding new visits, they will not cause re-processing. This should be fine, since the goal is to deal with newly pushed data from logs, which will always create new visitors.
    • Logic
      • At the end of each archive.php run (scheduled task?), keep track of the last processed MAX(idvisit)
      • Next run: look for (idsite, date) pairs that fall on past dates, where 4000000 stands for the stored MAX(idvisit): SELECT COUNT(idvisit), DATE(visit_last_action_time) AS date, idsite FROM piwik_log_visit WHERE idvisit > 4000000 GROUP BY idsite, date
        • This query will be potentially slow. Allow for a config setting to disable this feature completely for users who don't use it and don't want the added performance hit (i.e. it takes 20s to run on 1.5M visits on demo)
        • Cache the result in the DB rather than in memory (in case several archive.php processes run concurrently)
      • Delete archived reports...
        • Wait until a website is being processed before deleting its old reports, so that a user consulting stats can access them until the moment re-processing starts
        • Delete all matching reports from archive_XX tables for this site and all dates that are invalidated for this site.
          • Delete matched days, and weeks/month/year containing these deleted days
          • The function to delete archives for a given site/date should be refactored into a private API so that it can be reused for Piwik lightweight mode: #53
          • If using the feature "Delete logs older than N days", we should only delete reports for dates that are more recent than N days
            • Write a WARNING in the archive.php output when data older than N days is recorded. Ask the user to increase the "Delete logs older than N days" value in Settings to proceed and not lose their data.
      • Force re-processing of these old reports...
        • Update site.ts_created to earliest date now known
        • Set the proper values to ensure these dates are triggered when archiving old data (i.e. pass last52 or more to the API calls)

This should work!
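The detection step and the "days, and weeks/month/year containing these days" rule from the plan above can be sketched as follows. This is only an illustration in Python with an in-memory-style SQLite connection, not Piwik code: the table name log_visit without prefix, the function names find_days_with_new_visits and periods_containing, and the Monday-based week are all assumptions made for the example.

```python
import calendar
import sqlite3
from datetime import date, timedelta


def find_days_with_new_visits(conn, last_max_idvisit):
    """Return ({(idsite, day)}, new_max_idvisit) for visits added since the last run.

    Hypothetical helper: `conn` is a sqlite3 connection to a DB that has a
    log_visit(idvisit, idsite, visit_last_action_time) table.
    """
    rows = conn.execute(
        "SELECT idsite, DATE(visit_last_action_time) AS day, MAX(idvisit) "
        "FROM log_visit WHERE idvisit > ? GROUP BY idsite, day",
        (last_max_idvisit,),
    ).fetchall()
    days = {(idsite, date.fromisoformat(day)) for idsite, day, _ in rows}
    # Keep the old high-water mark when no new visit was found.
    new_max = max((row[2] for row in rows), default=last_max_idvisit)
    return days, new_max


def periods_containing(day):
    """Return the (label, first_day, last_day) periods whose reports include `day`."""
    # Assumption: weeks run Monday to Sunday.
    week_start = day - timedelta(days=day.weekday())
    month_end = day.replace(day=calendar.monthrange(day.year, day.month)[1])
    return [
        ("day", day, day),
        ("week", week_start, week_start + timedelta(days=6)),
        ("month", day.replace(day=1), month_end),
        ("year", date(day.year, 1, 1), date(day.year, 12, 31)),
    ]
```

In such a design, the returned high-water mark would be persisted at the end of the run, and every period returned by periods_containing for each (idsite, day) pair would have its archived reports deleted just before that site is re-processed.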

comment:8 Changed 2 years ago by matt (mattab)

  • Resolution set to fixed
  • Status changed from new to closed

(In [6039]) Fixes #2584

  • Implemented instead as a public API, that can invalidate any report from any day or list of idsites
  • the API will be called by the Log Import script
  • the next archive.php run will process these dates first
  • the ts_created is updated in the websites to make sure calendar is selectable
  • Handles the case where "Delete logs older than N days" is enabled: only invalidate reports that are more recent than N days (for which we are likely to still have logs)
  • Added integration test that
    • first calls the reports, old data not displayed
    • then calls the API invalidating reports for newer dates
    • then calls the API again, now with data!
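
The retention check and the ts_created update listed above could look roughly like the sketch below. It is not the code from [6039]; the function names split_by_log_retention and earliest_ts_created are hypothetical, and treating a date exactly N days old as still valid is an assumption about the boundary.

```python
from datetime import date, timedelta


def split_by_log_retention(dates, today, delete_logs_older_than_days=None):
    """Split requested dates into (to_invalidate, skipped).

    When "Delete logs older than N days" is enabled, dates older than N days
    are skipped because their logs are likely gone and the reports could not
    be rebuilt. Passing None means the setting is disabled.
    """
    if delete_logs_older_than_days is None:
        return sorted(dates), []
    cutoff = today - timedelta(days=delete_logs_older_than_days)
    # Assumption: a date exactly N days old is still eligible.
    to_invalidate = sorted(d for d in dates if d >= cutoff)
    skipped = sorted(d for d in dates if d < cutoff)
    return to_invalidate, skipped


def earliest_ts_created(current_ts_created, invalidated_dates):
    """Return the creation date to store so the calendar can select the invalidated dates."""
    return min([current_ts_created, *invalidated_dates])
```

A caller would typically log a warning for every date in the skipped list, so that the user knows to raise the retention setting before importing older logs.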

comment:9 Changed 2 years ago by matt (mattab)

(In [6040]) remove debug Refs #2584

comment:10 Changed 2 years ago by matt (mattab)

  • Priority changed from critical to normal