Allow visits, pages and conversions (orders, cart updates) to be recorded for days in the past #2584

Closed
mattab opened this issue Jul 22, 2011 · 6 comments
Labels: Enhancement

Comments

@mattab
Member

mattab commented Jul 22, 2011

We will sometimes have to push conversion data for days in the past.

Currently, such conversions are tracked, but the reports for those past days are not updated the next time they are requested.

When conversions are recorded for a past day, we should set a flag that forces that day's report to be refreshed the next time it is requested, or the next time archiving runs.

@mattab
Member Author

mattab commented Jul 22, 2011

This function would also work for "updating" orders in previous days (which would invalidate past reports).

@mattab
Member Author

mattab commented Sep 7, 2011

See also the duplicate #2328

@mattab
Member Author

mattab commented Nov 18, 2011

  • If visitors are recorded before the creation date of the website, should we record them in the past? If so, then we should also update the "website.ts_created" field when this happens (note: make it efficient).
  • When registering orders with revenue=0, what happens? SQL must ignore these when archiving?
  • See #1052 (Display the time that last Archive ran: "Reports were generated X seconds|minutes|hours ago"); we should implement this. When cron is not set up, the "delete invalid date" logic should be triggered from the browser-triggered archives. Otherwise, when cron is running,
    • only cron should trigger delete of archives, and
    • only if it re-processes the archives just after the deletion of the outdated ones (prevent the "no data" bug!)
    • only if "delete old logs" was not run for this date (should keep a counter of max deleted log date and 'refuse' to invalidate old data? Output warning message in log output?)
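
The rules in the last bullet can be read as a single yes/no decision. Below is a minimal Python sketch of that decision, not Piwik code: the function name, its parameters and the "browser"/"cron" trigger labels are all hypothetical, and applying the "delete old logs" check only in cron mode simply follows the way the bullets are nested.

```python
from datetime import date
from typing import Optional


def may_delete_archives(
    trigger: str,                          # "browser" or "cron" (hypothetical labels)
    cron_enabled: bool,                    # is the archive.php cron set up?
    reprocesses_immediately: bool,         # will archives be rebuilt right after deletion?
    report_date: date,                     # day whose archives would be deleted
    max_deleted_log_date: Optional[date],  # newest day whose logs were purged, if any
) -> bool:
    """Return True if outdated archives for report_date may be deleted."""
    if not cron_enabled:
        # Without cron, browser-triggered archiving is the only place
        # where the deletion can happen.
        return trigger == "browser"
    if trigger != "cron":
        # With cron running, only cron may delete archives.
        return False
    if not reprocesses_immediately:
        # Avoid the "no data" bug: never delete without rebuilding right away.
        return False
    if max_deleted_log_date is not None and report_date <= max_deleted_log_date:
        # Logs for this day were already purged, so the report
        # could not be rebuilt from them.
        return False
    return True
```

The last check is where a warning could be written to the log output instead of silently refusing.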

@mattab
Member Author

mattab commented Mar 6, 2012

Why this ticket?

The log import script (#703, AWStats/Urchin alternative script) will push server log data to Piwik for days in the past. Users will sometimes replay logs from the last 3 months at once, or in several goes, processing dates and websites in random order.

For example:

  • Import logs for 2012 March for sites 1-1000.
  • Run archive.php.
  • Data shows only since the day of install, but at least some data is shown.
  • User tries to import logs from 2011 Aug to 2012 Feb.
  • Run archive.php. The calendar does not show dates before March??
  • User finds out about updating ts_created in the piwik_site table.
  • Calendar now allows selecting older dates, but old data does not show??

The goal is to accommodate this use case in a user-friendly manner: Piwik should transparently force reprocessing for the websites/days/weeks/months/years where new data was inserted.

Implementation

  • This feature will work ONLY if the archive.php cron is set up
    • we implement it in archive.php for efficiency / decoupling. Doing the logic on each Piwik_Archive::build or similar would be too slow for a feature that will only be used by power users
  • Task implementation
    • Assumptions
    • idvisit is growing in log_visit
    • we only invalidate days that have NEW visits. If the changes are "updating" one or several visitors (or orders, pageviews, etc) without adding new visits, it will not cause a re-processing. It should be fine since the goal is to deal with newly pushed data from logs, which will always create new visitors.
    • Logic
    • at the end of archive.php run (scheduled task?), keep track of the MAX(idvisit) last processed
    • Next run: look for (idsite, date) pairs from past dates that received new visits, for example with 4000000 as the last processed idvisit: SELECT COUNT(idvisit), DATE(visit_last_action_time) AS date, idsite FROM piwik_log_visit WHERE idvisit > 4000000 GROUP BY idsite, date
      • This query will be potentially slow. Allow for a config setting to disable this feature completely for users who don't use it and don't want the added performance hit (e.g. it takes 20s to run on 1.5M visits on the demo)
      • Cache in DB rather than in memory (when running concurrent archive.php)
    • Delete archived reports...
    • Wait until a website is being processed, to delete the old reports -- so that a user consulting stats can access them until the last second that the re-process starts
    • Delete all matching reports from archive_XX tables for this site and all dates that are invalidated for this site.
      • Delete matched days, and weeks/month/year containing these deleted days
      • The function to delete archives for a given site/date should be refactored into a private API so that it can be reused for Piwik lightweight mode: refs #3882 Use table instead of tableGoals on Visits-Days to Conversion reports #53
      • If using the feature "Delete logs older than N days", we should only delete reports for dates that are more recent than N days
      • Write a WARNING in the archive.php output when data older than N days is recorded. Ask the user to increase the "Delete logs older than N days" setting to proceed without losing their data.
    • Force re-processing of these old reports...
    • Update site.ts_created to earliest date now known
    • Set the proper values to ensure these dates are triggered when archiving old data (i.e. set last52 or more in the API calls)
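
As a rough illustration of the "Logic" step above, here is a small Python sketch that runs an equivalent query against an in-memory SQLite table. It is only a sketch under assumptions: the real log table lives in MySQL and has many more columns, the helper names (make_demo_db, find_days_with_new_visits) are made up for this example, and the filter on past dates is added here to match the "from past date" wording.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny stand-in for piwik_log_visit (only the columns used here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE piwik_log_visit ("
        " idvisit INTEGER PRIMARY KEY,"
        " idsite INTEGER,"
        " visit_last_action_time TEXT)"
    )
    conn.executemany(
        "INSERT INTO piwik_log_visit VALUES (?, ?, ?)",
        [
            (1, 1, "2012-03-10 09:00:00"),  # already seen by the previous run
            (2, 1, "2011-08-15 12:00:00"),  # imported later, for a past day
            (3, 1, "2011-08-15 13:30:00"),
            (4, 2, "2012-02-01 08:00:00"),
            (5, 1, "2012-03-12 10:00:00"),  # today's visit, nothing to invalidate
        ],
    )
    return conn


def find_days_with_new_visits(conn, last_idvisit, today):
    """Return (idsite, day, new_visit_count) for past days with new visits."""
    return conn.execute(
        """
        SELECT idsite, DATE(visit_last_action_time), COUNT(idvisit)
        FROM piwik_log_visit
        WHERE idvisit > ?
          AND DATE(visit_last_action_time) < ?
        GROUP BY idsite, DATE(visit_last_action_time)
        ORDER BY idsite, DATE(visit_last_action_time)
        """,
        (last_idvisit, today),
    ).fetchall()


def max_idvisit(conn):
    """Value to remember at the end of the run, for the next run's WHERE clause."""
    return conn.execute("SELECT MAX(idvisit) FROM piwik_log_visit").fetchone()[0]


conn = make_demo_db()
invalidated = find_days_with_new_visits(conn, last_idvisit=1, today="2012-03-12")
for idsite, day, count in invalidated:
    print(f"site {idsite}: {count} new visit(s) on {day}")
```

Each returned (idsite, day) pair is a candidate for deleting and re-processing the day and the week/month/year that contain it. The value of max_idvisit would be stored in the DB rather than in memory, as noted above for concurrent archive.php runs.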

This should work!

@mattab
Member Author

mattab commented Mar 13, 2012

(In [6039]) Fixes #2584

  • Implemented instead as a public API that can invalidate any report for any day and any list of idsites
  • the API will be called by the Log Import script
  • the archive.php next run will process these dates in priority
  • ts_created is updated for the websites so that the older dates are selectable in the calendar
  • Handles the case where "Delete logs older than N days" is enabled: only invalidate reports that are more recent than N days (for which we are likely to still have logs)
  • Added integration test that
    • first calls the reports, old data not displayed
    • then calls the API invalidating reports for newer dates
    • then calls the API again, now with data!
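
To make the behaviour listed above more concrete, here is a hedged Python sketch of how such an invalidation call might plan its work. It is not the code from [6039]: the function names are hypothetical, weeks are assumed to start on Monday, "more recent than N days" is read as "on or after today minus N days", and taking the earliest kept date as the new ts_created is also an assumption.

```python
from datetime import date, timedelta
from typing import Optional


def periods_containing(day: date) -> set:
    """Day, week, month and year periods that contain the given day."""
    week_start = day - timedelta(days=day.weekday())  # assumes weeks start on Monday
    return {
        ("day", day),
        ("week", week_start),
        ("month", day.replace(day=1)),
        ("year", day.replace(month=1, day=1)),
    }


def plan_invalidation(dates, today: date, delete_logs_older_than_days: Optional[int] = None):
    """Split requested dates into kept/skipped and list the periods to re-process."""
    if delete_logs_older_than_days is None:
        kept, skipped = sorted(dates), []
    else:
        cutoff = today - timedelta(days=delete_logs_older_than_days)
        kept = sorted(d for d in dates if d >= cutoff)
        skipped = sorted(d for d in dates if d < cutoff)  # logs are likely purged
    periods = set()
    for d in kept:
        periods |= periods_containing(d)
    # Earliest kept day: candidate for the site's new ts_created, so that
    # the calendar lets users select it.
    earliest = kept[0] if kept else None
    return kept, skipped, periods, earliest


kept, skipped, periods, earliest = plan_invalidation(
    [date(2011, 8, 15), date(2012, 2, 1)],
    today=date(2012, 3, 13),
    delete_logs_older_than_days=90,
)
print("invalidate:", kept, "skipped:", skipped, "earliest:", earliest)
```

Skipped dates are the ones for which a warning would make sense, since their logs have probably been deleted and the reports could not be rebuilt.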

@mattab
Member Author

mattab commented Mar 13, 2012

(In [6040]) remove debug Refs #2584

This issue was closed.