Opened 2 years ago

Closed 13 months ago

Last modified 11 months ago

#3097 closed Bug (fixed)

Make Visitor Log load very fast even with hundreds of thousands of visitors, or if a visitor views thousands of pages

Reported by: matt Owned by:
Priority: critical Milestone: 1.12 - The Great 1.x Backlog
Component: Performance Keywords:
Cc: Sensitive: no

Description

On a test Piwik, select year period, then visitor log, loading the page is really slow and I suppose could even crash Mysql in some cases.

When looking at year or month or day it should be as fast in all cases, since we select the last N visitors for the last day of the period.

I thought this was fixed already, but it isn't, we should run a clever SQL that is fast just like when looking at one day data.

Change History (8)

comment:1 Changed 16 months ago by matt (mattab)

  • Component changed from Core to Performance
  • Milestone changed from 1.9.x - Piwik 1.9.x to Feature requests
  • Priority changed from major to normal

comment:2 Changed 16 months ago by EZdesign (BeezyT)

As far as I can see, the problem is as follows:

Take a look at Piwik_Live_API::loadLastVisitorDetailsFromDatabase().
It only does a LIMIT if $filter_limit is set, but there is none.

It loads all visits in the period and the data table widget does pagination.
It's not hard to see why this fails for many visits.

If you do echo $subQuery['sql'];, you get:

SELECT log_visit.* FROM piwik_log_visit AS log_visit WHERE log_visit.idsite = ? AND log_visit.visit_last_action_time >= ? AND log_visit.visit_last_action_time <= ? ORDER BY idsite, visit_last_action_time DESC

It selects everything in the range (which can be millions of rows), sorts it, ships it to PHP and builds a data table. The controller method getVisitorLog does $view->setSortedColumn('idVisit', 'ASC');, which might even sort it again in PHP before applying pagination. Obviously, that won't work for millions of rows in the data table.

What we would have to do is this:

  • Use LIMIT in the SQL query
  • Use calc_found_rows in SQL and add new methods to the data table to generate the pagination without having the actual data
  • Maybe we should avoid sorting on high traffic websites and rely on the ording of the table. Not sure whether this is possible because a GROUP BY is applied to the select query quoted above and AFAIK there is no specified ordering after a GROUP BY

comment:3 Changed 16 months ago by matt (mattab)

I would propose instead to keep this SQL and call it, for 1 day at a time until there are enough visitors for the display? that would be simpler I think...

comment:4 Changed 16 months ago by EZdesign (BeezyT)

One day still has an unlimited number of rows. Doing it for one day at a time is an improvement but still not a fix for very high traffic websites.

comment:5 Changed 13 months ago by EZdesign (BeezyT)

  • Resolution set to fixed
  • Status changed from new to closed

The problem was not as described here. Nevertheless, it should be fixed in [3ac47e89eb7cc6024cffad435e9a6ef2a29488ff]

comment:6 Changed 13 months ago by matt (mattab)

Great work on this one!

comment:7 Changed 13 months ago by matt (mattab)

  • Milestone changed from Feature requests to 1.12 - The Great 1.x Backlog
  • Priority changed from normal to major
  • Summary changed from Loading Visitor Log is slow when the period contains dozens of thousands of visits to Visitor Log should load very fast even if there are thousands of visitors, or if a visitor visits thousands of pages

comment:8 Changed 11 months ago by matt (mattab)

  • Priority changed from major to critical
  • Summary changed from Visitor Log should load very fast even if there are thousands of visitors, or if a visitor visits thousands of pages to Make Visitor Log load very fast even with hundreds of thousands of visitors, or if a visitor views thousands of pages
Note: See TracTickets for help on using tickets.