Opened 4 years ago

Closed 2 years ago

Last modified 2 years ago

#1077 closed Bug (fixed)

"All Websites dashboard" should load fast even when tracking hundreds of websites! MultiSites

Reported by: matt Owned by:
Priority: critical Milestone: 1.7 Piwik 1.7
Component: Core Keywords:
Cc: Sensitive: no

Description (last modified by matt)

Currently, the multi sites dashboard page render takes 6 API requests (ie. 6 mysql SELECTs) for each website displayed. If a Piwik install has 1000 registered websites, the page load will simply fail, because the php page would try and execute 6,000 queries which is not a good idea.

Instead, only one request should be made to select the 6 values necessary for each website. This query would therefore select 6,000 values at once which would be as fast as we can get with that many values.

This query would join the site, access and archive_* tables. See Piwik_Access::reloadAccess() and core/Archive/Array/IndexedBySite.php for the current behavior. I think the solution might be to have a new class in core/Archive/Array indexed by user?

Attachments (4)

issue1077.diff.tar.gz (15.0 KB) - added by capedfuzz 3 years ago.
Patch for this issue.
issue1077.diff (14.1 KB) - added by capedfuzz 2 years ago.
Patch for issue #1077. Optimizes the IndexedBySite archive type. Speeds up All Websites page by 12 seconds when Piwik tracks 800 websites.
test_all_sites_time.coffee (664 bytes) - added by capedfuzz 2 years ago.
A quick benchmark used to test how long it takes to retrieve the All Websites HTML. Written in CoffeeScript & uses node.js.
issue1077_assorted_speedups.diff (7.2 KB) - added by capedfuzz 2 years ago.
Patch for issue #1077. Several smaller optimizations included. Made for rev 5492.

Download all attachments as: .zip

Change History (45)

comment:1 Changed 4 years ago by wayo121

Hi, is there any update on this?
One of the main reasons I installed Piwik was to use the multisite feature (it worked great when I had 10 sites). However I now have 500+ sites and this feature is unusable. If I try to run it, it times out of after a few minutes. Cheers.

comment:2 Changed 4 years ago by vnese

+1 vote for this issue

comment:3 Changed 4 years ago by dbellenger

+1 vote for this issue

comment:4 Changed 4 years ago by matt (mattab)

Other bug report:

I am tracking 22 sites with Piwik. Two have titles that start with A (we'll call them A1 and A2), and the other 20 have titles starting with C-Z.

The main MultiSites screen is a bit funky. The initial view shows sites C-Z, in descending alphabetical order. Clicking on Next shows A1 and A2, in alpha order. These should be shown at the top of the other screen, while the last 2 should be on this second screen.

Also, the text at the bottom reads "1-21 of 22", when in fact it should read "1-20 of 22"

comment:5 Changed 4 years ago by mauser (zawadzinski)

[1872] fixes sorting and other small UI issues

comment:6 Changed 4 years ago by vipsoft (robocoder)

(In [1874]) refs #1077 - re-apply fix from [1810] where site name contains quotes

comment:7 Changed 4 years ago by matt (mattab)

  • Milestone changed from 1 - Piwik 0.5.5 to 1 - Piwik 0.5.6

comment:8 Changed 4 years ago by matt (mattab)

  • Description modified (diff)

comment:9 Changed 4 years ago by matt (mattab)

  • Priority changed from major to critical

Doable in one query, we can go from 1,000 websites= 6,000 SELECT to one SELECT which would greatly improve performance. This is one of the last performance blockers for 1.0 with #409

comment:10 Changed 4 years ago by vipsoft (robocoder)

  • Summary changed from MultiSites plugin: improve performance to make it work on Piwik with hundreds of websites to MultiSites: improve performance to make it work on Piwik with hundreds of websites

See also #1315

comment:11 Changed 3 years ago by matt (mattab)

Also it would be useful to have a global.ini.php setting controlling how many websites to show in the UI by default (so Super Users can decide to show 1000 if they want to)

comment:12 Changed 3 years ago by matt (mattab)

  • Milestone changed from 1.1 - Piwik 1.1 to Piwik 1.x

comment:13 Changed 3 years ago by matt (mattab)

  • for consistency, we could add a link to download the 'All websites' data in xml/excel/etc. at the bottom of the dashboard

comment:14 Changed 3 years ago by matt (mattab)

Feature requests from forum post

  • Show more than 20 websites in the page (could be a config file setting)
  • Show Goal conversions in a new column

comment:15 Changed 3 years ago by matt (mattab)

  • Type changed from Bug to Task

comment:16 Changed 3 years ago by matt (mattab)

  • Summary changed from MultiSites: improve performance to make it work on Piwik with hundreds of websites to MultiSites: All Websites dashboard slow when hundreds of websites

comment:17 Changed 3 years ago by matt (mattab)

  • Owner mauser deleted

comment:18 Changed 3 years ago by capedfuzz (diosmosis)

Hello all, I'm interested in fixing this bug. I think I've worked out a way to fix it, but I'm not completely familiar with the code base, so I wanted to run my idea by you guys before I created a patch.

I looked at the queries being made when the 'All Websites' page is loaded and found that the large number of queries was due to the Array Archive types looping over an array of other Archives. I thought a possible solution would be a new archive type (called Piwik_Archive_Cached) that cached the archive IDs that result when prepareArchive() is called in a BLOB archive record. This would reduce the number of queries made from two per site per API call to two queries per api call, but it wouldn't make the page any faster when archiving is triggered through the browser.

Does this sound like an acceptable approach?

comment:19 Changed 3 years ago by matt (mattab)

capedfuzz, in your proposal, will you run a query such as WHERE idarchive IN (.........) ?

This could be a problem, when there are for example, 5,000 websites, the WHERE idarchive IN ... will be rather slow. But, it is worth testing anyway such patch, as it might still improve performance?

In any case, it is normal that the page will work nicely only when automatic archiving is setup (ie. browser archiving is disabled), so this limitation is definitely OK.

looking forward to seeing your patch or ideas about this!

comment:20 Changed 3 years ago by capedfuzz (diosmosis)

I planned on using IN (...), but I might be able to optimize the query... I'll create a prototype and see what happens.

comment:21 Changed 3 years ago by capedfuzz (diosmosis)

A quick update, wanted to let you know I'm still working on this. I had to go a different route than what I proposed: instead of storing archive IDs I select all the IDs in one go. From my simple tests (which didn't include loading sparklines), this sped up the page from ~46 secs to ~17 secs when loading data for 800 websites.

I still have a fair amount of work to do, but hopefully I'll have a patch posted soon.

comment:22 Changed 3 years ago by capedfuzz (diosmosis)

Actually, sorry, that's ~24 to 17secs. Forgot to disable archiving when testing the normal version. So not quite as monumental a speed up :).

comment:23 Changed 3 years ago by matt (mattab)

capedfuzz, any update on your patch? Any improvement is definitely welcome for the next release :) thx

comment:24 Changed 3 years ago by capedfuzz (diosmosis)

Yes, I've got an update. I ended up adding some more improvements and got the page to load in < 4 secs for a sufficiently small amount of sites per page. However, to do that I had to make it possible to do the sorting/pagination logic server-side. So the patch is a bit big. I should be able to submit it some time tomorrow (EST), but in the mean time, here's a list of the big changes I made:

0) I've added a new class, Piwik_QueryOptions, which holds extra query info (like what to use in the LIMIT/OFFSET/ORDER BY parts of a query). I added a parameter to the VisitsSummary API's get & the Goals API's get methods, as well as each Archive type's getDataTableFromNumeric method. Currently, its only used by the IndexedBySite type.
1) IndexedBySite won't call prepareArchive unless archiving is enabled. It'll also select archive ids in one query.
2) I added a getTotal method to the VisitSummary API & the Goals API to compute the sum of a set of archived values, and I added a getSum method to every Archive type. Currently, its only implemented by the IndexedBySite archive type (the others throw an exception).

The page will still load slowly when looking for a custom range that initiates archiving.

Changed 3 years ago by capedfuzz (diosmosis)

Patch for this issue.

comment:25 Changed 3 years ago by capedfuzz (diosmosis)

My patch is attached. It will speed things up greatly when there aren't that many sites shown, and speed things up a bit when a lot of sites are shown.

Some notes:

  • The amount of time it takes to load the page is still proportional to the amount of websites Piwik tracks. When there are ~1000 sites it'll display ~50 entries very quickly, but when I increased the number of sites to ~5000, it took a couple seconds longer.
  • The page will still load slowly (if at all) when looking for a custom range that initiates archiving.
  • When sorting, results are sorted by the desired field (ie, nb_visits, nb_actions, etc.), and then by site ID (as opposed to name). This might make the results look a bit weird.

Changed 2 years ago by capedfuzz (diosmosis)

Patch for issue #1077. Optimizes the IndexedBySite archive type. Speeds up All Websites page by 12 seconds when Piwik tracks 800 websites.

comment:26 Changed 2 years ago by capedfuzz (diosmosis)

I've created some optimizations that speed up the All Websites page by ~12 seconds when tracking 800 websites. I had to make changes to the Piwik_ArchiveProcessing class, but I don't have detailed knowledge how this class & its descendants work. Would anyone mind taking a quick look at my changes to make sure there are no potential issues w/ it (I've uploaded a patch)? Thanks in advance.

comment:27 Changed 2 years ago by matt (mattab)

capedfuzz, looks like an excellent patch! I haven't tried applying it, but will definitely test performance before and after you have committed.

Feedback

  • Code looks v. good!
  • testability: it is important to test code coverage of most archiving code since it is core to Piwik. Would you mind adding an integration test testing with idSites=all or a comma separated list? The test would create fake visits for 2 or 3 different websites. then would call a few APIs to trigger archiving. Then, disable browser archiving. Then call idSites=1,2,3 and check the new code path is executed, and code is tested and works as expected. Then, enable browser archiving.
  • To "disable/enable browser archiving" temporarily in your test, you could add a new helper such as:
    Index: core/ArchiveProcessing.php
    ===================================================================
    --- core/ArchiveProcessing.php	(revision 5455)
    +++ core/ArchiveProcessing.php	(working copy)
    @@ -900,15 +900,22 @@
     		return $isDisabled;
     	}
     	
    +	protected $archivingDisabled = false;
     	protected function isRequestAuthorizedToArchive()
     	{
    -		return self::isBrowserTriggerArchivingEnabled()
    -				|| Piwik_Common::isPhpCliMode()
    -				|| (Piwik::isUserIsSuperUser() 
    -					&& Piwik_Common::isArchivePhpTriggered())
    -					;
    +		return !$this->archivingDisabled
    +				&& (self::isBrowserTriggerArchivingEnabled()
    +					|| Piwik_Common::isPhpCliMode()
    +					|| (Piwik::isUserIsSuperUser() 
    +						&& Piwik_Common::isArchivePhpTriggered())
    +					);
     	}
     	
    +	static public function disableArchiving($disable = true)
    +	{
    +		$this->archivingDisabled = $disable;
    +	}
    +	
     	/**
     	 * Returns true when 
     	 * - there is no segment and period is not range 
    
    
  • excellent idea to submit code bit by bit otherwise it is much harder to review

I don't see any potential issues otherwise, please commit when you manage to add the test. if you need any help let me know :)

How long does the page take before and after? what % improvement?

comment:28 Changed 2 years ago by capedfuzz (diosmosis)

(In [5475]) Refs #1077. Improved speed of the IndexedBySite archive type by optimizing the case when launching the archiving process is disabled. Instead of selecting archive IDs one at a time, IndexedBySite will now select them all at once.

Notes:

  • Also added an integration test that will test whether Piwik works when archiving is disabled.
  • Modified getKnownSegmentsToArchive() to cache its result as static function data instead of having ArchiveProcessing cache it as instance data.
  • Changed several instance methods in ArchiveProcessing to static methods. These methods are responsible for determining if archiving is disabled.
  • Removed meaningless idSite related code from TablePartitioning.

Changed 2 years ago by capedfuzz (diosmosis)

A quick benchmark used to test how long it takes to retrieve the All Websites HTML. Written in CoffeeScript & uses node.js.

comment:29 Changed 2 years ago by capedfuzz (diosmosis)

I committed the change & there's a new integration test.

The times I record when I test aren't constant, but the most conservative view of the speed improvement is from 14s to 6s (~57%).

I've also uploaded the script I've used to test the times.

comment:30 follow-up: Changed 2 years ago by matt (mattab)

Very nice commit and ideas. New integration test good.

I see from 17s to 7s improvements on my 600 websites test setup... Very nice!!! Great work.

When I enable profiler and debug logging, I see that now there is this executed for each site:

Executed 565 times in 564.3ms (average = 1ms)
	SELECT * FROM piwik_site WHERE idsite = ?

I guess this could be detected upstream and select * instead, which would probably bring another 10% improvement.

Executed 4 times in 40.2ms (average = 10.1ms)

	SELECT idsite, MAX(idarchive) AS idarchive FROM piwik_archive_numeric_2011_11 WHERE date1 = ? AND date2 = ? AND period = ? AND (name = 'done' ) AND (value = '1' OR value = '3') AND idsite IN (1,3,4,5,6,30,2944,2945,[... hundreds of ids ...]05,3506) GROUP BY idsite

This one is also interesting. Executed 4 times, because of: Piwik_MultiSites_Controller->getSitesInfo() called at [D:\piwik\svn\trunk\plugins\MultiSites\Controller.php:37]

  • we could group these 4 API calls into one (maybe via the new API.get helper?
    • For example, maybe we create a custom API call for the ALl Websites dashboard that selects the few metrics needed. This is more a hack, but that would work well and would be quite easy to implement.

Downside: tied to naming of metrics in VisitsSummary,Goals plugin, but this is fine since these metrics won't change their name anytime soon.

comment:31 Changed 2 years ago by matt (mattab)

(In [5489]) Refs #1077 Pre-load all websites when user is super user only. Saves hundreds of select * from piwik_site where idsite = ?

comment:32 Changed 2 years ago by matt (mattab)

(In [5490]) Refs #1077 Missing code preventing multiple executions

comment:33 in reply to: ↑ 30 Changed 2 years ago by capedfuzz (diosmosis)

Replying to matt:

When I enable profiler and debug logging, I see that now there is this executed for each site:

Didn't know there was a profiler built into Piwik, thanks for pointing it out!

Executed 4 times in 40.2ms (average = 10.1ms)

	SELECT idsite, MAX(idarchive) AS idarchive FROM piwik_archive_numeric_2011_11 WHERE date1 = ? AND date2 = ? AND period = ? AND (name = 'done' ) AND (value = '1' OR value = '3') AND idsite IN (1,3,4,5,6,30,2944,2945,[... hundreds of ids ...]05,3506) GROUP BY idsite

This one is also interesting. Executed 4 times, because of: Piwik_MultiSites_Controller->getSitesInfo() called at [D:\piwik\svn\trunk\plugins\MultiSites\Controller.php:37]

  • we could group these 4 API calls into one (maybe via the new API.get helper?
    • For example, maybe we create a custom API call for the ALl Websites dashboard that selects the few metrics needed. This is more a hack, but that would work well and would be quite easy to implement.

Downside: tied to naming of metrics in VisitsSummary,Goals plugin, but this is fine since these metrics won't change their name anytime soon.

I think you could get rid of these by allowing Piwik_Archive creation to be separate from the querying. I'll upload a patch to show you what I mean. (This might result in a speed up when archiving is enabled, too...)

Though from what I can tell the remaining SQL isn't taking too much time to execute. I'm going to try and whittle the time down a couple more seconds. If I can't I'll post a comment.

Changed 2 years ago by capedfuzz (diosmosis)

Patch for issue #1077. Several smaller optimizations included. Made for rev 5492.

comment:34 Changed 2 years ago by capedfuzz (diosmosis)

I uploaded another patch with a bunch of smaller fixes. This patch speeds All Sites up for me from 5.2s to 2.2s (w/ 800 websites stored). Here's a list of the changes:

  • Modified Piwik_Archive::build so it can accept Piwik_Segment instances. This way there won't be hundreds of segments created when they get overwritten by IndexedBySite anyway. Saves ~.2s
  • Modified the previously committed site caching code. All Sites already selects those sites, so it was a bit redundant. I removed part of it, but I could apply a similar optimization to other parts of Piwik if you let me know where they are. Saves ~2.25s (no idea why so much...)
  • Modified the setMinMaxDateAcrossWebsites function and the Piwik_Date class. The aforementioned function looped through every site and instantiated several Date instances. I modified it to work w/o having to create Date instances. Saves ~.5s

I can do one more optimization that should reduce the time to ~1.5s. When the website count goes into the thousands, the speed up will be more apparent. But, it will require an added method to the VisitsSummary & Goals APIs.

I noticed that the only remaining thing that takes too much time is the creation of the IndexedBySite archive. The Archive::build method instantiates an Archive, Site, Period and Date instance and IndexedBySite will invoke it for every site. The first time its run it takes ~1s on my setup. I want to add getForArchive($archive) methods to VisitsSummary & Goals which take an already built archive. Do you think there would be a problem w/ this?

Also, I can rewrite part of the MultiSites controller and add an API class w/ a getSitesInfo method. This way it can have some integration tests.

comment:35 follow-up: Changed 2 years ago by matt (mattab)

Great!!

core/Archive/Array/IndexedBySite.php 
36	 	                        $archive = Piwik_Archive::build($idSite, $strPeriod, $strDate, $segment->getString() ); 
37	 	                        $archive->setSite(new Piwik_Site($idSite)); 
38	 	                        $archive->setSegment($segment); 
 	36	                        $archive = Piwik_Archive::build($idSite, $strPeriod, $strDate, $segment ); 

Here the call to setSite() is not done anymore - is this call to setSite() not used/useless?

  • I support you've used the profiler to test the queries are called once as expected etc.
  • Excellent changes in the Date class (makes the code much clearer!!) and good speed improvements :)

I removed part of it, but I could apply a similar optimization to other parts of Piwik if you let me know where they are. Saves ~2.25s (no idea why so much...)

I think the sql query is quite useful for the "super user", since getSitesWithViewAccess will issue a sql query ID IN(...) with thousands of IDs. For the Super User, we know that all websites are needed so this query would help, but leave your new code w/ this improvement as well for admin/view/anonymous users.

add an API class w/ a getSitesInfo method

WHat's your idea? I thought that this function could return the metrics needed, bypassing the VisitsSummary and Goals APIs. This would change from 4 sql queries to sql 2 queries.

I want to add getForArchive($archive) methods to VisitsSummary & Goals which take an already built archive. Do you think there would be a problem w/ this?

After all other changes (and calling 1 API for today, 1 for yesterday), I'm not sure if this one would be required still.

Please feel free to commit! great stuff! in fact this ticket can be closed after this, since the page will render for most users now.

comment:36 Changed 2 years ago by matt (mattab)

  • Milestone changed from 1.x - Piwik 1.x to 1.7 Piwik 1.7
  • Type changed from Task to Bug

comment:37 in reply to: ↑ 35 Changed 2 years ago by capedfuzz (diosmosis)

Replying to matt:

Here the call to setSite() is not done anymore - is this call to setSite() not used/useless?

Archive::build will set the site on a Single archive, so the call in IndexedBySite is unnecessary. It did have an effect on the IndexedByDate class, but I added a call to setSite to IndexedByDate's constructor.

  • I support you've used the profiler to test the queries are called once as expected etc.
  • Excellent changes in the Date class (makes the code much clearer!!) and good speed improvements :)

I removed part of it, but I could apply a similar optimization to other parts of Piwik if you let me know where they are. Saves ~2.25s (no idea why so much...)

I think the sql query is quite useful for the "super user", since getSitesWithViewAccess will issue a sql query ID IN(...) with thousands of IDs. For the Super User, we know that all websites are needed so this query would help, but leave your new code w/ this improvement as well for admin/view/anonymous users.

The issue I see w/ the super user improvement is that its in the Archive::build method. So any API request that checks for 'all' sites may result in two queries (if the original API method optimizes for non-super user cases). If I move the optimization to the All Sites plugin, this wouldn't occur, but the optimization would have to be explicitly added to other parts of Piwik that get data for 'all' sites. I'm going to leave it in when I commit, though.

add an API class w/ a getSitesInfo method

WHat's your idea? I thought that this function could return the metrics needed, bypassing the VisitsSummary and Goals APIs. This would change from 4 sql queries to sql 2 queries.

I want to add getForArchive($archive) methods to VisitsSummary & Goals which take an already built archive. Do you think there would be a problem w/ this?

After all other changes (and calling 1 API for today, 1 for yesterday), I'm not sure if this one would be required still.

No, it wouldn't. After I thought about it some more I realized you could just go through the data table and forget about the other APIs :).

Please feel free to commit! great stuff! in fact this ticket can be closed after this, since the page will render for most users now.

I'd still like to refactor the Multi Sites plugin, though. Do you mind if I create a new ticket: 'Refactor Multi Sites plugin to follow Piwik plugin conventions'?

comment:38 Changed 2 years ago by matt (mattab)

If I move the optimization to the All Sites plugin, this

wouldn't occur, but the optimization would have to be explicitly added to
other parts of Piwik that get data for 'all' sites. I'm going to leave it
in when I commit, though.

I think the only other parts is the "Manage Websites" page in the Settings, for the Super User. Otherwise no other page should select all websites information I think.

I'd still like to refactor the Multi Sites plugin, though. Do you mind if I create a new ticket: 'Refactor Multi Sites plugin to follow Piwik plugin conventions'?

Creating a new ticket sounds good, since this one focuses on performance which is now mostly fixed :)

comment:39 Changed 2 years ago by capedfuzz (diosmosis)

  • Resolution set to fixed
  • Status changed from new to closed

(In [5505]) Fixes #1077, Assorted optimizations for the Multi Sites plugin:

  • Modified Piwik_Archive::build so it can accept Piwik_Segment instances. This way there won't be hundreds of segments created when they get overwritten by IndexedBySite anyway.
  • Removed previously committed optimization that selected all sites in Piwik_Archive and replaced w/ a similar optimization in the code that selects data for every website (the Multi Sites controller & Sites Manager controller).
  • Modified the setMinMaxDateAcrossWebsites function of the Multi Sites controller so Piwik_Date instances wouldn't be created within its loop and modified the Piwik_Date class to make the former change possible.

comment:40 Changed 2 years ago by matt (mattab)

(In [5523]) Refs #1077 fix typo
note to self: we need UI / Widgets rendering tests :)

comment:41 Changed 2 years ago by matt (mattab)

  • Summary changed from MultiSites: All Websites dashboard slow when hundreds of websites to "All Websites dashboard" should load fast even when tracking hundreds of websites! MultiSites
Note: See TracTickets for help on using tickets.