Opened 5 years ago

Closed 16 months ago

#766 closed Bug (fixed)

Archiving Memory exhausted error - Archiving memory leaks

Reported by: konus Owned by:
Priority: normal Milestone: 1.12.x - Piwik 1.12.x
Component: Performance Keywords:
Cc: Sensitive: no

Description (last modified by matt)

Users hit "memory exhausted" errors when running the Piwik archiving process. There are two distinct use cases:

  • Users with low- to medium-traffic websites exhausting the fixed memory limit of their PHP web host (eg. 64MB)
  • Users with very high traffic websites (100k to 1M+ pages per day) exhausting even a very high PHP memory limit (eg. 1GB is not enough)

The error looks like:

Fatal error: Allowed memory size of 67108864 bytes exhausted 
(tried to allocate 73 bytes) in
 /var/www/web5/html/piwik/core/DataTable.php on line 939

-> We strongly suggest that users experiencing this issue upgrade to PHP 5.3, which includes improvements to memory management.

Ideas to solve this issue (least time-consuming first):

  • 1) Monthly archiving should use the smallest number of weeks and days inside the month. Currently, archiving a month in Piwik selects the reports for all 30 days and sums them all together. This causes most of the PHP memory issues, which typically surface when summing all the datatables for a month of Page URLs, for example. Instead, we should reuse the algorithm used to process date ranges, which selects the minimum number of sub-periods needed to cover the enclosing period. A month would then typically use 3 or 4 full weeks plus the remaining ~5-10 days. This would result in up to 70% fewer datatables to select and sum!

ie archiving a month would select 4 week archives + 3 day archives, or 3 week archives + 10 day archives, so either 7 or 13 archives instead of 30...
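The decomposition above can be sketched as follows (illustrative Python only, not the actual Piwik period code; the function name and the greedy full-week selection are assumptions):

```python
from datetime import date, timedelta

def month_subperiods(year, month):
    """Decompose a month into full Mon-Sun weeks contained in it,
    plus the leftover days not covered by any full week."""
    first = date(year, month, 1)
    next_month = date(year + (month == 12), month % 12 + 1, 1)
    last = next_month - timedelta(days=1)

    # First Monday on or after the 1st of the month
    start = first + timedelta(days=(7 - first.weekday()) % 7)
    weeks = []
    while start + timedelta(days=6) <= last:
        weeks.append(start)
        start += timedelta(days=7)

    covered = {w + timedelta(days=i) for w in weeks for i in range(7)}
    days = [first + timedelta(days=i)
            for i in range((last - first).days + 1)
            if first + timedelta(days=i) not in covered]
    return weeks, days

weeks, days = month_subperiods(2013, 1)  # January 2013
print(len(weeks), len(days))  # 3 full weeks + 10 leftover days = 13 archives instead of 31
```

January 2013 starts on a Tuesday, so the sketch yields exactly the "3 weeks + 10 days" case mentioned above.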

  • 2) Fix remaining memory leaks
    • This error is most likely caused by archiving tasks not freeing memory after the task has finished running. We made some improvements in the past around memory management in Piwik (PHP doesn't free memory automatically in all cases) - see ticket #374 for more info. There is obviously more investigation work to do.
  • 3) Process aggregation in MySQL Currently, to process the "Top Page URLs" report, we select ALL URLs, then truncate to 500 URLs (see datatable_archiving_maximum_rows_actions in global.ini.php)

Loading, for example, 50,000 URLs into memory in PHP results in obvious memory crashes. The solution here would be to process the aggregate of the top 500 URLs (including a last "Others" row which aggregates all the other "truncated" rows).

  • Proposal: run the aggregation in MySQL!

Example query that selects the sum of visits per hour, then truncates in MySQL to the first 15 rows, aggregating all remaining rows under the label "Others". Imagine doing the same thing for Page URLs and only returning the top 1000 pages instead of all pages!

# This query uses the "counter" to truncate the rows after X rows.
# In real life, we would truncate after X=500 or 5000
SELECT SUM(count),
       CASE WHEN counter = 15 THEN "Others" ELSE hour END AS label
FROM (
    # This query adds the "counter" to each row after sorting by visits desc
    SELECT count, hour,
           CASE WHEN @counter = 15 THEN 15 ELSE @counter:=@counter+1 END AS counter
    FROM (
        # This query selects the visit count for each hour of the day
        SELECT COUNT(*) AS count,
               HOUR(visit_last_action_time) AS hour
        FROM `piwik_log_visit`
        GROUP BY hour
        ORDER BY count DESC
    ) T1,
    (SELECT @counter:=0) T2
) T3
GROUP BY counter
ORDER BY counter ASC

Here I only sum the number of visits, but we could automatically rewrite all aggregate queries in ArchiveProcessing/Day.php to use this mechanism.

Output:

SUM(count) 	label
113 	5
108 	19
94 	2
93 	20
90 	21
87 	22
80 	4
79 	9
77 	8
67 	6
65 	1
65 	3
61 	11
60 	10
467 	Others

Obviously truncating the "Hourly" report is a bad example, but replace it with Page URLs to get the picture :)
This would save a lot of bandwidth and CPU processing, especially for websites that have more than 500 unique Page URLs per day, external keywords, etc.
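The same truncation idea can be demonstrated with a window function instead of MySQL user variables. This is a sketch in Python/SQLite (requires SQLite >= 3.25 for window functions); the table name and toy data are illustrative, not Piwik code:

```python
import sqlite3

# Toy stand-in for the per-hour visit counts aggregated above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hourly(hour INTEGER, count INTEGER)")
conn.executemany("INSERT INTO hourly VALUES (?, ?)",
                 [(5, 113), (19, 108), (2, 94), (20, 93), (21, 90),
                  (22, 87), (4, 80), (9, 79), (8, 77), (6, 67),
                  (1, 65), (3, 65), (11, 61), (10, 60), (0, 50),
                  (7, 40), (12, 30)])

N = 14  # keep the top N rows, fold everything else into "Others"
sql = """
SELECT SUM(count) AS total,
       CASE WHEN rn > :n THEN 'Others' ELSE CAST(hour AS TEXT) END AS label
FROM (SELECT count, hour,
             ROW_NUMBER() OVER (ORDER BY count DESC) AS rn
      FROM hourly)
GROUP BY CASE WHEN rn > :n THEN :n + 1 ELSE rn END
ORDER BY MIN(rn)
"""
rows = conn.execute(sql, {"n": N}).fetchall()
# The last row folds everything past the top 14 into one aggregate:
# (120, 'Others')  i.e. 50 + 40 + 30
```

Only N+1 rows ever cross from the database into PHP (or here, Python), which is the whole point of proposal 3.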

Change History (93)

comment:2 follow-up: Changed 5 years ago by matt (mattab)

  • Cc mauser added

comment:3 Changed 5 years ago by matt (mattab)

  • Milestone set to 4- Stable release

comment:4 Changed 5 years ago by vipsoft (robocoder)

  • Milestone changed from 4- Stable release to 2- DigitalVibes

comment:6 Changed 5 years ago by vipsoft (robocoder)

  • Milestone changed from 2- DigitalVibes to 1 - Piwik 0.4.2

comment:7 Changed 5 years ago by vipsoft (robocoder)

  • Milestone changed from 1 - Piwik 0.4.3 to 2- DigitalVibes

comment:8 Changed 5 years ago by vipsoft (robocoder)

  • Owner set to vipsoft
  • Sensitive unset

comment:9 Changed 5 years ago by domtop

comment:10 Changed 5 years ago by vipsoft (robocoder)

  • Milestone changed from 2- DigitalVibes to 2 - Piwik 0.8 - A Web Analytics platform

comment:11 Changed 5 years ago by vipsoft (robocoder)

  • Owner vipsoft deleted

comment:12 Changed 5 years ago by vipsoft (robocoder)

  • Milestone changed from 2 - Piwik 0.8 - A Web Analytics platform to 2 - Piwik 0.6

comment:13 Changed 4 years ago by matt (mattab)

  • Description modified (diff)

comment:14 Changed 4 years ago by matt (mattab)

  • Description modified (diff)

comment:15 Changed 4 years ago by matt (mattab)

  • Summary changed from Memory leak Error - Allowed memory size Error to Archiving Memory exhausted error - Archiving memory leaks

comment:16 Changed 4 years ago by matt (mattab)

  • Description modified (diff)

comment:17 Changed 4 years ago by vipsoft (robocoder)

  • Milestone changed from 1 - Piwik 0.5 to 1 - Piwik 0.5.1

comment:18 Changed 4 years ago by vipsoft (robocoder)

In PHP 5.3 with zend.enable_gc, the garbage collector only runs when the root buffer is full (default max is 10,000 entries). We need to know whether a collection cycle is forced when the memory limit is reached, or whether we should manually call gc_collect_cycles().

comment:19 Changed 4 years ago by matt (mattab)

  • Milestone changed from 1 - Piwik 0.5.5 to 1 - Piwik 0.5.6

comment:20 Changed 4 years ago by matt (mattab)

(In [2025]) Fixes #1227
the archive script now loops over all websites and triggers a different request for each archive, helping out with the archive memory exhausted issue (refs #766)

comment:21 Changed 4 years ago by jawrat

has there been any resolution to this? I just wanted to play with piwik on my site, but it's waaaaaay too busy for piwik apparently. We're at 80k unique visitors a day or thereabouts...i set php.ini to 1GB, but no love. help?

comment:22 Changed 4 years ago by jawrat

i changed the cli php.ini setting to 2Gb and it appears to be working now, but that's really not a great idea...i fear that the server is going to lock up at some point due to memory usage...

comment:23 follow-up: Changed 4 years ago by matt (mattab)

  • Milestone changed from 0 - Piwik 0.6.5 to Features requests - after Piwik 1.0

jawrat, maybe check that you have not thousands of unique URLs to store with unique session IDs, etc. you can exclude URL parameters in the settings.

Otherwise, this bug still exists, but very few people need to increase beyond 1Gb

comment:24 in reply to: ↑ 23 Changed 4 years ago by jawrat

well, it's been moving along now for the last 8 weeks or so since I switched it to 2Gb and hasn't choked, so for my situation it appears that 2Gb works.

the issue is that there *are* several thousand unique URLs to store...at any given time, we might have 50k listings active on the site...it's really nice to know which ones are the most active.

btw, thanks for all the effort you guys have put into this...it's pretty awesome (memory issues aside)

comment:25 Changed 4 years ago by smet

+1

I totally agree with jawrat, both that you guys are doing a great job and that 2GB is too big a memory limit.

We would really like to run this locally, but we have 1.5M+ pageviews....

Hope to see this cleared soon, even before 1.0 ;) But hey! Keep up the good work!

The description of this issue is right to the point, very clear!

comment:26 follow-up: Changed 4 years ago by matt (mattab)

Do you have issues with 1.5M pageviews? There are piwik users running that much traffic with 1G memory limit, but maybe your data set is different (eg. more unique URLs to record) ?

comment:27 in reply to: ↑ 26 Changed 4 years ago by jawrat

Replying to matt:

Do you have issues with 1.5M pageviews? There are piwik users running that much traffic with 1G memory limit, but maybe your data set is different (eg. more unique URLs to record) ?

Is there a query I can run on the db that will tell me how many uniques there are? i'm sure it's quite a few, but i have no idea how many. matt, contact me offline if you want to poke around...jawrat AT gmail DAHT com

comment:28 Changed 4 years ago by matt (mattab)

  • Priority changed from major to critical

comment:29 Changed 4 years ago by smet

I have the problem with 2M+ now, maybe a big dataset, so yes, as jawrat suggested:

How do we see this?

...and thank you for the priority-upgrade!

comment:30 Changed 3 years ago by jawrat

okay, so I finally had to shut piwik off as it was completely killing the server...mysql was complaining about too many connections (with the max_connections set to 500)...and the dashboard just times out trying to get the data. I wish I had time to dig into the code and help out here, but i'm up to my ass in alligators with a .net codebase that was dumped on me with little to no documentation...

shoot me a heads up matt if you want more info or would like to discuss...we did see a large 20% increase in traffic over the last week or so, so I think that may have been what knocked it over...

comment:31 Changed 3 years ago by vipsoft (robocoder)

  • Keywords Memory leak removed

Could we implement an alternate archiving algorithm for resource-constrained environments? (eg use temporary tables in mysql to collect results?) There's a recent forum post from a hostgator user who reports his shared hosting account has max memory limit set at 8M.

comment:33 Changed 3 years ago by matt (mattab)

see also a report in #1777

comment:34 Changed 3 years ago by matt (mattab)

jawrat, can you confirm that you setup the cron archiving? http://piwik.org/docs/setup-auto-archiving/

What kind of traffic did you receive on the day it broke?

comment:35 Changed 3 years ago by jawrat

matt, for sure it was set up to auto archive with a crontab.

as far as traffic, we'd just seen a general rise in traffic of about 20% over the course of a week or two. 85k visits, 535k pageviews...

comment:36 Changed 3 years ago by matt (mattab)

jawrat, did you have the full backtrace of the error?

comment:37 Changed 3 years ago by jawrat

unfortunately no, the server fell over when it ran during peak traffic periods. :(

comment:38 Changed 3 years ago by matt (mattab)

To debug this issue, it is first necessary to upgrade to PHP 5.3, which has better memory management features. Then maybe Facebook's XHProf profiler could be useful: http://mirror.facebook.net/facebook/xhprof/doc.html

http://pecl.php.net/package/xhprof

comment:39 Changed 3 years ago by matt (mattab)

(In [3670]) Fix some memory leaks, thanks to XHProf.

Search for top-level archiving functions (archiveDay) and sort by inclusive memory use DESC - more than zero inclusive memory means memory was leaked somewhere in the call, which is a priori not expected. refs #766

comment:40 Changed 3 years ago by matt (mattab)

  • Milestone changed from Feature requests to 1.x - Piwik 1.x

comment:41 Changed 3 years ago by matt (mattab)

If we can't fix it completely, at least we should document known limits for memory_limits, and publish examples of large scale setups.

comment:42 Changed 3 years ago by matt (mattab)

improvement patch posted upstream for zend_db http://framework.zend.com/issues/browse/ZF-10930 - if not fixed in zend we can hack it around in piwik_db

comment:43 Changed 3 years ago by vipsoft (robocoder)

Looks like the patch was rejected upstream. Because of the inheritance tree, it'll have to be implemented in the subclassed adapters. Something like this?

	/**
	 * Prepares and executes an SQL statement with bound data.
	 * Caches prepared statements to avoid preparing the same query more than once
	 *
	 * @param mixed $sql
	 * @param mixed $bind
	 * @return Zend_Db_Statement_Interface
	 */
	public function query($sql, $bind = array())
	{
		static $cachePreparedStatement = array();

		if(isset($cachePreparedStatement[$sql]))
		{
			$stmt = $cachePreparedStatement[$sql];
			$stmt->execute($bind);
			return $stmt;
		}

		$stmt = parent::query($sql, $bind);
		$cachePreparedStatement[$sql] = $stmt;
		return $stmt;
	}

comment:44 Changed 3 years ago by matt (mattab)

yes patch rejected, since we should reuse the statement objects... which makes sense.

vipsoft, something like this but $cachePreparedStatement[$sql] = $stmt; caches the result of 'prepare', not the query with the bind parameter.

I think we have to copy paste it in all adapters which is a shame, but saves us a boring refactoring ;)

comment:45 Changed 3 years ago by matt (mattab)

ok your patch is probably right actually, I forgot to look at the zend code again. Looks good to me!

comment:46 Changed 3 years ago by vipsoft (robocoder)

Do we recommend the latest php version in the FAQ? php 5.3.3 changelog has this entry:

Fixed very rare memory leak in mysqlnd, when binding thousands of columns. (Andrey)

comment:48 Changed 3 years ago by matt (mattab)

(In [3716]) Refs #766 - performance optimization, caching prepared statements in mysqli and PDO/mysql

comment:49 Changed 3 years ago by vipsoft (robocoder)

(In [3717]) refs #766 - revert optimization from Mysqli.php (spurious unit test failures and exceptions thrown); tweak code in Pdo/Mysql.php to workaround segfault (now reported as PHP Bug 53716)

comment:50 Changed 3 years ago by vipsoft (robocoder)

Opened ZF-10943 to track down the cause of the failure in Mysqli.php.

comment:51 Changed 3 years ago by vipsoft (robocoder)

Closed ZF-10943. It's a dupe. See ZF-2388 instead. (circa ZF version 1.0)

comment:52 Changed 3 years ago by matt (mattab)

jawrat, would you be able to test trunk on your high traffic website? We have made many improvements to the code, queries and data storage. I see a 30% performance increase in archiving and, more importantly, a much lower peak memory usage. This will help push the limits, and Piwik might work for you now; if you are able to test, that would be great :)

How many unique URLs do you track on your website?

comment:53 Changed 3 years ago by jawrat

I can do it, but not for the next week and a half or so. our industry trade show is taking place in las vegas next week and the following week i'll be mostly on vacation, so I don't want to disrupt my servers at present. after all the crazy is past, i'll be happy to fire it up and see what happens. let me know what I need to do (svn co or something similar I suppose?) when i'm ready.

comment:54 Changed 3 years ago by jawrat

p.s. we're running php5.2 at the moment....I can upgrade but not until after I get back...

comment:55 Changed 3 years ago by matt (mattab)

jawrat, thanks for that, I'm not sure if it will work well for 500k pages, but definitely good to try. Also, if you can keep the DB dump if it fails, it might be useful for us.

You can grab the nightly build from the link on the QA page: http://piwik.org/qa/

comment:56 Changed 3 years ago by matt (mattab)

To clarify: we haven't yet improved the 'Tracker log loading' mechanism, and maybe it was the original issue causing 'mysql too many connections' error you experienced.

If Tracking performance was OK before then it's worth trying trunk as it improves Archiving performance and memory usage.

For Tracking load performance improvement, stay tuned on #134 which we hope to have implemented in the next few months.

comment:57 follow-up: Changed 3 years ago by jawrat

okay, but the problem I was having was the memory leak issue in the archive process. when it ran, it just gobbled up all of my available memory until the server fell over (panicked). I think the server in question could probably use more swap space, but i'm not sure about that.

comment:58 in reply to: ↑ 57 Changed 3 years ago by smet

Is there a way to help testing? I can't run piwik on the high-traffic-sites we have, but if I can help otherwise, let me know, ok?

comment:59 Changed 3 years ago by matt (mattab)

FYI the current trunk should have a MUCH better memory usage, after the changeset [4239]

So if you are able to test with TRUNK (which is stable AFAIK), it could work better for you too! Otherwise you can wait for 1.3, which will be released in a week or so.

comment:60 Changed 3 years ago by habakuk

I have still the same problems with 1.3 (PHP web host with 64MB memory limit).

comment:61 Changed 3 years ago by habakuk

Still the same Problem with Piwik 1.5 and PHP 5.2.14:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 24 bytes) in /var/www/web151/html/_statistik.trumpkin.de/core/DataTable.php on line 969

comment:62 Changed 2 years ago by habakuk

Still here in Piwik 1.6:

<result date="2011-10-19">174</result>
<result date="2011-10-20">170</result>
<result date="2011-10-21">1</result>

</results>
Archiving period = week for idsite = 2...
ALERT - script tried to increase memory_limit to 134.217.728 bytes which is above the allowed value (attacker 'REMOTE_ADDR not set', file '/var/www/web151/html/_statistik.trumpkin.de/core/Piwik.php', line 958)
PHP Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 24 bytes) in /var/www/web151/html/_statistik.trumpkin.de/core/DataTable.php on line 1022

Fatal error: Allowed memory size of 33.554.432 bytes exhausted (tried to allocate 24 bytes) in /var/www/web151/html/_statistik.trumpkin.de/core/DataTable.php on line 1022

comment:63 in reply to: ↑ 2 Changed 2 years ago by viegelinsch

same here on an installation with 10GB data in database

comment:64 follow-up: Changed 2 years ago by matt (mattab)

viegelinsch, what is the exact error message you get? Do you still get it after increasing the memory limit to 1G?

comment:65 Changed 2 years ago by matt (mattab)

If you are using Piwik 1.6 and experiencing this problem, please try to increase your memory limit.

If after increasing your memory limit you still experience the issue, you must have a high traffic website.

Please post here the error message that you get, since it helps locate where in the code the problem arises, which is very important to fix this problem. Thanks!

comment:66 in reply to: ↑ 64 Changed 2 years ago by viegelinsch

Replying to matt and matt:

Yes, we definitely run several high traffic web applications, and our Piwik instance tracks all of these.

I increased the memory size to 2G - after that it works again. But this is huge for a PHP application, isn't it!?

I also deleted all the old detailed logs and only kept the reports. This finally got Piwik performing well again. The sad thing is that the detailed logs are lost for future detailed analysis.

regards

comment:67 Changed 2 years ago by matt (mattab)

2G is big indeed, we want to make memory usage better.

Deleting old logs was not necessary; it does not change the memory requirement! So please don't delete logs unless disk usage is a problem.

Do you mind putting the limit to 1G and generate the error and paste it here as an attached file to the ticket, with the full backtrace?
It would be helpful to understand and hopefully then improve memory usage.

comment:68 Changed 2 years ago by matt (mattab)

  • Description modified (diff)

comment:69 Changed 2 years ago by matt (mattab)

  • Description modified (diff)

comment:70 Changed 2 years ago by matt (mattab)

  • Description modified (diff)

comment:71 Changed 2 years ago by matt (mattab)

  • Description modified (diff)

comment:72 Changed 2 years ago by matt (mattab)

  • Description modified (diff)

comment:73 Changed 2 years ago by mwyraz

I have a site with more visits every week. I upgrade the memory limit every few weeks. ATM we are at 2GB and it fails (I'll upgrade to 3GB). IMO this is a serious issue which will reduce Piwik usage to low-traffic websites only. Otherwise one would need a dedicated machine only for aggregating the stats...

<result date="2011-11-07 to 2011-11-13">35186</result>
<result date="2011-11-14 to 2011-11-20">35351</result>
<result date="2011-11-21 to 2011-11-27">36732</result>
<result date="2011-11-28 to 2011-12-04">43229</result>
<result date="2011-12-05 to 2011-12-11">51124</result>
<result date="2011-12-12 to 2011-12-18">60269</result>
<result date="2011-12-19 to 2011-12-25">8886</result>

</results>
Archiving period = month for idsite = 1...
PHP Fatal error: Allowed memory size of 2147483648 bytes exhausted (tried to allocate 71 bytes) in /vol/.../stats/piwik/core/DataTable.php on line 1022

Fatal error: Allowed memory size of 2147483648 bytes exhausted (tried to allocate 71 bytes) in /vol/.../stats/piwik/core/DataTable.php on line 1022

comment:74 Changed 2 years ago by matt (mattab)

mwyraz can you please email me the full error message + backtrace at matt att piwik.org ?

comment:75 Changed 2 years ago by matt (mattab)

  • Cc mauser removed
  • Milestone changed from 1.x - Piwik 1.x to 1.8 Piwik 1.8

As per new findings (see ticket description proposal) I would like to work on this issue sooner rather than later :-)

We can make things a lot better and allow Piwik to be used on higher traffic websites. Required for Piwik 2.0 and by #703

comment:76 Changed 2 years ago by abma

we tapped into the same issue:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 2 bytes) in /wwwroot/piwik/core/DataTable.php on line 1022

(and no, i won't increase memory limit)

it can be triggered when I increase the "date range": "week" does work, "month" triggers the error.

the error message is shown for "Last visits graph", "List of external Websites" and "Visits by server time" on the dashboard; most other pages don't work either.

imo the suggested solution "3) Process aggregation in MySQL" is fine, as only the big numbers are interesting when analyzing stats.

comment:77 Changed 2 years ago by mwyraz

A good approach I used a while ago for solving such a problem is the following. I don't know if it will also be useful for piwik:

  • Iterate over all database entries you want to aggregate. Use "scroll" so that only the current entry is held in memory. If "scroll" cannot be used, use "limit" and "offset" to fetch only a limited number of rows at a time (e.g. 1,000 or 10,000 depending on the expected memory consumption)
  • Register "listeners" for each type of aggregation. Pass each row to all these listeners. Each listener adds the row's content to its aggregation.
  • At the end, ask each listener for its results.

So for example a "visits by hour of day" listener would contain 24 counters. For each row passed to this listener, depending on the hour of its visit, the corresponding counter would be increased by 1. During runtime, memory would only be consumed for one row and the 24 counters.
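The listener pattern described above can be sketched like this (illustrative Python, not Piwik code; the class and function names are made up):

```python
class VisitsByHour:
    """Listener holding 24 counters; memory use is O(24), not O(rows)."""
    def __init__(self):
        self.counters = [0] * 24

    def on_row(self, row):
        self.counters[row["hour"]] += 1

    def result(self):
        return self.counters

def aggregate(rows_iter, listeners):
    # Stream rows one at a time; each listener folds the row into its state.
    for row in rows_iter:
        for listener in listeners:
            listener.on_row(row)
    return [listener.result() for listener in listeners]

# Usage with a generator, so only one row is ever held in memory:
visits = ({"hour": h % 24} for h in range(100))
(by_hour,) = aggregate(visits, [VisitsByHour()])
```

Adding another aggregation (eg. visits by day of week) is just another listener over the same single pass, so peak memory stays bounded by the listeners' state.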

comment:78 Changed 2 years ago by jstalin

Archiving period = week for idsite = 2...

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 87 bytes) in /home/site/piwik/core/DataTable/Row.php on line 380

Archiving period = month for idsite = 2...

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 3145728 bytes) in /home/site/piwik/core/Archive/Single.php on line 501

Archiving period = year for idsite = 2...

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 3145728 bytes) in /home/site/piwik/core/DataTable.php on line 1022

I've increased my PHP memory limit to 2 gigs, but this is on shared hosting, so I'm not sure if that is having an effect.

comment:79 Changed 2 years ago by tbede

It is impossible to archive pages with 20,000 unique IDs and 5M visits/day. The PHP script needs more than 16GB of memory.

I tried to have a look into this issue myself, but as I'm no PHP developer, just perl/bash/c/c++, I don't get your code. I also searched for documentation on what the archiving process does exactly. If you could provide such a doc, maybe we could contribute our SQL know-how and provide an alternative way of archiving.

As the processed data is from one of our customers, I cannot provide data samples.

comment:80 follow-up: Changed 2 years ago by matt (mattab)

  • Resolution set to duplicate
  • Status changed from new to closed

Thanks for the report. 20,000 unique IDs, do you mean 20k unique URLs?

If so, we have an idea on how to solve this problem nicely, see the proposal in #766:
using some SQL magic, we would do the truncation in memory in MySQL and then return only the top 1000 pages (for example, customizable in the config file) to PHP.
Right now we return ALL data to PHP, which then does the truncating, which is highly memory inefficient.

Finally, if you have some time and are willing to help, it would be very interesting to install the PHP profiler XHProf from Facebook on your server and give me access to it; it might help finding other problems and fixes that could help your case. Thanks

Dupes #766

comment:81 Changed 2 years ago by matt (mattab)

  • Resolution duplicate deleted
  • Status changed from closed to reopened

I got confused here, meant to comment in #3066

comment:82 in reply to: ↑ 80 Changed 2 years ago by tbede

Replying to matt:

Thanks for the report. 20,000 unique Ids, do you mean 20k unique URLs?

Right, it's a portal of a huge computer magazine

If so, we have an idea on how to solve this problem nicely, see the proposal in #766
using some SQL magic, we would do the truncation in memory in mysql and then return the top 1000 pages (for example, customizable in config file), to PHP.

As I mentioned, we don't get the code. In this ticket there is just one example of how it could work, but I don't see how to implement this on our side. We could provide performant statements if we knew exactly what the current code does at the moment.

Finally, if you have some time and are willing to help, it would be very interesting to install the PHP profile XHProf from facebook on your server and give me access to it, it might help finding out other problems and fixes that could help your case. Thanks

Sorry, but we are an enterprise datacenter, so there is no way to provide 3rd-party developers access to the system. But we have asked our customer whether we could forward some example logfiles with faked IPs.

comment:83 Changed 22 months ago by tbede

No success on the customer front. We could not provide any sample data. The workaround in the issue description is not usable for us, as we really want to replace awstats and need the full data, not only the top X.

tbe

comment:84 Changed 22 months ago by rafe

We're also having major memory issues with Piwik; we're unable to run an archive process at all. I feel like I've been pretty thorough in investigating most of the suggestions, but we still don't have a workable solution. We have implemented the Piwik front end on a medium Amazon EC2 instance with nginx in place of Apache for performance, and the database is on a large Amazon RDS. We can see the real-time data coming in correctly, so we know there is data being collated.

Our site is in online publishing; we have around 350,000 unique URLs, and we're looking at 20-25M page views / month. We have a requirement for accessing detailed analytics data from our system via APIs, so every unique URL's page views are important, but for reporting through Piwik that is far less important to us, so your 3rd proposal above could suit us (as I said, provided the raw page views for all URLs were still available via API).

We are in the evaluation stages of finding the right analytics fit for our needs, so we are open to experimenting and losing or tainting the current piwik data while we work out a method to make it work. We are also open to the idea above of installing XHProf if that would help give insight into causes and possible solutions.

I'm going to make a forum post as well in case someone in the community can provide some direction for us.

Thanks,
Rafe

comment:85 Changed 21 months ago by matt (mattab)

@Rafe and others, if you hit a wall with Piwik and are stuck, please consider contacting Piwik Professional Services. We help power users run Piwik with 1M and more requests per day, on dedicated servers. Please get in touch: http://piwik.org/consulting/

comment:86 Changed 20 months ago by EZdesign (BeezyT)

I am going to build a class that allows "3) Process aggregation in MySQL" from the ticket description. I created a separate ticket #3330 where I will commit code and where the implementation can be discussed.

comment:87 Changed 20 months ago by capedfuzz (diosmosis)

(In [6966]) Refs #3330, #766 add integration test for blob limiting behavior present in certain plugins. Also, fix three bugs:

  • CustomVariables plugin does not sort by visit before applying truncation to reports.
  • ReplaceSummaryLabel should be recursive so subtables w/ limited rows show 'Others' as label.
  • Pass user agent & language explicitly when doing bulk tracking in PiwikTracker.

comment:88 Changed 19 months ago by capedfuzz (diosmosis)

(In [6976]) Refs #3330, #766 improved BlobReportLimitingTest.

comment:89 Changed 19 months ago by capedfuzz (diosmosis)

(In [6980]) Refs #3330, #766, #3227 use RankingQuery and truncate tables as they are created in Actions plugin. Also modified phpunit integration testing mechanism so all API calls are tested and outputted before a test case throws.

comment:90 Changed 19 months ago by capedfuzz (diosmosis)

(In [7080]) Refs #3330, #766 refactor truncate-as-you-go approach used in Actions. Added maximumAllowedRows member and walkPath method to DataTable and some other utility methods to DataTable_Row types.

comment:91 Changed 19 months ago by capedfuzz (diosmosis)

(In [7083]) Refs #3330, #766 remove unnecessary methods and rename one function for clarity.

comment:92 Changed 18 months ago by matt (mattab)

  • Component changed from Core to Performance

comment:93 Changed 16 months ago by matt (mattab)

  • Priority changed from critical to normal
  • Resolution set to fixed
  • Status changed from reopened to closed

Research work on this is finished; we have identified the two main tasks remaining to fix the archiving memory errors:

  • #3484 - PHP Optimization: update in memory
  • #3330 - Use RankingQuery hard-coded SQL optimizations in more archiving code

If you can sponsor these optimization improvements, or need any other tuning done, please contact us: http://piwik.org/consulting/
