Improve the "Database usage" plugin, more statistics #3004
Attachment: Patch for this issue.
I've uploaded a patch for this issue (a redesign of the DBStats plugin). It's a pretty big patch, so I think it should be reviewed. Let me know what you think.
Wow, very nice UI and useful reports! Sorry for the delay in replying to the ticket. Here is my code review and feedback:
I think this can be simplified, e.g. "Report data size" or "Report tables, data size"
Here `$apiMethod` will throw an E_NOTICE?
Some comments: Replying to matt:
RowCallback & ColumnCallback are just intermediate classes. They don't do anything except execute a callback on some data, and they aren't meant to be used directly. GroupBy & RowCallbackAddColumn derive from them so the code is clearer. I could get rid of them, but then there'd be some code redundancy. I could also keep them and make them 'abstract'. As for filters, they seem to operate not on rows, per se, but on entire DataTables. So I guess, technically, they can do whatever they want to a DataTable. Perhaps some of the logic for my RowCallback should be in the Filter base type? I was trying to make my filter code clear, w/o refactoring existing code until my changes went under review. Do you think it would be better not to use base types in creating new filters?
The problem wasn't that it wouldn't work w/ units, but that for memory values to display nicely, they'd need to use several different units (i.e. B, KB, MB, ...). This would require modifying the jqplot-related code, or possibly jqplot itself. Which would be hard, I'm guessing :)
I think the code can be inlined; it's pretty simple. It may be a case where refactoring isn't quite worth the trouble, but it could be if you make them abstract... up to you?
I tested the blob/metric queries:
- Blob table, 100,000 rows: 6.8s
- Numeric table, 100,000 rows: 5.66s
Hopefully this isn't too slow. I think we could maybe cache the results of each query (except the current month) in the piwik_option table. Since it would be one result per archive table, I don't think it would be all that much space. Would be a bit of a hack, though.
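For context, the kind of per-report size query being timed here could look roughly like the sketch below. The archive table name and the use of LENGTH(value) are illustrative assumptions based on Piwik's archive schema; the actual queries in the patch may differ.

```sql
-- Sketch: space used per report in one archive blob table.
-- Groups the serialized blobs by report name and sums their sizes.
SELECT name,
       COUNT(*)           AS row_count,
       SUM(LENGTH(value)) AS value_bytes
FROM piwik_archive_blob_2012_01   -- illustrative table name
GROUP BY name
ORDER BY value_bytes DESC;
```

A full scan like this touches every blob row, which is consistent with the multi-second timings reported above and motivates the caching idea.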
Attachment: Patch for this issue. (2nd)
Uploaded a new patch. Some notes:
Let me know what you think of my patch. |
Sorry for the delay, here is the review:
Feel free to commit directly.
Replying to matt:
To be clear, that's ~6-13s per table. So if there are two years of data, it'd be 5.2 min, which I'm guessing isn't so great. I think I could speed things up w/ a scheduled task that gets run once a month. It would create the table for all existing archive tables except the current month and store the results in the options table. Sound good?
A scheduled task sounds good! In the code, I recommend caching the result in API.php, in the option table. The scheduled task should not do the caching itself but simply call the API to "pre-archive" the data. There are cases where new data can be added to past tables (e.g. all "january" tables contain the data of the yearly archives), so maybe the task could run weekly to make it slightly more accurate? It will only be executed for users who enabled the DBStats plugin, so I'm not worried about overhead: most users don't enable the plugin.
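The caching being suggested could be sketched like this. The option name and serialized payload are hypothetical; Piwik's option table stores simple name/value pairs, but the exact key format the plugin would use is an assumption here.

```sql
-- Sketch: cache a pre-computed report in the option table, keyed by the
-- archive table it was computed from. The option_name below is hypothetical.
INSERT INTO piwik_option (option_name, option_value)
VALUES ('dbstats_cached_piwik_archive_blob_2012_01', '...serialized report...')
ON DUPLICATE KEY UPDATE option_value = VALUES(option_value);
```

With one cached row per archive table, the space cost stays small, matching the estimate in the earlier comment.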
Replying to matt:
I think I can get the last updated timestamp for a table and check if the table was modified since the last 'archive' (see http://stackoverflow.com/questions/307438/how-can-i-tell-when-a-mysql-table-was-last-updated ). This should make it accurate and fast :) |
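The approach described in the linked Stack Overflow answer reads the table's last-modified timestamp out of information_schema, something like:

```sql
-- Last-modified timestamp for a table (works reliably for MyISAM;
-- schema and table names are illustrative).
SELECT UPDATE_TIME
FROM information_schema.tables
WHERE table_schema = 'piwik'
  AND table_name   = 'piwik_archive_blob_2012_01';
```

This makes the staleness check cheap, since no archive rows are scanned.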
Replying to capedfuzz:
Actually, this won't work for InnoDB tables... So for now, the best option seems to be your suggestion of a weekly task. Or maybe we could patch InnoDB? :)
Could we maybe do SELECT MAX(ts_archived)? Otherwise a weekly full refresh is fine too. If so, could you write in the UI in grey "inline help" style, e.g. "Report last updated on Jan 23rd, 2012", to make sure users know about the 1-week delay?
Replying to matt:
For 200,000 rows the query takes roughly 0.4s, but MAX(idarchive) is instantaneous, so I'll use that.
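The two staleness checks being compared are, roughly (archive table name illustrative):

```sql
-- Scans the ts_archived column: ~0.4s at 200,000 rows per the comment above.
SELECT MAX(ts_archived) FROM piwik_archive_numeric_2012_01;

-- idarchive is indexed (part of the primary key), so MAX(idarchive)
-- resolves from the index without scanning rows.
SELECT MAX(idarchive) FROM piwik_archive_numeric_2012_01;
```

The assumption behind the substitution is that a higher idarchive always means a newer archive, so either value works as a "has anything changed?" marker.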
(In [6324]) Fixes #3004, redesigned DBStats plugin. Added several new reports, including database space taken up by:
- tracker tables
- archive blob tables
- archive metric tables
- individual reports
- individual metrics
Notes:
(In [6329]) Refs #3004, fix PrivacyManager regression. |
Wow, big commit!! Nice work. Reopening & code review:
Maybe this should be = true or = 1 instead of -1?
(In [6340]) Refs #3004, fix regression due to keep_summary_row JS check. |
(In [6343]) Fixes #3004, tweaks. |
It would be nice to see some stats about Piwik DB usage, mostly for debug but also could be useful to the user for long term Mysql scaling and general info.
Here are the stats we could display:
total visits, page views, unique actions, conversions, ecommerce products
number of websites, users, reports, etc.
This should give a short overview of the Piwik DB: general usage data, plus a DB-focused summary for capacity planning.
See also #3003, #53, #2805 to help around this issue.
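Most of the raw numbers for such a page could come straight from MySQL's table metadata, without scanning any data. A minimal sketch (schema name is an assumption):

```sql
-- Per-table row counts and data/index sizes for the whole Piwik schema.
-- table_rows is an estimate for InnoDB, but fine for an overview page.
SELECT table_name,
       table_rows,
       data_length,
       index_length
FROM information_schema.tables
WHERE table_schema = 'piwik'
ORDER BY data_length + index_length DESC;
```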