Log Analytics: Monitor Bandwidth for each page, download, and measure overall traffic in bytes #5248

Closed
anonymous-matomo-user opened this issue May 27, 2014 · 22 comments
Assignees
Labels
Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical.
Milestone

Comments

@anonymous-matomo-user

As a user, when importing my server logs in analytics, I want to measure the Bandwidth that was used by each page view.

How would it work?

  • The bandwidth information is commonly available in the server access log files. It is measured in Bytes.
  • Log analytics script would detect the numeric bandwidth value
  • Log analytics forwards this value to the Tracking API and this value will be stored in the action.
  • This value can be attached to downloads and pageviews
  • The Reporting UI:
    • for each page: display the total bandwidth for each page view
    • for each directory (groups of pages): display the aggregated bandwidth value (the sum) of all pages within the directory
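
A minimal sketch of the per-page and per-directory sums described above, assuming the hits are already available as (URL path, bytes) pairs. The function name and data layout are hypothetical and are not Piwik's archiver.

```python
from collections import defaultdict
from posixpath import dirname


def aggregate_bandwidth(hits):
    """Sum bytes per page and per directory (hypothetical helper).

    `hits` is an iterable of (url_path, size_in_bytes) tuples.
    """
    per_page = defaultdict(int)
    per_directory = defaultdict(int)
    for path, size in hits:
        per_page[path] += size
        # Credit every ancestor directory so that totals roll up to the root.
        directory = dirname(path)
        while True:
            per_directory[directory] += size
            if directory in ("/", ""):
                break
            directory = dirname(directory)
    return dict(per_page), dict(per_directory)


hits = [
    ("/blog/post-1.html", 1200),
    ("/blog/post-2.html", 800),
    ("/blog/img/logo.png", 287),
    ("/index.html", 500),
]
per_page, per_directory = aggregate_bandwidth(hits)
```

Here `per_directory["/blog"]` includes the image under `/blog/img`, which matches the idea that a directory shows the sum of everything below it.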

Proposed implementation:

  • New column in log_link_visit_action: bandwidth
  • Tracking API: add a new parameter for the file size in bytes, bw_bytes
    • Log Analytics parses filesize, and sets &bw_bytes to tracking api requests
    • first log format we need to support is Apache common log format.
    • Tracker stores bandwidth in the new column log_link_visit_action.bandwidth
    • Actions/Archiver will aggregate the filesize in the Action report blobs.
    • This is similar to Average generation time.
    • The metric is processed for Pages, Page titles, and Download files.
    • User interface: the new metric "Bytes" will be displayed in the Actions tables
    • It is displayed in the existing Actions reports (Pages, Page titles, Downloads): this is not a new report.
    • If a user is not using Log Analytics, or if he is using Log Analytics but the logs don't have the file size bandwidth, then Actions reports will not have the "Bytes" column.
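
As a rough sketch of the "Log Analytics sets &bw_bytes" step, the importer could append the parsed size to the query string it sends to the Tracking API. The helper below is hypothetical; `idsite`, `rec` and `url` are existing Tracking API parameters, while `bw_bytes` is the parameter proposed in this issue.

```python
from urllib.parse import urlencode


def build_tracking_query(site_id, url, size_in_bytes=None):
    """Return a query string for one tracking request (hypothetical helper)."""
    params = {"idsite": site_id, "rec": 1, "url": url}
    # Only send bw_bytes when the log line actually contained a size.
    if size_in_bytes is not None:
        params["bw_bytes"] = size_in_bytes
    return urlencode(params)


with_size = build_tracking_query(1, "http://example.org/a.png", 287)
without_size = build_tracking_query(1, "http://example.org/a.png")
```

Leaving the parameter out when no size is known keeps "no data" distinguishable from a real size of zero bytes.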

Other steps:

  • Add new overall metric (and sparkline) in Visitors > Overview report: Total Bandwidth
  • Add new FAQ How do I measure traffic bandwidth used by a page, and/or overall bandwidth?

To be confirmed / optional:

  • Besides Apache common log format, maybe other log formats contain the bandwidth information
  • New custom segment "Bandwidth" to let users segment traffic based on the file size in bytes
    • for example "Show me reports only for file requests of files over 500,000"
  • We could even measure the current page size in JavaScript, which would be useful, but unfortunately it can only give an approximate value: document.documentElement.innerHTML.length counts the characters (UTF-16 code units) of the serialized DOM tree, not the bytes sent over the wire.
    • This is an approximation only. If there's lots of page content created on the fly from script, that content will count in innerHTML despite not being present in the original source, which could throw your calculation out (source).
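
As a rough illustration of why a string length only approximates transferred bytes: a JavaScript `.length` counts UTF-16 code units, which can differ from the UTF-8 byte count of the same markup. The sketch below reproduces both counts in Python; the helper names are made up for this example.

```python
def utf16_code_units(text):
    """Number of UTF-16 code units, which is what a JavaScript .length reports."""
    return len(text.encode("utf-16-le")) // 2


def utf8_bytes(text):
    """Number of bytes when the same text is encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Markup with two non-ASCII letters and one emoji outside the BMP.
html = "<p>Gr\u00fc\u00dfe \U0001F600</p>"
units = utf16_code_units(html)
size = utf8_bytes(html)
```

For this string the two counts differ (15 code units versus 19 bytes), and neither accounts for compression, headers, or other resources loaded by the page.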
@mattab
Member

mattab commented May 28, 2014

Thanks for the feature suggestion!

Do you have a sample log line containing the "Bytes" information?

The steps will be:

  • Parse this byte value from the log
  • Send this byte value to Piwik Tracking API, as a custom variable
  • Process reports

What reports do we want for Bytes?

This was also discussed in forum post

@mattab
Member

mattab commented May 28, 2014

we want to output 'transfer amount' to reports.

It's the same as Urchin's feature "Total Bytes Transferred" and "Directory vs Bytes".

Number of bytes transferred is recorded in the raw log files of HTTP server. In the case of apache LogFormat this is %b.

@anonymous-matomo-user
Author

Sample line.

<CLIENT> - - +0200 "GET /larry/images/contactpic_32px.png HTTP/1.1" 200 287 "<REFERER>" "<USER_AGENT>"

It's the regex name "length"
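
To illustrate how a named group such as "length" picks the size out of a line like the sample above, here is a small sketch. The pattern is hypothetical and much simpler than the one in import_logs.py; it only looks at the request, status and size fields.

```python
import re

# Hypothetical pattern, not the one used in import_logs.py.
LINE_TAIL = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<length>\d+|-)'
)


def parse_length(line):
    """Return the response size in bytes, or None if the line does not match."""
    match = LINE_TAIL.search(line)
    if match is None:
        return None
    length = match.group("length")
    # Apache's %b writes "-" instead of 0 when no bytes were sent.
    return 0 if length == "-" else int(length)


sample = (
    '<CLIENT> - - +0200 "GET /larry/images/contactpic_32px.png HTTP/1.1" '
    '200 287 "<REFERER>" "<USER_AGENT>"'
)
sample_length = parse_length(sample)
```

For the sample line this yields 287, the number that follows the status code.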

https://github.com/piwik/piwik/blob/master/misc/log-analytics/import_logs.py#L247

I assume that this line adds the "Bytes" information to the hit structure.

If that assumption is right, the "Bytes" data is already in the hit structure.

https://github.com/piwik/piwik/blob/master/misc/log-analytics/import_logs.py#L1643

Why is a custom variable necessary?

Are there any limits with custom variables compared to 'normal piwik variables'?

Isn't it possible to add a column into 'log_visit' or 'log_conversion' Table?

piwik/blob/master/core/Db/Schema/Mysql.php

What reports do we want for Bytes?

http://www.stedee.id.au/awffull_demo2/usage_200608.html#hourStats
_Volume_

http://www.nltechno.com/awstats/awstats.pl?config=destailleur.fr
_Bandwidth_

It would be nice to see traffic per site.

It would be also nice to be able to create traffic reports for the

STATIC_EXTENSIONS
piwik/blob/master/misc/log-analytics/import_logs.py#L56

and

DOWNLOAD_EXTENSIONS
piwik/blob/master/misc/log-analytics/import_logs.py#L61

based on URLs.
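
A possible shape for such a per-type traffic report, sketched under the assumption that hits are (URL, bytes) pairs. The two extension sets below are small illustrative subsets, not the real lists from import_logs.py, and the function names are hypothetical.

```python
from collections import Counter
from posixpath import splitext
from urllib.parse import urlsplit

# Illustrative subsets only; the real lists live in import_logs.py.
STATIC_EXTENSIONS = {"png", "jpg", "css", "js"}
DOWNLOAD_EXTENSIONS = {"pdf", "zip", "mp3"}


def classify(url):
    """Classify a URL as 'static', 'download' or 'page' by its extension."""
    path = urlsplit(url).path  # drops any query string or fragment
    extension = splitext(path)[1].lstrip(".").lower()
    if extension in STATIC_EXTENSIONS:
        return "static"
    if extension in DOWNLOAD_EXTENSIONS:
        return "download"
    return "page"


def traffic_by_kind(hits):
    """Sum the bytes of all hits per URL kind."""
    totals = Counter()
    for url, size in hits:
        totals[classify(url)] += size
    return totals


totals = traffic_by_kind(
    [("/a.png", 287), ("/doc.PDF?x=1", 1000), ("/index.html", 500)]
)
```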

@mattab
Member

mattab commented Jun 18, 2014

Original description was:

  • It would be nice if Piwik also collected traffic volume, similar to awffull, awstats and others.
  • How much donation should we expect for sponsoring this feature?
  • What's the best startpoint to implement this feature?
  • As discussed in the forum.
  • Maybe in the https://github.com/lognormal/boomerang/ are some ideas which piwik can reuse.

@anonymous-matomo-user anonymous-matomo-user added this to the 2.x - The Great Piwik 2.x Backlog milestone Jul 8, 2014
@mattab mattab removed the P: normal label Aug 3, 2014
@git001

git001 commented Aug 28, 2014

Sorry for the late response.
I'm still interested to see this feature in piwik.
What are now the next steps?

@mattab
Member

mattab commented Aug 28, 2014

There are three possible next steps:

  • Implement the change via a pull request
  • Sponsor our work on this feature (contact Consulting)
  • Wait until someone else implements it or sponsors our work

@git001

git001 commented Aug 28, 2014

Ok thanks.
I will think about which way to choose.

@quba
Contributor

quba commented Nov 25, 2014

Proposal: let's implement this as one of the Piwik core plugins. Most users don't use log analytics, and it would be very painful for many users to update the log_link_visit_action table schema.

Question: how can we extend the log analytics Python script to add this new parameter to the tracking request only when the plugin is installed (we don't want to produce hacks)?

@mattab: please let me know your thoughts.

@mattab mattab modified the milestones: Piwik 2.11.0, Mid term Dec 3, 2014
@mattab mattab added the Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. label Dec 3, 2014
@mattab
Member

mattab commented Dec 3, 2014

I guess the import_logs could always send the bytes information, and then the Tracking API would only record it in case such custom plugin is activated.
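
The idea can be sketched as follows: the importer always sends the value, and the receiving side only stores it when the plugin is active. This is an illustration in Python with hypothetical names; the real tracker is PHP, and the plugin name used here is a placeholder.

```python
def record_action(request_params, active_plugins, plugin_name="Bandwidth"):
    """Build the row to store for one tracked action (hypothetical sketch)."""
    row = {"url": request_params["url"]}
    raw = request_params.get("bw_bytes")
    # Without the plugin, the extra parameter is simply ignored.
    if plugin_name in active_plugins and raw is not None and str(raw).isdigit():
        row["bandwidth"] = int(raw)
    return row


params = {"url": "/a.png", "bw_bytes": "287"}
stored_with_plugin = record_action(params, {"Bandwidth"})
stored_without_plugin = record_action(params, set())
```

With this split the import script needs no knowledge of which plugins are installed.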

@tsteur
Member

tsteur commented Dec 23, 2014

Would be nice to have a Tracker API parameter for this. I would maybe use this feature in Piwik Mobile as well to see how many bytes are transferred on average for one pageview.

@mattab
Member

mattab commented Dec 28, 2014

Would be nice to have a Tracker API parameter for this

+1

@tsteur
Member

tsteur commented Jan 4, 2015

Maybe a plugin on the marketplace? Especially since we do not directly need something in piwik.js. Could be a nice plugin example for how to do such things. Later we could also at some point work on how to make piwik.js for plugins extensible but that's another issue

@mattab
Member

mattab commented Jan 5, 2015

Maybe a plugin on the marketplace?

this would be ideal!

@tsteur
Member

tsteur commented Jan 11, 2015

what would be the plugin name for this?

@mattab
Member

mattab commented Jan 12, 2015

Good question... Maybe

  • Filesize
  • or Bandwidth
  • or ActionFilesize
  • or better idea ?

tsteur added a commit to matomo-org/plugin-Bandwidth that referenced this issue Jan 15, 2015
Turns out it will most likely not be doable this way, by archiving
the bandwidth report and merging it with getPageUrls, e.g. because
of different subtable ids. Committing the current state of this
plugin though, in case we have to go this way. The next step would
be trying to make the Actions plugin extensible (Archiving and API)
tsteur added a commit to matomo-org/plugin-Bandwidth that referenced this issue Jan 16, 2015
@tsteur
Member

tsteur commented Jan 20, 2015

I won't measure the current page size with JavaScript in Piwik JS Tracker. It is not accurate at all, especially in modern websites/applications and one would maybe expect to include traffic that was caused by CSS / Images / JavaScript files. Tracking only the HTML content is most likely not that interesting. Also it is not yet possible to extend the Piwik JS Tracker with a plugin I think but that's not really a problem.

The name will be Bandwidth as the traffic is not necessarily caused by a file and ActionFilesize / ActionBandwidth is too technical and too coupled to Actions.

I won't update Piwik.org as it will be on the Marketplace and explained there.

@tsteur tsteur assigned tsteur and unassigned diosmosis Jan 20, 2015
@tsteur
Member

tsteur commented Jan 20, 2015

This part is not as easy and it is kinda wrong as it can be also used via the Tracking API and not only log analytics:

If a user is not using Log Analytics, or if he is using Log Analytics but the logs don't have the file size bandwidth, then Actions reports will not have the "Bytes" column.

It is complicated as there are subtables and we would maybe end up showing the columns in the first level but not in a lower level.

I might have to send another API request to get the top-level report when configuring which columns the ViewDataTable shows, and then check whether any of its rows has a bandwidth metric. I'd also have to request stats for period=month, as it would not be nice to show the columns on some days where bandwidth is recorded and hide them on other days for the same site. That would maybe not happen often, but it can still happen, especially around midnight. Regarding performance, this could be cached, but that makes everything complicated. Archiving more numeric records for this would probably be easiest.
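
The check described above could look roughly like this, assuming the top-level report rows for the whole month have already been fetched as dictionaries. The metric name and row layout are hypothetical.

```python
def should_show_bandwidth_columns(top_level_rows, metric="bandwidth"):
    """True if any top-level row carries a non-zero bandwidth value."""
    return any(row.get(metric) for row in top_level_rows)


# Rows as they might come back for period=month (made-up data).
month_rows = [
    {"label": "/blog", "nb_hits": 40, "bandwidth": 2287},
    {"label": "/index.html", "nb_hits": 12},
]
show_columns = should_show_bandwidth_columns(month_rows)
```

Deciding once per month-level report keeps the column layout stable across the individual days of that month.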

@tsteur
Member

tsteur commented Jan 21, 2015

@diosmosis would you mind having a look at the plugin if there's something strange or missing or so? https://github.com/piwik/plugin-Bandwidth I could use maybe more Metric classes eg for OverallTotalBandwidth, PageviewTotalBandwidth and DownloadTotalBandwidth metrics. Otherwise it seems to work with latest master

@diosmosis
Member

@tsteur The total metrics are aggregated and there's currently no way to express those as metadata classes, so they shouldn't be metrics (yet). One issue is that some aggregated metrics like min/max bandwidth have processed metrics classes, however, it's not a big deal. If/when aggregated metrics classes are supported this can be changed (or not, it will likely still work regardless).

@tsteur
Member

tsteur commented Jan 22, 2015

Ok thx. Closing it for now and waiting for feedback from client.

@tsteur tsteur closed this as completed Jan 22, 2015
@mattab
Member

mattab commented Feb 7, 2015

Good news: the plugin Bandwidth will be released on the Marketplace! See issue matomo-org/plugin-Bandwidth#1

@mattab
Member

mattab commented Feb 18, 2015

The plugin Bandwidth has been released on the Marketplace: http://plugins.piwik.org/Bandwidth 👍
