Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve CSV/TSV Export #2714

Closed
timo-bes opened this issue Oct 18, 2011 · 37 comments
Closed

Improve CSV/TSV Export #2714

timo-bes opened this issue Oct 18, 2011 · 37 comments
Labels
Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc.
Milestone

Comments

@timo-bes
Copy link
Member

The improvements include the following API extensions

  • expanded=1 should work for CSV/TSV exports. Labels have to be put together in a path.
  • New parameter includeInnerNodes to choose whether inner nodes (e.g. folders in the pages report) should be exported as well. If set to 1, they are present and the report is easier to read for humans. If 0, they are left out and it's easier to process the report, e.g. calculate sums.
  • New parameter translateColumnNames to get localized names of metrics.

The export link in Piwik should be adjusted

  • expanded=1 by default
  • translateColumnNames=1 by default for CSV and TSV format
  • Use the API_datatable_default_limit configuration for export links
@mattab
Copy link
Member

mattab commented Oct 18, 2011

Use the API_datatable_default_limit configuration for export links
What do you mean?

New parameter includeInnerNodes to choose whether inner nodes (e.g. folders in the pages report) should be exported as well.
Why would we want not to export some parts of the report? Not sure I understand correctly what you meant.

@timo-bes
Copy link
Member Author

  1. When you click the export button, filter_limit=100 is added to the URL. The value is hardcoded in the template. This should be configurable (e.g. using API_datatable_default_limit).
  2. When we export folders (with aggregated values for metrics), this can be useful in some cases, but not always. For example, if you want to process the export further (e.g. calculate the sum of page views in excel), this might lead to unexpected/wrong results.

@mattab
Copy link
Member

mattab commented Oct 19, 2011

  1. sounds good, definitely a bug

  2. would includeInnerNodes=0 same as "hideRowsWithSubtable=1" ?
    Not sure how useful is this feature, what exactly is the use case? Because to calculate the sum of page views would still require the folders to be in the output (since folders "nb_actions" is the aggregate page views of all pages inside this folder)?

@timo-bes
Copy link
Member Author

  1. Folders that are tracked, are reported under /path/to/folder/index. The folder /path/to/folder/ has only aggregated values.

Here's an example:

Actual views:

  • /path/to/folder 3 times
  • /path/to/folder/foo.html 2 times

Full report:

  • /path/to/folder 5 views (3+2)
  • /path/to/folder/index 3 views
  • /path/to/folder/foo.html 2 views

If we export it like this, it's easier to read for humans and might be useful in some cases. But if you want to sum the pageviews, the sum is 8, which is incorrect.

If we export without inner nodes, it looks like this:

  • /path/to/folder/index 3 views
  • /path/to/folder/foo.html 2 views

The sum is 5, which is correct.

hideRowsWithSubtable=1 would hide the entire row "/path/to/folder" with its subtable, right? I want to hide only the aggregated row and keep the subtable.

@mattab
Copy link
Member

mattab commented Oct 19, 2011

I finally understand, nothing beats a good example ;)

Maybe the parameter could be called "hideExpandedRows" ?

@timo-bes
Copy link
Member Author

"hideExpandedRows" sounds a lot like "hideRowsWithSubtable". The name should make clear that only the inner nodes of the tree (i.e. the rows with aggregated values) but not their children are hidden. I couldn't come up with anything better than "includeInnerNodes"...

@timo-bes
Copy link
Member Author

In general, the default for includeInnerNodes should be 0, since this is the version of the output that is most useful for further processing.

But what about the export icon in the Piwik UI? Should it add includeInnerNodes=1 by default?

@timo-bes
Copy link
Member Author

(In [5334]) refs #2714 improved csv/tsv export

@timo-bes
Copy link
Member Author

There are columns in the export that are not used in the UI. Therefore, they are not documented (and translated) in the metadata. I added generic translations to the CSV-Renderer (Piwik_DataTable_Renderer_Csv::translateColumnNames), which reduces the impact of this ticket on other parts of Piwik. Please review and consider alternative options: we could add some of the metrics to Piwik_API_API::getDefaultProcessedMetrics or extend the metadata output of all plugins.

@timo-bes
Copy link
Member Author

On the build server, a unit test is failing that passes on my machine.

When exporting with language=de, the number format changes (1,7 instead of 1.7) on my machine but not on the build server. I think the behavior of the build server is correct, any ideas why the outputs differ?

The comma in the number also breaks the CSV format where commas are used to separate columns.

@timo-bes
Copy link
Member Author

(In [5335]) refs #2714 changed expected number format of csv export according to behavior of build server

@timo-bes
Copy link
Member Author

(In [5336]) refs #2714 using API_datatable_default_limit config for export icons, upping default filter limit to 100 (which was the value previously hardcoded in the datatable template)

@mattab
Copy link
Member

mattab commented Oct 19, 2011

(In [5345]) Refs #2714 remove echo

@mattab
Copy link
Member

mattab commented Oct 19, 2011

Cool!

Feedback

  • includeInnerNodes = false by default sounds good. This way we don't have to document or expose this parameter publicly.
  • For consistency, JSON and HTML exports (not XML) should also support the new parameters translateColumnNames... so the translation code could be refactored in the parent renderer class.

@timo-bes
Copy link
Member Author

A general problem with CSV/TSV is, that the download starts right away and the URL can't be modified afterwards (e.g. to change includeInnerNodes). Since the possible parameters are getting more and more complex, it might be a good idea to introduce an export dialog. It could look like this:

  • When the export icon (floppy disc) is clicked, a dialog opens
  • You can select the format and general parameters
  • Depending on the format you get more parameters (e.g. translateColumnNames when it's available)
  • You can either copy the link or open the URL
  • The dialog closes

This way, normal users don't have to read the API documentation (which will probably be too hard to read for them) and still can use all the cool features.

What do you think?

@mattab
Copy link
Member

mattab commented Oct 20, 2011

Sounds interesting, but I think a bit too complicated. The simple approach of putting expanded=1&translateColumnNames=1 in the URL is good. Also as proposed, the includeInnerNodes would always be set to 0 (when expanded=1) but this is done in the code, not in the URL. Sane defaults are better than offering too much control ;)

@julienmoumne
Copy link
Member

Concerning comment:15, how about creating a new ticket/place holder on the subject of defining an advanced export page in Piwik ?

@mattab
Copy link
Member

mattab commented Oct 27, 2011

(In [5382]) Refs #2714 Remove idsubtable metadata column from output being tested since it has different value in build server and local dev station

@timo-bes
Copy link
Member Author

timo-bes commented Nov 9, 2011

(In [5419]) refs #2714 additional column translations and number format fix for csv exports, refs #1454 metadata for API.get report (needed for csv column translations)

@timo-bes
Copy link
Member Author

timo-bes commented Nov 9, 2011

(In [5420]) refs #2714, #1454 test fix for previous commit

@timo-bes
Copy link
Member Author

At the moment, there's a list of translations in Piwik_DataTable_Renderer_Csv::translateColumnNames. This list is needed because plugins only supply translations for the metrics that are displayed and the API exposes more columns.

I'm not happy with this solution because the renderer should not care about the concrete translations. Also, when we add more "invisible" columns, we'll have to add them to the renderer, which is not intuitive at all. I think, we should include the invlisible columns in the meta data output. Adding them to "metrics" could be problematic because "metrics" should only contain columns that are relevant for the UI. How about adding a new array? We already have "metrics" and "processedMetrics", maybe "additionalMetrics"?

@mattab
Copy link
Member

mattab commented Nov 21, 2011

I'm not happy with this solution because the renderer should not care about the concrete translations. Also, when we add more "invisible" columns, we'll have to add them to the renderer, which is not intuitive at all. I think, we should include the invlisible columns in the meta data output. Adding them to "metrics" could be problematic because "metrics" should only contain columns that are relevant for the UI. How about adding a new array? We already have "metrics" and "processedMetrics", maybe "additionalMetrics"?

I agree that column labels in renderer is not ideal... but adding data in the output, that we don't yet use is maybe not better (performance overhead).

As a quick fix, maybe you can move the "Core/translation" logic out of the renderer into a public static function in API/API.php (in the plugin class rather than public API) - so that at least all logic is in the same place. I think it'd enough?

@timo-bes
Copy link
Member Author

(In [5458]) refs #2714 removed outdated comment

@timo-bes
Copy link
Member Author

(In [5463]) refs #2714 column translations for html and rss export, general fine tuning of export column translations

@timo-bes
Copy link
Member Author

As far as I'm concerned, we can close this ticket as soon as #2765 is fixed.

@mattab
Copy link
Member

mattab commented Nov 23, 2011

  • '&includeInnerNodes=1'

we mentioned we didn't need to show this parameter publicly, can we remove from link and assume ==1 by default in code?

Otherwise, excellent changes! It feels good to see very old features being polished :)

@mattab
Copy link
Member

mattab commented Nov 23, 2011

(In [5468]) Refs #2714 Prevent warning when cookie not init

@timo-bes
Copy link
Member Author

I still think we might need both options (includeInnerNodes=(0|1)). Leaving the inner nodes out is better for further processing because the sums are right. On the other hand, some reports contain valueable information on inner nodes only. For example, the search engines report has conversion metrics for the search engines but not for the keywords. If we leave inner nodes out, only the keywords are exported and we don't have the metrics.

At the moment includeInnerNodes is 1 when the export icon is clicked but the API default is 0. I did it this way because it's more likely that an auto-fetched report is processed further. Still, I'm not entirely happy...

  • We could ask the user whether inner nodes should be included when he clicks the export icon (simple yes/no-js-dialog)
  • We could always default to 1 (also in the API) for consistency
  • We could add another export link XTSV (extended TSV) that includes the inner nodes
  • We could introduce an export dialog (as mentioned before). The user can also configure whether column labels should be translated, which is not possible at the moment.

@mattab
Copy link
Member

mattab commented Nov 23, 2011

Thx for clarifications.

We could always default to 1 (also in the API) for consistency

Sounds good to me... There is no known problem with that, we will add a FAQ if users ask about it.

It is the easiest so the way to go maybe since we want to KISS...

@timo-bes
Copy link
Member Author

(In [5477]) fixes #2714 includeInnerNodes=1 by default, nice export filename for API.get report

@timo-bes
Copy link
Member Author

The realted ticket #2765 is still open. If it was closed, CSV export would be polished completely. Matt, did you work on it yet?

@mattab
Copy link
Member

mattab commented Nov 25, 2011

No and I will probably not fix it for the next release I'm afraid...

@timo-bes
Copy link
Member Author

(In [5495]) refs #2714 use default language of user for export file name

@timo-bes
Copy link
Member Author

(In [5496]) refs #2714 export filename compatible with firefox

@timo-bes
Copy link
Member Author

timo-bes commented Dec 6, 2011

(In [5529]) fixes #2809, refs #2714: making signature of Piwik_DataTable_Renderer::renderHeader() compatible across subclasses

@mattab
Copy link
Member

mattab commented Feb 15, 2012

(In [5838]) Refs #2714
Exporting TSV/RSS links with human readable column names, from the API page.

@timo-bes
Copy link
Member Author

(In [6181]) refs #2714, refs #3073

  • flat settings are passed to graph views and exports
  • csv renderer does not do flattening anymore. the new api parameters can be used instead - js and tests updated accordingly
  • aggregate rows are italic (imo, the plus logo would be misleading because it would suggest clickability)
  • disable row evolution on flat tables for performance reasons
  • minor language adjustments

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc.
Projects
None yet
Development

No branches or pull requests

3 participants