Opened 3 years ago

Closed 2 years ago

Last modified 2 years ago

#2714 closed New feature (fixed)

Improve CSV/TSV Export

Reported by: EZdesign Owned by:
Priority: normal Milestone: 1.7 Piwik 1.7
Component: Core Keywords:
Cc: Sensitive: no

Description

The improvements include the following API extensions

  • expanded=1 should work for CSV/TSV exports. Labels have to be put together in a path.
  • New parameter includeInnerNodes to choose whether inner nodes (e.g. folders in the pages report) should be exported as well. If set to 1, they are present and the report is easier to read for humans. If 0, they are left out and it's easier to process the report, e.g. calculate sums.
  • New parameter translateColumnNames to get localized names of metrics.

The export link in Piwik should be adjusted

  • expanded=1 by default
  • translateColumnNames=1 by default for CSV and TSV format
  • Use the API_datatable_default_limit configuration for export links

Change History (37)

comment:1 Changed 3 years ago by matt (mattab)

Use the API_datatable_default_limit configuration for export links

What do you mean?

New parameter includeInnerNodes to choose whether inner nodes (e.g. folders in the pages report) should be exported as well.

Why would we want not to export some parts of the report? Not sure I understand correctly what you meant.

  • Please add unit tests for these new features.
  • The "expanded=1" feature request is covered in the ticket #2005

comment:2 Changed 3 years ago by EZdesign (BeezyT)

  1. When you click the export button, filter_limit=100 is added to the URL. The value is hardcoded in the template. This should be configurable (e.g. using API_datatable_default_limit).
  1. When we export folders (with aggregated values for metrics), this can be useful in some cases, but not always. For example, if you want to process the export further (e.g. calculate the sum of page views in excel), this might lead to unexpected/wrong results.

comment:3 Changed 3 years ago by matt (mattab)

1) sounds good, definitely a bug

2) would includeInnerNodes=0 same as "hideRowsWithSubtable=1" ?
Not sure how useful is this feature, what exactly is the use case? Because to calculate the sum of page views would still require the folders to be in the output (since folders "nb_actions" is the aggregate page views of all pages inside this folder)?

comment:4 Changed 3 years ago by EZdesign (BeezyT)

2) Folders that are tracked, are reported under /path/to/folder/index. The folder /path/to/folder/ has only aggregated values.

Here's an example:

Actual views:

  • /path/to/folder 3 times
  • /path/to/folder/foo.html 2 times

Full report:

  • /path/to/folder 5 views (3+2)
  • /path/to/folder/index 3 views
  • /path/to/folder/foo.html 2 views

If we export it like this, it's easier to read for humans and might be useful in some cases. But if you want to sum the pageviews, the sum is 8, which is incorrect.

If we export without inner nodes, it looks like this:

  • /path/to/folder/index 3 views
  • /path/to/folder/foo.html 2 views

The sum is 5, which is correct.

hideRowsWithSubtable=1 would hide the entire row "/path/to/folder" with its subtable, right? I want to hide only the aggregated row and keep the subtable.

comment:5 Changed 3 years ago by matt (mattab)

I finally understand, nothing beats a good example ;)

Maybe the parameter could be called "hideExpandedRows" ?

comment:6 Changed 3 years ago by EZdesign (BeezyT)

"hideExpandedRows" sounds a lot like "hideRowsWithSubtable". The name should make clear that only the inner nodes of the tree (i.e. the rows with aggregated values) but not their children are hidden. I couldn't come up with anything better than "includeInnerNodes"...

comment:7 Changed 3 years ago by EZdesign (BeezyT)

In general, the default for includeInnerNodes should be 0, since this is the version of the output that is most useful for further processing.

But what about the export icon in the Piwik UI? Should it add includeInnerNodes=1 by default?

comment:8 Changed 3 years ago by EZdesign (BeezyT)

(In [5334]) refs #2714 improved csv/tsv export

comment:9 Changed 3 years ago by EZdesign (BeezyT)

There are columns in the export that are not used in the UI. Therefore, they are not documented (and translated) in the metadata. I added generic translations to the CSV-Renderer (Piwik_DataTable_Renderer_Csv::translateColumnNames), which reduces the impact of this ticket on other parts of Piwik. Please review and consider alternative options: we could add some of the metrics to Piwik_API_API::getDefaultProcessedMetrics or extend the metadata output of all plugins.

comment:10 Changed 3 years ago by EZdesign (BeezyT)

On the build server, a unit test is failing that passes on my machine.

When exporting with language=de, the number format changes (1,7 instead of 1.7) on my machine but not on the build server. I think the behavior of the build server is correct, any ideas why the outputs differ?

The comma in the number also breaks the CSV format where commas are used to separate columns.

comment:11 Changed 3 years ago by EZdesign (BeezyT)

(In [5335]) refs #2714 changed expected number format of csv export according to behavior of build server

comment:12 Changed 3 years ago by EZdesign (BeezyT)

(In [5336]) refs #2714 using API_datatable_default_limit config for export icons, upping default filter limit to 100 (which was the value previously hardcoded in the datatable template)

comment:13 Changed 3 years ago by matt (mattab)

(In [5345]) Refs #2714 remove echo

comment:14 Changed 3 years ago by matt (mattab)

Cool!

Feedback

  • includeInnerNodes = false by default sounds good. This way we don't have to document or expose this parameter publicly.
  • For consistency, JSON and HTML exports (not XML) should also support the new parameters translateColumnNames... so the translation code could be refactored in the parent renderer class.

comment:15 Changed 2 years ago by EZdesign (BeezyT)

A general problem with CSV/TSV is, that the download starts right away and the URL can't be modified afterwards (e.g. to change includeInnerNodes). Since the possible parameters are getting more and more complex, it might be a good idea to introduce an export dialog. It could look like this:

  • When the export icon (floppy disc) is clicked, a dialog opens
  • You can select the format and general parameters
  • Depending on the format you get more parameters (e.g. translateColumnNames when it's available)
  • You can either copy the link or open the URL
  • The dialog closes

This way, normal users don't have to read the API documentation (which will probably be too hard to read for them) and still can use all the cool features.

What do you think?

comment:16 Changed 2 years ago by matt (mattab)

Sounds interesting, but I think a bit too complicated. The simple approach of putting expanded=1&translateColumnNames=1 in the URL is good. Also as proposed, the includeInnerNodes would always be set to 0 (when expanded=1) but this is done in the code, not in the URL. Sane defaults are better than offering too much control ;)

comment:17 Changed 2 years ago by JulienM (JulienMoumne)

Concerning comment:15, how about creating a new ticket/place holder on the subject of defining an advanced export page in Piwik ?

comment:18 Changed 2 years ago by matt (mattab)

(In [5382]) Refs #2714 Remove idsubtable metadata column from output being tested since it has different value in build server and local dev station

comment:19 Changed 2 years ago by EZdesign (BeezyT)

(In [5419]) refs #2714 additional column translations and number format fix for csv exports, refs #1454 metadata for API.get report (needed for csv column translations)

comment:20 Changed 2 years ago by EZdesign (BeezyT)

(In [5420]) refs #2714, #1454 test fix for previous commit

comment:21 Changed 2 years ago by EZdesign (BeezyT)

At the moment, there's a list of translations in Piwik_DataTable_Renderer_Csv::translateColumnNames. This list is needed because plugins only supply translations for the metrics that are displayed and the API exposes more columns.

I'm not happy with this solution because the renderer should not care about the concrete translations. Also, when we add more "invisible" columns, we'll have to add them to the renderer, which is not intuitive at all. I think, we should include the invlisible columns in the meta data output. Adding them to "metrics" could be problematic because "metrics" should only contain columns that are relevant for the UI. How about adding a new array? We already have "metrics" and "processedMetrics", maybe "additionalMetrics"?

comment:22 Changed 2 years ago by matt (mattab)

I'm not happy with this solution because the renderer should not care about the concrete translations. Also, when we add more "invisible" columns, we'll have to add them to the renderer, which is not intuitive at all. I think, we should include the invlisible columns in the meta data output. Adding them to "metrics" could be problematic because "metrics" should only contain columns that are relevant for the UI. How about adding a new array? We already have "metrics" and "processedMetrics", maybe "additionalMetrics"?

I agree that column labels in renderer is not ideal... but adding data in the output, that we don't yet use is maybe not better (performance overhead).

As a quick fix, maybe you can move the "Core/translation" logic out of the renderer into a public static function in API/API.php (in the plugin class rather than public API) - so that at least all logic is in the same place. I think it'd enough?

comment:23 Changed 2 years ago by EZdesign (BeezyT)

(In [5458]) refs #2714 removed outdated comment

comment:24 Changed 2 years ago by EZdesign (BeezyT)

(In [5463]) refs #2714 column translations for html and rss export, general fine tuning of export column translations

comment:25 Changed 2 years ago by EZdesign (BeezyT)

As far as I'm concerned, we can close this ticket as soon as #2765 is fixed.

comment:26 Changed 2 years ago by matt (mattab)

  • '&includeInnerNodes=1'

we mentioned we didn't need to show this parameter publicly, can we remove from link and assume ==1 by default in code?

Otherwise, excellent changes! It feels good to see very old features being polished :)

comment:27 Changed 2 years ago by matt (mattab)

(In [5468]) Refs #2714 Prevent warning when cookie not init

comment:28 Changed 2 years ago by EZdesign (BeezyT)

I still think we might need both options (includeInnerNodes=(0|1)). Leaving the inner nodes out is better for further processing because the sums are right. On the other hand, some reports contain valueable information on inner nodes only. For example, the search engines report has conversion metrics for the search engines but not for the keywords. If we leave inner nodes out, only the keywords are exported and we don't have the metrics.

At the moment includeInnerNodes is 1 when the export icon is clicked but the API default is 0. I did it this way because it's more likely that an auto-fetched report is processed further. Still, I'm not entirely happy...

  • We could ask the user whether inner nodes should be included when he clicks the export icon (simple yes/no-js-dialog)
  • We could always default to 1 (also in the API) for consistency
  • We could add another export link XTSV (extended TSV) that includes the inner nodes
  • We could introduce an export dialog (as mentioned before). The user can also configure whether column labels should be translated, which is not possible at the moment.

comment:29 Changed 2 years ago by matt (mattab)

Thx for clarifications.

We could always default to 1 (also in the API) for consistency

Sounds good to me... There is no known problem with that, we will add a FAQ if users ask about it.

It is the easiest so the way to go maybe since we want to KISS...

comment:30 Changed 2 years ago by EZdesign (BeezyT)

  • Resolution set to fixed
  • Status changed from new to closed

(In [5477]) fixes #2714 includeInnerNodes=1 by default, nice export filename for API.get report

comment:31 Changed 2 years ago by EZdesign (BeezyT)

The realted ticket #2765 is still open. If it was closed, CSV export would be polished completely. Matt, did you work on it yet?

comment:32 Changed 2 years ago by matt (mattab)

No and I will probably not fix it for the next release I'm afraid...

comment:33 Changed 2 years ago by EZdesign (BeezyT)

(In [5495]) refs #2714 use default language of user for export file name

comment:34 Changed 2 years ago by EZdesign (BeezyT)

(In [5496]) refs #2714 export filename compatible with firefox

comment:35 Changed 2 years ago by EZdesign (BeezyT)

(In [5529]) fixes #2809, refs #2714: making signature of Piwik_DataTable_Renderer::renderHeader() compatible across subclasses

comment:36 Changed 2 years ago by matt (mattab)

(In [5838]) Refs #2714

Exporting TSV/RSS links with human readable column names, from the API page.

comment:37 Changed 2 years ago by EZdesign (BeezyT)

(In [6181]) refs #2714, refs #3073

  • flat settings are passed to graph views and exports
  • csv renderer does not do flattening anymore. the new api parameters can be used instead - js and tests updated accordingly
  • aggregate rows are italic (imo, the plus logo would be misleading because it would suggest clickability)
  • disable row evolution on flat tables for performance reasons
  • minor language adjustments
Note: See TracTickets for help on using tickets.