Opened 4 years ago

Closed 4 years ago

#1467 closed Bug (fixed)

Problem with html-entities in translation

Reported by: SteveG
Owned by:
Priority: major
Milestone: Piwik 0.6.4
Component: Core
Keywords:
Cc:
Sensitive: no

Description

We are having trouble with the translations returned by the API.
In some languages the translations contain HTML entities.

Because we do not use a browser engine for output, and there is no native JavaScript or Titanium function to decode entities, those entities are currently displayed verbatim in the mobile application.

I think the best way to fix this is to extend the language API function getTranslationsForLanguage with an optional second parameter that enables decoding of entities before the translations are returned.
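To illustrate what decoding on the client would involve, here is a minimal JavaScript sketch. It is not part of either attached patch. The function name decodeEntities and the NAMED_ENTITIES table are hypothetical, and the table covers only a handful of names, which is one reason decoding on the server looks more practical.

```javascript
// Hypothetical, deliberately tiny table of named entities.
// The full HTML list is far longer.
const NAMED_ENTITIES = {
  amp: "&", lt: "<", gt: ">", quot: '"',
  nbsp: "\u00a0", hellip: "\u2026", raquo: "\u00bb",
  atilde: "\u00e3", ccedil: "\u00e7", eacute: "\u00e9",
};

// Decode numeric (&#233; or &#xE9;) and known named entities in a single pass.
// Unknown names and out-of-range code points are left untouched.
function decodeEntities(text) {
  const pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g;
  return text.replace(pattern, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1].toLowerCase() === "x";
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}
```

Because the replacement happens in one pass, already-decoded output is never scanned again, so "&amp;lt;" becomes "&lt;" and not "<".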

Attachments (2)

API.php.patch (1.2 KB) - added by SteveG 4 years ago.
patch for /plugins/LanguagesManager/API.php
Json.php.patch (463 bytes) - added by SteveG 4 years ago.
patch for /core/DataTable/Renderer/Json.php

Change History (20)

comment:1 Changed 4 years ago by matt (mattab)

Can you list some examples of the HTML entities used?

Can we enforce not using them, or are they useful? Are there some in English too?

comment:2 Changed 4 years ago by vipsoft (robocoder)

Yes, we can add a parameter that runs the translations through html_entity_decode().

We need the html entities because we output HTML to the browser, e.g.,

  •   & … » › >

comment:3 Changed 4 years ago by SteveG (sgiehl)

At the moment we don't use any translations that rely on entities to output HTML characters. But in some languages entities are used for other special characters, too.

See http://dev.piwik.org/trac/browser/trunk/lang/pt.php#L7 for an example.

Using html_entity_decode() should be the easiest way. Are all translation files in UTF-8, or are some using their own charset?

comment:4 Changed 4 years ago by vipsoft (robocoder)

Yes, there are other html entities in the non-English translations. The translation files should be UTF-8 -- I already have a script to fix these that I can run.

comment:5 Changed 4 years ago by matt (mattab)

+1 for your solution, a new parameter

comment:6 Changed 4 years ago by vipsoft (robocoder)

Actually, wouldn't this be considered a new format? Otherwise, I imagine some characters would still need to be escaped depending on the format.

The easiest thing might be to use format=xml instead -- since the output is (generally) already html_entity_decode()'d -- and then on the client-side, convert a small number of specific exceptions: &lt; &gt; &amp; &quot;.
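A rough JavaScript sketch of that client-side step, assuming the only escapes left in the XML output are &lt;, &gt;, &amp; and &quot;. The name unescapeXml is made up for illustration.

```javascript
// Convert the four XML escapes back to plain characters.
// "&amp;" is handled last so that "&amp;lt;" becomes "&lt;" rather than "<".
function unescapeXml(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&");
}
```

The order of the replacements matters: decoding &amp; first would unescape some input twice.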

comment:7 Changed 4 years ago by vipsoft (robocoder)

format=csv is also html_entity_decode()'d, and is more compact. The only tricky part is parsing multi-line fields (i.e., a quoted string that contains newlines).
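For reference, the multi-line case can be handled with a small state machine. This JavaScript sketch assumes comma-separated fields and doubled quotes ("") as the escape inside quoted fields; I have not checked that against the exact output of format=csv. The name parseCsv is hypothetical.

```javascript
// Parse CSV text into rows of fields. Quoted fields may contain commas,
// newlines, and doubled quotes ("").
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch; // includes newlines inside the quotes
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
  }
  // Flush the last row when the input does not end with a line break.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}
```

A newline only ends a row while the parser is outside quotes, which is what keeps a quoted multi-line translation in a single field.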

comment:8 Changed 4 years ago by SteveG (sgiehl)

We are using the JSON format because it can be used directly in JavaScript. Adding a new format like json-decoded or so would do it, too. But is this decoding really needed everywhere in the API?

comment:9 Changed 4 years ago by matt (mattab)

Adding a parameter to a single API function sounds better, even though it will be useless for some formats. That can be noted in the function comments to clarify.

comment:10 Changed 4 years ago by vipsoft (robocoder)

Sounds hackish. Could we decode by default when format=json? (Does that break the Flash chart data feed?)

comment:11 Changed 4 years ago by SteveG (sgiehl)

Hm, good question. Decoding by default for the JSON format only would do it, too.

Both ways are possible, and I already have solutions for both.
I'll attach patches. It would be nice if one of them could be implemented before 0.6.4.

Changed 4 years ago by SteveG (sgiehl)

patch for /plugins/LanguagesManager/API.php

comment:12 Changed 4 years ago by vipsoft (robocoder)

I'm leaning towards the Json.php patch. Don't bother decoding the $key. It isn't passed by reference to the anonymous function, so decoding the $key wastes CPU cycles.

Changed 4 years ago by SteveG (sgiehl)

patch for /core/DataTable/Renderer/Json.php

comment:13 Changed 4 years ago by SteveG (sgiehl)

Oh, you're right. I just fixed my patch.

comment:14 Changed 4 years ago by matt (mattab)

looks good to me

comment:15 Changed 4 years ago by SteveG (sgiehl)

  • Resolution set to fixed
  • Status changed from new to closed

(In [2480]) fixes #1467 always decode entities when format is json

comment:16 Changed 4 years ago by SteveG (sgiehl)

  • Milestone changed from Piwik Mobile Client to 0 - Piwik 0.6.4

comment:17 Changed 4 years ago by matt (mattab)

  • Resolution fixed deleted
  • Status changed from closed to reopened

tests are failing. I'll leave it as an exercise to fix them :)

Actually there is at least one failure that might be unexpected:

Fail: ../tests/core/DataTable/Renderer.test.php -> Test_Piwik_DataTable_Renderer -> test_JSON_test6 -> Equal expectation fails at character 9 with [{"value":false}] and [{"value":""}] at [trunk\tests\core\DataTable\Renderer.test.php line 337]

comment:18 Changed 4 years ago by SteveG (sgiehl)

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [2483]) fixes #1467 decode entities only for strings; fixed tests
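The idea behind "decode entities only for strings" can be sketched as follows. This is JavaScript for illustration, not the PHP change from [2483]. Both function names are hypothetical, and decodeEntities is a stand-in that knows only four entities. The failing test in comment 17 is presumably what happens when a boolean false is pushed through a string decoder and comes back as an empty string.

```javascript
// Stand-in decoder for this sketch: handles only four entities.
function decodeEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&");
}

// Walk arrays and plain objects, decoding string values only.
function decodeStrings(value) {
  if (typeof value === "string") return decodeEntities(value);
  if (Array.isArray(value)) return value.map(decodeStrings);
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = decodeStrings(inner); // the key is copied as-is, not decoded
    }
    return out;
  }
  return value; // booleans, numbers and null pass through unchanged
}
```

With the type check in place, [{"value":false}] serializes unchanged, while string values are decoded and keys are left alone as suggested in comment 12.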
