Opened 3 years ago

Closed 3 years ago

#2185 closed Bug (fixed)

sanitizeInputValue() - improvements

Reported by: vipsoft Owned by:
Priority: normal Milestone: Piwik 1.2.1
Component: Core Keywords:
Cc: Sensitive: no

Description (last modified by vipsoft)

setDocumentTitle() expects an un-encoded string because piwik.js uses encodeURIComponent to encode parameters in the request.

Attachments (1)

2185.patch (3.1 KB) - added by vipsoft 3 years ago.

Download all attachments as: .zip

Change History (11)

comment:1 Changed 3 years ago by vipsoft (robocoder)

  • Resolution set to fixed
  • Status changed from new to closed

(In [4080]) fixes #2185

comment:2 Changed 3 years ago by matt (mattab)

the sanitizeInputValue is called for all input values, generally very often, I think charset detection is pretty slow...

is the piwik.js fix not enough to get the page titles right?

comment:3 Changed 3 years ago by matt (mattab)

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:4 Changed 3 years ago by vipsoft (robocoder)

I'll rework it.

comment:5 Changed 3 years ago by vipsoft (robocoder)

(In [4087]) refs #2185 - revert r4080

  • for performance, move html_entity_decode out of sanitizeInputValue() since it's only used when we getRequestVar('action_name')
  • add unit tests

comment:6 Changed 3 years ago by vipsoft (robocoder)

(In [4092]) refs #2185 - sanitizeInputValue() returned if input wasn't valid UTF-8

comment:7 Changed 3 years ago by vipsoft (robocoder)

  • Description modified (diff)
  • Summary changed from PDF Reports - Page titles with ISO-8859-1 accented characters not converted to sanitizeInputValue() - improvements

Re-think:

  • re r4087, moved html_entity_decode out of sanitizeInputValue(). Matt wonders if we should also use html_entity_decode for custom variables and referer.
  • re forum post, passing an already encoded URL results in double encoding.

Changed 3 years ago by vipsoft (robocoder)

comment:8 Changed 3 years ago by vipsoft (robocoder)

The attached patch moves html_entity_code() back to sanitizeInputValue(), and tries to detect/fix double encoding.

I'll come back to this problem after I've thought more about the implications are.

comment:9 Changed 3 years ago by matt (mattab)

do we need to handle this use case though? It has never been a problem so far, and I really don't want to complicate the sanitize function because it is heavily used, and security related. It must stay simple and fast. So I vote for updating the doc and clarify that we don't accept encoded values, and leave the sanitize as is on trunk (with your new test in the function)

comment:10 Changed 3 years ago by vipsoft (robocoder)

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [4096]) fixes #2185

Note: See TracTickets for help on using tickets.