Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#2901 closed Bug (fixed)

Piwik sometimes fails to decode Chinese keywords from baidu.com properly

Reported by: edward
Owned by:
Priority: normal
Milestone: 1.7 Piwik 1.7
Component: Core
Keywords:
Cc:
Sensitive: no

Description

I see there was a fix in Piwik 0.2.33: "FIXED #589 Piwik fails to properly decode and store some chinese keywords (eg. from baidu.com)".

But I still see some URLs with Chinese keywords that are decoded incorrectly.
Take the link below as an example: the keywords are 二度宫颈糜烂能治好吗?, but in Piwik they become "浜搴棰绯芥不濂藉?" (see also the attached screenshot).
http://www.baidu.com/s?ch=14&ie=utf-8&wd=%E4%BA%8C%E5%BA%A6%E5%AE%AB%E9%A2%88%E7%B3%9C%E7%83%82%E8%83%BD%E6%B2%BB%E5%A5%BD%E5%90%97%3F&searchRadio=on
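The garbled text is what you would expect when UTF-8 bytes are interpreted as a GB-family charset. The following is a minimal Python sketch of that effect, not Piwik's own code (Piwik is written in PHP); the helper name `keyword` is made up for illustration. It decodes the `wd` parameter of the URL above once as UTF-8 and once as gb2312:

```python
from urllib.parse import urlsplit, parse_qs

# The referrer URL from this ticket, split only for readability.
URL = ("http://www.baidu.com/s?ch=14&ie=utf-8"
       "&wd=%E4%BA%8C%E5%BA%A6%E5%AE%AB%E9%A2%88%E7%B3%9C%E7%83%82"
       "%E8%83%BD%E6%B2%BB%E5%A5%BD%E5%90%97%3F&searchRadio=on")


def keyword(url, charset):
    """Return the 'wd' search keyword, decoding its bytes with `charset`.

    Hypothetical helper for illustration only.
    """
    query = urlsplit(url).query
    # errors="replace" keeps undecodable bytes from raising an exception.
    params = parse_qs(query, encoding=charset, errors="replace")
    return params["wd"][0]


as_utf8 = keyword(URL, "utf-8")     # the keywords as the visitor typed them
as_gb2312 = keyword(URL, "gb2312")  # same bytes, wrong charset: mojibake

print(as_utf8)
print(as_gb2312)
```

Note that the URL itself announces its encoding through `ie=utf-8`, so the bytes are valid UTF-8 and only the charset assumed by the decoder is wrong.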

Attachments (1)

baidu.decode.jpg (44.6 KB) - added by edward 2 years ago.
Screenshot of the incorrectly decoded Chinese keywords

Change History (6)

Changed 2 years ago by edward

Screenshot of the incorrectly decoded Chinese keywords

comment:1 Changed 2 years ago by vipsoft (robocoder)

  • Keywords "cannot decode chinese keywords properly" removed

There's a new feature in #2761 that allows multiple encodings. We can try adding utf-8 to the baidu configuration (which currently expects gb2312) and adding edward's URL to the unit test.
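As a rough illustration of the "multiple encodings" idea, here is a Python sketch that tries a list of candidate charsets in order and keeps the first one that decodes without error. This is an assumption about how such a fallback could work, not the actual #2761 implementation; `decode_keyword` and its default charset list are hypothetical.

```python
from urllib.parse import quote, unquote_to_bytes


def decode_keyword(percent_encoded, charsets=("utf-8", "gb2312")):
    """Decode a percent-encoded keyword using the first charset that fits.

    Hypothetical helper: UTF-8 is tried first because it is strict and
    rejects most byte sequences that are not really UTF-8.
    """
    raw = unquote_to_bytes(percent_encoded)
    for charset in charsets:
        try:
            return raw.decode(charset)
        except UnicodeDecodeError:
            continue
    # Nothing matched cleanly: fall back to a lossy decode.
    return raw.decode(charsets[0], errors="replace")


# The same two characters, percent-encoded in each charset.
utf8_wd = quote("二度", encoding="utf-8")
gb_wd = quote("二度", encoding="gb2312")

print(decode_keyword(utf8_wd), decode_keyword(gb_wd))
```

The order of the list matters in this sketch: putting gb2312 first would reintroduce the bug whenever UTF-8 bytes happen to form valid gb2312 sequences.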

comment:3 Changed 2 years ago by matt (mattab)

It sounds like new logic might need to be introduced for baidu (use UTF-8 when it is found as a parameter value, default to gb2312 otherwise?)
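The heuristic suggested here could look roughly like the Python sketch below. It is only an illustration of the idea, assuming the charset is announced in the `ie` parameter as in edward's URL; the function name `baidu_charset` is hypothetical, and the ticket does not show that this logic is what was eventually committed.

```python
from urllib.parse import urlsplit, parse_qs


def baidu_charset(url, default="gb2312"):
    """Pick the charset for a baidu referrer URL (hypothetical helper).

    Returns "utf-8" when the URL carries ie=utf-8, otherwise `default`.
    """
    # latin-1 maps every byte to a character, so parsing never fails
    # regardless of how the keyword itself is encoded.
    params = parse_qs(urlsplit(url).query, encoding="latin-1")
    ie = params.get("ie", [""])[0].strip().lower()
    return "utf-8" if ie in ("utf-8", "utf8") else default


print(baidu_charset("http://www.baidu.com/s?ch=14&ie=utf-8&wd=%E6%B2%BB"))
print(baidu_charset("http://www.baidu.com/s?wd=%D6%CE"))
```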

comment:4 Changed 2 years ago by vipsoft (robocoder)

  • Resolution set to fixed
  • Status changed from new to closed

(In [5755]) fixes #2901 - thanks edward!

comment:5 Changed 2 years ago by matt (mattab)

I was wrong, that's good! :)