Opened 5 years ago

Closed 5 years ago

Last modified 4 years ago

#435 closed Bug (fixed)

Exotic encoded keywords should be stored as utf-8 in the DB

Reported by: ARray Owned by:
Priority: major Milestone: RobotRock
Component: Plugins Wishlist Keywords: 0.2.23
Cc: Sensitive:

Description (last modified by matt)

Currently, keywords are stored URL-encoded in the MySQL log table.

Code is around: http://dev.piwik.org/trac/browser/trunk/core/Tracker/Visit.php#L693

For some search engines, such as yandex.ru, keywords are encoded in the URL. For each search engine that encodes keywords, Piwik should know which encoding is used, and it should store only valid UTF-8 keywords in the log table.

This would fix two bugs:

  • Russian keywords from 'yandex.ru', the most popular Russian search engine, are shown as a series of question marks in the UI.
  • Searching for a keyword with Piwik's in-table search would also work for exotic keywords. Currently, if you search for "Ющенко", Piwik looks for that keyword in the list but does not look for its encoded value. The search expects the keyword to be stored in the right format.
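As an illustration of the proposal above, here is a minimal sketch in Python (not Piwik's PHP code). It decodes the percent-escapes, tries UTF-8 first, and falls back to a per-engine charset. The function name `keyword_to_utf8`, the `FALLBACK_CHARSETS` table, and the choice of windows-1251 for yandex.ru are assumptions made for this example, not taken from the Piwik source.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical table: search engine host -> charset to try when the
# keyword bytes are not valid UTF-8. The yandex.ru entry is an assumption.
FALLBACK_CHARSETS = {"yandex.ru": "windows-1251"}


def keyword_to_utf8(raw_keyword, engine_host):
    """Return the keyword as decoded text, ready to be stored as UTF-8."""
    # '+' stands for a space in query strings; undo the %XX escapes as bytes.
    raw = unquote_to_bytes(raw_keyword.replace("+", " "))
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        charset = FALLBACK_CHARSETS.get(engine_host)
        if charset is None:
            raise  # unknown encoding: better to fail than store garbage
        return raw.decode(charset)


print(keyword_to_utf8("%d0%ba%d0%b0%d0%ba+%d0%bd%d0%b0+juno", "yandex.ru"))
print(keyword_to_utf8("%f1%ef%ee%f0%f2%e7%e4%f0%e0%e2", "yandex.ru"))
```

Trying UTF-8 first keeps the table small: only engines that send a legacy charset need an entry. A legacy-encoded keyword could in rare cases also be valid UTF-8, so this is a heuristic rather than a guarantee.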

If you encounter this bug, please report an example URL of a search, from any search engine, that doesn't work well with Piwik. We need more examples to solve this bug. Thanks!

Change History (7)

comment:1 Changed 5 years ago by matt (mattab)

  • Milestone set to RobotRock

comment:2 Changed 5 years ago by ARray

The first row is displayed as a series of question marks.
The next two rows are displayed correctly.

INSERT INTO `piwik_log_visit` VALUES (190, 1, '22:06:13', 'cefdd83aa209ddb0629b69f93bd21833', 1, '2008-11-25 20:09:37', '2008-11-25 20:53:31', '2008-11-25', 844, 1, 13, 2634, 2, 'Yandex', 'http://yandex.ru/yandsearch?rpt=rad&text=%F1%EF%EE%F0%F2%E7%E4%F0%E0%E2', '%f1%ef%ee%f0%f2%e7%e4%f0%e0%e2', '2a4878b5dc06902b8d797475aca2cf88', 'WXP', 'FF', '2.0', '1280x1024', 1, 1, 1, 0, 0, 0, 1, 1, 1334750774, 'ru,en-us;q=0.7,en;q=', 'ru', 'asi', 'rus-com.net');
INSERT INTO `piwik_log_visit` VALUES (421, 2, '11:22:44', 'de7439d63b66c5cc3ba4be30d202d5e1', 0, '2008-11-26 09:24:52', '2008-11-26 09:25:22', '2008-11-26', 1695, 1695, 3, 30, 2, 'Yandex', 'http://yandex.ru/yandsearch?text=%D0%BA%D0%B0%D0%BA+%D0%B7%D0%B0%D0%BA%D0%B0%D0%B7%D1%8B%D0%B2%D0%B0%D1%82%D1%8C+%D0%BD%D0%B0+juno&stpar2=%2Fh1%2Ftm24%2Fs4&stpar4=%2Fs4&stpar1=%2Fu0', '%d0%ba%d0%b0%d0%ba+%d0%b7%d0%b0%d0%ba%d0%b0%d0%b7%d1%8b%d0%b2%d0%b0%d1%82%d1%8c+%d0%bd%d0%b0+juno', '9a3f5eed8c75404aa5ee552dbc0eb69a', 'WXP', 'FF', '3.0', '1680x1050', 0, 1, 1, 0, 1, 1, 1, 1, 1048951584, 'ru,en-us;q=0.7,en;q=', 'ru', 'asi', 'ufacom.ru');
INSERT INTO `piwik_log_visit` VALUES (623, 2, '19:02:07', 'f195e9f77e961b4471b2e35682b7d00d', 0, '2008-11-26 19:01:10', '2008-11-26 19:01:10', '2008-11-26', 2157, 2157, 1, 10, 2, 'Yandex', 'http://yandex.ru/yandsearch?text=%D1%87%D0%B0%D1%81%D1%82%D0%BE%D1%82%D0%B0+%D1%80%D0%B0%D1%81%D0%BF%D0%B0%D0%B4%D0%B0+%D1%81%D1%82%D0%B5%D0%BA%D0%BB%D0%B0&stpar2=%2Fh1%2Ftm11%2Fs1&stpar4=%2Fs1&stpar1=%2Fu0', '%d1%87%d0%b0%d1%81%d1%82%d0%be%d1%82%d0%b0+%d1%80%d0%b0%d1%81%d0%bf%d0%b0%d0%b4%d0%b0+%d1%81%d1%82%d0%b5%d0%ba%d0%bb%d0%b0', '6e3337843122cd43da5e0f7e35806cdc', 'WXP', 'IE', '7.0', '1600x900', 0, 1, 1, 0, 0, 1, 1, 1, 1439330483, 'ru', 'ru', 'asi', 'lianet.ru');
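To see why only the first row misbehaves, one can check whether each stored keyword decodes as UTF-8. The sketch below is in Python rather than Piwik's PHP, and the helper name `is_utf8_keyword` is made up for this example. The keyword strings are copied from the three rows above, keyed by the first value of each INSERT.

```python
from urllib.parse import unquote_to_bytes


def is_utf8_keyword(stored_keyword):
    """True if the percent-encoded keyword decodes cleanly as UTF-8."""
    raw = unquote_to_bytes(stored_keyword.replace("+", " "))
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Keyword column of each row, keyed by the first value of the INSERT.
samples = {
    190: "%f1%ef%ee%f0%f2%e7%e4%f0%e0%e2",
    421: "%d0%ba%d0%b0%d0%ba+%d0%b7%d0%b0%d0%ba%d0%b0%d0%b7%d1%8b%d0%b2"
         "%d0%b0%d1%82%d1%8c+%d0%bd%d0%b0+juno",
    623: "%d1%87%d0%b0%d1%81%d1%82%d0%be%d1%82%d0%b0+%d1%80%d0%b0%d1%81"
         "%d0%bf%d0%b0%d0%b4%d0%b0+%d1%81%d1%82%d0%b5%d0%ba%d0%bb%d0%b0",
}

flags = {row_id: is_utf8_keyword(kw) for row_id, kw in samples.items()}
print(flags)  # {190: False, 421: True, 623: True}
```

The keyword in row 190 is not valid UTF-8. Its bytes read as a Russian word if interpreted as windows-1251, which suggests (but does not prove) that this is the encoding used for that search.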

comment:3 Changed 5 years ago by matt (mattab)

  • Description modified (diff)
  • Summary changed from List of Keywords: Russian kewords are shown like '????????' to Exotic encoded keywords should be stored as utf-8 in the DB

comment:4 Changed 5 years ago by _Hitman47

I don't think the problem is tied to the referring search engine: the same thing happens for visits from google.com if the keywords are Georgian. The most popular Georgian search engines are google.ge and holmes.ge.

comment:5 Changed 5 years ago by matt (mattab)

  • Resolution set to fixed
  • Status changed from new to closed

(In [1014]) - cleaning up the search engine parsing code, adding tests, recording UTF8 keywords in the DB rather than encoded (as tables are now utf8, refs #310)

  • adding tests in url.test.php and fixed double encoding in some edge cases
  • fixed #589 Piwik fails to properly decode and store some chinese keywords (eg. from baidu.com)
  • fixed #435 Exotic encoded keywords should be stored as utf-8 in the DB
  • refs #575 hopefully fixed, will give it a few days of tests on piwik.org

comment:6 Changed 5 years ago by alivenk

comment:7 Changed 4 years ago by anniehall
