Exotic encoded keywords should be stored as utf-8 in the DB #435

Closed
anonymous-matomo-user opened this issue Nov 25, 2008 · 3 comments
Labels: Bug, Major

@anonymous-matomo-user

Currently, keywords are stored percent-encoded in the MySQL log table.

Code is around: https://github.com/piwik/piwik/blob/master/core/Tracker/Visit.php#L693

For some search engines, such as yandex.ru, keywords are encoded in the URL. For each search engine that encodes keywords, Piwik should know the encoding used, and it should store only valid UTF-8 keywords in the log table.

This would fix two bugs:
- Russian keywords from yandex.ru, the most popular Russian search engine, are shown as a string of question marks in the UI.
- Searching for an exotic keyword with Piwik's in-table search would also work. Currently, if you search for "", Piwik looks for that keyword in the list, but it does not look for the encoded value of the keyword. It expects the keyword to be stored in the right format.
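
A minimal sketch of the per-engine idea described above, written in Python for illustration rather than in Piwik's PHP. The `SEARCH_ENGINES` table, its field names, and the `extract_keyword` helper are hypothetical and not part of Piwik; the charset list for yandex.ru is an assumption based on the example rows reported in this issue.

```python
from urllib.parse import urlsplit, unquote_to_bytes

# Hypothetical per-engine table: which query parameter carries the keyword,
# and which charsets to try, in order. Not taken from Piwik's code.
SEARCH_ENGINES = {
    "yandex.ru": {"param": "text", "charsets": ("utf-8", "windows-1251")},
}


def extract_keyword(referrer_url):
    """Return the search keyword as a Unicode string, or None if unknown."""
    parts = urlsplit(referrer_url)
    engine = SEARCH_ENGINES.get(parts.hostname)
    if engine is None:
        return None
    # Walk the raw query string so the percent-encoded bytes stay untouched
    # until we decide which charset to decode them with.
    for pair in parts.query.split("&"):
        name, _, value = pair.partition("=")
        if name != engine["param"]:
            continue
        data = unquote_to_bytes(value.replace("+", " "))
        for charset in engine["charsets"]:
            try:
                return data.decode(charset)
            except UnicodeDecodeError:
                continue
        # Last resort: keep what is decodable and mark the rest.
        return data.decode("utf-8", errors="replace")
    return None
```

The returned string can then be written to a utf8 column as is, so an in-table search for the typed keyword would compare like with like.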

If you encounter this bug, please report an example URL of a search, on any search engine, that doesn't work well with Piwik. We need more examples to solve this bug. Thanks!
Keywords: 0.2.23

@anonymous-matomo-user
Author

The first row is shown as question marks.
The next two rows are displayed correctly.

```
INSERT INTO `piwik_log_visit` VALUES (190, 1, '22:06:13', 'cefdd83aa209ddb0629b69f93bd21833', 1, '2008-11-25 20:09:37', '2008-11-25 20:53:31', '2008-11-25', 844, 1, 13, 2634, 2, 'Yandex', 'http://yandex.ru/yandsearch?rpt=rad&text=%F1%EF%EE%F0%F2%E7%E4%F0%E0%E2', '%f1%ef%ee%f0%f2%e7%e4%f0%e0%e2', '2a4878b5dc06902b8d797475aca2cf88', 'WXP', 'FF', '2.0', '1280x1024', 1, 1, 1, 0, 0, 0, 1, 1, 1334750774, 'ru,en-us;q=0.7,en;q=', 'ru', 'asi', 'rus-com.net');
INSERT INTO `piwik_log_visit` VALUES (421, 2, '11:22:44', 'de7439d63b66c5cc3ba4be30d202d5e1', 0, '2008-11-26 09:24:52', '2008-11-26 09:25:22', '2008-11-26', 1695, 1695, 3, 30, 2, 'Yandex', 'http://yandex.ru/yandsearch?text=%D0%BA%D0%B0%D0%BA+%D0%B7%D0%B0%D0%BA%D0%B0%D0%B7%D1%8B%D0%B2%D0%B0%D1%82%D1%8C+%D0%BD%D0%B0+juno&stpar2=%2Fh1%2Ftm24%2Fs4&stpar4=%2Fs4&stpar1=%2Fu0', '%d0%ba%d0%b0%d0%ba+%d0%b7%d0%b0%d0%ba%d0%b0%d0%b7%d1%8b%d0%b2%d0%b0%d1%82%d1%8c+%d0%bd%d0%b0+juno', '9a3f5eed8c75404aa5ee552dbc0eb69a', 'WXP', 'FF', '3.0', '1680x1050', 0, 1, 1, 0, 1, 1, 1, 1, 1048951584, 'ru,en-us;q=0.7,en;q=', 'ru', 'asi', 'ufacom.ru');
INSERT INTO `piwik_log_visit` VALUES (623, 2, '19:02:07', 'f195e9f77e961b4471b2e35682b7d00d', 0, '2008-11-26 19:01:10', '2008-11-26 19:01:10', '2008-11-26', 2157, 2157, 1, 10, 2, 'Yandex', 'http://yandex.ru/yandsearch?text=%D1%87%D0%B0%D1%81%D1%82%D0%BE%D1%82%D0%B0+%D1%80%D0%B0%D1%81%D0%BF%D0%B0%D0%B4%D0%B0+%D1%81%D1%82%D0%B5%D0%BA%D0%BB%D0%B0&stpar2=%2Fh1%2Ftm11%2Fs1&stpar4=%2Fs1&stpar1=%2Fu0', '%d1%87%d0%b0%d1%81%d1%82%d0%be%d1%82%d0%b0+%d1%80%d0%b0%d1%81%d0%bf%d0%b0%d0%b4%d0%b0+%d1%81%d1%82%d0%b5%d0%ba%d0%bb%d0%b0', '6e3337843122cd43da5e0f7e35806cdc', 'WXP', 'IE', '7.0', '1600x900', 0, 1, 1, 0, 0, 1, 1, 1, 1439330483, 'ru', 'ru', 'asi', 'lianet.ru');
```
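
The difference between the rows can be checked outside Piwik. The Python sketch below is a diagnostic only, with a hypothetical helper name (`stored_keyword_is_utf8`). It percent-decodes each stored keyword and tests whether the resulting bytes are valid UTF-8. The keyword in row 190 fails that test and decodes cleanly as Windows-1251, which suggests (an inference from the bytes, not something the tracker reports) that Yandex sent that query in Windows-1251 and the other two in UTF-8.

```python
from urllib.parse import unquote_to_bytes

# Stored keyword values copied from the three rows above, keyed by visit id.
rows = {
    190: "%f1%ef%ee%f0%f2%e7%e4%f0%e0%e2",
    421: "%d0%ba%d0%b0%d0%ba+%d0%b7%d0%b0%d0%ba%d0%b0%d0%b7%d1%8b%d0%b2"
         "%d0%b0%d1%82%d1%8c+%d0%bd%d0%b0+juno",
    623: "%d1%87%d0%b0%d1%81%d1%82%d0%be%d1%82%d0%b0+%d1%80%d0%b0%d1%81"
         "%d0%bf%d0%b0%d0%b4%d0%b0+%d1%81%d1%82%d0%b5%d0%ba%d0%bb%d0%b0",
}


def to_bytes(stored):
    """Undo the URL encoding: '+' means space, %XX is one raw byte."""
    return unquote_to_bytes(stored.replace("+", " "))


def stored_keyword_is_utf8(stored):
    """True if the percent-decoded bytes form valid UTF-8."""
    try:
        to_bytes(stored).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for visit_id, stored in rows.items():
    print(visit_id, stored_keyword_is_utf8(stored))

# Row 190 is not valid UTF-8, but it is readable as Windows-1251.
print(to_bytes(rows[190]).decode("windows-1251"))
```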

@anonymous-matomo-user
Author

I don't think the problem is tied to a particular referrer search engine: the same thing happens for visits from google.com when the keywords are Georgian. The most popular Georgian search engines are google.ge and holmes.ge.

@mattab
Member

mattab commented Mar 24, 2009

(In 1014) Cleaning up the search engine parsing code, adding tests, and recording UTF-8 keywords in the DB rather than encoded ones (as tables are now utf8, refs #5730)
- adding tests in url.test.php and fixing double encoding in some edge cases
- fixed #589 Piwik fails to properly decode and store some Chinese keywords (e.g. from baidu.com)
- fixed #435 Exotic encoded keywords should be stored as utf-8 in the DB
- refs #575 hopefully fixed, will give it a few days of tests on piwik.org

This issue was closed.