Mail.ru search engine results encoding has changed #2761

Closed
anonymous-matomo-user opened this issue Nov 5, 2011 · 8 comments
Labels
Bug For errors / faults / flaws / inconsistencies etc.
@anonymous-matomo-user

File: SearchEngines.php

Original (shows incorrect encoding):
// Mail.ru
'go.mail.ru' => array('Mailru', 'q', 'search?q={k}', 'windows-1251'),

I changed it to:
// Mail.ru
'go.mail.ru' => array('Mailru', 'q', 'search?rch=e&q={k}'),

And now it seems to work correctly.

@robocoder
Contributor

(In [5413]) fixes #2761 - confirmed that go.mail.ru search results are now utf-8

@kiav

kiav commented Jan 18, 2012

As of now, Mail.ru uses UTF-8 in most cases, but occasionally it still uses windows-1251.

I had to change the extractSearchEngineInformationFromUrl function in /core/Common.php:

if(function_exists('iconv')
    && isset($searchEngines[$refererHost][3]))
{
    // accepts string, array or comma separated list string in preferred order
    if (!is_array($searchEngines[$refererHost][3]))
        $charsets = explode(',', $searchEngines[$refererHost][3]);
    else
        $charsets = $searchEngines[$refererHost][3];

    if(!empty($charsets))
    {
        $charset = mb_detect_encoding($key, $charsets);
        if ($charset === false)
            $charset = $charsets[0];

        $newkey = @iconv($charset, 'UTF-8//IGNORE', $key);
        if(!empty($newkey))
        {
            $key = $newkey;
        }
    }
}

It works with

'go.mail.ru' => array('Mailru', 'q', 'search?q={k}', array('UTF-8', 'windows-1251')),

in /core/DataFiles/SearchEngines.php
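
For anyone who wants to try the idea outside Piwik, the detect-then-convert step can be sketched as a standalone function. This is only a minimal sketch, not the code that was committed: the name `normalizeKeyword` is hypothetical, it swaps the patch's iconv call for mb_convert_encoding so that only mbstring is needed, and it passes strict mode to mb_detect_encoding, which the patch does not.

```php
<?php
// Hypothetical helper, not part of Piwik: converts a search keyword to UTF-8,
// trying the candidate charsets in the order given.
function normalizeKeyword(string $key, array $charsets): string
{
    if ($key === '' || empty($charsets)) {
        return $key;
    }

    // Strict mode is an assumption here (the patch uses the default); it
    // rejects any candidate charset for which the bytes are not valid.
    $charset = mb_detect_encoding($key, $charsets, true);
    if ($charset === false) {
        // Nothing matched: fall back to the first configured charset,
        // as the patch does.
        $charset = $charsets[0];
    }

    // Swapped in for iconv() so the sketch depends on mbstring only.
    $converted = mb_convert_encoding($key, 'UTF-8', $charset);

    // Keep the original keyword if the conversion produced nothing usable.
    return (is_string($converted) && $converted !== '') ? $converted : $key;
}

$candidates = array('UTF-8', 'windows-1251');

// The word "привет" as windows-1251 bytes and as UTF-8 bytes.
$cp1251 = "\xEF\xF0\xE8\xE2\xE5\xF2";
$utf8   = "\xD0\xBF\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";

var_dump(normalizeKeyword($cp1251, $candidates) === $utf8);
var_dump(normalizeKeyword($utf8, $candidates) === $utf8);
```

Listing UTF-8 first matters: short UTF-8 byte sequences are usually also valid windows-1251, so the stricter charset should get the first chance to match.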

@robocoder
Contributor

Thanks for the patch.

I don't think we need to support a comma-separated list. We do have to check for mbstring and add a unit test.

@kiav

kiav commented Jan 18, 2012

A comma-separated list is already supported by mb_detect_encoding.

By the way, mb_strtolower is already used in Common.php (in the original Piwik code, in the extractSearchEngineInformationFromUrl function) without any checks or tests.
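
Both points fit in a few lines. mb_detect_encoding is documented to accept its candidate list either as an array or as a single comma-separated string, so the string can be passed through without an explode() first, and function_exists gives a cheap guard for installs without mbstring. The helper name `detectKeywordCharset`, the strict-mode flag, and the fallback to the first listed charset are assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns the charset that matches $key, or the first
// listed charset when mbstring is missing or nothing matches.
function detectKeywordCharset(string $key, string $charsetList): string
{
    $fallback = trim(explode(',', $charsetList)[0]);

    // Guard for installs where the mbstring extension is not loaded.
    if (!function_exists('mb_detect_encoding')) {
        return $fallback;
    }

    // The comma-separated string is handed to mbstring unchanged.
    $detected = mb_detect_encoding($key, $charsetList, true);

    return $detected === false ? $fallback : $detected;
}

// The word "привет" encoded as windows-1251; these bytes are not valid UTF-8.
$cp1251 = "\xEF\xF0\xE8\xE2\xE5\xF2";
echo detectKeywordCharset($cp1251, 'UTF-8,windows-1251'), "\n";
```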

@robocoder
Contributor

Can you provide a sample referrer URL with windows-1251 encoding?

I've done some refactoring and added some more tests, but we can never have enough.

@robocoder
Contributor

Awesome! Thanks!

@robocoder
Contributor

(In [5682]) fixes #2761

This issue was closed.