Rethink: SearchEngines.php #1694
Comments
I see we call strtolower on the keywords. This may not be safe to do with the 'C' locale unless it happens to be UTF-8 aware. |
Task: review the iconv() code in extractSearchEngineInformationFromUrl(). The keywords from naver.com are showing up empty. The encoding in SearchEngines.php is specified as x-windows-949 (which I gather is a superset of the search page's charset, euc-kr). |
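As a rough illustration of the decoding step under review (not the code in extractSearchEngineInformationFromUrl()), here is a minimal Python sketch that percent-decodes a keyword to raw bytes and then tries a list of candidate charsets in order. The function name `extract_keyword`, the `query` parameter name, and the example URL are assumptions made for the sketch; Python's `cp949` codec stands in for x-windows-949.

```python
from urllib.parse import quote, unquote_to_bytes, urlsplit


def extract_keyword(url, param="query", charsets=("utf-8", "cp949")):
    """Return the decoded keyword for `param`, or None if missing/undecodable.

    Hypothetical helper: the parameter name and charset order are assumptions.
    """
    query = urlsplit(url).query
    for pair in query.split("&"):
        name, _, value = pair.partition("=")
        if name != param:
            continue
        # Percent-decode to raw bytes first; the charset is not known yet.
        raw = unquote_to_bytes(value.replace("+", " "))
        for charset in charsets:
            try:
                return raw.decode(charset)
            except UnicodeDecodeError:
                continue  # try the next candidate charset
        return None
    return None


if __name__ == "__main__":
    # Build an illustrative URL whose keyword is percent-encoded CP949 bytes.
    encoded = quote("검색".encode("cp949"))
    url = "http://search.naver.com/search.naver?query=" + encoded
    print(extract_keyword(url))
```

Trying UTF-8 first is a heuristic: most CP949 byte sequences are not valid UTF-8, so the strict decode fails and the fallback is used, but a short keyword could in principle be valid in both. Returning None rather than an empty string makes a failed conversion easy to tell apart from an empty keyword.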
(In [3136]) refs #1694 - detect powered by google custom search |
(In [3141]) refs #1694 - prune arrays (these will be backfilled from the master record) Separate "Powered by Google" (i.e., uses Google exclusively for search) from "Enhanced by Google" (uses Google in addition to other search engines); the latter are treated as separately branded (meta) search engines. |
(In [3144]) refs #1694 - add Piwik_Common::getLossyUrl($url) to reduce referrer URLs to |
(In [3145]) refs #1694 - fix forestle.org and add unit test (i.e., {} can't appear in master record) |
(In [3146]) refs #1694 - update favicon names |
(In [3149]) refs #1694 - applied lossy {} tld to 123people, google, lycos, and yahoo |
Replying to vipsoft: |
(In [3150]) refs #1694 |
(In [3151]) refs #1694 - lossy Bing images URL |
Note: when a user views a cached page from Bing search results, the pageview is recorded on cc.bingj.com. I've suggested that they add the original web site's URL (uuencoded, of course) to the link. That way we can parse it out (similar to webcache.googleusercontent.com). |
(In [3161]) refs #1694 - add bing cache |
I'm thinking of adding a hook so plugins can implement their own search engine detection as there are requests for sites to be added that don't quite fit the traditional definition of a search engine. |
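One way such a hook could look, sketched in Python purely for illustration: plugins register detector callables, which are consulted before a built-in lookup. Every name here (`register_detector`, `detect_search_engine`, the `search.example.org` host and its `q` parameter) is hypothetical and not part of Piwik's API.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical registry of plugin-supplied detectors. Each detector takes a
# referrer URL and returns (engine_name, keyword) or None.
_detectors = []


def register_detector(func):
    """Register a plugin detector; usable as a decorator."""
    _detectors.append(func)
    return func


def detect_search_engine(url, builtin=None):
    """Ask plugin detectors first, then fall back to the built-in lookup."""
    for detector in _detectors:
        result = detector(url)
        if result is not None:
            return result
    return builtin(url) if builtin is not None else None


@register_detector
def example_site_search(url):
    # Illustrative plugin for a site that is not a traditional search engine.
    parts = urlsplit(url)
    if parts.hostname != "search.example.org":
        return None
    keywords = parse_qs(parts.query).get("q")
    return ("Example Site Search", keywords[0]) if keywords else None
```

Whether plugins should run before or after the built-in list is a design choice; running them first lets a plugin override a built-in entry, at the cost of an extra pass for every referrer.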
(In [3162]) refs #1694 - remove fix-up for webcache.googleusercontent.com; moving the logic to piwik.js |
Yahoo's Bing-powered search has an even weirder cache URL. |
(In [3168]) fixes #1694 - misc fixes |
Great work :) this will make maintenance a lot less tedious. Is there a reason www.google.cat is still listed, or can it be removed? |
Technically, .cat isn't an ISO country code. But since I've already added the MaxMind codes to Countries.php, I guess it won't hurt to add this one too. |
(In [3319]) refs #1694 - treat .cat as a pseudo country tld |
The current data file:
Proposal:
Affects:
ToDo: