Opened 4 years ago

Closed 4 years ago

Last modified 3 years ago

#1370 closed New feature (fixed)

Google Advanced Search support

Reported by: halfdan
Owned by:
Priority: normal
Milestone: Piwik 1.1
Component: Core
Keywords:
Cc:
Sensitive: no

Description

Saw the following URL in my referer list:

google.co.uk

Problem: The search parameter isn't &q=keyword but &as_q. Trying to figure out where this comes from brought me to this page. Google's "advanced search" doesn't use &q=. It is even possible to combine these parameters, like:

&as_q=test1&as_epq=test2&as_oq=foo+bar+baz

My idea to include this would be to allow KeywordParameter to be an array in SearchEngines.php.


I found that "google.kg" (Kyrgyzstan) is not included in the list of search engines.

Attachments (3)

checkGoogleDomains.php (644 bytes) - added by halfdan 4 years ago.
missing_list.txt (638 bytes) - added by halfdan 4 years ago.
SearchEngines.php.patch (8.4 KB) - added by halfdan 4 years ago.


Change History (25)

comment:1 Changed 4 years ago by hebbet

that is the official list of google domains
http://www.google.com/supported_domains maybe some more are missing

comment:2 Changed 4 years ago by halfdan

Oh well.. ignore my suggestion. KeywordParameter already allows arrays. We should therefore adjust the setting for all Google entries to:

array(
  'q',
  'as_q',
  'as_oq',
  'as_epq'
);

hebbet: Thanks for the link. I just wrote a little script to check whether any other TLDs are missing (see attachment).
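For readers without the attachment, the kind of check described here could look roughly like the sketch below. It is written in Python for illustration and is not the attached checkGoogleDomains.php. The function name `find_missing_domains`, the sample data, and the assumption that the supported-domains list has one domain per line with a leading dot are all hypothetical.

```python
def find_missing_domains(supported_text, known_hosts):
    """Return supported domains that have no entry in known_hosts.

    supported_text: assumed format, one domain per line, e.g. ".google.kg".
    known_hosts: host names already configured, e.g. "www.google.com".
    """
    # Normalise configured hosts: lower-case, drop a trailing dot
    # and a leading "www." so they compare against bare domains.
    known = set()
    for host in known_hosts:
        host = host.strip().lower().rstrip(".")
        if host.startswith("www."):
            host = host[4:]
        known.add(host)

    missing = []
    for line in supported_text.splitlines():
        domain = line.strip().lower().lstrip(".")
        if domain and domain not in known:
            missing.append(domain)
    return missing


# Hypothetical sample data, not the real lists.
sample_list = ".google.com\n.google.co.uk\n.google.kg\n.google.fr\n"
sample_hosts = ["www.google.com", "www.google.co.uk", "www.google.fr."]
print(find_missing_domains(sample_list, sample_hosts))  # ['google.kg']
```

Because the sketch strips trailing dots before comparing, a malformed host entry still counts as present; a stricter check could report such entries separately.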

While writing the script I noticed the entry for google.fr:

'www.google.fr.' => ..

That looks like a typo to me (www.google.fr is in the list).

matt/vipsoft: Let me know if we should include the other parameters for Google in the list and I'll prepare a patch.

Changed 4 years ago by halfdan

Changed 4 years ago by halfdan

comment:3 follow-up: Changed 4 years ago by vipsoft (robocoder)

  • Priority changed from major to normal

The problem is that these advanced query parameters (as_q=ALL_THESE_WORDS, as_epq=EXACT_WORDING, as_oq=OR1+OR2+OR3, and as_eq=UNWANTED) can all appear in the referrer URL.

halfdan: sure, a patch would be great. Don't forget to change Piwik_Common::extractSearchEngineInformationFromUrl() and please add some unit tests.

comment:4 Changed 4 years ago by halfdan

vipsoft: How do you think this should be handled? Should I just build a string from all keywords provided? This would lead to false reports e.g. in case of as_eq. Any suggestions?

comment:5 Changed 4 years ago by matt (mattab)

no problem to add missing google URLs, please provide patch

comment:6 in reply to: ↑ 3 Changed 4 years ago by matt (mattab)

Replying to vipsoft:

The problem is that these advanced query parameters (as_q=ALL_THESE_WORDS, as_epq=EXACT_WORDING, as_oq=OR1+OR2+OR3, and as_eq=UNWANTED) can all appear in the referrer URL.

I think that's fine, as long as 'q', if found, has priority over the other variables.
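A minimal sketch of that priority rule, in Python for illustration only (Piwik itself is PHP, and this is not its code). The function name `first_keyword` and the ordering of the list are assumptions; the parameter names are the ones discussed in this ticket.

```python
from urllib.parse import urlsplit, parse_qs

# Ordered by priority: 'q' is checked first, then the advanced-search names.
KEYWORD_PARAMETERS = ["q", "as_q", "as_oq", "as_epq"]


def first_keyword(referrer_url, parameters=KEYWORD_PARAMETERS):
    """Return the value of the first non-empty keyword parameter, or None."""
    # parse_qs decodes '+' and percent-escapes and skips blank values.
    query = parse_qs(urlsplit(referrer_url).query)
    for name in parameters:
        values = query.get(name)
        if values and values[0].strip():
            return values[0].strip()
    return None


print(first_keyword("http://www.google.co.uk/search?as_q=test1&q=piwik"))  # piwik
```

With this approach only one parameter is ever reported, so a referrer that combines several advanced parameters loses part of the search.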

comment:7 follow-up: Changed 4 years ago by matt (mattab)

actually it would be an issue... To solve this properly, we would need to be able to construct the query string from the list of possible parameters (q, as_q, as_eq, etc.), which I believe is undocumented by Google, and is probably undesirable considering the low traffic. I vote for won't fix.

comment:8 in reply to: ↑ 7 Changed 4 years ago by vipsoft (robocoder)

Replying to matt:

I vote for won't fix..

But it would be nice-to-have. I'll reserve judgement until I've seen a patch.

comment:9 Changed 4 years ago by halfdan

matt: SearchEngine patch is attached.

vipsoft: I'll need to think this through. I'm not sure yet how to visualize the combined data (e.g. as_eq=UNWANTED should not appear as a "keyword", because keywords usually suggest that the page was found _using_ that keyword and not by ignoring it).

I may postpone the patch until after 0.6.2, as I'll focus on other work first. Feel free to move the ticket to 0.8 if necessary.

Changed 4 years ago by halfdan

comment:10 Changed 4 years ago by matt (mattab)

as_eq=UNWANTED is equivalent, I believe, to using minus: "keyword -notThisKeyword"

comment:11 Changed 4 years ago by matt (mattab)

(In [2212]) Refs #1370 adding google URLs thanks halfdan!

comment:12 Changed 4 years ago by vipsoft (robocoder)

  • Summary changed from Search engine fixes. to Google Advanced Search

comment:13 Changed 4 years ago by vipsoft (robocoder)

  • Keywords search engine google removed

I'll keep this ticket open. In the meantime, I created #1381 to record the addition of the missing Google URLs/domains, for the upcoming 0.6.2 changelog.

comment:14 Changed 4 years ago by matt (mattab)

  • Milestone changed from 0 - Piwik 0.6.2 to Features requests - after Piwik 1.0

moving it to later milestone.

comment:15 Changed 4 years ago by vipsoft (robocoder)

At the same time, I propose we add some code to parse out the keywords when the referer is webcache.googleusercontent.com.

comment:16 Changed 4 years ago by vipsoft (robocoder)

  • Milestone changed from Features requests - after Piwik 1.0 to 1.1 - Piwik 1.1

comment:17 Changed 4 years ago by vipsoft (robocoder)

Opening a separate ticket for webcache.googleusercontent.com. #1692

comment:18 Changed 4 years ago by vipsoft (robocoder)

  • Resolution set to fixed
  • Status changed from new to closed

(In [3118]) fixes #1370 - constructs the equivalent q= query from the advanced search parameters
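The changeset itself is not reproduced in this ticket. As a rough illustration of the idea, the Python sketch below builds a q-style string from the advanced search parameters named above. The function name `equivalent_query` and the exact mapping (quotes for as_epq, OR for as_oq, a minus prefix for as_eq) are assumptions based on the discussion in this ticket, not the code committed in [3118].

```python
from urllib.parse import urlsplit, parse_qs


def equivalent_query(referrer_url):
    """Build a q-style search string from a referrer URL (assumed mapping)."""
    params = parse_qs(urlsplit(referrer_url).query)

    def value(name):
        # First value of the parameter, or "" when it is absent.
        return params.get(name, [""])[0].strip()

    # A plain 'q' takes priority over the advanced parameters.
    if value("q"):
        return value("q")

    parts = []
    if value("as_q"):                       # all these words
        parts.append(value("as_q"))
    if value("as_epq"):                     # exact wording
        parts.append('"%s"' % value("as_epq"))
    or_words = value("as_oq").split()       # any of these words
    if or_words:
        parts.append(" OR ".join(or_words))
    # Unwanted words are excluded with a leading minus.
    parts.extend("-" + word for word in value("as_eq").split())
    return " ".join(parts)


url = "http://www.google.co.uk/search?as_q=test1&as_epq=test2&as_oq=foo+bar+baz&as_eq=spam"
print(equivalent_query(url))  # test1 "test2" foo OR bar OR baz -spam
```

Rendering as_eq with a minus prefix keeps unwanted words visible without presenting them as keywords the visitor searched for.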

comment:19 Changed 3 years ago by matt (mattab)

  • Priority changed from normal to major
  • Type changed from Bug to New feature

comment:20 Changed 3 years ago by matt (mattab)

  • Priority changed from major to normal
  • Summary changed from Google Advanced Search to Google Advanced Search support

comment:21 Changed 3 years ago by sple007

comment:22 Changed 3 years ago by sple007
