Opened 3 years ago

Closed 19 months ago

Last modified 16 months ago

#1823 closed New feature (fixed)

Include GeoIP in core after improvements

Reported by: matt Owned by: capedfuzz
Priority: critical Milestone: 1.9 -- Piwik 1.9
Component: Core Keywords:
Cc: greg Sensitive: no

Description (last modified by matt)

See doc: Geo Locate visitors countries cities and regions.

GeoIP plugin #45 is one of the most popular plugins. For a web analytics tool, getting user countries as accurately as possible is critical, and Piwik should help users in this direction.

When the plugin is released in trunk, we should update the FAQ, website pages and wiki pages mentioning GeoIP, and close the GeoIP ticket #45. For Goals compatibility of the GeoIP plugin, see #1434.

Please let us know in the comments your feedback. If you would like to participate... well you know what to do!

Attachments (3)

GeoIP.php (15.3 KB) - added by greg 22 months ago.
Patch of GeoIP.php that allows it to store region ids.
geoip.updater.sh (588 bytes) - added by interfasys 19 months ago.
Database updater via cron
TracFaq.txt (3.8 KB) - added by thomasjones 2 months ago.
seo services

Change History (99)

comment:1 Changed 3 years ago by matt (mattab)

  • Description modified (diff)
  • Milestone changed from Features requests to Piwik 1.x
  • Priority changed from normal to critical

comment:2 Changed 3 years ago by vipsoft (robocoder)

  • Description modified (diff)

comment:3 Changed 3 years ago by greg (gka)

Just some input to clarify the terms "country" and "region". Referring to the list of administrative levels used in OpenStreetMap, a country corresponds to admin level 2, while regions correspond to admin level 4.

comment:4 Changed 3 years ago by greg (gka)

Do we record lat/long for each visitor, or do we assume that other systems will know where to plot a given City?

I think it is NOT necessary to record lat/long for each visitor. It is sufficient to record the city id. The GeoIP db would resolve each visitor's location within the same city to the same lat/long anyway. In fact, for each city there is only one lat/long stored in the GeoLite City DB (more precisely, in the cityByCountry table).

As the number of available cities (= pairs of lat/long) differs between the different GeoIP databases, it makes no sense to put this information into other systems like the world map.

comment:5 follow-up: Changed 3 years ago by tlitody

If I might add to this: the MaxMind DB gives a city lookup, but this does not work the way people think it will.
Blocks of IP numbers are sold to service providers, who resell them to end users. However, the IP issuing authority assigns the city of the ISP's address to all of those IP numbers. At least that is how it works in the UK. Things may vary in other countries, and ISPs don't reallocate the city when they sell dedicated IP numbers to end users.
The result is that city lookup generally only gives the city of the ISP, not where the visitor is visiting from. The ISP can be anywhere in the country and hundreds of miles from where the visitor is based. In other words, city lookup is useless except for giving the location of ISPs. This also means that lat/long is useless too, since it seems to be based on the city lookup.
When IPv6 is rolled out, if (and only if) ISPs allocate a city to users when they purchase a fixed IP, then city lookup may become useful. But many ISPs still use dynamically allocated IPs, so it wouldn't work in that case either.
In short, the concept of providing the city and/or lat/long of visitors is fundamentally flawed.

comment:6 Changed 3 years ago by interfasys

+1 for this, especially the Apache module detection routine. I get a few fatal errors in my logs because the plugin insists on loading the local files instead of getting the data from Apache.

comment:7 Changed 3 years ago by Martin_712

comment:8 Changed 3 years ago by Martin_712

I have the commercial db of Maxmind. You can use it if you want for developing the new plugin. Let me know how I can contact you.

comment:9 Changed 3 years ago by vipsoft (robocoder)

  • Milestone changed from 1.x - Piwik 1.x to 1.2 - Piwik 1.2

comment:11 Changed 3 years ago by vipsoft (robocoder)

  • Owner set to vipsoft

I'll take this on, in conjunction with the IPv6 ticket.

comment:12 Changed 3 years ago by vipsoft (robocoder)

  • Milestone changed from 1.2 Piwik 1.2 to 1.3 - Piwik 1.3

comment:13 Changed 3 years ago by matt (mattab)

  • New report: Users' companies? I believe that we can get the 'Enterprise/Company' that visitors connect from with the GeoIP data. It would be interesting to offer this new report in the Users > Countries page.

comment:14 Changed 3 years ago by kip

Great idea.

comment:15 Changed 3 years ago by vipsoft (robocoder)

In the existing GeoIP plugin, there's a misc/.htaccess file. We don't want this in the new plugin. Access to geoipUpdateRows.php (or equivalent) should be guarded via token_auth.

comment:16 Changed 3 years ago by kip

Should I delete the .htaccess file in there?

comment:17 Changed 3 years ago by vipsoft (robocoder)

Yes, you can remove the .htaccess file. After you've run it once, you shouldn't have to run it again.

comment:18 Changed 3 years ago by kip

$ php geoipUpdateRows.php

Fatal error: Call to undefined function _parse_ini_file() in /home/kiplingw/avaneya.com/piwik/core/Config.php on line 373

???

I also removed the .htaccess file.

comment:19 Changed 3 years ago by vipsoft (robocoder)

This should be fixed in the updated .zip that I attached to #45. Here's the patch so you don't have to redownload the .zip:

Index: geoipUpdateRows.php
===================================================================
--- geoipUpdateRows.php	(revision 51)
+++ geoipUpdateRows.php	(working copy)
@@ -20,8 +20,8 @@
 		. PATH_SEPARATOR . PIWIK_INCLUDE_PATH . '/libs'
 		. PATH_SEPARATOR . PIWIK_INCLUDE_PATH . '/plugins');
 
+require_once PIWIK_INCLUDE_PATH . '/libs/upgradephp/upgrade.php';
 require_once PIWIK_INCLUDE_PATH . '/core/testMinimumPhpVersion.php';
-
 require_once PIWIK_INCLUDE_PATH . '/core/Loader.php';
 
 $GLOBALS['PIWIK_TRACKER_DEBUG'] = false;

comment:20 Changed 3 years ago by kip

Thanks. Applied. How can I test it?

comment:21 Changed 3 years ago by kip

I ran

$ php geoipUpdateRows.php

It finished execution (no output), and I noticed the UserCountry_ thing is still there in the stats. Should I just ignore that for now and assume new stats will not have that?

comment:23 Changed 3 years ago by kip

Thank you =)

comment:24 follow-up: Changed 3 years ago by matt (mattab)

To answer questions in the ticket:

  • we should track regions in the log_visit table (new field, char(2) ?) as per specification in: http://www.maxmind.com/app/fips10_4
  • we should also track cities, and add a new field for cities.
    • the API for cities, would return the 'label' being the city name, and also one column for latitude, one column for longitude, since this is required to draw the Cities Maps #1652
    • this means, that we don't store lat/long in log_visit

comment:25 in reply to: ↑ 24 ; follow-up: Changed 3 years ago by vipsoft (robocoder)

  • Cc greg added

Replying to matt:

  • this means, that we don't store lat/long in log_visit

I'm thinking of keeping lat/long because:

  • in #1652, greg says the map plots by lat/long; including it in log_visit avoids a city-to-latlong lookup
  • the GeoIP plugin already stores lat/long in log_visit
  • future support for HTML5 navigator.geolocation

comment:26 Changed 3 years ago by vipsoft (robocoder)

re: comment:13

I would like to propose:

  • rolling the provider plugin into the geolocation plugin
  • if the geolocation plugin can get the organization field, it populates location_provider
  • otherwise, fall back to the gethostbyaddr() method
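A minimal sketch of the proposed fallback order, for illustration only. It assumes a hypothetical `org_lookup` callable that returns the GeoIP organization name (or None) for an IP; the reverse-DNS step uses Python's `socket.gethostbyaddr`, injected as a parameter so it can be stubbed. It also skips reverse DNS for anonymized IPs, in line with the later change noted in [5775]. The "last two labels" trimming is a simplified heuristic, not the Provider plugin's actual logic.

```python
import socket
from typing import Callable, Optional


def guess_provider(
    ip: str,
    org_lookup: Callable[[str], Optional[str]],  # hypothetical GeoIP org lookup
    resolver: Callable[[str], tuple] = socket.gethostbyaddr,
    ip_is_anonymized: bool = False,
) -> str:
    """Return a provider label: GeoIP organization first, reverse DNS second."""
    # 1. Prefer the organization field from the geolocation database.
    org = org_lookup(ip)
    if org:
        return org
    # 2. Reverse DNS cannot succeed on an anonymized IP and is slow to fail.
    if ip_is_anonymized:
        return "Unknown"
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return "Unknown"
    # Simplified heuristic: keep the last two labels, "dsl-1.example.net" -> "example.net".
    parts = hostname.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else hostname
```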

comment:27 Changed 3 years ago by matt (mattab)

I would like to propose:

  • rolling the provider plugin into the geolocation plugin
  • if the geolocation plugin can get the organization field, it populates location_provider
  • otherwise, fall back to the gethostbyaddr() method

great idea!

The only thing: please make sure the few "Provider" special cases still work. In particular, VisitorGenerator & proxy-piwik.php disable the Provider lookup because it is too slow.

comment:28 Changed 3 years ago by vipsoft (robocoder)

Ok. There are a couple of third party plugins (e.g., KSVisitorImport and TrackerSecondaryDb) that also disable the Provider plugin.

comment:29 Changed 3 years ago by matt (mattab)

as a note, these plugins will be obsolete once we implement #134

comment:30 Changed 3 years ago by vipsoft (robocoder)

  • Milestone changed from 1.6 Piwik 1.6 to 1.x - Piwik 1.x

comment:31 Changed 3 years ago by bompus

+1 vote for adding regions onto this as well. They are available in GeoLiteCity, so might as well use them. It would be great to include this into a regional map as well that the country map can drill down into.

comment:32 Changed 3 years ago by jawsmith

Copy of my comment on http://dev.piwik.org/trac/ticket/45 (sorry, apparently I used the wrong ticket; I knew there was one specifically for the integration of GeoIP into core):

This new plugin sounds promising. But I hope you are also going to keep the old browser language/country detection, maybe named as such. I personally consider the language display as important as the IP location display.

Consider the following scenario: I'm traveling around the world and have a travel blog. People accessing that blog are often people I have met on the trip, often still traveling. Now, when I look at my Piwik logs, the IP location (which I currently check manually) is certainly interesting, but what tells me more about a visitor is actually their browser language. If you check the IP address I am writing this from, you will see that it is Malaysian. How much do I have to do with Malaysia? Nothing. My browser language is German (Germany), which tells you more. And the combination of the two, IP location and browser country (i.e. the current detection), actually provides one more detail: the visitor is most likely a traveler or an expat. I can imagine websites that would be interested in that marketing information.

You would not believe how many travelers roam the world these days. And I would say most of them use the often free WiFi (at their place of stay, and in bars and restaurants all over Southeast Asia) with their own devices: laptops, phones, tablets, etc. It seems to be the new way of traveling, with people sticking their noses into displays half of the time, most of it on Facebook.

P.S.: Since there are countries with several languages (Belgium, etc.), but also countries sharing a common language (UK, US, etc.), maybe both the browser country and its language could be shown (if provided by the browser), in addition to the IP location provided by this plugin.

comment:33 follow-up: Changed 3 years ago by vipsoft (robocoder)

jawsmith: #638

comment:34 Changed 3 years ago by thibaut

+1 on jawsmith's proposal of a combined view of location against the visitor's preferred language.
As a Belgian developer I can tell you that this kind of information can be of crucial interest in a country like Belgium, but in many others too. For example, usage of the Spanish language in some regions of the US can be an important factor, I think...

I imagine an ideal "Visitor countries" GeoIP plugin offering the current "Countries" split: clicking a country name would open a "Regions" list, and clicking a region name would open the (currently available) "Cities" list. An additional button could then be added at the bottom, between "Display simple table" and "Display a table with more metrics", that would "Display a table with languages". That table could have one additional column for each language detected...

comment:35 in reply to: ↑ 33 Changed 3 years ago by jawsmith

Replying to vipsoft:

jawsmith: #638

Thank you very much for the info on the browser language detection plugin! That just leaves the browser country detection, in case the GeoIP plugin replaces it in core. (E.g.: Is it a British or an American accessing my website from the Philippines?)

comment:36 Changed 3 years ago by greg (gka)

Do we record regions as well as Countries?
Do we record Cities?

As the new world map widget will be able to display data for regions and cities, it would be amazing if Piwik would be able to record the data for regions and cities :)

Do we record lat/long for each visitor, or do we assume that other systems (eg. the world map) will know where to plot a given City (and maintain their own database)?

Nope, the world map doesn't store locations for every city. Instead, it will be able to plot any given lat/long onto the map.

comment:37 Changed 3 years ago by matt (mattab)

Here is proposal of the API functions and returned data for the GeoIP integration in core:

  • UserCountry.getCountry(idsite,period,date)
    • Returns current country report: (Country name Germany, metadata Country code DE, visits, actions, etc.)
  • UserCountry.getRegion( idsite,period,date,country_code)
    • Returns regions for the requested country_code: (Region name, metadata Region code, visits, actions, etc.)
  • UserCountry.getCity( idsite,period,date,country_code)
    • Returns all cities for the requested country_code: (City name, metadata Lat, metadata Long, visits, actions, etc.)

Note:

  • by design, it is not possible to get the "Top regions" across all countries
  • by design, it is not possible to get the "top cities" across all countries
  • the API would differ slightly from the standard way of requesting "subtables" where we normally pass the idSubtable. For these APIs, we would pass the country_code (2 letters). This is for easier API usage and also because, as opposed to other API functions (keywords, URLs, Campaigns, etc.) that are always different, Countries stay the same.
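To illustrate how the proposed calls would look over Piwik's HTTP API, here is a small URL builder assuming the standard `index.php?module=API&method=...` request format. The `country_code` parameter exists only in this proposal (comment:61 later supersedes it), and the base URL is a placeholder.

```python
from urllib.parse import urlencode


def api_url(base: str, method: str, **params: str) -> str:
    """Build a Piwik HTTP API request URL for the given method."""
    query = {"module": "API", "method": method, "format": "JSON"}
    query.update(params)
    return f"{base.rstrip('/')}/index.php?{urlencode(query)}"


BASE = "https://piwik.example.org"  # placeholder installation URL

# Top-level country report.
print(api_url(BASE, "UserCountry.getCountry", idSite="1", period="day", date="yesterday"))
# Proposed: regions/cities are requested by country code instead of idSubtable.
print(api_url(BASE, "UserCountry.getRegion", idSite="1", period="day",
              date="yesterday", country_code="de"))
```

Passing a stable two-letter code rather than an `idSubtable` means the URL stays valid across periods, which is the usability argument made above.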

comment:38 in reply to: ↑ 25 Changed 3 years ago by matt (mattab)

Replying to vipsoft:

Replying to matt:

  • this means, that we don't store lat/long in log_visit

I'm thinking of keeping lat/long because:

HTML5 Geolocation is probably never going to be used by Piwik since it requires the user to opt-in to share the location with the domain name.

  • in #1652, greg says the map plots by lat/long; including it in log_visit avoids a city-to-latlong lookup

I am reluctant to include redundant information in the log_visit table.

  • How "hard" is the lookup City->Lat/Long (does GeoIP offer this lookup, is it fast?)

At minimum, we should record in log_visit

  • Country
  • Region
  • City
  • Note: I propose to remove "Continent" and process this from the aggregated Country datatable in the Archiving function. It would be trivial/fast to process the Top continents. This would save 3 bytes per visit which is always good.
  • The question remains if we need to store lat/long, depending how fast/easy it is to query lat/long from a given City using GeoIP (maybe this is not possible?)
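The continent proposal above only needs a static country-to-continent map applied to the already aggregated country rows at archiving time. A minimal sketch; the mapping subset and continent codes are illustrative assumptions (a real table would cover every ISO 3166 code), and the input is assumed to be a plain country-code-to-visits dict.

```python
from collections import Counter

# Illustrative subset of a country -> continent code map (assumed codes).
COUNTRY_TO_CONTINENT = {"de": "eur", "fr": "eur", "us": "amn", "br": "ams", "my": "asi"}


def continents_from_countries(country_visits: dict) -> dict:
    """Sum per-country visit counts into per-continent totals."""
    totals = Counter()
    for country, visits in country_visits.items():
        # Unknown or unmapped countries are grouped under "unk".
        totals[COUNTRY_TO_CONTINENT.get(country, "unk")] += visits
    return dict(totals)
```

Because the rollup runs over the archived country table, nothing extra has to be stored per visit.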

Thoughts?

comment:39 Changed 3 years ago by greg (gka)

The question remains if we need to store lat/long, depending how fast/easy it is to query lat/long from a given City using GeoIP (maybe this is not possible?)

It depends on what kind of database you're using. If you're using the CSV database and import it into MySQL tables, then you can run a query like

SELECT latitude,longitude FROM location WHERE city = 'Berlin'

in < 1ms. However, you will get ambiguous results when just looking for city names. Instead, a better idea would be to store the unique GeoIP location-id.

I don't know if any of the GeoIP APIs that work with the binary database (.dat) support reverse queries. All I saw was the IP --> location direction.
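The ambiguity is easy to reproduce, since two locations can share a city name. A minimal sketch using an in-memory SQLite table loosely shaped like the GeoLite City CSV `location` table; the column names (`locId`, `country`, `region`, `city`, `latitude`, `longitude`) and the approximate coordinates are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE location (locId INTEGER PRIMARY KEY, country TEXT, region TEXT,"
    " city TEXT, latitude REAL, longitude REAL)"
)
conn.executemany(
    "INSERT INTO location VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "DE", "16", "Berlin", 52.52, 13.40),   # Berlin, Germany
        (2, "US", "NH", "Berlin", 44.47, -71.19),  # Berlin, New Hampshire
    ],
)

# Looking up by city name alone matches more than one location.
by_name = conn.execute(
    "SELECT latitude, longitude FROM location WHERE city = ?", ("Berlin",)
).fetchall()

# Storing the location id per visit makes the lookup unambiguous.
by_id = conn.execute(
    "SELECT latitude, longitude FROM location WHERE locId = ?", (2,)
).fetchone()
print(len(by_name), by_id)
```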

comment:40 Changed 3 years ago by greg (gka)

Note: I propose to remove "Continent" and process this from the aggregated Country datatable in the Archiving function. It would be trivial/fast to process the Top continents.

+1, since 'classic' Continents are also quite useless for many scenarios. Often, people are more interested in political/economic regions, e.g. MENA

comment:41 Changed 2 years ago by matt (mattab)

  • Milestone changed from 1.x - Piwik 1.x to 1.8 Piwik 1.8

comment:42 in reply to: ↑ 5 Changed 2 years ago by jokergermany.de.vu

Replying to tlitody:

However, the IP issuing authority assign the city of the ISP address to all the IP numbers. At least that is how it works in the UK. Things may vary in different countries and ISPs don't reallocate city when they sell dedicated IP numbers to end users.

In Germany you don't have this problem, since AOL doesn't exist in Germany anymore.
You can locate the city. In rural areas, the difference between the real location and the indicated area can be 55 km... This is my experience.

comment:43 Changed 2 years ago by matt (mattab)

(In [5775]) Refs #2902

  • Provider plugin does not do a DNS reverse lookup when the IP was anonymized, since it does not work and is slow to fail
  • Changed recommended "IP Anonymization" from 1 byte to 2 bytes as per recos in http://piwik.org/privacy/
  • Now the IP is anonymized only after IP Exclude was tested. All code should use the anonymized IP. If there is a need later to access the original raw IP, we can add this in the code, but for now there is no such use case

Refs #1823

  • For Geoip it means that the plugin will only have access to the anonymized IP.
    • It would be nice if we could still guess the country at least from the anonymized IP. But I suppose that the IP ranges are not evenly distributed between countries and several countries would share the same IP ranges (with the last 2 bytes removed)...
    • If guessing the country or region is not possible from the anonymized IP, it is an acceptable tradeoff that GeoIP does not work when IP anonymization is enabled, since that is the condition for respecting visitors' privacy. Once confirmed, we could document this in the Anonymize IP UI tooltip.
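For reference, "removing N bytes" from an IPv4 address means zeroing its last N octets. A minimal sketch with Python's `ipaddress` module, IPv4 only (IPv6 handling is out of scope here); it shows why a 2-byte setting leaves only a /16 network for the lookup to work with.

```python
import ipaddress


def anonymize_ipv4(ip: str, bytes_to_remove: int) -> str:
    """Zero the last `bytes_to_remove` octets of an IPv4 address."""
    if not 0 <= bytes_to_remove <= 4:
        raise ValueError("bytes_to_remove must be between 0 and 4")
    value = int(ipaddress.IPv4Address(ip))
    # Keep the high-order bits, clear the low 8 * N bits.
    mask = (0xFFFFFFFF << (8 * bytes_to_remove)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(value & mask))
```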

comment:44 Changed 2 years ago by matt (mattab)

When Anonymize IP is enabled with only 1 byte removed, could we default the last byte to 1 so that we get at least an approximate User location? See also: #3023

comment:45 Changed 2 years ago by greg (gka)

In fact, in most cases that's the same level of accuracy as if you would use all 4 bytes..

comment:46 Changed 2 years ago by matt (mattab)

Sounds good, we will most likely do this then. This will limit user frustration significantly, since there have been many complaints that the "Provider" report does not work at all when the IP is anonymized (it would be even worse if GeoIP was broken!)

comment:47 Changed 2 years ago by vipsoft (robocoder)

Reasonable assumption as long as the IP belongs to a class C address (or larger). It also depends on the quality of the geolocation data provider.

comment:48 Changed 2 years ago by matt (mattab)

Thank you guys for your feedback

  • Let's check for last byte of the IP, if it is set to 0 then set it to 1, and do the GeoIP lookup.
  • @vipsoft do you know if a beta version could be available soon, for Greg to finalize the SVG maps using the final API? Thanks!!
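The lookup tweak agreed above can be sketched as follows, assuming dotted-quad IPv4 input. Only trailing zero octets (the ones anonymization produces) are rewritten, and the first octet is never touched. This follows the "each zero byte" variant from the later spec rather than only the last byte; which variant to use is a judgment call.

```python
def ip_for_geo_lookup(ip: str) -> str:
    """Replace trailing zero octets (left by anonymization) with 1 before lookup."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    # Walk backwards from the last octet; stop at the first non-zero one.
    for i in range(3, 0, -1):  # never rewrite the first octet
        if octets[i] != "0":
            break
        octets[i] = "1"
    return ".".join(octets)
```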

comment:49 Changed 23 months ago by interfasys

For some reason, the plugin behaves differently when called via the log import script and we get a fatal error.

PHP Fatal error: Cannot redeclare geoip_country_code_by_name() in /plugins/GeoIP/libs/geoip.inc on line 347

Checking if those functions have already been declared doesn't help as it seems the whole geoip.inc file shouldn't be called.

Env
PHP 5.4
Piwik 1.8.2
mod_geoIP in Apache
geoIP pecl extension in PHP

comment:50 Changed 23 months ago by vipsoft (robocoder)

Interfasys: your php-cli has the geoip extension enabled which has the same api as the php library used by the GeoIP plugin (#45).

This conflict will be addressed by the new Geolocation plugin.

comment:51 Changed 22 months ago by matt (mattab)

many users are discussing patches to the GeoIP files in: http://forum.piwik.org/read.php?2,71788

for each person posting in the forum there are probably 10 users having the same issue

it shows the very high interest of the community in having an integrated geoIP plugin in core :)

Changed 22 months ago by greg (gka)

Patch of GeoIP.php that allows it to store region ids.

comment:52 Changed 22 months ago by greg (gka)

Btw, here's my patch of the GeoIP plugin (just GeoIP.php in this case). It enables the plugin to store region information, which is essential for the map widget I develop.

comment:53 Changed 21 months ago by vipsoft (robocoder)

(In [6545]) refs #1823 - commit geolocation adapters and plugin stubs

comment:54 Changed 21 months ago by matt (mattab)

Thanks Anthon for the initial commit!!

There is quite some work left on this task:

  • Implement the Tracker Hook
    • to record country/region/lat/long in log_visit
    • How does it work when IP is anonymized? Can we lookup country before anonymization?
    • How does it work with IPV6 ?
  • Archiving task
  • API: see http://dev.piwik.org/trac/ticket/1823#comment:37
    • Also could the API read the GeoIP plugin blobs archived with the previous plugin, to keep the already processed reports?
  • Import code
    • code to import/set up the Geolocation data files for a given provider, e.g. in the local MySQL DB for GeoIP MaxMind.
    • For CSV files, import is mostly handled by Piwik::createTableFromCSVFile(). The only thing we need to add is a truncate method.
  • UI
    • Choose which GeoLocation provider to use
    • Configure some optional settings for this Provider
    • Allow upgrading to Pro versions using an affiliate link
    • Allow running GeoLocation on past records via a link, and also give out the command line
  • Investigate GeoIP plugin + contributed patches, and check if we have missed anything special

If anyone is keen to help, please let me know ASAP!! :)

comment:55 Changed 21 months ago by analyst

Thanks for all the hard work. Integration would confirm Piwik as a superior alternative to Google Analytics. I've posted this in the forum but will post here as well. While replacing the provider details with the organization details adds a ton of value to the reports, occasionally the listed organization will be the same as the ISP. This detracts value from the organization report and thus it would be nice to be able to filter out a list of ISPs using a single segment/parameter. The single segment/parameter would also allow for continual updating of the list.

I can't help on the coding side, but if there is any other way to help, please don't hesitate.

comment:56 Changed 21 months ago by matt (mattab)

@analyst Thanks, there are many ways to help indeed (Marketing, White paper writing, etc.) please get in touch matt@piwik if you're interested!

comment:57 follow-up: Changed 20 months ago by matt (mattab)

In the forum post, a user submitted a "stripped down version of geoip" and updated http://speedy.sh/mRwTk/GeoIPOrg.zip (currently not loading for me)

comment:58 in reply to: ↑ 57 Changed 20 months ago by analyst

Replying to matt:

In the forum post, a user submitted a "stripped down version of geoip" and updated http://speedy.sh/mRwTk/GeoIPOrg.zip (currently not loading for me)

I was able to download the file yesterday.

Please find it reuploaded at: http://www4.zippyshare.com/v/64743110/file.html

comment:59 Changed 19 months ago by matt (mattab)

  • Description modified (diff)
  • Milestone changed from 1.10 Piwik 1.10 to 1.9 -- Piwik 1.9
  • Owner changed from vipsoft to capedfuzz

excellent news for the piwik community: we are going to work on GeoIP in core! Thanks Anthon for your initial commit :-)

I will post here the specs for the plugin.

comment:60 Changed 19 months ago by jokergermany.de.vu

Yeah, I have been waiting for this for a long time =)

comment:61 Changed 19 months ago by matt (mattab)

Note: please ignore all comments above this. The following spec replaces previous propositions:

Here is a proposed specification for the new "geoip" functionality & the very useful feature of having more accurate visitor location information!


New Admin UI

The goal of the UI is to clearly report the status of GeoIP (Enabled / disabled / enabled but not working yet):

  • Add a new Geo Location tab in Admin. This new UI will make it easy for users to understand how to install the Apache module, download the free DB, or buy and install the commercial DB.
  • Detect & allow the user to select which geoip implementation to use
    • Apache module (faster, harder to setup).
      • Sometimes, the apache module is enabled and appears as working, but doesn't provide any geo info. Can we detect this and report as non working?
    • File database (has to be downloaded, slower but most users will use this solution as it's easiest to setup & works well).
      • it would be awesome if we could automatically, on click of a link, download and extract the .dat in a piwik path such as plugins/UserCountry/lib/geoip.dat -- if there is no write permission to write in this path, display error message.
    • We could also link to use the commercial GeoIP DB file. Note to self: we should always link maxmind using the affiliate parameter: http://www.maxmind.com/?rId=piwik
    • it would be nice to also support the PECL module (see related patch for old geoIP plugin in http://dev.piwik.org/trac/attachment/ticket/45/UsePeclExtension.patch )
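A sketch of the detection and status logic described above, written against plain inputs so it can be tested. `server_vars` stands in for PHP's `$_SERVER`, `pecl_loaded` for a check that the PECL extension is loaded, and the status strings are hypothetical. Apache mod_geoip exposes its results as `GEOIP_*` server variables such as `GEOIP_COUNTRY_CODE`; the "enabled but empty" check is one assumed way to flag a module that appears installed but returns nothing.

```python
import os
from typing import Mapping


def detect_providers(server_vars: Mapping[str, str], pecl_loaded: bool,
                     dat_path: str) -> dict:
    """Report the status of each geolocation implementation (hypothetical statuses)."""
    status = {}
    # Apache mod_geoip: GEOIP_* variables present means the module is active.
    geoip_keys = [k for k in server_vars if k.startswith("GEOIP_")]
    if not geoip_keys:
        status["apache"] = "not installed"
    elif server_vars.get("GEOIP_COUNTRY_CODE"):
        status["apache"] = "working"
    else:
        status["apache"] = "installed but not working"  # module on, no data
    # PECL: loaded extension only; a usable database is still required.
    status["pecl"] = "available" if pecl_loaded else "not installed"
    # File database: present, downloadable, or blocked by permissions.
    if os.path.isfile(dat_path):
        status["file"] = "working"
    elif os.access(os.path.dirname(dat_path) or ".", os.W_OK):
        status["file"] = "database missing (can be downloaded)"
    else:
        status["file"] = "database missing (directory not writable)"
    return status
```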


The GeoIP setting page would also show the GeoIP lookup for the Piwik super user looking at the page.

Tracker

  • Schema updates: The log_visit and log_conversion should have new columns:
    • location_city
    • location_region
    • location_latitude
    • location_longitude
  • the schema updates can go in the core schema: core/Db/Schema/Myisam.php
  • When enabled, GeoLocation plugin will overwrite the location_country and location_continent columns
  • When GeoIP doesn't work, is disabled or does not know the country, then we default to the current default algorithm (see FAQ for explanation of current algorithm).
  • When the IP is anonymized, the last byte(s) might be 0. In this case, we'd still like to provide GeoLocation. So, the suggestion is to replace .0 with .1 for each zero byte, then do the lookup. Or maybe the GeoIP lookup even works for anonymized IP addresses... See: #3023
  • Detail: sometimes, GeoIP info is available for the country, city, and/or region, in any combination. Therefore, we should be careful to set each data point separately.
  • Test that, when Visitor IP is forced via the parameter &cip=1.2.3.4&token_auth=xyz then the Geolocation is done on this forced IP "cip".
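A minimal sketch of the "set each data point separately" rule combined with the default fallback: GeoIP fields are copied only when present, and only the country falls back to the browser language. The column names mirror the proposed `location_*` columns; the Accept-Language parsing is a deliberately simplified stand-in and is not Piwik's actual default algorithm (see the FAQ referenced above for that).

```python
from typing import Optional

LOCATION_FIELDS = ("country", "region", "city", "latitude", "longitude")


def country_from_accept_language(header: str) -> Optional[str]:
    """Simplified fallback: region part of the first 'll-CC' language tag."""
    for part in header.split(","):
        tag = part.split(";")[0].strip()
        if "-" in tag:
            return tag.split("-")[1].lower()
    return None


def build_location(geoip_result: Optional[dict], accept_language: str) -> dict:
    """Merge GeoIP output field by field; fall back only for the country."""
    location = {"location_" + f: None for f in LOCATION_FIELDS}
    for field in LOCATION_FIELDS:
        value = (geoip_result or {}).get(field)
        if value not in (None, ""):  # 0.0 is a valid latitude, so keep it
            location["location_" + field] = value
    if location["location_country"] is None:
        location["location_country"] = country_from_accept_language(accept_language)
    return location
```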

New reports & New APIs

Under Visitors > Location & Provider...

  • the Countries report will automatically report geoip Countries
    • Note: each row is not clickable to open a subtable. The "regions" and "cities" are not implemented as subtables.
  • New report: Top Regions in the world.
    • The report should show regions independently of countries. Each row would read, e.g. "CA, United States" or "Ile de France, France".
    • Each row would show the flag of the country.
    • Each region has a pretty name, and a code name. The API output should have therefore: label (which will be $prettyName, $country), regionName, regionCode.
    • Each row also has a metadata "country" with the country code.
      • This way, one can easily filter to request only the regions from a set of countries. This will be useful for the maps: Greg wants to display the Top regions / cities in Europe for example. So he could simply do: filter_column=country&filter_pattern=FR|ES|DE would filter the "Top regions" to only france, spain and germany.
  • New report: Top Cities in the world.
    • Each row shows flag of country
    • the label displayed in UI will be "Paris, Ile de France, France". this will avoid the problem of same city name in different regions/countries.
    • similarly to top regions report, each row should have column with the country ID
    • The Top Cities API also return latitude/longitude so the map can plot the cities
    • eg. Using filter_column=country&filter_pattern=FR would return cities in france (eg. country would be a metadata column)
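To make the proposed row format concrete: each region or city row carries a display label plus a `country` metadata field, and a `filter_column`/`filter_pattern` request keeps only rows whose column matches a regular expression. A sketch over plain dicts; keys other than those named in the spec (`label`, `regionName`, `regionCode`, `country`) are assumptions, and the case-insensitive match is an assumption about the filter, not confirmed behavior. The region codes are sample values.

```python
import re


def region_row(region_name, region_code, country_code, country_name, visits):
    """Build a Top Regions row: label "$prettyName, $country" plus metadata."""
    return {
        "label": f"{region_name}, {country_name}",
        "regionName": region_name,
        "regionCode": region_code,
        "country": country_code,  # metadata used for filtering and the flag
        "nb_visits": visits,
    }


def city_row(city, region_name, country_code, country_name, lat, lon, visits):
    """Build a Top Cities row; lat/long let the map plot the city."""
    return {
        "label": f"{city}, {region_name}, {country_name}",
        "country": country_code,
        "lat": lat,
        "long": lon,
        "nb_visits": visits,
    }


def filter_rows(rows, column, pattern):
    """Emulate filter_column/filter_pattern (case-insensitive here, an assumption)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [r for r in rows if regex.search(str(r.get(column, "")))]


rows = [
    region_row("Ile de France", "A8", "fr", "France", 10),
    region_row("California", "CA", "us", "United States", 7),
    region_row("Catalonia", "56", "es", "Spain", 3),
]
europe = filter_rows(rows, "country", "FR|ES|DE")
```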


Also in Goals > Overview, and in each Goal > $goal_n report:

  • in the bottom, there is already "View goals by Visit>Country"
  • we should add " View goals by Region" and "View goals by City"


To process these new reports, there is going to be new archiving:

  • Region archiving
  • Cities + lat + long archiving
  • Archiving of region, cities, for each goal
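The region and city archiving steps boil down to grouping visits by a composite key. A minimal sketch over in-memory visit rows, assuming the proposed `location_*` columns and a `visit_total_actions` column. Keying cities by (city, region, country) keeps same-named cities apart, and lat/long are taken from the first visit seen per key, on the assumption (per comment:4) that the database returns one lat/long per city.

```python
def archive_cities(visits):
    """Aggregate visits per (city, region, country), keeping one lat/long per city."""
    report = {}
    for v in visits:
        key = (v["location_city"], v["location_region"], v["location_country"])
        if None in key:
            continue  # incomplete location: not counted in the city report
        row = report.setdefault(key, {
            "nb_visits": 0,
            "nb_actions": 0,
            "lat": v["location_latitude"],
            "long": v["location_longitude"],
        })
        row["nb_visits"] += 1
        row["nb_actions"] += v["visit_total_actions"]
    return report
```

Region archiving is the same loop keyed by (region, country); the per-goal variants would add the goal id to the key.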


Other UI changes

  • In Visitor Log & Real time widget: display the City + Region of the visitor on hover on the flag.
  • The new cities/regions reports will have metadata, and therefore will display in PDF/HTML and Piwik Mobile app.
  • When we build this, we should output a big warning on screen when the GeoIP plugin is enabled, to remind people to disable the old plugin. we can write an ugly warning in the Visitors > Location screen for example (or in the admin page).

General notes

  • I think we could put the GeoIP location in the existing UserCountry plugin. By default, GeoIP is not used, but users will easily be able to enable it. Having GeoIP in the existing UserCountry plugin makes things easier (for controller output / API management: we don't need UserCountry.getCountry and Geolocation.getCountry...).
  • See Geoip home and Geoip FAQ
  • please look at the existing GeoIP plugin code for inspiration: #45 - but, warning, the code is old and probably has many issues. This ticket really is about making a clean rewrite of this plugin :)
  • There is some preliminary code committed in: http://dev.piwik.org/trac/changeset/6545
    • Webservice.php can be removed, we do not want to support the web service.

List of Tests to check before release

On top of the "automatic" integration tests testing the API, Here are some ideas of things to test that things still work as expected:

  • when GeoLocation is disabled, country detection based on language
  • when GeoIP is enabled but geoIP.dat is corrupted, country detection based on browser language
  • when GeoIP is enabled and the Apache module selected, but the module is broken or returns empty strings: use the default country detection based on browser language
  • test with anonymized IP (1 byte or 2 bytes) that the lookup assumes the anonymized byte is 1 and does the lookup. This will be approximate but most of the time will be quite good! The test
  • test Row Evolution works for region/cities/countries
  • test new reports work in HTML / PDF
  • test with the commercial version of Maxmind city/region to test that the code works as well as with the free DB


Note: all these should not necessarily be unit tests, but at least manual testing once is very important...

Script that will enrich existing past data with GeoIP

  • The old GeoIP plugin has a script that updates rows in log_visit table to enrich with city/region/lat/long. This script would be very nice to have in the new release. Not mandatory for V1, so let's do it only if trivial / enough time.


For later / V2 and beyond
Not for a first release, but ideas backlog for the future:

  • Test compat with IPv6. Maxmind seems to partially support, so we could try and make it work with ipv6
  • Test if shared memory caching, GEOIP_SHARED_MEMORY, results in better performance (it should). We'll do that only if users complain about slow or memory hungry piwik.
  • test with the Cloudflare proxy, which already performs the GeoIP lookup and stores the result in $_SERVER['HTTP_CF_IPCOUNTRY']?
  • show maps of last 100 visits, or 500 visits (use standard "limit" selector to change limit). The map would plot on lat/long a marker. the marker when clicked would show the Visitor information, and a link to view the visitor log restricted to this particular visit.
  • test the returned cities names are UTF8 / unicode. Test for example with Brazil "São Paulo"
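For the UTF-8 check: legacy GeoIP City data defaults to ISO-8859-1 rather than UTF-8 (the C library, for instance, exposes a charset setting), which is exactly what breaks names like "São Paulo". A minimal sketch of normalizing a raw value, assuming the bytes are either valid UTF-8 or Latin-1; a real check would decode at the point where the GeoIP API returns the string.

```python
def to_unicode(raw: bytes) -> str:
    """Decode a city name, preferring UTF-8 and falling back to ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte sequence is valid Latin-1, so this fallback cannot fail.
        return raw.decode("iso-8859-1")
```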

End of Spec.

Let me know if there's any question or suggestion!

Changed 19 months ago by interfasys

Database updater via cron

comment:62 Changed 19 months ago by interfasys

I've just posted a file which can be used for monthly updates via cron:
# Monthly GeoIP updates
55 12 2 * * root /usr/local/bin/geoip.updater.sh
It could also be used for the initial DB setup.
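
The contents of geoip.updater.sh are not shown here. As a hypothetical illustration of what such an updater needs to do safely, this Python sketch downloads a gzipped database from a URL you supply (no URL is assumed) and swaps it in atomically, so the tracker never reads a half-written file:

```python
import gzip
import os
import tempfile
import urllib.request


def install_gzipped_db(gz_bytes: bytes, dest_path: str) -> None:
    """Decompress a .dat.gz payload and atomically replace dest_path."""
    data = gzip.decompress(gz_bytes)
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, dest_path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp_path)
        raise


def update_db(url: str, dest_path: str) -> None:
    """Download a gzipped GeoIP database from `url` and install it."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        install_gzipped_db(resp.read(), dest_path)
```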

comment:63 Changed 19 months ago by matt (mattab)

Addition to the spec:

  • Region, city, lat and long should be available as segments, so we can segment any report, including goal reports, to a given city or to a range of lat / long.
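
A lat / long range segment amounts to a bounding-box filter on visits. A minimal sketch with a hypothetical Visit record; it deliberately ignores boxes that cross the 180° meridian:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Visit:
    city: str
    latitude: float
    longitude: float


def segment_by_box(visits: Iterable[Visit], lat_min: float, lat_max: float,
                   lon_min: float, lon_max: float) -> List[Visit]:
    """Keep visits whose coordinates fall inside the (inclusive) box."""
    return [v for v in visits
            if lat_min <= v.latitude <= lat_max
            and lon_min <= v.longitude <= lon_max]
```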

comment:64 Changed 19 months ago by JulienM (JulienMoumne)

We probably need to write an FAQ on how to migrate from the old plugin to the new one.

comment:65 Changed 19 months ago by adrian

Maybe I missed it, but we should consider province/state as well. In some countries city names are duplicated, which could make analysis of GeoIP traffic inaccurate. For example: http://www.canada-city.ca/duplicate-cities.php
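
The usual fix is to key city counts by (country, region, city) rather than by city name alone, so that two cities sharing a name in different provinces stay separate. A minimal sketch with illustrative data; the dict field names are assumptions:

```python
from collections import Counter
from typing import Iterable, Mapping


def visits_by_city(visits: Iterable[Mapping[str, str]]) -> Counter:
    """Count visits per (country, region, city) tuple."""
    return Counter((v["country"], v["region"], v["city"]) for v in visits)
```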

comment:66 Changed 19 months ago by capedfuzz (diosmosis)

(In [7122]) Refs #1823, modified UserCountry plugin to allow use of GeoIP databases if desired. Added two reports, getVisitsByRegion + getVisitsByCity.

Notes:

  • Supports country, region, city, org & isp GeoIP databases.
  • Supports GeoIP PHP API, PECL module & server modules.
  • Added ability to regenerate 'general' tracker cache.
  • Removed location_continent column from log_visit & log_conversion tables, and removed visits by continent blob record. Report is now a view over country report.
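
Making the continent report "a view over the country report" means summing country rows through a country-to-continent map instead of storing a separate column. A sketch with a tiny illustrative mapping; the continent codes here are assumptions, and a real mapping would cover every country:

```python
from collections import Counter
from typing import Mapping

# Illustrative subset only.
COUNTRY_TO_CONTINENT = {"fr": "eur", "de": "eur", "us": "amn", "br": "ams", "jp": "asi"}


def continent_report(country_report: Mapping[str, int]) -> Counter:
    """Aggregate visits per country into visits per continent."""
    totals: Counter = Counter()
    for country, visits in country_report.items():
        totals[COUNTRY_TO_CONTINENT.get(country, "unk")] += visits
    return totals
```

The trade-off is a little computation at report time in exchange for one fewer column in log_visit/log_conversion and one fewer archived blob.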

comment:67 Changed 19 months ago by matt (mattab)

(In [7130]) Cleaning up misc/ directory to prepare for GeoIP files, Refs #1823

comment:68 Changed 19 months ago by capedfuzz (diosmosis)

(In [7134]) Refs #1823, use misc dir instead of files-geolocation for GeoIP db files.

comment:69 Changed 19 months ago by capedfuzz (diosmosis)

(In [7140]) Refs #1823, several changes & tweaks to GeoIP modifications:

  • Renamed getVisitsByCity & getVisitsByRegion to getCity & getRegion.
  • Extra testing for anonymized IPs.
  • Show visitor city & region in visitor log & last visits widget.
  • Do specific check for apache module in checking for server based geoip implementation.
  • Fix for continent segment error.
  • Redesigned admin UI to be more compact & to show reason for broken implementations.
  • Don't show duplicate Unknowns in pretty location strings.
  • Don't use REMOTE_ADDR, instead get IP from Piwik_IP.
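
"Don't show duplicate Unknowns" can be illustrated by building the location string only from known parts. A minimal sketch; the function name and "Unknown" label are hypothetical:

```python
from typing import Optional

UNKNOWN = "Unknown"


def pretty_location(city: Optional[str], region: Optional[str],
                    country: Optional[str]) -> str:
    """Join the known parts; show a single 'Unknown' only if nothing is known."""
    parts = [p for p in (city, region, country) if p and p != UNKNOWN]
    return ", ".join(parts) if parts else UNKNOWN
```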

comment:70 Changed 19 months ago by capedfuzz (diosmosis)

(In [7144]) Refs #1823, added latitude/longitude + other metadata to getCity reports, added country name metadata to getRegion & tweaked admin UI a bit.

comment:71 Changed 19 months ago by capedfuzz (diosmosis)

(In [7145]) Refs #1823, small test improvement/fix.

comment:72 Changed 19 months ago by EZdesign (BeezyT)

The admin UI explaining the different implementations is great. But it doesn't tell the user how to set up new methods. Maybe it should be linked to a doc page where that's explained?

comment:73 Changed 19 months ago by peterb (peterbo)

(In [7150]) Refs #1823, added missing constant that stops tracker from working / breaks tracker if location code is unknown.

comment:74 Changed 19 months ago by peterb (peterbo)

(In [7151]) Refs #1823, reverted change of Visit Class- Constant was already defined in UserCountry. Only the reference to LocationProvider was wrong. Changed static Reference.

comment:75 Changed 19 months ago by capedfuzz (diosmosis)

(In [7159]) Refs #1823, fixed conversion tracking omission in initial GeoIP commit & added tests for conversion locations & unknown location. Removed 'Unknown' regions & cities from visitor log tooltips, added more detailed error messages for issues w/ the PECL module, add test w/ test IP and known result to isWorking methods and some more admin UI tweaks.
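
An isWorking check of the "test IP with known result" kind can return both a status and a reason, which is what the admin UI needs to explain a broken implementation. A sketch with a hypothetical provider callable; the test IP and expected country used in practice would have to come from the database actually shipped:

```python
from typing import Callable, Optional, Tuple


def is_working(provider: Callable[[str], Optional[str]], test_ip: str,
               expected_country: str) -> Tuple[bool, str]:
    """Look up a known IP and compare with the expected country code."""
    try:
        result = provider(test_ip)
    except Exception as exc:
        return False, f"lookup raised {type(exc).__name__}: {exc}"
    if not result:
        return False, "lookup returned no result"
    if result.lower() != expected_country.lower():
        return False, f"expected {expected_country!r}, got {result!r}"
    return True, ""
```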

comment:76 Changed 19 months ago by capedfuzz (diosmosis)

(In [7167]) Refs #1823, more admin UI tweaks.

comment:77 Changed 19 months ago by matt (mattab)

(In [7169]) Refs #1823

  • Cache regenerate should re-generate the cache file
  • Missing require_once which was causing error

Fatal error: Class 'Piwik_UserCountry_LocationProvider' not found in /plugins/UserCountry/UserCountry.php on line 78

comment:78 Changed 19 months ago by matt (mattab)

(In [7172]) Refs #1823

  • Geolocation menu now appears green when selected

comment:79 Changed 19 months ago by EZdesign (BeezyT)

The new reports don't have report documentation. Next to the country, region and city headlines, the question mark icon appears, but clicking it shows no documentation. For the continent report, the question mark icon doesn't appear at all, which is OK. IMO the reports should either be documented or the question mark should not be shown.

comment:80 Changed 19 months ago by capedfuzz (diosmosis)

(In [7180]) Refs #1823, test GeoIP w/ normal tracking, bulk tracking & log importing. Test if Apache module is working using GEOIP_ADDR server variable instead of GEOIP_COUNTRY_CODE since the latter may not always be set.

comment:81 Changed 19 months ago by capedfuzz (diosmosis)

(In [7181]) Refs #1823, fade 'Done' in & out after successfully switched location providers.

comment:82 Changed 19 months ago by capedfuzz (diosmosis)

(In [7186]) Refs #1823, many changes including:

  • Add warning if old GeoIP plugin is used.
  • Display links to installation instructions for different providers if they are not installed.
  • Add report documentation for country, region, continent and city reports, round latitude/longitude.
  • Increase 'Done' timeout when switching providers
  • Display quick start instructions for GeoIP if no GeoIP provider is currently working.
  • Add script to geolocate old data.

comment:83 Changed 19 months ago by capedfuzz (diosmosis)

(In [7187]) Refs #1823, display informative note when Region + City reports have no location data, merge unknown rows in region & city reports and make sure latitude/longitude is rounded in API output.
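
Merging unknown rows and rounding latitude/longitude can be done in one pass over the report rows. A sketch assuming hypothetical row dicts with "city", "nb_visits", "lat" and "long" keys; 2 decimal places is an arbitrary choice for illustration:

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def finalize_city_report(rows: List[Row], precision: int = 2) -> List[Row]:
    """Collapse all unknown-city rows into one and round coordinates."""
    known: List[Row] = []
    unknown_visits = 0
    for row in rows:
        if not row.get("city") or row["city"] == "Unknown":
            unknown_visits += row["nb_visits"]
            continue
        out = dict(row)  # don't mutate the caller's rows
        for key in ("lat", "long"):
            if out.get(key) is not None:
                out[key] = round(out[key], precision)
        known.append(out)
    if unknown_visits:
        known.append({"city": "Unknown", "nb_visits": unknown_visits})
    return known
```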

comment:84 Changed 19 months ago by matt (mattab)

(In [7203]) Refs #1823 Adding important-to-have target=_blank

comment:85 Changed 19 months ago by matt (mattab)

(In [7204]) Minor text change & provider ordering Refs #1823

comment:86 Changed 19 months ago by matt (mattab)

(In [7205]) remove test Refs #1823

comment:87 Changed 19 months ago by capedfuzz (diosmosis)

  • Resolution set to fixed
  • Status changed from new to closed

comment:88 Changed 19 months ago by capedfuzz (diosmosis)

This bug is fixed. :) I created a ticket for improvements here: #3442

comment:89 Changed 19 months ago by matt (mattab)

  • Description modified (diff)

See doc: Geo Locate visitors countries cities and regions.

comment:90 Changed 18 months ago by capedfuzz (diosmosis)

(In [7234]) Refs #1823, add note to geoipUpdateRows.php that tells user to re-process their reports.

comment:91 Changed 18 months ago by capedfuzz (diosmosis)

(In [7236]) Refs #1823, add note to IP anonymization about geolocation accuracy.

comment:92 Changed 18 months ago by capedfuzz (diosmosis)

(In [7260]) Refs #1823, add alternative check for GEOIP_COUNTRY_CODE $_SERVER var to ServerBased GeoIP implementation.

comment:93 Changed 18 months ago by capedfuzz (diosmosis)

(In [7281]) Refs #1823, do broken check w/ both GEOIP_ADDR & GEOIP_COUNTRY_CODE.
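
The final server-based check treats the module as present if either GEOIP_ADDR or GEOIP_COUNTRY_CODE is set, since the latter is not always populated. A sketch of that rule with a dict standing in for $_SERVER; the function names are hypothetical:

```python
from typing import Mapping, Optional


def server_geoip_is_working(server: Mapping[str, str]) -> bool:
    """True if the server module set either of the two variables."""
    return bool(server.get("GEOIP_ADDR") or server.get("GEOIP_COUNTRY_CODE"))


def server_country(server: Mapping[str, str]) -> Optional[str]:
    """Country code from the server module, or None if unavailable."""
    if not server_geoip_is_working(server):
        return None
    code = server.get("GEOIP_COUNTRY_CODE", "").strip()
    return code.lower() or None
```

A module can be active (GEOIP_ADDR set) yet have no country for a given address, so the two questions are kept separate.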

comment:94 Changed 18 months ago by capedfuzz (diosmosis)

(In [7283]) Refs #1823, added redundant trusted hosts warning to general settings page & display help icon that links to faq in warning.

comment:95 Changed 17 months ago by capedfuzz (diosmosis)

[7628] refs this ticket, not #1253. (typo in commit msg)

comment:96 Changed 16 months ago by matt (mattab)

We need YOUR help! We are running a crowd funding campaign to raise funds to implement the detailed Visitors Maps of Countries, Regions and Cities (for all countries)!

These maps will be beautiful, usable, and built using open standards SVG+JS. They will show detailed visitor count, conversion rates, by Country but also (New!) by city and region.

Pledge now at: http://crowdfunding.piwik.org/analytics-maps-world-country-city-region/

Piwik needs you!

