Ticket #1652 (new New feature)

Opened 18 months ago

Last modified 10 days ago

Open Source SVG Map to show cities and regions

Reported by: nicobach1 Owned by: greg
Priority: normal Milestone: 1.x - Piwik 1.x
Component: Core Keywords:
Cc: Sensitive: no

Description (last modified by matt) (diff)

Right now the world map only shows which country the visits are coming from. It would be really useful if we could narrow it down to the state the visitor came from, or even the city, as in Google Analytics.

It'll be a great feature for marketing so we know where most of the visitors are!

We would also like to move the technology from Flash to a fully open source stack, and have the map displayed in SVG or another vector format.

Once the Maps show cities and region, we definitely have to show the maps in the actual Country report: full awesomeness! see #1821

Attachments

cities2.png Download (14.7 KB) - added by greg 10 months ago.
A mockup of the new city view
regions1.png Download (18.6 KB) - added by greg 10 months ago.
A mockup of the new region view

Change History

  Changed 18 months ago by vipsoft

  • priority changed from major to normal
  • milestone changed from 4 - Piwik 1.0 to Features requests - after Piwik 1.0

This was mentioned in #1514.

  Changed 15 months ago by matt

  • owner set to greg
  • summary changed from World Map to Flash Map to show cities and regions

  Changed 13 months ago by matt

  • milestone changed from Feature requests to 1.x - Piwik 1.x

This should be implemented in the next few weeks.

  Changed 10 months ago by greg

So, I will start the map development in the next few days. There are a couple of things to do:

  • porting the map preparation script to Python, reading shapefiles, projection, simplification, AMF export
  • extracting all countries sub-regions from shapefiles found at  http://gadm.org (seem to be complete for all countries)
  • updating the frontend SWF

However, there are some open problems.

  • How do we map the geo-locations to the country regions? I think this should be done once every time the GeoIP database gets an update. There are two possible ways to do this:
    • by checking point-in-polygon for each lat/lng against every region polygon and storing the result (lat/lng > country_region). This is almost accurate, but some cities may lie directly on a region border and thus may be wrongly mapped.
    • another way could be to use the data we get from the GeoIP database. At least for some countries there should be some information about the region, e.g. the state id in the USA. In any case we need to map the country region ids (which are defined in the shapefiles) to the country region ids in the GeoIP database. However, I'm not sure if this information is available for every country.
  • In both ways, this would require some install script to run every time the user updates his GeoIP database. Do you think this is possible?
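The point-in-polygon option above can be sketched roughly as follows. This is only a minimal illustration, assuming each region outline is a simple list of (lng, lat) vertices; the function names and the toy `REGIONS` data are made up for the example and are not taken from the actual map preparation script.

```python
def point_in_polygon(lng, lat, polygon):
    """Ray casting: count how often a ray going east from the point crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


def locate_region(lng, lat, regions):
    """Return the id of the first region whose outline contains the point, else None."""
    for region_id, polygon in regions.items():
        if point_in_polygon(lng, lat, polygon):
            return region_id
    return None


# Hypothetical toy data: two square "regions" sharing the border at lng = 1.
REGIONS = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}

print(locate_region(0.5, 0.5, REGIONS))
print(locate_region(3.0, 0.5, REGIONS))
```

A point lying exactly on a shared border ends up in whichever region the edge test happens to favour, which is the border problem mentioned above.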

Changed 10 months ago by greg

A mockup of the new city view

Changed 10 months ago by greg

A mockup of the new region view

  Changed 10 months ago by greg

Note: in this image you can see an overview of all available region outlines. I think this is pretty much complete.

 http://gadm.org/img/gadm_v1_level1_high.png

follow-up: ↓ 7   Changed 10 months ago by matt

Greg, great news that you will resume work on this :)

GADM data looks good and there for all countries, this sounds fantastic, nice find!

In both ways, this would require some install script to run every time the user updates his GeoIP database. Do you think this is possible?

It is possible to run a script on every GeoIP update, but not a good solution for ease of use. Here is a proposal: Maybe, we could run the script once before Piwik release (and commit the file to SVN after checking it's OK). We would use the latest GeoIP database for this. Then, if users use an older version (or newer version) compared to the pre-generated DB, maybe they could run the script manually themselves?

But, I would like to ask about this script, what exactly will it map?

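My current understanding is that Piwik will report:

  • top cities + lat/long: visits, pages, etc.
  • top regions (as per this specification): visits, pages, etc.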

Is the algorithm designed to map "regions" according to GeoIP, to "regions" in the Piwik map?

I guess that, for each GeoIP region, we could give one lat/long of a city belonging to this region. If so, do we need a database for this? Maybe the SWF could map the lat/long to the pixel inside the region in real time?

Thanks for clarifying

in reply to: ↑ 6   Changed 10 months ago by greg

Replying to matt:

It is possible to run a script on every GeoIP update, but not a good solution for ease of use. Here is a proposal: Maybe, we could run the script once before Piwik release (and commit the file to SVN after checking it's OK). We would use the latest GeoIP database for this. Then, if users use an older version (or newer version) compared to the pre-generated DB, maybe they could run the script manually themselves?

My current understanding is that Piwik will report:

  • top cities + lat/long: visits, pages, etc.
  • top regions (as per this specification): visits, pages, etc.

Is the algorithm designed to map "regions" according to GeoIP, to "regions" in the Piwik map?

I didn't know that the GeoIP DB contains a complete list of FIPS regions. So Piwik already knows how many visitors each region had? In this case, there's no need for another update script. Now, all I have to ensure is that the map region ids are the same as the GeoIP/FIPS region ids.

Just to make sure I understand everything, here's a sample request flow:

  • At first, the client loads the Piwik Map SWF
  • the Piwik Map SWF will then request the visitors per country via the Piwik API and display the world map
  • now the user can click on a continent to zoom in
  • now the user can click on a country
  • the map SWF will now load the detailed map data for this country (and its regions)
  • at the same time, the map SWF will request the visitors per region (of this single country)
  • once the SWF has received all data, it will display the regions map as shown in regions1.png
  • now the user can switch to the city mode
  • the map SWF will now request the visitors per city for this country. The API will serve a list of cities including their lat/lng coords with the number of visitors for each city.

Did I get it right?

Thanks

  Changed 10 months ago by matt

Yes it sounds good!

One other piece of feedback: give labels to the icons for switching to city/region view and to the zoom in/zoom out buttons, since these buttons are very important and must be easy to reach.

I think that we must check that GeoIP data will output the regions as we expect them, i.e. that all visitors are indeed assigned to one of the regions listed in:  http://www.maxmind.com/app/fips10_4

I will double check this and confirm here.

Is there any other open question apart from this one?

  Changed 10 months ago by matt

  • summary changed from Flash Map to show cities and regions to Open Source Flash Map to show cities and regions

Greg, will the Flash maps know about all cities in the GeoIP Database?

Or, will the flash map expect a list of cities & lat/long, and plot them "blindly" in the flash map (projecting from lat/long to pixels, and drawing City name based on Piwik API as input) ?

  Changed 10 months ago by matt

  Changed 10 months ago by greg

The flash map doesn't know anything about the cities. Instead the map is able to project lat/lng coordinates to the exact pixels. I'm not sure if it will display all cities "blindly", maybe there will be some simple clustering of cities that are very close.

  Changed 10 months ago by matt

One thing I thought of is that sometimes GeoIP will not return region/city info (or, at other times, just not the city). So maybe we can plan to display the "Unknown" visitors on the map somewhere, discreetly?

Because I guess the percentages displayed will take into account the share of the Unknown region/city?

  Changed 10 months ago by greg

You mean that for some visitors GeoIP only knows the country but not the city and/or region? We can add a "Plus X unlocatable visitors" text somewhere to display this data. Good point.

  Changed 10 months ago by matt

Yes, we'd better expect the worst with the GeoIP free edition; all use cases can happen. I think Piwik will return, for a country's regions, an 'Unknown' (or 'Other') row that will contain these. A simple text "X visits couldn't be located" sounds good!

  Changed 10 months ago by vipsoft

Sorry, I'm late joining this discussion.

I'm going to add a FIPS data file as part of my work in #1823 to convert the region names into the more compact FIPS 10-4 code when storing it in the log_visit table.

  • Do the region names in MaxMind's FIPS file correspond 1-to-1 with the region names used by gadm.org, or does the data need to be massaged?
  • Would it help if I provide a reverse lookup, i.e., FIPS code to region name?

re: the MaxMind FIPS file

  • names are (English) localized without any special characters
  • there are almost 4000 entries which means we shouldn't expect translations
  • the codes are used inconsistently, i.e., the data switches to ISO for US states and Canadian provinces

  Changed 10 months ago by matt

Partial feedback:

names are (English) localized without any special characters

Region names are in English charset, but they are not translated into English. For example, French regions are written in French. Maybe this is not true for all countries.

Would it help if I provide a reverse lookup, i.e., FIPS code to region name?

I think, that the API output for getRegions (eg.) should contain the region code, and the full region name, like we do for Country API output (which includes the country code and full name, icon path, etc.)

  Changed 10 months ago by greg

Just wanted to let you know that I've got plenty of stuff to do right now. Hope to be able to continue working on this feature soon. Sorry for the delay..

  Changed 8 months ago by matt

I have just seen an interesting project: jVectorMap, a canvas+JS map library.

Greg, what do you think about this work? hope your work load is getting better :)

  Changed 8 months ago by greg

Hi Matt,

jvectormap looks quite nice. I'm currently thinking a lot about JS/SVG based mapping myself, even developed some early prototypes while working on other projects. Still, our biggest challenge is how to build the map data files for every country (including admin level2 regions).

Thus, one of my next steps is to set up a Mapnik server and let it export clean SVG projections of shapefiles. My goal is to do that by writing as little code as possible. Mapnik is such a powerful library, so it should be almost only a matter of configuration.

After we have created the map files, we still have the choice to either use one of the currently emerging JS/SVG mapping libraries (like jVectorMaps) or to develop a mini library especially for Piwik ourselves.

Besides that, I fully agree on JS/SVG instead of Flash. Quite a challenge, but possible.

and, yes, my work load starts getting better. :)

  Changed 8 months ago by matt

Greg, I'm happy to hear you are keen on SVG for your own work. As you may have seen in the blog recently, we are now using canvas graphs only, and Timo from the team has contributed many patches to jqplot to make it work nicely in our use case. Hopefully, an existing library can meet our needs in terms of performance, maintainability, features, licensing.

In any case this work will be reused, not only in Piwik but I'm sure in hundreds of other projects in the future, since we are building the first truly open source world mapping with region details for all countries.

With shapefiles, I hope you can find a format that is small in size and still has good shape quality; that must be quite a challenge.

  Changed 5 months ago by matt

  • description modified (diff)
  • summary changed from Open Source Flash Map to show cities and regions to Open Source SVG Map to show cities and regions

  Changed 5 months ago by matt

  • description modified (diff)

Once the Maps show cities and region, we definitely have to show the maps in the actual Country report: full awesomeness! see #1821

  Changed 5 months ago by matt

See also the blog post by greg:  http://vis4.net/blog/posts/piwik-maps-2/

  Changed 5 months ago by greg

I now proceed with this task. As a first step, I looked at the shapefiles provided by gadm.org. Since some people seem to be very interested in my work (I got plenty of mails after finishing the first version of the map), I decided to blog about my progress. I will post the links in here.

Part 1: Finding a map data source:  http://vis4.net/blog/en/posts/recreating-the-piwik-map

  Changed 5 months ago by matt

Greg; excellent first step, looking forward to the next part!

  Changed 5 months ago by matt

Regarding the simplification of shape files, it is obvious that we don't need any detail around Chile; a rough outline of the coast would do perfectly (we can't afford all the little islands ;).

Also is it possible to add a test not to plot islands less than 10 square km, or something similar?

It is really key for user experience to have the smallest file size possible to ensure fast downloading and JS parsing / CPU usage. :)

  Changed 5 months ago by greg

I just looked into a world shapefile and computed the areas of all 3761 polygons (a country may have multiple polygons for islands etc). Filtering every polygon smaller than 10 square km would remove 434 polygons (=11%). I checked the names of the countries which the removed polygons belong to and found many island states among them (as expected).

However, I think a hard cut at 10 sq km has some problems:

  • it removes some countries entirely (like the Maldives Islands)
  • it still keeps very many small islands (starting from 11 sq km)

See  sample rendering.

I tried a different rule: remove all polygons smaller than 5% of the maximum polygon area of that country. Since the largest polygon of the Maldives is 9 sq km, all its islands are kept, while all small islands of the USA are removed. This also halved the resulting SVG file size while keeping every country on the map.

See  sample rendering.

However, removing every polygon smaller than 5% of the maximum per country is not satisfactory either. Big and well-known islands like Hawaii or Novaya Zemlya shouldn't be removed, in my opinion.
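To make the two filtering rules easier to compare, here is a rough sketch of both. It assumes the polygons are already in planar coordinates measured in km, so that the shoelace formula gives areas in sq km. The function names and the toy `COUNTRIES` data are illustrative assumptions, not the actual script.

```python
def polygon_area(ring):
    """Planar area of a closed ring of (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def filter_absolute(countries, min_area=10.0):
    """Rule 1: drop every polygon smaller than min_area; a country may vanish entirely."""
    result = {}
    for code, polygons in countries.items():
        kept = [p for p in polygons if polygon_area(p) >= min_area]
        if kept:
            result[code] = kept
    return result


def filter_relative(countries, fraction=0.05):
    """Rule 2: drop polygons smaller than `fraction` of the country's largest polygon."""
    result = {}
    for code, polygons in countries.items():
        largest = max(polygon_area(p) for p in polygons)
        result[code] = [p for p in polygons if polygon_area(p) >= fraction * largest]
    return result


def square(side, x0=0.0, y0=0.0):
    """Helper for toy data: an axis-aligned square with the given side length."""
    return [(x0, y0), (x0 + side, y0), (x0 + side, y0 + side), (x0, y0 + side)]


# Hypothetical countries: "BIG" has a mainland plus a 16 sq km island,
# "TINY" consists only of islands of 9 and 4 sq km.
COUNTRIES = {
    "BIG": [square(100), square(4, 200, 0)],
    "TINY": [square(3), square(2, 10, 0)],
}

print(sorted(filter_absolute(COUNTRIES)))
print({code: len(polys) for code, polys in filter_relative(COUNTRIES).items()})
```

With the toy data, the absolute rule drops "TINY" completely but keeps the small island of "BIG", while the relative rule keeps every country and removes only the small island of "BIG". That mirrors the trade-off described above.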

Also there is a problem with those tiny islands countries like the Maldives, which is obvious if you look at the  zoomed view on the Maldives. The islands are way too small to be rendered in a meaningful way.

This leads to some important questions: Does the map need to include all countries or is it acceptable to ignore some countries, like the Maldives Islands?

In the old map widget this wasn't important because there was no country level view.

  Changed 5 months ago by greg

I think I would like to keep islands and outlying regions in the country-level views. For instance, I could try to generate composed maps that mix different projections to fit the complete country into the map.

Like in this  example for the United States.

  Changed 5 months ago by greg

One note regarding the projection used for the country-level views. I would like to simplify the whole thing by using the same projection for every country, but with different parameters (namely the projection center).

One of the simpler projections is the orthographic projection, which looks the same as if looking onto a 3D globe. This gives quite good and less-distorted views for almost every country. The only country that looks quite distorted is Russia. Because of its huge area, Russia takes too much space on the globe.

You can have a look at  distorted Russia here.

My opinion is that this distortion is acceptable given the simplicity of rendering maps in a single projection. In an ideal mapping world, one would use specialized projection for every country, like the New Zealand Map Grid for New Zealand etc.
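For reference, the orthographic projection with a per-country projection center can be written in a few lines. This is a sketch using the standard spherical formulas; the function name, the unit radius and the choice to return None for points on the far side of the globe are assumptions made for the example.

```python
import math


def orthographic(lat, lng, center_lat, center_lng, radius=1.0):
    """Project (lat, lng) in degrees onto a plane, as seen from above the center point.

    Returns (x, y), or None if the point is on the hidden half of the globe.
    """
    phi, lam = math.radians(lat), math.radians(lng)
    phi0, lam0 = math.radians(center_lat), math.radians(center_lng)

    # Cosine of the angular distance from the projection center.
    cos_c = (math.sin(phi0) * math.sin(phi)
             + math.cos(phi0) * math.cos(phi) * math.cos(lam - lam0))
    if cos_c < 0:
        return None

    x = radius * math.cos(phi) * math.sin(lam - lam0)
    y = radius * (math.cos(phi0) * math.sin(phi)
                  - math.sin(phi0) * math.cos(phi) * math.cos(lam - lam0))
    return x, y


# The projection center itself always lands on the origin.
print(orthographic(10.0, 20.0, 10.0, 20.0))
# A point on the opposite side of the globe is not visible.
print(orthographic(0.0, 180.0, 0.0, 0.0))
```

Only the center changes from country to country. Points far from the center get squeezed towards the edge of the disc, which is why a country as wide as Russia looks distorted.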

Any more opinions?

  Changed 5 months ago by greg

The next design decision has to do with the aspect ratio of the maps. At some point in the map generation process I need to crop the country-level maps to a bounding box. The idea is to fit the countries as big as possible into the views while also showing a bit of their neighbour countries for navigation.

Now the question arises as to which aspect ratio the maps should be cropped to. As far as I can see, we have two options:

1. A fixed aspect ratio (presumably some wide-screenish format) would be the simplest solution. The complete cropping could be done in the preprocessing stage, which would reduce complexity of map rendering. However, the obvious drawback of this solution is that the maps won't look as good as they could when displaying countries in the "opposite" aspect ratio. For some portrait countries (like Germany), this should be acceptable. For others with more extreme aspect ratios (my favourite example is Chile), it isn't.

2. In the latter cases, a dynamic aspect ratio would make much more sense. It would be perfect if we could present the Chilean Piwik users (I assume there are some) with a  portrait view of their country. Dynamic aspect ratios could be done either by choosing a fixed ratio per country or by cropping the maps at rendering time. Choosing a fixed ratio per country has the drawback that those countries could not fit a landscape ratio in fullscreen mode. In contrast, cropping at runtime may take more CPU. Also, I don't know if the current dashboard supports dynamic resizing of widgets, but this may be easy to implement.

  Changed 5 months ago by matt

I tried a different rule: remove all polygons smaller than 5% of the maximum polygon area of that country.

Sounds good

This leads to some important questions: Does the map need to include all countries or is it acceptable to ignore some countries, like the Maldives Islands?

Removing countries altogether is, I think, not a good idea. Maybe we could still display ALL countries, but keep the 5% rule for all other countries (e.g. the Maldives would be displayed with at least the main islands, if possible, while Canada would still lose many big islands).

I think I would like to keep islands and outlying regions in the country-level views. For instance, I could try to generate composed maps that mix different projections to fit the complete country into the map. Like in this example for the United States.

If you can do that (at least for some selected countries?), it would be pretty cool!

My opinion is that this distortion is acceptable given the simplicity of rendering maps in a single projection. In an ideal mapping world, one would use specialized projection for every country, like the New Zealand Map Grid for New Zealand etc.

Simple is always better, even if the shape is not perfect. We could improve this later anyway. What would it look like for NZ? ;-)

The idea is to fit the countries as big as possible into the views while also showing a bit of their neighbour countries for navigation.

Also, maybe it would be useful if hovering over the countries next to the zoomed country displayed a little tooltip with the country metrics. Maybe it could be displayed next to or below the main country metrics (which would always be displayed?).

fixed aspect ratio (presumably some wide-screenish format) would be the simplest solution.

Agreed. I think it is expected that all country zooms have the same dimensions. The dashboard doesn't support dynamically sized widgets at present (not planned). Chile looks good in the map, even with a lot of sea. We can use the space for legends, metrics, etc.

The dynamic aspect ratio could be implemented later in the lib for static country-specific maps...

Thanks for posting your thoughts and log here, cheers!

follow-up: ↓ 33   Changed 5 months ago by greg

For New Zealand, the two projections don't differ that much. NZ is rather small (at least compared to the Earth), so the orthographic projection is quite a good approximation.

in reply to: ↑ 32   Changed 5 months ago by matt

Replying to greg:

For New Zealand, the two projections don't differ that much. NZ is rather small (at least compared to the Earth), so the orthographic projection is quite a good approximation.

Indeed the orthographic projection seems really good!

  Changed 4 months ago by greg

Update: I managed to generate SVG maps like this for every country:

http://vis4.net/tmp/FR.png

You can check them out  here. (ZIP, 4.8MB, contains small PNGs for each map)

This is how it works: For each country the algorithm computes a 'nice' bounding box which includes only the most important polygons. For instance, the US bounding box does not include Alaska and Hawaii and the bounding box of Spain doesn't include all those Spanish islands.

However, for some countries like Japan, the current algorithm doesn't work. We could fix this by manually adjusting the parameters for those "problem" countries.

The next step would be to replace the polygons of the active countries with their sub-region polygons.

Also, I'm working on a nice compression algorithm for the vector data. SVG has way too much overhead because of all that XML syntax. The smallest file size can be achieved using a CSV-like format. We could reduce the size even more by kind-of-Base64 encoding the coordinates. For instance, the number "12345" can be stored as something like "zxB", which saves two bytes for each number.
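The kind-of-Base64 idea can be illustrated like this. It is only a sketch: the 64-character alphabet below is an arbitrary assumption, so the encoded strings will differ from whatever the real export would use (e.g. the "zxB" above). The point is simply that a non-negative integer is written in base 64 instead of base 10.

```python
# Assumed alphabet of 64 printable characters; any fixed set of 64 symbols would work.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def encode_int(value):
    """Write a non-negative integer using base-64 digits from ALPHABET."""
    if value < 0:
        raise ValueError("only non-negative integers are supported in this sketch")
    if value == 0:
        return ALPHABET[0]
    digits = []
    while value:
        value, remainder = divmod(value, 64)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_int(text):
    """Inverse of encode_int."""
    value = 0
    for char in text:
        value = value * 64 + ALPHABET.index(char)
    return value


encoded = encode_int(12345)
print(encoded, len(encoded), decode_int(encoded))
```

Five decimal digits shrink to three characters. In a real file the numbers would additionally need either a fixed width or a separator, and coordinates would have to be scaled to integers first.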

  Changed 4 months ago by greg

Here's the mentioned map of Japan

http://vis4.net/tmp/JP.png

follow-up: ↓ 37   Changed 4 months ago by matt

Thanks for the update, you are making nice progress! :)

AU + JP + AX look OK, but it looks like it is only a zoom problem. Would zooming out slightly fix the display for these countries?

Canada + Greece + Indonesia + Norway + Philippines look very detailed (all the little islands); not sure we need so much detail.

Also I'm working on a nice compression algorithm for the vector data.

Great to hear, I can't stress enough how important it is to have small file sizes and fast SVG rendering. The Map will be displayed in all dashboards by default so should load uber fast so it doesn't slow down the piwik dashboard experience.

The next step would be to replace the polygons of the active countries with their sub-region polygons.

Does it mean, that you will draw the country regions inside the existing country shape, for these countries for which we have region mapping information?

in reply to: ↑ 36   Changed 4 months ago by greg

Great to hear, I can't stress enough how important it is to have small file sizes and fast SVG rendering. The Map will be displayed in all dashboards by default so should load uber fast so it doesn't slow down the piwik dashboard experience.

I see no problems for the Piwik dashboard because the map will only load those maps it really needs. Each country map will be stored in a different file and most users will only need two or three of them. The most important thing is the file size of the world map, which should be way smaller than current map plugin size (250kB).

However, I am concerned about the total size of all country maps (which, again, aren't loaded before the user clicks on a country). There are 246 countries in the map, so even if I manage to reduce the average map size to 10kB, it would take more than 2MB to store all of them.

Does it mean, that you will draw the country regions inside the existing country shape, for these countries for which we have region mapping information?

The resulting maps will look like this: http://vis4.net/tmp/DE_lev1.png

http://vis4.net/tmp/FR_lev1.png

Just to clear this up: the region maps are available for *every* country. The mapping between GeoIP cities and country regions must be computed in most cases.

  Changed 4 months ago by matt

However, I am concerned about the total size of all country maps (which, again, aren't loaded before the user clicks on a country). There are 246 countries in the map, so even if I manage to reduce the average map size to 10kB, it would take more than 2MB to store all of them.

One idea: all country XML files could be stored in a single ZIP file. Then the requests to get the XML for a given country could go through a Piwik PHP controller, which would unpack the ZIP on demand and return only the XML (or CSV) being requested. That way, the size overhead of all maps is just the size of the ZIP containing them all. The PHP code would be very simple: unpack the ZIP (code already exists, cf. Piwik_Unzip) and return the requested mapping info.

If you have 200 * 10kB = 2MB, zipped should be around 200-400kB maybe, which should be uber fast to unzip and also be low overhead. Thoughts?
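The on-demand unpacking idea could look roughly like this. The sketch is in Python rather than PHP, purely to illustrate the flow; it is not the Piwik controller and does not use Piwik_Unzip. The file naming scheme (`FR.svg` etc.), the helper names and the in-memory demo archive are all assumptions.

```python
import io
import zipfile


def read_country_map(archive, country_code, extension="svg"):
    """Return the bytes of a single country's map from an open ZipFile, or None."""
    # Accept only two-letter codes so the request cannot name arbitrary archive members.
    if len(country_code) != 2 or not country_code.isalpha():
        return None
    name = f"{country_code.upper()}.{extension}"
    try:
        return archive.read(name)  # decompresses just this one member
    except KeyError:
        return None  # no map stored for this country


def build_demo_archive():
    """Create a tiny in-memory ZIP standing in for the real bundle of country maps."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("DE.svg", "<svg><!-- Germany --></svg>")
        archive.writestr("FR.svg", "<svg><!-- France --></svg>")
    buffer.seek(0)
    return buffer


with zipfile.ZipFile(build_demo_archive()) as maps:
    print(read_country_map(maps, "fr"))
    print(read_country_map(maps, "xx"))
```

Only the requested member is decompressed, so the cost per request stays small even if the archive holds a few hundred maps.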

follow-up: ↓ 41   Changed 4 months ago by greg

If you have 200 * 10kB = 2MB, zipped should be around 200-400kB maybe, which should be uber fast to unzip and also be low overhead. Thoughts?

Zipping the kind-of-base64 encoded CSV files leads to compression rates of approximately 50% – not 10-20%.

Is disk storage really such a big issue for Piwik? Since the MySQL tables are getting large anyway (500kB per month in my own Piwik installation), I think that 2MB is still acceptable. However, serving the maps gzipped to the browser makes sense, since web traffic *is* a big issue for dashboards, especially when they are also supposed to run on mobile devices.

follow-up: ↓ 42   Changed 4 months ago by greg

By the way, to get an overview of the GeoIP location database I quickly mapped all stored locations. Maybe we could use this information to focus the maps on those countries where it makes sense at all.

 http://vis4.net/tmp/geoip-locations.png

Some African countries have fewer GeoIP locations than sub-country regions. We could consider limiting the region-level reporting to those countries whose GeoIP location density is greater than the density of regions.
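As a rough illustration of that rule: since both densities refer to the same country area, comparing them boils down to comparing the number of GeoIP locations with the number of regions. The function name, the placeholder country codes and all counts below are made up for the example and are not real figures.

```python
def countries_with_region_reporting(location_counts, region_counts):
    """Return the country codes that have more GeoIP locations than sub-country regions."""
    return sorted(
        code
        for code, regions in region_counts.items()
        if location_counts.get(code, 0) > regions
    )


# Placeholder numbers only; "XA" and "XB" are invented country codes.
LOCATIONS = {"FR": 5000, "XA": 3}
REGIONS_PER_COUNTRY = {"FR": 22, "XA": 10, "XB": 5}

print(countries_with_region_reporting(LOCATIONS, REGIONS_PER_COUNTRY))
```

A softer variant could use a minimum ratio of locations per region instead of a plain greater-than comparison.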

Close-up view on France:

http://vis4.net/tmp/FR-closeup.png

in reply to: ↑ 39   Changed 4 months ago by matt

Is disk storage really such a big issue for Piwik? Since the MySQL tables are getting large anyway (500kB per month in my own Piwik installation), I think that 2MB is still acceptable. However, serving the maps gzipped to the browser makes sense, since web traffic *is* a big issue for dashboards, especially when they are also supposed to run on mobile devices.

Disk storage is not a critical issue, but we try to keep the ZIP as small as possible, since once unzipped it is already 15.5 MB (and 5.5M zip).

We could easily serve the region maps in GZIP using the existing function Piwik::serveStaticFile which returns gzip if supported by server.

in reply to: ↑ 40   Changed 4 months ago by matt

Some African countries have fewer GeoIP locations than sub-country regions. We could consider limiting the region-level reporting to those countries whose GeoIP location density is greater than the density of regions.

Interesting map! Did you use the free GeoIP db for it? It would be interesting to plot the same using the commercial geoip DB (which some Piwik users will use) and see if there is any difference. I've sent you an email.

In any case, I think it would be nice to have sub-region mapping for all countries, but maybe the region boundaries could be less sharp/accurate for the countries for which GeoIP density is poor (rather than not plotting regions at all for these countries)? Thoughts?

  Changed 4 months ago by greg

The current GeoLiteCity db contains more locations (318814) than the commercial GeoCity db you sent along (168685). The reason could be that the version you have dates from 2005.

However, the distribution pattern looks almost the same.

 http://vis4.net/tmp/geolitecity-locations.png

 http://vis4.net/tmp/geocity-locations.png

  Changed 4 months ago by greg

I just checked all the region codes inside the GeoIP database and found out that they are not easily mappable to the region codes in the GADM shapefile. For some countries, like the US, the GeoIP db stores two-letter codes, for many other countries the regions are identified by two-digit numbers. The GADM shapefile uses two-letter ISO codes to identify the inner-country regions.

How do you think we should integrate the GeoIP database with the map plugin?

One option would be to force the user to run some kind of initialization script which calculates the region for each GeoIP location; this bears the risk of misidentifying cities that lie on the border between two regions.

Another option would be to try to map the inconsistent GeoIP region ids to the right ISO region ids. We could do this either by matching the region names (which are stored in both databases) or by trying to find mapping tables for all used region identifiers. This bears the risk of false-identification of regions, for instance if there are errors in one of the databases or the automated identification fails. Also, here we would need to run an initialization script that brings both databases together.

Does anyone know whether the GeoIP plugin uses the PHP/MySQL approach or is built on top of a GeoIP API which works with the binary database? I don't know if it is possible to access the raw list of locations through the binary API.

  Changed 4 months ago by greg

Just saw that the GeoIP database uses the admin level 1 codes from geonames.org:

 http://download.geonames.org/export/dump/admin1CodesASCII.txt

  Changed 4 months ago by vipsoft

Maybe we can work out the mapping as part of #2379.

  Changed 4 months ago by greg

By the way, I just saw that the next version of GA supports region level mapping.

 http://vis4.net/tmp/ga-regions.png

  Changed 4 months ago by greg

Ok, I just got the first results.

246 country maps, including regions for the selected country.

SVG

total size: 3.6MB

zipped: 1.0MB

CSV

total size: 2.6MB

zipped: 857kB

This example shows the quality of the maps:

http://vis4.net/tmp/ye.png

Given that the SVG files are not much bigger than the CSV files, especially when zipped, I would recommend using the SVG files, which are a lot easier to maintain. However, the server should be configured to serve the SVG files gzipped.

I would not recommend lowering the quality, and I hope 1MB is still acceptable. Comments?

  Changed 4 months ago by greg

Here's the zip file which contains all SVG maps:

 http://vis4.net/tmp/all-svg.zip

  Changed 4 months ago by vipsoft

+1 for SVG

  Changed 4 months ago by matt

The region maps look beautiful!

1MB zipped sounds reasonable, but no more please ;) It is an increase of 20% of the ZIP size. Worth it for beautiful, open technology maps of the whole world for sure!

Btw, I noticed that South Sudan is not in the list of countries (SS). Is it possible to add it (it's been independent since July 2011)?

Also, did you see if GeoIP returns "Tibet" as a country? It would be nice to have a map for it since it should also be an independent country!

What are your thoughts regarding the mapping of GeoIP regions to the Geonames IDs? Probably the Piwik API dataset should contain the right Geonames mapping directly, which means that the mapping should be done in the GeoIP API as part of #1823. Any thoughts?

  Changed 4 months ago by greg

Btw, I noticed that South Sudan is not in the list of countries (SS). Is it possible to add it (it's been independent since July 2011)? Also, did you see if GeoIP returns "Tibet" as a country? It would be nice to have a map for it since it should also be an independent country!

Both regions are easy to add to the maps, since their boundaries are already available as regions of Sudan and China.

However, the problem is that the GeoIP db doesn't know about Tibet and South Sudan yet and thus never would map visitors to those 'countries'.

What are your thoughts regarding the mapping of GeoIP regions to the Geonames IDs? Probably the Piwik API dataset should contain the right Geonames mapping directly, which means that the mapping should be done in the GeoIP API as part of #1823. Any thoughts?

I will try to map the regions to the Geonames IDs during the map creation. I think the best solution is to check the regions by name similarity and to double check ambiguous cases with a few point-in-polygon tests with cities we'd expect to lie inside the region.

In the end, the SVG maps will only store the Geonames/GeoIP region ids. They won't store additional metadata like country names, to keep the file size down.
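
The name-similarity step could look roughly like the Python sketch below. It uses `difflib.SequenceMatcher` from the standard library as the similarity measure, which is an assumption (the thread does not say which measure is used), and the function names and the `min_score` and `margin` thresholds are hypothetical. A match is flagged as ambiguous when the two best candidates score almost the same, which is where a point-in-polygon check would come in.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two region names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_region(name, candidates, min_score=0.6, margin=0.1):
    """Return (best_candidate, is_ambiguous).

    best_candidate is None when no candidate reaches min_score.
    is_ambiguous is True when the runner-up scores within `margin`
    of the best candidate, so the match needs a second check.
    """
    scored = sorted(((similarity(name, c), c) for c in candidates), reverse=True)
    if not scored or scored[0][0] < min_score:
        return None, False
    ambiguous = len(scored) > 1 and scored[0][0] - scored[1][0] < margin
    return scored[0][1], ambiguous


# Made-up names, for illustration only.
print(match_region("Berlin", ["Berlin", "Hamburg"]))
print(match_region("North", ["North A", "North B"]))
```

Only the ambiguous results need the more expensive geometric test, so most regions can be matched on names alone.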

  Changed 4 months ago by greg

I think I will switch to the naturalearth shapefile since it seems to get updated more frequently. Just discovered that it already contains the shapes for South Sudan. And the public domain license is much better.

  Changed 4 months ago by matt

However, the problem is that the GeoIP db doesn't know about Tibet and South Sudan yet and thus never would map visitors to those 'countries'.

I see, not ideal.. Hopefully in the future they will change this, and it would be nice indeed if the Map would handle these countries already :)

I think I will switch to the naturalearth shapefile since it seems to get updated more frequently.

Sounds good :)

  Changed 4 months ago by greg

Here's a quick documentation of the SVG map rendering:

 http://vis4.net/blog/posts/rendering_country_maps/

  Changed 4 months ago by matt

Beautiful. I love the globe too, I could imagine a little animation with the globe turning and a color pin appearing where new visitors appear (with a click opening a popover with visitor details). Maybe for a next feature after the standard maps? ;)

  Changed 4 months ago by greg

You mean like this?

 http://bl.ocks.org/1246403

  Changed 4 months ago by greg

Next thing to do is the mapping of regions between the Natural Earth map and the Geocommons region meta-data (which uses the same ids as the GeoIP database).

I'm already in touch with one of the guys at NaturalEarth (who  welcome Piwik for discovering their maps, btw), and they'd be very happy to include the mapped region ids in their dataset once I'm finished.

As a first step I checked the region coherence between both datasets, which I measured by comparing the region counts per country. On the map below, the green countries have the same number of regions in both datasets, while the red countries differ. Blue means that there are no regions defined at all in one of the data sets.

http://vis4.net/tmp/region-merging-map-01.png
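
The colour coding of that map can be sketched in Python as follows, assuming each dataset has been reduced to a dictionary of region counts per country code. The function name and the sample counts are made up for illustration.

```python
def classify_countries(counts_a, counts_b):
    """Classify each country by comparing its region count in two datasets.

    green: same number of regions in both datasets
    red:   different number of regions
    blue:  no regions defined at all in at least one dataset
    """
    result = {}
    for country in sorted(set(counts_a) | set(counts_b)):
        a = counts_a.get(country, 0)
        b = counts_b.get(country, 0)
        if a == 0 or b == 0:
            result[country] = "blue"
        elif a == b:
            result[country] = "green"
        else:
            result[country] = "red"
    return result


# Placeholder country codes and counts, not real data.
natural_earth = {"AA": 16, "BB": 22, "CC": 0}
geocommons = {"AA": 16, "BB": 27}
print(classify_countries(natural_earth, geocommons))
```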

Looks like this is going to be quite a lot of work. I think the reason for the differences might be that NaturalEarth provides very recently updated shapes while the geonames-region-db seems to be a bit outdated (and so will be the GeoIP db, too).

  Changed 4 months ago by matt

You mean like this?  http://bl.ocks.org/1246403

Nice! If easy to do, it would be nice for sure. But let's discuss this later maybe :)

NaturalEarth looks very interesting, good news you found them. Hopefully you don't have to go through every country and manually adjust them?

Keep up the good work!

  Changed 4 months ago by greg

Here's the next part of my documentation series.

I finally managed to match the GeoIP regions to the actual regions found in the Natural Earth map.

 http://vis4.net/blog/posts/piwik-matching-regions/

  Changed 4 months ago by matt

Greg, great post!

The screencast is interesting, and shows how valuable this GPL library will be to the world, surfacing currently unavailable Region maps.

  • When there are multiple matches, maybe we can check that no large regions end up ignored and unmatched?
  • Do you have the list of regions with missing matches? If we ignore them, maybe we should check that there are no important ones.
    • Also I guess there will be "new" unmatched regions after you bucket the "multiple regions matching" to the selected region?

  Changed 4 months ago by greg

Good idea to check the missing regions. Here's the list:

DM	10,06,07
HT	03
HN	09
FI	01
PY	13
PA	09
PG	11,12,15
PH	G5,B6,19,G3,C8,C2,C4,F1,A3,72,D4
GW	11
CI	68
CO	25,01
GQ	06
IS	06
ZA	04
CF	18,17
CY	02
VU	13
GM	01
AZ	08
ES	53
SY	03
US	AE,FM,AK,AA,PW,AP,PR
SI	09,04
SO	16
SB	08
SD	26,28,44,45,37,35,40,54,32
GB	O4,G6,B2,K5

I checked the GeoIP locations for the missing regions of the United Kingdom. Those regions have in common that they only have one or two locations in them, and these are placed a tiny bit outside of the region, see here.

In case of the United States, all missing regions are located on small islands.

  Changed 4 months ago by greg

Also, what do you think about providing the maps in two different qualities? In addition to the medium-quality version (file size <1M) shipped with Piwik, users could easily install a plugin that replaces the maps with high-quality versions.

  Changed 4 months ago by matt

Good that none of the missing regions are important (I assume you checked them all).

Also, what do you think about providing the maps in two different qualities?

Could be cool to generate the high quality maps indeed, for the future users of the lib itself. We could of course make it accessible to piwik users too, but I suspect that the basic quality will be good enough for Piwik users? :)

  Changed 4 months ago by greg

Depends on what you mean by "good enough". For me as a true map geek, every removed point hurts a tiny bit ;). And really, there is a perceivable difference between a  17k map and a  34k map, especially when viewing the map in full width reports.

Also, I found a way to reduce the number of missing regions to 14, see  here for details.

follow-up: ↓ 68   Changed 4 months ago by greg

Here's a first interactive prototype of the SVG maps. May not work in IE, yet.

 http://vis4.net/labs/piwikmap/

You can jump to countries like this

 http://vis4.net/labs/piwikmap/?iso3=USA

or by clicking on a neighbor country.

  Changed 4 months ago by matt

Excellent!! :) Very promising!!

in reply to: ↑ 66   Changed 3 months ago by Maxlive_dev

Replying to greg:

 http://vis4.net/labs/piwikmap/?iso3=USA

It doesn't show Alaska, though clicking on it works.

  Changed 2 weeks ago by greg

Ok, let's sum up what API endpoints the map will talk to.

At first there's UserCountry.getCountry from which the map will retrieve a list of countries and the aggregated metrics. This is already implemented and working.

Then there will be some kind of new API UserCity.getCity which will return a list of cities with names, lat/lons and the aggregated metrics. Requests are limited to single countries. The cities are coming directly from the GeoIP database.

And lastly there will be another new API UserRegion.getRegion which will return a list of regions with names, ids and the aggregated metrics. Requests are also limited to single countries.
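
For illustration, here is a Python sketch of how a client might build the request URLs for these endpoints. The `module`, `method`, `idSite`, `period`, `date` and `format` parameters follow the usual Piwik HTTP API conventions. The `country` parameter for restricting a request to a single country is purely an assumption, as are the host name and the `api_url` helper, since UserCity.getCity and UserRegion.getRegion do not exist yet.

```python
from urllib.parse import urlencode


def api_url(base, method, id_site, period="day", date="today", **extra):
    """Build a Piwik API request URL for the given method."""
    params = {
        "module": "API",
        "method": method,
        "idSite": id_site,
        "period": period,
        "date": date,
        "format": "JSON",
    }
    params.update(extra)  # e.g. the hypothetical country filter
    return f"{base}/index.php?{urlencode(params)}"


# Existing endpoint.
print(api_url("http://piwik.example", "UserCountry.getCountry", 1))
# Proposed endpoints; "country" is an assumed parameter name.
print(api_url("http://piwik.example", "UserCity.getCity", 1, country="de"))
print(api_url("http://piwik.example", "UserRegion.getRegion", 1, country="de"))
```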

I think the biggest challenge will be the aggregation of regions. In particular, for this we will need:

  • a list of available regions per country (region_name, region_id)
  • a mapping of GeoIP locations to regions. (geoip_loc_id -> region_id)
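
Given those two pieces, the aggregation itself is simple. A minimal Python sketch, assuming visit counts are already available per GeoIP location id (the function and variable names are hypothetical):

```python
from collections import Counter


def aggregate_by_region(visits_by_location, location_to_region):
    """Sum visit counts per region_id.

    visits_by_location: {geoip_loc_id: number_of_visits}
    location_to_region: {geoip_loc_id: region_id}
    Locations without a mapping are collected under the key None.
    """
    totals = Counter()
    for loc_id, visits in visits_by_location.items():
        totals[location_to_region.get(loc_id)] += visits
    return dict(totals)


# Made-up ids and counts.
visits = {1: 10, 2: 5, 3: 2}
mapping = {1: "R1", 2: "R1"}
print(aggregate_by_region(visits, mapping))
```

Keeping the unmapped visits under a separate key makes it easy to report how many visitors could not be assigned to a region.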

Getting this shouldn't be a problem, but we need to check a few things:

  • do we have consistent geoip location ids across different versions of the maxmind geoip database?
  • I assume that the resolved region will be stored in a new column in the log_visits table, maybe location_region. So how do we handle future changes in the region database? I think it's pretty certain that the regions will change over time, which means that at some point I will provide updated region maps and an updated location-region mapping. After updating the region maps, we might get problems when a user wants to browse through older reports.

  Changed 2 weeks ago by vipsoft

I plan to store the ISO 3166-2 region code in the database to better support other geolocation data/service providers.

I don't know how frequently the GeoIP regions change, but in #1823, I have to handle cases where the GeoIP API returns:

  • only the ISO 3166-1 alpha-2 country code
  • the ISO 3166-1 alpha-2 country code plus "00" for the region code
  • an ISO 3166-2 region code instead of a FIPS 10-4 region code

How do you represent the case where the region is unknown? Can that be used to include regions that no longer exist?

  Changed 2 weeks ago by greg

To begin with, the regions stored in the Maxmind database are not very useful at all. They differ largely between countries in their actual level of granularity (sometimes they use admin-level 1 regions, sometimes level 2 or even level 3). For many countries, at least the free version doesn't contain region ids at all (I'm not sure whether the commercial edition differs here).

So what I plan to do is to look up the region ids myself. Since I do have accurate geographical descriptions of the regions the map is actually showing, it is no problem to compute the region for any given lat/lon position. But since this computation is quite intense in terms of CPU and memory usage, we need to cache the results in the database.
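
A rough Python sketch of that lookup, assuming each region is a single simple polygon given as a list of (lon, lat) vertices. It uses the standard ray-casting test, and a plain dictionary stands in for the database cache. The class and function names are hypothetical, and real region shapes (multi-polygons, holes) would need more care.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count how often a ray going east from the point crosses an edge."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # The edge (j -> i) counts only if it straddles the point's latitude
        # and the crossing lies to the east of the point.
        if (yi > lat) != (yj > lat) and lon < (xj - xi) * (lat - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside


class RegionLookup:
    def __init__(self, regions):
        self.regions = regions  # {region_id: [(lon, lat), ...]}
        self.cache = {}         # stand-in for the database cache

    def lookup(self, lon, lat):
        """Return the region_id containing the point, or None."""
        key = (lon, lat)
        if key not in self.cache:
            self.cache[key] = next(
                (rid for rid, poly in self.regions.items()
                 if point_in_polygon(lon, lat, poly)),
                None,
            )
        return self.cache[key]


# Two made-up square regions.
lookup = RegionLookup({
    "west": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "east": [(1, 0), (2, 0), (2, 1), (1, 1)],
})
print(lookup.lookup(0.5, 0.5), lookup.lookup(1.5, 0.5), lookup.lookup(5, 5))
```

Since the GeoIP database only contains a limited number of distinct locations, each position has to be computed once and can be served from the cache afterwards.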

Obviously, for IPs where the GeoIP API returns just ISO2 codes, the lat/lons are useless, since the API would just return the geographical center of that country. For those cases the map will display a side note like "X visitors from this country could not be geolocated".

Internally, the GeoIP database stores a unique id for every location in the db. There's a comparatively small number of distinct locations stored, something around 350k. Does the GeoIP API which you're using return the internal location_id or just the metadata like ISO codes and country names?

  Changed 13 days ago by vipsoft

The GeoIP APIs don't return the internal location ID. Even if it did, I wouldn't use it because (1) it creates vendor lock-in and (2) there's no assurance of backward compatibility.

  Changed 13 days ago by matt

However, the problem is that the GeoIP db doesn't know about Tibet and South Sudan yet and thus never would map visitors to those 'countries'.

Please include them as countries and maybe GeoIP will later match them. Tibet would at least be matched when the browser is set to the Tibetan language (bo), so it will definitely be used :)

  Changed 13 days ago by matt

Greg and Vipsoft, I tried to understand the implementation plan for getRegion but I couldn't work it out from the discussion.

I thought that, to draw the "region" maps, the SVG map would get the list of "cities" and then process the regions from the cities. But, this "point in polygon matching" algorithm would be in the SVG map in Javascript.

  • It would be slow if we have hundreds of cities (does that ever happen? probably not?)
  • If we wanted to cache the result in the DB, how would we do it since it would be processed in Javascript?
    • Or would this algorithm be in PHP in the Map plugin API?

It would be nice to clarify the implementation plan for drawing Regions maps to make sure that we agree & have the same data / API expectations.

Thanks!!

  Changed 11 days ago by matt

(In [5802]) the commit of the "free tibet" activist: detecting Tibetan language in browser and assigning to country "Tibet, Occupied". This might upset some Chinese users, but they are welcome to fork Piwik ;-) Refs #1652

  Changed 10 days ago by matt

When this is released, we can also delete this FAQ: "World map is not showing in Piwik".
