Opened 4 years ago

Closed 13 months ago

#1652 closed New feature (fixed)

Open Source SVG Map to show cities and regions

Reported by: nicobach1 Owned by: greg
Priority: critical Milestone: 1.x - Piwik 1.x
Component: Core Keywords:
Cc: Sensitive: no

Description (last modified by matt)

UPDATE! Fund this project now to help us finish the work and release the feature!

It takes 2 minutes: http://crowdfunding.piwik.org/analytics-maps-world-country-city-region/

Right now the world map only shows which country the visits are coming from. It would be really useful if we could narrow it down to the state the visitor came from, or even the city, as in Google Analytics.

It'll be a great feature for marketing so we know where most of the visitors are!

We would like to also move the technology from Flash to full open source stack, and have the map displayed in SVG or Vector.

Once the Maps show cities and region, we definitely have to show the maps in the actual Country report: full awesomeness! see #1821

Once implemented we should remove SWFobject lib: #3666

Attachments (2)

cities2.png (14.7 KB) - added by greg 3 years ago.
A mockup of the new city view
regions1.png (18.6 KB) - added by greg 3 years ago.
A mockup of the new region view


Change History (101)

comment:1 Changed 4 years ago by vipsoft (robocoder)

  • Milestone changed from 4 - Piwik 1.0 to Features requests - after Piwik 1.0
  • Priority changed from major to normal

This was mentioned in #1514.

comment:2 Changed 3 years ago by matt (mattab)

  • Owner set to greg
  • Summary changed from World Map to Flash Map to show cities and regions

comment:3 Changed 3 years ago by matt (mattab)

  • Milestone changed from Feature requests to 1.x - Piwik 1.x

This should be implemented in the next few weeks.

comment:4 Changed 3 years ago by greg (gka)

So, I will start the map development in the next days. There are a couple of things to do:

  • porting the map preparation script to Python, reading shapefiles, projection, simplification, AMF export
  • extracting all countries sub-regions from shapefiles found at http://gadm.org (seem to be complete for all countries)
  • updating the frontend SWF

However, there are some open problems.

  • How do we map the geo-locations to the country regions? I think this should be done once every time the GeoIP database gets an update. There are two possible ways to do this:
    • by checking point-in-polygon for each lat/lng with every region polygon and storing the result (lat/lng > country_region). This is almost accurate, but some cities may lie directly on a region border and thus may be wrongly mapped.
    • another way could be to use the data we get from the GeoIP database. At least for some countries there should be some information about the region, e.g. the state id in the USA. In any case we need to map the country region ids (which are defined in the shapefiles) to the country region ids in the GeoIP database. However, I'm not sure if this information is available for every country.
  • In both ways, this would require some install script to run every time the user updates his GeoIP database. Do you think this is possible?
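A minimal sketch of the point-in-polygon option, assuming each region is available as a list of simple (lng, lat) rings. The names `point_in_polygon` and `region_for_location` and the data layout are illustrative, not part of the actual map preparation script. Points exactly on a border may land on either side, which is the inaccuracy mentioned above.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (lng, lat)
Polygon = List[Point]         # ring of vertices; the last one need not repeat the first


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray-casting test: count crossings of a ray running east from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def region_for_location(point: Point, regions: Dict[str, List[Polygon]]) -> Optional[str]:
    """Return the id of the first region containing the point, or None if unlocatable."""
    for region_id, polygons in regions.items():
        if any(point_in_polygon(point, poly) for poly in polygons):
            return region_id
    return None
```

The result of `region_for_location` for every GeoIP location could be stored once per database update, so no geometry test is needed at report time.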

Changed 3 years ago by greg (gka)

A mockup of the new city view

Changed 3 years ago by greg (gka)

A mockup of the new region view

comment:5 Changed 3 years ago by greg (gka)

Note: in this image you can see an overview of all available region outlines. I think this is pretty much complete.

http://gadm.org/img/gadm_v1_level1_high.png

comment:6 follow-up: Changed 3 years ago by matt (mattab)

Greg, great news that you will resume work on this :)

GADM data looks good and is there for all countries. This sounds fantastic, nice find!

In both ways, this would require some install script to run every time the user updates his GeoIP database. Do you think this is possible?

It is possible to run a script on every GeoIP update, but not a good solution for ease of use. Here is a proposal: Maybe, we could run the script once before Piwik release (and commit the file to SVN after checking it's OK). We would use the latest GeoIP database for this. Then, if users use an older version (or newer version) compared to the pre-generated DB, maybe they could run the script manually themselves?

But, I would like to ask about this script, what exactly will it map?

My current understanding is that Piwik will report:

Is the algorithm designed to map "regions" according to GeoIP, to "regions" in the Piwik map?

I guess that, for each GeoIP region, we could give one lat/long of a city belonging to this region. If so, do we need a database for this, maybe the SWF could map in real time the lat/long to the pixel inside the region?

Thanks for clarifying

comment:7 in reply to: ↑ 6 Changed 3 years ago by greg (gka)

Replying to matt:

It is possible to run a script on every GeoIP update, but not a good solution for ease of use. Here is a proposal: Maybe, we could run the script once before Piwik release (and commit the file to SVN after checking it's OK). We would use the latest GeoIP database for this. Then, if users use an older version (or newer version) compared to the pre-generated DB, maybe they could run the script manually themselves?

My current understanding is that Piwik will report:

Is the algorithm designed to map "regions" according to GeoIP, to "regions" in the Piwik map?

I didn't know that the GeoIP DB contains a complete list of FIPS regions. So Piwik already knows how many visitors each region had? In this case, there's no need for another update script. Now, all I have to ensure is that the map region ids are the same as the GeoIP/FIPS region ids.

Just to make sure I understand everything, here's a sample request flow:

  • At first, the client loads the Piwik Map SWF
  • the Piwik Map SWF will then request the visitors per country via Piwik API and display the world map
  • now the user can click on a continent to zoom in
  • now the user can click on a country
  • the map SWF will now load the detailed map data for this country (and its regions)
  • at the same time, the map SWF will request the visitors per region (of this single country)
  • once the SWF received all data, it will display the regions map as shown in regions1.png
  • now the user can switch to the city mode
  • the map SWF will now request the visitors per city for this country. The API will serve a list of cities including their lat/lng coords with the number of visitors for each city.

Did I get it right?

Thanks

comment:8 Changed 3 years ago by matt (mattab)

Yes it sounds good!

One other feedback, would be to give labels to the icons to switch to city/region view, and to the button to zoom in/zoom out, since these buttons are very important and must be easy to reach.

I think that we must check that GeoIP data will output the regions as we expect them, ie. that all visitors are indeed assigned in one of the regions listed in: http://www.maxmind.com/app/fips10_4

I will double check this and confirm here.

Is there any other open question apart from this one?

comment:9 Changed 3 years ago by matt (mattab)

  • Summary changed from Flash Map to show cities and regions to Open Source Flash Map to show cities and regions

Greg, will the Flash maps know about all cities in the GeoIP Database?

Or, will the flash map expect a list of cities & lat/long, and plot them "blindly" in the flash map (projecting from lat/long to pixels, and drawing City name based on Piwik API as input) ?

comment:11 Changed 3 years ago by greg (gka)

The flash map doesn't know anything about the cities. Instead the map is able to project lat/lng coordinates to the exact pixels. I'm not sure if it will display all cities "blindly", maybe there will be some simple clustering of cities that are very close.

comment:12 Changed 3 years ago by matt (mattab)

One thing I am thinking of is that sometimes GeoIP will not return region/city info (or, at other times, will return the region but not the city). So, maybe we can plan to display the "Unknown" on the map somewhere, discreetly?

Because, I guess the % displayed, will take into account the % for the Unknown region/city?

comment:13 Changed 3 years ago by greg (gka)

You mean that for some visitors GeoIP only knows the country but not the city and/or region? We can add a "Plus X unlocatable visitors" text somewhere to display this data. Good point.

comment:14 Changed 3 years ago by matt (mattab)

Yes, we'd better expect the worst with the GeoIP free edition; all use cases can happen. I think Piwik will return, for a country's regions, the 'Unknown' (or 'Other') row that will contain these. A simple text "X visits couldn't be located" sounds good!

comment:15 Changed 3 years ago by vipsoft (robocoder)

Sorry, I'm late joining this discussion.

I'm going to add a FIPS data file as part of my work in #1823 to convert the region names into the more compact FIPS 10-4 code when storing it in the log_visit table.

  • Do the region names in MaxMind's FIPS file correspond 1-to-1 with the region names used by gadm.org, or does the data need to be massaged?
  • Would it help if I provide a reverse lookup, i.e., FIPS code to region name?

re: the MaxMind FIPS file

  • names are (English) localized without any special characters
  • there are almost 4000 entries which means we shouldn't expect translations
  • the codes are used inconsistently, i.e., the data switches to ISO for US states and Canadian provinces

comment:16 Changed 3 years ago by matt (mattab)

Partial feedback:

names are (English) localized without any special characters

Region names are in the English charset, but they are not translated into English. For example, French regions are written in French. Maybe this is not true for all countries.

Would it help if I provide a reverse lookup, i.e., FIPS code to region name?

I think, that the API output for getRegions (eg.) should contain the region code, and the full region name, like we do for Country API output (which includes the country code and full name, icon path, etc.)

comment:17 Changed 3 years ago by greg (gka)

Just wanted to let you know that I've got plenty of stuff to do right now. Hope to be able to continue working on this feature soon. Sorry for the delay..

comment:18 Changed 3 years ago by matt (mattab)

I have just seen an interesting project: jVectorMap for canvas+JS Map library.

Greg, what do you think about this work?
hope your work load is getting better :)

comment:19 Changed 3 years ago by greg (gka)

Hi Matt,

jvectormap looks quite nice. I'm currently thinking a lot about JS/SVG based mapping myself, even developed some early prototypes while working on other projects. Still, our biggest challenge is how to build the map data files for every country (including admin level2 regions).

Thus, one of my next steps is to set up a Mapnik server and let it export clean SVG projections of shapefiles. My goal is to do that by writing as little code as possible. Mapnik is such a powerful library, so it should be almost only a matter of configuration.

After we have created the map files, we still have the choice to either use one of the currently emerging JS/SVG mapping libraries (like jVectorMap) or to develop a mini library especially for Piwik ourselves.

Besides that, I fully agree on JS/SVG instead of Flash. Quite a challenge, but possible.

and, yes, my work load starts getting better. :)

comment:20 Changed 3 years ago by matt (mattab)

Greg, I'm happy to hear you are keen on SVG for your own work. As you may have seen in the blog recently, we are now using canvas graphs only, and Timo from the team has contributed many patches to jqplot to make it work nicely in our use case. Hopefully, an existing library can meet our needs in terms of performance, maintainability, features, licensing.

In any case this work will be reused, not only in Piwik but I'm sure in hundreds of other projects in the future, since we are building the first truly open source world mapping with region details for all countries.

With shapefiles, I hope you can find a format that is of low size and with good shape quality, that must be quite a challenge.

comment:21 Changed 3 years ago by matt (mattab)

  • Description modified (diff)
  • Summary changed from Open Source Flash Map to show cities and regions to Open Source SVG Map to show cities and regions

comment:22 Changed 3 years ago by matt (mattab)

  • Description modified (diff)

Once the Maps show cities and region, we definitely have to show the maps in the actual Country report: full awesomeness! see #1821

comment:24 Changed 3 years ago by greg (gka)

I now proceed with this task. As a first step, I looked at the shapefiles provided by gadm.org. Since some people seem to be very interested in my work (I got plenty of mails after finishing the first version of the map), I decided to blog about my progress. I will post the links in here.

Part 1: Finding a map data source:
http://vis4.net/blog/en/posts/recreating-the-piwik-map

comment:25 Changed 3 years ago by matt (mattab)

Greg; excellent first step, looking forward to the next part!

comment:26 Changed 3 years ago by matt (mattab)

Regarding the simplification of shape files, it is obvious that we don't need any detail around Chile, a rough outline of the coast would do perfectly (we can't afford all the little islands ;).

Also is it possible to add a test not to plot islands less than 10 square km, or something similar?

It is really key for user experience to have the smallest file size possible to ensure fast downloading and JS parsing / CPU usage. :)

comment:27 Changed 3 years ago by greg (gka)

I just looked into a world shapefile and computed the areas of all 3761 polygons (a country may have multiple polygons for islands etc). Filtering every polygon smaller than 10 square km would remove 434 polygons (=11%). I checked the names of the countries which the removed polygons belong to and found many island states among them (as expected).

However, I think a hard cut at 10 sq km has some problems:

  • it removes some countries entirely (like the Maldives Islands)
  • it still keeps very many small islands (starting from 11sqkm)

See sample rendering.

I tried a different rule: remove all polygons smaller than 5% of the maximum polygon area of that country. Since the maximum area of the Maldives is 9sqkm, all islands are kept, while all small islands of the USA are removed. This also halved the resulting SVG file size while still including every country on the map.

See sample rendering.

However, removing every polygon smaller than 5% of the maximum per country is not satisfactory either. Big and well-known islands like Hawaii or Novaya Zemlya shouldn't be removed in my opinion.
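The 5% rule can be sketched roughly as follows. This is only an illustration: it assumes the polygons are already projected to planar coordinates in which areas are comparable, and `polygon_area` and `filter_small_polygons` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def polygon_area(polygon: Polygon) -> float:
    """Absolute planar area of a simple ring via the shoelace formula."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def filter_small_polygons(
    countries: Dict[str, List[Polygon]], min_ratio: float = 0.05
) -> Dict[str, List[Polygon]]:
    """Per country, keep polygons with at least min_ratio of its largest polygon's area."""
    kept: Dict[str, List[Polygon]] = {}
    for code, polygons in countries.items():
        if not polygons:
            kept[code] = []
            continue
        areas = [polygon_area(p) for p in polygons]
        threshold = min_ratio * max(areas)
        kept[code] = [p for p, a in zip(polygons, areas) if a >= threshold]
    return kept
```

Because the threshold is relative to each country's largest polygon, the largest polygon always survives, so no country disappears entirely. Well-known islands that fall below the threshold would still need an explicit exception list.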

Also there is a problem with those tiny islands countries like the Maldives, which is obvious if you look at the zoomed view on the Maldives. The islands are way too small to be rendered in a meaningful way.

This leads to an important question: Does the map need to include all countries or is it acceptable to ignore some countries, like the Maldives Islands?

In the old map widget this wasn't important because there was no country level view.

comment:28 Changed 3 years ago by greg (gka)

I think I would like to keep islands and outlying regions in the country-level views. For instance, I could try to generate composed maps that mix different projections to fit the complete country into the map.

Like in this example for the United States.

comment:29 Changed 3 years ago by greg (gka)

One note regarding the projection used for the country-level views. I would like to simplify the whole thing by using the same projection for every country, but with different parameters (namely the projection center).

One of the simpler projections is the orthographic projection, which looks the same as if looking onto a 3D globe. This gives quite good and less-distorted views for almost every country. The only country that looks quite distorted is Russia. Because of its huge area, Russia takes too much space on the globe.

You can have a look at distorted Russia here.

My opinion is that this distortion is acceptable given the simplicity of rendering maps in a single projection. In an ideal mapping world, one would use specialized projection for every country, like the New Zealand Map Grid for New Zealand etc.
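For reference, a small sketch of the orthographic projection with a per-country projection center, using the standard textbook formulas. The function name and the choice to return `None` for points on the far side of the globe are assumptions made for this example.

```python
import math
from typing import Optional, Tuple


def orthographic(
    lat: float, lng: float, center_lat: float, center_lng: float, radius: float = 1.0
) -> Optional[Tuple[float, float]]:
    """Project (lat, lng) in degrees onto a plane tangent at the projection center.

    Returns None when the point is on the hemisphere facing away from the viewer.
    """
    phi, lam = math.radians(lat), math.radians(lng)
    phi0, lam0 = math.radians(center_lat), math.radians(center_lng)

    # Cosine of the angular distance between the point and the projection center.
    cos_c = math.sin(phi0) * math.sin(phi) + math.cos(phi0) * math.cos(phi) * math.cos(lam - lam0)
    if cos_c < 0:
        return None

    x = radius * math.cos(phi) * math.sin(lam - lam0)
    y = radius * (
        math.cos(phi0) * math.sin(phi) - math.sin(phi0) * math.cos(phi) * math.cos(lam - lam0)
    )
    return x, y
```

The same function serves every country; only `center_lat` and `center_lng` change. Distortion grows with angular distance from the center, which is why a country as wide as Russia looks squeezed at its edges.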

Any more opinions?

comment:30 Changed 3 years ago by greg (gka)

The next design decision has to do with the aspect ratio of the maps. At some point in the map generation process I need to crop the country-level maps to a bounding box. The idea is to fit the countries as big as possible into the views while also showing a bit of their neighbour countries for navigation.

Now the question arises to which aspect ratio the maps should be cropped. As far as I can see, we have two options:

  1. A fixed aspect ratio (presumably some wide-screenish format) would be the simplest solution. The complete cropping could be done in the preprocessing stage, which would reduce complexity of map rendering. However, the obvious drawback of this solution is that the maps won't look as good as they could when displaying countries in the "opposite" aspect ratio. For some portrait countries (like Germany), this should be acceptable. For others with more extreme aspect ratios (my favourite example is Chile), it would not be.
  2. In the latter cases, a dynamic aspect ratio would make much more sense. It would be perfect if we could present the Chilean Piwik users (I assume there are some) with a portrait view of their country. Dynamic aspect ratios could be done either by choosing a fixed ratio per country or by cropping the maps at rendering time. Choosing a fixed ratio per country has the drawback that those countries could not fit a landscape ratio in fullscreen mode. In contrast, cropping at runtime may take more CPU. Also, I don't know if the current dashboard supports dynamic resizing of widgets, but this may be easy to implement.
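To illustrate the fixed-ratio option, here is one possible way to grow a country's bounding box to a target aspect ratio during preprocessing. The function name and the (min_x, min_y, max_x, max_y) layout are assumptions for this sketch, not the actual map generator.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y) in projected units


def fit_to_aspect(bbox: BBox, aspect: float) -> BBox:
    """Grow bbox around its center until width / height equals aspect."""
    min_x, min_y, max_x, max_y = bbox
    width, height = max_x - min_x, max_y - min_y
    if width <= 0 or height <= 0 or aspect <= 0:
        raise ValueError("bbox dimensions and aspect must be positive")

    center_x, center_y = (min_x + max_x) / 2.0, (min_y + max_y) / 2.0
    if width / height < aspect:
        # Too tall (portrait country): pad left and right.
        width = height * aspect
    else:
        # Too wide: pad top and bottom.
        height = width / aspect
    return (
        center_x - width / 2.0,
        center_y - height / 2.0,
        center_x + width / 2.0,
        center_y + height / 2.0,
    )
```

For a tall country such as Chile the padding is added sideways, which is where neighbouring countries, sea, legends or metrics would end up.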

comment:31 Changed 3 years ago by matt (mattab)

I tried a different rule: remove all polygons smaller than 5% of the maximum polygon area of that country.

Sounds good

This leads to an important question: Does the map need to include all countries or is it acceptable to ignore some countries, like the Maldives Islands?

Removing countries altogether is, I think, not a good idea. Maybe we could still display ALL countries, but keep the 5% rule for all other countries (eg. the Maldives would be displayed with at least the main islands (if possible..), while Canada would still lose many big islands).

I think I would like to keep islands and outlying regions in the country-level views. For instance, I could try to generate composed maps that mix different projections to fit the complete country into the map.
Like in this example for the United States.

If you can do that (at least for some selected countries?), it would be pretty cool!

My opinion is that this distortion is acceptable given the simplicity of rendering maps in a single projection. In an ideal mapping world, one would use specialized projection for every country, like the New Zealand Map Grid for New Zealand etc.

Simple is always better, even if the shape is not perfect. We could improve this later anyway. What would it look like for NZ? ;-)

The idea is to fit the countries as big as possible into the views while also showing a bit of their neighbour countries for navigation.

Also, maybe it would be useful if hovering over the countries next to the zoomed country displayed a little tooltip with the country metrics. Maybe it could be displayed next to or below the main country metrics (which would always be displayed?).

fixed aspect ratio (presumably some wide-screenish format) would be the simplest solution.

Agreed. I think it is expected that all country zooms have the same dimensions. The dashboard doesn't support dynamic size widgets at present (not planned).
Chile looks good in the map, even with a lot of sea. We can use the space for legends, metrics, etc.

The dynamic aspect ratio could be implemented later in the lib for static country-specific maps...

Thanks for posting your thoughts and log here, cheers!

comment:32 follow-up: Changed 3 years ago by greg (gka)

For New Zealand the two projections don't differ that much. NZ is rather small (at least compared to the Earth), so the orthographic projection is quite a good approximation.

comment:33 in reply to: ↑ 32 Changed 3 years ago by matt (mattab)

Replying to greg:

For New Zealand the two projections don't differ that much. NZ is rather small (at least compared to the Earth), so the orthographic projection is quite a good approximation.

Indeed the orthographic projection seems really good!

comment:34 Changed 2 years ago by greg (gka)

Update: I managed to generate SVG maps like this for every country:

http://vis4.net/tmp/FR.png

You can check them out here. (ZIP, 4.8MB, contains small PNGs for each map)

This is how it works: For each country the algorithm computes a 'nice' bounding box which includes only the most important polygons. For instance, the US bounding box does not include Alaska and Hawaii and the bounding box of Spain doesn't include all those Spanish islands.

However, for some countries like Japan, the current algorithm doesn't work. We could fix this by manually adjusting the parameters for those "problem" countries.

The next step would be to replace the polygons of the active countries with their sub-region polygons.

Also I'm working on a nice compression algorithm for the vector data. SVG has way too much overhead because of all that XML syntax. The smallest file size can be achieved using a CSV-like format. We could reduce the size even more by kind-of-Base64 encoding the coordinates. For instance, the number "12345" can be stored as something like "zxB", which saves two bytes for each number.
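A rough sketch of the kind-of-Base64 idea, assuming non-negative integer coordinates and a URL-safe 64-character alphabet. The alphabet and the function names are illustrative choices, so the actual encoded strings will differ from the "zxB" example.

```python
import string

# 26 + 26 + 10 + 2 = 64 symbols; the exact alphabet is an arbitrary choice for this sketch.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"


def encode_int(value: int) -> str:
    """Write a non-negative integer in base 64, most significant digit first."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    if value == 0:
        return ALPHABET[0]
    digits = []
    while value:
        value, remainder = divmod(value, 64)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_int(text: str) -> int:
    """Inverse of encode_int."""
    value = 0
    for char in text:
        value = value * 64 + ALPHABET.index(char)
    return value
```

Three base-64 digits cover every value up to 64^3 - 1 = 262143, so any five-digit decimal number fits in three characters, which is where the two saved bytes per number come from.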

comment:35 Changed 2 years ago by greg (gka)

Here's the mentioned map of Japan

http://vis4.net/tmp/JP.png

comment:36 follow-up: Changed 2 years ago by matt (mattab)

Thanks for the update, you are making nice progress! :)

AU + JP + AX look OK, but it looks like it is only a zoom problem. Would zooming out slightly fix the display for these countries?

Canada + Greece + Indonesia + Norway + Philippines look very detailed (all little islands), not sure we need so much detail.

Also I'm working on a nice compression algorithm for the vector data.

Great to hear, I can't stress enough how important it is to have small file sizes and fast SVG rendering. The Map will be displayed in all dashboards by default so should load uber fast so it doesn't slow down the piwik dashboard experience.

The next step would be to replace the polygons of the active countries with their sub-region polygons.

Does it mean, that you will draw the country regions inside the existing country shape, for these countries for which we have region mapping information?

comment:37 in reply to: ↑ 36 Changed 2 years ago by greg (gka)

Great to hear, I can't stress enough how important it is to have small file sizes and fast SVG rendering. The Map will be displayed in all dashboards by default so should load uber fast so it doesn't slow down the piwik dashboard experience.

I see no problems for the Piwik dashboard because the map will only load those maps it really needs. Each country map will be stored in a different file and most users will only need two or three of them. The most important thing is the file size of the world map, which should be way smaller than current map plugin size (250kB).

However, I think that the total size of all country maps (which, again, aren't loaded before the user clicks on a country) needs some consideration. There are 246 countries in the map; even if I manage to reduce the avg map size to 10kB it would take more than 2MB to store all of them.

Does it mean, that you will draw the country regions inside the existing country shape, for these countries for which we have region mapping information?

The resulting maps will look like this:
http://vis4.net/tmp/DE_lev1.png

http://vis4.net/tmp/FR_lev1.png

Just to clear this up: the region maps are available for *every* country. The mapping between GeoIP cities and country regions must be computed in most cases.

comment:38 Changed 2 years ago by matt (mattab)

However, I think that the total size of all country maps (which, again, aren't loaded before the user clicks on a country) needs some consideration. There are 246 countries in the map; even if I manage to reduce the avg map size to 10kB it would take more than 2MB to store all of them.

One idea: All countries XML could be stored in a single ZIP file. Then, the requests to get the XML for a given country, could go through a Piwik php controller, this controller would on demand unpack the zip and only return the XML (or CSV) that is being requested. So that, the size overhead of all maps is just the size of the ZIP containing them all. The PHP code would be very simple and simply unpack the ZIP (code already exists cf. Piwik_Unzip) and return the requested mapping info.

If you have 200 * 10kb = 2Mb, zipped should be around 200-400kb maybe which should be uber fast to unzip, and also be low overhead. Thoughts?
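The on-demand unpacking idea, sketched in Python for illustration only (the proposal above is for a PHP controller using the existing Piwik unzip code). The `<CC>.svg` member naming and the function name are assumptions.

```python
import zipfile
from typing import BinaryIO, Union


def read_country_map(archive: Union[str, BinaryIO], country_code: str) -> bytes:
    """Return the map file for one country without unpacking the rest of the archive."""
    # Accept only two-letter codes so the request cannot name arbitrary archive members.
    if len(country_code) != 2 or not country_code.isalpha():
        raise ValueError("expected a two-letter country code")
    member = country_code.upper() + ".svg"
    with zipfile.ZipFile(archive) as zf:
        # zf.read() decompresses just this member; it raises KeyError if it is missing.
        return zf.read(member)
```

Only the requested member is decompressed per request, so the disk footprint stays at the size of the archive while each response contains a single country map.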

comment:39 follow-up: Changed 2 years ago by greg (gka)

If you have 200 * 10kb = 2Mb, zipped should be around 200-400kb maybe which should be uber fast to unzip, and also be low overhead. Thoughts?

Zipping the kind-of-base64 encoded CSV files leads to compression rates of approximately 50% – not 10-20%.

Is disk storage really such a big issue for Piwik? Since the MySQL tables are getting large anyway (500kB per month in my own Piwik installation), I think that 2MB is still acceptable. However, serving the maps gzipped to the browser makes sense, since web traffic *is* a big issue for dashboards, especially when also supposed to run on mobile devices.

comment:40 follow-up: Changed 2 years ago by greg (gka)

By the way, to get an overview about the GeoIP location database I quickly mapped all stored locations. Maybe we could use this information to kind of focus on those maps on countries where it makes sense at all.

http://vis4.net/tmp/geoip-locations.png

Some African countries have fewer GeoIP locations than sub-country regions. We could consider limiting the region-level reporting to those countries whose GeoIP location density is greater than the density of regions.
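Since both densities refer to the same country area, the comparison reduces to comparing counts per country. A tiny sketch, assuming the counts are available as plain dictionaries keyed by country code (the function name and data layout are hypothetical):

```python
from typing import Dict, List


def countries_with_region_reports(
    location_counts: Dict[str, int], region_counts: Dict[str, int]
) -> List[str]:
    """Country codes that have more GeoIP locations than sub-country regions."""
    return sorted(
        code
        for code, n_regions in region_counts.items()
        if location_counts.get(code, 0) > n_regions
    )
```

Countries missing from `location_counts` are treated as having no locations and are therefore excluded from region-level reporting.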

Close-up view on France:

http://vis4.net/tmp/FR-closeup.png

comment:41 in reply to: ↑ 39 Changed 2 years ago by matt (mattab)

Is disk storage really such a big issue for Piwik? Since the MySQL tables are getting large anyway (500kB per month in my own Piwik installation), I think that 2MB is still acceptable. However, serving the maps gzipped to the browser makes sense, since web traffic *is* a big issue for dashboards, especially when also supposed to run on mobile devices.

Disk storage is not a critical issue, but we try to keep the ZIP as small as possible, since once unzipped it is already 15.5 MB (and 5.5M zip).

We could easily serve the region maps in GZIP using the existing function Piwik::serveStaticFile which returns gzip if supported by server.

comment:42 in reply to: ↑ 40 Changed 2 years ago by matt (mattab)

Some African countries have fewer GeoIP locations than sub-country regions. We could consider limiting the region-level reporting to those countries whose GeoIP location density is greater than the density of regions.

Interesting map! Did you use the free GeoIP db for it? It would be interesting to plot the same using the commercial geoip DB (which some Piwik users will use) and see if there is any difference. I've sent you an email.

In any case I think it would be nice to have sub region mapping for all countries, but maybe the regions boundaries could be less sharp/accurate for the countries for which GeoIP density is poor? (rather than not plotting regions at all for these countries).
Thoughts?

comment:43 Changed 2 years ago by greg (gka)

The current GeoLiteCity db contains more locations (318814) than the commercial GeoCity db you sent along (168685). The reason could be that the version you have is dated to 2005.

However, the distribution pattern looks almost the same.

http://vis4.net/tmp/geolitecity-locations.png

http://vis4.net/tmp/geocity-locations.png

comment:44 Changed 2 years ago by greg (gka)

I just checked all the region codes inside the GeoIP database and found out that they are not easily mappable to the region codes in the GADM shapefile. For some countries, like the US, the GeoIP db stores two-letter codes, for many other countries the regions are identified by two-digit numbers. The GADM shapefile uses two-letter ISO codes to identify the inner-country regions.

What do you think how we should integrate the GeoIP database with the map plugin?

One option would be to force the user to run some kind of initialization script which calculates the region for each GeoIP location – which bears the risk of false identification of cities that lie on the border between two regions.

Another option would be to try to map the inconsistent GeoIP region ids to the right ISO region ids. We could do this either by matching the region names (which are stored in both databases) or by trying to find mapping tables for all used region identifiers. This bears the risk of false-identification of regions, for instance if there are errors in one of the databases or the automated identification fails. Also, here we would need to run an initialization script that brings both databases together.

Does anyone know whether the GeoIP plugin uses the PHP/MySQL approach or is built on top of a GeoIP API which works with the binary database? I don't know if it is possible to access the raw list of locations through the binary API.

comment:45 Changed 2 years ago by greg (gka)

Just saw that the GeoIP database uses the admin level 1 codes from geonames.org:

http://download.geonames.org/export/dump/admin1CodesASCII.txt

comment:46 Changed 2 years ago by vipsoft (robocoder)

Maybe we can work out the mapping as part of #2379.

comment:47 Changed 2 years ago by greg (gka)

By the way, I just saw that the next version of GA supports region level mapping.

http://vis4.net/tmp/ga-regions.png

comment:48 Changed 2 years ago by greg (gka)

Ok, I just got the first results.

246 country maps, including regions for the selected country.

SVG

total size: 3.6MB

zipped: 1.0MB

CSV

total size: 2.6MB

zipped: 857kB

This example shows the quality of the maps:

http://vis4.net/tmp/ye.png

Given that the SVG files are not much bigger than the CSV files, especially when zipped, I would recommend using the SVG files, which are a lot easier to maintain. However, the server should be configured to serve the SVG files gzipped.

I would not recommend lowering the quality and hope 1MB is still acceptable. Comments?

comment:49 Changed 2 years ago by greg (gka)

Here's the zip file which contains all SVG maps:

http://vis4.net/tmp/all-svg.zip

comment:51 Changed 2 years ago by matt (mattab)

The region maps look beautiful!

1M zipped sounds reasonable, but no more please ;) It is an increase of 20% of the ZIP size. Worth it for beautiful, open technology maps of the whole world for sure!

Btw, I noticed that South Sudan is not in the list of countries (SS). Is it possible to add it (it's been independent since July 2011).

Also did you see if GeoIP returns "Tibet" as a country? It would be nice to have a map for it since it should also be an independent country!

what are your thoughts regarding the mapping of geoip regions to the Geonames IDs, probably the Piwik API dataset should contain the right geonames mapping directly, so that means that the mapping should be done in the Geoip API as part of #1823. Any thoughts?

comment:52 Changed 2 years ago by greg (gka)

Btw, I noticed that South Sudan is not in the list of countries (SS). Is it possible to add it (it's been independent since July 2011).
Also did you see if GeoIP returns "Tibet" as a country? It would be nice to have a map for it since it should also be an independent country!

Both regions are easy to add to the maps, since their boundaries are already available as regions of Sudan and China.

However, the problem is that the GeoIP db doesn't know about Tibet and South Sudan yet and thus never would map visitors to those 'countries'.

what are your thoughts regarding the mapping of geoip regions to the Geonames IDs, probably the Piwik API dataset should contain the right geonames mapping directly, so that means that the mapping should be done in the Geoip API as part of #1823. Any thoughts?

I will try to map the regions to the Geonames IDs during the map creation. I think the best solution is to check the regions by name similarity and to double-check ambiguous cases with a few point-in-polygon tests, using cities we'd expect to lie inside the region.
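The name-similarity step could look roughly like the following Python sketch. It uses `difflib.SequenceMatcher` from the standard library as the similarity measure; the thresholds, the region ids (`X1` and so on) and the function names are all made up for illustration and are not taken from the actual map generator.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_regions(map_names, geoip_regions, accept=0.85, margin=0.1):
    """Match map region names against {region_id: name} from the region db.

    A name is accepted only if its best candidate scores at least `accept`
    and beats the runner-up by at least `margin`; everything else is
    returned as ambiguous so it can be checked by point-in-polygon tests.
    """
    matched, ambiguous = {}, []
    for name in map_names:
        scored = sorted(
            ((similarity(name, other), region_id)
             for region_id, other in geoip_regions.items()),
            reverse=True,
        )
        best_score, best_id = scored[0] if scored else (0.0, None)
        second_score = scored[1][0] if len(scored) > 1 else 0.0
        if best_score >= accept and best_score - second_score >= margin:
            matched[name] = best_id
        else:
            ambiguous.append(name)
    return matched, ambiguous

matched, ambiguous = match_regions(
    ["Berlin", "Saxony"],
    {"X1": "Berlin", "X2": "Sachsen", "X3": "Sachsen-Anhalt"},
)
print(matched)    # names accepted automatically
print(ambiguous)  # names left for a point-in-polygon check
```

Names that differ between languages (such as "Saxony" and "Sachsen") fall below the threshold and end up in the ambiguous list, which is where the point-in-polygon check would take over.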

In the end, the SVG maps will only store the Geonames/GeoIP region ids. To reduce file size, they also won't store additional metadata like country names.

comment:53 Changed 2 years ago by greg (gka)

I think I will switch to the naturalearth shapefile since it seems to get updated more frequently. Just discovered that it already contains the shapes for South Sudan. And the public domain license is much better..

comment:54 Changed 2 years ago by matt (mattab)

However, the problem is that the GeoIP db doesn't know about Tibet and South Sudan yet and thus never would map visitors to those 'countries'.

I see, not ideal.. Hopefully in the future they will change this, and it would be nice indeed if the Map would handle these countries already :)

I think I will switch to the naturalearth shapefile since it seems to get updated more frequently.

Sounds good :)

comment:55 Changed 2 years ago by greg (gka)

Here's a quick documentation of the SVG map rendering:

http://vis4.net/blog/posts/rendering_country_maps/

comment:56 Changed 2 years ago by matt (mattab)

Beautiful. I love the globe too, I could imagine a little animation with the globe turning and a color pin appearing where new visitors appear (with a click opening a popover with visitor details). Maybe for a next feature after the standard maps? ;)

comment:58 Changed 2 years ago by greg (gka)

Next thing to do is the mapping of regions between the Natural Earth map and Geocommons region meta-data (which uses the same ids as the GeoIP database).

I'm already in touch with one of the guys at NaturalEarth (who welcome Piwik for discovering their maps, btw), and they'd be very happy to include the mapped region ids in their dataset once I'm finished.

As a first step I checked the region coherence between both datasets, which I measured by comparing the region counts per country. On the map below, the green countries have the same number of regions in both datasets, while the red countries differ. Blue means that there are no regions defined at all in one of the data sets.

http://vis4.net/tmp/region-merging-map-01.png
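The coherence check described above boils down to comparing two region counts per country. A minimal Python sketch, with invented counts and a hypothetical function name:

```python
def classify_countries(map_counts, db_counts):
    """Colour each country by how its region counts compare.

    green: same number of regions in both datasets
    red:   both datasets have regions, but the counts differ
    blue:  at least one dataset defines no regions for the country
    """
    colors = {}
    for country in sorted(set(map_counts) | set(db_counts)):
        in_map = map_counts.get(country, 0)
        in_db = db_counts.get(country, 0)
        if in_map == 0 or in_db == 0:
            colors[country] = "blue"
        elif in_map == in_db:
            colors[country] = "green"
        else:
            colors[country] = "red"
    return colors

# Invented numbers, only to show the three outcomes.
print(classify_countries({"AA": 16, "BB": 22, "CC": 5},
                         {"AA": 16, "BB": 26}))
```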

Looks like this is going to be quite a lot of work. I think the reason for the differences might be that NaturalEarth provides very recently updated shapes while the geonames-region-db seems to be a bit out-dated (and so will be the GeoIP db, too).

comment:59 Changed 2 years ago by matt (mattab)

You mean like this? http://bl.ocks.org/1246403

Nice! If easy to do, it would be nice for sure. But let's discuss this later maybe :)

NaturalEarth looks very interesting, good news you found them. Hopefully you don't have to go through every country and manually adjust them?

Keep up the good work!

comment:60 Changed 2 years ago by greg (gka)

Here's the next part of my documentation series.

I finally managed to match the GeoIP regions to the actual regions found in the Natural Earth map.

http://vis4.net/blog/posts/piwik-matching-regions/

comment:61 Changed 2 years ago by matt (mattab)

Greg, great post!

The screencast is interesting, and shows how valuable this gpl library will be to the world, surfacing currently unavailable Region maps..

  • When there are multiple matches, maybe we can check that no large area regions are ignored and then not matched?
  • Do you have the list of missing matches regions? If we ignore them, maybe we should check that there are no important ones.
    • Also I guess there will be "new" missing unmatched regions after you bucket the "multiple regions matching" to the selected region?

comment:62 Changed 2 years ago by greg (gka)

Good idea to check the missing regions. Here's the list:

DM	10,06,07
HT	03
HN	09
FI	01
PY	13
PA	09
PG	11,12,15
PH	G5,B6,19,G3,C8,C2,C4,F1,A3,72,D4
GW	11
CI	68
CO	25,01
GQ	06
IS	06
ZA	04
CF	18,17
CY	02
VU	13
GM	01
AZ	08
ES	53
SY	03
US	AE,FM,AK,AA,PW,AP,PR
SI	09,04
SO	16
SB	08
SD	26,28,44,45,37,35,40,54,32
GB	O4,G6,B2,K5

I checked the GeoIP locations for the missing regions of the United Kingdom. Those regions have in common that they have only one or two locations in them, and those locations are placed a tiny bit outside of the region, see here.

In case of the United States, all missing regions are located on small islands.

comment:63 Changed 2 years ago by greg (gka)

Also, what do you think about providing the maps in two different qualities? In addition to medium-quality version (file size <1M) shipped with Piwik, the users could easily install a plugin that replaces the maps with high-quality versions.

comment:64 Changed 2 years ago by matt (mattab)

Good that the missing regions are all unimportant (I assume you checked them all).

Also, what do you think about providing the maps in two different qualities?

Could be cool to generate the high quality maps indeed, for the future users of the lib itself. We could of course make it accessible to piwik users too, but I suspect that the basic quality will be good enough for Piwik users? :)

comment:65 Changed 2 years ago by greg (gka)

Depends on what you mean by "good enough". For me as a true map geek, every removed point hurts a tiny bit ;). And really, there is a perceivable difference between a 17k map and a 34k map, especially when viewing the map in full width reports.

Also, I found a way to reduce the number of missing regions to 14, see here for details.

comment:66 follow-up: Changed 2 years ago by greg (gka)

Here's a first interactive prototype of the SVG maps. May not work in IE, yet.

http://vis4.net/labs/piwikmap/

You can jump to countries like this

http://vis4.net/labs/piwikmap/?iso3=USA

or by clicking on a neighbor country.

comment:67 Changed 2 years ago by matt (mattab)

Excellent!! :) Very promising!!

comment:68 in reply to: ↑ 66 Changed 2 years ago by Maxlive_dev

Replying to greg:

http://vis4.net/labs/piwikmap/?iso3=USA

Doesn't show Alaska, though clicks on it works.

comment:69 Changed 2 years ago by greg (gka)

Ok, let's sum up what API endpoints the map will talk to.

At first there's UserCountry.getCountry from which the map will retrieve a list of countries and the aggregated metrics. This is already implemented and working.

Then there will be some kind of new API UserCity.getCity which will return a list of cities with names, lat/lons and the aggregated metrics. Requests are limited to single countries. The cities are coming directly from the GeoIP database.

And lastly there will be another new API UserRegion.getRegion which will return a list of regions with names, ids and the aggregated metrics. Requests are also limited to single countries.

I think the biggest challenge will be the aggregation of regions. In particular, for this we will need:

  • a list of available regions per country (region_name, region_id)
  • a mapping of GeoIP locations to regions. (geoip_loc_id -> region_id)

Getting this shouldn't be a problem, but we need to check a few things:

  • do we have consistent geoip location ids across different versions of the maxmind geoip database?
  • I assume that the resolved region will be stored in a new column in the log_visits table, maybe location_region. So how do we handle future changes in the region database? I think it's pretty sure that the regions will change over time, which means that at some point I will provide updated region maps and location-region mapping. After updating the region maps, we might get problems when a user wants to browse through older reports.
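To make the region aggregation a bit more concrete, here is a small Python sketch of summing visits per region through a geoip_loc_id -> region_id mapping. The ids, the "unknown" bucket and the function name are assumptions for illustration only; the real aggregation would happen inside Piwik.

```python
from collections import Counter

def aggregate_by_region(visits_by_location, location_to_region):
    """Sum visit counts per region.

    visits_by_location: {geoip_loc_id: visits}
    location_to_region: {geoip_loc_id: region_id}
    Locations without a mapping are collected under "unknown".
    """
    totals = Counter()
    for loc_id, visits in visits_by_location.items():
        totals[location_to_region.get(loc_id, "unknown")] += visits
    return dict(totals)

# Made-up ids: two locations in region R1, one location with no mapping.
print(aggregate_by_region({101: 5, 102: 3, 999: 2}, {101: "R1", 102: "R1"}))
```

If the location-region mapping is updated later, re-running this over stored location ids would yield different regions for old visits, which is the versioning concern raised above.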

comment:70 Changed 2 years ago by vipsoft (robocoder)

I plan to store the ISO 3166-2 region code in the database to better support other geolocation data/service providers.

I don't know how frequently the GeoIP regions change, but in #1823, I have to handle cases where the GeoIP API returns:

  • only the ISO 3166-1 alpha-2 country code
  • the ISO 3166-1 alpha-2 country code plus "00" for the region code
  • an ISO 3166-2 region code instead of a FIPS 10-4 region code
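One possible way to handle the first two cases, sketched in Python: treat a missing region code and the "00" placeholder as "region unknown". The function name and the choice of `None` for unknown regions are assumptions. Telling an ISO 3166-2 code apart from a FIPS 10-4 code would need a lookup table and is not attempted here.

```python
from typing import Optional

# Values treated as "no region information"; "00" is the placeholder
# mentioned above, the empty string covers country-only results.
UNKNOWN_REGION_CODES = {"", "00"}

def normalize_region(region_code: Optional[str]) -> Optional[str]:
    """Return a cleaned region code, or None when the region is unknown."""
    if region_code is None:
        return None
    cleaned = region_code.strip().upper()
    return None if cleaned in UNKNOWN_REGION_CODES else cleaned

for raw in (None, "00", " 16 ", "ny"):
    print(repr(raw), "->", repr(normalize_region(raw)))
```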

How do you represent the case where the region is unknown? Can that be used to include regions that no longer exist?

comment:71 Changed 2 years ago by greg (gka)

To begin with, the regions stored in the Maxmind database are not very useful at all. They differ largely between countries in their actual level of granularity (sometimes they use admin-level 1 regions, sometimes level 2 or even level 3). For many countries, at least the free version doesn't contain region ids at all (I'm not sure whether the commercial edition differs here).

So what I plan to do is to lookup the region ids myself. Since I do have accurate geographical descriptions of the regions the map is actually showing, it is no problem to compute the regions for any given lat/lon position. But since this computation is quite intense in terms of cpu and memory usage, we need to cache the results in the database.
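A rough Python sketch of such a lookup: a standard ray-casting point-in-polygon test plus a cache. The two rectangular "regions" are invented, and `functools.lru_cache` is only an in-memory stand-in for the database cache mentioned above. Real regions would also need multi-polygons and holes, which this sketch ignores.

```python
from functools import lru_cache

def point_in_polygon(lon, lat, polygon):
    """Ray casting: count how often a ray going right crosses the outline."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Hypothetical regions as (lon, lat) outlines; not real map data.
REGIONS = {
    "west": ((0.0, 0.0), (5.0, 0.0), (5.0, 10.0), (0.0, 10.0)),
    "east": ((5.0, 0.0), (10.0, 0.0), (10.0, 10.0), (5.0, 10.0)),
}

@lru_cache(maxsize=None)
def region_for(lon, lat):
    """Return the id of the first region containing the point, else None."""
    for region_id, polygon in REGIONS.items():
        if point_in_polygon(lon, lat, polygon):
            return region_id
    return None

print(region_for(2.0, 3.0), region_for(7.5, 3.0), region_for(20.0, 3.0))
```

Since there are only a few hundred thousand distinct locations, caching by coordinate means each one is computed at most once.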

Obviously, for IPs where the GeoIP API returns just ISO2 codes, the lat/lons are useless, since the API would just return the geographical center of that country. For those cases the map will display a side note like "X visitors from this country could not be geolocated".

Internally, the GeoIP database stores unique ids for every location stored in the db. There's a comparatively small number of actually different locations stored, something around 350k. Does the GeoIP API which you're using return the internal location_id or just the metadata like ISO codes and country names?

comment:72 Changed 2 years ago by vipsoft (robocoder)

The GeoIP APIs don't return the internal location ID. Even if they did, I wouldn't use it because (1) it creates vendor lock-in and (2) there's no assurance of backward compatibility.

comment:73 Changed 2 years ago by matt (mattab)

However, the problem is that the GeoIP db doesn't know about Tibet and South Sudan yet and thus never would map visitors to those 'countries'.

Please include them as country and maybe GeoIP will later match them.
Tibet would at least be matched when the browser is set with Tibetan language (bo), so it will definitely be used :)

comment:74 Changed 2 years ago by matt (mattab)

Greg and Vipsoft, I tried to understand the implementation plan for getRegion, but it isn't clear to me from the discussion.

I thought that, to draw the "region" maps, the SVG map would get the list of "cities" and then process the regions from the cities. But, this "point in polygon matching" algorithm would be in the SVG map in Javascript.

  • It would be slow if we have hundreds of cities (does that ever happen? probably not?)
  • If we wanted to cache the result in the DB, how would we do it since it would be processed in Javascript?
    • Or would this algorithm be in PHP in the Map plugin API?

It would be nice to clarify the implementation plan for drawing Regions maps to make sure that we agree & have the same data / API expectations.

Thanks!!

comment:75 Changed 2 years ago by matt (mattab)

(In [5802]) the commit of the "free tibet" activist: detecting Tibetan language in browser and assigning to country "Tibet, Occupied".
This might upset some Chinese users, but they are welcome to fork Piwik ;-)
Refs #1652

comment:76 Changed 2 years ago by matt (mattab)

When this is released, we can also delete this faq World map is not showing in Piwik

comment:77 Changed 2 years ago by matt (mattab)

Just as a side note, having SVG maps will also be great for another reason: we will be able to have a Debian package for Piwik. Currently, Debian won't accept packages with .swf files in them because Flash is a non-free technology. So then we can have an official Piwik Debian package, and run Piwik on FreedomBox when this cool concept is released in the next months/years :)

comment:78 Changed 23 months ago by greg (gka)

Here's a live demo of the map widget running in Piwik. You can already get a feeling for the country maps. You can navigate to other countries by clicking on them (not the labels) or by entering an ISO3 alpha code (such as "USA") and clicking update.

Despite my original plan of using the same aspect ratio for all countries I now update the height of the widget according to the aspect of the country SVG. Otherwise the small maps would be really useless for countries like Chile. The way I implemented this is to use a default ratio of 1.5 for all countries while defining exceptional ratios for others.
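In Python terms the idea is roughly the following. The sketch assumes the ratio means width divided by height, and the exceptional values in the table are invented for illustration; the widget itself is written in JavaScript.

```python
DEFAULT_RATIO = 1.5

# Hypothetical exceptions: tall, narrow countries get a smaller
# width/height ratio, very wide ones a larger ratio.
EXCEPTIONAL_RATIOS = {"CHL": 0.5, "RUS": 2.5}

def widget_height(iso3: str, width: int) -> int:
    """Height in pixels for a widget of the given width."""
    ratio = EXCEPTIONAL_RATIOS.get(iso3.upper(), DEFAULT_RATIO)
    return round(width / ratio)

for code in ("DEU", "CHL", "RUS"):
    print(code, widget_height(code, 600))
```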

comment:79 Changed 23 months ago by greg (gka)

Ah, forgot the link :)

http://piwik.vis4.net

comment:80 Changed 23 months ago by greg (gka)

And, by the way, the development currently happens in these GitHub repositories:

Map widget for Piwik
https://github.com/gka/piwik-map-widget

Generator for the SVGs
https://github.com/gka/piwik-map-generator

Both projects are based on Kartograph, the library I wrote for that exact purpose :)

comment:81 Changed 23 months ago by greg (gka)

Just added

  • three-level-zoom
  • coloring for world and continent maps (based on the existing API).
  • selector for countries and continents

http://piwik.vis4.net

comment:82 Changed 23 months ago by greg (gka)

Added fake display for regions and cities (100 randomly chosen). Also the widget now remembers the country/continent and the view mode (region/city).

Test it live:
http://piwik.vis4.net/index.php?module=CoreHome&action=index&idSite=14&period=day&date=yesterday

comment:83 Changed 22 months ago by matt (mattab)

Beautiful & Amazing work!! :-)

After 1 hour of testing I found some bugs and have a few questions:

  • when clicking on Netherlands, Belgium is not clickable
  • UK has too many regions, but you mentioned the other view with only 6 regions, which might be too small -- is there anything in between?
    • Also can we really choose different region mapping, will it work with GeoIP DB? Or will you do point in polygon or something similar? I'm still unsure how it all works haha ;)
  • some countries, venezuela, russia, thailand, do not display the tooltip on hover on regions, also it seems impossible to click on neighbouring countries
  • in NZ specifically, there is no region "Wellington" but there should be since it's the capital city - was the region removed because too small or simply absent from original dataset? I'm curious if similar regions missing could happen in other countries as well.
  • when looking at panama, the country on the left (Costa rica) is not clickable

Here is my UX review & feedback:

  • Could a click on ocean/sea unzoom the map? this would make it more user friendly as one wouldn't have to aim for the icons in the footer
  • Updates to the tooltip
    • On a continent view, the tooltip for each country could say "France - 145 visits - 5% of all visits"
    • On a country view, for a particular region or city, the tooltip could say "Poitou Charentes (France) - 15 visits - 10% of visits from France - 0.5% of all visits"
    • When a country has no visit, could the tooltip say "Panama - No visit" rather than not displaying the tooltip, which could be harder to understand?
  • Unknown visitors could be displayed on the world / continent views, for example bottom right in grey?
    • Above Unknown, on the World view (or continent view) we could also display "All visits: 5432 - Unknown visitors location: 433 (15%)" so that users always know
    • When zoomed in on a country's regions or cities, this bottom right legend could display "All visits: 5432 - Visits from France: 145 (2.7%)"
  • In the footer, could we display "Regions" next to the icon, and "Cities", similarly to: http://piwik.org/wp-content/uploads/2011/05/Ecommerce-Best-products-report.png - there might be some space missing, could we display the text only for the currently selected icon, and on hover on the icon?
    • To save space, please set eg. the SELECT for metrics and country with font-size: 11px; then there should be enough space to write the Region/City legend?
  • During the time anything is loading, could we have low opacity on the map, and have the rolling wheel in the middle 100% opacity? It would feel smoother I think, currently there are some screen jumps/flashing elements as they are loading.
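For the tooltip wording suggested above, the percentages are simple shares of the country total and the overall total. A Python sketch with invented visit counts and hypothetical function names (the widget itself is JavaScript):

```python
def share(part: int, whole: int) -> str:
    """Format part/whole as a percentage with one decimal."""
    return f"{100 * part / whole:.1f}%" if whole else "n/a"

def region_tooltip(region, country, visits, country_visits, all_visits):
    """Tooltip text for a region or city inside a country view."""
    if visits == 0:
        return f"{region} ({country}) - No visit"
    return (
        f"{region} ({country}) - {visits} visits - "
        f"{share(visits, country_visits)} of visits from {country} - "
        f"{share(visits, all_visits)} of all visits"
    )

print(region_tooltip("Poitou Charentes", "France", 15, 150, 3000))
print(region_tooltip("Corse", "France", 0, 150, 3000))
```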

I believe with all these improvements the usability and ease to understand the map will be much improved. Did you have other ideas as well and/or does it sound good?

comment:84 Changed 18 months ago by matt (mattab)

(In [7173]) filter_pattern= will now work on metadata columns. This will be useful if, for example, we wanted to select only a number of "countries" from the UserCountry.getCity API; we could do the following: &filter_column=country&filter_pattern=de|fr|es|it|nl
It will filter the rows and return only the cities that belong to the specified countries. This works because rows have a metadata <country>
Refs #1652
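As an illustration, this Python sketch builds such a request URL and mimics the filter locally on a few sample rows. The host name, site id and sample rows are placeholders, and `filter_rows` is only a plain regular-expression search written for this example; Piwik's own matching rules may differ in details such as case sensitivity.

```python
import re
from urllib.parse import urlencode

def build_city_report_url(base_url, id_site, countries):
    """URL for UserCountry.getCity restricted to the given country codes."""
    params = {
        "module": "API",
        "method": "UserCountry.getCity",
        "idSite": id_site,
        "period": "day",
        "date": "yesterday",
        "format": "JSON",
        "filter_column": "country",
        "filter_pattern": "|".join(countries),
    }
    return f"{base_url}/index.php?{urlencode(params)}"

def filter_rows(rows, column, pattern):
    """Local stand-in for the server-side filter (assumed behaviour)."""
    regex = re.compile(pattern)
    return [row for row in rows if regex.search(str(row.get(column, "")))]

# piwik.example.org is a placeholder host, not a real installation.
print(build_city_report_url("https://piwik.example.org", 1, ["de", "fr"]))

rows = [
    {"label": "Berlin", "country": "de"},
    {"label": "Tokyo", "country": "jp"},
]
print(filter_rows(rows, "country", "de|fr"))
```

Note that the `|` in the pattern is percent-encoded as `%7C` when the URL is built this way.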

comment:85 Changed 17 months ago by matt (mattab)

(In [7498]) Fixing some tests, also adding <city>xx</city> to tell unknown cities from others, as requested by Greg Refs #1652

comment:86 Changed 17 months ago by matt (mattab)

In #3585 I added a new parameter &showRawMetrics that will return all raw metrics in the API metadata getProcessedReport call.

comment:87 Changed 16 months ago by matt (mattab)

  • Priority changed from normal to critical

comment:88 Changed 16 months ago by matt (mattab)

(In [7643]) deleting the map plugin not working and will revert to previous version Refs #1652

We are starting crowd funding for this feature! Participate in this fantastic project by donating at: http://crowdfunding.piwik.org/analytics-maps-world-country-city-region/

comment:89 Changed 16 months ago by matt (mattab)

(In [7644]) Refs #1652 Maps

comment:90 Changed 16 months ago by matt (mattab)

We need YOUR help! We are running a crowd funding campaign to raise funds to implement the detailed Visitors Maps of Countries, Regions and Cities (for all countries)!

These maps will be beautiful, usable, and built using open standards SVG+JS. They will show detailed visitor count, conversion rates, by Country but also (New!) by city and region.

Pledge now at: http://crowdfunding.piwik.org/analytics-maps-world-country-city-region/

Piwik needs you!

comment:91 Changed 16 months ago by matt (mattab)

  • Description modified (diff)

comment:92 Changed 15 months ago by matt (mattab)

  • Description modified (diff)

comment:93 Changed 14 months ago by mattab <matthieu.aubry@…>

In b9c192cd25a1c9611d6dab79000954c2e09ad84a:

Refs #1652 Adding Continent+ Country codes to Live API output for easier plotting

comment:94 Changed 14 months ago by Fabian Becker <halfdan@…>

In 7bf551f9b4509451e5e12056fe263473d8e2b99b:

Fixes Integration test after changes to display Continent+Country Code

refs #1652

comment:95 Changed 14 months ago by Fabian Becker

In e54bb8dc89e28eed13df36d46c1a7c402e53e3dd:

Fixes Integration test after changes to display Continent+Country Code

refs #1652

comment:96 Changed 14 months ago by vipsoft (robocoder)

js/vendor/kartograph.js has conflicting licenses GPL and LGPL.

js/vendor/kmeans.js is missing any sort of attribution/license.

PiwikMap references in LEGALNOTICE should be removed.

Remove swfobject per #3666.

comment:97 Changed 14 months ago by matt (mattab)

@vipsoft, all fixed now!

comment:98 Changed 13 months ago by matt (mattab)

In 62402bdad4053089a4491573b8a4aaf4703adcdd:

Refs #1652 Encoding the JS output for making map work in languages with single quotes in strings, eg. French
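The problem and the general kind of fix can be shown with a short Python sketch. The actual commit is in Piwik's PHP code, so this only illustrates the idea: `json.dumps` stands in for whatever encoding the commit uses, and the variable name and sample string are invented.

```python
import json

def js_assignment(var_name: str, text: str) -> str:
    """Return a JavaScript statement that assigns text to var_name."""
    # json.dumps emits a double-quoted, escaped literal that is also a
    # valid JavaScript string, so apostrophes cannot end the string early.
    return f"var {var_name} = {json.dumps(text)};"

label = "Nombre d'utilisateurs"

# Naive concatenation: the apostrophe closes the JS string too early.
print("var label = '" + label + "';")

# Encoded output: safe for strings containing single quotes.
print(js_assignment("label", label))
```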

comment:99 Changed 13 months ago by matt (mattab)

  • Resolution set to fixed
  • Status changed from new to closed