Ticket #703 (reopened New feature)
Piwik an alternative to AWStats and Urchin, build server log import script
| Reported by: | bharatkalyan | Owned by: | |
|---|---|---|---|
| Priority: | critical | Milestone: | 1.7.2 - Piwik 1.7.2 |
| Component: | Core | Keywords: | |
| Cc: | Cyril | Sensitive: | no |
Description (last modified by matt)
Urchin Alternative: Import your server logs in Piwik, the Free web analytics platform!
See blog post Piwik alternative to Urchin for more information.
Piwik is an alternative to Urchin, and also to Webalizer and AWStats: with a Python script, you can now import web server logs (Apache, IIS, and more) into Piwik instead of using the JavaScript tracking.
Description
A Python script available in piwik/misc/log-analytics/ will parse server logs efficiently and automatically call the Piwik Tracking API to inject the visits/pageviews/downloads into Piwik.
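To illustrate the idea (this is not the code of the script itself), here is a minimal sketch in Python 3 that turns one Apache "combined" log line into a Tracking API query string. The function name `build_tracking_query`, the regular expression, the `host` argument, and the choice of parameters (`idsite`, `rec`, `url`, `urlref`, `ua`, `cip`, `cdt`, `token_auth`) are assumptions made for this example; the real script may parse and map fields differently.

```python
import re
from datetime import datetime, timezone
from urllib.parse import urlencode

# Apache "combined" log format only; other formats need their own pattern.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def build_tracking_query(line, idsite, host, token_auth):
    """Return a piwik.php query string for one log line, or None if unparsable."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    hit = match.groupdict()
    # Log timestamps carry a UTC offset; normalise to UTC for the request.
    when = datetime.strptime(hit["date"], "%d/%b/%Y:%H:%M:%S %z")
    when_utc = when.astimezone(timezone.utc)
    params = {
        "idsite": idsite,
        "rec": 1,
        "url": "http://" + host + hit["path"],
        "urlref": "" if hit["referrer"] == "-" else hit["referrer"],
        "ua": hit["user_agent"],
        "cip": hit["ip"],  # overriding the IP is why token_auth is passed
        "cdt": when_utc.strftime("%Y-%m-%d %H:%M:%S"),
        "token_auth": token_auth,
    }
    return "piwik.php?" + urlencode(params)
```

Lines that do not match the pattern return `None`, so a caller can count and skip them rather than abort the import.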
How to install / how to use
- Requires Piwik >= 1.7.2-rc2. Download the latest version from http://builds.piwik.org/?C=N;O=D
- Requires at least Python 2.6
- Requires one or more server log files (for example, access.log in Apache). These log files will be imported into Piwik.
- You can also create a "test website" in Piwik to import all data into, rather than importing into your existing websites. Then, use the option --idsite=X to force all data from the log files to be imported into this idsite.
- You can use the --dry-run option to do a test run and make sure that no data is tracked and no new websites are created.
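As a rough sketch of how the options above combine, the helper below assembles a command line for a test run. Only --idsite and --dry-run are taken from this ticket; the script file name `import_logs.py`, the `--url` option, and the helper `build_import_command` itself are assumptions for illustration, so check the script's own help output before relying on them.

```python
def build_import_command(log_files, piwik_url, idsite=None, dry_run=False):
    """Assemble the argument list for an import run (nothing is executed here)."""
    command = [
        "python",
        "misc/log-analytics/import_logs.py",  # assumed file name inside the Piwik directory
        "--url=" + piwik_url,                 # assumed option for the Piwik base URL
    ]
    if idsite is not None:
        # Force every hit into one website, e.g. a dedicated test website.
        command.append("--idsite=%d" % idsite)
    if dry_run:
        # Parse the logs without tracking data or creating websites.
        command.append("--dry-run")
    command.extend(log_files)
    return command


if __name__ == "__main__":
    print(" ".join(build_import_command(
        ["access.log"], "http://piwik.example.org", idsite=3, dry_run=True)))
```

The resulting list could be passed to `subprocess.call` from the Piwik root directory once the dry run looks right.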
How can you help?
- Please use the script and report your feedback and bugs here.
- If you are a hacker yourself, please review the code and consider submitting performance optimizations or other improvements.
- If you are a webhost or web agency and wish to offer Piwik to hundreds of your customers, please contact us
- review the doc at Server log analytics
Tasks to do before final release
- Test, test and test
- Setup on demo.piwik.org in a new website
- Check that all code review feedback has been addressed
- Review Import Logs in Piwik doc page.
- Decommission apache2piwik (update blog post)
Feature requests for V2 or later
- detect that a given log file has already been imported and don't import again
- When the script is interrupted with CTRL+C, a message is displayed. Could we also display the --skip=X value needed to resume from where the script was cancelled?
- make it easy to delete logs for one day only.
  - This would be a new option to the Python script. It would reuse the code from the Log Delete feature, but would only delete one day. The Python script would call the CoreAdmin API, for example, to delete this single day for a given website. This would make it easy to re-import data that didn't import correctly the first time or was bogus.
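One possible way to approach the first request (detecting a log file that was already imported) is to record a content digest of each imported file. The sketch below is only an illustration: the helper names and the JSON registry file are hypothetical and nothing like them exists in the script today.

```python
import hashlib
import json
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_registry(registry_path):
    """Load the list of digests of imported files (empty if none yet)."""
    if not os.path.exists(registry_path):
        return []
    with open(registry_path) as handle:
        return json.load(handle)


def already_imported(path, registry_path):
    """True if a file with identical content was recorded as imported."""
    return file_digest(path) in load_registry(registry_path)


def mark_imported(path, registry_path):
    """Record the file's digest after a successful import."""
    seen = load_registry(registry_path)
    digest = file_digest(path)
    if digest not in seen:
        seen.append(digest)
    with open(registry_path, "w") as handle:
        json.dump(seen, handle)
```

A digest of the whole file only catches exact re-imports: a log that is still being appended to will hash differently, so live logs would need something finer grained, such as tracking the last imported line.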
Performance improvement ideas
How to debug performance? First of all, run the script with --dry-run to see how many log lines per second are parsed. It should typically be between 2,000 and 5,000. Without --dry-run, the script also inserts new pageviews and visits by calling the Piwik Tracking API, which adds to the processing time.
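For comparing parsing changes in isolation, the lines-per-second figure can also be measured directly. This is a small sketch, not part of the script; `lines_per_second` and its `parse` callback are hypothetical names.

```python
import time


def lines_per_second(lines, parse):
    """Run parse() on every line and return the observed throughput."""
    started = time.perf_counter()
    count = 0
    for line in lines:
        parse(line)
        count += 1
    elapsed = time.perf_counter() - started
    # Guard against a zero interval on very small inputs.
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    sample = ['1.2.3.4 - - [10/Oct/2012:13:55:36 +0200] "GET / HTTP/1.1" 200 1'] * 50000
    print("%.0f lines/s" % lines_per_second(sample, str.split))
```

Measuring parsing alone, as the dry run does, separates the cost of the Python side from the cost of the tracking requests.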
Ideas to improve log import performance:
- Enable persistent database connections
  - With mysqli: in config.ini.php, set host = "p:localhost".
- Use keep-alive HTTP connections
  - This requires using a new library in Python
  - Only implement if the gain is significant: let's do performance testing first and then see.
- Bulk load requests to Piwik via POST, see #3134
  - Once we have bulk loading, we could further optimize the PHP code. For example, we could issue fewer SELECT queries, since the last status of a visit would already be known in memory.
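The keep-alive and bulk-loading ideas could be combined roughly as sketched below. This is only an assumption about how it might look: the JSON body shape (`requests` plus `token_auth`), the `/piwik.php` path, and all function names are hypothetical until #3134 defines the actual format. The sketch uses Python 3's standard `http.client`, which reuses one connection for successive requests.

```python
import http.client
import json


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_bulk_body(queries, token_auth):
    """Serialise a batch of tracking query strings (assumed payload shape)."""
    return json.dumps({
        "requests": ["?" + query for query in queries],
        "token_auth": token_auth,
    })


def send_batches(host, queries, token_auth, batch_size=200):
    """POST all queries in batches over a single reused connection."""
    connection = http.client.HTTPConnection(host)
    try:
        for batch in batched(queries, batch_size):
            connection.request(
                "POST", "/piwik.php",  # assumes Piwik is served from the web root
                body=build_bulk_body(batch, token_auth),
                headers={"Content-Type": "application/json"},
            )
            response = connection.getresponse()
            response.read()  # drain the response so the connection can be reused
    finally:
        connection.close()
```

Whether batching pays off, and what batch size works best, would need the performance testing mentioned above; 200 is an arbitrary starting value.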
