bug import_logs.py #4697
Comments
Attachment: test log file, 22nd January
Attachment: log file, 14th February
65.55.24.218 is excluded because it is an MSN bot IP address, which we exclude by design (unless you add --enable-bots). IP 83.233.x.x is tracked for me. After importing the logs, you have to re-run the archive.php cron script. It all looks like it is working (I reused your access_test0 log). Please try again with RC5, as I think it will work!
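The bot exclusion described in this reply can be illustrated with a minimal Python sketch, assuming crawler traffic is recognised by matching the client address against known crawler networks. The range 65.52.0.0/14 is only an illustrative block that happens to contain 65.55.24.218; it is not Piwik's actual bot list, and `would_be_excluded()` is a hypothetical helper, not part of Piwik or import_logs.py.

```python
import ipaddress

# Illustrative only: one network that contains 65.55.24.218.
# This is NOT Piwik's real list of bot IP ranges.
EXAMPLE_BOT_RANGES = [ipaddress.ip_network("65.52.0.0/14")]


def is_known_bot_ip(ip, ranges=EXAMPLE_BOT_RANGES):
    """Return True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)


def would_be_excluded(ip, enable_bots=False):
    """Hypothetical model of the rule above: bot IPs are dropped
    unless bot tracking is explicitly enabled (cf. --enable-bots)."""
    return is_known_bot_ip(ip) and not enable_bots


if __name__ == "__main__":
    for ip in ("65.55.24.218", "83.233.207.74"):
        print(ip, "excluded" if would_be_excluded(ip) else "tracked")
```

Under these assumptions, 65.55.24.218 is reported as excluded and 83.233.207.74 as tracked; passing `enable_bots=True` lets the first address through.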
Indeed, I added the --enable-bots option! So I retried with RC6 today on access_test0 (a 13-line log file) and I still have the same problem: IP 83.233.x.x is not tracked for me, and neither is 65.55.24.218, even with the --enable-bots option.

Why do you say I have to re-run the archive.php cron script? I don't understand. And what do you think of the following one-line log file example I mentioned in the forum?

Last test with Piwik 2.1RC6 and a one-line log file:

41.107.212.109 - - +0100 "GET /slav/ling/cours/a07-08/SEMI%20UNIL/041207Iva.html HTTP/1.1" 200 6690 "https://www.google.dz/" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36"

command:

Output:

Logs import summary
1 requests imported successfully
Website import summary
1 requests imported to 1 sites
0 distinct hostnames did not match any existing site:
Performance summary
Total time: 0 seconds

And the Actions report page is... empty! I just see the IP in the Visitor Log, but again with 0 actions. Any idea? Could my database be corrupted somewhere? I do not understand, and I really have no confidence in the Piwik tracking results.

Thanks for your help
Do you have "Browser trigger archiving" enabled? See: http://piwik.org/docs/setup-auto-archiving/ and execute:
No, "Browser trigger archiving" is now disabled! Best regards,
Hello to all! I am using Piwik for a customer and just found the following very serious issue. I am using the latest Piwik (2.2.0) and probably have the same issue as imoullet.

PROBLEM: Lines with HTTP status 200 are ignored! That is, only the first entry is included in both the Visits and the Actions. This happens both before and after I run the archiving, so archiving is irrelevant.

I just import access.log (a file with just 2 lines):

66.249.76.11 - - +0100 "GET /id/resource/013541589 HTTP/1.1" 303 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

via the command:
Result: 0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
Logs import summary
Website import summary
Performance summary
Kind regards,
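The example line in this comment is a 303 redirect requested by Googlebot. A rough Python sketch of how such a line might be classified, assuming import_logs.py skips 3xx responses (304 assumed to be tracked normally) unless --enable-http-redirects is given, 4xx/5xx responses unless --enable-http-errors is given, and crawler user agents unless --enable-bots is given. `BOT_HINTS` and `required_options()` are hypothetical; the real script uses its own detection rules.

```python
import re

# Apache "combined" log format:
# host ident authuser [date] "METHOD path protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical crawler hints; import_logs.py ships its own list.
BOT_HINTS = ("bot", "crawl", "spider", "slurp")


def required_options(line):
    """Return the options a line would (approximately) need to be imported.

    Assumed rules: 3xx except 304 -> --enable-http-redirects,
    4xx/5xx -> --enable-http-errors, crawler user agent -> --enable-bots.
    """
    m = COMBINED.match(line)
    if m is None:
        raise ValueError("not a combined-format line: %r" % line)
    opts = []
    status = int(m["status"])
    if 300 <= status < 400 and status != 304:
        opts.append("--enable-http-redirects")
    elif status >= 400:
        opts.append("--enable-http-errors")
    if any(hint in m["agent"].lower() for hint in BOT_HINTS):
        opts.append("--enable-bots")
    return opts


if __name__ == "__main__":
    # The timestamp is a placeholder; the rest is the line quoted above.
    sample = ('66.249.76.11 - - [01/Jan/2014:00:00:00 +0100] '
              '"GET /id/resource/013541589 HTTP/1.1" 303 - "-" '
              '"Mozilla/5.0 (compatible; Googlebot/2.1; '
              '+http://www.google.com/bot.html)"')
    print(required_options(sample))
```

The bracketed date in the sample is made up, because the log lines quoted in this thread show no bracketed timestamp. Under the assumed rules, the Googlebot line needs both --enable-http-redirects and --enable-bots.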
The latest Piwik is 2.2.2 and this bug should be fixed. Please try the latest beta: http://piwik.org/faq/how-to-update/faq_159/
Running:
Piwik 2.1 RC1
Python 2.7.1
I have been discussing this problem in the forum for 4 weeks now (http://forum.piwik.org/read.php?2,110277) and cannot get any solution. I really think there is a bug in import_logs.py. You can read all the details in the discussion mentioned above.
Here is a summary, with some test log files.
I run the following command:
/usr/bin/python2.7 /var/www/html/piwik/misc/log-analytics/import_logs.py --url=https://w3stat.unil.ch/piwik/ /var/tmp/stats/app/xxxx --idsite=xxx --config=/var/www/html/piwik/config/config.ini.php --recorders=2 --log-hostname=www3.unil.ch --hostname=www3.unil.ch --enable-static --enable-bots --enable-http-errors --enable-http-redirects --enable-reverse-dns --strip-query-string
for the two log files I sent you as attachments (22nd of January and 14th of February).
You can look at the results for this site on our Piwik instance, https://w3stat.unil.ch/piwik, using piwik/debug4piwik as user/password.
You will see that the Piwik results are wrong both for the Visitor Log (some IPs are ignored for the 22nd of January AND also for the 14th of February) and for the Actions > Pages report.
For example, some IPs are missing from the Visitor Log report for the 22nd of January:
65.55.24.218 and 83.233.207.74 are not there, although they are present in the log files (see my preceding message).
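To check which client addresses actually occur in a log file before comparing them with the Visitor Log, one could count requests per IP. A minimal Python sketch, assuming the Apache combined format, where the client host is the first field of each line; `missing_ips()` and its `reported_ips` input (for example, addresses copied by hand from the Visitor Log) are hypothetical.

```python
import sys
from collections import Counter


def hits_per_ip(lines):
    """Count requests per client address (first field of a combined-format line)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[line.split(None, 1)[0]] += 1
    return counts


def missing_ips(log_lines, reported_ips):
    """IPs seen in the log but absent from reported_ips (hypothetical input,
    e.g. addresses copied from the Visitor Log)."""
    return sorted(set(hits_per_ip(log_lines)) - set(reported_ips))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_ips.py /path/to/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for ip, n in hits_per_ip(fh).most_common():
            print(f"{ip}\t{n}")
```

An address that shows up here but not in the Visitor Log may still have been filtered on purpose (bots, redirects, errors), so this only narrows down where to look.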
And the Actions > Pages report is empty! Yet I have requests such as
83.139.189.139 - - +0100 "GET /wpmu/alumnil/participez-a-la-construction-dun-nouvel-avenir-technologique-et-social/ HTTP/1.0" 200 34367 "http://www3.unil.ch/wpmu/alumnil/participez-a-la-construction-dun-nouvel-avenir-technologique-et-social/" "Mozilla/5.0 (Windows NT 5.2; rv:17.0) Gecko/20100101 Firefox/17.0"
in my log file.
I should also mention that for the same site I can see some visits in the Actions > Pages report, because I use the WP-Piwik plugin for this individual site!
The Actions > Downloads report is the only one that seems to be correct.
So, in conclusion, I cannot reconcile the results for each individual WordPress site generated with the WP-Piwik plugin with the results for all my WordPress sites generated with import_logs.py. Indeed, the totals for all (i.e. 250!) WP sites are much lower than those for one individual site. That is what alerted me that something was wrong with import_logs.py!
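One way to cross-check the WP-Piwik figures against the aggregated import is to count raw requests per WordPress site directly from the Apache log. A Python sketch, assuming each site lives under /wpmu/<site>/ as in the URL quoted earlier in this issue; `site_of()` and `requests_per_site()` are hypothetical helpers. Raw request counts will still differ from Piwik pageviews, since static files, bots, redirects and errors may be filtered depending on the import options.

```python
import re
import sys
from collections import Counter

# Only the request path is needed, so a loose pattern is enough here.
REQUEST = re.compile(r'"[A-Z]+ (?P<path>\S+)')


def site_of(path):
    """Map /wpmu/<site>/... to <site> (assumed URL layout); else None."""
    parts = path.split("?", 1)[0].split("/")
    # "/wpmu/alumnil/page/" -> ["", "wpmu", "alumnil", "page", ""]
    if len(parts) > 2 and parts[1] == "wpmu" and parts[2]:
        return parts[2]
    return None


def requests_per_site(lines):
    """Count raw requests per WPMU site; lines outside /wpmu/ are ignored."""
    counts = Counter()
    for line in lines:
        m = REQUEST.search(line)
        site = site_of(m["path"]) if m else None
        if site:
            counts[site] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python per_site.py /path/to/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for site, n in requests_per_site(fh).most_common():
            print(f"{site}\t{n}")
```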
I am really confused about this.
I get the same results for all my parsed log files. They all come from an Apache web server using the combined (NCSA) format.
Please let me know if you need more information. The Piwik output is correct in the sense that it reports importing the correct number of lines.
I hope you can help me!
Keywords: import_logs