import_logs.py cannot resume with line number reported by skip for ncsa_extended log format #3867
Comments
Beyond failing to resume, a high probability of the import failing is not ideal for those looking to automate it, so there is possibly some issue parsing the ncsa_extended format. I initially experienced this with the 1.11.1 release (python 2.6 and python 2.7.3 with php 5.3.14 and 21,22,23). I've used the most current version of the script from the git repo and see the same issue. I have 7GB of log data to import, split into reasonably sized files, and the import now fails on a few of them, whereas without specifying an exclude list (to save on processing requests to the API) it failed more often.
Thanks for the report. Could you take a look at the regex used for this log format, and the existing tests in: https://github.com/piwik/piwik/tree/master/misc/log-analytics/tests/logs Post here if you have findings. Thanks!
Attempting skip after import quit...
Continuing the import doesn't get far when reattempting with the value reported for skip. However, I am able to get the import to complete in most cases by using the last reported value of lines parsed instead. In other cases, the import continues recording but quits again. Specifying the reported skip value results in the script quitting without recording any lines, whereas providing the last reported value of lines parsed results in the import completing.
Start import of log
1st attempt to continue import using value reported by skip
2nd attempt to continue import using value reported by skip
Continuing with last reported value of lines parsed results in import completing
Logs import summary
Website import summary
Performance summary
Importing to reproduce the failure again...
I now noticed that, on the piwik server, nginx logged a few 499s just before the script terminated. That's probably normal, I suppose, given that the import script quits after a fixed number of errors? There are 4 recorders; assuming one of the four encounters the error and retries, does the script then decide to quit and have the other three close their connections? If not, then the question is why the connection was being closed by the import script. So possibly an nginx configuration problem, then? I'm not seeing anything to indicate a configuration problem in the nginx, fpm, or php logs.
Options being used:
... ...
Importing suspect portion of log...
Additionally, I tried to reproduce the failure with the suspect portion of the log, but that import completed without issue.
... Logs import summary
Website import summary
Performance summary
Update: I forgot to post back, but ultimately the problem had to do with server configuration timeouts and the size of the logs I was importing, although it wasn't captured as an error in the nginx or php logs. Increasing the timeout values for nginx and for php in the php-fpm pool for piwik allowed the logs to import successfully. The only other issue I observed was that archiving had been timing out after importing a large amount of data. For that, I increased the values further.
http://nginx.org/en/docs/http/ngx_http_fastcgi_module.html
Pass timeout responsibility to upstream (php):
fastcgi_read_timeout 14400; # 4 hrs
PHP FPM (FastCGI) pool config values for piwik:
; 30mins for archive.php to generate reports and allow sizable log file imports
These values worked for the size of log file I was importing, given the server hardware piwik was running on and a separate database server.
Others seem to be experiencing the same problem:
http://forum.piwik.org/read.php?2,94797,102560
When the import fails, trying to resume with the line number reported by skip fails as well. However, I noticed that the number of lines parsed was typically the same, and supplying that value to skip allowed the import to resume. Without debugging the issue, I cannot know whether this is a valid workaround, i.e. whether the script resumes by skipping over potentially valid log lines.
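For anyone wanting to automate the workaround described above, here is a minimal sketch in Python. It assumes the script prints progress lines containing text of the form "N lines parsed" (check this against your own output), and the helper names `last_lines_parsed` and `resume_command` are hypothetical, not part of import_logs.py. It only builds the command to rerun; it does not answer whether resuming this way skips valid log lines.

```python
import re

# Assumption: progress output contains "<number> lines parsed".
# Adjust the pattern if your version of import_logs.py prints something else.
PROGRESS_RE = re.compile(r"(\d+) lines parsed")


def last_lines_parsed(output):
    """Return the last 'N lines parsed' count found in the captured output, or None."""
    matches = PROGRESS_RE.findall(output)
    return int(matches[-1]) if matches else None


def resume_command(base_command, output):
    """Return base_command with --skip set to the last reported lines-parsed count.

    Any --skip option already present in base_command is dropped first.
    If no progress line is found, the command is returned without --skip.
    """
    command = [arg for arg in base_command if not arg.startswith("--skip")]
    skip = last_lines_parsed(output)
    if skip is not None:
        command.append("--skip=%d" % skip)
    return command
```

To use it, capture the output of the failed run (for example with `subprocess`), pass it together with the original argument list to `resume_command`, and rerun the result. Whether the parsed count reported during a resumed run is relative to the start of the file or to the skip offset is something I have not verified, so treat repeated resumes with care.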
Keywords: import_logs.py ncsa_extended resume skip lines