Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Piping one single line from the access.log into import_logs.py work in apache #3757

Closed
anonymous-matomo-user opened this issue Feb 15, 2013 · 15 comments
Labels
duplicate For issues that already existed in our issue tracker and were reported previously. Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change.

Comments

@anonymous-matomo-user
Copy link

Description: Make Piping one single line from the access.log into import_logs.py works but using the same command directly from apache nothing gets logged

This is about the same issue as reported by ottodude125 (#3163) and elm (#3163). Piping one single line from the access.log into import_logs.py works but using the same command directly from apache nothing gets logged.

To reproduce, use the following apache config (within a VirtualHost container):

# Set up your log format as a normal extended format, with hostname at the start
LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" piwik_log_format
# Log to a file
CustomLog /var/log/apache2/access_piwik.log piwik_log_format
# Log to piwik
CustomLog "|/usr/bin/python /var/www/foo/piwik/misc/log-analytics/import_logs.py --idsite=1 --recorders=1 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --url=http://foo.bar.org/piwik/ --log-format-name='common_vhost' --output=/tmp/piwik_import.log -dd -" piwik_log_format
# Test log piping
CustomLog "|tee -a /tmp/pipetest.log" piwik_log_format

Both /var/log/apache2/access_piwik.log and /tmp/pipetest.log gets populated, so there is no doubt that the import_logs.py script receives the standard input continuously. However, /tmp/piwik_import.log only gets any activity upon restart of apache (like for ottodude125)

Running:

/usr/bin/python /var/www/foo/piwik/misc/log-analytics/import_logs.py --idsite=1 --recorders=1 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --url=http://foo.bar.org/piwik/ --log-format-name='common_vhost' --output=/tmp/piwik_import.log -dd /var/log/apache2/access_piwik.log

... (the same command as in the CustomLog directive but instead reading a file instead of stdin) works fine.

Piwik 1.10.1
Debian 6

@anonymous-matomo-user
Copy link
Author

Have also verified that the behavior is the same whether or not sh (dash) or bash is standard shell for www-data (the apache user).

Using python's unbuffered binary stdout and stderr (-u) like so:

CustomLog "|/usr/bin/python -u /var/www...

...makes no difference.

I added debug log output in import_logs.py already in main() which is also only printed out on apache restart. On any subsequent requests all is silent...

@anonymous-matomo-user
Copy link
Author

Believe I've found the main issue here.

According to apache docs (http://httpd.apache.org/docs/2.2/logs.html#piped): Apache will start the piped-log process when the server starts, and will restart it if it crashes while the server is running. (This last feature is why we can refer to this technique as "reliable piped logging".)

However, import_logs.py will not start actually parse any lines until an EOF has been encountered it seems.

So, this works:

cat access_piwik.log "|/usr/bin/python /var/www...etc

...but this does not work:

tail -f access_piwik.log "|/usr/bin/python /var/www...etc

The latter example will keep the python script running indefinitely until the tail process is killed, whereafter the read lines will be parsed.

@mattab
Copy link
Member

mattab commented Feb 15, 2013

Thanks for the report & clear info!

If you can find the fix in the python script that'd be great...

@anonymous-matomo-user
Copy link
Author

First of all, the script hangs on something like sys.stdin.read() internally (it hangs just before the for loop in parse(), doesn't proceed past it it seems). This indicates that some buffer is being used, but I am not sure about the size of this buffer.

sys.stdin.readline() is however unbuffered. In the end, a fix to this issue is replacing

        for lineno, line in enumerate(file):

with

        lineno = 0
        while 1:
            line = sys.stdin.readline()
            if not line: break
            lineno = lineno + 1

Perform this in a copy of import_logs.py (call it pipe_import_logs.py or something), since this breaks normal usage :)

Of course, use pipe_import_logs.py in your CustomLog directive as well.

It is possible that stdin is buffered and that the script that seems to hang eventually gets some lines to work with, but relying on that seems risky since the buffer is thrown away on apache restart.

I'll leave it up to some actual python devs to do the proper refactoring :)

Relevant links:
http://stackoverflow.com/questions/7056306/python-wait-until-data-is-in-sys-stdin
http://stackoverflow.com/questions/6033781/how-to-avoid-python-fileinput-buffering

@anonymous-matomo-user
Copy link
Author

Btw, this is how the flow goes:

  1. Apache starts a shell with the piped python process on server start
  2. Apache pipes log contents continuously to that shell's stdin
  3. (with the above fix) The python process runs an eternal loop (while 1) and parses each received log line from stdin, turning them into "hits"
  4. For each config.options.recorder_max_payload_size (--recorder_max_payload_size=200 is the default), the hits gets POST:ed to the piwik server by one or many recorder processes that runs in the background. While in the eternal loop, the script currently does not output anything revealing what's going on...
  5. On apache shutdown the process is simply killed (ie no status reports will be written out anywhere).

Therefore, it is a good idea to also add some statements like: logging.debug('lineno %s', lineno) or similar to be able to verify that it works for you.

@anonymous-matomo-user
Copy link
Author

Btw, any hits that still haven't been recorded to Piwik (because them being less than recorder_max_payload_size, in queue or whatever) when Apache crashes or restarts will not be recorded. Ideally, the script should listen to the correct signals and record these before shutting down...

@mattab
Copy link
Member

mattab commented Apr 4, 2013

There were bugs in the script. Can you try with the latest python script from GIT or from : http://builds.piwik.org/piwik-1.12-b5.zip ?

@anonymous-matomo-user
Copy link
Author

Still not working with latest git version of import_logs.py.

@anonymous-matomo-user
Copy link
Author

I have the same Problem / reloading Apache also gets the entries to piwik

@anonymous-matomo-user
Copy link
Author

when this bug will be closed. I change awstats by piwik

@mattab
Copy link
Member

mattab commented Jan 13, 2014

To all users who have had this bug, do you still experience the issue with Piwik 2.0.3?

@mattab
Copy link
Member

mattab commented Mar 16, 2014

Hi guys, this ticket is getting long. Maybe someone can make a pull request with a fix for this issue?

@anonymous-matomo-user
Copy link
Author

Hi all

I do have got Piwik 2.1.0 installed via FreeBSD ports. It doesn't work for me, even after applying the patch above. Is there an other solution available?

@anonymous-matomo-user
Copy link
Author

Replying to mirror:

Hi all

I do have got Piwik 2.1.0 installed via FreeBSD ports. It doesn't work for me, even after applying the patch above. Is there an other solution available?

It does work now, but there is a delay. I did do the following:

As motin mentioned 13 months ago it makes no difference if I use -u or not. But it is working, not as stated before.

@mattab
Copy link
Member

mattab commented Mar 12, 2015

Issue was moved to the new repository for Piwik Log Analytics: https://github.com/piwik/piwik-log-analytics/issues

refs #7163

@mattab mattab closed this as completed Mar 12, 2015
@mattab mattab added the duplicate For issues that already existed in our issue tracker and were reported previously. label Mar 12, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate For issues that already existed in our issue tracker and were reported previously. Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change.
Projects
None yet
Development

No branches or pull requests

2 participants