Opened 3 years ago

Closed 3 years ago

#1938 closed Task (fixed)

ensure that the archive algorithm cannot be triggered multiple times for a same site/period/segment

Reported by: matt
Owned by: matt
Priority: major
Milestone: Piwik 1.6
Component: Core
Keywords:
Cc: hansfn
Sensitive: no

Description

If the script detects that another instance is already running (via "ps aux | grep archive.sh | wc -l" or similar), it should echo "archive.sh already running, exiting..." and exit.

Is there any reason the simple "ps aux" would not work on all Linux servers?

Ideally the Windows PowerShell version should do this as well, but that is optional for now.
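A minimal sketch of the check described above, assuming the script is named archive.sh. The "[a]rchive" grep pattern keeps grep from matching its own command line, so no "grep -v grep" is needed (portability of "ps aux" vs. "ps -A" is discussed in the comments):

```shell
#!/bin/sh
# Exit early if another archive.sh instance shows up in the process list.
# Uses BSD-style "ps aux"; the bracketed pattern [a]rchive\.sh does not
# match the grep process's own argument string.
archive_running() {
    ps aux 2>/dev/null | grep '[a]rchive\.sh' >/dev/null
}

if archive_running; then
    echo "archive.sh already running, exiting..."
    exit 1
fi
```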

Change History (19)

comment:1 Changed 3 years ago by vipsoft (robocoder)

In #1698, hansfn suggests changing archive.sh to use /bin/sh instead of /bin/bash. I'm not opposed since /bin/sh is often symlinked to a variant, e.g., bash or dash.

comment:2 Changed 3 years ago by vipsoft (robocoder)

  • Cc hansfn added

comment:3 Changed 3 years ago by hansfn

"ps aux" should work on all Linux servers. (Maybe drop the "u" since it just adds unnecessary output?) However, be aware that sometimes the grep command itself will be included in the ps listing - ref http://stackoverflow.com/questions/4473610 For me, adding "grep -v grep" normally works.

PS! You could of course use lock files - create one at the start of the script, remove it at the end and when the script is killed (using trap).

comment:4 follow-up: Changed 3 years ago by halfdan

"ps aux" does work on GNU ls but certainly not on all systems. Solaris does not use GNU ls but a ls which conforms to POSIX (see http://en.wikipedia.org/wiki/Ls). We should use "ps -A" which works on all systems I got at hand (Linux, FreeBSD/NetBSD, Solaris 10).

File locking as you described it can run into race conditions due to OS task scheduling.

comment:5 Changed 3 years ago by halfdan

Note to self: Don't write comments in the middle of the night.

I obviously meant GNU ps. http://en.wikipedia.org/wiki/Ps_%28Unix%29
The same applies: "ps aux" relies on BSD-style options rather than the POSIX standard, so we should use "ps -A".

comment:6 Changed 3 years ago by matt (mattab)

  • Priority changed from normal to major

This is covered in: http://stackoverflow.com/questions/185451/quick-and-dirty-way-to-ensure-only-one-instance-of-a-shell-script-is-running-at-a

and http://stackoverflow.com/questions/185451/how-to-check-in-a-bash-script-if-something-is-running-and-exit-if-it-is

We just had this problem on the demo: the process was running 6 times in parallel, causing massive IO and a crash. We should fix this!

I don't like the lock file solution. Instead we should use the "ps XXX | grep archive.sh" approach.

comment:7 Changed 3 years ago by tthuermer

matt asked me to check this issue in a comment to #2440

using ps|grep is really, really ugly... (my opinion as "shell expert")

for example if some other script also named archive.sh exists on the system, and is being run, maybe even by another user...

i'd strongly prefer lockfiles, and generating a warning for the user if the lockfile is present (and maybe older than a certain age)...

but there's one perfectly clean and elegant solution that beats all that cruft:

using a lock in the database.

and the job already uses a database, because it's working on the data in it (duh!).

such a lock is managed on the server, and automatically released when the connection is closed (script dies). (mysql has get_lock(name,timeout) )
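As a sketch of that database lock idea (the lock name here is hypothetical; MySQL's GET_LOCK returns 1 on success, 0 on timeout, and the server releases the lock automatically when the connection closes):

```sql
-- acquire a named advisory lock, waiting up to 10 seconds
SELECT GET_LOCK('piwik.archive.site1.day', 10);

-- ... run the archiving steps on this same connection ...

-- release explicitly when done (also released if the connection dies)
SELECT RELEASE_LOCK('piwik.archive.site1.day');
```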

the only issue here is that the archive job opens a new database connection for each processing step (each step is a separate api/php invocation)...

i would suggest rewriting the bottom part of the script in php and moving it into the piwik api as a function;
then it could simply acquire a lock in the database...
(beware that this might cause issues with the php process growing due to memory leaks, but in the worst case that could be worked around by running separate php instances for the steps like the shell script does, while keeping the db connection/lock.)

that would also help reduce the language diversity, and the need for shell programmers, who seem to be lacking in piwik project staff... ;)

comment:8 in reply to: ↑ 4 Changed 3 years ago by tthuermer

Replying to halfdan:

File locking as you described it can run into race conditions due to OS task scheduling.

wrong! not if done correctly.

what is needed is some kind of atomic operation that can only succeed once...
of course checking for a file's existence and then creating it is not atomic.
but for example opening the file with O_EXCL is atomic, as are file locking functions...

the by far easiest alternative in a shell script is mkdir though, which i prefer to use for locking.

if ! mkdir /tmp/mylock ; then
    echo "failed to get lock"
    exit 1
fi
trap "rmdir /tmp/mylock" EXIT

a bunch of stuff worth reading:
http://mywiki.wooledge.org/BashFAQ/045
http://mywiki.wooledge.org/ProcessManagement#How_do_I_make_sure_only_one_copy_of_my_script_can_run_at_a_time.3F
actually all of http://mywiki.wooledge.org/ProcessManagement

comment:9 Changed 3 years ago by tthuermer

i updated my rework of the script for #2440 to use a lockfile.
http://dev.piwik.org/trac/attachment/ticket/2440/archive.sh

comment:10 follow-up: Changed 3 years ago by matt (mattab)

I agree that the DB lock would be good to have, but like you say it is nice to do it in the script also ;-)

Thanks for implementing the lock file! Maybe when a lock file is detected, the error message could say "If no other process is running, you can manually delete the lock file at: $URL" to help users know what to do next if for some reason the lock wasn't deleted (server crashed, maybe?).

Otherwise it's great. I think the next step for improving the script would be to multithread the requests to make use of the cores of a system ;)

See also #2440
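A sketch of that suggestion on top of the mkdir lock from comment 8 (the lock path and the exact messages are hypothetical):

```shell
#!/bin/sh
# If the lock is already held, tell the user where it lives so they can
# clean up manually after a crash that left the lock behind.
LOCKDIR="${TMPDIR:-/tmp}/piwik-archive.lock"

if ! mkdir "$LOCKDIR" 2>/dev/null; then
    echo "archive.sh already running, exiting..."
    echo "If no other process is running, you can manually delete the lock at: $LOCKDIR"
    exit 1
fi
# remove the lock on normal exit or termination
trap 'rmdir "$LOCKDIR"' EXIT

echo "lock acquired, archiving..."
# ... archiving steps would run here ...
```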

comment:11 in reply to: ↑ 10 Changed 3 years ago by tthuermer

Replying to matt:

I think the next step for improving the script would be, multithread the requests to make use of the cores of a system ;)

that would be possible to add, with a little restructuring...
but is it really a good idea?
we just added a lock to prevent multiple instances, and now we want to re-introduce them inside the single instance?
has it been verified that this actually improves performance?
i don't know the internals of the "archiving" process, but would assume that it runs a bunch of aggregate queries over the log data and caches the result...? does running those in parallel really improve performance, or will they just slow each other down due to table contention?
i'd suggest you open a new issue for that feature and do some testing first...

comment:12 Changed 3 years ago by matt (mattab)

Running in parallel will increase performance, because each process will query different data, and all the time spent in PHP processing can be parallelized very efficiently. It might not work well for very large websites, but it will make a huge difference in the use case of a Piwik with thousands of small websites to process.

comment:13 Changed 3 years ago by Cyril (cbay)

For the record, using a multithreaded archive.sh (from #2563), performance increases almost linearly. I have an 8-core machine and more than 20,000 sites in my Piwik.

Having a lock on archive.sh doesn't seem to be an issue to me: the multithreading is handled by xargs running inside archive.sh.
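A minimal sketch of that xargs fan-out (the site IDs and the worker command are stand-ins for the real per-site archiving request; -P is the common GNU/BSD xargs flag for parallel workers, not part of POSIX):

```shell
#!/bin/sh
# Run up to 4 archiving workers in parallel over 8 hypothetical site IDs.
# Each argument from seq is passed to a fresh sh -c invocation as $0;
# a plain echo stands in for the real per-site Piwik archiving call.
seq 1 8 | xargs -n 1 -P 4 sh -c 'echo "archiving site $0"'
```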

comment:14 Changed 3 years ago by vipsoft (robocoder)

I don't know if this will help or hinder, but in [5065], we are now using get_lock/release_lock around the archive processing.

comment:15 Changed 3 years ago by matt (mattab)

  • Summary changed from archive.sh should check that the archive script is not already running to ensure that the archive script cannot run multiple times for a same archive

vipsoft, Re: [5065]

I think the lock name should contain the segment as well ($this->segment), since you can archive several segments for the same site/period.

Adding here as a note to check before release.

comment:16 Changed 3 years ago by matt (mattab)

  • Summary changed from ensure that the archive script cannot run multiple times for a same archive to ensure that the archive algorithm cannot be triggered multiple times for a same site/period/segment

comment:17 Changed 3 years ago by vipsoft (robocoder)

re: lock name. Feel free to add segment to the name.

comment:18 Changed 3 years ago by matt (mattab)

  • Owner set to matt
  • Type changed from New feature to Task

comment:19 Changed 3 years ago by matt (mattab)

  • Resolution set to fixed
  • Status changed from new to closed

(In [5102]) Refs #2327

  • adding option forceall+reset, which closely imitates the current archive.sh behavior (while still keeping the added bonus). Fixes #1938: added segment to the lock name. I have tested the code path but haven't actually verified that this improves performance.