
cron/archive.sh: php error messages (on stdout) are silently discarded #2440

Closed
anonymous-matomo-user opened this issue May 20, 2011 · 44 comments
Labels: Enhancement, Major, wontfix

Comments

@anonymous-matomo-user

i already tried to point this out in #2239, but not hard enough i guess....

the documentation suggests:

 #MAILTO="youremail@"
 #5 * * * * www-data /path/to/piwik/misc/cron/archive.sh > /dev/null
 # When an error occurs (eg. php memory error, timeout) the error messages
 # will be sent to youremail@.

php notoriously prints its error messages to stdout,
that is really a php bug imho, but it should be worked around to avoid problems for users.

see the following, which i just ran into (and i think this happens to a lot of users):

 $ /srv/piwik/misc/cron/archive.sh >/dev/null
 (no output here)
 $ /srv/piwik/misc/cron/archive.sh | tail -n 1
 Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 4703 bytes) in /srv/piwik_1.4/core/DataTable.php on line 952

php prints the "fatal error" to stdout, which is discarded, so contrary to the documentation, the user gets no error message, and stuff simply does not work correctly.

workarounds could be...

  • remove all output from the php and shell script (what's it needed for besides debugging anyway?) and don't discard stdout
  • check the return value of the php invocation in archive.sh (it will be != 0 on errors), and then either provide a generic "php failed" message, or pull the last lines of output from a temporary file where it'd have to be put

to provide a simple "patch" again, in archive.sh, for any invocation of php:

$CMD || echo "running '$CMD' failed, please run archive.sh manually and check what went wrong" 2>&1

(and again, one of those notes: the way the commands are built in $CMD will break if the installation path contains spaces, and doing it that way doesn't seem useful anyway...)
Keywords: cron archive.sh

@anonymous-matomo-user
Author

to provide a simple "patch" again, in archive.sh, for any invocation of php:

$CMD || echo "running '$CMD' failed, please run archive.sh manually and check what went wrong" 2>&1

silly bug, it should of course be:

$CMD || echo "running '$CMD' failed, please run archive.sh manually and check what went wrong" >&2

@anonymous-matomo-user
Author

curiously, the shell script uses the -e option, so it will also exit on the first error - a somewhat advanced technique...
while it does not have any useful error handling/messages otherwise...

i could offer to completely rewrite it, but it would look rather different afterwards...

@mattab
Member

mattab commented May 20, 2011

Maybe the best would be to run archiving and not exit on errors, to make sure we process as many websites as possible, then at the end of the execution, report all errors to the error output?

Would you be interested in submitting a patch to fix this issue? this would be great :)

@anonymous-matomo-user
Author

the most important issue is that no errors are reported at all; this likely affects a lot of users who are wondering why setting up the cronjob does not speed up their reports...
(also, scheduled mail reports will not be sent, because that job is run last!)

also, it is not necessary to collect error messages and only print them at the end, cron handles that automatically.

actually the fix i suggested will 'fix' the exiting on the first error, because the -e shell option only exits on unhandled errors (see shell docs for details ;) )
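
for illustration, a minimal sketch of that behaviour (hypothetical commands):

```
#!/bin/sh
set -e

# a failing command with an || handler counts as "handled":
# -e does not abort the script, the handler runs instead
false || echo "handled: command failed" >&2

echo "still running"   # reached

# an unhandled failure aborts the script immediately under -e:
false
echo "never reached"
```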

@mattab
Member

mattab commented May 21, 2011

can you please provide a patch of suggested changes?

Otherwise see http://bugs.php.net/22839 - it is possible to tell php to display errors on standard error output, but it requires a php configuration change, which is not ideal.
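
For reference, php can also be told per invocation to send errors to stderr, without a global config change (paths illustrative; assumes a PHP version whose display_errors setting accepts the "stderr" value):

```
# -d overrides an ini setting for this invocation only
php -d display_errors=stderr -q /path/to/piwik/index.php -- \
  'module=API&method=VisitsSummary.getVisits&idSite=1&period=day&date=today&format=xml&token_auth=xyz'
```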

However, I'm pretty sure I received error messages by email when the execution somehow failed during cron, but I haven't received one in a while so I'd need to double check.

@robocoder
Contributor

Let's not mess with the script.

Suppressing output should be done in the crontab so output can be easily enabled when archiving doesn't appear to be working. Most of the time, I prefer not to get cron email, so I use `> /dev/null 2>/dev/null`.
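
For example, in the crontab (schedule and paths illustrative):

```
# discard everything - no cron mail at all:
5 * * * * www-data /path/to/piwik/misc/cron/archive.sh > /dev/null 2> /dev/null

# discard only stdout, so stderr still triggers a cron mail on errors:
5 * * * * www-data /path/to/piwik/misc/cron/archive.sh > /dev/null
```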

@anonymous-matomo-user
Author

Replying to vipsoft:

be easily enabled when archiving doesn't appear to be working.
Most of the time, I prefer not to get cron email

the whole point is that cronjobs normally should not produce output, so you don't get mails,
but if an error occurs, they should, so you get a notification about it.
(that is also what the comment in the script explains)

why would you want to wait until you eventually notice that something is broken?

@anonymous-matomo-user
Author

Attachment: trivial patch: check return value of php invocations and print a message to stderr if php returns failure
archive.sh.diff

@anonymous-matomo-user
Author

Replying to matt:

can you please provide a patch of suggested changes?

attached a diff for the (trivial) fix now.

this will print a message to stderr if a php invocation returns a status != 0 (which it does on fatal errors).
the handling also makes the script not exit on the first error but continue processing the other sites and scheduled mailings.
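
roughly, the approach is the following (a sketch of the idea, not the literal diff; the run_php name and message wording are illustrative):

```
# run one php invocation; if it fails, complain on stderr and keep going.
# the || handler also means the -e option will not abort the whole script here.
run_php() {
  $PHP_BIN -q "$PIWIK_PATH" -- "$1" ||
    echo "running '$PHP_BIN -q $PIWIK_PATH -- $1' failed," \
         "please run archive.sh manually and check what went wrong" >&2
}
```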

@mattab
Member

mattab commented Jun 1, 2011

tthuermer, thanks for the patch!

Some feedback:

  • is it possible to refactor the error message into a variable so it is shown only once in the script?
  • is it possible that the error message displays, as well as the CMD, the "head -n10" or so of the output potentially returned?

Thanks!!

@anonymous-matomo-user
Author

Replying to matt:

  • is it possible to refactor the error message into a variable so it is shown only once in the script?
    that's sure possible, but
    • i don't think it's worth the effort to include that in this version of the script - those errors should rarely happen anyway; just that if they happen, some message should be generated.
    • the php invocation (included in the message) is different every time, so it's not completely redundant... - so the only change would be collecting the failed invocations, and then printing the help message only once. (btw, that message should be made more detailed, but i wouldn't know how... maybe a reference to the manual)
  • would you be able to fix the bug mentioned in Detect crashed mysql tables & display a message in the UI #2194 at the same time? In particular, throw the same error if the output contains <error message="
    if that php code is only called from cron, it would be more appropriate to have it print to stderr directly... otherwise it would require pretty much the same changes as the next point.
  • is it possible that the error message displays, as well as the CMD, the "head -n10" or so of the output potentially returned?
    displaying the script output conditionally would require collecting it in a temporary file or variable, which is a rather large change in the script. also i'd prefer the variable solution (to avoid cluttering the filesystem), but that'd be a little unconventional. (a rough sketch of that follows below.)
    as mentioned before, at that point i'd completely rewrite the script, but i estimate the chances of that being accepted as a patch are rather low, so i'd rather not waste my time.
    (is there really no "owner" of that code in the piwik project?)
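
for reference, the variable-based capture could look roughly like this (a sketch only; run_php and the message wording are illustrative):

```
run_php() {
  # capture stdout+stderr in a variable; "|| status=$?" keeps the
  # failure handled, so the -e option does not abort the script here
  status=0
  output="$($PHP_BIN -q "$PIWIK_PATH" -- "$1" 2>&1)" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "running php with args '$1' failed; first lines of its output:" >&2
    printf '%s\n' "$output" | head -n 10 >&2
  fi
}
```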

@mattab
Member

mattab commented Jun 2, 2011

that's sure possible, but

My point is just to refactor the actual error string into a variable, nothing more complicated... because copy/paste is not allowed in the piwik codebase ;)

if that php code is only called from cron, it would be more appropriate to have it print to stderr directly... otherwise it would require pretty much the same changes as the next point.

This php code is not only called from the cron, it is also used as the Piwik analytics API.

However, you make a good point: maybe when Piwik is run as CLI, errors could be written to stderr as well as standard output?

But we'd rather not change this and only update archive.sh. A good way to test this would be to fake the token.

The following patch:

     for period in day week month year; do
       echo ""
       echo "Archiving period = $period for idsite = $idsite..."
-      CMD="$PHP_BIN -q $PIWIK_PATH -- module=API&method=VisitsSummary.getVisits&idSite=$idsite&period=$period&date=last52&format=xml&token_auth=$TOKEN_AUTH"
+      CMD="$PHP_BIN -q $PIWIK_PATH -- module=API&method=VisitsSummary.getVisits&idSite=$idsite&period=$period&date=last52&format=xml&token_auth=fake$TOKEN_AUTH"

will cause the script to return:

Archiving period = week for idsite = 7...
<?xml version="1.0" encoding="utf-8" ?>
<result>
        <error message="You can't access this resource as it requires an 'view' access for the website id = 7." />
</result>

it would be great if the script would catch these, as well as other errors you catch with the " || echo " patch :)

i'd completely rewrite the script

Does showing the first lines of the output really require a lot of code or a rewrite?

If that is the case, then it's OK not to show the head of the error message, no problem.

@anonymous-matomo-user
Author

Replying to matt:

that's sure possible, but

My point is just to refactor the actual error string into a variable, nothing more complicated... because copy/paste is not allowed in the piwik codebase ;)
ah, coding standards...

if that php code is only called from cron, it would be more appropriate to have it print to stderr directly... otherwise it would require pretty much the same changes as the next point.

This php code is not only called from the cron, it is also used as the Piwik analytics API.

However, you make a good point: maybe when Piwik is run as CLI, errors could be written to stderr as well as standard output?

But we'd rather not change this and only update archive.sh. A good way to test this would be to fake the token.
[...]

Archiving period = week for idsite = 7...
<?xml version="1.0" encoding="utf-8" ?>
<result>
        <error message="You can't access this resource as it requires an 'view' access for the website id = 7." />
</result>

you have yet to realize how messy this is...

for example:

$ /usr/bin/php5 -q /srv/www/VSEO/piwik_1.4/misc/cron/../../index.php -- 'module=API&method=SitesManager.getAllSitesId&token_auth=xx&format=csv&convertToUnicode=0'
Error: You can't access this resource as it requires a 'superuser' access.

so we'd have to check for both "<error" and "Error:"...

it's a horrible mess to try and catch all errors like this.

at the very least the php scripts should "die(1);" or similar, so the caller can check if there was an error without parsing the output for error messages...
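
for illustration, catching both variants could look like this (a sketch; run_piwik and the exact patterns are assumptions based on the two outputs above):

```
run_piwik() {
  status=0
  output="$($PHP_BIN -q "$PIWIK_PATH" -- "$1" 2>&1)" || status=$?
  # php may exit 0 and still print an error, so scan for both formats
  if [ "$status" -ne 0 ] ||
     printf '%s\n' "$output" | grep -q -e '<error message=' -e '^Error:'; then
    echo "piwik invocation with args '$1' failed:" >&2
    printf '%s\n' "$output" | head -n 10 >&2
    return 0    # report, but keep processing the remaining jobs
  fi
  printf '%s\n' "$output"
}
```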

i'd completely rewrite the script

Does showing the first lines of the output really require a lot of code or a rewrite?

If that is the case, then it's OK not to show the head of the error message, no problem.

as i wrote before, parsing the output for error messages requires the same changes...

(the other part of the "rewrite" is that, while i'm writing that much code anyway, i'd fix a bunch of other stuff too...)

@anonymous-matomo-user
Author

in the attached version i added the hack to detect error messages in piwik's output...
also did some cleanup, and removed some checks (like the IS_NUMERIC cruft) that were apparently only needed to avoid parsing error messages (or csv column titles...) as data...
looks much better now, imho...
(but could use a little testing)
if you don't like it, feel free to write your own fix...

@mattab
Member

mattab commented Jun 3, 2011

tthuermer, thanks a lot for this patch, this is GREAT stuff!!!

I'm curious, why does it work without the IS_NUMERIC test? I remember putting it in because otherwise the loop would iterate over invalid idsites (like you say, column headers or something). But in your code it works fine, so I'm not sure why?

I tested it quickly, by setting up as cron, and then adding a few characters to the &token_auth=aggea

I received the email which is now very useful and explicit so really a useful improvement!

Your patch is very appreciated, thanks, I'll commit after asking Anthon for feedback!

@anonymous-matomo-user
Author

Replying to matt:

tthuermer, thanks a lot for this patch, this is GREAT stuff!!!
so not a waste of time after all, nice

I'm curious, why does it work without the IS_NUMERIC test? I remember putting it in because otherwise the loop would iterate over invalid idsites (like you say, column headers or something). But in your code it works fine, so I'm not sure why?
the column header is removed here:

86    run_piwik "module=API&method=CoreAdminHome.getKnownSegmentsToArchive&token_auth=$TOKEN_AUTH&format=csv&convertToUnicode=0" |
87    sed 1d # remove column-header line

which was previously filtered out by

 80             if test $segment != "value"; then

for the IS_NUMERIC test, i can only guess that it was needed to avoid parsing the stdout error messages as IDs, and those should be caught inside run_piwik() now...
(or do we need another safety check in case the php scripts return invalid data? that would seem a little excessive... and i'd hope the php scripts catch invalid input in some way...)
the loops in the script will never turn into endless loops... just previously, when an error occurred in an earlier php invocation, it would loop over the words in the error message.

I tested it quickly, by setting up as cron, and then adding a few characters to the &token_auth=aggea

I received the email which is now very useful and explicit so really a useful improvement!

Your patch is very appreciated, thanks, I'll commit after asking Anthon for feedback!
enjoy

@anonymous-matomo-user
Author

also, SitesManager.getAllSitesId seems to not return column headers, even though the same format=csv parameter is set...?
also, i could not test the case where CoreAdminHome.getKnownSegmentsToArchive returns a non-empty list.

@anonymous-matomo-user
Author

warning, stupid bug:
line 100: run_piwik "${CMD}&segment=$segment"
forgot to rename the variable there, it must be:
line 100: run_piwik "${ARGS}&segment=$segment"

(see my note that this part was untested...)

@mattab
Member

mattab commented Jun 6, 2011

tthuermer thanks for the notice, I'll test this part of the script before committing

Also, since you seem to be a shell expert, maybe you would be interested in this ticket: #1938

This is to ensure that the script isn't running twice which would cause some troubles (like server crashing if requests pile up)

@anonymous-matomo-user
Author

Replying to matt:

Also, since you seem to be a shell expert, maybe you would be interested in this ticket: #1938

see my comment there.

i've been meaning to suggest implementing the job processing in php anyway, since the discussion about the loops...

@anonymous-matomo-user
Author

Attachment: fixed stupid bug, added lockfile to protect against multiple instances
archive.sh

@anonymous-matomo-user
Author

i put the md5sum of the config-path into the lock name, mainly thinking of possible multiple installations on the same host... also makes the name a little less predictable... it's not technically necessary, but it won't hurt either (if md5sum is not available, the part will just stay blank)
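
for illustration, the locking part looks roughly like this (a sketch; variable names illustrative):

```
# derive a lock name from the config path so multiple installations on
# one host don't share a lock; the hash part stays blank without md5sum
HASH="$(printf '%s' "$PIWIK_CONFIG" | md5sum 2>/dev/null | cut -d' ' -f1)"
LOCKDIR="/tmp/piwik-archive-${HASH}.lock"

# mkdir is atomic, so it works as a portable lock primitive
if ! mkdir "$LOCKDIR" 2>/dev/null; then
  echo "archive.sh already running (lock: $LOCKDIR), exiting" >&2
  exit 1
fi
trap 'rmdir "$LOCKDIR"' EXIT
```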

@anonymous-matomo-user
Author

Attachment: details...
archive.2.sh

@mattab
Member

mattab commented Jun 20, 2011

Thanks for the patch tthuermer! definitely will try and include this in 1.6 :) please let us know here if you have newer revisions. Cheers

see also #1938

@mattab
Member

mattab commented Jul 12, 2011

tthuermer, there is a very interesting submission in #2563 that is using xargs to run the archiving on multiple cores. However, Cyril says that it might not be compatible with the patch for this ticket.

Maybe you have some ideas on how to use the multithreaded option while keeping the clean error handling that you implemented in this patch?

Cyril's patch can be found at: http://issues.piwik.org/attachments/2563/archive.multithreaded.sh.diff

your feedback/review is greatly appreciated :)

@anonymous-matomo-user
Author

xargs -P is the preferred method to parallelize job execution in shell scripts; i would use that as well.

Cyril's patch in #2563 mainly collects the php invocations (that the old script builds in variables before executing them) in a temporary file (not to my taste ;) ), and then passes that to xargs to process the jobs in parallel - the obvious choice to add that feature to the old script.

to add parallelization to my version, there are the following issues (none are a big problem really, but they should be considered):

  • the output-filtering is some shell code that is wrapped around the php invocations, currently in the form of a shell function. a shell function cannot be passed to xargs, which can only run programs. possible fixes are either to put that code into a separate script (creates clutter), or to have the existing script invoke itself again, with some parameter (more elegant, but hard on any maintainers not identical with the author ;) )
  • when jobs are run in parallel, their output gets randomly interspersed... that should not be a big issue, because the filtering code already collects the messages during execution and only prints a short summary all at once at the end, so the chance of collisions is relatively low, but the issue cannot be totally avoided without adding lots of overhead. (collecting the output of each job in a separate tempfile and joining them at the end)
    (one might actually run the php jobs directly, and then do the output processing on the tempfiles at the end...? but that still requires some wrapper to do the per-job file redirection)
  • i would personally not like having it default to parallel execution with one thread per cpu.
  • my largest installation has only three small sites, and the segment archiving doesn't run for any of them
  • question: does it make sense to run the "period" and "segment" jobs in parallel (archive.sh could execute in multithreaded mode for better performance #2563) or would it make more sense to run those in two separate batches?

other than that, i'll be submitting an updated version of my script, with parallelization added, soon.

@anonymous-matomo-user
Author

Replying to tthuermer:

Cyril's patch in #2563 mainly collects the php invocations (that the old script builds in variables before executing them) in a temporary file (not to my taste ;) ), and then passes that to xargs to process the jobs in parallel - the obvious choice to add that feature to the old script.

actually, as he wrote, he processes the invocations for each site in a separate thread... i can't tell if/why that is required, i see no obvious reasons, but of course i could implement that as well...

i would probably just add an option to the script to process only a given site, and then run it on each site using xargs -P instead of the 'for idsite in $ID_SITES; do' loop...
(having the script default to getting the site-list and then running itself on each site.)
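
something like this (a sketch; the --site option, get_site_ids and archive_one_site are hypothetical names, and xargs -P is a GNU/BSD extension):

```
# child mode: process exactly one site, then exit
if [ "$1" = "--site" ]; then
  archive_one_site "$2"
  exit 0
fi

# parent mode: fetch the id list and fan out over N parallel children,
# each running this same script with a single site id
get_site_ids |
  xargs -P "${MAX_JOBS:-1}" -I{} "$0" --site {}
```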

@mattab
Member

mattab commented Jul 14, 2011

invoke itself again, with some parameter
That sounds tricky but definitely the right thing to do (vs. using 2 scripts). It will be easy enough to understand if variables etc. are carefully named :)

when jobs are run in parallel, their output gets randomly interspersed...
If there is no "easy fix", I think it is perfectly fine for V1 to have, in rare occurrences, strange looking output. If it becomes a big problem we can fix it later.

i would personally not like having it default it to parallel execution with one thread per cpu.
Do you mean that, by default, you prefer having the script run on one core only (current trunk behavior)?

question: does it make sense to run the "period" and "segment" jobs in parallel (#2563) or would it make more sense to run those in two separate batches?

  • Websites: the goal is to run several websites in parallel: website data is stored in different rows in mysql, so parallelizing across websites will be a great performance improvement
  • Period: must be run in the order day/week/month/year; this is very important because weeks use day archives, months use days, and years use months
  • Segments: various segments can be parallelized, but since they will hit the same rows in mysql, I suspect the performance improvement will not be as great. However, it is OK to run several segments in parallel for a given period.

But how could you manage to run several Websites AND several segments for each website in parallel? Is it possible at all?

my largest installation has only three small sites, and the segment archiving doesn't run for any
is this a problem for testing? I can provide you with a test case for segments, and also a script to generate thousands of websites.

other than that, i'll be submitting a version of my version, with parallelization added soon.

that is GREAT to hear, your help is very appreciated! it will in fact benefit many Piwik power users. More and more hosting companies are using piwik for many sites and will greatly enjoy all these nice improvements to archive.sh!!

@mattab
Member

mattab commented Jul 14, 2011

Replying to tthuermer:

he processes the invocations for each site in a separate thread... i can't tell if/why that is required, i see no obvious reasons, but ofcourse i could implement that aswell...

Yes, this is required; in fact it's the main objective of his changes: to process different websites in parallel

i would probably just add an option to the script to process only a given site, and then run it on each site using xargs -P instead of the 'for idsite in $ID_SITES; do' loop...
(having the script default to getting the site-list and then running itself on each site.)

that sounds good I think :)

@anonymous-matomo-user
Author

Replying to matt:

i would personally not like having it default it to parallel execution with one thread per cpu.
Do you mean that, by default, you prefer having the script run on one core only (current trunk behavior)?
basically... or at most with n_cpus/2 or something... i would not make the default behaviour that aggressive... ymmv of course.

  • Websites: the goal is to run several websites in parallel: website data is stored in different rows in mysql, so parallelizing across websites will be a great performance improvement
  • Period: must be run in the order day/week/month/year; this is very important because weeks use day archives, months use days, and years use months
  • Segments: various segments can be parallelized, but since they will hit the same rows in mysql, I suspect the performance improvement will not be as great. However, it is OK to run several segments in parallel for a given period.

But how could you manage to run several Websites AND several segments for each website in parallel? Is it possible at all?

does the processing of segments have to be done sequentially after the periods? if not, one could create jobs that do the following:

  • for each website, archive the periods in order
  • archive each segment for each website
    and run those...

if parallelizing inside a site is not a priority anyway, i'd just stick with the --site option from my second comment

is this a problem for testing? I can provide you with test case for segments, and also a script to generate thousands of websites.
i'd rather... i do the coding and you do the testing... ;)

@anonymous-matomo-user
Author

ah, ok, the segments exist for each period...
so one would have to process the period first, and then its segments in parallel... but nested parallelization (that, together with sites in parallel) is not as trivial to do

@mattab
Member

mattab commented Jul 14, 2011

I thought so, so we can maybe keep it simple for V1 and simply parallelize websites?

@cbay
Contributor

cbay commented Jul 14, 2011

My 2 cents:

  • using a temporary file is ugly, I agree. Unfortunately, using a simple variable doesn't scale at all: appending characters to a variable containing several MB of data takes way too much time. I have more than 20,000 sites to process, which would probably take hours just to create the variable. Hence the temporary file. I'd love to get rid of it though, if anyone has suggestions :)
  • defaulting to the number of cores is probably a bad idea. I think defaulting to a single core would be better.
  • output will indeed be interspersed, I really don't see how it could be avoided. By definition, tasks will execute in parallel, so the execution order cannot be respected. As a user, I wouldn't want to have the output buffered just to have it printed in order, as it would hide the tasks that are already finished.
  • I think we'd better avoid doing too fancy things (like parallelizing segments) if it makes the script much more complex. Complex shell scripts are a PITA :)

@anonymous-matomo-user
Author

Replying to Cyril:

My 2 cents:

  • using a temporary file is ugly, I agree. Unfortunately, using a simple variable doesn't scale at all: appending characters to a variable containing several MB of data takes way too much time. I have more than 20,000 sites to process, which would probably take hours just to create the variable. Hence the temporary file. I'd love to get rid of it though, if anyone has suggestions :)

i would use a variable...
but also i would only work with the site-IDs there, the rest can be derived inside the thread.
(and 20k integers should still be manageable)
but really there's no need to store that stuff at all! it can just be piped straight into xargs.

  • defaulting to the number of cores is probably a bad idea. I think defaulting to a single core would be better.

what i said
(also, the current code to get the core count only works on linux anyway, and the other *nix people will hate us for that)

  • output will indeed be interspersed, I really don't see how it could be avoided. By definition, tasks will execute in parallel, so the execution order cannot be respected. As a user, I wouldn't want to have the output buffered just to have it printed in order, as it would hide the tasks that are already finished.

as i wrote, the output processing i wrote for this bug #2440 already collects the output in a variable to analyze/summarize it at the end, so that's not such a big issue.
(for the case where stdout is suppressed... when running the script with stdout visible to troubleshoot problems, one should disable the parallelization anyway)

  • I think we'd better avoid doing too fancy things (like parallelizing segments) if it makes the script much more complex. Complex shell scripts are a PITA :)

i would not do that in shell if there's no simple solution...

@anonymous-matomo-user
Author

... but if anybody's going to take on the task of just properly fixing archive.sh by porting it to the php side, i can supply code to do the parallelization in php... (can be neatly done using php:posix (it just will likely not work on win32... but who uses that for large installations anyway ;) ))

@cbay
Contributor

cbay commented Jul 14, 2011

Replying to tthuermer:

i would use a variable...
but also i would only work with the site-IDs there, the rest can be derived inside the thread.
(and 20k integers should still be manageable)
but really there's no need to store that stuff at all! it can just be piped straight into xargs.

I fail to see how you could only store the site IDs and still use xargs (don't forget the segments), unless you rewrite the script to have xargs call itself.

(for the case where stdout is suppressed... when running the script with stdout visible to troubleshoot problems, one should disable the parallelization anyway)

I strongly disagree. Archiving 20,000 sites takes hours (using 8 cores, days using only one), and I really want to know where it's at during the process.

@anonymous-matomo-user
Author

Replying to Cyril:

Replying to tthuermer:

i would use a variable...
but also i would only work with the site-IDs there, the rest can be derived inside the thread.
(and 20k integers should still be manageable)
but really there's no need to store that stuff at all! it can just be piped straight into xargs.
I fail to see how you could only store the site IDs and still use xargs (don't forget the segments), unless you rewrite the script to have xargs call itself.

as i wrote, the script could run itself via xargs, once for each site, passing only the site-id, and then only process that site in the "for idsite in $ID_SITES; do" loop.
additional overhead (e.g. getting the segments list multiple times) should be negligible...(?)

(for the case where stdout is suppressed... when running the script with stdout visible to troubleshoot problems, one should disable the parallelization anyway)
I strongly disagree. Archiving 20,000 sites takes hours (using 8 cores, days using only one), and I really want to know where it's at during the process.

collecting messages is done per php invocation, not over the whole script run. my point is that messages of a single invocation would appear in a continuous block if an error occurs.

note:

(for the case where stdout is suppressed

for most people archive.sh will run from cron, where we want to suppress output and only provide error messages on errors, which is what this ticket is about.

of course you can still run the script and watch its output...
do you need the live stdout display of each php invocation as well? (in addition to the messages echo'd by the script itself?) i could add an option to preserve that, but it's most likely just a lot of clutter.

@cbay
Contributor

cbay commented Jul 14, 2011

As long as I can have a message (on stdout) that says when archiving is done for a site ID, that's fine. It can be an option, as I understand that the point of this ticket is to suppress printing on stdout except on errors. I don't need a message for each PHP invocation either.

@mattab
Member

mattab commented Jul 15, 2011

what i said (also, the current code to get the core count only works on linux anyway, and the other *nix people will hate us for that)

I suppose there is no easy way to get the number of cores that would work on non-linux servers? it would be nice to have since Piwik has all types of users and I know many *nix users..
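
(For reference, a fallback chain along these lines covers Linux, GNU and BSD-ish systems - a sketch, defaulting to 1 if nothing works:)

```
NCPU="$(getconf _NPROCESSORS_ONLN 2>/dev/null ||
        nproc 2>/dev/null ||
        sysctl -n hw.ncpu 2>/dev/null ||
        echo 1)"
```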

Great discussion & conclusions here

tthuermer, please let me know when you have the script & done some basic testing. Then, Cyril can test with thousands of sites and I can test the segment thing :) thanks!!

@mattab
Member

mattab commented Aug 3, 2011

tthuermer, is there an update on making archive.sh compatible with multi-threaded mode? it would be great to include this in trunk so we can do more work on archive.sh optimization (I hope to work on #2327 next!)

Many thanks for your help on this one

@mattab
Member

mattab commented Aug 10, 2011

Note: work is being done in #2327 -> a rewrite of archive.sh as archive.php, allowing a lot more flexibility and performance

tthuermer, therefore we will patch archive.sh with your existing patch, without multithreading support, OK?

Can you please confirm that this patch: http://issues.piwik.org/attachments/2440/archive.2.sh is the latest patch that I should test before committing to SVN?

thanks!

@anonymous-matomo-user
Author

sorry i didn't get to work on this yet...

Replying to matt:

Note: work is being done in #2327 -> a rewrite of archive.sh as archive.php, allowing a lot more flexibility and performance

great that you picked up that suggestion!

tthuermer, therefore we will patch archive.sh with your existing patch, without multithreading support, OK?

Can you please confirm that this patch: http://issues.piwik.org/attachments/2440/archive.2.sh is the latest patch that I should test before committing to SVN?

yes, no changes since then...

@mattab
Member

mattab commented Sep 13, 2011

Thanks for all your work and ideas, it really helped the work on #2327

I think considering the huge work done on the new archive.php, we should completely deprecate archive.sh and .ps1, change all the docs to use archive.php, and only maintain that one.

@mattab
Member

mattab commented May 8, 2014

See also: #5111 Improve error logging of core:archive cron script

This issue was closed.