Closed Bug 31364 Opened 25 years ago Closed 24 years ago

parallel build dies of race condition xpidl<->mkdir in export

Categories

(SeaMonkey :: Build Config, defect, P3)

Sun
Solaris
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: axel, Assigned: cls)

Details

Attachments

(4 files)

Doing a parallel build with -j12 on a 12proc machine with sources in /tmp: cvsco.log from Mar 10 20:12 MET. Error messages like this (this is not the first one, but they don't differ much; second run after nsprpub):
    make[3]: Entering directory `/tmp/build/rdf/chrome'
    Creating .deps
    make[4]: Entering directory `/tmp/build/rdf/chrome/public'
    Creating .deps
    Creating _xpidlgen
    nsIChromeRegistry.idl
    ../../../dist/bin/xpidl -m header -w -I ../../../dist/idl -I/tmp/mozilla/rdf/chrome/public -o _xpidlgen/nsIChromeRegistry /tmp/mozilla/rdf/chrome/public/nsIChromeRegistry.idl
    nsIChromeEntry.idl
    ../../../dist/bin/xpidl -m header -w -I ../../../dist/idl -I/tmp/mozilla/rdf/chrome/public -o _xpidlgen/nsIChromeEntry /tmp/mozilla/rdf/chrome/public/nsIChromeEntry.idl
    ../../../config/nsinstall -R -m 444 /tmp/mozilla/rdf/chrome/public/nsIChromeRegistry.idl /tmp/mozilla/rdf/chrome/public/nsIChromeEntry.idl ../../../dist/idl
    error opening output file: No such file or directory
    make[4]: *** [_xpidlgen/nsIChromeEntry.h] Error 1
    make[4]: *** Waiting for unfinished jobs....
    make[4]: Leaving directory `/tmp/build/rdf/chrome/public'
    make[3]: *** [export] Error 2
    make[3]: Leaving directory `/tmp/build/rdf/chrome'
Status: UNCONFIRMED → NEW
Ever confirmed: true
Status: NEW → ASSIGNED
Can one of you 3 that sees this problem try building setting XPIDL_GEN_DIR=. ? Our problems with parallel builds with the classic NSPR build stemmed from all of the mkdir calls used to create the .OBJ dirs.
As setting the environment didn't cut it, I changed rules.mk by hand to replace
    XPIDL_GEN_DIR = _xpidlgen
with
    XPIDL_GEN_DIR = .
in line 234. Then the race is fixed, but the build breaks with
    make[1]: Entering directory `/tmp/build/widget/public'
    nsIWidget.idl
    ../../dist/bin/xpidl -m header -w -I ../../dist/idl -I/tmp/mozilla/widget/public -o .//tmp/mozilla/widget/public/nsIWidget /tmp/mozilla/widget/public/nsIWidget.idl
    error opening output file: No such file or directory
    make[1]: *** [/tmp/mozilla/widget/public/nsIWidget.h] Error 1
    make[1]: Leaving directory `/tmp/build/widget/public'
    make: *** [export] Error 2
New bug? cls said he would work on it after getting up again. Axel
How about generating the dirs when generating the makefiles? I made a hack to acoutput-fast.pl that checks whether a makefile has XPIDLSCR and makes the xpidl dirs. I found some errors in allmakefiles when testing this; I am attaching a patch that has the hacks to acoutput-fast.pl and fixes to allmakefiles.sh. How about making the .deps dirs the same way?
Because you still need to be able to make the directories on the fly after someone does a 'make clean'. Axel, which version of gnu make are you using?
I use GNU Make version 3.78.1. I had a different idea: how about a dummy target, added to the dependencies?
    $(XPIDL_GEN_DIR)/%.h: %.idl $(XPIDL_COMPILE) dirs_target
        $(REPORT_BUILD)
    dirs_target:
        @if test ! -d $(XPIDL_GEN_DIR); then echo Creating $(XPIDL_GEN_DIR); rm -rf $(XPIDL_GEN_DIR); mkdir $(XPIDL_GEN_DIR); else true; fi
This way, the headers depend on the existence test for the dir, but not on the dir itself, right?
I've had nothing but problems doing parallel makes with gnu make > 3.77 . I don't know what Smith changed with the jobserver stuff but it doesn't work. Downgrade to 3.76.1 and let me know if the problem persists.
mass re-assign of all bugs where i was listed as the qa contact
QA Contact: cyeh → chofmann
Can anyone duplicate this using gnu make <= 3.77?
Target Milestone: --- → M18
After some digging thru the bug-make mail archive, I ran across a thread that seems to indicate that there is a serious bug with at least make 3.78.1. Look at the '3.78.1 Error with "::" targets and "-j" option' thread. http://www.geocrawler.com/archives/3/351/1999/11/0/ From experience, it doesn't appear to have been fixed with 3.79 but I don't see anything about it one way or the other.
While make 3.79.1 does fix the bug mentioned in bug-make, it does not fix this. I wonder if someone can come up with a small example to submit to the make people.
Also, btw, I tried make 3.77, but it seems to have another bug that makes it fail immediately:
    make[5]: Entering directory `/mnt/proj/mozilla/mozilla/nsprpub/pr/include/obsolete'
    ../../../config/SunOS5.7_sparc_32_PTH_DBG.OBJ/nsinstall -R -m 444 /mnt/proj/mozilla/mozilla/dist/include/obsolete
    usage: ../../../config/SunOS5.7_sparc_32_PTH_DBG.OBJ/nsinstall [-C cwd] [-L linkprefix] [-m mode] [-o owner] [-g group] [-DdltR] file [file ...] directory
    make[5]: *** [export] Error 2
That is a known issue with the $(wildcard) feature & make 3.77 under solaris. You will need to downgrade to 3.76.1. :-/
Part of this looks like a basic test-and-create-is-not-atomic race. In many places across the makefiles we have:
    if test ! -d foo; then rm -rf foo; mkdir foo; else true; fi
If I just use "mkdir -p foo" instead, the xpidlgen problems go away (however, I still get errors building nspr; I'm looking into those). What's the reasoning behind this test?
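To make the race described above concrete, here is a minimal shell sketch. The function names are hypothetical and are not taken from the Mozilla makefiles. The third variant is an assumption about how one might cope with a mkdir that lacks -p; it is not something proposed in this bug.

```shell
#!/bin/sh
# Racy pattern quoted above: between the test and the mkdir, a sibling job
# may create the directory (so this mkdir fails), or this rm -rf may remove
# a directory that a sibling job has just created and is writing into.
ensure_dir_racy() {
    if test ! -d "$1"; then
        rm -rf "$1"
        mkdir "$1"
    else
        true
    fi
}

# Race-tolerant variant: mkdir -p succeeds when the directory already exists.
ensure_dir_p() {
    mkdir -p "$1"
}

# Variant for a mkdir without -p (hypothetical helper): attempt the create
# and treat a failure as success if the directory exists afterwards.
ensure_dir_portable() {
    mkdir "$1" 2>/dev/null || test -d "$1"
}
```

Either of the last two functions can be called from several parallel jobs on the same path without one job's failure or cleanup breaking another.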
My current patch is above. With this applied, the only error I can consistently reproduce is one that also happens sometimes on non-SMP systems (well, Master_D is getting it at least, and he's not SMP). I don't think it's 100% fixed though.
I believe the reason for the test is that the -p option is not supported on mkdir on all platforms. I'm wondering if we shouldn't just start using a mkinstalldirs script like a number of projects do?
Would it be possible to simply make sure the 'export' target gets built with -j1 all of the time? This is where all the problems are, so it would at least be a good workaround until we figure out mkinstalldirs (i'm not familiar with the details of that, unfortunately) or something else.
adding self to cc as our unix daily build systems are multicpu but aren't doing parallel builds.
Hmm, I just tried doing a non-parallel make export and a -j4 make install on sol26 and cut the build time from about 5 hours down to about 4, but I got a lot of
    gmake[2]: warning: -jN forced in submake: disabling jobserver mode.
Other than that, it seemed to complete without problems. If it works on hpux and linux I'll turn it on for the daily builds.
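The serial-export, parallel-install workaround could be scripted roughly as follows. This is only a sketch: the function name, the MAKE override, and the default job count of 4 are assumptions for illustration.

```shell
#!/bin/sh
# Two-phase build sketch: run "export" serially, then "install" in parallel.
# MAKE may be set to gmake or another make command; it defaults to "make".
two_phase_build() {
    njobs=${1:-4}
    # Phase 1: export is where the directory-creation races show up,
    # so it runs with a single job.
    ${MAKE:-make} -j1 export || return 1
    # Phase 2: the rest of the build runs with the requested job count.
    ${MAKE:-make} -j"$njobs" install
}
```

For example, `MAKE=gmake two_phase_build 4` would run `gmake -j1 export` followed by `gmake -j4 install`.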
I think this is because all the submakes use -j4 . Taken literally, this would mean that each submake should start 4 jobs. Since this is obviously not what you want, it ignores that and coordinates the number of jobs with the parent make. The warning is to tell you it's doing that. It might go away if we could tell the submakes not to use -jN, but that's probably not trivial. So I think it can be ignored.
I finally got around to configuring the dual processor linux box for daily verification builds. Once I get the daily builds switched over to the new system (test build going now) I'll be looking at turning this on again for the daily builds...
If I make sure that the generated files have an actual dependency upon a target that makes the XPIDL_GEN_DIR, then the problem goes away for me. Can someone with a hoss test box try this out? Note: they cannot actually depend upon XPIDL_GEN_DIR itself, as the timestamp of XPIDL_GEN_DIR changes when its contents change.
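The note about the directory timestamp can be checked with a small shell experiment. The names gen, .stamp, a.h and b.h are hypothetical, and the stamp file is only one assumed way to provide a target that makes the directory; it is not necessarily what the attached patch does.

```shell
#!/bin/sh
# is_newer A B: succeeds if A's modification time is later than B's.
is_newer() {
    test -n "$(find "$1" -prune -newer "$2")"
}

# demo_dir_mtime DIR: builds a scratch layout under DIR and reports what a
# timestamp comparison would conclude for each choice of prerequisite.
demo_dir_mtime() {
    mkdir "$1/gen"
    : > "$1/gen/.stamp"     # stamp written once, right after the mkdir
    sleep 1                 # step past coarse (1 s) timestamp granularity
    : > "$1/gen/a.h"        # first generated header
    sleep 1
    : > "$1/gen/b.h"        # adding a second file updates gen's own mtime
    if is_newer "$1/gen" "$1/gen/a.h"; then
        echo "directory newer than a.h: a.h would be regenerated"
    fi
    if ! is_newer "$1/gen/.stamp" "$1/gen/a.h"; then
        echo "stamp not newer than a.h: a.h stays up to date"
    fi
}
```

Running `demo_dir_mtime "$(mktemp -d)"` shows that a prerequisite on the directory itself goes stale as soon as another file is generated into it, while a file created once alongside the directory does not.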
Hi, I tested a (modified version of) cls' patch. The file in xpidlgen_ does the trick. I gave some facelifting to the patch by cls. First, there were some security patches in there; I removed those. The XPIDL_GEN_DIR is not part of the MAKE_DIRS variables anymore, as we have the right dependency in there. No need to have it twice. I rephrased the generating line a bit. Nothing much happened there. I tested this on our machine, with a make -j6 export. The load is not particularly low at the moment, but 4 procs were free. I figure I should have got trapped if this didn't work. clobber worked out all right, too. r=me Axel
Patch has been checked in. Marking fixed.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Tested the patch on my SMP system here, works fine. Marking verified.
Status: RESOLVED → VERIFIED
Product: Browser → Seamonkey