Closed Bug 29908 Opened 25 years ago Closed 25 years ago

Crash on quit after using mail; horks machine

Categories

(Core :: Networking, defect, P3)

PowerPC
Mac System 9.x
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: mikepinkerton, Assigned: gordon)

Details

(Keywords: crash, Whiteboard: [PDT+] mustfix (fix checked in))

Attachments

(3 files)

- launch apprunner
- click "mail" icon in browser window
- type your imap mail password
- click a mail message, any message
- quit the app w/ cmd-q

CRASH!

PowerPC access exception at 1566399C PR_Lock+00154
(CurStackBase does not seem to apply...dumping 4K.)
Calling chain using A6/R1 links
 Back chain  ISA  Caller
  00000000   PPC  00B4DDC0  CallOnSwappedStack+00078
  00C67898   PPC  00C32730  OTScheduleDriverDeferredTask+0016C
  00C67838   PPC  00C2D1C4  qrun+00194
  00C677D8   PPC  00C04630  TNativeProvider::Notify(unsigned long, long, void*)+00138
  00C67788   PPC  00C07B3C  OTOpenEndpointOnStreamPriv+02750
  00C676F8   PPC  00C029E8  TNativeProvider::NotifyClient(void*, unsigned long, long, void*)+000D4
  00C67688   PPC  00BFD808  TOTProcNotifier::~TOTProcNotifier()+000FC
  00C67638   PPC  1566C924  NotifierRoutine+00438
  00C675B8   PPC  1566C4A4  WakeUpNotifiedThread+00068
  00C67578   PPC  1566A8BC  DoneWaitingOnThisThread+00030
Closing log

Doing an 'es' in macsbug leaves the mac in a limping state that next time any app tries to use the network the machine locks hard.
keywords.
Keywords: beta1, crash
->gordon
Assignee: gagan → gordon
PDT+
Whiteboard: [PDT+] mustfix
I'm unable to reproduce this particular crash, and it seems to work for Mike now as well. We'll continue to poke it a bit to see if we can get it to misbehave again. If that doesn't work, then we'll close it.
Status: NEW → ASSIGNED
man, this worries me that this bug just seemed to go away....It was so severe, and then poof...no problems. *worried look on his face*
I just saw this today.
Adjust summary. Here's an ip for the crash:

PowerPC access exception at 1D9F5D2C PR_Lock+00150
Disassembling PowerPC code from 1D9F5D04
 PR_Lock
   +00128 1D9F5D04   lwz    r3,0x0810(RTOC)            | 80620810
   +0012C 1D9F5D08   mr     r4,r25                     | 7F24CB78
   +00130 1D9F5D0C   li     r5,0x0106                  | 38A00106
   +00134 1D9F5D10   bl     PR_Assert ; 0x1D9E6458     | 4BFF0749
   +00138 1D9F5D14   nop                               | 60000000
   +0013C 1D9F5D18   lwz    r30,0x000C(r29)            | 83DD000C
   +00140 1D9F5D1C   addi   r0,r29,0x000C              | 381D000C
   +00144 1D9F5D20   cmplw  r30,r0                     | 7C1E0040
   +00148 1D9F5D24   beq    PR_Lock+00160 ; 0x1D9F5D3C | 41820018
   +0014C 1D9F5D28   lwz    r4,-0x0050(r30)            | 809EFFB0
   +00150 1D9F5D2C  *lwz    r3,0x0010(r29)             | 807D0010
   +00154 1D9F5D30   lwz    r0,-0x0050(r3)             | 8003FFB0
   +00158 1D9F5D34   cmpw   r4,r0                      | 7C040000
   +0015C 1D9F5D38   bne    PR_Lock+00184 ; 0x1D9F5D60 | 40820028
   +00160 1D9F5D3C   addi   r30,r29,0x000C             | 3BDD000C
   +00164 1D9F5D40   b      PR_Lock+00190 ; 0x1D9F5D6C | 4800002C
   +00168 1D9F5D44   lwz    r3,0x000C(r29)             | 807D000C
   +0016C 1D9F5D48   subi   r26,r3,0x0054              | 3B43FFAC
   +00170 1D9F5D4C   lwz    r3,0x0004(r31)             | 807F0004
   +00174 1D9F5D50   lwz    r0,0x0004(r26)             | 801A0004
Closing log
Summary: Opening Mail message takes out networking on Mac → Crash on quit after using mail; horks machine
If you can reproduce this, please dump the registers as well. Simon, did your stack crawl look the same as Mike's?

We're dying somewhere in the second half of this condition:

    if (q == &lock->waitQ || _PR_THREAD_CONDQ_PTR(q)->priority ==
        _PR_THREAD_CONDQ_PTR(lock->waitQ.prev)->priority) {

Here's the line in the file for more context:
http://lxr.mozilla.org/seamonkey/source/nsprpub/pr/src/threads/combined/prulock.c#281
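To make the two halves of that condition concrete, here is a minimal sketch in C. It assumes, as the disassembly above suggests, that q is the first entry of the lock's wait queue. The types Link, Thread and Lock, the macro THREAD_FROM_LINK and the function wait_queue_check are hypothetical stand-ins written for this illustration; they are not the NSPR definitions. The first half only compares addresses. The second half follows queue links back into thread structures, so a link that points at a thread that no longer exists would make those reads touch invalid memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only, not the NSPR types. */
typedef struct Link {
    struct Link *next;
    struct Link *prev;
} Link;

typedef struct Thread {
    int  priority;
    Link waitLink;   /* embedded link, used while the thread waits on a lock */
} Thread;

typedef struct Lock {
    Link waitQ;      /* sentinel head of the circular wait queue */
} Lock;

/* Recover the owning Thread from a pointer to its embedded link. */
#define THREAD_FROM_LINK(l) \
    ((Thread *)((char *)(l) - offsetof(Thread, waitLink)))

static void queue_init(Lock *lock)
{
    lock->waitQ.next = &lock->waitQ;
    lock->waitQ.prev = &lock->waitQ;
}

static void queue_append(Lock *lock, Thread *t)
{
    Link *tail = lock->waitQ.prev;
    t->waitLink.next = &lock->waitQ;
    t->waitLink.prev = tail;
    tail->next = &t->waitLink;
    lock->waitQ.prev = &t->waitLink;
}

/* Same shape as the quoted condition: returns 1 when the queue is empty
 * or when the first and last waiters have equal priority. */
static int wait_queue_check(Lock *lock)
{
    Link *q = lock->waitQ.next;

    /* First half: pure address comparison, nothing is dereferenced. */
    if (q == &lock->waitQ)
        return 1;

    /* Second half: both reads go through queue links into Thread storage.
     * If either link referred to a freed thread, these loads would fault
     * or return garbage. */
    return THREAD_FROM_LINK(q)->priority ==
           THREAD_FROM_LINK(lock->waitQ.prev)->priority;
}
```

With an empty queue the check returns without touching any thread; once waiters are queued, its result depends entirely on memory reached through the links. What PR_Lock does after the branch is outside this sketch.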
Here's my stack crawl:

(CurStackBase does not seem to apply...dumping 4K.)
Calling chain using A6/R1 links
 Back chain  ISA  Caller
  00000000   PPC  1FD08B70  CallOnSwappedStack+00078
  00772C08   PPC  1FD2BAC0  OTScheduleDriverDeferredTask+0016C
  00772BA8   PPC  1FD26554  qrun+00194
  00772B48   PPC  1FCF3364  TNativeProvider::Notify(unsigned long, long, void*)+00138
  00772AF8   PPC  1FCF68A8  OTOpenEndpointOnStreamPriv+02788
  00772A68   PPC  1FCF171C  TNativeProvider::NotifyClient(void*, unsigned long, long, void*)+000D4
  007729F8   PPC  1FCEC4D8  TOTProcNotifier::~TOTProcNotifier()+000FC
  007729A8   PPC  1D9FED24  NotifierRoutine+00438
  00772928   PPC  1D9FE8A4  WakeUpNotifiedThread+00068
  007728E8   PPC  1D9FCCBC  DoneWaitingOnThisThread+00030

PowerPC access exception at 1D9F5D2C PR_Lock+00150
Target Milestone: M14
It looks like async socket I/O is completing on a thread that has been deallocated. Simon, could you post the log from the crash we looked at yesterday? Thanks.
For those taking a look at the MacsBug log, keep in mind that the crash is actually occurring on line PR_Lock+14c.
I spoke with Wan-Teh and he proposed changes to SendReceiveStream() to fix this. We still need to find a way to reproduce the problem, so we can tell if we've fixed it. At a minimum, I'd like to examine another crash to try to confirm our current theories about the bug.
If Wan Teh has a proposed fix, can we get that fix onto a machine that *can* reproduce this ASAP, and see if the fix works?? Thanks, Jim
Attached patch: Proposed fix. (deleted)
I've sent the fix to wtc, sfraser, davidm, pinkerton, and mwelch for review and testing; see the attachment for the diffs. To my knowledge, only pinkerton and sfraser have seen this problem. I will work with them to try and verify the fix.
Another slight variation for us to verify:
1. Launch to browser, go to mail.
2. Select a message.
3. Close mail window (via close box).
4. Application hangs, no crash.
My comments above were referring to me seeing this with IMAP account using 2000-03-10-14m15 commercial build. Able to reproduce over a couple different machines.
laurel: this could be two things:
1. It could be something like bug 29733. Here, the app fails to quit, but you can still choose menu items, and a second Quit works.
2. It could be a bug (maybe unfiled?) where we hang in an NSPR thread. If you drop into MacsBug, you'll see _MD_PauseCPU on the stack.
So please get a stack crawl if possible so we can determine which it is.
I'll keep trying to get a stack crawl, but so far I'm not dropping into macsbug -- I just hang and CAN'T select any menu items and can't quit or close browser, I need to force quit. I am also not quitting from compose window. I'll give it some more tries...
when it hangs, you can usually drop into macsbug manually (cmd-power or the right front button on the machine's front panel).
Laurel's log looks like this:

(CurStackBase does not seem to apply...dumping 4K.)
Calling chain using A6/R1 links
 Back chain  ISA  Caller
  0E751D79   PPC  0046CFAC  EmToNatEndMoveParams+00014
  0E751D00   PPC  002D4FCC
  0E751CB0   PPC  00D356C8
  0E751BB8   68K  00428F3E  'scod BFAF 0002'+01F1E
  0E751B90   68K  0042D2D6  'scod BFAF 0002'+062B6
  0E751B7C   68K  00438AF8  'scod BFAF 0002'+11AD8

Return addresses on the stack
 Stack Addr  Frame Addr  ISA  Caller
  0E751F18               PPC  1F18296C  PRP_TryLock+00994
  0E751F08               68K  0E6E084E
  0E751EB8    0E751EB0   PPC  1F180EB8  PR_GetThreadPrivate+004D8
  0E751EAC               68K  0E6E084E
  0E751E68    0E751E60   PPC  1F1874BC  PR_GetPrimaryThread+002A0
  0E751E5C               68K  0E6E084E
  0E751E58    0E751E50   PPC  1F181364  _PR_GetPrimordialCPU+0045C
  0E751E28    0E751E20   PPC  FFD0133C  WaitNextEvent+00028
  0E751E22               68K  1E6041FE
  0E751E18               68K  0E6E084E
  0E751D74    0E751D70   68K  005829CE
  0E751D08               PPC  0046CFAC  EmToNatEndMoveParams+00014
  0E751CE0               68K  0E751D9E
  0E751CB8    0E751CB0   PPC  002D4FCC
  0E751C78    0E751C70   PPC  002D6DD4  MPGetPoolStatistics+002A0
  0E751C68    0E751C60   PPC  00D356C8
  0E751B94    0E751B90   68K  00428F3E  'scod BFAF 0002'+01F1E
  0E751B80    0E751B7C   68K  0042D2D6  'scod BFAF 0002'+062B6
  0E751B70    0E751B6C   68K  00438AF8  'scod BFAF 0002'+11AD8
  0E751B60               68K  0F6A8F26
  0E751B54               68K  0F6DD306
  0E751B50    0E751B4C   68K  00438BA6  'scod BFAF 0002'+11B86

This looks to me like the zombie thread on quit problem.
gordon/sfraser: Given that it's the old "zombie thread on quit" problem (whatever that means), where does that leave us in terms of a fix? Is this a notorious and unfixable problem? A problem with a standard fix? Please give us a timeline for repair, and a proposed landing date. The beta branch has been cut... and this is a bad problem. Please give us some info RSN.
Here's the deal. This bug (this particular crash on quit in PR_Lock()) is very hard to reproduce, happening only occasionally on some machines on quitting after using mail. I think its very infrequency is enough to justify release noting this for beta. This infrequency also makes it hard to verify that a fix works. The 'zombie thread on quit' is a separate problem (as far as I can tell), for which no bug exists (my bad, I guess). Next time I see it, I'll get a stack and file a bug. It also happens more if you've been using mail. I think we need to get feedback from QA on how frequent each of these bugs is (perhaps using Talkback data), and triage based on that.
No longer blocks: 7799
Pinkerton: How often have you seen this? Are you in agreement with sfraser that it is rare, and we should release note?
Whiteboard: [PDT+] mustfix → [PDT+] mustfix (unless this is rare enough to relnote)
I'll start trying to repro this again with gordon's patch that he mailed me.
w/out gordon's patch, i am _still_ seeing this crash (it reappeared!). I'm trying gordon's patch right now and will keep trying to dupe.
Whiteboard: [PDT+] mustfix (unless this is rare enough to relnote) → [PDT+] mustfix (waiting for mark welch to review)
I checked in gordon's patch (attachment 6542) on the following branches (of NSPR):
1. Main trunk: /cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c, revision 3.16
2. NSPRPUB_RELEASE_4_0_BRANCH: /cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c, revision 3.14.8.4
3. NSPRPUB_CLIENT_BRANCH: /cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c, revision 3.15.2.1
The checkin to NSPRPUB_CLIENT_BRANCH is approved by jar@netscape.com.
We just missed the verification builds for today; we'll be ready to checkin to the beta branch on Wednesday 3/15.
Whiteboard: [PDT+] mustfix (waiting for mark welch to review) → [PDT+] mustfix (ready to checkin)
We've checked in a fix (to beta branch and trunk). The fix clears the thread field of socket file descriptors after SendReceiveStream() is done using it, so that later incoming async OT events for the socket don't mistakenly try to wake the old thread up.
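The idea behind the fix can be sketched in a few lines of C. This is only an illustration of the pattern described above, not the checked-in patch (the attachment is deleted and its contents are not shown in this bug). Thread, SocketFD, begin_io, end_io and on_async_event are hypothetical names invented for the sketch; in NSPR the relevant code lives in macsockotpt.c around SendReceiveStream() and the OT notifier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only, not the NSPR types. */
typedef struct Thread {
    int wakeups;              /* how many times the notifier woke this thread */
} Thread;

typedef struct SocketFD {
    Thread *waiting_thread;   /* thread blocked on this socket, or NULL */
    int     pending_bytes;    /* data reported by async events */
} SocketFD;

/* A blocking call records which thread is waiting on the socket. */
static void begin_io(SocketFD *fd, Thread *me)
{
    fd->waiting_thread = me;
}

/* The pattern the fix describes: clear the thread field once the call is
 * done with it, so a late event has nothing stale to wake. */
static void end_io(SocketFD *fd)
{
    fd->waiting_thread = NULL;
}

/* Stand-in for the async notifier. Returns 1 if it woke a thread. */
static int on_async_event(SocketFD *fd, int bytes)
{
    fd->pending_bytes += bytes;
    if (fd->waiting_thread != NULL) {
        fd->waiting_thread->wakeups++;
        return 1;
    }
    return 0;   /* nobody is waiting: record the data, wake no one */
}
```

Without the reset in end_io, an event arriving after the call returned would still find the old thread pointer and try to wake a thread that may already be gone, which matches the theory earlier in this bug about async I/O completing on a deallocated thread.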
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Whiteboard: [PDT+] mustfix (ready to checkin) → [PDT+] mustfix (fix checked in)
Verified on Mac OS 9, build 2000031615 nb1b
Status: RESOLVED → VERIFIED
OS: Mac System 9.x