Closed
Bug 29908
Opened 25 years ago
Closed 25 years ago
Crash on quit after using mail; horks machine
Categories
(Core :: Networking, defect, P3)
Tracking
()
VERIFIED
FIXED
M14
People
(Reporter: mikepinkerton, Assigned: gordon)
Details
(Keywords: crash, Whiteboard: [PDT+] mustfix (fix checked in))
Attachments
(3 files)
- launch apprunner
- click "mail" icon in browser window
- type your imap mail password
- click a mail message, any message
- quit the app w/ cmd-q
CRASH!
PowerPC access exception at 1566399C PR_Lock+00154
(CurStackBase does not seem to apply...dumping 4K.)
Calling chain using A6/R1 links
Back chain ISA Caller
00000000 PPC 00B4DDC0 CallOnSwappedStack+00078
00C67898 PPC 00C32730 OTScheduleDriverDeferredTask+0016C
00C67838 PPC 00C2D1C4 qrun+00194
00C677D8 PPC 00C04630 TNativeProvider::Notify(unsigned long, long, void*)+00138
00C67788 PPC 00C07B3C OTOpenEndpointOnStreamPriv+02750
00C676F8 PPC 00C029E8 TNativeProvider::NotifyClient(void*, unsigned long, long, void*)+000D4
00C67688 PPC 00BFD808 TOTProcNotifier::~TOTProcNotifier()+000FC
00C67638 PPC 1566C924 NotifierRoutine+00438
00C675B8 PPC 1566C4A4 WakeUpNotifiedThread+00068
00C67578 PPC 1566A8BC DoneWaitingOnThisThread+00030
Closing log
Doing an 'es' in macsbug leaves the mac in a limping state: the next time any app
tries to use the network, the machine locks hard.
Reporter | ||
Comment 1•25 years ago
|
||
keywords.
I'm unable to reproduce this particular crash, and it seems to work for Mike now
as well. We'll continue to poke it a bit to see if we can get it to misbehave
again. If that doesn't work, then we'll close it.
Status: NEW → ASSIGNED
Reporter | ||
Comment 5•25 years ago
|
||
man, this worries me that this bug just seemed to go away....It was so severe,
and then poof...no problems. *worried look on his face*
Comment 6•25 years ago
|
||
I just saw this today.
Comment 7•25 years ago
|
||
Adjust summary. Here's an ip for the crash:
PowerPC access exception at 1D9F5D2C PR_Lock+00150
Disassembling PowerPC code from 1D9F5D04
PR_Lock
+00128 1D9F5D04 lwz r3,0x0810(RTOC) | 80620810
+0012C 1D9F5D08 mr r4,r25 | 7F24CB78
+00130 1D9F5D0C li r5,0x0106 | 38A00106
+00134 1D9F5D10 bl PR_Assert ; 0x1D9E6458 | 4BFF0749
+00138 1D9F5D14 nop | 60000000
+0013C 1D9F5D18 lwz r30,0x000C(r29) | 83DD000C
+00140 1D9F5D1C addi r0,r29,0x000C | 381D000C
+00144 1D9F5D20 cmplw r30,r0 | 7C1E0040
+00148 1D9F5D24 beq PR_Lock+00160 ; 0x1D9F5D3C | 41820018
+0014C 1D9F5D28 lwz r4,-0x0050(r30) | 809EFFB0
+00150 1D9F5D2C *lwz r3,0x0010(r29) | 807D0010
+00154 1D9F5D30 lwz r0,-0x0050(r3) | 8003FFB0
+00158 1D9F5D34 cmpw r4,r0 | 7C040000
+0015C 1D9F5D38 bne PR_Lock+00184 ; 0x1D9F5D60 | 40820028
+00160 1D9F5D3C addi r30,r29,0x000C | 3BDD000C
+00164 1D9F5D40 b PR_Lock+00190 ; 0x1D9F5D6C | 4800002C
+00168 1D9F5D44 lwz r3,0x000C(r29) | 807D000C
+0016C 1D9F5D48 subi r26,r3,0x0054 | 3B43FFAC
+00170 1D9F5D4C lwz r3,0x0004(r31) | 807F0004
+00174 1D9F5D50 lwz r0,0x0004(r26) | 801A0004
Closing log
Summary: Opening Mail message takes out networking on Mac → Crash on quit after using mail; horks machine
If you can reproduce this, please dump the registers as well. Simon, did your
stack crawl look the same as Mike's?
We're dying somewhere in the second half of this condition:
if (q == &lock->waitQ || _PR_THREAD_CONDQ_PTR(q)->priority ==
_PR_THREAD_CONDQ_PTR(lock->waitQ.prev)->priority) {
Here's the line in the file for more context:
http://lxr.mozilla.org/seamonkey/source/nsprpub/pr/src/threads/combined/prulock.c#281
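For anyone reading the condition against the disassembly, here is a minimal sketch of the pattern it relies on: a circular wait queue whose links are embedded in each thread structure, with a macro that steps back from the link to the enclosing thread. This is not the NSPR source. The names CList, Thread, Lock, THREAD_FROM_LINK and head_matches_tail are hypothetical stand-ins, and the field layout is assumed for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the NSPR definitions. */
typedef struct CList { struct CList *next, *prev; } CList;

typedef struct Thread {
    int   id;
    int   priority;
    CList waitLinks;   /* embedded link that sits on a lock's wait queue */
} Thread;

typedef struct Lock {
    Thread *owner;
    CList   waitQ;     /* circular list head; empty when next == &waitQ */
} Lock;

/* Recover the enclosing Thread from a pointer to its embedded link. */
#define THREAD_FROM_LINK(q) \
    ((Thread *)((char *)(q) - offsetof(Thread, waitLinks)))

static void clist_init(CList *l) { l->next = l->prev = l; }

static void clist_append(CList *head, CList *e) {
    e->next = head;
    e->prev = head->prev;
    head->prev->next = e;
    head->prev = e;
}

/* Same shape as the quoted condition: true if the queue is empty, or if
 * the first and last waiters have equal priority. The second half
 * dereferences both links, so it is only safe while both threads exist. */
static int head_matches_tail(Lock *lock) {
    CList *q;
    assert(lock != NULL);
    q = lock->waitQ.next;
    return q == &lock->waitQ ||
           THREAD_FROM_LINK(q)->priority ==
           THREAD_FROM_LINK(lock->waitQ.prev)->priority;
}

/* Returns 1 when all three cases behave as described in the comments. */
int demo_wait_queue(void) {
    Lock lock;
    Thread a = {1, 5, {NULL, NULL}};
    Thread b = {2, 5, {NULL, NULL}};
    int empty_ok, same_ok, differ_ok;

    lock.owner = NULL;
    clist_init(&lock.waitQ);
    empty_ok = head_matches_tail(&lock);     /* empty queue: first test wins */

    clist_append(&lock.waitQ, &a.waitLinks);
    clist_append(&lock.waitQ, &b.waitLinks);
    same_ok = head_matches_tail(&lock);      /* priorities 5 and 5 */

    b.priority = 7;
    differ_ok = !head_matches_tail(&lock);   /* priorities 5 and 7 */

    return empty_ok && same_ok && differ_ok;
}
```

In a layout like this, if a thread were freed while its link was still on the queue, the priority load through THREAD_FROM_LINK would read freed memory, which would be consistent with an access exception in the second half of the test.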
Comment 9•25 years ago
|
||
Here's my stack crawl:
(CurStackBase does not seem to apply...dumping 4K.)
Calling chain using A6/R1 links
Back chain ISA Caller
00000000 PPC 1FD08B70 CallOnSwappedStack+00078
00772C08 PPC 1FD2BAC0 OTScheduleDriverDeferredTask+0016C
00772BA8 PPC 1FD26554 qrun+00194
00772B48 PPC 1FCF3364 TNativeProvider::Notify(unsigned long, long, void*)+00138
00772AF8 PPC 1FCF68A8 OTOpenEndpointOnStreamPriv+02788
00772A68 PPC 1FCF171C TNativeProvider::NotifyClient(void*, unsigned long, long, void*)+000D4
007729F8 PPC 1FCEC4D8 TOTProcNotifier::~TOTProcNotifier()+000FC
007729A8 PPC 1D9FED24 NotifierRoutine+00438
00772928 PPC 1D9FE8A4 WakeUpNotifiedThread+00068
007728E8 PPC 1D9FCCBC DoneWaitingOnThisThread+00030
PowerPC access exception at 1D9F5D2C PR_Lock+00150
Updated•25 years ago
|
Target Milestone: M14
Assignee | ||
Comment 10•25 years ago
|
||
It looks like async socket I/O is completing on a thread that has been
deallocated. Simon, could you post the log from the crash we looked at
yesterday? Thanks.
Comment 11•25 years ago
|
||
Assignee | ||
Comment 12•25 years ago
|
||
For those taking a look at the MacsBug log, keep in mind that the crash is
actually occurring at PR_Lock+14c.
Assignee | ||
Comment 13•25 years ago
|
||
I spoke with Wan-Teh and he proposed changes to SendReceiveStream() to fix this.
We still need to find a way to reproduce the problem, so we can tell if we've
fixed it. At a minimum, I'd like to examine another crash to try to confirm our
current theories about the bug.
Comment 14•25 years ago
|
||
If Wan Teh has a proposed fix, can we get that fix onto a machine that *can*
reproduce this ASAP, and see if the fix works??
Thanks,
Jim
Assignee | ||
Comment 15•25 years ago
|
||
Assignee | ||
Comment 16•25 years ago
|
||
I've sent the fix to wtc, sfraser, davidm, pinkerton, and mwelch for review and
testing; see the attachment for the diffs.
To my knowledge, only pinkerton and sfraser have seen this problem. I will work
with them to try and verify the fix.
Comment 17•25 years ago
|
||
Another slight variation for us to verify:
1. Launch to browser, go to mail.
2. Select a message.
3. Close mail window (via close box).
4. Application hangs, no crash.
Comment 18•25 years ago
|
||
My comments above were referring to me seeing this with IMAP account using
2000-03-10-14m15 commercial build. Able to reproduce over a couple different
machines.
Comment 19•25 years ago
|
||
laurel: this could be two things:
1. It could be something like bug 29733. Here, the app fails to quit, but you
can still choose menu items, and a second Quit works.
2. It could be a bug (maybe unfiled?) where we hang in an NSPR thread. If you
drop into MacsBug, you'll see _MD_PauseCPU on the stack.
So please get a stack crawl if possible so we can determine which it is.
Comment 20•25 years ago
|
||
I'll keep trying to get a stack crawl, but so far I'm not dropping into macsbug
-- I just hang and CAN'T select any menu items and can't quit or close browser,
I need to force quit. I am also not quitting from compose window. I'll give it
some more tries...
Reporter | ||
Comment 21•25 years ago
|
||
when it hangs, you can usually drop into macsbug manually (cmd-power or the right
front button on the machine's front panel).
Comment 22•25 years ago
|
||
Laurel's log looks like this:
(CurStackBase does not seem to apply...dumping 4K.)
Calling chain using A6/R1 links
Back chain ISA Caller
0E751D79 PPC 0046CFAC EmToNatEndMoveParams+00014
0E751D00 PPC 002D4FCC
0E751CB0 PPC 00D356C8
0E751BB8 68K 00428F3E 'scod BFAF 0002'+01F1E
0E751B90 68K 0042D2D6 'scod BFAF 0002'+062B6
0E751B7C 68K 00438AF8 'scod BFAF 0002'+11AD8
Return addresses on the stack
Stack Addr Frame Addr ISA Caller
0E751F18 PPC 1F18296C PRP_TryLock+00994
0E751F08 68K 0E6E084E
0E751EB8 0E751EB0 PPC 1F180EB8 PR_GetThreadPrivate+004D8
0E751EAC 68K 0E6E084E
0E751E68 0E751E60 PPC 1F1874BC PR_GetPrimaryThread+002A0
0E751E5C 68K 0E6E084E
0E751E58 0E751E50 PPC 1F181364 _PR_GetPrimordialCPU+0045C
0E751E28 0E751E20 PPC FFD0133C WaitNextEvent+00028
0E751E22 68K 1E6041FE
0E751E18 68K 0E6E084E
0E751D74 0E751D70 68K 005829CE
0E751D08 PPC 0046CFAC EmToNatEndMoveParams+00014
0E751CE0 68K 0E751D9E
0E751CB8 0E751CB0 PPC 002D4FCC
0E751C78 0E751C70 PPC 002D6DD4 MPGetPoolStatistics+002A0
0E751C68 0E751C60 PPC 00D356C8
0E751B94 0E751B90 68K 00428F3E 'scod BFAF 0002'+01F1E
0E751B80 0E751B7C 68K 0042D2D6 'scod BFAF 0002'+062B6
0E751B70 0E751B6C 68K 00438AF8 'scod BFAF 0002'+11AD8
0E751B60 68K 0F6A8F26
0E751B54 68K 0F6DD306
0E751B50 0E751B4C 68K 00438BA6 'scod BFAF 0002'+11B86
This looks to me like the zombie thread on quit problem.
Comment 23•25 years ago
|
||
gordon/sfraser: Given that it's the old "zombie thread on quit" problem
(whatever that means), where does that leave us in terms of a fix? Is this a
notorious and unfixable problem? A problem with a standard fix? Please give us
a timeline for repair, and a proposed landing date.
The beta branch has been cut... and this is a bad problem. Please give us some
info RSN.
Comment 24•25 years ago
|
||
Here's the deal. This bug (this particular crash on quit in PR_Lock()) is very
hard to reproduce, happening only occasionally on some machines on quitting after
using mail. I think its infrequency alone is enough to justify release noting
this for beta. This infrequency also makes it hard to verify that a fix works.
The 'zombie thread on quit' is a separate problem (as far as I can tell), for
which no bug exists (my bad, I guess). Next time I see it, I'll get a stack and
file a bug. It also happens more if you've been using mail.
I think we need to get feedback from QA on how frequent each of these bugs is
(perhaps using Talkback data), and triage based on that.
No longer blocks: 7799
Comment 25•25 years ago
|
||
Pinkerton: How often have you seen this? Are you in agreement with sfraser that
it is rare, and we should release note?
Whiteboard: [PDT+] mustfix → [PDT+] mustfix (unless this is rare enough to relnote)
Reporter | ||
Comment 26•25 years ago
|
||
I'll start trying to repro this again with gordon's patch that he mailed me.
Reporter | ||
Comment 27•25 years ago
|
||
w/out gordon's patch, i am _still_ seeing this crash (it reappeared!). I'm trying
gordon's patch right now and will keep trying to dupe.
Updated•25 years ago
|
Whiteboard: [PDT+] mustfix (unless this is rare enough to relnote) → [PDT+] mustfix (waiting for mark welch to review)
Comment 28•25 years ago
|
||
Comment 29•25 years ago
|
||
I checked in gordon's patch (attachment 6542 [details] [diff] [review]) on the
following branches (of NSPR):
1. Main trunk:
/cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c, revision 3.16
2. NSPRPUB_RELEASE_4_0_BRANCH:
/cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c, revision 3.14.8.4
3. NSPRPUB_CLIENT_BRANCH:
/cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c, revision 3.15.2.1
The checkin to NSPRPUB_CLIENT_BRANCH is approved by jar@netscape.com.
Assignee | ||
Comment 30•25 years ago
|
||
We just missed the verification builds for today; we'll be ready to check in to
the beta branch on Wednesday 3/15.
Whiteboard: [PDT+] mustfix (waiting for mark welch to review) → [PDT+] mustfix (ready to checkin)
Assignee | ||
Comment 31•25 years ago
|
||
We've checked in a fix (to beta branch and trunk). The fix clears the thread
field of socket file descriptors after SendReceiveStream() is done using it, so
that later incoming async OT events for the socket don't mistakenly try to wake
the old thread up.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Whiteboard: [PDT+] mustfix (ready to checkin) → [PDT+] mustfix (fix checked in)
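The idea behind the fix described in comment 31 can be sketched as follows. This is a minimal illustration under stated assumptions, not the checked-in patch: SocketFD, waiting_thread, notify_event and blocking_io are hypothetical names, and the NULL check in the notifier is assumed from the description rather than taken from macsockotpt.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the NSPR or Open Transport API. */
typedef struct Thread { int wakeups; } Thread;

typedef struct SocketFD {
    Thread *waiting_thread;  /* thread blocked on this socket, or NULL */
} SocketFD;

/* Stand-in for the async notifier: wake a waiter only if one is registered.
 * Returns 1 if a thread was woken, 0 if the event was ignored. */
static int notify_event(SocketFD *fd) {
    assert(fd != NULL);
    if (fd->waiting_thread == NULL)
        return 0;                    /* nobody is waiting; drop the late event */
    fd->waiting_thread->wakeups++;
    return 1;
}

/* Stand-in for a blocking send/receive call. */
static void blocking_io(SocketFD *fd, Thread *self) {
    fd->waiting_thread = self;       /* register before waiting */
    notify_event(fd);                /* simulate the completion event arriving */
    fd->waiting_thread = NULL;       /* clear the field once the call is done */
}

/* Returns 1 if the thread was woken exactly once and the late event
 * after the call completed was ignored. */
int demo_late_event(void) {
    Thread t = {0};
    SocketFD fd = {NULL};
    int woke_late;

    blocking_io(&fd, &t);
    /* At this point the thread could exit and be deallocated. */
    woke_late = notify_event(&fd);   /* a later async event for the socket */
    return t.wakeups == 1 && woke_late == 0;
}
```

Without the line that clears waiting_thread, the second notify_event call in this sketch would touch the old thread again, which is the situation described in comment 10.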