Closed Bug 4270 Opened 26 years ago Closed 25 years ago

THREADING: Random startup crashes related to component manager

Categories

(Core :: XPCOM, defect, P3)

x86
Windows NT
defect

RESOLVED FIXED

People

(Reporter: peterl-retired, Assigned: dp)

Details

(Keywords: crash)

These problems seem to happen only on dual Pentium boxes. They also seem to be coupled to debug builds. The crash is consistent, but the stack trace varies:

NTDLL! 77f76148()
nr_FindAtLevel(_regfile * 0x029e4c60, long 1230195779, char * 0x029e4ef0, _desc * 0x0308f6fc, long * 0x0308f4f4) line 1549 + 43 bytes
nr_Find(_regfile * 0x029e4c60, long 28467, char * 0x029e4ef0, _desc * 0x0308f75c, long * 0x00000000, long * 0x00000000, int 1) line 1487 + 28 bytes
NR_RegGetKeyRaw(void * 0x029e4ff0, long 19179, char * 0x029e4ef0, long * 0x0308f9ac) line 2861 + 27 bytes
nsComponentManagerImpl::PlatformProgIDToCLSID(char * 0x029e4ef0, nsID * 0x0308faa0) line 708 + 27 bytes
nsComponentManagerImpl::ProgIDToCLSID(nsComponentManagerImpl * const 0x02992370, char * 0x029e4ef0, nsID * 0x0308faa0) line 1017 + 16 bytes
nsComponentManager::ProgIDToCLSID(char * 0x029e4ef0, nsID * 0x0308faa0) line 45
NET_PluginStream(int 38, void * 0x00000000, URL_Struct_ * 0x029e3370, MWContext_ * 0x029e3de0) line 193 + 14 bytes
NET_StreamBuilder(int 38, URL_Struct_ * 0x029e3370, MWContext_ * 0x029e3de0) line 238 + 24 bytes
net_setup_file_stream(_ActiveEntry * 0x029e3120) line 779 + 25 bytes
net_ProcessFile(_ActiveEntry * 0x029e3120) line 1315 + 9 bytes
NET_ProcessNet(PRFileDesc * 0x00000000, int 1) line 3312 + 13 bytes
NET_PollSockets() line 298 + 9 bytes
NetlibTimerProc(void * 0x00000000, unsigned int 275, unsigned int 5413, unsigned long 226892265) line 279
USER32! 77e71373()
nsNetlibThread::NetlibThreadMain(void * 0x029be4f0) line 259
_PR_NativeRunThread(void * 0x029be760) line 379 + 13 bytes
_threadstartex(void * 0x029be9d0) line 212 + 13 bytes
KERNEL32! 77f04f2c()
---------------------
NTDLL! 77f76148()
PR_DestroyLock(PRLock * 0x02dceb00) line 197 + 38 bytes
PR_DestroyMonitor(PRMonitor * 0x02dce1d0) line 66 + 14 bytes
nr_DeleteNode(_regfile * 0x02dcec60) line 285 + 12 bytes
NR_RegClose(void * 0x02dcf490) line 2330 + 12 bytes
nsComponentManagerImpl::PlatformFind(const nsID & {...}, nsFactoryEntry * * 0x0012fc5c) line 682
nsComponentManagerImpl::FindFactory(nsComponentManagerImpl * const 0x02d65870, const nsID & {...}, nsIFactory * * 0x0012fcb8) line 929 + 16 bytes
nsComponentManager::FindFactory(const nsID & {...}, nsIFactory * * 0x0012fcb8) line 35
nsCharsetConverterManager::GetICharsetConverterInfo(ConverterInfo * 0x02dcd704, int 0x00000002, int * 0x02dcd58c) line 268 + 24 bytes
nsCharsetConverterManager::GatherConvertersInfo() line 222 + 61 bytes
nsCharsetConverterManager::CreateMapping() line 211
nsCharsetConverterManager::GetUnicodeDecoder(nsCharsetConverterManager * const 0x02dcd580, const nsString * 0x0012fd48 {"ISO-8859-1"}, nsIUnicodeDecoder * * 0x0012fdc4) line 340 + 8 bytes
NS_NewB2UConverter(nsIUnicodeDecoder * * 0x0012fdc4, nsISupports * 0x00000000, nsString * 0x0012fd48 {"ISO-8859-1"}) line 149 + 20 bytes
NS_NewConverterStream(nsIUnicharInputStream * * 0x0012fdfc, nsISupports * 0x00000000, nsIInputStream * 0x02dcca20, int 0x00000000, nsString * 0x00000000 {???}) line 302 + 15 bytes
nsDocFactoryImpl::InitUAStyleSheet() line 638 + 20 bytes
nsDocFactoryImpl::CreateDefaultDocument(nsIURL * 0x02dc7280, char * 0x00000000, nsIContentViewerContainer * 0x02db0900, nsIStreamListener * * 0x02dc71c0, nsIContentViewer * * 0x0012fe94) line 362
nsDocFactoryImpl::CreateInstance(nsDocFactoryImpl * const 0x02db0830, nsIURL * 0x02dc7280, char * 0x02dcc7f0, char * 0x00000000, nsIContentViewerContainer * 0x02db0900, nsISupports * 0x00000000, nsIStreamListener * * 0x02dc71c0, nsIContentViewer * * 0x0012fe94) line 279 + 28 bytes
nsDocumentBindInfo::OnStartBinding(nsDocumentBindInfo * const 0x02dc71a0, nsIURL * 0x02dc7280, char * 0x02dcc7f0) line 1700 + 67 bytes
OnStartBindingProxyEvent::HandleEvent(OnStartBindingProxyEvent * const 0x02dcc7a0) line 506
StreamListenerProxyEvent::HandlePLEvent(PLEvent * 0x02dcc7a4) line 471 + 12 bytes
PL_HandleEvent(PLEvent * 0x02dcc7a4) line 476 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x02d6f410) line 437 + 9 bytes
_md_EventReceiverProc(void * 0x00730516, unsigned int 0x0000c0aa, unsigned int 0x00000000, long 0x02d6f410) line 799 + 9 bytes
USER32! 77e713ed()
--------
and more...
Summary: Random starup crashes related to component manager → Random startup crashes related to component manager
Here's another one. The assertion is tripped because no charset converter comes back from the component manager:

NTDLL! 77f76148()
nsDebug::Assertion(char * 0x007c29b0, char * 0x007c299c, char * 0x007c2974, int 137) line 140 + 13 bytes
nsScanner::SetDocumentCharset(const nsString & {"ISO-8859-1"}, nsCharsetSource kCharsetFromDocTypeDefault) line 137 + 32 bytes
nsScanner::nsScanner(nsString & {"file:///S|/mozilla/dist/WIN32_D.OBJ/bin/res/samples/test0.html"}, int 0, const nsString & {"ISO-8859-1"}, nsCharsetSource kCharsetFromDocTypeDefault) line 98
nsParser::Parse(nsIURL * 0x02dc7910, nsIStreamObserver * 0x00000000, int 0) line 575 + 78 bytes
nsHTMLDocument::StartDocumentLoad(nsHTMLDocument * const 0x02dcf430, nsIURL * 0x02dc7910, nsIContentViewerContainer * 0x02daf900, nsIStreamListener * * 0x02dc7890, char * 0x00000000) line 372
nsDocFactoryImpl::CreateDefaultDocument(nsIURL * 0x02dc7910, char * 0x00000000, nsIContentViewerContainer * 0x02daf900, nsIStreamListener * * 0x02dc7890, nsIContentViewer * * 0x0012fe94) line 382 + 28 bytes
nsDocFactoryImpl::CreateInstance(nsDocFactoryImpl * const 0x02daf830, nsIURL * 0x02dc7910, char * 0x02dcace0, char * 0x00000000, nsIContentViewerContainer * 0x02daf900, nsISupports * 0x00000000, nsIStreamListener * * 0x02dc7890, nsIContentViewer * * 0x0012fe94) line 279 + 28 bytes
nsDocumentBindInfo::OnStartBinding(nsDocumentBindInfo * const 0x02dc7870, nsIURL * 0x02dc7910, char * 0x02dcace0) line 1700 + 67 bytes
OnStartBindingProxyEvent::HandleEvent(OnStartBindingProxyEvent * const 0x02dcac90) line 506
StreamListenerProxyEvent::HandlePLEvent(PLEvent * 0x02dcac94) line 471 + 12 bytes
PL_HandleEvent(PLEvent * 0x02dcac94) line 476 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x02d6ee00) line 437 + 9 bytes
_md_EventReceiverProc(void * 0x024504c8, unsigned int 49322, unsigned int 0, long 47640064) line 799 + 9 bytes
USER32! 77e713ed()
It's pretty clearly a race (or set of races) between netlib and the main ui/layout thread, usually benign on uniprocessors, less so on MP systems where each thread gets a CPU. It appears to me that nsComponentManagerImpl::ProgIDToCLSID does not interlock with other accessors/mutators of the underlying platform registry, nor does its subroutine, nsComponentManagerImpl::PlatformProgIDToCLSID, which in turn calls into thread-unsafe libreg functions (NR_RegGetKeyRaw, nr_Find, nr_FindAtLevel). The fact that libreg exports some entry points that are thread-safe, and others (NR_RegGetKeyRaw) that aren't, seems like a bug to me. There appear to be two layers with their own locking schemes (bloat note: both use PRMonitors; possibly only one layer of synchronization is necessary), but in some control flows, neither layer protects its invariants! This stuff needs a code review, for both thread-safety and code minimization. One quick inspection task: match up monitor enter and exit calls, and find paths from nsComponentManager entry points that don't acquire at all, such as ProgIDToCLSID. /be
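To illustrate the inspection task described above, here is a minimal sketch of a lookup table in which every public entry point, accessor or mutator, takes the same lock through a scoped guard, so enter and exit are always paired and no path skips the acquire. It uses standard C++ (std::mutex) rather than NSPR monitors, and ProgIdTable, Register, Size, and RunTwoThreads are hypothetical names invented for this example, not code from the Mozilla tree.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for a ProgID -> CLSID table shared between threads.
class ProgIdTable {
public:
    // Mutator: acquires the lock for the whole update.
    void Register(const std::string& progId, const std::string& clsid) {
        std::lock_guard<std::mutex> guard(mLock);
        mEntries[progId] = clsid;
    }

    // Accessor: acquires the same lock, so it cannot observe a
    // half-finished update made by another thread.
    bool ProgIDToCLSID(const std::string& progId, std::string* clsidOut) const {
        std::lock_guard<std::mutex> guard(mLock);
        auto it = mEntries.find(progId);
        if (it == mEntries.end()) return false;
        *clsidOut = it->second;
        return true;
    }

    std::size_t Size() const {
        std::lock_guard<std::mutex> guard(mLock);
        return mEntries.size();
    }

private:
    mutable std::mutex mLock;  // one lock protects all of mEntries
    std::map<std::string, std::string> mEntries;
};

// Two threads (named after the netlib and layout threads in this bug)
// register and look up entries concurrently; returns the final entry count.
std::size_t RunTwoThreads(int perThread) {
    ProgIdTable table;
    auto worker = [&table, perThread](const std::string& prefix) {
        for (int i = 0; i < perThread; ++i) {
            const std::string id = prefix + std::to_string(i);
            table.Register(id, "clsid-" + id);
            std::string clsid;
            const bool found = table.ProgIDToCLSID(id, &clsid);
            assert(found && clsid == "clsid-" + id);
            (void)found;  // silence unused warning when NDEBUG is set
        }
    };
    std::thread netlib(worker, "netlib.");
    std::thread layout(worker, "layout.");
    netlib.join();
    layout.join();
    return table.Size();
}
```

Because the guard releases the lock on every return path, an audit only has to confirm that each entry point constructs one; a path with no guard at all is the kind of hole described for ProgIDToCLSID.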
Assignee: dp → dveditz
Rickg and peterl are seeing this. I am told that both have dual-processor Pentiums. Can you make sure threading is handled correctly by libreg/?
So brendan, would you be willing to code review?
dveditz, cool to put on M4 radar? Really need this fix in. OK?
Assignee: dveditz → dp
Status: NEW → ASSIGNED
Target Milestone: M4
Dan and I checked in fixes for this. People claim it works. Now I need to remove the overfix that I did with nsIRegistry, so I am taking over this bug.
This crash still happens (though perhaps 1 out of 10 times these days). The only stack trace I still see is the first one.
Target Milestone: M4 → M5
I don't think dual Pentium boxes are critical for M4
Target Milestone: M5 → M6
Ugh... moving again to M6
Dan, did you say there were other places in the registry that weren't MP safe? Can you confirm that libreg/ is 100% thread safe?
libreg is thread safe except for a potential race in NR_StartupRegistry(), which creates the lock. As long as the registry is only started once, or is first started (and not shut down) before other threads are spawned, it won't be a problem. Is there a static initializer for a PRLock the way there is for Pthreads? I couldn't find one.
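The startup race described above (two threads both finding no lock and both creating one) is usually closed with a one-time initialization primitive. The following is a minimal sketch in standard C++ using std::call_once; it is not libreg or NSPR code, and StartupRegistryLock, gRegLock, gLockCreations, and CreationsAfterConcurrentStartup are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

namespace {
std::once_flag gLockOnce;            // guards the one-time creation below
std::mutex* gRegLock = nullptr;      // hypothetical stand-in for the registry lock
std::atomic<int> gLockCreations{0};  // instrumentation: how many locks were made
}  // namespace

// Hypothetical analogue of a startup routine that creates the lock.
// std::call_once guarantees the lambda runs exactly once, even when
// several threads call this function at the same time.
std::mutex* StartupRegistryLock() {
    std::call_once(gLockOnce, [] {
        gRegLock = new std::mutex();  // kept for the life of the process
        ++gLockCreations;
    });
    return gRegLock;
}

// Starts several threads that all race into startup, then reports how
// many times the lock was actually created.
int CreationsAfterConcurrentStartup(int threadCount) {
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i) {
        threads.emplace_back([] {
            std::mutex* lock = StartupRegistryLock();
            std::lock_guard<std::mutex> guard(*lock);  // use the shared lock
        });
    }
    for (std::thread& t : threads) t.join();
    return gLockCreations.load();
}
```

With this shape, callers no longer need to ensure that the registry is started before other threads are spawned; the first caller creates the lock and every later caller reuses it.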
Target Milestone: M6 → M7
Summary: Random startup crashes related to component manager → THREADING: Random startup crashes related to component manager
Depends on: 7308
Target Milestone: M7 → M8
I am keeping this. No one has reported these for two milestones now. Good. We are doing a lot of extra locking. Once I clean that up, I will close this one.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Adding crash keyword
Keywords: crash
Component: XPCOM Registry → XPCOM
QA Contact: dp → xpcom