Closed
Bug 911
Opened 26 years ago
Closed 23 years ago
Fix(?) for crash when quickly launching multiple windows
Categories
(MozillaClassic Graveyard :: NetLib, defect, P1)
Tracking
(Not tracked)
VERIFIED
WONTFIX
People
(Reporter: bryce, Assigned: nisheeth_mozilla)
Details
Not sure if I have the right component for this one... Pardon if not so.
It might belong to the RDF or the Cookies folks.
I found a bug that occurs when one opens a number of windows, which may
be related to a low memory condition, and somehow to cookies.
I've located the bug exactly and have a patch that seems to stop the
bug. However, whatever condition happens when this bug crops up is now
causing a bug in a different area of the program (I think) so this isn't
a total fix.
I'm running WinNT4.0 SR3, on a 166MHz Dell Pentium with 64MB RAM and
200MB swap. In addition to WinNT processes and mozilla, I am running
cdplayer.exe, MSACCESS, TASKMGR, MSDEV, CRT, and a notepad-like editor.
The crash occurs when 4-8 windows are opened in rapid succession (within
30 sec or so). The faster one opens windows, the sooner the error will
occur.
I was looking at nine web pages on CGI scripts, looking for the
slashdot source code. I had run a search on "slashdot.cgi" and opened
up a bunch of windows to view the results of the search. Seven of the
pages loaded, the final two hadn't loaded up fully at the time of crash.
Error was reported "Access violation".
Here is the stack trace:
ProcessCookiesAndTrustLabels(_ActiveEntry * 0x025ea8a0) line 3739 + 21 bytes
net_ProcessFile(_ActiveEntry * 0x025ea8a0) line 1300 + 9 bytes
NET_ProcessNet(PRFileDesc * 0x00000000, int 1) line 3334 + 13 bytes
net_process_slow_net_timer_callback(void * 0x00000000) line 216 + 9 bytes
wfe_ProcessTimeouts(unsigned long 13087538) line 303 + 12 bytes
FireTimeout(HWND__ * 0x001201b2, unsigned int 275, unsigned int 777, unsigned long 13087538) line 60 + 9 bytes
USER32! 77e7128c()
CNetscapeApp::Run() line 1675 + 8 bytes
AfxWinMain(HINSTANCE__ * 0x00400000, HINSTANCE__ * 0x00000000, char * 0x00142595, int 1) line 52 + 11 bytes
WinMain(HINSTANCE__ * 0x00400000, HINSTANCE__ * 0x00000000, char * 0x00142595, int 1) line 33 + 21 bytes
WinMainCRTStartup() line 330 + 57 bytes
KERNEL32! 77f1b304()
Here's the code snippet where MSVC says the crash occurred:
void ProcessCookiesAndTrustLabels( ActiveEntry *ce )
{
#define TEN_MINUTES (time_t)(10*60) /* 10 minutes in seconds */
    unsigned int i;
    TrustLabel *ALabel;
    XP_List *TempTrustList;

    if ( IsTrustLabelsEnabled() && ce && ce->URL_s) {
        /*
         * if the trust label parsing is enabled then look at each cookie
         * and try to match it to a trust label on the trust list to see
         * if one matches the cookie
         */
        for(i=0 ;i < ce->URL_s->all_headers.empty_index; i++) {
            /* look for a cookie field - allow Set-cookie: or Set-Cookie2: -
               CASE INSENSITIVE COMPARE */
/* >> */    if(!PL_strncasecmp(ce->URL_s->all_headers.key[i],"Set-Cookie", 10)) {
                NET_SetCookieStringFromHttp(CE_FORMAT_OUT, ce->URL_s, CE_WINDOW_ID,
                                            ce->URL_s->address,
                                            ce->URL_s->all_headers.value[i]);
            }
        }
/* Snip */
It died on the if statement line. Here's what the variables were:
i = 0
ce->URL_s->all_headers.empty_index = 39256272
ce->URL_s->all_headers.key = 0x001e71e3
.key should be an array, or at least, that's how it's being used in the
above code snippet. But MSVC couldn't evaluate the pointer.
MSVC crashed. Reloaded MSVC, then launched Mozilla in debug mode, did a search
for slashdot.cgi again and started launching off the windows. Got five open
before it crashed.
Error was: "Unhandled exception in mozilla.exe: 0xC0000005: Access
Violation."
Crashed on the same line of the same function. This time variables are:
i = 0
ce->URL_s->all_headers.empty_index = 3722304989
ce->URL_s->all_headers.key = 0xdddddddd
Here's a few lines of relevant assembly code:
007dcd24 mov eax,dword ptr [eax+ecx*4]
007dcd27 push eax
007dcd28 call _PL_strncasecmp (00831644)
007dcd2d add esp,0000000c
007dcd30 test eax,eax
007dcd32 jne ProcessCookiesAndTrustLabels+000000af (007dcd6f)
3740: NET_SetCookieStringFromHttp(CE_FORMAT_OUT, ce->URL_s, CE_WINDOW_ID, ce->URL_s->address, ce->URL_s->all_headers.value[i]);
The ce structure looks essentially blank. It has a ton of fields
("window_chrome", "referer", "username", "password", and so on), but
nearly all of them are set to 0xdddddddd "", -572662307, 3722304989,
or 221. The few fields that are set to particular values:
ce->status = 1
ce->bytes_received = 16534
ce->socket = 0x00000000
ce->con_sock = 0x00000000
ce->local_file = 1
ce->memory_file = 0
ce->protocol = 12
ce->proto_impl = 0x0097d470
ce->con_data = 0x00d45f30
ce->exit_routine = 0x00792cc0 il_netgeturldone(URL_Struct_ *, int, MWContext_ *)
ce->window_id = 0x00c5a120
ce->format_out = 2
ce->save_stream = 0x00000000
ce->busy = 1
ce->proxy_conf = 0x00000000
ce->proxy_addr = 0x00000000
ce->socks_host = 0
ce->socks_port = 0
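The four "blank" values listed above are consistent with one byte pattern read at different widths. The sketch below is only an illustration: FilledFields and fill_like_freed_block are hypothetical names, not Mozilla code. It assumes the MSVC debug heap behaviour of overwriting freed blocks with the byte 0xDD, and shows how that fill reads back as an unsigned 32-bit value, a signed 32-bit value, and a single byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a few fields of a structure whose memory
 * has been overwritten with a fill byte. Not the real ActiveEntry. */
typedef struct {
    uint32_t      as_unsigned; /* e.g. a counter such as empty_index */
    int32_t       as_signed;   /* e.g. a signed status field         */
    unsigned char as_byte;     /* e.g. a one-byte flag               */
} FilledFields;

/* Overwrite every byte of the structure with 0xDD, mimicking what a
 * debug allocator might do to a freed block (an assumption here). */
static void fill_like_freed_block(FilledFields *f)
{
    assert(f != NULL);
    memset(f, 0xDD, sizeof *f);
}
```

After fill_like_freed_block runs, as_unsigned is 3722304989 (0xDDDDDDDD), as_signed is -572662307, and as_byte is 221, which matches the values seen in the debugger.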
Some Debug output:
Created rdf:ht4
www.hax0r.org error=0 h_name=1 task=12
www.kalifornia.com error=0 h_name=1 task=13
sunsite.unc.edu error=0 h_name=1 task=14
hax0r.org error=0 h_name=1 task=15
First-chance exception in mozilla.exe: 0xC0000005: Access Violation.
My first impression of what's going on is:
- Mozilla, while idle, polls the sockets.
- Since net_calling_all_the_time_count != 0 (it's set to 5 in this
case), NET_ProcessNet is called, which allows multiple connections
to be processed simultaneously.
- ready_fd = NULL, so an attempt to find a socket is made
- a bunch of code is run to set up the socket (I think...)
- sockets ready for reading are processed one by one
- tmpEntry->busy is false, so processing proceeds
- ready_fd = 0, and since both tmpEntry->socket and tmpEntry->con_sock
are NULL, the else if statement is executed.
- The line
rv = (*tmpEntry->proto_impl->process)(tmpEntry);
evaluates to
rv = net_ProcessFile(tmpEntry);
and so net_ProcessFile is called.
- net_ProcessFile is called for the file "M1AIR9QS.GIF", which is a
picture of the USSR flag.
- con_data->next_state = 15, which is NET_FILE_DONE
- con_data->stream is non-zero, so the macro COMPLETE_STREAM is run.
- con_data->next_state is set to NET_FILE_FREE
- ProcessCookiesAndTrustLabels is called:
- Checks are made: trust labels is enabled, ce is non-null, and
ce->URL_s is non-null. All pass.
- A loop is made through all the headers (I think?)
- Loop runs from 0 to ce->URL_s->all_headers.empty_index, which equals
3722304989. Hmm. Here's the problem.
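The dispatch step in the walk-through above, where (*tmpEntry->proto_impl->process)(tmpEntry) resolves to net_ProcessFile, can be sketched as a call through a per-protocol function pointer. This is a simplified illustration under assumptions: Entry, ProtoImpl, process_file and dispatch are hypothetical names, and the real netlib structures carry many more fields.

```c
#include <stddef.h>

typedef struct Entry Entry;

/* Hypothetical per-protocol table: one "process" callback per protocol. */
typedef struct {
    int (*process)(Entry *);
} ProtoImpl;

struct Entry {
    const ProtoImpl *proto_impl; /* which protocol handles this entry */
    int busy;                    /* non-zero while already being processed */
    int calls;                   /* counts how often the handler ran */
};

/* Stand-in for a file-protocol handler such as net_ProcessFile. */
static int process_file(Entry *e)
{
    e->calls++;
    return 0;
}

static const ProtoImpl file_impl = { process_file };

/* Skip busy entries, otherwise call whatever handler the entry points at.
 * Returns -1 when the entry is not processed. */
static int dispatch(Entry *e)
{
    if (e == NULL || e->busy || e->proto_impl == NULL) {
        return -1;
    }
    return (*e->proto_impl->process)(e);
}
```

Nothing in this kind of dispatch checks whether the data the entry points at is still alive, which is why a handler can end up walking a structure that has already been freed.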
Okay, now to figure out a solution. Obviously this cookie code
shouldn't be called in some cases.
I don't think the pages themselves are causing the crash, but
rather the strain of loading several heavy duty pages all at once.
Here are the exact pages I loaded:
http://harbor.ecn.purdue.edu/~jacoby/Slashdot_Mailer/
http://www.krazi.org/
http://www.stars.com/vlib/providers/cgi.html
http://www.hax0r.org/
http://www.icemall.com/free/free_perl_scripts.html
I'm going to load each one in turn and see if one page in particular is
causing the problem...
Nope. I loaded each page up in a single browser, with no crash. Then I
went nuts loading links into new browsers. I loaded up half a dozen
links out of the list on
http://www.icemall.com/free/free_perl_scripts.html and got the same
error that I've been having.
Tried again, this time only got four windows opened. I was a tad
slower in launching the windows this time.
Here's a possible fix. Change the code to look like this:
void ProcessCookiesAndTrustLabels( ActiveEntry *ce )
{
#define TEN_MINUTES (time_t)(10*60) /* 10 minutes in seconds */
    unsigned int i;
    TrustLabel *ALabel;
    XP_List *TempTrustList;

    if ( IsTrustLabelsEnabled() && ce && ce->URL_s
         && ce->URL_s->all_headers.empty_index != 0xdddddddd
       ) {
The code under the if statement shouldn't really be executed when
empty_index is set to such a large number. The program still crashes,
but in a different location, and it seems to allow more web pages to
load. I think this new way of crashing is not related to the fix I just
made, but I'm not certain. I'll submit it as a separate bug report once
I have more info on it.
Comment 1•26 years ago
That wouldn't work for an optimized build since 0xdddddddd is inserted only by
the debugging code to find uninitialized or freed memory or something like that.
Updated•26 years ago
Assignee: morse → nisheeth
Comment 2•26 years ago
Looks like this might be related to bugs 324513 and 324098. Both of those bugs
are caused by the url struct being freed too early and it looks like that is
what is happening here as well. Assigning this one to Nisheeth since he already
has the other two.
Reporter
Comment 3•26 years ago
I've been working on mkgeturl-related bugs for a while. Whoever accepts this
bug, please contact me, as I have further information and details on it.
Assignee
Updated•26 years ago
Status: NEW → ASSIGNED
Assignee
Comment 4•26 years ago
Accepting bug. Bryce, thanks a lot for the detailed analysis you did for this
bug. Please continue to discuss this on this bug report. I am going to
try to fix this bug this week. Gagan Saksena (netlib), Steve Morse
(privacy), Pam Nunn (imagelib) and I sat down and decided that the root
cause of this problem was that the url struct was going away too early. We
are going to add a ref count to the url struct and use it selectively for the
cases that we know are causing crashes.
We welcome any comments you might have.
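A minimal sketch of the ref count idea, for discussion only. RefUrl, refurl_create, refurl_hold and refurl_release are hypothetical names and this is not the actual URL_Struct or the planned patch. The point is that each code path that keeps a pointer takes a reference, and the memory is freed only when the last holder lets go.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down url struct carrying a reference count. */
typedef struct {
    int   ref_count; /* number of holders that still use this struct */
    char *address;   /* owned string, may be NULL in this sketch */
} RefUrl;

/* Create a struct owned by one holder (the caller). NULL on failure. */
static RefUrl *refurl_create(void)
{
    RefUrl *u = (RefUrl *)calloc(1, sizeof *u);
    if (u != NULL) {
        u->ref_count = 1;
    }
    return u;
}

/* A second user (for example a pending network entry) takes a reference. */
static RefUrl *refurl_hold(RefUrl *u)
{
    assert(u != NULL && u->ref_count > 0);
    u->ref_count++;
    return u;
}

/* Drop one reference; free only when no holder remains.
 * Returns 1 if the struct was freed, 0 if it is still in use. */
static int refurl_release(RefUrl *u)
{
    assert(u != NULL && u->ref_count > 0);
    if (--u->ref_count > 0) {
        return 0;
    }
    free(u->address);
    free(u);
    return 1;
}
```

With this shape, closing a window would call refurl_release rather than freeing directly, so an entry that is still being processed keeps the struct alive until it releases its own reference.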
Reporter
Comment 5•26 years ago
Because I've had Mozilla crash in several different modes, all seemingly
caused by bad URL_s structures, I believe there are at least two, and
perhaps three different places where the URL_s structure is being deallocated
early.
One of these places is the code called when a user closes a browser
window when many windows are open. My hypothesis is that when the
user has a lot of connections open and is keeping the browser very
busy, and then the user closes a window, the URL_s belonging to that
window is deallocated, but the notice of this lost window does not
reach all of the data structures using the pointer to that URL_s.
Finding where this occurs, and under what conditions, was too
time consuming for me. Here is the process I would use if I had the
time: Locate every area in the code where the URL_s or any component
of it is deallocated, deleted, freed, NULLed, etc. and put a TRACE
message there. Put TRACE messages into the places in the code that
remove the URL_s pointers from various data structures, as well.
Then recreate the crash by rapidly opening a number of separate
windows to different web sites; once a dozen or so
windows are open, begin closing them one by one until the crash
occurs. Hopefully, the debug messages will help narrow down which
part of the code introduced the bad URL_s.
The reference counter approach would be a more comprehensive solution
though. Please let me know how this goes.
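The tracing approach described above could look roughly like the following. This is a hedged sketch: traced_free and TRACED_FREE are hypothetical helpers that use fprintf as a stand-in for the TRACE macro, and they are not part of the Mozilla source. Each deallocation site logs what it frees and where, then clears the local pointer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Number of traced deallocations so far (handy when reading the log). */
static int traced_free_count = 0;

/* Log which expression is being freed and from where, then free it. */
static void traced_free(void *ptr, const char *what,
                        const char *file, int line)
{
    fprintf(stderr, "TRACE: freeing %s (%p) at %s:%d\n",
            what, ptr, file, line);
    traced_free_count++;
    free(ptr);
}

/* Use in place of a bare free(): records the call site automatically
 * and NULLs the pointer so later use of this copy is easier to spot. */
#define TRACED_FREE(ptr)                                     \
    do {                                                     \
        traced_free((ptr), #ptr, __FILE__, __LINE__);        \
        (ptr) = NULL;                                        \
    } while (0)
```

Replacing each free of a URL_s (or of one of its members) with TRACED_FREE would leave a log entry naming the file and line of the last deallocation before the crash. Note that NULLing one copy of the pointer does nothing for other data structures still holding the old address, which is the case the hypothesis above is about.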
Comment 6•26 years ago
I just had the same crash immediately on clicking the "Related" folder; however,
there are some interesting details... I include the full stack backtrace:
ProcessCookiesAndTrustLabels(_ActiveEntry * 0x00b7c320) line 3836 + 21 bytes
net_ProcessFile(_ActiveEntry * 0x00b7c320) line 1366 + 9 bytes
NET_ProcessNet(PRFileDesc * 0x00000000, int 0x00000001) line 3365 + 13 bytes
net_process_slow_net_timer_callback(void * 0x00000000) line 240 + 9 bytes
wfe_ProcessTimeouts(unsigned long 0x036cddc8) line 303 + 12 bytes
FireTimeout(HWND__ * 0x0ec10368, unsigned int 0x00000113, unsigned int 0x00000309, unsigned long 0x036cddc8) line 60 + 9 bytes
USER32! 77e71373()
USER32! 77e9161f()
USER32! 77e923dc()
USER32! 77e9290a()
USER32! 77e91bd7()
USER32! 77e92679()
USER32! 77e914ec()
__crtMessageBoxA(char * 0x0012b14c, char * 0x1024d83c, unsigned int 0x00012012) line 65
CrtMessageWindow(int 0x00000002, char * 0x008fa454, char * 0x0012c280, char * 0x00000000, char * 0x0012e2a4) line 520 + 22 bytes
_CrtDbgReport(int 0x00000002, char * 0x008fa454, int 0x00000052, char * 0x00000000, char * 0x00000000) line 419 + 76 bytes
AfxAssertFailedLine(char * 0x008fa454, int 0x00000052) line 39 + 20 bytes
XP_AssertAtLine(char * 0x008fa454, int 0x00000052) line 2692 + 13 bytes
makeNewAssertion(RDF_TranslatorStruct * 0x00ad9600, RDF_ResourceStruct * 0x00b7df40, RDF_ResourceStruct * 0x009de850, void * 0x00b7d160, unsigned short 0x0003, int 0x00000001) line 82 + 72 bytes
remoteStoreAdd(RDF_TranslatorStruct * 0x00ad9600, RDF_ResourceStruct * 0x00b7df40, RDF_ResourceStruct * 0x009de850, void * 0x00b7d160, unsigned short 0x0003, int 0x00000001) line 208 + 30 bytes
remoteAssert3(RDF_FileStruct * 0x00b34e90, RDF_TranslatorStruct * 0x00ad9600, RDF_ResourceStruct * 0x00b7df40, RDF_ResourceStruct * 0x009de850, void * 0x00b7d160, unsigned short 0x0003, int 0x00000001) line 112 + 30 bytes
addSlotValue(RDF_FileStruct * 0x00b34e90, RDF_ResourceStruct * 0x00b7df40, RDF_ResourceStruct * 0x009de850, void * 0x00b7d160, unsigned short 0x0003, char * 0x008fac5c) line 604 + 44 bytes
addElementProps(char * * 0x0012f448, char * 0x036c2399, RDF_FileStruct * 0x00b34e90, RDF_ResourceStruct * 0x00b7df40) line 230 + 49 bytes
parseNextRDFToken(RDF_FileStruct * 0x00b34e90, char * 0x036c2398) line 379 + 27 bytes
parseNextRDFXMLBlobInt(RDF_FileStruct * 0x00b34e90, char * 0x03695ee8, long 0x000004b0) line 128 + 22 bytes
parseNextRDFXMLBlob(_NET_StreamClass * 0x00b7c5d0, char * 0x03695ee8, long 0x000004b0) line 146 + 17 bytes
net_CacheWrite(_NET_StreamClass * 0x00b7d2f0, char * 0x03695ee8, long 0x000004b0) line 1459 + 24 bytes
net_pull_http_data(_ActiveEntry * 0x00b4c7b0) line 3096 + 30 bytes
net_ProcessHTTP(_ActiveEntry * 0x00b4c7b0) line 3488 + 9 bytes
NET_ProcessNet(PRFileDesc * 0x00b4ca80, int 0x00000002) line 3365 + 13 bytes
NET_PollSockets() line 203 + 18 bytes
CNetscapeApp::OnIdle(long 0x0000007b) line 1831 + 5 bytes
CNetscapeApp::Run() line 1663 + 30 bytes
AfxWinMain(HINSTANCE__ * 0x00400000, HINSTANCE__ * 0x00000000, char * 0x00141e37, int 0x0000000a) line 52 + 11 bytes
WinMain(HINSTANCE__ * 0x00400000, HINSTANCE__ * 0x00000000, char * 0x00141e37, int 0x0000000a) line 34
WinMainCRTStartup() line 330 + 54 bytes
See the AfxAssert there? It seems to have fired on "Netscape
plug-ins ÿ Downloads" or something like that, which I reported on another bug.
This assert _also_ happens around the same place where I reported r->url ending
up undefined (r freed) in the same bug. I hope the implications are pretty
clear, but as usual, I need to get to sleep... ;)
Assignee
Updated•26 years ago
Status: ASSIGNED → RESOLVED
Closed: 26 years ago
Resolution: --- → LATER
Assignee
Comment 7•26 years ago
Latering this bug for now because the old Mozilla codebase is dead. The Aurora
pane and the cookies code will have to be re-implemented around NGLayout. We'll
check that this bug doesn't exist when these features have been re-implemented.
Updated•26 years ago
Status: RESOLVED → VERIFIED
Comment 8•26 years ago
verified later
Comment 9•23 years ago
LATER is deprecated per bug 35839.
Status: VERIFIED → REOPENED
Resolution: LATER → ---
Comment 10•23 years ago
.
Status: REOPENED → RESOLVED
Closed: 26 years ago → 23 years ago
Resolution: --- → WONTFIX
Comment 12•18 years ago