Closed Bug 27736 Opened 25 years ago Closed 25 years ago

"Save Page As..." causes deadlock.

Categories

(Core :: Networking, defect, P3)

x86
Linux
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: mbp, Assigned: warrensomebody)

Details

(Whiteboard: 3d)

Attachments

(1 file)

"Save Page As..." results in a Progress dialog which never progresses. I have observed this with both M13 and the latest source (feb 14). Saving very small files (size < 8k) might work. I haven't tried. The problem is that once the reading thread is suspended, it never resumes. Even though nsFileTransportService::Resume() is called for that thread. In netwerk/base/src/nsFileTransportService.h NS_FILE_TRANSPORT_WORKER_COUNT is #defined to be 1. When I change this to 2 the problem goes away. As the saving thread never gets suspended (it merely waits on a monitor), it seems that the worker count needs to be at least two. Indeed, it looks as if the value was once four. So perhaps there is some other problem that was solved by setting it to one.
warren knows more about NS_FILE_TRANSPORT_WORKER_COUNT
Assignee: gagan → warren
The count has nothing to do with the deadlock. I'm sure that if you did multiple save-as's you could still deadlock. I'll investigate.
True. If N "Save As"'s use the same nsFileTransportService then it will require a NS_FILE_TRANSPORT_WORKER_COUNT of N+1 to guarantee that no deadlock occurs. But as long as the writing thread blocks inside Process() (actually in ReadSegments(), iirc) when the buffer is empty no fixed amount of threads will ever be enough for an arbitrary number of "nsFileTransport"s. If each Save As had its own nsFileTransportService, this would not be a problem.
Keywords: beta1
Target Milestone: M14
Putting on the PDT+ radar for beta1.
Whiteboard: [PDT+]
Need ETA on fix in Status Whiteboard please.
I'm not seeing what mbp describes. When I try to reproduce it, I get this: ************************************************************ ** NOTE: This report will only be printed in DEBUG builds.** * Call to xpconnect wrapped JSObject produced this error: * [Exception... "[JS Error: "TypeError: elem.childNodes[0] has no properties" {file: "chrome://global/content/downloadProgress.js" line: 67}] [nsIObserver::Observ e]" nsresult: "0x80570021 (NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS)" location: "<unknown>" data: yes] ************************************************************ d:\seamonkey-nsgetfactoryargs\mozilla\xpfe\components\xfer\src\nsStreamXferOp.c pp 302: Observe failed, rv=0x80570021 Also, I tried turning on file transport logging, and I could see that all of the file was getting downloaded. Must be that the js error is causing things to bail. Reassigning to law.
Assignee: warren → law
I still get the problem with the current source (cvs checkout about an hour ago). But only if the file to be saved is bigger than 8kb (the size of one segment in the segmented buffer). And increasing the number of worker threads does make this specifiv problem go away although it may not solve the real problem.
I've got a fix for the JS error. This seems to make it work properly for me. I'm still unsure about the nature of the problem mbp is reporting. With the JS error present, it may be that the progress dialog doesn't properly display the download progress but the download completes. Are you seeing something different than that? How are you observing this "deadlock"?
Status: NEW → ASSIGNED
The deadlock I see is real and not just in the dialog. The file really isn't written. It could be a Linux-only problem though - I haven't tried it on other platforms. The problem is also present in the "official" M13 binary, where no JS-related error messages appear. I looked at this in some detail a few weeks ago, and I found out that the writing thread is blocking in nsPipe::nsPipeInputStream::Fill() in mozilla/xpcom/io/nsPipe2.cpp The exact line is "rv = mon.Wait()". It is waiting for a notification from the reading thread. But that thread left its nsFileTransport::Process() when it had filled one buffer segment - and it does not get restarted. NS_FILE_TRANSPORT_WORKER_COUNT is set to one, and that place seems to be held by the writer doing a "mon.Wait()". That (I think) is why the problem goes away when I increase the thread count.
I'll check in my JS fix. Then I'll try again to reproduce this. It sounds like a networking component threading issue, however. After I've cleaned up the JS error, maybe I can convince Warren to take another look.
Here's my JS fix. I was unable to reproduce the problem in my Linux build today. What hardware/software configuration are you running? I'm testing on a dual Pentium-Pro so it might be threading/timing things are different for me. Index: downloadProgress.js =================================================================== RCS file: /cvsroot/mozilla/xpfe/components/xfer/resources/downloadProgress.js,v retrieving revision 1.7 diff -r1.7 downloadProgress.js 66c66,72 < if ( elem ) { --- > if ( elem > && > elem.childNodes > && > elem.childNodes[0] > && > elem.childNodes[0].nodeValue ) { 68,71c74,76 < } < // If missing, use "?" instead. < if ( !dialog.strings[ stringId ] ) { < dialogs.strings[ stringId ] = "?"; --- > } else { > // If unable to fetch string, use an empty string. > dialog.strings[ stringId ] = "";
I'm downloading the latest nightly build to make sure that this is not just a problem in my build. But I don't think it is, as I also have the problem with M13. Also, I just found out that the problem only occurs if the page I try to "Save As ..." is a local file (file://... ). If it is from the network, there is no deadlock.
Tried the latest nightly build. Same problem.
Well, my debug build doesn't exhibit the problem (I tried file: urls, too). Is anybody else seeing this? Is it something peculiar to your hw/sw configuration, I wonder? I'm sending this back to Warren. If there is indeed a problem, it would be his (aside from the JS error that my patch resolves).
Assignee: law → gagan
Status: ASSIGNED → NEW
I have observed the problem on two quite different RedHat 6.1 systems. (When trying to "Save As" file://... url's bigger than 8k). I don't think there is anything special about either of those machines.
I have now also observed the problem on a W2K machine using the latest nightly build.
law: did you not checkin your fixes? I am not seeing them in the latest version of progressDialog.js. cc'ing warren.
->dougt wants to volunteer... :)
Assignee: gagan → dougt
I can reproduce this problem. investigating.
Status: NEW → ASSIGNED
Whiteboard: [PDT+] → [PDT+] eta 3-3
Whiteboard: [PDT+] eta 3-3 → [PDT+] Fix being reviewed.
fix is to add dymanic thread allocation to the thread pool and change file transport to pass a real min and max thread count. Danm reviewed. Since three are alone two customers of threadpools, you can verfiy that I did not break anything by trying ftp, or read/write to a file. The bug here is that if you open a page via a "file:" url, you can not "save as..."
fixes checked in.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
had to backout. problems occured on linux. more work is needed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [PDT+] Fix being reviewed. → [PDT+] eta 3/11
removing ETA since I am not currently working on this. I also thing that we could punt on this for beta1.
Keywords: relnote
Whiteboard: [PDT+] eta 3/11
PDT-. Please release note
Whiteboard: [PDT-]
Attached patch Thread Pool changes (deleted) — Splinter Review
Target Milestone: M14 → M16
warren was looking at this last week.
Assignee: dougt → warren
Status: REOPENED → NEW
Keywords: beta1beta2
Whiteboard: [PDT-] → 3d
Keywords: nsbeta2
I checked in a modified version of these thread pool changes last night.
Status: NEW → RESOLVED
Closed: 25 years ago25 years ago
Keywords: beta2
Resolution: --- → FIXED
verified Linux 2000053108
Status: RESOLVED → VERIFIED
-relnote, fixed.
Keywords: relnote
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: