Closed Bug 29044 Opened 25 years ago Closed 25 years ago

Mac SSL: pictures not loading properly over https

Categories

(Core :: Networking, defect, P3)

PowerPC
All
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: mw, Assigned: rpotts)

References

Details

(Whiteboard: [PDT+] 2/28/00 (patch works on Mac))

Attachments

(1 file)

Symptoms: When loading a page via https, pictures at first seem to load, then after some time elapses, the pictures disappear to be replaced by their ALT text. What I know thus far: - The data is passed via PSM to the browser using a local socket connection. - Cartman properly closes the socket from its end using PR_Close/OTSndOrderlyDisconnect. However, the browser does not respond properly to the orderly disconnect. - This is somewhat perplexing, because when NSPR sockets are created on the Mac, they are automatically attached to a notifier routine (called, appropriately enough, NotifierRoutine) which responds to T_ORDREL (the event that happens when the peer closes the socket normally) by calling OTRcvOrderlyDisconnect. However, OT Session Watcher reports that no such call was made. - Doug Turner indicated to me earlier today that the SSL socket code in Necko pushes a new IO layer onto the NSPR socket; I am beginning to wonder if that is causing problems. I plan to investigate further this evening. This will take me an indeterminate amount of time, because I am not as familiar with the Necko or socket code as anyone who has worked in the code.
Attaching to 19119, asking for PDT eval.
Blocks: 19119
Keywords: beta1
Adding warren to the CC list in case he can chime in with pointers or suggestions. Thanks, Jim
Putting on PDT+ radar for beta1. Need est to fix in Status Whiteboard ASAP please.
Whiteboard: [PDT+]
I put in a date of 2/25 for the fix, but that is a patently untrue date. The real truth is that this is a nondeterministic, low-level bug, probably in NSPR. I am having difficulty tracking this down, and do not have any idea when it will be done.
Whiteboard: [PDT+] → [PDT+] 2/25
Since I have heard a whole lot of nothing here, I thought I'd add a few more data points. NSPR is, as far as I can tell, properly returning an EOF when it is done reading data from the socket. I have traced the code back into the level below nsSocketTransport::doRead; I am in the process of determining where this code is coming from. In any case, this intermediate layer is conflating the EOF with other data that had previously been read, returning the whole lump and pretending to be readable. Since there is no more data to be read, an error condition comes in at poll time, and the socket is closed with an error, resulting in no pictures. Does this jog anyone's memory?
[Adding wtc since he will be able to confirm/deny NSPR behavior] I want to point your collective attention to the following lines: (A) http://lxr.mozilla.org/seamonkey/source/netwerk/base/src/nsSocketTransport.cpp#9 46 (B) http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsPipe2.cpp#659 (C) http://lxr.mozilla.org/seamonkey/source/netwerk/base/src/nsSocketTransport.cpp#9 58 (D) http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsPipe2.cpp#676 nsSocketTransport calls the underlying pipe's WriteSegment method at line 946 (A) above. The WriteSegment call loops, calling the reader method at nsPipe2.cpp (B). In the case of the picture I am testing, it loops twice: once to read the picture data, the second picks up EOF from the socket as it should. However, the EOF state is not propagated up, apparently due to the first read call returning some number of bytes. nsSocketTransport incorrectly assumes (at (C)) that the socket would block and presumably goes back to poll on the socket. However, I believe PR_Poll will return an error (POLLHUP) on the socket since EOF has been reached. This would cause the picture to go away. Now, I have confirmed (at (D)) that the pipe state is properly set to NS_BASE_STREAM_CLOSED. What would be the best way to propagate this state up to nsSocketTransport, given that the zero-length read is not being passed up? Let me know what the right fix is here.
Rick Potts needs to look at this.
This sucks! It sounds like we need to be propagating the NS_BASE_STREAM_CLOSED result out through WriteSegments(...) so the socket transport knows not to put the socket back in the poll list. I guess the other option is to add some getter on nsIBufferInputStream that returns whether the stream has reached eof... Any idea why this is only a problem on the Mac? It appears that on the other platforms, doing the extra poll to detect EOF is working... -- rick
I am not certain why PR_Poll has a problem with this only on the Mac -- wtc or gordon might have a better idea there. I believe that the SSL socket code is the only place in the Seamonkey code where a new IO layer is pushed onto a PRFileDesc (PR_PushIOLayer), so for all I know that could affect PR_Poll results in this case. In any case, Rick seems to know at least as much as anyone else what the proper fix is here, so I will assign this bug to him. Feel free to reassign back to me if you think there is something else happening at a lower level or in SSL that I need to look at. Likewise, if you have a potential fix available, I'll be all too happy to drop it into my tree and try it out. (Simon Fraser just switched the builds to use the correct NSPR tag, so PSM should work in tomorrow's Mac commercial builds.) If it helps, the URL of the image I was using for testing is (https://www.waiter.com/wwwsys/images/mywtrlogo.gif).
Assignee: mwelch → rpotts
Good detective work Mark. Rick and I discussed this and think we have a solution. What's going on is that the pipe model is that as you read from a pipe, the EOF marker logically sits at the end of the data. And if you read multiple times while at EOF, you'll just get EOF back each time. However, sockets don't work this way -- if you try and put them back in the select set and they've been closed, you'll get an error, not EOF again. What was happening was that we would do a read, and the underlying socket would get EOF, but there would still be data in the pipe. Since we wanted to be fair to other clients, we would only take some of the data. Since the pipe didn't report EOF, we were rescheduling the socket with the select set for subsequent reading -- but the socket had already been closed. Hence the HUP. So basically, once we've detected EOF when reading from a socket, we have to remember that fact in the socket transport so that we don't end up putting the socket back in the select set again later. We can do this by noting that fact in the (up until now unused) OnClose callback on the socket transport. BTW, this is the mac manifestation of the same bug we had on linux some weeks ago that I tried to fix by handling POLL_HUP, bug 23778.
Whiteboard: [PDT+] 2/25 → [PDT+] 2/28/00
Hey Mark, I'm attaching a patch which should fix the problem. Basically, I pass a structure into nsBufferInputStream::WriteSegments(...) which allows me to capture the EOF inside of nsReadFromSocket(...) and propagate it back out to nsSocketTransport::doRead(...). If the EOF flag is set, the transport moves into the "done" state immediately (without putting the FD back on the POLL list one last time :-) Let me know if this fixes your problem... -- rick
Attached patch Patch to nsSocketTransport.cpp (deleted) — Splinter Review
The patch looks like the right solution based on my understanding. I'll have to try it out tomorrow morning when I get in front of my Mac. Many thanks.
Just got in, will try patch.
Whiteboard: [PDT+] 2/28/00 → [PDT+] 2/28/00 (waiting on feedback re: patch)
Whiteboard: [PDT+] 2/28/00 (waiting on feedback re: patch) → [PDT+] 2/28/00 (patch works on Mac)
Patch appears to work. Managed to load several pages, and the pictures all appeared. Many thanks. Is there any way to check this in tonight? PDT seems to really want this (and all other Mac SSL-related changes) in tomorrow's build.
checked in.
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
verified: Mac9.0 03/07/00 build
Status: RESOLVED → VERIFIED
OS: All
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: