Closed Bug 6890 Opened 25 years ago Closed 25 years ago

XML Parsing error (again) in the Message headers with 8-bit characters

Categories

(MailNews Core :: MIME, defect, P3)

x86
Windows NT
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: momoi, Assigned: nhottanscp)

References

()

Details

(Whiteboard: Regression: fails even M4 intl criteria)

** Observed with 5/2/99 Win32 build but not with 5/19/99 Win32 build **
I thought about re-opening Bug 4784, but since the symptoms are bizarre, I decided to open a new one. The outward appearance of the problem is the same as Bug 4784: I see an error message that there is an XML parsing error in the message pane headers of almost all the messages with 8-bit characters in them.
Steps to reproduce this bug:
1. Choose a mailbox containing some 8-bit Subject headers (e.g. French, Japanese).
2. Throw away the summary file and start your 5/20 build.
3. Now view the message pane headers. You may or may not see the XML parsing error. Now quit and re-start.
4. This time, if you did not see the parsing errors, you will probably see them now. If you did see the problem in step 3, you may not see it this time.
5. Try quitting and re-starting again. This time, the conditions in step 4 may reverse.
I have experienced flip-flops as described in steps 3-5 several times, enough to think that this is bizarre. But the errors are there often enough that this should be fixed for M6. This is really a regression: for International Mail, we go back to the pre-M4 state.
There are 2 types of errors when the problem occurs:
- All 2-byte headers (e.g. Japanese) produce XML parsing errors.
- Some 1-byte accented characters produce parsing errors, while others simply misdisplay the accented characters without the parsing error.
I tried the 5/19 build and followed the same steps but was not able to reproduce this bug.
QA Contact: 4080 → 1308
This fails our International Smoketest for M6. Please fix it for M6 -- unless it's a 1-day wonder which will go away.
Does 8-bit mean raw 8bit or MIME encoded? Only MIME encoded headers are supported currently. If we cannot see MIME encoded headers then that's a regression bug. The pref to send MIME encoded header is this. user_pref("mail.strictly_mime_headers", true);
My data messages contain almost all MIME-encoded headers. And they are showing this error. So indeed this is a regression. The symptoms are almost identical to 4784 except for the intermittent nature of the problem.
Whiteboard: Regression: fails even M4 intl criteria
Target Milestone: M6
** Re-checked with 5/23/99 Win32 build ** This problem is continuing with this new build, so this is not a 1-day problem. With this error we cannot show Japanese message headers, nor of course headers in other languages. We cannot possibly ship M6 with this. Setting TFV to M6 and copying choffmann.
Status: NEW → ASSIGNED
Can you send me the message that is showing this problem? That would be very helpful in my debugging. - rhp
Momoi san, I tested with windows/32bit/x86/1999-05-23-08-M6 and also with my local tree updated this morning, but cannot see this problem. Let's check again today. Rich, I will send MIME encoded Latin1 mail using 4.5. FYI, an easy way to type an 8-bit char is Alt+202 on the num pad.
I have several messages with this type of encoding in the headers and I'm not seeing the problem either. - rhp
I was able to reproduce this problem again in yesterday's build. As I said in the original report, the problem is more easily reproducible with Japanese headers than Latin 1 headers, though I have observed it in both. One critical step is that you might need to try re-starting several times. Today I had to re-start 3 times before I got the parsing errors on the Japanese msgs, which were showing the headers OK before these re-starts. The problem might go away after that and then come back again when you re-start Messenger again and again. At times, I saw this problem right after I deleted the .msf file and re-started. Other times I didn't. There is some inconsistency in getting the problem reproduced, but it surely does occur.
Summary: XML Parsing error (again) in the Message headers with 8=bit characters → XML Parsing error (again) in the Message headers with 8-bit characters
This really sounds like a problem with the data I am receiving from the mail store and not a parsing error. What is the XML error you are seeing? I can probably tell from the message if you are getting valid data or not. - rhp
I made some images of the bug in action under NT4-Japanese, and placed them here: http://rocknroll/users/momoi/publish/bugs/6890/
I see that one of the images, http://rocknroll/users/momoi/publish/bugs/6890/image6.JPG, contains a string that looks like ISO-2022-JP, which should have been converted to UTF-8 instead.
Could any of this be related to ftang's fix that needs to be checked in? I know that there are other bugs (i.e. vCard display with 8 bit data) that will be fixed when he gets his stuff into the tree. I just can't get this thing to duplicate and I am a bit at the end of my rope on this one for now :-( - rhp
I said before that this problem is hard to reproduce on NT4-US. Well, no more. I tried the "Smoketest" mailbox file I sent to rhp on my laptop's NT4-US. I normally use the Japanese NT4 on this laptop, but since I couldn't reproduce this problem on the NT4-US at work, I tried the laptop one and I had no problem reproducing this problem. There are some differences between the laptop NT4-US and the one I use at work on a desktop. One is that I have things like VC++ 4.2 on the work machine but not on my laptop. In any case, I got the problem to appear at the 3rd restart on this mailbox file (the same one I had sent to rhp): http://rocknroll/users/momoi/publish/bugs/6890/smoketest.zip Just select the first 2 msgs for display. If the problem doesn't show, then quit, re-start and try the same 2 msgs, etc. If you happen to be on a machine which has a problem, you should see the problem after 3-4 restarts and tries.
Assignee: rhp → nhotta
Status: ASSIGNED → NEW
Ok, I am seeing this problem now and after much tracking (thanks momoi for the test messages), I know where it is failing, but unfortunately, I don't know why. The line that is failing randomly is in comi18n.cpp and the exact line that fails is:
    res = ccm->GetUnicodeEncoder(&aCharset, &encoder);
Now all of the arguments going into that routine are fine and I've stepped through the function and seen it work, so it is a random problem with the GetUnicodeEncoder() call. I have stepped through the code with the test messages in smoketest.zip and I've traced the first message being displayed correctly AND incorrectly.
The reason XML parsing fails is that when we go to convert the string to UTF-8 and it fails, we end up escaping the wrong characters and the "<" and ">" are not in the correct location for the XML parser. This isn't good either and I will look into that, but it's not the real bug. The real bug is finding why we can't locate an encoder. The input charset is "iso-8859-1" and I am requesting "UTF-8" output.
Naoki, I am assigning this one over to you because I don't know where to go next with this encoder/decoder problem. If you use the smoketest.zip and just run apprunner.exe in the debugger, you will eventually get where you see the problem. If I can be of any help, please let me know. - rhp
For what it is worth, the stack at that point is:
    INTL_ConvertCharset(const char * 0x017ef48c, const char * 0x04498304, const char * 0x043d84f0, const int 15, char * * 0x017ef488) line 1234
    MIME_ConvertString(const char * 0x017ef48c, const char * 0x04498304, const char * 0x043d84f0, char * * 0x017ef488) line 1378 + 34 bytes
    mime_convert_rfc1522(const char * 0x043df5a0, int 34, const char * 0x00000000, const char * 0x0449204c, char * * 0x017ef544, int * 0x017ef548, void * 0x043dd350) line 204 + 28 bytes
    MimeHeaders_convert_rfc1522(MimeDisplayOptions * 0x043dd1f0, const char * 0x043df5a0, int 34, char * * 0x017ef570, int * 0x017ef56c) line 901 + 35 bytes
    MimeHeaders_convert_header_value(MimeDisplayOptions * 0x043dd1f0, char * * 0x017ef5a0) line 117 + 27 bytes
    MimeHeaders_write_all_headers(MimeHeaders * 0x043df670, MimeDisplayOptions * 0x043dd1f0, int 0) line 2014 + 13 bytes
    MimeMessage_write_headers_html(MimeObject * 0x043dd160) line 591 + 21 bytes
    MimeMessage_close_headers(MimeObject * 0x043dd160) line 344 + 9 bytes
    MimeMessage_parse_line(char * 0x03e639d8, int 2, MimeObject * 0x043dd160) line 211 + 9 bytes
    convert_and_send_buffer(char * 0x03e639d8, int 2, int 1, int (char *, unsigned int, void *)* 0x044773c0 MimeMessage_parse_line(char *, int, MimeObject *), void * 0x043dd160) line 148 + 15 bytes
    mime_LineBuffer(const char * 0x00d84648, int 112, char * * 0x043dd188, int * 0x043dd190, unsigned int * 0x043dd198, int 1, int (char *, unsigned int, void *)* 0x044773c0 MimeMessage_parse_line(char *, int, MimeObject *), void * 0x043dd160) line 235 + 29 bytes
    MimeObject_parse_buffer(char * 0x00d842c0, int 1016, MimeObject * 0x043dd160) line 218 + 49 bytes
    mime_display_stream_write(_nsMIMESession * 0x043dd0d0, const char * 0x00d842c0, int 1016) line 282 + 20 bytes
    MimePluginInstance::Write(MimePluginInstance * const 0x043dd854, const char * 0x00d842c0, unsigned int 1016, unsigned int * 0x017efb58) line 379 + 20 bytes
    plugin_stream_write(_NET_StreamClass * 0x043dd030, const char * 0x00d842c0, long 1016) line 68 + 24 bytes
    net_read_file_chunk(_ActiveEntry * 0x043ddb30) line 956 + 27 bytes
    net_ProcessFile(_ActiveEntry * 0x043ddb30) line 1327 + 9 bytes
    NET_ProcessNet(PRFileDesc * 0x00000000, int 1) line 3355 + 13 bytes
    nsNetlibThread::NetlibMainLoop() line 304 + 9 bytes
    nsNetlibThread::NetlibThreadMain(void * 0x00c057f0) line 260
    _PR_NativeRunThread(void * 0x00c05600) line 379 + 13 bytes
    _threadstartex(void * 0x00c05450) line 212 + 13 bytes
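The secondary issue rhp describes (a failed conversion leaving "<" and ">" unescaped or misplaced for the XML parser) could be guarded against along the following lines. This is only a minimal sketch and not the code in the tree or the fix that was checked in: EscapeForXml and HeaderValueForDisplay are hypothetical names, and replacing non-ASCII bytes with '?' on failure is an assumption made purely for illustration.

```cpp
#include <string>

// Hypothetical helper (not from the Mozilla tree): escape the characters
// that are significant to an XML parser.
std::string EscapeForXml(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '"': out += "&quot;"; break;
            default:  out += c;        break;
        }
    }
    return out;
}

// Hypothetical wrapper: `converted` is trusted only when the charset
// conversion reported success. On failure the raw header bytes are used,
// with every non-ASCII byte replaced by '?' (an assumption for this sketch)
// so the result is still valid UTF-8, and the result is escaped either way.
std::string HeaderValueForDisplay(bool conversionOk,
                                  const std::string& converted,
                                  const std::string& raw) {
    if (conversionOk) {
        return EscapeForXml(converted);
    }
    std::string safe;
    safe.reserve(raw.size());
    for (unsigned char c : raw) {
        safe += (c < 0x80) ? static_cast<char>(c) : '?';
    }
    return EscapeForXml(safe);
}
```

With a guard like this, a missing encoder would degrade the header text but would not feed malformed markup to the XML parser. It does not address the real bug, which is the encoder lookup failing.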
I have tried (7 times) to reproduce this with the same data Rich used with my local build. Adding cata@netscape.com to cc since he owns the encoder code. I will keep trying to reproduce this. But could someone also try this with today's build?
I looked at 5/25/99 M7 build which presumably includes ftang's fixes from yesterday. No change there. I can reproduce the problem on my NT4-J.
It's not reproducible on my machine or on Cata's. I added printf debugging on momoi's machine (the bug occurs about 20-30% of the time on that machine). The result confirms Rich's comment: it is failing to create the encoder. Input charset is "UTF-8" and the converter manager return value is 80500001 (NS_ERROR_UCONV_NOCONV). Neither the service manager nor the decoder creation fails. nsCharsetConverterManager::GetCharsetConverter in file nsCharsetConverterManager.cpp returns this result code. I need Cata's help to debug this code. There is a service manager related issue (i.e. NS_WITH_SERVICE), but it does not look related to this bug: the bug happens even without using NS_WITH_SERVICE. By the way, the earliest build where we can see this bug is the 5/18 build. It is not possible to reproduce on some machines, and even where it is reproducible it does not always happen (about 20-30%). I will look at this again tomorrow inside the encoder with Cata's help.
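A note on reading the result code: 80500001 has its high bit set, so if the same 32-bit value is printed as a signed decimal it shows up as -2142240767. The sketch below just demonstrates that arithmetic; ResultAsHex and IsFailure are hypothetical helper names used for illustration, not functions from the tree, and treating a set high bit as failure is stated here as an assumption about the result-code convention.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a 32-bit result code as 8 hex digits,
// the way it appears in the debug output (no "0x" prefix).
std::string ResultAsHex(int32_t rv) {
    char buf[9];
    std::snprintf(buf, sizeof buf, "%08x",
                  static_cast<unsigned>(static_cast<uint32_t>(rv)));
    return std::string(buf);
}

// Hypothetical helper: assumes the convention that a set high bit
// marks a failure code.
bool IsFailure(int32_t rv) {
    return (static_cast<uint32_t>(rv) & 0x80000000u) != 0;
}
```

So a printf with %d and one with %x on the same result will show -2142240767 and 80500001 respectively.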
Assignee: nhotta → cata
I was able to dump the internal table of the converter manager. Looks like the table got screwed after the 17th entry. And no entry for UTF-8. Reassign to cata@netscape.com for further investigation.
#####FAILED: GetCharsetConverter result = -2142240767 80500001 aSize = 42
    0 ISO-8859-1
    1 ISO-8859-2
    2 ISO-8859-3
    3 ISO-8859-4
    4 ISO-8859-5
    5 ISO-8859-6
    6 ISO-8859-7
    7 ISO-8859-8
    8 ISO-8859-9
    9 windows-1250
    10 windows-1251
    11 windows-1252
    12 windows-1253
    13 windows-1254
    14 windows-1257
    15 x-mac-roman
    16 x-mac-ce
    17 x-mac-ce
    18 x-mac-ce
    19 x-mac-ce
    20 x-mac-ce
    21 x-mac-ce
    22 x-mac-ce
    23 x-mac-ce
    24 x-mac-ce
    25 x-mac-ce
    26 x-mac-ce
    27 x-mac-ce
    28 x-mac-ce
    29 x-mac-ce
    30 x-mac-ce
    31 x-mac-ce
    32 x-mac-ce
    33 x-mac-ce
    34 x-mac-ce
    35 x-mac-ce
    36 x-mac-ce
    37 x-mac-ce
    38 x-mac-ce
    39 x-mac-ce
    40 x-mac-ce
    41 x-mac-ce
#####FAILED to get encoder Charset = UTF-8 result = -2142240767 80500001
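The two symptoms visible in the dump (an entry repeating from index 17 onward, and no UTF-8 entry at all) are the kind of thing a small debug-only check could flag automatically. The following is a rough sketch under stated assumptions: it works on a plain list of charset names, not on the converter manager's real internal table, and FirstRepeatedEntry and HasCharset are hypothetical names. The name comparison is case-sensitive for simplicity.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical diagnostic: return the index of the first entry that merely
// repeats its predecessor, or -1 if no adjacent entries are identical.
int FirstRepeatedEntry(const std::vector<std::string>& names) {
    for (std::size_t i = 1; i < names.size(); ++i) {
        if (names[i] == names[i - 1]) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Hypothetical diagnostic: report whether a charset name is present at all.
// Case-sensitive comparison, which is a simplification for this sketch.
bool HasCharset(const std::vector<std::string>& names,
                const std::string& wanted) {
    for (const std::string& name : names) {
        if (name == wanted) {
            return true;
        }
    }
    return false;
}
```

Run against the 42 names in the dump above, FirstRepeatedEntry would report index 17 and HasCharset would report that "UTF-8" is missing, matching what the failed lookup shows.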
Target Milestone: M6 → M7
Kat mentioned the problem that occurs in message headers with 8-bit chars. I'm encountering (again, it wasn't there for a while) the problem with 8-bit message bodies. On my machine, a message with 8-bit chars shows garbage in the body when viewing (sending is alright). The confusing thing, and the reason it looks like a random problem, is that I see the 8-bit messages from Kat's Smoketest folder just fine. There is no problem viewing this message in 4.6. ****** observed with 5/26, 5/27 and 5/28 builds ******
Cata, please check in your part of the fix then reassign to dp.
Assignee: cata → dp
My part of fix is in, reassigned to dp to investigate the FindFactory failure.
Assignee: dp → nhotta
I would prefer to open another bug on the FindFactory() failing part and make this depend on it if that is ok.
Depends on: 7308
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Marking this as fixed. With Cata's fix, there is going to be very little chance of seeing the current problem (unless FindFactory for the UTF-8 converter fails). Dp, please open the new bug for FindFactory().
Status: RESOLVED → VERIFIED
** Checked with 6/11/99 Win32 build ** I have started and re-started Messenger about 10 times on the NT4 machine I had the original problem with. In none of these 10 tries did I see the original XML parser problem. Prior to the fix, I would have seen this problem at about the 3rd or 4th try. This result gives me confidence to say that the fix works. Marking it verified fixed.
Product: MailNews → Core
Product: Core → MailNews Core