Closed Bug 4463 Opened 26 years ago Closed 25 years ago

Default encoding for XUL/XML/RDF should use UTF-8

Categories

(Core :: XML, defect, P1)

defect

VERIFIED FIXED

People

(Reporter: rchen, Assigned: nisheeth_mozilla)

References

Details

I separated the default-encoding-to-UTF-8 issue out from bug 4431.
Assignee: trudelle → danm
Severity: normal → enhancement
Changing severity to enhancement, assigning to danm.
Peter, this is not an enhancement; see this reference: http://www.w3.org/TR/1998/REC-xml-19980210#charencoding As part of M4 we have a goal of pseudo-localizing XUL into Japanese, and this is a blocking issue. Please modify the priority and set the TFV to M4.
QA Contact: 4015 → 4338
Peter, the fix should be in M4, and the severity is major. Thank you.
Assignee: danm → hyatt
Severity: enhancement → major
Target Milestone: M4
Okay, reassigning to hyatt for M4.
I have no idea what this means. Isn't nsString unicode? What is the problem?
nsString is UTF-16 Unicode, which uses a 16-bit value for each character. But the XML is likely to be UTF-8. If an XML file does not contain <?xml encoding='IANA-charset-name'?>, then by default the encoding is either UTF-16 or UTF-8. For most interesting cases, each UTF-16 character is a fixed 2-byte quantity, but these are affected by endianness, so you need to check whether the file starts with a Byte Order Mark (BOM). It will appear as either FEFF or FFFE, depending on the endianness. If there's a BOM, the data is UTF-16. Otherwise it is UTF-8, which is a byte stream and unaffected by endianness. ASCII is a proper subset of UTF-8, and each ASCII character is represented as 1 byte. Other characters (e.g., accented characters) are encoded as multiple bytes per character. We plan to create all of the 5.0 XUL files in UTF-8, so we need to convert from UTF-8 to UTF-16. For M4 we need the UTF-8 default supported. Later we will need UTF-16 and <?xml encoding='IANA-charset-name'?> supported, but that is another bug: 4431.
Is this only XUL's problem? Do XML and HTML support UTF-8?
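The default rule described in the comment above (BOM present means UTF-16, otherwise UTF-8) can be illustrated with a minimal Python sketch. This is not the nsParser code; the function names `sniff_default_encoding` and `decode_xml_bytes` are hypothetical, and the sketch ignores the <?xml encoding=...?> declaration, which the comment defers to bug 4431.

```python
import codecs


def sniff_default_encoding(data: bytes) -> str:
    """Guess the encoding of XML bytes that carry no encoding declaration.

    Hypothetical helper: a UTF-16 byte order mark (FE FF or FF FE) selects
    UTF-16; anything else falls back to the UTF-8 default.
    """
    if data.startswith(codecs.BOM_UTF16_BE) or data.startswith(codecs.BOM_UTF16_LE):
        # Python's "utf-16" codec reads the BOM to pick the byte order
        # and strips it from the decoded text.
        return "utf-16"
    return "utf-8"


def decode_xml_bytes(data: bytes) -> str:
    """Decode the raw bytes into Unicode text using the sniffed encoding."""
    return data.decode(sniff_default_encoding(data))
```

With this sketch, the same document decodes to the same text whether it arrives as plain UTF-8 or as UTF-16 in either byte order, as long as the UTF-16 form starts with a BOM.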
I'm trying to figure out if this problem is with the XML parser or if it's with the XUL content sink (or both).
I think it's the XML parser. It needs to parse <?xml encoding='IANA-charset-name'?> as spec'd in http://www.w3.org/TR/1998/REC-xml-19980210#charencoding. (As it did in pre-5.0 browsers, HTML needs to support UTF-8 if specified, but HTML defaults to ISO-8859-1, not UTF-8. Our implementation is more complicated because we take user settings into account too.) Another thing that affects both XML and HTML is the HTTP Content-Type, e.g., Content-Type: text/xml; charset=UTF-8 or Content-Type: text/html; charset=UTF-8. This needs to be parsed by netlib for both HTML and XML. ftang is working with gagan and rickg on this. For M4, we need the default XML behavior. For M4, we don't need the above HTTP header parsing or the XML <?xml encoding ...> parsing.
Be advised that we are supposed to switch to expat by M4. One thing to try is to recompile the client with expat turned on. Adding Nisheeth to the CC list for comments.
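As a rough illustration of the two charset sources mentioned above, here is a Python sketch that pulls the charset out of an HTTP Content-Type value and the encoding out of an XML declaration. The function names are hypothetical, the precedence order (HTTP header first, then the declaration, then the UTF-8 default) is an assumption rather than something stated in this bug, and the declaration scan only works for ASCII-compatible encodings.

```python
import re
from email.message import Message

# Matches the encoding pseudo-attribute of a leading XML declaration.
_XML_DECL_ENCODING = re.compile(
    rb"^<\?xml[^>]*?\bencoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def encoding_from_xml_declaration(data: bytes):
    """Return the lowercased encoding named in the XML declaration, or None."""
    match = _XML_DECL_ENCODING.match(data)
    return match.group(1).decode("ascii").lower() if match else None


def choose_encoding(content_type, data: bytes) -> str:
    """Pick an encoding; the precedence order here is an assumption."""
    if content_type is not None:
        charset = charset_from_content_type(content_type)
        if charset:
            return charset
    return encoding_from_xml_declaration(data) or "utf-8"
```

For the M4 scope described in the comment, only the final fallback in `choose_encoding` would apply; the other two lookups correspond to the work deferred to later milestones.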
As an additional fact, UTF-8 HTML display is working on the current (4/1) build as long as the Meta-Equiv Content-Type header includes "charset = utf-8".
Assignee: hyatt → ftang
Hyatt, I hope you don't mind that I am reassigning this to myself. I have a fix in nsParser (approved by rickg) that uses UTF-8 as the default charset for RDF, XML, and XUL. Checked in as mozilla/htmlparser/src/nsParser.cpp 3.81. I verified this with my pseudo-l10n file. The buttons show up correctly. However, the menu still displays garbage, but this is a separate issue. Let's put the menu display problem into a separate bug.
Status: NEW → RESOLVED
Closed: 26 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
I am reopening the bug because something is wrong with the newer builds. Here are the results with a UTF-8 JA pseudo-localized navigator.xul on two OS machines:
Build 04-07-11
JA NT: menu, buttons, and status bar display fine.
US NT with J fonts: buttons and status bar display fine, but not the menu (it shows ????). The result is similar on Mac.
Builds 04-08-10, 04-09-12, and 04-09-16
JA NT: only the frame window is displayed; no menu, buttons, or status bar.
US NT with J fonts: only the frame window is displayed; no menu, buttons, or status bar.
Target Milestone: M4 → M5
Moving to M5. We need to investigate more, and it shouldn't be a showstopper.
Assignee: ftang → nisheeth
Status: ASSIGNED → NEW
ToNewCString in nsExpatTokenizer::ConsumeToken damages the data. It damages the Unicode data that has already been converted (by assuming UTF-8). Nisheeth, please fix it ASAP. None of our XUL/XML/RDF works without this. This is a blocker for L10N and pseudo-L10N.
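To show why narrowing already-decoded Unicode text can destroy it, here is a small Python sketch. It assumes, without confirmation from this bug, that the damaging step behaves like a copy that keeps only the low 8 bits of each 16-bit code unit; `narrow_to_8bit` and `survives_narrowing` are hypothetical stand-ins, not the actual ToNewCString behavior.

```python
def narrow_to_8bit(text: str) -> bytes:
    """Hypothetical lossy copy: keep only the low byte of each character."""
    return bytes(ord(ch) & 0xFF for ch in text)


def survives_narrowing(text: str) -> bool:
    """Report whether the text can be recovered after the lossy copy."""
    # Widening each byte back to a character is the best possible recovery;
    # it only succeeds when every character already fit in 8 bits.
    return narrow_to_8bit(text).decode("latin-1") == text
```

Under that assumption, ASCII labels such as "File" pass through unchanged, while Japanese strings lose their high bytes and cannot be recovered, which would explain why only localized XUL breaks.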
QA Contact: 4338 → 4475
This is also blocking viewing message headers (2671).
Resolution: FIXED → ---
This has been reopened because of the switch to expat. Clearing resolution FIXED.
Status: NEW → ASSIGNED
Component: XUL → XML
Accepting bug. Setting component to XML...
Status: ASSIGNED → RESOLVED
Closed: 26 years ago → 25 years ago
Resolution: --- → FIXED
The fix is checked in. Expat now accepts unicode buffers.
*** Bug 4431 has been marked as a duplicate of this bug. ***
*** Bug 5262 has been marked as a duplicate of this bug. ***
Status: RESOLVED → VERIFIED
Verified on Japanese NT4.0.