Closed
Bug 4463
Opened 26 years ago
Closed 25 years ago
Default encoding for XUL/XML/RDF should use UTF-8
Categories
(Core :: XML, defect, P1)
Tracking
()
VERIFIED
FIXED
M5
People
(Reporter: rchen, Assigned: nisheeth_mozilla)
References
Details
I separated the UTF-8 default-encoding issue out of bug 4431.
Updated•26 years ago
Assignee: trudelle → danm
Severity: normal → enhancement
Comment 1•26 years ago
changing severity to enhancement, assigning to danm
Peter, this is not an enhancement; see this reference:
http://www.w3.org/TR/1998/REC-xml-19980210#charencoding
As part of M4, we have a goal of pseudo-localizing XUL into Japanese, and this is
a blocking issue. Please modify the priority and set the TFV to M4.
Peter,
The fix should be in M4 and severity is major.
Thank you.
Updated•26 years ago
Assignee: danm → hyatt
Severity: enhancement → major
Target Milestone: M4
Comment 4•26 years ago
Okay, reassigning to hyatt for M4.
Comment 5•26 years ago
I have no idea what this means. Isn't nsString unicode? What is the problem?
nsString holds UTF-16 Unicode, which uses a 16-bit value for each character.
But the XML is likely to be UTF-8.
If an XML file does not contain:
<?xml encoding='IANA-charset-name'?>
then by default the encoding is either UTF-16 or UTF-8. For most interesting
cases, each UTF-16 character is a fixed 2-byte quantity, but these can be
affected by endian-ness. So you need to check if the file starts with a
Byte Order Mark (BOM). It will either be FEFF or FFFE, depending on the
endian-ness.
If there's a BOM, the data is UTF-16. Otherwise it is UTF-8, which is a
byte stream and unaffected by endian-ness. ASCII is a proper subset of
UTF-8, and each ASCII character is represented as 1 byte. Other characters
(e.g., accented characters) are encoded as multiple bytes per character.
We plan to create all of the 5.0 XUL files in UTF-8, so we need to convert
from UTF-8 to UTF-16.
For M4 we need the UTF-8 default supported. Later we will need UTF-16 and the
<?xml encoding='IANA-charset-name'?>
declaration supported, but that is another bug: 4431.
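The detection rule described above (a UTF-16 BOM means UTF-16, anything else is treated as UTF-8, and the result is converted to 16-bit units) can be sketched roughly as follows. This is only an illustration in Python, not the Mozilla code; the function names are hypothetical, and the encoding declaration is deliberately ignored because that part belongs to bug 4431.

```python
import codecs


def sniff_xml_encoding(data):
    """Return 'utf-16' if the buffer starts with a UTF-16 BOM, else 'utf-8'.

    Hypothetical helper: it only implements the default rule from the
    comment above and does not read any <?xml encoding=...?> declaration.
    """
    if data.startswith(codecs.BOM_UTF16_BE) or data.startswith(codecs.BOM_UTF16_LE):
        # Python's 'utf-16' codec reads the BOM to pick the byte order
        # and drops the BOM from the decoded text.
        return "utf-16"
    return "utf-8"


def decode_xml_bytes(data):
    """Decode raw file bytes to text using the sniffed default encoding."""
    return data.decode(sniff_xml_encoding(data))


def to_utf16_code_units(text):
    """Return the 16-bit code units an in-memory UTF-16 string would hold."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
```

For example, a BOM-less file containing the UTF-8 bytes for two Japanese characters decodes to two 16-bit units, while plain ASCII passes through with one unit per byte.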
Comment 7•26 years ago
Is this only XUL's problem? Do XML and HTML support UTF-8?
Comment 8•26 years ago
I'm trying to figure out if this problem is with the XML parser or if it's with
the XUL content sink (or both).
I think it's the XML parser. It needs to parse:
<?xml encoding='IANA-charset-name'?>
as spec'd in http://www.w3.org/TR/1998/REC-xml-19980210#charencoding.
(As it did in pre-5.0 browsers, HTML needs to support UTF-8 if specified,
but HTML defaults to ISO-8859-1 not UTF-8. Our implementation is more
complicated because we take into account user settings too.)
Another thing that affects both XML and HTML is the HTTP Content-Type, e.g.,
Content-Type: text/xml; charset=UTF-8
or
Content-Type: text/html; charset=UTF-8
This needs to be parsed by netlib for both HTML and XML. ftang is working
w/gagan and rickg on this.
For M4, we need only the default XML behavior. We don't yet need the above HTTP
header or the XML <?xml encoding ...> parsing.
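As a rough illustration of the Content-Type handling mentioned above, the sketch below pulls the charset parameter out of a header value and falls back to the defaults named in this comment (ISO-8859-1 for HTML, UTF-8 otherwise). It uses Python's standard email.message module rather than anything from netlib; the function name and the defaults table are assumptions, and user settings are not modeled.

```python
from email.message import Message

# Fallbacks taken from the comment above: HTML defaults to ISO-8859-1,
# everything else in this sketch defaults to UTF-8.
DEFAULT_CHARSETS = {"text/html": "iso-8859-1"}


def charset_for(content_type_value):
    """Return the charset for a Content-Type header value (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    explicit = msg.get_content_charset()  # lower-cased, or None if absent
    if explicit:
        return explicit
    return DEFAULT_CHARSETS.get(msg.get_content_type(), "utf-8")
```

With this sketch, "text/xml; charset=UTF-8" and a bare "text/xml" both resolve to utf-8, while a bare "text/html" resolves to iso-8859-1.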
Comment 10•26 years ago
Be advised that we are supposed to switch to XPAT by M4. One thing to try is
to recompile the client with XPAT turned on.
Adding Nisheeth to the CC list for comments.
Comment 12•26 years ago
As an additional data point, UTF-8 HTML display is working in the
current (4/1) build as long as the Meta-Equiv Content-Type
header includes "charset = utf-8".
Updated•26 years ago
Assignee: hyatt → ftang
Comment 13•26 years ago
Hyatt, I hope you don't mind that I reassigned this to myself. I have a fix in
nsParser (approved by rickg) that uses UTF-8 as the default charset for RDF, XML,
and XUL. Checked in as mozilla/htmlparser/src/nsParser.cpp 3.81.
I verified this with my pseudo-l10n file. The buttons show up correctly. However,
the menu still displays garbage, but this is a separate issue. Let's put the menu
display problem into a separate bug.
Updated•26 years ago
Status: NEW → RESOLVED
Closed: 26 years ago
Resolution: --- → FIXED
Comment 14•26 years ago (Reporter)
I am reopening the bug because something is wrong with the newer builds. Here
are the results with the UTF-8 JA pseudo navigator.xul on two OS machines:
Build 04-07-11
JA- NT : menu, buttons, status bar display fine.
US- NT with J fonts: buttons, status bar display fine but not menu (it shows
????). The result is similar to Mac.
Build 04-08-10 04-09-12 and 04-09-16
JA- NT : Only display frame window, no menu, buttons, status bar.
US- NT with J fonts: Only display frame window, no menu, buttons, status bar.
Comment 15•26 years ago
Moving to M5; we need to investigate more, and it shouldn't be a showstopper.
Updated•25 years ago
Assignee: ftang → nisheeth
Status: ASSIGNED → NEW
Comment 16•25 years ago
ToNewCString in nsExpatTokenizer::ConsumeToken damages the data. It damages the
Unicode data that has already been converted (by assuming UTF-8).
Nisheeth, please fix it ASAP. None of our XUL/XML/RDF works without this. This is
a blocker for L10N and pseudo-L10N.
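To illustrate the kind of damage described here, the sketch below assumes that the conversion narrows each 16-bit unit to a single byte by keeping only its low 8 bits. That assumption is not confirmed anywhere in this bug, and narrow_lossy is a hypothetical stand-in written in Python, not the real ToNewCString.

```python
def narrow_lossy(text):
    """Keep only the low 8 bits of each UTF-16 code unit.

    Hypothetical stand-in for a narrowing string conversion; the exact
    behavior of the real function is an assumption in this sketch.
    """
    raw = text.encode("utf-16-le")
    # In little-endian UTF-16 the low byte of each unit comes first.
    return bytes(raw[i] for i in range(0, len(raw), 2))


utf8_input = "\u65e5\u672c".encode("utf-8")  # two Japanese characters as UTF-8
decoded = utf8_input.decode("utf-8")         # already converted to Unicode
narrowed = narrow_lossy(decoded)             # high bytes are thrown away here
```

Under that assumption, ASCII text survives the round trip unchanged, but the two Japanese characters collapse into two unrelated bytes that no longer match the original UTF-8 input.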
Comment 17•25 years ago
This is also blocking viewing message headers (2671).
Comment 18•25 years ago
This has been reopened because of the switch to expat.
Clearing resolution FIXED.
Updated•25 years ago (Assignee)
Status: NEW → ASSIGNED
Component: XUL → XML
Comment 19•25 years ago (Assignee)
Accepting bug. Setting component to XML...
Updated•25 years ago (Assignee)
Status: ASSIGNED → RESOLVED
Closed: 26 years ago → 25 years ago
Resolution: --- → FIXED
Comment 20•25 years ago (Assignee)
The fix is checked in. Expat now accepts unicode buffers.
Comment 21•25 years ago (Assignee)
*** Bug 4431 has been marked as a duplicate of this bug. ***
Comment 22•25 years ago (Assignee)
*** Bug 5262 has been marked as a duplicate of this bug. ***
Comment 23•25 years ago (Reporter)
Verified on Japanese NT4.0.