Closed
Bug 79928
Opened 24 years ago
Closed 23 years ago
Zero-width space (unicode x200b) not interpreted
Categories
(Core :: Layout, defect)
People
(Reporter: samuel, Assigned: waterson)
Details
(Keywords: fonts, platform-parity, Whiteboard: INVALID/WONTFIX?)
Attachments
(8 files, all deleted)
1. text/html
2. patch
3. patch (waterson: superreview+)
4. patch
5. patch
6. patch
7. image/gif
8. image/gif
The Unicode character U+200B (#x200b), which is supposed to be a zero-width
space, is not interpreted by Mozilla. Instead, whatever glyph the current font
happens to have (or not have) at that code point gets displayed.
What should happen is that Mozilla interprets the character itself and lays out
the text accordingly, without using the font.
Comment 1•24 years ago (Reporter)
Comment 2•24 years ago (Reporter)
As demonstrated by the testcase, it does cause a line break, so it just needs to
not display the font character.
Comment 3•24 years ago (Reporter)
Comment 4•24 years ago
Why? If the font claims to have a glyph for U+200B, why shouldn't we use it?
I think this is INVALID.
Comment 5•24 years ago
This is marked "OS: All" but it works for me on Windows 2000. It could be a
Linux-only bug, but more likely it's just an error in the font.
Comment 6•24 years ago
Me again. If it is a bug in the font and there are fonts on Linux which don't
have the bug, then one possible solution is to simply put the non-broken fonts
at the front of the preferred font list in the CSS.
I tried this on Exceed and apparently none of my fonts have this character, so
it displayed as a question mark. The same should happen if none of my fonts had
a normal space (U+0020) or any of the other spacing characters around U+200x.
Keywords: pp
Whiteboard: INVALID/WONTFIX?
Comment 7•24 years ago (Reporter)
On Windows 98 it gives a question mark. Isn't the point of this character code
to have no visible representation? I think a font that draws a visible marker
for the character is good for debugging applications that don't support it
properly.
Comment 8•24 years ago
The point is there isn't anything to "support". It's up to the font to have a
glyph for U+200B, just like it is for U+0020 and U+2009. U+200B is not
special in any way.
Note that when we ship with MathML, we will have to ship with Math fonts, so
that would be a good time to ensure that we also ship with a font that has a
U+200B glyph.
Comment 9•23 years ago (Assignee)
But shipping fonts is a download headache at best, and has licensing issues at
worst. Ought we simply hack around it for now? Seems reasonable to me, anyway.
Assignee: karnaze → waterson
Keywords: patch
Comment 10•23 years ago
Sounds reasonable to me too for now. I have been wandering in the area
recently. The best spot for the fix-up patch is probably IS_DISCARDED(_ch)
in nsTextTransformer.cpp.
Comment 11•23 years ago (Reporter)
Comment 12•23 years ago (Assignee)
Comment on attachment 49397: better patch?
sr=waterson
Attachment #49397 - Flags: superreview+
Comment 13•23 years ago
r=rbs, didn't test - assuming it has been tested
(I tried using the checkbox from the patch manager for the first time, but got a
red alert that "one of the statuses wasn't valid" -- is there any link to a help
for these new things?)
Comment 14•23 years ago (Reporter)
Hang on, the second patch doesn't seem to work, I'll look at it some more.
(The first patch does work, btw)
Comment 15•23 years ago (Reporter)
The second patch doesn't make sense. We don't want the character discarded or
it won't cause the line break. Also, the macro doesn't seem to handle big
characters or something. So, back to the first patch unless someone has a
better idea.
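To illustrate why discarding the character is not the same as drawing nothing for it, here is a minimal sketch of a greedy line wrapper. It is only a model, not nsTextFrame or nsTextTransformer code: `wrapLines` is a hypothetical name, and every character is assumed to be one column wide except U+200B, which is zero-width.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model, not Mozilla code. A break is allowed only after a
// space or at a ZWSP. The ZWSP is never copied to the output (nothing is
// drawn for it), but it still acts as a break opportunity.
std::vector<std::u32string> wrapLines(const std::u32string& text,
                                      std::size_t width) {
  const char32_t kZWSP = U'\u200B';
  std::vector<std::u32string> lines;
  std::u32string line, segment;
  // Move the pending segment onto the current line, starting a new line
  // first if the segment would not fit.
  auto flushSegment = [&]() {
    if (!line.empty() && line.size() + segment.size() > width) {
      lines.push_back(line);
      line.clear();
    }
    line += segment;
    segment.clear();
  };
  for (char32_t ch : text) {
    if (ch == kZWSP) {
      flushSegment();  // break opportunity, nothing drawn
    } else {
      segment.push_back(ch);
      if (ch == U' ') flushSegment();  // break opportunity after a space
    }
  }
  flushSegment();
  if (!line.empty()) lines.push_back(line);
  return lines;
}
```

If the ZWSP were stripped from the text before it reached this function, the word would arrive as one unbreakable segment and overflow the line, which is the behaviour the second patch is reported to cause.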
Whiteboard: INVALID/WONTFIX?
Comment 16•23 years ago
Just having a font is the cleanest way to handle the problem since nsTextFrame
is not the only place where strings are drawn within Mozilla. We have been
bitten many times by layout patches that "work". So that criterion is not enough.
Specifically, it is not clear what happens with several consecutive ZWSP
characters. Also, there is not enough analysis as to what is happening to the
offsets that nsTextFrame maintains, and whether it is necessary to sync the
transformation flag that nsTextFrame uses when it ends up with a text that isn't
a replica of the original (the flag helps to optimize subsequent reflows such as
that coming from dynamic changes). Suggesting caution about this one; Futuring
is also a sensible option as originally proposed.
Comment 17•23 years ago
Please do not remove comments which I added to the status whiteboard. I still think
this bug should be marked WONTFIX or INVALID. Fonts should have an empty glyph at
the ZWSP codepoint. If they don't, then they probably intended to show the glyph,
or are buggy. Either way, that isn't our problem, except if we want to ship with a
font that does this right. See also my comments at 2001-05-13 13:47.
Whiteboard: INVALID/WONTFIX?
Comment 18•23 years ago (Assignee)
All right: Hixie wins. (Arguing with him is like wrestling with a bag full of cats.)
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → WONTFIX
Comment 19•23 years ago
Verified WONTFIX. Very rare problem. Hopefully, it might become a non-issue if
the auto-download of fonts becomes more comprehensive (can't remember the bug #).
(BTW, whitespace testcases also need to include an example with white-space: pre
for completeness.)
Status: RESOLVED → VERIFIED
Comment 20•23 years ago
Waterson, you pansie ;)
This isn't rare. It's something we could use in *every* line of the javascript
debugger console, and chatzilla. In these console interfaces, we need to make
sure that a long word doesn't come along and cause a horizontal scrollbar (try
using a console where every line overflows to the right.) Currently, we do this
by inserting an html:img with a width of 0px. If we could reliably use the
&zwsp; character instead, we'd have far fewer DOM nodes, and could construct the
messages faster, while conserving DOM objects. Unfortunately, in the Real World
(you remember the Real World, right, where our users live?) lots of fonts are
broken. This includes default installs of contemporary Windows and Linux
systems. Please bear this in mind as this bug sits "WONTFIX".
This isn't ssieb picking nits, he's trying to do something cool with the code
(see bug 71549); let's not stonewall it.
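A rough sketch of the console use case described in this comment: inserting U+200B into long words so that each one offers a break opportunity. This is not chatzilla or debugger code. `insertBreakOpportunities` and its `maxRun` parameter are hypothetical, and the text is assumed to be UTF-32 for simplicity.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: insert U+200B after every `maxRun` consecutive
// characters that contain no space or ZWSP, so a long word can wrap.
// A `maxRun` of 0 disables insertion.
std::u32string insertBreakOpportunities(const std::u32string& text,
                                        std::size_t maxRun) {
  const char32_t kZWSP = U'\u200B';
  std::u32string out;
  std::size_t run = 0;  // length of the current unbroken run
  for (char32_t ch : text) {
    const bool isBreak = (ch == U' ' || ch == kZWSP);
    if (isBreak) {
      run = 0;  // an existing break opportunity resets the run
    } else if (maxRun > 0 && run == maxRun) {
      out.push_back(kZWSP);  // add a break opportunity before this char
      run = 0;
    }
    out.push_back(ch);
    if (!isBreak) ++run;
  }
  return out;
}
```

This only helps if the layout engine breaks at U+200B and draws nothing for it, which is exactly what the thread is arguing about.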
Comment 21•23 years ago
What I meant was that ZWSP is rarely encountered in web pages.
FYI, the font called "Lucida Sans Unicode" has a glyph for ZWSP as well as some
other space-like characters.
But actually, speaking from a broader perspective, if this bug is blocking some
other work, it might be much easier/safer to special case the ZWSP in the font
subsystem than in the cryptic code of nsTextFrame -- although getting r/sr on
the various versions of GFX is an annoyance. Doing it in the font subsystem has
another advantage: not every character has to be tested, only those for which
fonts are missing. It is known when fonts are missing for particular
characters, since that's where they get replaced by the question mark '?'.
If a font with a ZWSP glyph is found, then all goes well, nothing to worry
about; otherwise, the so-called fallback font-substitute code gets activated and
a one-liner can be added therein to not substitute any ZWSP (since its width is
zero, it can be treated as if it wasn't there...). I am in the midst of
changes and can easily try this if it is blocking immediate work.
Comment 22•23 years ago
&zwsp; is not the way to fix the wrapping problem. The way to fix that problem is
by using a proprietary value for 'white-space' just like we did with the 'wrap'
attribute on the <pre> element (using the '-moz-pre-wrap' keyword). Call it
'-moz-hard-wrap' or whatever. It is very likely that CSS3 will have such a feature
anyway, so we are going to have to write the code at some point whatever we do.
Comment 23•23 years ago
Interesting, people have already started asking for a hard-wrap in bug 99457.
I personally think that hixie (and dbaron) do a good job by being very careful
with the standards and keeping an eye on the platform for the future. It is too
tempting to hack all sorts of things as a matter of expediency, and since mozilla
is so big, you wouldn't know the ugly things that are going on behind your back and
their potential side-effects in the long run. I also know that they are careful
to keep their feet on the ground and accept hacks when they are convinced of
their value. So...
Who will conduct the analysis for the nsTextFrame hack?! But I looked at the
font code and we might perhaps help w.r.t. ZWSP if it blocks some other nifty
work, alright hixie? If you think it is still premature, I will move on... or
maybe you might want to first see the patch to confirm that it is simpler/safer
over there... let me attach something soon.
Comment 24•23 years ago
The reason I didn't mark the bug WONTFIX, nor verify it when it was so marked, is
because although I don't like such a hack, I would be prepared to tolerate it to
some extent. :-) So long as a font with a glyph at U+200B would still show the
glyph in response to a &zwsp; entity, then we should be ok.
However, I *do* think that using &zwsp; for the purposes that rginda wants,
namely a "cheap" hard wrap, is wrong, and would be better handled by the
'white-space' property, so if that is the sole reason to implement this hack,
then I'm against.
I think implementing this at the same level as the code that changes U+2122 into
the two characters "T" and "M" would be reasonable, and not a hack.
Comment 25•23 years ago
Comment 26•23 years ago
Re-opening since some activity is going on in the bug.
I attached a patch for gfxwin. Other platforms use transliterate.properties
(this is where U+2122 is associated to "TM", and © associated to "(c)" when
there is no font with glyphs). Maybe the fix on other platforms is simply to add
empty entries in intl/unicharutil/tools/gentransliterate.pl which is the
generator of transliterate.properties. Need testers on Mac & Linux to confirm
that.
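The transliteration idea can be sketched as a small lookup table that is consulted only when no font has a glyph for a character. This is an illustration under assumptions, not the format of transliterate.properties or the gfxwin patch; `substituteForMissingGlyph` is a hypothetical name, and the three entries are the ones mentioned in this thread.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical fallback, used only when no font has a glyph for `ch`.
// Returns the text to draw in place of the character.
std::string substituteForMissingGlyph(char32_t ch) {
  static const std::unordered_map<char32_t, std::string> kTable = {
      {U'\u2122', "TM"},   // TRADE MARK SIGN
      {U'\u00A9', "(c)"},  // COPYRIGHT SIGN
      {U'\u200B', ""},     // ZERO WIDTH SPACE: draw nothing
  };
  auto it = kTable.find(ch);
  if (it != kTable.end()) return it->second;
  return std::string("?");  // default replacement for unknown characters
}
```

An empty entry for U+200B means the fallback draws nothing for it, while the character itself stays in the text and can still act as a line-break opportunity.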
Status: VERIFIED → REOPENED
Resolution: WONTFIX → ---
Comment 27•23 years ago
Comment 28•23 years ago (Assignee)
Let's try to fix chatzilla as Hixie suggested (|white-space: -moz-hard-wrap|),
then we can deal with this bug on its own merits (which seem marginal). It looks
like bug 99457 sufficiently covers that RFE, yes?
Comment 29•23 years ago
Agree. But notice that the patch is worthwhile in the substitute-font fallback
context. Once that code gets activated, it means there is no font at all with
the needed glyphs, and recovery (or "faking" the rendering if you like) is all
that matters. It benefits everybody, not just nsTextFrame, and callers don't
have to care that something else was used in the place of their input.
Comment 30•23 years ago (Assignee)
Yes, I buy that. sr=waterson on attachment 49543 if dbaron and other font-savvy
people are happy with it. (As you note, we'll need to duplicate this code for
Mac and Linux as well.)
Comment 31•23 years ago
It might actually be much easier on other platforms because they already use a
parameterized transliterate handling. Just adding an entry for ZWSP in the
transliterate.properties file might do the trick (need testing to be sure -- of
course):
http://lxr.mozilla.org/seamonkey/find?string=transliterate
Comment 32•23 years ago
Comment 33•23 years ago (Reporter)
These patches don't solve the original issue which was that the font does have a
glyph for that character. It appears that the font designer expected the
application to interpret those character codes, instead of using the font.
The glyph looks like:
ZW
SP
with a box around it and the same for other similar characters, e.g. zwnj, thsp.
I would be very happy with a |white-space: -moz-hard-wrap| option; that would
solve a lot of problems.
Comment 34•23 years ago
Just curious, care to tell which is the font that is claiming to have a glyph?
(indeed the problem wouldn't be solved by the patches since they come into play
too late in the chain during the font-substitute fallback).
Comment 35•23 years ago
This "original problem" is the exact issue that I do not think we should "fix".
If the font is buggy then tough. Uninstall your font. Would you blame Mozilla for
misrendering the "A" character if your font had a "B" glyph at that code point?
Comment 36•23 years ago
To respond to waterson's comment at 20:18 above -- sure, it makes sense to me at
that level.
Comment 37•23 years ago
rbs's patch 49543 looks very reasonable to me. When a system cannot provide a
glyph for a character, it is solely the application's responsibility to provide
a way to represent it. Replacing it with '?', transliterating it, or in this
case just ignoring it all look reasonable as long as the character's function
(as a line break) remains.
Comment 38•23 years ago (Reporter)
Comment 39•23 years ago (Reporter)
I might possibly accept that the font is broken, but I'm not convinced yet.
But, if mine is broken, then there are probably a lot of others which are broken
as well, and we can't expect the average user to know it's their font, so the
proposed solution here won't help chatzilla.
Comment 40•23 years ago
Samuel, won't rbs' patch solve the problem you mentioned? From the user's
perspective, it should have the same effect as your original patch.
Comment 41•23 years ago (Reporter)
No, because as you can see in the picture, there actually *is* a glyph at that
character location, so even with the patch, mozilla will use that glyph.
Comment 42•23 years ago
I had to disable the Lucida Sans Unicode and Palatino fonts myself so as to hit
a break point I added in SubstituteChars() to test my patch. Had I not disabled
these fonts, I wouldn't reach that function. But unlike your font, these fonts
have the expected glyphs.
So what is the font with those glyphs if I may re-ask?
Comment 43•23 years ago (Reporter)
Comment 44•23 years ago (Reporter)
OK, I attached a screenshot of a bunch of fonts on my system. So which one
should I use? ;-)
Are you going to tell me that everybody did their fonts wrong?
Comment 45•23 years ago
Those glyphs are probably all coming from a single font since they don't exist
in most or all of the fonts you list.
Comment 46•23 years ago (Reporter)
How does that work? Which font would they be coming from? Is it mozilla or the
font server picking them?
Comment 47•23 years ago
I am just interested in knowing which font it is, for future reference.
To see the glyphs in a font, you could do something like:
xfd -fn -adobe-symbol-medium-r-normal--14-140-75-75-p-85-adobe-fontspecific
To see the font actually used by Mozilla, you could set a testpage with
the single character, and set:
setenv NS_FONT_DEBUG 2
(which is the value that I see defined in nsFontMetricsGTK for that purpose
#define NS_FONT_DEBUG_CALL_TRACE 0x02)
===
But FYI, here is how it works:
Suppose you have:
<span style="font-family: font1, font2, cursive"> your chatzilla text... </span>
If 'your chatzilla text...' is entirely ASCII, then little happens; otherwise
the hunt for a font for _each_ character goes on like this:
nsFontWin* FindFont(HDC aDC, PRUnichar aChar)
{
  // see if the user prefers something else...
  nsFontWin* font = FindUserDefinedFont(aDC, aChar);
  if (!font) {
    // see if there is a glyph in the local list of fonts: "font1, font2"
    font = FindLocalFont(aDC, aChar);
    if (!font) {
      // see if there is a glyph in a font that is of generic "cursive" type
      font = FindGenericFont(aDC, aChar);
      if (!font) {
        // see if there is a glyph in the global list of fonts on your system
        font = FindGlobalFont(aDC, aChar);
        if (!font) {
          // fallback: substitute/transliterate to render something if possible...
          font = FindSubstituteFont(aDC, aChar);
        }
      }
    }
  }
  return font;
}
So if there is a font on your system that claims to have the glyphs, the fallback
is never reached.
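The consequence can be shown with a small self-contained model of the cascade. The types and names here (`Font`, `findIn`, `resolveFont`) are hypothetical stand-ins, not the nsFontMetricsWin API, and the user-defined and generic steps are left out for brevity.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a font: a name plus the code points it claims.
struct Font {
  std::string name;
  std::set<char32_t> glyphs;
  bool hasGlyph(char32_t ch) const { return glyphs.count(ch) != 0; }
};

// Return the first font in `fonts` that claims `ch`, or nullptr.
const Font* findIn(const std::vector<Font>& fonts, char32_t ch) {
  for (const Font& f : fonts) {
    if (f.hasGlyph(ch)) return &f;
  }
  return nullptr;
}

// Simplified cascade: the local font list first, then every installed font.
// Returns the chosen font's name, or "<substitute>" when the fallback
// (substitute/transliterate) step is reached.
std::string resolveFont(const std::vector<Font>& local,
                        const std::vector<Font>& global, char32_t ch) {
  if (const Font* f = findIn(local, ch)) return f->name;
  if (const Font* f = findIn(global, ch)) return f->name;
  return "<substitute>";
}
```

With a single installed font that claims U+200B, `resolveFont` returns that font's name and the substitute step never runs, so a fix placed in the substitute step cannot affect what that font draws.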
Comment 48•23 years ago
> so the proposed solution here won't help chatzilla.
Just for the record again... this bug *should not* help chatzilla. The bug that
will help chatzilla is bug 99457.
Comment 49•23 years ago
I'm now using ZWSP to wrap lines, in place of the old <wbr>, which is no
longer supported.
In Thai, we can have a very long line of contiguous words.
html 4.01:
".. When formatting text, user agents should identify these
words (sequences of white space separate words) and lay them out
according to the conventions of the particular written
language (script) and target medium .. This layout may
involve putting space between words (called inter-word
space), but conventions for inter-word space vary from
script to script .. while in Thai it is a zero-width word
separator (&#x200B;) .."
xhtml 1.0:
".. in languages whose script is related to Nagari (e.g.,
Sanskrit, Thai, etc.), grammatical boundaries may be encoded
using the ZW 'space' character, but will not typically be
represented by typographic whitespace in rendered output .."
So I think using ZWSP for my purpose makes sense.
ZWSP is not just another "A". I think Mozilla should do
something when a font has no glyph.
Comment 50•23 years ago
Pawee: I agree. That's what the patch does -- when the character is missing from
all the fonts, we will handle it specially (by rendering it as "" instead of "?").
The issue raised above is with regard to fonts that _do_ have a glyph in that
position, either on purpose, or because the font designer was misguided.
Comment 51•23 years ago
With "setenv NS_FONT_DEBUG 2", ssieb reported that Mozilla gave the mystery font
being used on his system for ZWSP (U+200B) as:
FindFont(200B):
returns -mutt-clearlyu-medium-r-normal--17-120-100-100-p-128-iso10646-1
Comment 52•23 years ago
Resolving as duplicate of bug 33498. I am hooking the transliterator to GfxWin,
and this bug is fixed as a direct consequence of that.
*** This bug has been marked as a duplicate of 33498 ***
Status: REOPENED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → DUPLICATE