Closed Bug 79928 Opened 24 years ago Closed 23 years ago

Zero-width space (unicode x200b) not interpreted

Categories

(Core :: Layout, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 33498

People

(Reporter: samuel, Assigned: waterson)

References

Details

(Keywords: fonts, platform-parity, Whiteboard: INVALID/WONTFIX?)

Attachments

(8 files)

The unicode character #x200b, which is supposed to be a zero-width space, is not interpreted by mozilla, and whatever character happens to be or not be at that position in the current font gets displayed. What should happen is that mozilla interprets the character and lays out the text accordingly and doesn't use the font.
Blocks: 71549
Keywords: fonts
Attached file testcase (deleted) —
As demonstrated by the testcase, it does cause a line break, so it just needs to not display the font character.
Attached patch patch (deleted) — Splinter Review
why? If the font claims to have a glyph for U+200B, why shouldn't we use it? I think this is INVALID.
This is marked "OS: All" but it works for me on Windows 2000. It could be a Linux-only bug, but more likely it's just an error in the font.
Me again. If it is a bug in the font and there are fonts on Linux which don't have the bug, then one possible solution is to simply put the non-broken fonts at the front of the preferred font list in the CSS. I tried this on Exceed and apparently none of my fonts have this character, so it displayed as a question mark. The same should happen if none of my fonts had a normal space (U+0020) or any of the other spacing characters around U+200x.
Keywords: pp
Whiteboard: INVALID/WONTFIX?
On Windows 98 it gives a question mark. Isn't the point of this character code to have no visible representation? I think having the font indicate what character it is, is good for debugging applications that aren't supporting it properly.
The point is there isn't anything to "support". It's up to the font to have a codepoint for U+200B, just like it is for U+0020 and U+2009. U+200B is not special in any way. Note that when we ship with MathML, we will have to ship with Math fonts, so that would be a good time to ensure that we also ship with a font that has a U+200B glyph.
But shipping fonts is a download headache at best, and has licensing issues at worst. Ought we simply hack around it for now? Seems reasonable to me, anyway.
Assignee: karnaze → waterson
Keywords: patch
Sounds reasonable to me too for now. I have been wandering in the area recently. The best spot for the fix-up patch is probably IS_DISCARDED(_ch) in nsTextTransformer.cpp.
Attached patch better patch? (deleted) — Splinter Review
Comment on attachment 49397 [details] [diff] [review] better patch? sr=waterson
Attachment #49397 - Flags: superreview+
r=rbs, didn't test - assuming it has been tested (I tried using the checkbox from the patch manager for the first time, but got a red alert that "one of the statuses wasn't valid" -- is there any link to a help for these new things?)
Hang on, the second patch doesn't seem to work, I'll look at it some more. (The first patch does work, btw)
The second patch doesn't make sense. We don't want the character discarded or it won't cause the line break. Also, the macro doesn't seem to handle big characters or something. So, back to the first patch unless someone has a better idea.
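The distinction drawn above (the character must still allow a line break, but must not be drawn) can be illustrated with a small standalone sketch. This is not the attached patch and not Mozilla code: `WrapLines`, `Segment` and `kZWSP` are hypothetical names, and the greedy column-based wrapping is an assumption made only to keep the example short.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// U+200B ZERO WIDTH SPACE: a break opportunity with no visible rendering.
constexpr char32_t kZWSP = 0x200B;

// One unbreakable run of visible characters.
struct Segment {
  std::u32string text;
  bool spaceBefore;  // true if a U+0020 (not only a ZWSP) preceded this run
};

// Hypothetical helper: wrap `input` into lines of at most `maxCols` visible
// characters. Breaks are allowed at U+0020 and at U+200B. A ZWSP is never
// copied to the output, so it takes no room whether or not a break is taken.
std::vector<std::u32string> WrapLines(const std::u32string& input,
                                      std::size_t maxCols) {
  // Pass 1: split at break opportunities, remembering whether a real space
  // was among them.
  std::vector<Segment> segments;
  Segment current{U"", false};
  for (char32_t ch : input) {
    if (ch == U' ' || ch == kZWSP) {
      if (!current.text.empty()) {
        segments.push_back(current);
        current = Segment{U"", false};
      }
      if (ch == U' ') current.spaceBefore = true;
    } else {
      current.text.push_back(ch);
    }
  }
  if (!current.text.empty()) segments.push_back(current);

  // Pass 2: greedy fill. A run longer than maxCols overflows on its own line.
  std::vector<std::u32string> lines;
  std::u32string line;
  for (const Segment& seg : segments) {
    const std::size_t gap = (!line.empty() && seg.spaceBefore) ? 1 : 0;
    if (!line.empty() && line.size() + gap + seg.text.size() > maxCols) {
      lines.push_back(line);
      line = seg.text;
    } else {
      if (gap) line.push_back(U' ');
      line += seg.text;
    }
  }
  if (!line.empty()) lines.push_back(line);
  return lines;
}
```

If the ZWSP were discarded before line breaking, "abc", ZWSP, "def" would reach the wrapper as the single run "abcdef" and could only overflow; keeping it as a segment boundary while never emitting it gives both behaviours the comment asks for.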
Whiteboard: INVALID/WONTFIX?
Just having a font is the cleanest way to handle the problem since nsTextFrame is not the only place where strings are drawn within Mozilla. We have been bitten many times by layout patches that "work". So that criterion is not enough. Specifically, it is not clear what happens with several consecutive ZWSP characters. Also, there is not enough analysis as to what is happening to the offsets that nsTextFrame maintains, and whether it is necessary to sync the transformation flag that nsTextFrame uses when it ends up with a text that isn't a replica of the original (the flag helps to optimize subsequent reflows such as that coming from dynamic changes). Suggesting caution about this one; Futuring is also a sensible option as originally proposed.
Please do not remove comments which I added to the status whiteboard. I still think this bug should be marked WONTFIX or INVALID. Fonts should have an empty glyph at the ZWSP codepoint. If they don't, then they probably intended to show the glyph, or are buggy. Either way, that isn't our problem, except if we want to ship with a font that does this right. See also my comments at 2001-05-13 13:47.
Whiteboard: INVALID/WONTFIX?
All right: Hixie wins. (Arguing with him is like wrestling with a bag full of cats.)
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → WONTFIX
Verified WONTFIX. Very rare problem. Hopefully, it might become a non-issue if the auto-download of fonts becomes more comprehensive --can't remember the bug#. (BTW, whitespace testcases also need to include an example with whitespace:pre for completeness.)
Status: RESOLVED → VERIFIED
Waterson, you pansie ;) This isn't rare. It's something we could use in *every* line of the javascript debugger console, and chatzilla. In these console interfaces, we need to make sure that a long word doesn't come along and cause a horizontal scrollbar (try using a console where every line overflows to the right.) Currently, we do this by inserting an html:img with a width of 0px. If we could reliably use the &zwsp; character instead, we'd have far fewer DOM nodes, and could construct the messages faster, while conserving DOM objects. Unfortunately, in the Real World (you remember the Real World, right, where our users live?) lots of fonts are broken. This includes default installs of contemporary Windows and Linux systems. Please bear this in mind as this bug sits "WONTFIX". This isn't ssieb picking nits, he's trying to do something cool with the code (see bug 71549), let's not stonewall it.
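As a rough illustration of the console use case described above, here is a sketch that inserts U+200B into a long word at a fixed interval so that a renderer honouring ZWSP can break it. The function name `InsertBreakOpportunities` and the fixed-interval policy are assumptions for this example; the thread does not show the actual chatzilla or debugger console code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: return a copy of `word` with U+200B ZERO WIDTH SPACE
// inserted after every `every` characters (never at the start or the end).
// With every == 0 the word is returned unchanged.
std::u32string InsertBreakOpportunities(const std::u32string& word,
                                        std::size_t every) {
  if (every == 0) return word;
  std::u32string out;
  for (std::size_t i = 0; i < word.size(); ++i) {
    if (i > 0 && i % every == 0) out.push_back(char32_t{0x200B});
    out.push_back(word[i]);
  }
  return out;
}
```

This only helps where the renderer neither draws a glyph for U+200B nor substitutes a question mark for it, which is exactly the condition under dispute in this bug.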
What I meant was that ZWSP is rarely encountered in web pages. FYI, the font called "Lucida Sans Unicode" has a glyph for ZWSP as well as some other space-like characters. But actually, speaking from a broader perspective, if this bug is blocking some other work, it might be much easier/safer to special case the ZWSP in the font subsystem than in the cryptic code of nsTextFrame -- although getting r/sr on the various versions of GFX is an annoyance. There is another advantage if done in the font subsystem in that not all characters will have to be tested. Only when fonts are missing -- it is known when fonts are missing for particular characters, that's where they get substituted with the question mark '?'. If a font with a ZWSP glyph is found, then all goes well, nothing to worry about, otherwise, the so-called fallback font-substitute code gets activated and a one-liner can be added therein to not substitute any ZWSP (since its width is zero and it can be treated as if it wasn't there...) I am in the midst of changes and can easily try this if it is blocking an immediate work.
&zwsp; is not the way to fix the wrapping problem. The way to fix that problem is by using a proprietary value for 'white-space' just like we did with the 'wrap' attribute on the <pre> element (using the '-moz-pre-wrap' keyword). Call it '-moz-hard-wrap' or whatever. It is very likely that CSS3 will have such a feature anyway, so we are going to have to write the code at some point whatever we do.
Interesting, people have already started asking for a hard-wrap in bug 99457. I personally think that hixie (and dbaron) do a good job by being very careful with the standards and keeping an eye on the platform for the future. It is too tempting to hack all sorts of things as a matter of expediency, and since mozilla is so big, you wouldn't know the ugly things that are going on behind your back and their potential side-effects in the long run. I also know that they are careful to keep their feet on the ground and accept hacks when they are convinced of their value. So... Who will conduct the analysis for the nsTextFrame hack?! But I looked at the font code and we might perhaps help w.r.t. ZWSP if it blocks some other nifty work, alright hixie? If you think it is still premature, I will move on... or maybe you might want to first see the patch to confirm that it is simpler/safer over there... let me attach something soon.
The reason I didn't mark the bug WONTFIX, nor verify it when it was so marked, is because although I don't like such a hack, I would be prepared to tolerate it to some extent. :-) So long as a font with a glyph at U+200B would still show the glyph in response to a &zwsp; entity, then we should be ok. However, I *do* think that using &zwsp; for the purposes that rginda wants, namely a "cheap" hard wrap, is wrong, and would be better handled by the 'white-space' property, so if that is the sole reason to implement this hack, then I'm against. I think implementing this at the same level as the code that changes U+2122 into the two characters "T" and "M" would be reasonable, and not a hack.
Attached patch patch in gfxwin (tested) (deleted) — Splinter Review
Re-opening since some activity is going on in the bug. I attached a patch for gfxwin. Other platforms use transliterate.properties (this is where U+2122 is associated to "TM", and &copy associated to "(c)" when there is no font with glyphs). Maybe the fix on other platforms is simply to add empty entries in intl/unicharutil/tools/gentransliterate.pl which is the generator of transliterate.properties. Need testers on Mac & Linux to confirm that.
Status: VERIFIED → REOPENED
Resolution: WONTFIX → ---
Let's try to fix chatzilla as Hixie suggested (|whitespace: -moz-hard-wrap|), then we can deal with this bug on its own merits (which seem marginal). It looks like bug 99457 sufficiently covers that RFE, yes?
Agree. But notice that the patch is worthwhile in the substitute-font fallback context. Once that code gets activated, it means there is no font at all with the needed glyphs, and recovery (or "faking" the rendering if you like) is all that matters. It benefits everybody -- not just nsTextFrame, and callers don't have to care that something else was used in the place of their input.
Yes, I buy that. sr=waterson on attachment 49543 [details] [diff] [review] if dbaron and other font-savvy people are happy with it. (As you note, we'll need to duplicate this code for Mac and Linux as well.)
It might actually be much easier on other platforms because they already use a parameterized transliterate handling. Just adding an entry for ZWSP in the transliterate.properties file might do the trick (need testing to be sure -- of course): http://lxr.mozilla.org/seamonkey/find?string=transliterate
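The idea of an empty transliteration entry can be sketched as follows. This is a standalone illustration, not the real transliterate.properties format or the GfxWin patch: `TransliterationTable` and `SubstituteMissing` are hypothetical names, and the function assumes it is only ever handed characters for which no installed font has a glyph.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the transliteration data. The U+2122 and U+00A9
// mappings are the ones mentioned in this bug; the empty U+200B entry is the
// proposed addition.
const std::unordered_map<char32_t, std::u32string>& TransliterationTable() {
  static const std::unordered_map<char32_t, std::u32string> table = {
      {char32_t{0x2122}, U"TM"},   // TRADE MARK SIGN
      {char32_t{0x00A9}, U"(c)"},  // COPYRIGHT SIGN
      {char32_t{0x200B}, U""},     // ZERO WIDTH SPACE: render nothing
  };
  return table;
}

// Hypothetical fallback: `missing` holds only characters that no font could
// render. Known characters are transliterated; anything else becomes '?'.
std::u32string SubstituteMissing(const std::u32string& missing) {
  std::u32string out;
  for (char32_t ch : missing) {
    auto it = TransliterationTable().find(ch);
    if (it != TransliterationTable().end()) {
      out += it->second;
    } else {
      out.push_back(U'?');
    }
  }
  return out;
}
```

Because this runs only in the last-resort path, a font that does carry a glyph at U+200B (boxed or not) is unaffected, which matches the behaviour Hixie asked to preserve.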
These patches don't solve the original issue which was that the font does have a glyph for that character. It appears that the font designer expected the application to interpret those character codes, instead of using the font. The glyph looks like: ZW SP with a box around it and the same for other similar characters, e.g. zwnj, thsp. I would be very happy for a |whitespace: -moz-hard-wrap| option, that would solve a lot of problems.
Just curious, care to tell which is the font that is claiming to have a glyph? (indeed the problem wouldn't be solved by the patches since they come into play too late in the chain during the font-substitute fallback).
This "original problem" is the exact issue that I do not think we should "fix". If the font is buggy then tough. Uninstall your font. Would you blame Mozilla for misrendering the "A" character if your font had a "B" glyph at that code point?
To respond to waterson's comment at 20:18 above -- sure, it makes sense to me at that level.
rbs's patch 49543 looks very reasonable to me. When a system cannot provide a glyph for a character, it is solely the application's responsibility to provide a way to represent it. Replacing it with '?', transliterating it, or in this case just ignoring it all look reasonable as long as the character's function (as a line break) remains.
Attached image this is what the glyphs look like (deleted) —
I might possibly accept that the font is broken, but I'm not convinced yet. But, if mine is broken, then there are probably a lot of others which are broken as well, and we can't expect the average user to know it's their font, so the proposed solution here won't help chatzilla.
Samuel, won't rbs' patch solve the problem you mentioned? From user's perspective, it should have the same effect as your original patch.
no, because as you can see in the picture, there actually *is* a glyph at that character location, so even with the patch, mozilla will use that glyph.
I had to disable the Lucida Sans Unicode and Palatino fonts myself so as to hit a break point I added in SubstituteChars() to test my patch. Had I not disabled these fonts, I wouldn't reach that function. But unlike your font, these fonts have the expected glyphs. So what is the font with those glyphs if I may re-ask?
Attached image another screenshot (deleted) —
ok, I attached a screenshot of a bunch of fonts on my system. So which one should I use? ;-) Are you going to tell me that everybody did their fonts wrong?
Those glyphs are probably all coming from a single font since they don't exist in most or all of the fonts you list.
How does that work? Which font would they be coming from? Is it mozilla or the font server picking them?
It is just that I am interested to know the font for my own records so that I know about that font for future reference. To see the glyphs in a font, you could do something like:

    xfd -fn -adobe-symbol-medium-r-normal--14-140-75-75-p-85-adobe-fontspecific

To see the font actually used by Mozilla, you could set up a test page with the single character, and set:

    setenv NS_FONT_DEBUG 2

(which is the value that I see defined in nsFontMetricsGTK for that purpose: #define NS_FONT_DEBUG_CALL_TRACE 0x02)

===

But FYI, here is how it works. Suppose you have:

    <span style="font-family: font1, font2, cursive"> your chatzilla text... </span>

If 'your chatzilla text...' is entirely ASCII, then little happens; otherwise the hunt for a font for _each_ character goes on like this:

    FindFont(HDC aDC, PRUnichar aChar)
    {
      // see if the user prefers something else...
      nsFontWin* font = FindUserDefinedFont(aDC, aChar);
      if (!font) {
        // see if there is a glyph in the local list of fonts: "font1, font2"
        font = FindLocalFont(aDC, aChar);
        if (!font) {
          // see if there is a glyph in a font that is of generic "cursive" type
          font = FindGenericFont(aDC, aChar);
          if (!font) {
            // see if there is a glyph in the global list of fonts on your system
            font = FindGlobalFont(aDC, aChar);
            if (!font) {
              // fallback: substitute/transliterate to render something if possible...
              font = FindSubstituteFont(aDC, aChar);
            }
          }
        }
      }
      return font;
    }

So if there is a font on your system that claims to have the glyphs, the fallback is never reached.
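The lookup order described in this comment can be modelled in a few lines to show why a font with a boxed "ZW SP" glyph preempts the fallback. This is a simplified sketch, not the nsFontMetricsWin code: `Font`, `FontLists`, `FirstWithGlyph` and `FindFontName` are hypothetical, fonts are reduced to a name plus a set of covered code points, and the "(substitute)" marker stands in for the substitute/transliterate path.

```cpp
#include <set>
#include <string>
#include <vector>

// A font reduced to its name and the code points it claims to cover.
struct Font {
  std::string name;
  std::set<char32_t> glyphs;
};

// The four lists searched before falling back, in priority order.
struct FontLists {
  std::vector<Font> userDefined;
  std::vector<Font> local;    // fonts named in the CSS font-family list
  std::vector<Font> generic;  // fonts matching the generic family
  std::vector<Font> global;   // every other font installed on the system
};

// Return the first font in `fonts` that claims a glyph for `ch`, or nullptr.
const Font* FirstWithGlyph(const std::vector<Font>& fonts, char32_t ch) {
  for (const Font& font : fonts) {
    if (font.glyphs.count(ch) != 0) return &font;
  }
  return nullptr;
}

// Walk the lists in order; only if all of them fail is the substitute path
// (represented here by a marker string) reached.
std::string FindFontName(const FontLists& lists, char32_t ch) {
  for (const std::vector<Font>* stage :
       {&lists.userDefined, &lists.local, &lists.generic, &lists.global}) {
    if (const Font* font = FirstWithGlyph(*stage, ch)) return font->name;
  }
  return "(substitute)";
}
```

In this model, any single installed font that claims U+200B is found in the global stage, so a special case placed in the substitute path never sees the character.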
No longer blocks: 71549
> so the proposed solution here won't help chatzilla. Just for the record again... this bug *should not* help chatzilla. The bug that will help chatzilla is bug 99457.
Now I'm using zwsp for the purpose of wrapping lines instead of the old <wbr>, which is no longer supported. In Thai, we can have a very long line of contiguous words. html 4.01: ".. When formatting text, user agents should identify these words (sequences white space separate words) and lay them out according to the conventions of the particular written language (script) and target medium .. This layout may involve putting space between words (called inter-word space), but conventions for inter-word space vary from script to script .. while in Thai it is a zero-width word separator (&#x200B;) .." xhtml 1.0: ".. in languages whose script is related to Nagari (e.g., Sanskrit, Thai, etc.), grammatical boundaries may be encoded using the ZW 'space' character, but will not typically be represented by typographic whitespace in rendered output .." So, I think using zwsp for my purpose makes sense. zwsp is not just another "A". I think Mozilla should do something when a font has no glyph.
Pawee: I agree. That's what the patch does -- when the character is missing from all the fonts, we will handle it specially (by rendering it as "" instead of "?"). The issue raised above is with regard to fonts that _do_ have a glyph in that position, either on purpose, or because the font designer was misguided.
With "setenv NS_FONT_DEBUG 2", ssieb reported that Mozilla gave the mystery font being used on his system for ZWSP (U+200B) as: FindFont(200B): returns -mutt-clearlyu-medium-r-normal--17-120-100-100-p-128-iso10646-1
Depends on: 32536
Depends on: 33498
No longer depends on: 32536
Resolving as duplicate of bug 33498. I am hooking the transliterator to GfxWin, and this bug is fixed as a direct consequence of that. *** This bug has been marked as a duplicate of 33498 ***
Status: REOPENED → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE