Closed Bug 6585 Opened 26 years ago Closed 25 years ago

missing glyphs should use substitute

Categories

(Core :: Layout, defect, P1)

x86
Windows NT
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: erik, Assigned: erik)

References

()

Details

(Keywords: css1, Whiteboard: [PDT-])

Attachments

(5 files)

If we cannot find any font that contains a glyph for the particular character we are trying to measure/draw, then we should pop up a window that explains where you can download such a font, similar to the window that comes up when you visit a page containing a plug-in that you don't have yet. Currently, the measuring code returns zero, and the drawing code doesn't attempt to draw anything for such characters. (We might want to at least draw some boxes in this case, like Windows does.) For the pop-up window, there are various issues to consider: What do we say to the user? "You don't have a font for the character U+1234"? "Would you like to download a Japanese font?" Etc. Do we automatically re-render the page after the user has downloaded the font? How often do we trigger the pop-up? Once per page? Once per session? Once per session for each character? For each "script" (Unicode range)? Etc.
Status: NEW → ASSIGNED
Target Milestone: M15
*** Bug 3852 has been marked as a duplicate of this bug. ***
OS: other → All
Summary: pop up window when no font found for particular character → {css1} missing glyphs should use substitute (U+FFFD)
The following page: http://www.bath.ac.uk/%7Epy8ieh/internet/eviltests/glyphs.html ...should help with this bug (it has some obscure maths codes in it). You should definitely be drawing something for unknown glyphs. That is a {css1} compliance issue. I am increasing the severity for the moment, since the missing glpyh is an error, not an enhancement. Technically, if you cannot display a glyph then you should be displaying the glyph with unicode code point U+FFFD, REPLACEMENT CHARACTER. If that doesn't map to anything, then, well... a space (U+0020) would be better than nothing! A popup window would be a possibility, but see also the suggestion in bug 6211: a window detailing all errors in the document (HTML, CSS, image downloading problems, and so on). I will add "glyphs not found" to the list of errors that should be reported.
Adding lyecies@netscape.com to cc list.
*** Bug 17567 has been marked as a duplicate of this bug. ***
Depends on: 19494
Here is the test case from bug 17567: http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2486 BTW, on my system IE5 has a really amusing attempt at this: it displays three different "missing glyph" glyphs!!! :-) I'm hoping we will be slightly more consistent than that...
Bug 17567 had a note saying that the code to fix this bug (or at least the part covered by 17567) has already landed, but is #ifdef MOZ_MATHML.
Actually, the font code that Roger checked in for MathML was mostly written by me, and was designed to be extensible, so that we could extend it to cover this bug too. So, it isn't just a matter of making the #ifdef MATHML code be the default. We need to write some more code.
Keywords: css1
Migrating from {css1} to css1 keyword. The {css1}, {css2}, {css3} and {css-moz} radars should now be considered deprecated in favour of keywords. I am *really* sorry about the spam...
Moving all of my M15s to M16. Please add comments if you disagree.
Target Milestone: M15 → M16
Summary: {css1} missing glyphs should use substitute (U+FFFD) → missing glyphs should use substitute (U+FFFD)
I would hope to see this fixed before M16. How hard is it? I would nominate it for beta1 if I were more sure that it isn't that complicated... It's not a minor issue, especially when reading texts in certain languages (that are not the user's default language).
Beta1 candidacy is not so much dependent on how hard it is to fix it, but how much we need it for "beta quality", which is of course difficult to define, but the PDT team has a pretty good sense of what is important and what isn't. So, how about just nominating for Beta1, by adding the beta1 keyword above?
OK, nominating for beta1, because I think it's important that the user knows when bits of text are missing.
Keywords: beta1
Agreed ... it's frustrating that there's content on the page that mozilla just doesn't show at all. 4.x shows them wrong, but at least it shows that there's something there.
PDT indicated it would not hold beta for this... so PDT-
Whiteboard: [PDT-]
Perhaps a tooltip can be displayed when hovering over a missing glyph (U+FFFD), e.g. _Latin small letter e with inverted breve_ for U+0207.
How can you hover over something that's not there at all? And how would the user know where to hover?
> How can you hover over something that's not > there at all? And how would the > user know where to hover? When a glyph can't be found, the REPLACEMENT CHARACTER U+FFFD should be displayed. When hovering over this, the name of the missing glyph can be shown as a tooltip. Hope this makes things clearer.
Erik, now that I am wading into fonts again for MathML add-ons, things are pretty fresh in my memory, and I may lay a helping hand w.r.t. this bug in the Win32 corner. The fix has to be in FindGlobalFont(), right? Rather than return nsnull, return the first font that has the Unicode's REPLACEMENT CHAR?
Thinking again, just returning a font there is not enough! Since this doesn't tell to other rendering/measuring functions that that particular char has to be re-mapped to the missing char.
Right, you can't just return a font that contains a glyph for U+FFFD. I would rather make it more general, and return an object that implements the GetWidth and DrawString APIs, but isn't actually a font on the inside. It is just a couple of methods that deal with rectangular boxes on the screen. I.e. the GetWidth method returns the width of the boxes that would be drawn by the DrawString method. Also, we want to create another subclass that maps Unicodes to other character codes for which we *do* have glyphs on the system. E.g. Unix, with limited iso8859-1 fonts, where windows-1252 docs can't be displayed normally. But a box-drawing subclass is a good first step, and I think we need it anyway.
Got a fix in my tree. As you suggested, I added another subclass (nsFontWinSubstitute) which acts as a fall-back for chars without glyphs. Great job with this foundation, erik! I didn't implement a drawing box font. I am simply using an existing font (out of the global list) that has the replacement char. This font is added in mLoadedFonts[], but its map only contains chars without glyphs, and it is not added in the hashtable (this way, it cannot be looked up, and therefore it will not interfere with the other font switching tricks, and its name doesn't matter). The way to get hold of the font is with something like mLoadedFonts[mIndexOfSubstituteFont], which is then used to instantiate a nsFontWinSubstitute that has its own GetWidth/DrawString. [e.g., GetWidth('xyz') actually does GetWidth('???'), DrawString('xyz') actually does DrawString('???')]; Since the font is kept in mLoadedFonts[], subsequent searches for the same chars without glyphs are speedy, and return the substitute font straight away.
Should this go in M14?
The PDT team has already looked at this and marked it PDT- (i.e. not a candidate for Beta 1). Your work *is* appreciated though, especially by me!
While the tree is still open (i.e., today), why should not being marked PDT+ prevent someone who doesn't work for Netscape from checking in the fix (assuming it's reviewed, etc.)? Some of us do think this should definitely be in beta1.
I wondered that as well. The tree isn't closed to non-PDT+ fixes, it's just that Netscape engineers aren't supposed to be spending time tracking down non-PDT+ fixes. Or at least, that's been my understanding.
I don't have time to review the code since I'm working on PDT+ bugs. If somebody else does have time to review it, and both Roger and the reviewer can promise that this code will not cause other bugs in M14 or cause performance problems, then I may give my reluctant "OK". I'm not trying to be mean here. I've had years of experience shipping betas, and this is not the right time to introduce significant numbers of lines of code in sensitive areas for features that are not even deemed PDT+ by the PDT team. When you are close to the deadline, a fix must be for something that is *both* actually needed *and* very safe and understood.
By the way, Windows GFX changes now require Troy's review, and Kevin McCluskey is the GFX owner, so his review is also required.
By the way, as Jim Roskind said in mozilla.seamonkey, M14 may not become Netscape's Beta 1. If M14 has too many beta-stoppers, M15 will be the target. So there is still some chance that this fix will make Netscape's Beta 1, and even if it doesn't we can still get it into Beta 2.
If you can post the diffs and get Troy and Kevin McCluskey to review the changes, go for it... cc'd troy and kmcclusk
Attached file diff in gfx (deleted) —
Attached file diff in intl (deleted) —
Ok guys, now you have everything. Upon review, the decision is yours. I am pretty confident that the code is now solid. It makes the browsing experience quite nice. Erik has been very helpful in providing insights and a great deal of feedback during development. I understand his prudence, and if the code has to wait, no problemo. The part in intl consists in adding extra mapping tables to enable the conversion of further math fonts. These fonts can be enabled by default in gfx by removing the #ifdef MOZ_MATHML that is in gFontsHaveConverters, if you deem appropriate. Screenshots of the testcases when these were enabled and disabled were attached. The fact that some math chars "overflow" in the testcase is due to the FindFont('a') hack --it returns the "ascii" metrics that are used as the default metrics. This is a recorded bugzilla bug 20394 which erik plans to address with a GetDimensions() method that will supersede the current GetHeight/GetWidth.
Of note, the patch for gfx can be tried in *isolation*. So, feel free to apply it on gfx, and report what you see (on win32)...
Severity: normal → major
Priority: P3 → P1
Target Milestone: M16 → M15
The bug is scheduled for M15, meaning there is time for improvements on the present solution, if worthwhile. Currently, since "viewsource" goes through the same rendering process, it means that chars without glyphs are still renderered with the REPLACEMENT CHAR, even within "viewsource". Is it, instead, desirable to display the unicode points there? Implementing this is not really demanding, because the font engine knows everything, and can either choose to replace the chars, or render their unicode points. However, "viewsource" is currently known to be notoriously slow and the question is whether it is worthwhile doing these things that will make it even slower. Also, the font engine will then have to know that the consumer is "viewsource" BTW, is there a common unicode font that actually has a glyph for the Unicode's replacement char U+FFFD?
nsFont.decorations is a PRUint8 bit-mask that could be used for your View Source idea. If we had 2 bits, we could have 4 modes: (1) show glyphs as usual (like browser) (2) unavailable glyphs are shown as Unicodes (3) all non-ASCII chars (> 0x7F) are shown as Unicodes (4) all chars are shown as Unicodes Number (4) is not that useful though...
Sounds good. Some comments/clarifications: (1) show glyphs as usual (like browser). All missing chars show up as the replacement char (this is the current behavior --not really helpful, but can be left as an option, as it is the fatest mode, and is a sort of WYSIWYG too!). (2) unavailable glyphs are shown as Unicodes -- this mode allow copy-pasting of the screen content of viewsource. (3) all non-ASCII chars (> 0x7F) are shown as Unicodes -- Not very sure how helpful is this. So the reformulation: (3) all non-ASCII chars (> 0x7F) are shown as both Unicodes and Glyphs, in the format "0xUnicode{Glyph}". Hence a char withouht glyph will show up as "0xUnicode{REPLACEMENT_CHAR}".
I now have all this coded up in my tree, and there are in sync with the changes Troy made recently --bug 27056, to optimize the use of SelectObject(). For (1), that was the original code. I didn't have to do anything --save bringing the code in sync. For (3), this is coded in nsRenderingContextWin. I added a convenience function SubstituteNonAsciiChars() that checks to see if some non-ascii chars have to be substituted. And... it is the output of this transformation that is used in the body of GetWidth and DrawString. (This way, the font-swithing machinery will pick the glyphs... For example, "Å" will appear as "ÅU+000C5". "Å" could be any strange glyph that could be picked in any font.) The SubstituteNonAsciiChars() function is carefully optimized and does nothing if nothing has to be done (e.g., when the current text run consists only of ascii chars). Concretely, the code of GetWidth(aString) and DrawString(aString) now loos like: PRUnichar* pstr = aString; if (shouldSubstituteNonAsciiChars) { pstr = SubstituteNonAsciiChars(aString); } ... use the *same* code as before to do the measurement/rendering using pstr For (2), this is coded by nsFontWinSubstitute in nsFontMetricsWin. (Doing so fulfills the intended purpose because nsFontWinSubstitute only applies to chars without glyphs.) The code of GetWidth(aString) and DrawString(aString) there now loos like: PRUnichar* pstr = aString; if (shouldDisplayUnicode) { ... sprintf Unicode points } else { ... default to the REPLACEMENT CHAR } ... use the *same* code as before to do the measurement/rendering using pstr ==== It is interesting to see the HEX (huh, Unicode) modes in action, and as you can see from the description above, the code is solidly standing upon the existing foundation and should therefore be as robust as the foundation :-) That's why I am pretty confident that the code is OK and should go in M15 as early as possible to enable further testing and enjoyement by others. What is missing in the code is how to activate these 'shouldDisplayUnicode' or 'shouldSubstituteNonAsciiChars'.
I just ran across this font problem while porting to the Photon UI, which may be the same as this problem, on articles at www.abcnews.com I see HTML that looks like: <font face= geneva,arial,helvetica size=5><b> <a href="/sections/us/DailyNews/bettyloubeets000218.html" > Weighing &#145;Black Widow&#146;s&#146; Fate </a> </b></font><br> The "&#145;" character looks like an arrow pointing upward under Photon. It is correct under Windows, Mac and Linux Netscape 4.7 but it is not correct under Photon or GTK Mozilla. I found out that numeric entities between 130-159 are special and need to be translated. Check out this link for more info: http://www.hut.fi/u/jkorpela/www/windows-chars.html I guess I can put some sort of translation in my own GFX code to do the translation but I was not sure if the purpose and fix for this bug would also fix this problem. Does this bug encompass this problem??
Those numeric entities should not be supported in strict mode, since they are undefined. (However, as characters in the document, they could be supported if it's in a windows encoding, rather than iso-8859-1.)
I am not sure if these codes are treated as special cases somewhere, and re-mapped to their proper unicode values --At the level of GFX they are not treated as special cases.
NCRs (numeric character references) like &#145; are indeed illegal, but very common on the Net. Our parser converts them to the correct Unicodes, and GFX then tries to find fonts with glyphs for those Unicodes. On Unix we have for a long time had the problem that Windows extensions in the 0x80-0x9F range could not be displayed. This bug report does indeed refer to that kind of problem. There are several bug reports assigned to me in this area, some Unix-specific.
BTW, we still need additional API to really nail this bug 6585 down. The substitution mode is not yet defined and I saw in bug 5100 that peterl didn't like seeing nsFont used for extra things. So we will need some other ways to pass rendering options in GFX. There could be a SetFlags() (or whatever) to specify rendering policies. RFE: going back to the initial questions raised in this bug report, an interesting option could be to pool/sort all the missing chars, in view of possibly sending a one-off feedback to the user (bug 6211, with a listing of names?), or in view of determining the fonts that can be auto-downloaded (or suggested for download). Note that it would be hard to implement a tooltip --compared to improving the viewsource trick to show their unicode in an ERROR-type color, but even achieving this is not straightforward.
This bug is now for Windows only, and assigned to Roger for M16 (i.e. after Beta 1 branch). I created bug 31252 for Unix. Mac already has substitutes.
Assignee: erik → rbs
Status: ASSIGNED → NEW
OS: All → Windows NT
Hardware: All → PC
Summary: missing glyphs should use substitute (U+FFFD) → missing glyphs should use substitute
Target Milestone: M15 → M16
Code checked-in. Chars with missing glyphs are now substituted. Currently only one replacement mode is activated. Additional API is needed to switch to different modes on the fly.
Roger, thanks for taking on this work, and for checking it in. This bug report is out of control, and we ought to create separate bug reports for the remaining issues. Re-assign the bug back to me if you'd like me to do that.
Re-assigning to erik to better organise where to head to: * Use substitute as per bug title (done). * API to flag different rendering policies (e.g., could be with a general struct, or for a start, a bitarray with some bits for the replacement mode) * Visual feedback of missing chars (Error-color for ViewSource? popup?) * Auto-download of fonts ? * Error log ? (collect all missing chars -- bug 6211) ?
Assignee: rbs → erik
At some point, I will look into this bug, and create one or more new bugs to cover the remaining issues. For now, setting milestone to M17.
Status: NEW → ASSIGNED
Target Milestone: M16 → M17
I'd prefer we think about the design sooner, so we can define and prioritize tasks and possibly delegate them as well...
I did similar thing on Mac already. However, we should do special process for "space" characters in the range of U+2000-U+206F. (some of them, not all of them). There are no reason we cannot "render"/"measure" a space characters, even non of the font have these glyph (a glyph for a space character is simply a matrix without any countor....)
This bug is *way* out of control. Please do not add any more comments to it. I have created a new bug for the Unicode space character issue: http://bugzilla.mozilla.org/show_bug.cgi?id=33032
Please do not add any more comments to this bug. Marking FIXED (since we now at least get question marks for unavailable glyphs). Other issues have been separated out. Thanks, Shrirang! From: shrir@netscape.com (Shrirang Khanzode) I have seperated the other issues from bug 6585. Bugs logged: Bug 41251 : API to flag rendering policies Bug 41249: Visual feedback for missing characters Bug 41250 : Auto download of fonts (Filed already) Bug 6211 - Error log to collect missing characters
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Target Milestone: M17 → M16
Marking fixed in the July 6th build.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: