Closed Bug 26179 Opened 25 years ago Closed 22 years ago

Mozilla reports existence of phantom text nodes in the DOM

Categories

(Core :: DOM: Core & HTML, defect, P3)

Tracking

()

VERIFIED INVALID
Future

People

(Reporter: lori, Assigned: jst)

Details

(Keywords: dom1, testcase, Whiteboard: http://mozilla.org/docs/dom/technote/whitespace/)

Attachments

(2 files)

The following code, when opened in IE5, shows an almost-correct representation of the DOM (except for the SCRIPT and STYLE nodes having no children, and the comment being reported as a tag with tagName ! -- duh! ;). In M13, virtually every node is reported as having a text node for a child, even when none actually exists in the document. (Note all the Text:s followed by blank space rather than actual text, and you'll see what I mean.)

<html>
<head>
<title>Foo Document</title>
<script language="JavaScript">
var structWin;

function openStructWin(){
  structWin = window.open ('','foo','width=600,scrollbars=yes,resizable=yes,menubar=yes');
  structWin.document.write('<' + 'html>\n<' + 'head>\n<' + 'title>Document Structure</title>\n<' + 'style type="text/css">\nPRE ' + '{font-size: 24pt}' + '\n</style>\n</head>\n\n<' + 'body bgcolor="#FFFFFF">\n\n<' + 'pre>');
}

function getNodeAndChildrenAsString(indentString,theNode){
  var structString = '';
  var structString = '';
  var nodeString;
  var theChildren = theNode.childNodes;
  // If theNode is an element node
  if (theNode.nodeType == '1'){
    nodeString = "<" + "b>" + theNode.tagName + "</b>";
  // Otherwise, if theNode is a text node
  }else if (theNode.nodeType == '3'){
    nodeString = "<" + "b>Text: </b>" + theNode.data;
  // In any other case (and here I'm assuming
  // that the only other case would be a
  // comment node)
  }else{
    nodeString = "<" + "b>Comment: </b>" + theNode.data;
  }
  structString += indentString + nodeString;
  structString += '\n';
  structWin.document.write(structString);
  for (var i=0; i < theChildren.length; i++){
    getNodeAndChildrenAsString(indentString + " ", theChildren[i]);
  }
}

function writeClosingTags(){
  structWin.document.write('<' + '/pre>\n\n<' + '/body>\n<' + '/html>');
}
</script>
<style type="text/css">
<!--
.red { color: #FF0000}
-->
</style>
</head>
<body onLoad="openStructWin();getNodeAndChildrenAsString ('',document.documentElement);writeClosingTags()">
<table width="600" border="0" cellspacing="10" cellpadding="2">
  <tr>
    <th colspan="2" align="left">Flavors</th>
  </tr>
  <tr>
    <td>grape</td>
    <td>cherry</td>
  </tr>
  <tr>
    <td>lemon</td>
    <td>lime</td>
  </tr>
  <tr>
    <td>orange</td>
    <td>raspberry</td>
  </tr>
</table>
<p> That was a table, and this is a paragraph with some <b>bold text</b> in it. </p>
<p> This paragraph contains an image. <img src="images/sun.gif" width="100" height="100"> </p>
<!-- This is a comment before the text in the last paragraph. -->
<p> This paragraph contains <b>bold text,</b> <code>code text,</code> and <span class="red">styled text.</span> </p>
</body>
</html>
The "phantom" text nodes are actually the newlines between the elements. The DOM spec says that these text nodes can be (but don't have to be) preserved and represented in the DOM. We choose to do so, so that document roundtripping can occur. Since IE doesn't, it does unfortunately mean that you have to code around them. Specifically, scripts that use hardcoded childNodes offsets will not work across browsers. We might consider getting rid of these small text nodes for efficiency reasons. For now, we've made the explicit decision to keep them.
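One way to code around the difference is to stop indexing childNodes directly and to pick children by node type instead. The following is only a minimal sketch: it assumes nothing beyond the DOM Level 1 properties nodeType and childNodes, and the helper name elementChildren is made up for illustration.

```javascript
// Hypothetical helper (not part of any browser API): returns only the
// element children of a node, so that indexes do not shift when a
// browser keeps whitespace text nodes between the elements.
var ELEMENT_NODE = 1; // DOM Level 1 value of Node.ELEMENT_NODE

function elementChildren(parent) {
  var result = [];
  var kids = parent.childNodes;
  for (var i = 0; i < kids.length; i++) {
    if (kids[i].nodeType === ELEMENT_NODE) {
      result.push(kids[i]);
    }
  }
  return result;
}
```

With this, elementChildren(row)[1] refers to the second cell of a table row whether or not the source has newlines between the cells.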
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → INVALID
Thanks for clarifying, Vidur -- that was the suspicion of one of the engineers here (Macromedia). This could also be what's causing the code in bug 26178 to act oddly, though I did check the contents of the nodes before sorting them to make sure I had the right ones. I'll look into it and update the status of 26178 accordingly.
Status: RESOLVED → CLOSED
I'm reopening this one because the more I think about it, the more I think exposing whitespace as a text node is the wrong thing to do. Why? Because you can't do it consistently. Text nodes are invalid in between table rows and table cells, for example -- so the same newline that appears after a </td> cannot be represented as a text node the way it can be if it appears after a </p>. This harms the roundtripping argument, IMHO. Also, it seems to me that whitespace shouldn't affect the structure of a document; I consider an HTML document with no linebreaks to be structurally identical to one with all kinds of whitespace between tags. If newlines are represented as text nodes, the two documents would be structurally different.
Status: CLOSED → REOPENED
Resolution: INVALID → ---
After talking it over with Eric Krock, our decision is to Future this one. It means, unfortunately, that scripts need to take into account the possible existence of extra text nodes in element content. David Baron is probably going to write a set of utilities (at the very least, a tech note) to help with common operations. Lhylan is correct, though - we're not even including the text nodes consistently.
Target Milestone: --- → Future
*** Bug 48560 has been marked as a duplicate of this bug. ***
Mass update of qa contact
QA Contact: gerardok → janc
*** Bug 62269 has been marked as a duplicate of this bug. ***
What is the DOM working group position on this? Vidur?
I've checked the DOM standard and it's not clear as to whether there should be these phantom text nodes. I am running Windows 98 on a PC, so should the OS/platform be updated on this bug? IMHO, I can't see the reason for all these little text nodes. It just fills the DOM with nonsense. The frustrating thing is the incompatibility with IE. I thought the W3C was meant to clear this up. If Netscape and Microsoft are going to differ on interpretation, then we are heading back to the bad old days of browser incompatibility. Don't you guys ever talk to one another?
Tim: Not only do they talk to each other, they come up with the spec together...
The DOM spec doesn't really say, I don't think. The SGML spec has some provisions for ignoring newlines just inside start/end tags, but since we already totally ignore what it says on whitespace (as have basically all HTML browsers ever), I don't think we should take it too seriously. Preserving the white space in the DOM is needed for things like 'white-space: pre' to work and for the editor to output HTML as it was read in. There are some notes on how to deal with these "extra" text nodes at: http://mozilla.org/docs/dom/technote/whitespace/
Is bug 65658 a dup?
*** Bug 65658 has been marked as a duplicate of this bug. ***
Keywords: dom1
Component: DOM Level 1 → DOM Core
QA contact Update
QA Contact: janc → desale
Updating QA contact to Shivakiran Tummala.
QA Contact: desale → stummala
*** Bug 89782 has been marked as a duplicate of this bug. ***
I hope this can be taken care of before Moz 1.0...
Isn't this INVALID or WONTFIX? And if not, why not?
Whiteboard: WONTFIX?
*** Bug 104785 has been marked as a duplicate of this bug. ***
Attached file testcase (deleted) —
Keywords: testcase
OS: Mac System 8.6 → All
Hardware: Macintosh → All
Vidur, can you give an update on this? These "phantom" nodes are going to play havoc with folks writing DOM JS code if they have to do a browser detect just to filter out all these empty text nodes, and things like childNodes are going to report very different numbers between IE and Mozilla.
Adding roger (dhtml apps perspective)
*** Bug 114749 has been marked as a duplicate of this bug. ***
*** Bug 118213 has been marked as a duplicate of this bug. ***
Newlines are translated into a space. Is this a WONTFIX?
why? why wouldn't newlines be ignored between tags?
The more I think about this, after speaking with web app developers and working on DOM samples, the more I think the current way of handling whitespace nodes is flawed. It is, however, necessary to make a decision and resolve this bug before 1.0, because changing the way it works after 1.0 would be seriously bad for backwards compatibility.

I think including whitespace nodes is clumsy for the following reasons:

1) It is not compatible with MSIE (and probably Opera).
2) The DOM code is heavily dependent on the markup (insert a new line somewhere and your average childNodes[] or firstChild breaks).
3) Although I have no proof, building nodes for each whitespace node (about 2 per tag if your markup is readable) is probably a perf hit and a footprint hit (although jst's recent mInner changes made this better).
4) It's highly non-intuitive for new developers. I often find myself wondering why my .firstChild has no properties.
5) Those nodes carry no information (see below for exceptions).

These observations are based on my own experience: nearly one year of discussions with web developers, lurking in developer newsgroups and mailing lists, and developing web apps myself. This is one of the most frequent questions in the developer newsgroups and mailing lists.

The only good reasons to keep the whitespace nodes were given by David Baron in a comment in this bug:

i) 'white-space: pre' currently needs this to work
ii) the editor needs them to output HTML as it was read in.

This is perhaps not a good argument, but for ii), the editor already makes enough of a mess of your source code not to worry about some extra new lines. As far as I'm concerned it should put new lines in automatically to make code more readable :-P Not sure about i), but perhaps layout has some "whitespace frames" that could be used for the same purpose?

This bug escaped my attention in my quest for Mozilla 1.0 bugs, but now I definitely include it. Thanks for reading :-)
um, I think Sivakiran's missing the point. it's not about newlines being converted to spaces -- it's about N6 REPORTING THOSE SPACES AS TEXT NODES! because text nodes are not allowed between a TR and a TD, for example, it's impossible to apply the "whitespace = text node" rule consistently. so that's the logical argument for fixing this bug. Fabian posted a very eloquent version of the practical argument while I was busy changing my password. ;)
Which nodes are you proposing to eliminate? Only the ones inside start-tags and end-tags, or all of the ones you can (whatever that means)?

If the former, how do you plan to handle the childNodes array for something like:

<ul>
  <li>foo</li>
  <li>bar</li>
</ul>

I think the DOM benefits of such an approach are minimal.

If the latter, how do you plan to ensure that

<span>two</span> <span>words</span>

aren't merged together? Which does IE do?

In all cases, how do you plan to keep 'white-space: pre' working?
Well, how about making the actual (non-parsed) source text of the document accessible, and then having the parser provide the line/column position of every node/token inside it, so that you can locate the necessary portions of text? Not only can you get 'white-space: pre' working this way, but you can also fix View Source not displaying the actual source of the document.
To make this work (removing the white space text nodes that are not allowed), one would have to check the DTD of the SGML and/or XML document. Is this information available from the SGML/XML parser? I like Alexey Chernyak's idea of adding the line and column position to the elements/nodes. This would allow the composer to maintain the layout without actually inserting text nodes in the DOM tree where they are not allowed.
I believe, although I'm not entirely certain, that DOM-2 Range covers resolution down to the individual character. I do know Mozilla supports DOM-2 Range.
the real pain for web developers is that stuff like previousSibling etc. becomes practically useless... or you have to call helper functions every time you want to use it. just stumbled across it yesterday:

<div id="div1"></div>
<div id="div2"></div>

!=

<div id="div1"></div><div id="div2"></div>

and the stupid thing is, codewise i want to have version 1, because everything else is unreadable. js-wise i want to have version 2, because there is no whitespace in it. but still i can't be sure that no one changes it (including my unaware self) at a later time and breaks the code.

so what about (uuuh, i know this would have to go through the standardisation process) making previousSibling etc. a function like HTMLElement.previousSibling(boolean skipWhitespace) or even HTMLElement.previousSibling(boolean skipTextNodes)? another possibility would be to distinguish between whitespace and newlines, and skip all the newlines. as far as i understand this would allow white-space:pre to work. newlines between tags are always just for source code readability, right?
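Until something like that is standardised, the proposed behaviour can be approximated in script. This is a rough sketch only: previousSiblingSkipping and its mode argument are hypothetical names, not an existing or proposed DOM method, and the only DOM features assumed are nodeType, data and previousSibling.

```javascript
// Hypothetical helper sketching the previousSibling(skipWhitespace) idea.
//   mode "whitespace": skip text nodes that contain only white space
//   mode "text":       skip every text node
var TEXT_NODE = 3; // DOM Level 1 value of Node.TEXT_NODE

function isWhitespaceOnly(node) {
  // True for a text node whose data has no non-whitespace character.
  return node.nodeType === TEXT_NODE && !/[^\t\n\r ]/.test(node.data);
}

function previousSiblingSkipping(node, mode) {
  var sib = node.previousSibling;
  while (sib) {
    var skip = (mode === "text") ? sib.nodeType === TEXT_NODE
                                 : isWhitespaceOnly(sib);
    if (!skip) {
      return sib;
    }
    sib = sib.previousSibling;
  }
  return null; // no qualifying sibling before this node
}
```

Called with mode "whitespace" on div2 from version 1 above, this should return div1 in both browsers, so the markup can stay readable.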
I wonder if DOM Traversal (TreeWalker (and NodeIterator, once we have it)) would be the right way to work around the problem for scripts. The problem of whitespace within tables and stuff is a validation issue. Mozilla doesn't validate, so we shouldn't bother. And I read http://www.w3.org/TR/2001/WD-DOM-Level-2-HTML-20011210/html.html#ID-6986576 to indicate that one should use the .cells to get to the td or th. (If that one would contain textnodes, that'd be bah.)
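In a browser that implements DOM Level 2 Traversal, the filtering would be done by passing a NodeFilter to document.createTreeWalker. The sketch below is not that API; it is a small stand-in that shows the same idea (walk in document order, reject whitespace-only text nodes) on any object tree exposing nodeType, data and childNodes. The name significantNodes is invented for the example.

```javascript
// Stand-in for a filtered DOM Traversal walk. Yields every descendant of
// root in document order, except whitespace-only text nodes.
function* significantNodes(root) {
  for (const child of Array.from(root.childNodes || [])) {
    const blankText = child.nodeType === 3 && !/[^\t\n\r ]/.test(child.data);
    if (blankText) {
      continue; // rejected, much as a NodeFilter would reject it
    }
    yield child;
    yield* significantNodes(child); // depth first, so document order
  }
}
```

A script that iterates with such a filter never sees the extra text nodes, so it should behave the same whether or not the browser keeps them.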
Is anything moving with regard to this bug? A solution is really important to some of the stuff I am trying to do, so I hope so. Also, does anyone know if there is any movement toward fuller implementation of the DOM2 Traversal-Range module?
nominating nsbeta1 - can we ship with all this cruft in the dom?
Keywords: nsbeta1
I still think this should be WONTFIX. Furthermore, nobody's actually proposed how to fix it -- that requires answering the questions in comment 29.
Maybe we should do some sort of check for "white-space: pre" before removing superfluous white space. There cannot be so many elements which have that natively. Likewise, someone would have to set a style attribute to contain that phrase before it would take effect on an element that didn't default to that.
The DOM doesn't currently know anything about the style system, and it would be a major architectural change if it did. (How could we resolve style without having DOM nodes to resolve the style on?)
:( This is a lot like math classes on scientific notation: which "0" digits are significant? 000025.7000... My opinion is, if it renders, include it. If it doesn't render or isn't meant to render, exclude it. For HTML, white space between <head> and <body> tags doesn't render. White space between <pre> and </pre> tags does. White space between any two tags in general, if there is at least one space or tab between them, we typically treat as a single space. (We have &nbsp; for when we want a space.) I haven't even begun to think about white space implications for XML DOM. What if we did a process late in the game, before any event handlers or scripting take over but after styling, of cleaning up the nodes?
David, the latest spec very clearly defines how white space characters should be treated:
http://www.w3.org/TR/2001/WD-xhtml1-20011004/#uaconf

Here are the relevant extracts from that document for everyday pages:

...snip...
* All white space surrounding block elements should be removed.
* Comments are removed entirely and do not affect white space handling. One white space character on either side of a comment is treated as two white space characters.
...snip...
* Leading and trailing white space inside a block element must be removed.
* A sequence of white space characters without any LINE FEED characters must be reduced to a single SPACE character.
* A sequence of white space characters with one or more LINE FEED characters must be reduced in the same way as a single LINE FEED character.
...snip...
* The LINE FEED character must be converted into a SPACE character.

This answers your question very well:

<span>two</span><span>words</span>

should be merged together, while in:

<span>two</span> <span>words</span>

the LINE FEED and any other white space characters in between should be replaced with a SPACE character.

This is a very straightforward spec, and we should be thankful to the W3C for defining it so clearly. As for 'white-space: pre', see comment 30
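Read literally, the collapsing rules quoted above come down to two string operations. The following is a sketch of that reading only, applied to plain strings; the function names are invented here, and it deliberately ignores the block-element and comment rules, which need the document tree.

```javascript
// White space here means space, tab, carriage return and line feed.
function collapseWhitespace(text) {
  // Any run of white space, with or without line feeds, becomes one space.
  return text.replace(/[ \t\r\n]+/g, " ");
}

function normalizeBlockText(text) {
  // Leading and trailing white space inside a block element is dropped
  // after the runs have been collapsed.
  return collapseWhitespace(text).replace(/^ | $/g, "");
}
```

Note that nothing in this reading distinguishes a <pre> element from any other, which is where the 'white-space: pre' objection comes from.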
Here's an example of what a tree traversal tool like DOM Inspector sees when it looks at a website's DOM: (see image) #text nodes all over the place...
Leading and trailing newlines of a textnode should be skipped. Isn't it as simple as that? if i write <tag1> </tag1> i obviously want a newline there. not so for <tag1> </tag1>

if i write:

<pre>
 Some text here.

 Some on another line.
</pre>

i want the data of the text node to be == " Some text here.\n\n Some on another line."
That's not how it works. The <pre>...</pre> element means "preformatted text". That means the browser must copy the text to the screen, character for character, including the newlines immediately adjoining the tags. The exception is additional markup tags, such as <em>...</em>, which still apply inside the preformatted text tags. White space inside <pre>...</pre> tags is always considered "significant", unconditionally. The preformatted text element is a block-level element, like the <p>...</p> element. Typically people do not add newline characters immediately following the opening tag of either one, but if they do for the preformatted text element, the browser must assume that's intentional.
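For reference, the trimming rule proposed in the comment before this reply (and disputed by it) would amount to something like the sketch below. The function name is made up, and the choice to remove exactly one newline at each end is an assumption about what was meant.

```javascript
// Hypothetical sketch of the proposed rule: drop one newline directly
// after the start tag and one directly before the end tag; keep all
// other white space, including blank lines in the middle.
function trimEdgeNewlines(data) {
  return data.replace(/^\r?\n/, "").replace(/\r?\n$/, "");
}
```

Applied to the text of the <pre> example, this gives the string the proposer asked for; whether a browser is allowed to do it is exactly what the reply questions.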
As Alexey points out the W3C recommendation is very clear on how white space should be treated.
Moving to jst@netscape.com's bug list. Apologies for letting it languish on mine.
Assignee: vidur → jst
Status: REOPENED → NEW
From jscript@pacbell.net aka WeirdAl
> Maybe we should do some sort of check for "white-space: pre" before removing
> superfluous white space.

Not so easy: what if I dynamically set an element's CSS white-space property to pre? Parsing should not depend on the stylesheet, I think.

From alexey@ihug.com.au
> David, latest spec very clearly defines how white space characters should be
> treated:
> http://www.w3.org/TR/2001/WD-xhtml1-20011004/#uaconf

This will only apply to XHTML served as XML when the xml:space attribute of the element is not set to preserve. I assume that <pre> in XHTML has the xml:space attribute set to preserve. If white space is removed in HTML based on what that doc says, but without a way to revert to the current behavior, we have effectively removed support for the CSS property white-space: pre in HTML.

From jscript@pacbell.net aka WeirdAl
> What if we did a process late in the game, before any event handlers or
> scripting take over but after styling, of cleaning up the nodes?

Again, not so easy: what if I dynamically set an element's CSS white-space property to pre?
As a first pass, how about just deleting #text nodes whose content is *entirely* white space (incl. newline)? That should eliminate most of the problem with the phantom nodes in the DOM tree, but still leave the information that "white-space: pre" needs.
> how about just deleting #text nodes whose content is *entirely* white space <p><em>This</em> <em>should</em> <em>have</em> <em>spaces</em></p>
and it does in IE:

HTML
 HEAD
  TITLE
  SCRIPT
  STYLE
 BODY
  P
   EM
    Text: This
   Text:
   EM
    Text: should
   Text:
   EM
    Text: have
   Text:
   EM
    Text: spaces

In N6, the same text nodes are there, but so are several others:

HTML
 HEAD
  TITLE
   Text: Foo Document
  Text:
  SCRIPT
   Text:
var structWin;

function openStructWin(){
  structWin = window.open ('','foo','width=600,scrollbars=yes,resizable=yes,menubar=yes');
  structWin.document.write('<' + 'html>\n<' + 'head>\n<' + 'title>Document Structure\n<' + 'style type="text/css">\nPRE ' + '{font-size: 24pt}' + '\n\n\n\n<' + 'body bgcolor="#FFFFFF">\n\n<' + 'pre>');
}

function getNodeAndChildrenAsString(indentString,theNode){
  var structString = '';
  var structString = '';
  var nodeString;
  var theChildren = theNode.childNodes;
  // If theNode is an element node
  if (theNode.nodeType == '1'){
    nodeString = "<" + "b>" + theNode.tagName + "";
  // Otherwise, if theNode is a text node
  }else if (theNode.nodeType == '3'){
    nodeString = "<" + "b>Text: " + theNode.data;
  // In any other case (and here I'm assuming
  // that the only other case would be a
  // comment node)
  }else{
    nodeString = "<" + "b>Comment: " + theNode.data;
  }
  structString += indentString + nodeString;
  structString += '\n';
  structWin.document.write(structString);
  for (var i=0; i < theChildren.length; i++){
    getNodeAndChildrenAsString(indentString + " ", theChildren[i]);
  }
}

function writeClosingTags(){
  structWin.document.write('<' + '/pre>\n\n<' + '/body>\n<' + '/html>');
}
  Text:
  STYLE
   Text:
  Text:
 BODY
  Text:
  P
   EM
    Text: This
   Text:
   EM
    Text: should
   Text:
   EM
    Text: have
   Text:
   EM
    Text: spaces
  Text:
Guess I should have said what the point of that little demonstration was... My point was: how is IE doing it? I'm sorry I can't propose a solution, since I don't know how any of the code works, but it seems like IE must be following rules. That is, "whitespace is allowed here, but not here" or some such. I would guess that one of the rules in play in the example above is that whitespace between tags that are within a paragraph is significant and must be preserved as text nodes. Others might be: Whitespace between block-level tags is not significant, and therefore need not be preserved as text nodes. Text nodes between TR and TD tags are not allowed, and therefore whitespace between TR and TD tags is not preserved as text nodes. Maybe this is just really hard, or not the way you've tackled this thus far? As an engineer on another product, I can't stand it when people who've never seen the codebase say, "this should be so easy to implement!", so I won't. ;)
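The <em> counterexample a few comments up can be checked mechanically. The sketch below builds that paragraph as a plain-object tree (not a real DOM; childNodes are ordinary arrays, and all helper names are invented) and compares its text before and after every whitespace-only text node is dropped.

```javascript
var TEXT_NODE = 3;

// Concatenates the data of all text nodes under a node, in document order.
function textOf(node) {
  if (node.nodeType === TEXT_NODE) return node.data;
  return (node.childNodes || []).map(textOf).join("");
}

// Returns a copy of the tree without whitespace-only text nodes.
function stripBlankText(node) {
  if (node.nodeType === TEXT_NODE) return node;
  var kept = (node.childNodes || []).filter(function (child) {
    return !(child.nodeType === TEXT_NODE && !/[^\t\n\r ]/.test(child.data));
  });
  return { nodeType: node.nodeType, nodeName: node.nodeName,
           childNodes: kept.map(stripBlankText) };
}

function text(data) { return { nodeType: TEXT_NODE, data: data }; }
function em(word) { return { nodeType: 1, nodeName: "EM", childNodes: [text(word)] }; }

// <p><em>This</em> <em>should</em> <em>have</em> <em>spaces</em></p>
var p = { nodeType: 1, nodeName: "P", childNodes: [
  em("This"), text(" "), em("should"), text(" "), em("have"), text(" "), em("spaces")
] };

console.log(textOf(p));                 // This should have spaces
console.log(textOf(stripBlankText(p))); // Thisshouldhavespaces
```

So a blanket "delete all-whitespace text nodes" rule loses the word breaks; any workable rule has to know which whitespace is significant, which is the hard part.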
You know, I've looked at this, and I've changed my mind. I said:

>My opinion is, if it renders, include it. If it doesn't render or isn't meant
>to render, exclude it.

But that's a cheap way to approach a fundamental question. A document's rendering does not necessarily correspond to how the DOM views it. Does the DOM see a <!DOCTYPE > tag? Yes. Does the user? No.

We're dancing around the issue. Should there or should there not be whitespace text nodes in the DOM? We haven't yet figured that out. Frankly, when I restrict my perspective to that specific question, I am forced to give those whitespace text nodes the benefit of the doubt. What reason do we really have to take them out? IE's behavior, as we are all well aware, is not necessarily the correct behavior. Nor should convenience always dictate what we require of Mozilla 1.0. Sloppy coding by users is what brought people to condemn Netscape 6.0, because Mozilla and Netscape stopped supporting layers.

I recommend WONTFIX or INVALID.
There is no question about which nodes to delete and where. The spec is very clear on that. No point in discussing it. The question at hand is how to get the CSS property 'white-space: pre' to work after white space has been removed from the DOM.

>We're dancing around the issue. Should there or should there not be whitespace
>text nodes in the DOM? We haven't yet figured that out.

The way I see it, the DOM is a mechanism for describing a document structure (HTML), and the HTML specification defines the rules for the structure of HTML documents. The structure described by our DOM violates those rules, so it basically can't be called an HTML DOM.

We need a different solution for handling 'white-space: pre' from the one we have now. We also need a different solution for ViewSource, which right now relies on whitespace in the DOM and doesn't show the actual source of the document! 2 possible approaches are:

1. Storing the discarded white space information so it can be used by "pre" or ViewSource. This will keep ViewSource working, but will not help to show the *actual* source.
2. The ability to retrieve and use raw portions of code. This is preferable for ViewSource. However, coordinates would have to be preserved for each element for ViewSource colouring to work. This would also involve re-parsing for "pre", for it may contain other tags inside of it.

The second approach is more favourable, but looks harder. So, is this feasible? Fixing this bug and ViewSource before the 1.0 release would be really awesome.
A few comments on the XHTML spec link Alexey posted:

1) The line in question ("All white space surrounding block elements should be removed.") is a "should", not a "must".
2) By "block" I presume it means things that are declared to be blocks in the DTD? That is, at http://www.zbarsky.org:8000/~bzbarsky/domTest.html the first two "Some text" occurrences should be on one line with no space between them while the second two should be on two separate lines?
In reply to comment 41: The XHTML spec is on crack. See my post in www-talk: http://lists.w3.org/Archives/Public/www-talk/2001MayJun/0141.html I was asked to make a "definitive standards statement". My opinions are my own, and are thus not normative or anything, but: I would say this bug is a WONTFIX. In fact I put a comment to that effect in the status whiteboard last July.
Web authors should read: http://www.mozilla.org/docs/dom/technote/whitespace/ I'm going to mark this WONTFIX because bz gave me the go-ahead to do so. :-)
Status: NEW → RESOLVED
Closed: 25 years ago → 23 years ago
Resolution: --- → WONTFIX
To clarify the go-ahead part.... The XHTML specification in question is a Working Draft. As such it is most definitely not final. If one implements what that draft currently says then no strings anywhere in the DOM would ever have newlines in them; an obviously ridiculous proposition when one considers the contents of textareas or <pre> elements. The comment handling recommended in the draft is also completely bogus -- all the comments would be gone from the DOM.... I feel fairly certain that this specification's whitespace handling will be amended before it gets to be a recommendation. (At least I hope so for the sake of the sanity of the CSS and DOM specification authors).
Actually, the XHTML specification in question is a REC. http://www.w3.org/TR/xhtml1/
Boris, XHTML *is* already a recommendation. It is final. I have provided a link to the Second Edition draft, which is much more specific on white space handling than the first edition. The Second Edition doesn't change any of these rules; it just makes them clearer and more precise. You can have a look at the non-draft XHTML Recommendation as well: http://www.w3.org/TR/xhtml1/#uaconf Ian, Boris, are you ready to say that <div> </div> and <div></div> are 2 completely different documents DOM-wise? If you are, I'll go with that. But with my limited knowledge of the DOM spec I am really uncomfortable with that statement.
alexey: Yes, they are different. IMHO. Why wouldn't they be? Regarding the XHTML spec: Like I said above, it's on crack. It is totally out of line for the XHTML working group to be laying down rules on how the parser and the DOM should interact when handling XML. There is no way, IMHO of course, that we should special-case different namespaces' white-space handling.
Sorry to drag this one up again, but I think that it is very important and I have a new suggestion. Would it be possible to somehow correlate the document with its DTD? If #PCDATA is not valid in a given element, then the children of all elements of this type could be examined and, if one of the child nodes is an (all whitespace) text node, it could be removed. In this way at least the 'illegal' text nodes could be removed so that they don't turn up between </td> <td> etc. These whitespace text nodes are only there because of the formatting of the source code, and have nothing to do with white-space: pre. Of course <div> </div> and <div></div> etc. would remain different (as they should IMHO) and white-space: pre would not be broken, which seems to be the main concern. Please consider this or some other means of getting rid of at least the 'illegal' whitespace text nodes. My concern is that leaving these text nodes in the DOM after Moz1.0 will also mess things up for scripts working on non-HTML documents such as SVG. Helper functions are all very well, but they are a messy workaround.
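As a rough illustration of the suggestion above, the content-model check could look like the sketch below. It is an assumption-laden toy: the ELEMENT_ONLY table is a hand-picked, incomplete subset of HTML 4 elements whose content model has no #PCDATA (a real implementation would derive it from the DTD), the function names are invented, and the tree is a plain-object tree with array childNodes rather than a real DOM.

```javascript
// Illustrative, incomplete subset of HTML 4 elements that cannot legally
// contain character data. Hand-picked for the example, not read from a DTD.
var ELEMENT_ONLY = { TABLE: true, THEAD: true, TBODY: true, TFOOT: true,
                     TR: true, UL: true, OL: true, DL: true, SELECT: true };

function isBlankText(node) {
  return node.nodeType === 3 && !/[^\t\n\r ]/.test(node.data);
}

// Removes whitespace-only text children, in place, but only under parents
// whose content model has no #PCDATA. Expects childNodes to be arrays.
function removeIllegalWhitespace(node) {
  if (!node.childNodes) return node;
  if (ELEMENT_ONLY[node.nodeName] === true) {
    node.childNodes = node.childNodes.filter(function (child) {
      return !isBlankText(child);
    });
  }
  node.childNodes.forEach(function (child) {
    removeIllegalWhitespace(child);
  });
  return node;
}
```

Under this rule a TR keeps only its cells, while the whitespace inside a DIV or between inline elements is left alone, so <div> </div> and <div></div> stay different.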
This controversy could be dealt with effectively (for the XML DOM, at least) by implementing a validating XML parser in Mozilla. Here is the reasoning: A validating parser would be able to read DTDs. This would allow us to determine which elements have xml:space="preserve" in which whitespace *must* be preserved; we could then change our application convention for "default" to condense whitespace. This would mean that xml:space="preserve" elements retained whitespace within them in the DOM, and that it would be condensed or stripped away in xml:space="default" elements. We would also set white-space: pre on these elements in our html.css, so that they would *display* without space. The disadvantage, obviously, is that white-space: pre on other elements would have no effect. However, document authors wishing to make use of this could use an external DTD subset to add the xml:space attribute to any such elements, which would cause the parser to preserve spaces in them as well, which they could then reflect or not reflect in display through whitespace: pre.
*** Bug 131169 has been marked as a duplicate of this bug. ***
Actually we can't do any of this at all anyway, otherwise you lose round-tripping of white-space in document source. e.g. around <table> tags. This is important for various reasons, e.g. the "generated source" bookmarklet.
*** Bug 144603 has been marked as a duplicate of this bug. ***
*** Bug 147489 has been marked as a duplicate of this bug. ***
WONTFIX?? Are you serious? My bug report has been marked as a duplicate, but please have a look at it because I've given a suggestion there. It's bug 147489. If you don't want to look at it, I can summarize what I said: I don't care about how IE treats it, and I don't care about XHTML. I just use the (old) HTML 4.01 and DOM1. Look at this section in the W3C DOM specification: http://www.w3.org/TR/2000/WD-DOM-Level-1-20000929/introduction.html#ID-E7C30821 By looking at the tree inside the figure, I can surely say that those guys in W3C implicitly agree that there are no #text siblings for block-level elements. HTML 4 is very clear about block-level elements and in-line elements. So the suggestion is simple: eliminate block-level elements' whitespace siblings, but preserve in-line elements' whitespace siblings.
*** Bug 147487 has been marked as a duplicate of this bug. ***
Since you have obviously not read this bug... "What happens when the element's style is changed from block to inline?" The problem is that HTML has this concept of block-level elements that is totally divorced from the CSS concept of block elements. Thing is, Mozilla is a CSS browser, not an HTML browser. So if we have to pick which concept of "block" we go with we sort of have to choose the CSS one.
Boris, I've read through about 80% of this whole bug before writing my comment, but maybe I miss something. So, which comment does your question refer to?
Re comment #54 (which is in response to comment #41): In HTML 4 spec, there's a section on conformance (2nd paragraph): http://www.w3.org/TR/html4/conform.html which further refers to http://www.ietf.org/rfc/rfc2119.txt In summary, "should" means "recommended", "but the full implications must be understood and carefully weighed before choosing a different course." (sic) On the other hand, when you look up in a dictionary, "should" mostly means "express obligation" and "must". IMO, we have to reopen this bug and consider the issue seriously.
HTML does not give any rules for what must appear in the DOM, so the HTML spec is irrelevant in this case.
> when you look up in a dictionary This is what that RFC exists for. To define exactly what those words mean in RFCs. Said meaning has nothing to do with their dictionary meaning; the RFC authors chose to pick words that already had existing meanings but they could just as easily have come up with brand-new terms and defined them. "should" and "must" are totally different in an RFC and in the HTML specification. Comment 29 is the comment I was referring to. Consider taking some <td> or <li> nodes and setting them to display:inline....
Re comment #72: Ok, granted! but DOM does refer back to HTML: http://www.w3.org/TR/REC-DOM-Level-1/introduction.html#ID-E7C30821 Looking at the last sentence in the paragraph below the figure: "...if any two Document Object Model implementations are used to create a representation of the same document, they will create the same structure model, with precisely the same objects and relationships." Since the structure model produced by Mozilla is different from what they represent in the figure, Mozilla has a wrong DOM implementation. QED!
To complement my previous comment, look at this (maybe you would say it's irrelevant because it's XML, not HTML, but I would say they are the same): http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html#ID-745549614 Can you see how they write the XML code? It's like this:

<elementExample id="demo">
  <subelement1/>
  <subelement2><subsubelement/></subelement2>
</elementExample>

And they said "... node for "elementExample", which contains TWO child Element nodes, ..." They said TWO, not FIVE! But if we use Mozilla's DOM implementation, and if we take the SGML linebreak rule into account <http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.1>, we have at least to write the code in this way:

<elementExample id="demo">
<subelement1/><subelement2><subsubelement/></subelement2>
</elementExample>

If the SGML linebreak rule isn't respected, we even have to write the code in this way in order to produce the same structure:

<elementExample id="demo"><subelement1/><subelement2><subsubelement/></subelement2></elementExample>

So, either we say the HTML, XML and DOM1 specifications are all wrong, or there's a problem in Mozilla's implementation. As a last word, if we let this state remain in Mozilla 1.0, it won't be accepted by many people and this bug could be one of the main causes of its failure.
Re comment #73 : Exactly! And the RFC states clearly that "should" = "recommended", not "optional". Q: And what _would_ you do if someone _recommends_ that you do something? A: I think you'd _better_ do it rather than ignore it. I brought up the dictionary to point out that if we use the normal meaning of "should", it's not "recommended" either.
> different from what they represent in the figure The figure is informative, not normative. > And they said "... node for "elementExample", which contains TWO child > Element nodes, ..." > They said TWO, not FIVE! Yes. #text nodes are not Element nodes. As the definition of "should" says: "but the full implications must be understood and carefully weighed before choosing a different course". They have been. The implications of following that "should" are inconsistent behavior and incorrect layout when style is changed on some elements. Having correct layout outweighed the dubious benefit of following a "should" in which the HTML specification tries to dictate how the DOM should be constructed (something that is outside the scope of the HTML specification).
Re comment #29 : For the <ul> code:

<ul>
  <li>foo</li>
  <li>bar</li>
</ul>

as I stated before, we just have to strip off block-level elements' whitespace siblings, i.e. <li>'s whitespace siblings. In other words, Mozilla's present implementation gives:

UL
+--#text
+--LI
+--#text
+--LI
+--#text

Since LI is a block-level element, this gives:

UL
+--LI
+--LI

______________________________________________

For SPAN:

<span>two</span> <span>words</span>

Mozilla gives:

|
+--SPAN
+--#text
+--SPAN

We must not strip the #text node because SPAN is an in-line element, so the #text node is preserved.

______________________________________________

Lastly, for 'white-space: pre' ...... hmmm, I admit that this is very delicate! But first of all, we could look together at its spec: http://www.w3.org/TR/REC-CSS1#white-space Please note that it only applies to block-level elements...... So what? I don't know yet :( I'm in Europe and it's evening now. Let me go home and think about this issue during the night. But IMO, there are many more developers manipulating DOM nodes than there are developers using this CSS property. Or maybe those whitespace siblings could be made "invisible" in the DOM, and when 'white-space: pre' is used, they're made visible again?
<ul style="display: inline"> <li style="display: inline">foo</li> <li style="display: inline">bar</li> </ul> Should there be a space between "foo" and "bar"? The point is that the block/inline distinction for _layout_ purposes is determined by CSS, not HTML.
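To make the question above concrete, here is a toy model of inline rendering. It is not Gecko's layout code: it simply concatenates the text of all descendants and collapses runs of whitespace to one space, roughly what 'white-space: normal' does for inline content. The plain objects standing in for DOM nodes and the names collectText, renderInline, text and elem are hypothetical and exist only for this illustration.

```javascript
// Toy model only: flatten a node's descendants into the text a user would
// see if every element were display:inline with white-space:normal.
// Nodes are plain objects shaped like DOM nodes (nodeType, nodeValue, childNodes).
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

function collectText(node) {
  if (node.nodeType === TEXT_NODE) {
    return node.nodeValue;
  }
  var out = '';
  for (var i = 0; i < node.childNodes.length; i++) {
    out += collectText(node.childNodes[i]);
  }
  return out;
}

function renderInline(node) {
  // Collapse runs of space, tab, CR and LF to one space, then trim the ends.
  return collectText(node).replace(/[ \t\r\n]+/g, ' ').replace(/^ | $/g, '');
}

// Hypothetical helpers for building stand-in nodes.
function text(value) {
  return { nodeType: TEXT_NODE, nodeValue: value, childNodes: [] };
}
function elem(name, kids) {
  return { nodeType: ELEMENT_NODE, nodeName: name, childNodes: kids };
}

// The <ul> with whitespace text nodes kept between the <li> elements.
var withWhitespace = elem('UL', [
  text('\n  '), elem('LI', [text('foo')]),
  text('\n  '), elem('LI', [text('bar')]),
  text('\n')
]);

// The same <ul> with the whitespace-only siblings dropped at parse time.
var stripped = elem('UL', [
  elem('LI', [text('foo')]),
  elem('LI', [text('bar')])
]);

console.log(renderInline(withWhitespace)); // "foo bar"
console.log(renderInline(stripped));       // "foobar"
```

Under this simplified model, the space between "foo" and "bar" can only come from the whitespace text node, so a tree that dropped it at parse time has nothing left to render once the style changes to inline.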
I'm someone who likes to do things according to intuition and common sense, and I feel really reluctant to pick at every word somebody else has said, the way lawyers do when defending their clients ...... but it seems I've no choice. Re comment #77 : Part 1) I don't see in the page that the figure is only informative, not normative. And even if granted that it's informative, it's then informative on the structure as it's written in the paragraph. Part 2) Good remark! ... but exactly with the same argument: they didn't talk about text nodes!
> Looking at the last sentence in the paragraph below the figure: > "...if any two Document Object Model implementations are used to create a > representation of the same document, they will create the same structure > model, with precisely the same objects and relationships." If you look at the equivalent sentence in the latest version of the specification (http://www.w3.org/TR/DOM-Level-2-Core/introduction.html) you'll see that it now says: # One important property of DOM structure models is structural isomorphism: if # any two Document Object Model implementations are used to create a # representation of the same document, they will create the same structure # model, in accordance with the XML Information Set. The XML Information Set (infoset for short) includes white-space nodes. In fact it goes _on_ to say: # Note: There may be some variations depending on the parser being used to build # the DOM. For instance, the DOM may not contain whitespaces in element content # if the parser discards them. In other words, we are explicitly within our rights to include the white-space nodes according to the latest version of the DOM Core Specification. > Please note that [white-space] only applies to block-level elements...... That is also an error in the spec, and it has been corrected in the recently published working draft of the next version of the text module. The 'white-space' property applies to all elements and generated content.
> intuition and common sense Apply those to http://web.mit.edu/bzbarsky/www/testcases/testTextNodes.html where IE5.0 has the second <li> as the nextSibling of the first <li> but shows space between the two! Where the hell did that space come from? Is this the behavior you want from Mozilla?
*** Bug 159352 has been marked as a duplicate of this bug. ***
reopening bug, i think this is still an open issue
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
sivarikan, would you care to explain in more detail? I assume that you've read the discussion on this bug and have something insightful to add?
#document HTML HEAD TITLE #text #text -- not allowed SCRIPT #text #text BODY ... ... When trying to access the script node, i actually got the text node. we should not have a #text node as child of head element. ref: http://www.w3.org/TR/1998/REC-html40-19980424/struct/global.html#h-7.4.1
This is the HTML parser we're talking about? Or the XML parser parsing an XHTML document?
HTML parser
OK. Could you please file a bug on the parser module saying that text nodes should not be created where they are illegal per the DTD? For example, the following "text" strings should be dropped because there is nowhere in that document where a textnode would be valid: <html> <head> text </head> <body> text <dl> text </dl> </body> </html> Or did you want to just special-case <head> elements here?
Recommend removing the nsbeta1 keyword as well, as long as we're not certain we should do this even in Mozilla, much less Netscape.
Re: comment #89 -- if you do file a separate bug, can you make sure to note the table tags case (that text nodes are not allowed between them)? I'm cautiously thrilled to think that this discussion could move off of whether to preserve whitespace at all and on to whether to create text nodes where they are explicitly forbidden, since that was my original argument for reopening this bug (comment #3). having said that, thanks much to everyone who's thought long and hard about all the issues involved here.
The last comments on this bug are INVALID. While arbitrary text is not allowed between elements in HTML <head> blocks, text consisting of exclusively white-space characters _is_ allowed, and no spec that I know of says that this should not be represented in the DOM.
Status: REOPENED → RESOLVED
Closed: 23 years ago → 22 years ago
Resolution: --- → INVALID
In fact, looking carefully at the spec, I'd say our behavior is mandated for XML. (I don't think much of it for HTML, but I generally think that representing HTML4 with DOM is as bozotic as Appendix C, a view that's unlikely to gain political traction...) DOM1, Interface Text: "The Text interface represents the textual content (termed character data in XML) of an Element or Attr." XML 1.0, section 2.4: "All text that is not markup constitutes the character data of the document." (Would it have been that painful for XML to distinguish non-significant whitespace from character data?!)
why should it be represented in DOM when they are not allowed?
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
*sigh* HTML follows the rules of SGML, so the content model of the HTML DTD differentiates between whitespace (the "s" production of SGML) and actual character data. Whitespace can crop up in the head, even though character data is not allowed, because of this distinction. XML and DOM don't make the distinction; to them, everything that is not markup is character data. Therefore, things that are not character data in HTML/SGML become character data in the DOM. Again, the problem is that HTML is being forced into a representation (DOM) which is simply inadequate to represent it, and this is one of the places where it shows. (And if there's some way we could modify DOM Core to adequately represent SGML-based HTML, I'm all for it.)
Sivarikan, you never answered my question from comment 89... What should the DOM representation of that document be in Mozilla and why?
boris, this is what i am expecting and yes i opened a bug on parser module. #document HTML HEAD TITLE #text SCRIPT #text
sivarikan, you must be looking at the wrong HTML... see comment 89 again.
boris, maybe i was not clear before, <head> is just an example, in general i am talking about all the cases where having a text node is illegal according to the DTD.
Yeah, that's my point too. Text is not allowed in <body> by the DTD. Nor in <dl>. Nor in various other places where people commonly put it. If we made any attempt to enforce that, large chunks of the web would start failing to render... For that matter, why pick on this part of the DTD? Why not use a validating SGML parser that will completely fail to parse 99% of the pages on the web? (Think <p> inside <font>, <a> around all sorts of crud, <form> inside <table>, etc, etc.) Sorry, but parsing tag soup per DTD is a lost cause.
Not to mention that as choess and hixie pointed out whitespace is not CDATA and is therefore allowed in <head>... The DTD does not say anything about the DOM that it produces, since there is no real concept of a DOM in SGML-land, again as choess points out.
This specific bug (whitespace nodes existing in the DOM), as described above, is invalid. Marking as such.
Status: REOPENED → RESOLVED
Closed: 22 years ago → 22 years ago
Resolution: --- → INVALID
If there was such a thing as "whitespace nodes", then perhaps this would be invalid. The problem is that this whitespace is being represented as TEXT NODES in the DOM, in places where text nodes are not allowed. That's the logical problem. The practical problem is that all these extra text nodes make coding for Mozilla/Netscape a nightmare. For those of us who thought standards would save the web... well, I guess we were wrong. :(
> in places where text nodes are not allowed Why are they not allowed there exactly? PCDATA is not allowed in the source in those places. If it's present, that's an error in the source. Whitespace _is_ allowed in the source there. The DTD has nothing to do with the DOM representation. Standards that do not specify behavior (as the DOM standard does not here) can be a PITA, as you noted. This is a problem with the DOM standard that pops up all over the place...
FWIW, I have mostly seen this "bug" exhibited throughout the web in a table context. A valid standard compliant workaround for tables is to replace this: tableElt.childNodes[0].childNodes[0].childNodes[0]; with this: tableElt.tBodies[0].rows[0].cells[0]; Of course there aren't workarounds for everything, but from what I've seen, this should cure many issues.
> The problem is that this whitespace is being represented as TEXT NODES > in the DOM, in places where text nodes are not allowed. No spec says text nodes aren't allowed there. The HTML spec isn't defined in terms of the DOM, and the DOM spec isn't defined in terms of SGML DTDs.
n.b. SGML (well, HyTime) does have a DOM-like standard (closer to XML Infoset, actually), SGML groves. But that's another story.
http://www.codingforums.com/showthread.php?s=&threadid=7028 The above link has a script or two that can be customized to remove the whitespace nodes in a document. A few notes: Re comment 29, if the onus is on the webpage developer (as this bug's invalid status suggests, and I agree with that), then the webpage developer can simply move the space inside one of the spans. True, a styling effect like text-decoration:underline can spoil things a little, but other styling effects might be able to achieve the same whitespace rendering in the document. The page developer simply has to be careful.
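Any script that removes or skips whitespace nodes first needs a test for "this text node is only whitespace". The following is a minimal sketch of such a test and is not the script from the forum thread linked above. The name isIgnorableWhitespace is hypothetical, and treating exactly ASCII space, tab, form feed, line feed and carriage return as ignorable is an assumption. A non-breaking space (U+00A0) is deliberately not matched, because it renders.

```javascript
// Sketch: is this node a text node made up only of ASCII whitespace?
// isIgnorableWhitespace is a hypothetical helper name; the character set
// below (space, tab, form feed, LF, CR) is an assumption for illustration.
var TEXT_NODE = 3;

function isIgnorableWhitespace(node) {
  return node.nodeType === TEXT_NODE && /^[ \t\f\n\r]*$/.test(node.nodeValue);
}

// Plain objects stand in for DOM nodes so the sketch runs outside a browser.
console.log(isIgnorableWhitespace({ nodeType: 3, nodeValue: '\n  ' }));   // true
console.log(isIgnorableWhitespace({ nodeType: 3, nodeValue: '\u00A0' })); // false
console.log(isIgnorableWhitespace({ nodeType: 1, nodeName: 'SPAN' }));    // false
```

As the notes above point out, removing such nodes between inline elements (the two spans from comment 29) changes what is rendered, so a test like this is safer for skipping nodes during traversal than for deleting them wholesale.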
Status: RESOLVED → VERIFIED
Whiteboard: WONTFIX? → http://mozilla.org/docs/dom/technote/whitespace/
so here's a thought. Strings have a shared, common empty buffer that is used whenever a string is empty, to guarantee that a string has a non-null value. Could we use the same approach here? Have a single shared #text node that represents all unstyled text nodes in the tree? Making things like the .parent node might be ugly, but maybe a simple stack-based wrapper around the real node in places where you need to access the text node would suffice? does this sound feasible enough to file a bug?
*** Bug 178508 has been marked as a duplicate of this bug. ***
*** Bug 179709 has been marked as a duplicate of this bug. ***
Hmm... this bug makes for a very interesting read! <angry swedish web developer mode> So the solution IN REAL LIFE (=to get it to work in the 2 major browsers on earth) is to code everything on one row, so my example from bug 179709 would look like this: <table border="1"><tr id="TableRow" onClick="getLastChild(this);"><td id="A">A</td><td id="B">B</td></tr></table> Yes, Ahaa, Mmm... it really feels like a BIG STEP FORWARD folks! ;) </angry swedish web developer mode> I just hope there is a way to solve this "problem" (no, I don't want to have to use helper functions), because this is really really bad for Mozilla acceptance among the "I only code for IE because it's got biggest market share" group. (of which I'm not a member) Sorry to have taken up your time!
If you're needing this for tables only, you can use DOM HTML to your advantage. var myTable = document.getElementById("myTable"); var lastTBody = myTable.tBodies[myTable.tBodies.length - 1]; var lastRow = lastTBody.rows[lastTBody.rows.length - 1]; var lastCell = lastRow.cells[lastRow.cells.length - 1]; Of course, not every HTML element has a similar API...
*** Bug 189467 has been marked as a duplicate of this bug. ***
*** Bug 196983 has been marked as a duplicate of this bug. ***
To follow up the <angry swedish web developer mode> comment here is: <angry dutch web developer mode> I'm slightly shocked after reading this bug from top to bottom. We have an expression for this in Dutch which translates as "Operation successful, patient deceased". It seems that you (the Mozilla developers) have succeeded in defending your point of view to the point that using the DOM becomes fairly useless. I would like to use a function like:

function toggle(thingy) {
  var elem = thingy.nextSibling;
  if (elem.style.display == "none") {
    display(elem);
  } else {
    hide(elem);
  }
}

on some code like:

<h3 onclick="toggle(this)">Header</h3>
<table>
<tr><td class="desc">Content</td><td>Content2</td></tr>
<tr><td class="desc">Content</td><td>Content2</td></tr>
</table>

Which according to your Vulcan Logic (TM) does not work because there is a text node between the </h3> and the <table>. I'd have to write it as:

<h3 onclick="toggle(this)">Header</h3><table>
<tr><td class="desc">Content</td><td>Content2</td></tr>
<tr><td class="desc">Content</td><td>Content2</td></tr>
</table>

Hurray for legibility!!! Any web developers remember writing code like this to get a background color on a thin table cell?:

<TD><FONT SIZE=1>&nbsp;</FONT></TD>

Or remember when

<TR>
<TD>
<IMG HEIGHT=10 SRC="bg.gif">
</TD>
</TR>

had to be written as

<TR><TD><IMG HEIGHT=10 SRC="bg.gif"></TD></TR>

to render without extra white-space? I thought/hoped we had left those dark ages behind us. No matter how "right" you are, I predict many, many duplicates for this bug. IE DOM developers are going to run, not walk, away from Mozilla, since any next/previousSibling function is useless without a wrapper. This behaviour is going to bite and discourage the vast majority of beginning DOM developers, which is unfortunate, as the idea behind the DOM is really neat. </angry dutch web developer mode> Sorry for the rant, but I really hope you will reconsider your point of view on this issue.
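The wrapper this comment objects to can be quite small. Below is a minimal sketch, assuming the goal is "the next sibling that is an element, whatever text nodes lie in between". The name nextElement is hypothetical, and toggle sets style.display directly because the display() and hide() helpers in the comment are not shown. Plain objects stand in for DOM nodes so the sketch also runs outside a browser.

```javascript
var ELEMENT_NODE = 1;

// Return the next sibling that is an element, skipping text and comment
// nodes. Returns null when no such sibling exists.
function nextElement(node) {
  var sibling = node.nextSibling;
  while (sibling && sibling.nodeType !== ELEMENT_NODE) {
    sibling = sibling.nextSibling;
  }
  return sibling || null;
}

// Show or hide the element that follows the given heading.
function toggle(heading) {
  var target = nextElement(heading);
  if (!target) {
    return;
  }
  target.style.display = (target.style.display === 'none') ? '' : 'none';
}

// Stand-in for: <h3>Header</h3> [whitespace text node] <table>...</table>
var table = { nodeType: 1, nodeName: 'TABLE', style: { display: '' }, nextSibling: null };
var gap = { nodeType: 3, nodeValue: '\n', nextSibling: table };
var h3 = { nodeType: 1, nodeName: 'H3', nextSibling: gap };

toggle(h3);
console.log(table.style.display); // "none"
toggle(h3);
console.log(table.style.display); // ""
```

In a page this would still be wired up as onclick="toggle(this)", and the markup can keep its line breaks. Browsers released well after this discussion also expose a nextElementSibling property that does the same job where it is available.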
Which part of: "Either the text nodes are there or the layout may be incorrect" do people fail to understand? Yes, it sucks. No, the CSS and DOM specs leave no other choice. Please raise this point with the W3C if you don't like it.
I reported this duplicate. http://bugzilla.mozilla.org/show_bug.cgi?id=196983 I think these newlines and whitespace are not HTML content. It is such a problem....
*** Bug 206729 has been marked as a duplicate of this bug. ***
*** Bug 214943 has been marked as a duplicate of this bug. ***
*** Bug 217842 has been marked as a duplicate of this bug. ***
*** Bug 221364 has been marked as a duplicate of this bug. ***
*** Bug 252684 has been marked as a duplicate of this bug. ***
*** Bug 258564 has been marked as a duplicate of this bug. ***
*** Bug 263813 has been marked as a duplicate of this bug. ***
As others have pointed out, this issue will cause much grief for developers attempting to walk DOM trees. My primary concern at the moment revolves around XML in an XHTML document (e.g. something pulled down by an XMLHttpRequest). With no formal XML parser or DTD validator, these text nodes become both extraneous and erroneous. If there is to eventually be a formal XML parser then fine, but if childNodes and the like are to be used to walk XML as well then we will all be severely crippled at best. For example, as others have pointed out, the following XML:

<myData>
  <myChilddata>
    <data1>A</data1>
    <data2>B</data2>
  </myChilddata>
</myData>

...under current circumstances is completely different from the following XML:

<myData><myChilddata><data1>A</data1><data2>B</data2></myChilddata></myData>

Something will definitely need to be done about this -- either within the context of this bug or a more formalized XML-oriented solution.
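The difference between the two XML forms shows up in childNodes but not in the element children. The sketch below filters childNodes by nodeType so that both forms are walked the same way. The name elementChildren and the helpers text and elem are hypothetical, and the plain objects only imitate what a parser would build; the exact whitespace in the text nodes is an assumption.

```javascript
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Return only the element children of a node, ignoring text and comments.
function elementChildren(node) {
  var result = [];
  for (var i = 0; i < node.childNodes.length; i++) {
    if (node.childNodes[i].nodeType === ELEMENT_NODE) {
      result.push(node.childNodes[i]);
    }
  }
  return result;
}

// Hypothetical helpers for building stand-in nodes.
function text(value) {
  return { nodeType: TEXT_NODE, nodeValue: value, childNodes: [] };
}
function elem(name, kids) {
  return { nodeType: ELEMENT_NODE, nodeName: name, childNodes: kids };
}

// Pretty-printed form: whitespace text nodes sit between the elements.
var pretty = elem('myData', [
  text('\n  '),
  elem('myChilddata', [
    text('\n    '), elem('data1', [text('A')]),
    text('\n    '), elem('data2', [text('B')]),
    text('\n  ')
  ]),
  text('\n')
]);

// Compact form: no whitespace between tags.
var compact = elem('myData', [
  elem('myChilddata', [
    elem('data1', [text('A')]),
    elem('data2', [text('B')])
  ])
]);

console.log(pretty.childNodes.length, compact.childNodes.length);                   // 3 1
console.log(elementChildren(pretty).length, elementChildren(compact).length);       // 1 1
console.log(elementChildren(elementChildren(pretty)[0])[1].childNodes[0].nodeValue); // "B"
```

Code written against elementChildren sees the same structure for both documents, while code that indexes childNodes directly does not.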
This is a parser versus application/render issue. With the latest specs it is clear that this bug is INVALID. XML parser MUST preserve white space and pass it to application according to both XML 1.0 and XML 1.1 specs: http://www.w3.org/TR/REC-xml/#sec-white-space http://www.w3.org/TR/xml11/#sec-white-space XHTML 1.0, places a burden of defining white space handling onto CSS2: http://www.w3.org/TR/xhtml1/#uaconf Further XHTML languages don't even want to deal with whitespaces leaving it all to CSS: http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/introduction.html#s_intro_formatting However CSS2 does a pretty poor job at this: http://www.w3.org/TR/REC-CSS2/text.html#white-space-prop But this is fixed in the upcoming CSS 2.1 standard: http://www.w3.org/TR/CSS21/text.html#q8 And made even more complex in CSS3: http://www.w3.org/TR/css3-text/#white-space-processing Archaic HTML talks about how to "render" whitespaces, not about how to "parse" them either: http://www.w3.org/TR/html4/appendix/notes.html#notes-line-breaks DOM is simply a Parser's output. As such, whitespaces belong to the DOM application, and it is up to CSS to decide what to do with them. You really can't say 2 documents are identical just because they have same markup. They are not. And CSS is the way to show the differences. As such DOM should reflect this difference.
*** Bug 299108 has been marked as a duplicate of this bug. ***
*** Bug 311654 has been marked as a duplicate of this bug. ***
<angry german web developer mode> HTML is still written by humans and has to be readable to them. FF's pedantic reading of the specs (which aren't that specific) won't persuade other browsers' developers to adopt your (exact and maybe even right) point of view, but will cause other web developers to ignore FF's interpretation of their code, resulting in less comfort for FF's users. </angry german web developer mode> Does whitespace, and only whitespace, between tags carry so much information that it is worth it? Worth the protectionism?
*** Bug 315938 has been marked as a duplicate of this bug. ***
My god dudes. Why are we all sitting on our hands about this? This is far from a minor irritation. It renders nextSibling, lastChild, firstChild, and any kind of indexing useless. Not just in Firefox, but in the Web at large since we develop applications that respect Gecko these days. If that proves too hard then we'll just develop for IE and tell our customers about its larger market share and what it would cost to additionally support Netscape like we're used to. The fact that a portion of the DOM doesn't make any sense when implemented like this should be enough to stop you worrying if the spec explicitly states it or not. I'll call this comment a spec if it makes you more comfortable. I'm considering learning C++ so I can fix this, please someone beat me to it.
> I'll call this comment a spec if it makes you more comfortable. Your comment doesn't make a good spec, since it's self-contradictory.
> Your comment doesn't make a good spec, since it's self-contradictory. Do excuse the use of language, but I don't think many are giving this the priority it deserves. It's amazing that I try to do a bit of DOM Level 1 coding and find that Mozilla recommend I put a mass of helper functions in to circumvent a bug. Moreover, the bug has been open for ages and half the people who have looked at it don't think it's a bug. It makes some core Level 1 properties unusable in a sane way, of course it's a bug. This is out in the wild in Firefox 1.5, it's going to become difficult to change soon and it will mean that the DOM standard failed as an interoperable system for coding. We'll have to get back to sniffing to work out which type of implementation we have and versioning the code. I got the impression that you guys were with me in wanting to see the end of that kind of pain. Just drop the empty text nodes when parsing white-space:normal nodes. There is no functional use for them when programming but a lot of fiddling when writing many kinds of interactive page. Having to join all the lines up on a page to get a portable DOM is the same kind of bug that made NS4 most developers' least favourite browser.
> It makes some core Level 1 properties unusable in a sane way, of course it's a > bug. No, just because the properties are less useful does NOT make the behavior a bug -- the behavior is allowed by the DOM spec and required by other W3C specs. > it's going to become difficult to change soon It's not going to be changed. That's why the bug is resolved "invalid". There's no way to change it without breaking basic CSS functionality or violating either the DOM or CSS spec (or both, as in IE). > Just drop the empty text nodes when parsing white-space:normal nodes. They don't have a "white-space" value while being parsed. If you don't understand that, I suggest you actually read the DOM and CSS specs until you do. We're as sorry as you are that IE is breaking the DOM and CSS specs this badly, but there's nothing we can do about IE bugs other than complaining to Microsoft (which I urge you to do).
Is there any definite decision how this will be handled in coming releases of Firefox?!? This behaviour and interpretation seems to be controversial. We have customers requesting functionality that worked with Firefox 1.0 and complaining about problems because of the changed behaviour in Firefox 1.5. We have to use kind of excessive DOM-processing in our application, and we need to be sure that we change things for the current release and don't have to fix things for coming releases again.
(In reply to comment #136) > Is there any definite decision how this will be handled Yes -- as it is now. > This behaviour and interpretation seems to be controversial. It's not. > We have customers requesting functionality that worked with Firefox 1.0 and > complaining about problems because of the changed behaviour in Firefox 1.5. This behavior didn't change and isn't planned to change, so I don't see how what you said is relevant.
(In reply to comment #137) It did change from Firefox 1.0 to 1.5. At least in reference to what i originally posted under https://bugzilla.mozilla.org/show_bug.cgi?id=320353
> https://bugzilla.mozilla.org/show_bug.cgi?id=320353 Parsing of <frameset> in particular was buggy on the 1.7 branch; that's been fixed.
*** Bug 324195 has been marked as a duplicate of this bug. ***
*** Bug 324195 has been marked as a duplicate of this bug. ***
*** Bug 326078 has been marked as a duplicate of this bug. ***
(In reply to comment #137) > > Is there any definite decision how this will be handled > Yes -- as it is now. Her famous last words... :-) We (developers) had similar answers about innerHTML way ago (Netscape 6 betas). Phantom node preservation *is* a mistake - moreover, one not directly required by the W3C. You'll have to change it sooner or later, and the later it is, the more expensive it will be, so why not start sooner? For the transition period you could (as suggested by some) add a flag preservePhantomNodes that one could turn on if needed - but off by default. You could even make preservePhantomNodes == true by default if you really want to. But the problem needs to be addressed. Right now all your supporters have to write additional filters atop of all node-related DOM methods, which is totally abnormal IMHO. Do you really think it's just fine?
> Her famous last words... :-) We (developers) had similar answers about > innerHTML way ago (Netscape 6 betas). Agreed. innerHTML, document.all, the list goes on... Is IE 7.0 going to now start preserving phantom nodes even though prior versions didn't, all in the name of the spec? I doubt it. I've personally handled a few hundred business cases where phantom nodes weren't wanted or needed. I'm curious - has anyone actually run into a business case where they *did* need them to remain?
(In reply to comment #143) > But the problem needs to be addressed. Right now all your supporters have to > write additional filters atop of all node-related DOM methods, which is totally > abnormal IMHO. Do you really think it's just fine? Hear hear. I'm a jobbing web developer and getting sick of writing out loops to find the nodes I want when there are supposedly DOM mechanisms. More complicated DOM work turns into a plethora of nested loops and it gets difficult to keep track. Since any code already created is catering for both manifestations of the DOM, getting rid of this bug shouldn't break any web sites. Please please write it out.
*** Bug 329019 has been marked as a duplicate of this bug. ***
*** Bug 332821 has been marked as a duplicate of this bug. ***
I'm talking here about the HTML case, I'm not and will not consider XML. DOM trees are not easily walkable with Fx, I urge you guys to reconsider the invalid state. With so many duplicates, don't you think your point (not removing phantom nodes) is plain wrong?

<p>
<span>one</span>
<span>two</span>
</p>

It's plain and simple, the P tag has ONLY 2 children, no reason for Firefox to create new phantom text nodes. Especially since NO OTHER browser is acting this wrong way. From my point of view, the white-space:pre issue is not a real issue. You can keep an internal DOM representation for all the CSS rendering issues you encounter, but give us the expected DOM representation without those ugly phantom text nodes. nextSibling and such are TOTALLY unusable the way it is right now. And the solution which consists of using some (firefox only) helper functions from comment #11 is wrong. We don't need those nodes, we don't want them, and worse, we have to use some loops (in firefox ONLY) to fix them. Why do you force me to use this kind of js, or the helper functions from comment #11, every time I want to use childNodes and xxxxxSibling from the DOM?

DOM.cleanWhitespace = function(element, recursif) {
  element = (typeof element == 'string') ? $(element) : element;
  for (var i = element.childNodes.length - 1; i >= 0; i--) {
    var node = element.childNodes[i];
    if (node.nodeType == document.TEXT_NODE) {
      var nodeValue = node.nodeValue.trim();
      if (nodeValue === '' || nodeValue == '\n' || nodeValue == '\t' || nodeValue == '\r' || nodeValue == '\r\n' || nodeValue == '\n\r') {
        element.removeChild(node);
      }
    } else if (node.nodeType == document.ELEMENT_NODE) {
      DOM.cleanWhitespace(node, recursif);
    }
  }
};

Please reconsider the invalid state. Thanks. So many Duplicate (31) bugs means something imo.
When 1 or 2 of my users are reporting a bug, i consider it a specific issue and i fix it for them only (helper functions), but when 31 users are reporting the exact same issue, it's my duty to reconsider my position (even if i'm right) and fix the issue. 147 comments, 31 duplicate = 21,08% of the comments The issue is real, the invalid state of this bug is just wrong. I thought standards would help us to stop using browser specific code. You are breaking it here by not resolving the bug :( #5 Dup 48560 #7 Dup 62269 #13 Dup 65658 #16 Dup 89782 #19 Dup 104785 #23 Dup 114749 #24 Dup 118213 #63 Dup 131169 #65 Dup 144603 #66 Dup 147489 #68 Dup 147487 #83 Dup 159352 #110 Dup 178508 #111 Dup 179709 #114 Dup 189467 #115 Dup 196983 #119 Dup 206729 #120 Dup 214943 #121 Dup 217842 #122 Dup 221364 #123 Dup 252684 #124 Dup 258564 #125 Dup 263813 #128 Dup 299108 #129 Dup 311654 #131 Dup 315938 #140 Dup 324195 #141 Dup 324195 #142 Dup 326078 #146 Dup 329019 #147 Dup 332821
> The issue is real, the invalid state of this bug is just wrong. I thought > standards would help us to stop using browser specific code. You are breaking > it here by not resolving the bug :( > > #5 Dup 48560 > #7 Dup 62269 > #13 Dup 65658 > #16 Dup 89782 > #19 Dup 104785 > #23 Dup 114749 > #24 Dup 118213 > #63 Dup 131169 > #65 Dup 144603 > #66 Dup 147489 > #68 Dup 147487 > #83 Dup 159352 > #110 Dup 178508 > #111 Dup 179709 > #114 Dup 189467 > #115 Dup 196983 > #119 Dup 206729 > #120 Dup 214943 > #121 Dup 217842 > #122 Dup 221364 > #123 Dup 252684 > #124 Dup 258564 > #125 Dup 263813 > #128 Dup 299108 > #129 Dup 311654 > #131 Dup 315938 > #140 Dup 324195 > #141 Dup 324195 > #142 Dup 326078 > #146 Dup 329019 > #147 Dup 332821 Fully and completely seconded. "Firefox vs Users" is an opposition I would not have imagined in my worst nightmare... Besides the endless dups, just search the Web for relevant blogs and forums. What's wrong with marking it ACTIVE BLOCKING and adding a flag like -moz-preserve-phantom-nodes that we could set to false (leave the default true, no problem)? What's wrong with doing it in the next minor update rather than fighting for nothing till the last bullet? (Not sure though if this whole thread is not in a "kill file" anyway)
> (Not sure though if this whole thread is not in a "kill file" anyway)

It pretty much is, since none of the people doing the talking, yourself included, seem to understand what's going on with the relevant standards, much less the changes they want made in the rendering engine. I will make one more futile attempt to set people straight, however. I advise reading closely.

What you seem to want is a fundamental rewrite of the CSS engine in Gecko so it's not DOM-based (like IE). Then you want the DOM exposed to the web page to not reflect the actual data structures the browser has in memory but rather to depend on the CSS formatting. All of that seems like a poor idea, especially since it would introduce a lot of IE's CSS bugs; bugs that are basically due to its internal representation NOT being a DOM.

Note that Opera 8.5 behaves like Gecko does on the testcase in this bug and on other testcases I've tried for this behavior, in both standards and quirks mode. So does Konqueror in standards mode (can't test it in quirks mode at the moment). So does Safari, last I checked (I don't have a Safari build on me right now). Given that, the "Especially since NO OTHER browser is acting this wrong way" crap in comment 148 simply indicates to me that laurent vilday likes to make claims without testing them. In fact, every single modern browser other than IE (we're going to consider IE a modern browser here for the sake of argument) behaves like Gecko does.

I would also like to reply to one other part of comment 148. Specifically, the part where it says "So many Duplicate (31) bugs means something imo." What that means is that 31 of the people out there who like coding to IE's mis-implementation of the DOM spec filed bugs requesting that we introduce the same bugs. No more, no less. Given the number of "IE-only" sites out there even now, this number is not really all that surprising, at least to me.
(In reply to comment #150)
> What you seem to want is a fundamental rewrite of the CSS engine in
> Gecko so it's not DOM-based (like IE).

We don't need to sound so tragic. We (counting at least 31 filed dups, I can say "we") simply want the possibility to /choose/ the most convenient way to handle the DOM. The W3C box model is more than inconvenient, yet there may be people who just love it and cannot live without it. So Mozilla simply added -moz-box-sizing so anyone could "go to hell by his own road" :-) That did not cause the sky to fall on the earth, did it? In the same way, the same choice is needed for the DOM tree structure: some -moz-preserve-everything or so. Whoever likes phantom nodes and even sees some use for them is welcome to leave the default "yes". The rest can set it to "false". The phantom nodes are dug out by the parser from the pretty-printed source anyway, so the relevant patch would take a line or two (check a true/false flag and either throw the node away or add it to the tree).

I can imagine some circumstances where one needs to switch a fragment from the parsed state to a <pre> state and back, or to restore the source byte for byte as it came from the server. But these are /occasional/ usages as opposed to the mass usage, and really shouldn't be the subject of such intense preoccupation. Still, -moz-preserve-everything (or whatever) set to true takes care even of these occasional situations.

> I would also like to reply to one other part of comment 148.
> Specifically, the part where it says "So many Duplicate (31) bugs means something imo."
> What that means is that 31 of the people out there who like coding to IE's
> mis-implementation of the DOM spec filed bugs requesting that we introduce the
> same bugs.

Then my question is: how many bugs do you need to be filed to admit that there is something rotten in the kingDOM? 310? 3,100? 31,000? I believe NN6 failed for innerHTML after 2,000 or so claims.
Do you really need another Chartist Movement? :-)) Also please note that these are not "just 31 people". These are active Firefox supporters who bothered to learn about Bugzilla, open an account, prepare a testcase and file a bug properly. I would easily add behind each of them at least 100 end users who just did not have time for all of that, or who were not aware of Bugzilla, or who simply dropped Firefox. IMHighlyHO.
VK, this bug is about what getFirstChild and getNextSibling return. Those are defined by the DOM spec and return whatever the DOM has. If what you want are separate methods to access only parts of the DOM (similar to what the SVG 1.2 Tiny spec has -- they only see Element nodes), then feel free to file _separate_ bugs on that. Please make sure to clearly define exactly what your proposed methods should return in all cases. Once there's a clear need established and a clear description of what the methods should do (which was the situation with innerHTML), implementing them can actually be discussed in a reasonable way. Also, if you have issues with the DOM spec you may want to consider raising them with the W3C so that _all_ browsers would implement these methods you want. Unless you plan to write script that only works in Gecko but breaks in Opera or Safari or Konqueror?
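For readers who only care about element nodes, the kind of separate accessor described above can be sketched in script today. This is a minimal sketch, not a proposed API: the names firstElementChildOf and nextElementSiblingOf are hypothetical, and the code relies only on the DOM Level 1 properties firstChild, nextSibling and nodeType.

```javascript
// Hypothetical helpers (illustrative names, not part of any DOM spec).
// They walk the real DOM and simply skip anything that is not an element.
var ELEMENT_NODE = 1;

// Return the first child of `node` that is an element, or null if none.
function firstElementChildOf(node) {
  var child = node.firstChild;
  while (child && child.nodeType !== ELEMENT_NODE) {
    child = child.nextSibling;
  }
  return child || null;
}

// Return the next sibling of `node` that is an element, or null if none.
function nextElementSiblingOf(node) {
  var sibling = node.nextSibling;
  while (sibling && sibling.nodeType !== ELEMENT_NODE) {
    sibling = sibling.nextSibling;
  }
  return sibling || null;
}
```

Because the helpers only read standard properties, the same script should behave the same way whether or not the browser keeps whitespace text nodes in the tree.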
(In reply to comment #152)
Phantom nodes filed as Bug #339511

> VK, this bug is about what getFirstChild and getNextSibling return.

Not really; it is about "Mozilla reports phantom text nodes in the DOM tree", as the bug description states. Problems with the native DOM methods are just one of the outcomes.

> Those are defined by the DOM spec and return whatever the DOM has.

For the last two weeks (when I had free time, of course) I tried to find these definitive DOM specs and failed. It seems to be the same for all other researchers. All I see is "somebody said something, and as we did not find any better place, we just dumped it in here as empty text nodes" - and I was really careful in reading your arguments in this thread. Yet I might have missed something vitally important.

> then feel free to file _separate_ bugs on that.

Not really a separate bug, but the same old mistake reviewed once again after six years: with new facts and in a whole new situation. I also linked some testcases which (I think) will be a big surprise to you as an application of the "DOM specs" ;-)

Phantom nodes filed as Bug #339511
*** Bug 339511 has been marked as a duplicate of this bug. ***
(In reply to comment #154) > *** Bug 339511 has been marked as a duplicate of this bug. *** By mistake which is corrected now. Please note that bug #339511 is a feature request, not a "bug" (as something contradicting to the declared behavior).
*** Bug 339511 has been marked as a duplicate of this bug. ***
*** Bug 339766 has been marked as a duplicate of this bug. ***
A friend of mine, Joao Eiras from Portugal, who is a member of the W3C mailing lists, gave me a valuable hint: TO WORK WITH TABLES THE BEST APPROACH IS table.rows[y].cells[x]. I think it solves 90% of the trouble... (using the correct method). The funniest thing is that I developed a Javascript Self Explorer (available at http://sitedosergio.sitesbr.net - inside the Computings > Javascript menu) that shows me that the "rows" and "cells" properties exist for table objects... but it's hard to know or remember everything ... :D Thanks Joao !
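A small sketch of the table.rows[y].cells[x] approach follows. The wrapper name getCell is hypothetical and only adds bounds checking; rows and cells themselves are the standard DOM HTML collections on table and row elements, which contain only rows and cells, so whitespace text nodes never show up in them.

```javascript
// Hypothetical helper: fetch the cell at row y, column x of a table.
// Returns null when the row or the cell does not exist.
function getCell(table, y, x) {
  var row = table.rows[y];
  if (!row) {
    return null;
  }
  return row.cells[x] || null;
}
```

In a page this might be called as getCell(document.getElementById("myTable"), 0, 1), where "myTable" is an assumed id.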
*** Bug 364248 has been marked as a duplicate of this bug. ***
I have seen the following workaround somewhere: embed your readability white space into tags, i.e. use <tag ></tag > instead of <tag> </tag>. In this way the source code contains whitespace and it is more readable and the whitespace gets consumed by the tag parser and does not make it into the DOM tree. This trick, however, does not work for HTML comments, which enter the DOM tree as comment nodes. While you have complete control on where you put your comments, it is not possible to hide them from the DOM. And adding a comment to HTML or removing one can break your script, which is even more unexpected. That is going to happen when a future maintainer decides he needs an annotation here and there! All in all, it seems there is no reliable way to handle this except for using helper functions.
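One possible shape for such a helper function is sketched below. It is only an illustration: the names isIgnorable and significantChildren are hypothetical, and the choice to treat comments and whitespace-only text as ignorable is an assumption that suits scripts walking markup structure, not a rule from the DOM spec.

```javascript
// Hypothetical helpers (illustrative names). A node is treated as ignorable
// if it is a comment, or a text node containing only tab, newline,
// carriage return or space characters.
var TEXT_NODE = 3;
var COMMENT_NODE = 8;

function isIgnorable(node) {
  if (node.nodeType === COMMENT_NODE) {
    return true;
  }
  return node.nodeType === TEXT_NODE && !/[^\t\n\r ]/.test(node.data);
}

// Return an array of the children of `parent` that are not ignorable,
// in document order. Text nodes with real content are kept.
function significantChildren(parent) {
  var result = [];
  for (var child = parent.firstChild; child; child = child.nextSibling) {
    if (!isIgnorable(child)) {
      result.push(child);
    }
  }
  return result;
}
```

With a helper like this, adding or removing a comment or an indentation line in the HTML source should not change the indexes the script sees.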
(In reply to comment #29) > Which nodes are you proposing to eliminate? Only the ones inside start-tags > and > end-tags, or all of the ones you can (whatever that means)? If the former, how > do you plan to handle the childNodes array for something like: > > <ul> > <li>foo</li> > <li>bar</li> > </ul> > > I think the DOM benifits of such an approach are minimal. > > If the latter, how do you plan to ensure that <span>two</span> > <span>words</span> aren't merged together? > > Which does IE do? IE blindly eats up white space between elements and inside elements whether it is appropriate or not: <http://www.microsoft.com/communities/newsgroups/list/en-us/default.aspx?&lang=en&cr=us&guid=&sloc=en-us&dg=microsoft.public.internetexplorer.general&p=1&tid=925c63ae-5f0b-452b-8b61-a5d5a67a6330&mid=925c63ae-5f0b-452b-8b61-a5d5a67a6330> > > In all cases, how do you plan to keep 'white-space: pre' working? >
My problem with the current implementation relates to loading up XML documents. I can understand that you need to keep the white-space around with HTML so that "white-space: pre" works, however I don't see the need for it when loading up XML documents. Given that an XML document is a representation of data, having the white-space nodes there does not make any sense at all. However, I'm not just raking up the old arguments again, I do have a question to ask: I'm using: XmlDoc = document.implementation.createDocument; followed by: XmlDoc.load('xmlDoc.xml'); to load up an XML document. Is the load method used by the browser when it is creating the DOM of an HTML page? Because if not, can the load method just not create white-space nodes? Or take an additional boolean parameter to specify it? Failing that, I see that the third parameter for the document.implementation.createDocument method is not implemented yet - can it not be a boolean to switch white-space node creation on or off?
You should rather load XML data using XMLHttpRequest.
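A rough sketch of that suggestion, assuming a browser environment where XMLHttpRequest is available: the function name loadXml and its callback signature are hypothetical, and the file name in the usage note is the one from the comment above. Note that the parsed document will still contain whitespace text nodes; this only changes how the document is loaded.

```javascript
// Hypothetical helper: load an XML document asynchronously and hand the
// parsed document (req.responseXML) to `onLoad`, or null if the request
// did not finish with HTTP status 200.
function loadXml(url, onLoad) {
  var req = new XMLHttpRequest();
  req.open("GET", url, true);
  req.onreadystatechange = function () {
    if (req.readyState !== 4) {
      return; // not finished yet
    }
    onLoad(req.status === 200 ? req.responseXML : null);
  };
  req.send(null);
}
```

Usage might look like loadXml('xmlDoc.xml', function (doc) { /* walk doc.documentElement here */ });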
Component: DOM: Core → DOM: Core & HTML
QA Contact: stummala → general