Closed Bug 26179
Opened 25 years ago; closed 22 years ago
Mozilla reports existence of phantom text nodes in the DOM
Categories: Core :: DOM: Core & HTML (defect, P3)
Tracking: VERIFIED INVALID; Target Milestone: Future
People: Reporter: lori; Assigned: jst
Details: Keywords: dom1, testcase; Whiteboard: http://mozilla.org/docs/dom/technote/whitespace/
Attachments: 2 files
The following code, when opened in IE5, shows an almost-correct representation
of the DOM (except for the SCRIPT and STYLE nodes having no children, and the
comment being reported as a tag with tagName ! -- duh! ;). In M13, virtually
every node is reported as having a text node for a child, even when none
actually exists in the document. (Note all the Text:s followed by blank space
rather than actual text, and you'll see what I mean.)
<html>
<head>
<title>Foo Document</title>
<script language="JavaScript">
var structWin;
function openStructWin(){
structWin = window.open('', 'foo', 'width=600,scrollbars=yes,resizable=yes,menubar=yes');
structWin.document.write('<' + 'html>\n<' + 'head>\n<' + 'title>Document Structure</title>\n<' +
'style type="text/css">\nPRE ' + '{font-size: 24pt}' + '\n</style>\n</head>\n\n<' +
'body bgcolor="#FFFFFF">\n\n<' + 'pre>');
}
function getNodeAndChildrenAsString(indentString,theNode){
var structString = '';
var nodeString;
var theChildren = theNode.childNodes;
// If theNode is an element node (nodeType 1)
if (theNode.nodeType == 1){
nodeString = "<" + "b>" + theNode.tagName + "</b>";
// Otherwise, if theNode is a text node (nodeType 3)
}else if (theNode.nodeType == 3){
nodeString = "<" + "b>Text: </b>" + theNode.data;
// In any other case (and here I'm assuming
// that the only other case would be a
// comment node)
}else{
nodeString = "<" + "b>Comment: </b>" + theNode.data;
}
structString += indentString + nodeString;
structString += '\n';
structWin.document.write(structString);
for (var i=0; i < theChildren.length; i++){
getNodeAndChildrenAsString(indentString + " ", theChildren[i]);
}
}
function writeClosingTags(){
structWin.document.write('<' + '/pre>\n\n<' + '/body>\n<' + '/html>');
}
</script>
<style type="text/css">
<!--
.red { color: #FF0000}
-->
</style>
</head>
<body onLoad="openStructWin();getNodeAndChildrenAsString('',document.documentElement);writeClosingTags()">
<table width="600" border="0" cellspacing="10" cellpadding="2">
<tr>
<th colspan="2" align="left">Flavors</th>
</tr>
<tr>
<td>grape</td>
<td>cherry</td>
</tr>
<tr>
<td>lemon</td>
<td>lime</td>
</tr>
<tr>
<td>orange</td>
<td>raspberry</td>
</tr>
</table>
<p>
That was a table, and this is a paragraph with some <b>bold text</b> in it.
</p>
<p>
This paragraph contains an image. <img src="images/sun.gif" width="100"
height="100">
</p>
<!-- This is a comment before the text in the last paragraph. -->
<p>
This paragraph contains <b>bold text,</b> <code>code text,</code> and <span
class="red">styled text.</span>
</p>
</body>
</html>
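Editorial note: the testcase writes its outline with document.write, so a whitespace-only text node appears only as "Text:" followed by blank space. Below is a minimal sketch of the same walk that returns a string instead and prints text data through JSON.stringify, so newline-only nodes show up as "\n". The name dumpTree is hypothetical; it assumes only the standard nodeType, nodeName, data and childNodes properties.

```javascript
// Hypothetical helper: returns an indented outline of a DOM subtree.
// Uses only nodeType, nodeName, data and childNodes, so it works on real
// DOM nodes and on plain objects shaped like them.
function dumpTree(node, indent = "") {
  let line;
  if (node.nodeType === 1) {          // ELEMENT_NODE
    line = node.nodeName;
  } else if (node.nodeType === 3) {   // TEXT_NODE
    // JSON.stringify makes "\n" and runs of spaces visible.
    line = "Text: " + JSON.stringify(node.data);
  } else if (node.nodeType === 8) {   // COMMENT_NODE
    line = "Comment: " + JSON.stringify(node.data);
  } else {
    line = node.nodeName;
  }
  const lines = [indent + line];
  for (const child of Array.from(node.childNodes || [])) {
    lines.push(dumpTree(child, indent + "  "));
  }
  return lines.join("\n");
}
```

In a browser, console.log(dumpTree(document.documentElement)) prints the outline; in Mozilla the lines reading Text: "\n" are the nodes this bug is about.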
Comment 1•25 years ago
The "phantom" text nodes are actually the newlines between the elements. The DOM
spec says that these text nodes can (but don't have to) be preserved and
represented in the DOM. We choose to do so, so that document roundtripping can
occur. Since IE doesn't, it does unfortunately mean that you have to code around
them. Specifically, scripts that use hardcoded childNode offsets will not work
across browsers.
We might consider getting rid of these small text nodes for efficiency reasons.
For now, we've made the explicit decision to keep them.
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → INVALID
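A sketch of the workaround this comment implies: instead of a hardcoded childNodes offset, count only element children, so a newline between tags cannot shift the index. elementChildAt is a hypothetical helper, not part of any DOM API. (Later DOM specifications added element.children for the same purpose.)

```javascript
// Hypothetical helper: the index-th *element* child of parent, skipping
// text and comment nodes, so whitespace between tags does not shift it.
function elementChildAt(parent, index) {
  let seen = 0;
  for (const child of Array.from(parent.childNodes)) {
    if (child.nodeType !== 1) continue; // not an element: skip
    if (seen === index) return child;
    seen++;
  }
  return null; // parent has fewer than index + 1 element children
}
```

For a <ul> written with one <li> per line, childNodes[1] is the first <li> in Mozilla but the second in IE, while elementChildAt(ul, 1) is the second <li> in both.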
Thanks for clarifying, Vidur -- that was the suspicion of one of the engineers
here (Macromedia). This could also be what's causing the code in bug 26178 to
act oddly, though I did check the contents of the nodes before sorting them to
make sure I had the right ones. I'll look into it and update the status of
26178 accordingly.
Status: RESOLVED → CLOSED
I'm reopening this one because the more I think about it, the more I think
exposing whitespace as a text node is the wrong thing to do. Why? Because you
can't do it consistently. Text nodes are invalid in between table rows and
table cells, for example -- so the same newline that appears after a </td>
cannot be represented as a text node the way it can be if it appears after a
</p>. This harms the roundtripping argument, IMHO. Also, it seems to me that
whitespace shouldn't affect the structure of a document; I consider an HTML
document with no linebreaks to be structurally identical to one with all kinds
of whitespace between tags. If newlines are represented as text nodes, the two
documents would be structurally different.
Status: CLOSED → REOPENED
Resolution: INVALID → ---
Comment 4•24 years ago
After talking it over with Eric Krock, our decision is to Future this one. It
means, unfortunately, that scripts need to take into account the possible
existence of extra text nodes in element content. David Baron is probably going
to write a set of utilities (at the very least, a tech note) to help with common
operations. Lhylan is correct, though - we're not even including the text nodes
consistently.
Target Milestone: --- → Future
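Along the lines of the utilities mentioned above (and of the helpers in the whitespace tech note later linked from the whiteboard), a sketch of the two predicates most scripts end up needing. The names isAllWs, isIgnorable and significantChildren are illustrative; treating comments as ignorable is a scripting convention, not something the DOM specification requires.

```javascript
// White space as in XML's S production: space, tab, CR, LF.
// Deliberately narrower than \s, which also matches U+00A0 and others.
function isAllWs(textNode) {
  return !/[^\t\n\r ]/.test(textNode.data);
}

// "Ignorable" for navigation: a comment or a whitespace-only text node.
function isIgnorable(node) {
  return node.nodeType === 8 || (node.nodeType === 3 && isAllWs(node));
}

// The children a script usually means, with ignorable nodes filtered out.
function significantChildren(node) {
  return Array.from(node.childNodes).filter((child) => !isIgnorable(child));
}
```

Filtering this way is only safe for reading the tree: as comment 49 shows, a whitespace-only text node between two inline elements carries a visible space.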
Comment 8•24 years ago
What is the DOM working group position on this? Vidur?
Comment 9•24 years ago
I've checked the DOM standard and it's not clear as to whether there should be
these phantom text nodes. I am running Windows 98 on a PC, so should the
OS/platform be updated on this bug?
IMHO, I can't see the reason for all these little text nodes. It just fills the
DOM with nonsense. The frustrating thing is the incompatibility with IE. I
thought the W3C was meant to clear this up. If Netscape and Microsoft are going
to differ on interpretation, then we are heading back to the bad old days of
browser incompatibility. Don't you guys ever talk to one another?
Comment 10•24 years ago
Tim: Not only do they talk to each other, they come up with the spec together...
Comment 11•24 years ago
The DOM spec doesn't really say, I don't think. The SGML spec has some
provisions for ignoring newlines just inside start/end tags, but since we
already totally ignore what it says on whitespace (as have basically all
HTML browsers ever), I don't think we should take it too seriously.
Preserving the white space in the DOM is needed for things like
'white-space: pre' to work and for the editor to output HTML as it
was read in. There are some notes on how to deal with these "extra"
text nodes at:
http://mozilla.org/docs/dom/technote/whitespace/
Comment 12•24 years ago
Is bug 65658 a dup?
Comment 13•24 years ago
*** Bug 65658 has been marked as a duplicate of this bug. ***
Updated•24 years ago
Component: DOM Level 1 → DOM Core
Comment 16•23 years ago
*** Bug 89782 has been marked as a duplicate of this bug. ***
Comment 17•23 years ago
I hope this can be taken care of before Moz 1.0...
Comment 19•23 years ago (assignee)
*** Bug 104785 has been marked as a duplicate of this bug. ***
Comment 20•23 years ago
Comment 21•23 years ago
Vidur, can you give an update on this? These "phantom" nodes are going
to play havoc with folks writing DOM JS code: they will have to do a browser
detect just to filter out all these empty text nodes, and things like
childNodes are going to report very different numbers in IE and
Mozilla.
Comment 22•23 years ago
Adding roger (dhtml apps perspective)
Comment 23•23 years ago
*** Bug 114749 has been marked as a duplicate of this bug. ***
Comment 24•23 years ago
*** Bug 118213 has been marked as a duplicate of this bug. ***
Comment 25•23 years ago
Newlines are translated into a space; is this a WONTFIX?
Comment 26•23 years ago
Why? Why wouldn't newlines be ignored between tags?
Comment 27•23 years ago
The more I think about this, by speaking with web app developers, and by working
on DOM samples, the more I think the current way to handle whitespace nodes is
flawed. It is however necessary to make a decision and resolve this bug before
1.0, because changing the way it works after 1.0 would be seriously bad for
backwards-compatibility.
I think including whitespace nodes is clumsy for the following reasons:
1) It is not compatible with MSIE (and probably Opera).
2) The DOM code is heavily dependent on the markup (insert a new line somewhere
and your average childNodes[] or firstChild breaks)
3) Although I have no proof, building a node for each whitespace run (about 2
per tag if your markup is readable) is probably a perf hit and a footprint hit
(although jst's recent mInner changes made this better)
4) It's highly non-intuitive for new developers. I find myself often wondering
why my .firstChild has no properties.
5) Those nodes carry no information (see below for exceptions)
These observations are based on my own experience, now nearly one year of
discussions with web developers, lurking in developer newsgroups and mailing
lists, and developing web apps myself. This is one of the most frequently asked
questions in the developer newsgroups and mailing lists.
The only good reasons to keep the whitespace nodes were given by David Baron in a
comment in this bug:
i) 'white-space: pre' currently needs this to work
ii) the editor needs them to output HTML as it was read in.
This is perhaps not a good argument, but for ii), the editor already makes
enough of a mess of your source code not to worry about some extra new lines. As
far as I'm concerned it should put new lines automatically to make code more
readable :-P
Not sure about i), but perhaps layout has some "whitespace frames" that could be
used for the same purpose?
This bug escaped my attention in my quest for Mozilla 1.0 bugs, but now I
definitely include it.
Thanks for reading :-)
Comment 28•23 years ago (reporter)
um, I think Sivakiran's missing the point. it's not about newlines being
converted to spaces -- it's about N6 REPORTING THOSE SPACES AS TEXT NODES!
because text nodes are not allowed between a TR and a TD, for example, it's
impossible to apply the "whitespace = text node" rule consistently. so that's
the logical argument for fixing this bug. Fabian posted a very eloquent version
of the practical argument while I was busy changing my password. ;)
Comment 29•23 years ago
Which nodes are you proposing to eliminate? Only the ones inside start-tags and
end-tags, or all of the ones you can (whatever that means)? If the former, how
do you plan to handle the childNodes array for something like:
<ul>
<li>foo</li>
<li>bar</li>
</ul>
I think the DOM benefits of such an approach are minimal.
If the latter, how do you plan to ensure that <span>two</span>
<span>words</span> aren't merged together?
Which does IE do?
In all cases, how do you plan to keep 'white-space: pre' working?
Comment 30•23 years ago
well, how about making the actual text (unparsed) source of the document
accessible and then having the parser provide the line/column position of every
node/token inside it, so that you can locate the necessary portions of text?
not only could you get 'white-space: pre' working this way, but you could also fix
View Source not displaying the actual source of the document.
Comment 31•23 years ago
To get this (removal of white space text nodes that are not allowed) to work,
one would have to check the DTD of the SGML and/or XML document. Is this
information available from the SGML/XML parser?
I like Alexey Chernyak's idea about adding the line and column position to the
elements/nodes. This would allow the composer to maintain the layout without
actually inserting text nodes in the DOM tree where they are not allowed.
Comment 32•23 years ago
I believe, although I'm not entirely certain, that DOM-2 Range covers resolution
down to the individual character. I do know Mozilla supports DOM-2 Range.
Comment 33•23 years ago
the real pain for web developers is that stuff like previousSibling etc. get
practically useless... or you have to call helper functions every time you want
to use it. just stumbled across it yesterday:
<div id="div1"></div>
<div id="div2"></div>
!=
<div id="div1"></div><div id="div2"></div>
and the stupid thing is, codewise i want to have version 1., because everything
else is unreadable. js-wise i want to have version 2., because there is no
whitespace in it. but still i can't be sure that no one changes it (including my
unaware self) at a later time and breaks the code.
so what about (uuuh, i know this would have to go through the standardisation
process) making previousSibling etc. a function like
HTMLElement.previousSibling(boolean skipWhitespace) or even
HTMLElement.previousSibling(boolean skipTextNodes).
another possibility would be to distinguish between whitespace and newlines, and
skip all the newlines. as far as i understand this would allow white-space:pre
to work. newlines between tags are always just for source code readability, right?
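Comment 33's proposal can be approximated in script without any spec change. A sketch, with hypothetical names, that steps over whitespace-only text nodes; the skipTextNodes variant would instead step over every node whose nodeType is not 1. (The later Element Traversal specification standardized that variant as previousElementSibling and nextElementSibling.)

```javascript
// Whitespace-only text node (XML S characters: space, tab, CR, LF).
function isWhitespaceText(node) {
  return node.nodeType === 3 && !/[^\t\n\r ]/.test(node.data);
}

// Hypothetical stand-in for the proposed previousSibling(skipWhitespace).
function previousSiblingSkippingWs(node) {
  let sib = node.previousSibling;
  while (sib && isWhitespaceText(sib)) sib = sib.previousSibling;
  return sib; // null if only whitespace (or nothing) precedes node
}

// Hypothetical stand-in for the proposed nextSibling(skipWhitespace).
function nextSiblingSkippingWs(node) {
  let sib = node.nextSibling;
  while (sib && isWhitespaceText(sib)) sib = sib.nextSibling;
  return sib;
}
```

With these, the two <div> layouts in comment 33 give the same answer: previousSiblingSkippingWs(div2) is div1 whether or not a newline separates them.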
Comment 34•23 years ago
I wonder if DOM Traversal (TreeWalker (and NodeIterator, once we have it))
would be the right way to work around the problem for scripts.
The problem of whitespace within tables and stuff is a validation issue. Mozilla
doesn't validate, so we shouldn't bother. And I read
http://www.w3.org/TR/2001/WD-DOM-Level-2-HTML-20011210/html.html#ID-6986576
to indicate that one should use the .cells to get to the td or th. (If that one
would contain textnodes, that'd be bah.)
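A sketch of the DOM Traversal approach: a TreeWalker whose filter skips whitespace-only text nodes, so nextNode() never lands on them. The NodeFilter constant values below are the ones defined in DOM Level 2 Traversal, spelled out so the filter can be exercised outside a browser; skipWhitespaceFilter and significantNodes are hypothetical names, and the three-argument createTreeWalker call assumes a browser that treats the old fourth argument as optional.

```javascript
// Numeric values of the DOM Level 2 NodeFilter constants.
const FILTER_ACCEPT = 1;
const FILTER_SKIP = 3;
const SHOW_ELEMENT = 0x1;
const SHOW_TEXT = 0x4;

// Skip whitespace-only text nodes; accept everything else.
function skipWhitespaceFilter(node) {
  if (node.nodeType === 3 && !/[^\t\n\r ]/.test(node.data)) {
    return FILTER_SKIP;
  }
  return FILTER_ACCEPT;
}

// Browser-only: elements and non-blank text under root (root itself is not
// included, since a TreeWalker starts positioned on root). doc is the
// document that owns root.
function significantNodes(doc, root) {
  const walker = doc.createTreeWalker(root, SHOW_ELEMENT | SHOW_TEXT, {
    acceptNode: skipWhitespaceFilter,
  });
  const result = [];
  for (let n = walker.nextNode(); n; n = walker.nextNode()) result.push(n);
  return result;
}
```

FILTER_SKIP rather than FILTER_REJECT is used so that, if the filter is later extended to elements, their children are still visited.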
Comment 35•23 years ago
Is anything moving with regards to this bug? A solution is really important to
some of the stuff I am trying to do, so I hope so. Also, does anyone know if
there is any movement toward a fuller implementation of the DOM-2 Traversal-Range module?
Comment 36•23 years ago
nominating nsbeta1 - can we ship with all this cruft in the dom?
Keywords: nsbeta1
Comment 37•23 years ago
I still think this should be WONTFIX. Furthermore, nobody's actually proposed
how to fix it -- that requires answering the questions in comment 29.
Comment 38•23 years ago
Maybe we should do some sort of check for "white-space: pre" before removing
superfluous white space. There cannot be so many elements which have that
natively. Likewise, someone would have to set a style attribute to contain that
phrase before it would take effect on an element that didn't default to that.
Comment 39•23 years ago
The DOM doesn't currently know anything about the style system, and it would be
a major architectural change if it did. (How could we resolve style without
having DOM nodes to resolve the style on?)
Comment 40•23 years ago
:(
This is a lot like math classes on scientific notation: which "0" digits are
significant? 000025.7000...
My opinion is, if it renders, include it. If it doesn't render or isn't meant
to render, exclude it.
For HTML, white space between <head> and <body> tags doesn't render. White
space between <pre> and </pre> tags does. White space between any two tags in
general, if there is at least one space or tab between them, we typically treat
as a single space. (We have &nbsp; for when we want a space.)
I haven't even begun to think about white space implications for XML DOM.
What if we did a process late in the game, before any event handlers or
scripting take over but after styling, of cleaning up the nodes?
Comment 41•23 years ago
David, latest spec very clearly defines how white space characters should be
treated:
http://www.w3.org/TR/2001/WD-xhtml1-20011004/#uaconf
Here are relevant extracts out of that document for every-day pages:
...snip...
* All white space surrounding block elements should be removed.
* Comments are removed entirely and do not affect white space handling. One
white space character on either side of a comment is treated as two white space
characters.
...snip...
* Leading and trailing white space inside a block element must be removed.
* A sequence of white space characters without any LINE FEED characters must be
reduced to a single SPACE character.
* A sequence of white space characters with one or more LINE FEED characters
must be reduced in the same way as a single LINE FEED character.
...snip...
* The LINE FEED character must be converted into a SPACE character.
This very well answers your question:
<span>two</span><span>words</span> should be merged together.
while in:
<span>two</span>
<span>words</span>
LINE FEED and any other white space characters in between should be replaced
with a SPACE character.
This is a very straightforward spec. And we should be thankful to W3C for
defining it so clearly.
As for 'white-space: pre', see comment 30
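The quoted rules can be stated as string operations. A sketch applying them to a single run of text, assuming "white space" means the XML S characters (space, tab, CR, LF); applyQuotedWhitespaceRules is a hypothetical name, and it operates on a string only, saying nothing about which text nodes a DOM should contain.

```javascript
// Applies the quoted XHTML white-space rules, in order, to one text run.
// Assumption: white space = space, tab, CR, LF (CR treated like LF).
function applyQuotedWhitespaceRules(text, { trimEdges = false } = {}) {
  // A run without LINE FEED -> one SPACE; a run with LINE FEED -> one LINE FEED.
  let out = text.replace(/[ \t\r\n]+/g, (run) => (/[\r\n]/.test(run) ? "\n" : " "));
  // "The LINE FEED character must be converted into a SPACE character."
  out = out.replace(/\n/g, " ");
  // "Leading and trailing white space inside a block element must be removed."
  if (trimEdges) out = out.replace(/^ +| +$/g, "");
  return out;
}
```

Applied in order, the rules reduce every white-space run to a single space, which is why the newline between </span> and <span> would still render as a separating space.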
Comment 42•23 years ago
Here's an example of what a tree traversal tool like DOM Inspector sees when
it looks at a website's DOM:
(see image)
#text nodes all over the place...
Comment 43•23 years ago
Leading and trailing newlines of a textnode should be skipped. Isn't it as
simple as that?
if i write
<tag1>
</tag1>
i obviously want a newline there. not so for
<tag1>
</tag1>
if i write:
<pre>
Some text here.
Some on another line.
</pre>
i want the data of the text node to be == " Some text here.\n\n Some on another line."
Comment 44•23 years ago
That's not how it works.
The <pre>...</pre> element means "preformatted text". That means the browser
must copy the text to the screen, character for character. Including the
newlines immediately adjoining the tag. The exceptions are additional markup
tags, such as <em>...</em>, which would still apply inside the preformatted text
tags.
White space inside <pre>...</pre> tags is always considered "significant", unconditionally.
The preformatted text element is a block-level element, like the <p>...</p>
element. Typically people do not add newline characters immediately following
the opening tag of either one, but if they do for the preformatted text element,
the browser must assume that's intentional.
Comment 45•23 years ago
As Alexey points out the W3C recommendation is very clear on how white space
should be treated.
Comment 46•23 years ago
Moving to jst@netscape.com's bug list. Apologies for letting it languish on mine.
Assignee: vidur → jst
Status: REOPENED → NEW
Comment 47•23 years ago
From jscript@pacbell.net aka WeirdAl
> Maybe we should do some sort of check for "white-space: pre" before removing
> superfluous white space.
not so easy: what if I dynamically set an element's CSS white-space property to pre?
Parsing should not depend on the stylesheet, I think.
From alexey@ihug.com.au
> David, latest spec very clearly defines how white space characters should be
> treated:
> http://www.w3.org/TR/2001/WD-xhtml1-20011004/#uaconf
This will only apply to XHTML served as XML when the xml:space attribute of the
element is not set to preserve. I assume that <pre> in XHTML has the xml:space
attribute set to preserve.
If the white spaces are removed in HTML based on what that doc mentions, but
without a way to revert to the current behavior, we have effectively removed
support for the CSS property white-space: pre; in HTML.
From jscript@pacbell.net aka WeirdAl
> What if we did a process late in the game, before any event handlers or
> scripting take over but after styling, of cleaning up the nodes?
Again, not so easy: what if I dynamically set an element's CSS white-space
property to pre?
Comment 48•23 years ago
As a first pass, how about just deleting #text nodes whose content is
*entirely* white space (incl. newline)? That should eliminate most of
the problem with the phantom nodes in the DOM tree, but still leave
the information that "white-space: pre" needs.
Comment 49•23 years ago
> how about just deleting #text nodes whose content is *entirely* white space
<p><em>This</em> <em>should</em> <em>have</em> <em>spaces</em></p>
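The counterexample can be checked mechanically. A sketch of the "delete every all-whitespace text node" proposal (pruneWhitespaceText and textOf are hypothetical names) applied to a tree shaped like the <p> above: the whitespace-only nodes between the <em> elements are removed and the words run together.

```javascript
// The naive proposal: recursively delete every text node whose data is
// entirely white space. Uses childNodes/removeChild as on real DOM nodes.
function pruneWhitespaceText(node) {
  // Copy first: removing children while iterating a live list skips nodes.
  for (const child of Array.from(node.childNodes)) {
    if (child.nodeType === 3 && !/[^\t\n\r ]/.test(child.data)) {
      node.removeChild(child);
    } else if (child.nodeType === 1) {
      pruneWhitespaceText(child);
    }
  }
}

// Concatenated text of a subtree (like textContent, ignoring comments).
function textOf(node) {
  if (node.nodeType === 3) return node.data;
  return Array.from(node.childNodes || []).map(textOf).join("");
}
```

Before pruning, textOf(p) is "This should have spaces"; afterwards it is "Thisshouldhavespaces", which is the behavior comment 49 warns about.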
Comment 50•23 years ago (reporter)
and it does in IE:
HTML
HEAD
TITLE
SCRIPT
STYLE
BODY
P
EM
Text: This
Text:
EM
Text: should
Text:
EM
Text: have
Text:
EM
Text: spaces
In N6, the same text nodes are there, but so are several others:
HTML
HEAD
TITLE
Text: Foo Document
Text:
SCRIPT Text:
var structWin;
function openStructWin(){
structWin = window.open
('','foo','width=600,scrollbars=yes,resizable=yes,menubar=yes');
structWin.document.write('<' + 'html>\n<' + 'head>\n<' + 'title>Document
Structure\n<' + 'style type="text/css">\nPRE ' + '{font-size: 24pt}'
+ '\n\n\n\n<' + 'body bgcolor="#FFFFFF">\n\n<' + 'pre>');
}
function getNodeAndChildrenAsString(indentString,theNode){
var structString = '';
var structString = '';
var nodeString;
var theChildren = theNode.childNodes;
// If theNode is an element node
if (theNode.nodeType == '1'){
nodeString = "<" + "b>" + theNode.tagName + "";
// Otherwise, if theNode is a text node
}else if (theNode.nodeType == '3'){
nodeString = "<" + "b>Text: " + theNode.data;
// In any other case (and here I'm assuming
// that the only other case would be a
// comment node)
}else{
nodeString = "<" + "b>Comment: " + theNode.data;
}
structString += indentString + nodeString;
structString += '\n';
structWin.document.write(structString);
for (var i=0; i < theChildren.length; i++){
getNodeAndChildrenAsString(indentString + " ", theChildren[i]);
}
}
function writeClosingTags(){
structWin.document.write('<' + '/pre>\n\n<' + '/body>\n<' + '/html>');
}
Text:
STYLE
Text:
Text:
BODY
Text:
P
EM
Text: This
Text:
EM
Text: should
Text:
EM
Text: have
Text:
EM
Text: spaces
Text:
Comment 51•23 years ago (reporter)
Guess I should have said what the point of that little demonstration was...
My point was: how is IE doing it? I'm sorry I can't propose a solution, since I
don't know how any of the code works, but it seems like IE must be following
rules. That is, "whitespace is allowed here, but not here" or some such. I
would guess that one of the rules in play in the example above is that
whitespace between tags within a paragraph is significant and must be
preserved as text nodes. Others might be: whitespace between block-level tags
is not significant, and therefore need not be preserved as text nodes; text
nodes between TR and TD tags are not allowed, and therefore whitespace between
TR and TD tags is not preserved as text nodes.
Maybe this is just really hard, or not the way you've tackled this thus far? As
an engineer on another product, I can't stand it when people who've never seen
the codebase say, "this should be so easy to implement!", so I won't. ;)
Comment 52•23 years ago
You know, I've looked at this, and I've changed my mind.
I said:
>My opinion is, if it renders, include it. If it doesn't render or isn't meant
>to render, exclude it.
But that's a cheap way to approach a fundamental question. A document's
rendering does not necessarily correspond to how the DOM views it. Does the DOM
see a <!DOCTYPE > tag? Yes. Does the user? No.
We're dancing around the issue. Should there or should there not be whitespace
text nodes in the DOM? We haven't yet figured that out.
Frankly, when I restrict my perspective to that specific question, I am forced
to give those whitespace text nodes the benefit of the doubt. What reason do we
really have to take them out? IE's behavior, as we are all well aware, is not
necessarily the correct behavior. Nor should convenience always dictate what
we require of Mozilla 1.0. Sloppy coding by users is what brought people to
condemn Netscape 6.0, because Mozilla and Netscape stopped supporting layers.
I recommend WONTFIX or INVALID.
Comment 53•23 years ago
There is no question on which nodes and where to delete. The spec is very clear
on that. No point in discussing it. The question at hand is how to get CSS
property 'white-space: pre' to work after white spaces were removed from DOM.
>We're dancing around the issue. Should there or should there not be whitespace
>text nodes in the DOM? We haven't yet figured that out.
The way I see it, DOM is a mechanism for describing a document structure (HTML).
And HTML specification defines the rules for structure of HTML documents. The
structure described by our DOM violates those rules, so it basically can't be
called an HTML DOM.
We need a different solution for handling 'white-space: pre' from the one we
have now. Also we need different solution for ViewSource, which right now relies
on whitespaces in DOM and doesn't show the actual source of the document! 2
possible approaches are:
1. Storing discarded white space information so it can be used by "pre" or
ViewSource. This will keep ViewSource working, but will not help to show the
*actual* source.
2. Ability to retrieve and use raw portions of code. This is preferable for
ViewSource. However coordinates would have to be preserved for each element for
ViewSource colouring to work. This also would involve re-parsing for "pre", for
it may contain other tags inside of it.
The second approach is more favourable, but looks harder.
So, is this feasible?
Fixing this bug and ViewSource before 1.0 release would be really awesome.
Keywords: mozilla0.9.9, mozilla1.0
Comment 54•23 years ago
A few comments on the XHTML spec link Alexey posted:
1) The line in question ("All white space surrounding block elements should be
removed.") is a "should" not a "must".
2) By "block" I presume it means things that are declared to be blocks in the
DTD? That is, at http://www.zbarsky.org:8000/~bzbarsky/domTest.html the first
two "Some text" occurrences should be on one line with no space between them
while the second two should be on two separate lines?
Comment 55•23 years ago
In reply to comment 41: The XHTML spec is on crack. See my post in www-talk:
http://lists.w3.org/Archives/Public/www-talk/2001MayJun/0141.html
I was asked to make a "definitive standards statement". My opinions are my own,
and are thus not normative or anything, but: I would say this bug is a WONTFIX.
In fact I put a comment to that effect in the status whiteboard last July.
Comment 56•23 years ago
Web authors should read:
http://www.mozilla.org/docs/dom/technote/whitespace/
I'm going to mark this WONTFIX because bz gave me the go-ahead to do so. :-)
Status: NEW → RESOLVED
Closed: 25 years ago → 23 years ago
Resolution: --- → WONTFIX
Comment 57•23 years ago
To clarify the go-ahead part.... The XHTML specification in question is a
Working Draft. As such it is most definitely not final.
If one implements what that draft currently says then no strings anywhere in the
DOM would ever have newlines in them; an obviously ridiculous proposition when
one considers the contents of textareas or <pre> elements. The comment handling
recommended in the draft is also completely bogus -- all the comments would be
gone from the DOM....
I feel fairly certain that this specification's whitespace handling will be
amended before it gets to be a recommendation. (At least I hope so for the sake
of the sanity of the CSS and DOM specification authors).
Comment 58•23 years ago
Actually, the XHTML specification in question is a REC.
http://www.w3.org/TR/xhtml1/
Comment 59•23 years ago
Boris, XHTML *is* already a Recommendation. It is final. I have provided a link
to the Second Edition draft which is much more specific on white space handling
than the first edition. Second Edition doesn't change any of these rules, it
just makes them clearer and more precise. You can have a look at the non-draft
XHTML Recommendation as well:
http://www.w3.org/TR/xhtml1/#uaconf
Ian, Boris, are you ready to say that
<div>
</div> and <div></div>
are 2 completely different documents DOM-wise? If you are, I'll go with that.
But with my limited knowledge of DOM spec I am really uncomfortable with that
statement.
Comment 60•23 years ago
alexey: Yes, they are different. IMHO. Why wouldn't they be?
Regarding the XHTML spec: Like I said above, it's on crack. It is totally out of
line for the XHTML working group to be laying down rules on how the parser and
the DOM should interact when handling XML. There is no way, IMHO of course, that
we should special-case different namespaces' white-space handling.
Comment 61•23 years ago
Sorry to drag this one up again, but I think that it is very important and I
have a new suggestion.
Would it be possible to somehow correlate the document with its DTD? If #PCDATA
is not valid in any given element then the children of all elements of this type
could be examined and, if one of its child nodes is an (all whitespace) text
node, it could be removed. In this way at least the 'illegal' text nodes could
be removed so that they don't turn up between </td> <td> etc. These whitespace
text nodes are only there because of the formatting of the source code, and have
nothing to do with whitespace: pre. Of course
<div>
</div> and <div></div> etc.
would remain different (as they should IMHO) and whitespace: pre would not be
broken which seems to be the main concern.
Please consider this or some other means of getting rid of at least the 'illegal'
whitespace text nodes. My concern is that leaving these text nodes in the DOM
after Moz1.0 will also mess things up for scripts working on non-HTML documents
such as SVG. Helper functions are all very well, but they are a messy work around.
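A sketch of this suggestion, assuming a hand-copied list of HTML 4.01 elements whose content model excludes #PCDATA (a real implementation would have to read the DTD, which is what comment 31 asks about). NO_PCDATA and pruneWhereTextIsInvalid are hypothetical names.

```javascript
// Hypothetical, hand-copied subset of HTML 4.01 elements whose DTD content
// model has no #PCDATA, so any text child there can only be source formatting.
const NO_PCDATA = new Set(["TABLE", "THEAD", "TBODY", "TFOOT", "TR", "UL", "OL", "DL", "SELECT"]);

function isWhitespaceText(node) {
  return node.nodeType === 3 && !/[^\t\n\r ]/.test(node.data);
}

// Remove whitespace-only text children only where the content model forbids
// text; everywhere else (P, DIV, PRE, ...) the tree is left untouched.
function pruneWhereTextIsInvalid(node) {
  const textForbidden = NO_PCDATA.has(String(node.nodeName).toUpperCase());
  for (const child of Array.from(node.childNodes)) {
    if (textForbidden && isWhitespaceText(child)) {
      node.removeChild(child);
    } else if (child.nodeType === 1) {
      pruneWhereTextIsInvalid(child);
    }
  }
}
```

Whitespace inside <p>, <div> or <pre> is left alone, so white-space: pre keeps working there; the cost is that the serialized document no longer matches the source around tables and lists.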
Comment 62•23 years ago
This controversy could be dealt with effectively (for the XML DOM, at least) by
implementing a validating XML parser in Mozilla. Here is the reasoning:
A validating parser would be able to read DTDs. This would allow us to
determine which elements have xml:space="preserve" in which whitespace *must* be
preserved; we could then change our application convention for "default" to
condense whitespace. This would mean that xml:space="preserve" elements
retained whitespace within them in the DOM, and that it would be condensed or
stripped away in xml:space="default" elements. We would also set white-space:
pre on these elements in our html.css, so that they would *display* without
space. The disadvantage, obviously, is that white-space: pre on other elements
would have no effect. However, document authors wishing to make use of this
could use an external DTD subset to add the xml:space attribute to any such
elements, which would cause the parser to preserve spaces in them as well, which
they could then reflect or not reflect in display through whitespace: pre.
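xml:space is inherited by descendants unless an inner element sets it again, so deciding whether a given whitespace text node falls under "preserve" means walking up the tree. A sketch, assuming a namespace-aware DOM (getAttributeNS with the XML namespace URI); effectiveXmlSpace is a hypothetical name, and returning "default" when no ancestor sets the attribute is a choice made here, not a spec requirement.

```javascript
const XML_NS = "http://www.w3.org/XML/1998/namespace";

// Nearest ancestor-or-self element that sets xml:space wins.
// Returns "preserve" or "default" ("default" when nothing sets it).
function effectiveXmlSpace(node) {
  for (let n = node; n; n = n.parentNode) {
    if (n.nodeType === 1) {
      const v = n.getAttributeNS(XML_NS, "space");
      if (v === "preserve" || v === "default") return v;
    }
  }
  return "default";
}
```

Under the scheme described in this comment, a whitespace-only text node would stay in the DOM only when effectiveXmlSpace(textNode) returns "preserve".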
Comment 63•23 years ago
*** Bug 131169 has been marked as a duplicate of this bug. ***
Comment 64•23 years ago
Actually we can't do any of this at all anyway, otherwise you lose
round-tripping of white-space in document source. e.g. around <table> tags. This
is important for various reasons, e.g. the "generated source" bookmarklet.
Comment 65•23 years ago
*** Bug 144603 has been marked as a duplicate of this bug. ***
Comment 66•23 years ago (assignee)
*** Bug 147489 has been marked as a duplicate of this bug. ***
Comment 67•23 years ago
WONTFIX?? Are you serious?
My bug report has been marked as a duplicate, but please have a look at it
because I've given a suggestion. It's bug 147489.
If you don't want to look at it, I can summarize what I said:
I don't care about how IE treats it, and I don't care about XHTML. I just use
the (old) HTML 4.01 and DOM1. Look at this section in W3C DOM specification:
http://www.w3.org/TR/2000/WD-DOM-Level-1-20000929/introduction.html#ID-E7C30821
By looking at the tree inside the figure, I can surely say that those guys in
W3C implicitly agree that there are no #text siblings for block-level elements.
HTML 4 is very clear about block-level elements and in-line elements.
So the suggestion is simple:
eliminate block-level elements' whitespace siblings,
but preserve in-line elements' whitespace siblings.
*** Bug 147487 has been marked as a duplicate of this bug. ***
Comment 69•23 years ago
Since you have obviously not read this bug... "What happens when the element's
style is changed from block to inline?"
The problem is that HTML has this concept of block-level elements that is
totally divorced from the CSS concept of block elements. Thing is, Mozilla is
a CSS browser, not an HTML browser. So if we have to pick which concept
of "block" we go with we sort of have to choose the CSS one.
Comment 70•23 years ago
Boris,
I've read through about 80% of this whole bug before writing my comment, but
maybe I missed something. So, which comment does your question refer to?
Comment 71•23 years ago
Re comment #54 (which is in response to comment #41):
In HTML 4 spec, there's a section on conformance (2nd paragraph):
http://www.w3.org/TR/html4/conform.html
which further refers to
http://www.ietf.org/rfc/rfc2119.txt
In summary, "should" means "recommended", "but the full implications must be
understood and carefully weighed before choosing a different course." (sic)
On the other hand, if you look it up in a dictionary, "should" mostly means
"expressing obligation", as in "must".
IMO, we have to reopen this bug and consider the issue seriously.
Comment 72•23 years ago
HTML does not give any rules for what must appear in the DOM, so the HTML spec
is irrelevant in this case.
Comment 73•23 years ago
> when you look up in a dictionary
This is what that RFC exists for. To define exactly what those words mean in
RFCs. Said meaning has nothing to do with their dictionary meaning; the RFC
authors chose to pick words that already had existing meanings but they could
just as easily have come up with brand-new terms and defined them. "should"
and "must" are totally different in an RFC and in the HTML specification.
Comment 29 is the comment I was referring to. Consider taking some <td> or
<li> nodes and setting them to display:inline....
Comment 74•23 years ago
Re comment #72:
OK, granted! But the DOM does refer back to HTML:
http://www.w3.org/TR/REC-DOM-Level-1/introduction.html#ID-E7C30821
Looking at the last sentence in the paragraph below the figure:
"...if any two Document Object Model implementations are used to create a
representation of the same document, they will create the same structure model,
with precisely the same objects and relationships."
Since the structure model produced by Mozilla is different from what they
represent in the figure, Mozilla has a wrong DOM implementation. QED!
Comment 75•23 years ago
To complement my previous comment, look at this (maybe you would say it's
irrelevant because it's XML, not HTML, but I would say they are the same):
http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html#ID-745549614
Do you see how they write the XML code? It's like this:
<elementExample id="demo">
<subelement1/>
<subelement2><subsubelement/></subelement2>
</elementExample>
And they said "... node for "elementExample", which contains TWO child Element
nodes, ..."
They said TWO, not FIVE!
But with Mozilla's DOM implementation, even taking the SGML line-break rule into
account <http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.1>, we would at
least have to write the code this way:
<elementExample id="demo">
<subelement1/><subelement2><subsubelement/></subelement2>
</elementExample>
If the SGML line-break rule isn't respected, we even have to write the code this
way in order to produce the same structure:
<elementExample id="demo"><subelement1/><subelement2><subsubelement/></subelement2></elementExample>
So, either we say HTML, XML and DOM1 specifications are all wrong, or there's
problem in Mozilla's implementation.
As a last word, if we let this state remain in Mozilla 1.0, it won't be accepted
by many people, and this bug could be one of the main causes of its failure.
Comment 76•23 years ago
Re comment #73 :
Exactly! And the RFC states clearly that "should" = "recommended", not "optional".
Q: And what _would_ you do if someone _recommends_ that you do something?
A: I think you'd _better_ do it rather than ignoring it.
I mentioned the dictionary to point out that, in the normal meaning of
"should", it's not merely "recommended" either.
Comment 77•23 years ago
> different from what they represent in the figure
The figure is informative, not normative.
> And they said "... node for "elementExample", which contains TWO child
> Element nodes, ..."
> They said TWO, not FIVE!
Yes. #text nodes are not Element nodes.
As the definition of "should" says: "but the full implications must be
understood and carefully weighed before choosing a different course". They
have been. The implications of following that "should" are inconsistent
behavior and incorrect layout when style is changed on some elements. Having
correct layout outweighed the dubious benefit of following a "should" in which
the HTML specification tries to dictate how the DOM should be constructed
(something that is outside the scope of the HTML specification).
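The distinction drawn here (the spec's "TWO child Element nodes" counts Element nodes only, while whitespace-only #text nodes are also children) can be made concrete with a small filter. A minimal sketch, assuming only the standard DOM `nodeType` and `childNodes` properties; `elementChildren` is a hypothetical helper name, not part of any spec or of Mozilla's code:

```javascript
// Return only the Element children of a node, skipping #text nodes
// (including whitespace-only ones), comments and processing instructions.
// Hypothetical helper; works on any object with a DOM-style childNodes list.
function elementChildren(node) {
  const result = [];
  for (let i = 0; i < node.childNodes.length; i++) {
    const child = node.childNodes[i];
    if (child.nodeType === 1) { // 1 = Node.ELEMENT_NODE
      result.push(child);
    }
  }
  return result;
}
```

For the indented elementExample quoted in comment 75, childNodes.length is 5 (three whitespace #text nodes plus two elements), while elementChildren(...) has length 2, the count the spec gives. Later DOM revisions standardized element-only accessors such as `children` for the same purpose.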
Comment 78•23 years ago
Re comment #29 :
For the <ul> code:
<ul>
<li>foo</li>
<li>bar</li>
</ul>
As I stated before, we just have to strip off block-level elements' whitespace
siblings, i.e. the <li>'s whitespace siblings. Concretely, Mozilla's present
implementation gives:
UL
+--#text
+--LI
+--#text
+--LI
+--#text
Since LI is a block element, stripping those nodes gives
UL
+--LI
+--LI
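Trees like the two above can be produced mechanically, which makes it easy to compare what a given browser actually builds. A minimal sketch, assuming DOM-style `nodeName` (which is "#text" for text nodes and the tag name for elements) and `childNodes`; `dumpTree` and its `maxDepth` parameter are hypothetical:

```javascript
// Render a DOM-style tree in the "+--" format used in this comment.
// nodeName is "#text" for text nodes and the tag name for elements.
// maxDepth (hypothetical parameter) limits how many levels are printed.
function dumpTree(root, maxDepth) {
  const lines = [root.nodeName];
  function walk(parent, prefix, depth) {
    if (depth > maxDepth) return;
    for (let i = 0; i < parent.childNodes.length; i++) {
      const child = parent.childNodes[i];
      lines.push(prefix + "+--" + child.nodeName);
      walk(child, prefix + "   ", depth + 1);
    }
  }
  walk(root, "", 1);
  return lines.join("\n");
}
```

With maxDepth = 1, the indented <ul> from comment 29 prints the first tree above; raising it to 2 also shows each LI's own text child.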
______________________________________________
For SPAN:
<span>two</span> <span>words</span>
Mozilla gives:
|
+--SPAN
+--#text
+--SPAN
We must not strip the #text node here because SPAN is an inline element, so the #text
node is preserved.
______________________________________________
Lastly, for 'white-space: pre' ...... hmmm, I admit that this is very delicate!
But first of all, let's look at its spec together:
http://www.w3.org/TR/REC-CSS1#white-space
Please note that it only applies to block-level elements......
So what? I don't know yet :( I'm in Europe and it's evening now. Let me go
home and think of this issue during the night.
But IMO, there are many more developers manipulating DOM nodes than there are
developers using this CSS property. Or maybe those whitespace siblings could
be made "invisible" in the DOM, and made visible again when 'white-space: pre'
is used?
Comment 79•23 years ago
<ul style="display: inline">
<li style="display: inline">foo</li>
<li style="display: inline">bar</li>
</ul>
Should there be a space between "foo" and "bar"? The point is that the
block/inline distinction for _layout_ purposes is determined by CSS, not HTML.
Comment 80•23 years ago
I'm someone who likes to do things according to intuition and common sense, and I
feel really reluctant to pick apart every word somebody else has said, the way
lawyers do when defending their clients ...... but it seems I have no choice.
Re comment #77 :
Part 1)
I don't see anything on the page saying that the figure is only informative, not normative.
And even granting that it's informative, it's then informative about the
structure as it's written in the paragraph.
Part 2)
Good remark! ... but by exactly the same argument: they didn't talk about text
nodes!
Comment 81•23 years ago
> Looking at the last sentence in the paragraph below the figure:
> "...if any two Document Object Model implementations are used to create a
> representation of the same document, they will create the same structure
> model, with precisely the same objects and relationships."
If you look at the equivalent sentence in the latest version of the
specification (http://www.w3.org/TR/DOM-Level-2-Core/introduction.html) you'll
see that it now says:
# One important property of DOM structure models is structural isomorphism: if
# any two Document Object Model implementations are used to create a
# representation of the same document, they will create the same structure
# model, in accordance with the XML Information Set.
The XML Information Set (infoset for short) includes white-space nodes.
In fact it goes _on_ to say:
# Note: There may be some variations depending on the parser being used to build
# the DOM. For instance, the DOM may not contain whitespaces in element content
# if the parser discards them.
In other words, we are explicitly within our rights to include the white-space
nodes according to the latest version of the DOM Core Specification.
> Please note that [white-space] only applies to block-level elements......
That is also an error in the spec, and it has been corrected in the recently
published working draft of the next version of the text module. The
'white-space' property applies to all elements and generated content.
Comment 82•23 years ago
> intuition and common sense
Apply those to http://web.mit.edu/bzbarsky/www/testcases/testTextNodes.html
where IE5.0 has the second <li> as the nextSibling of the first <li> but shows
space between the two! Where the hell did that space come from? Is this the
behavior you want from Mozilla?
Comment 83•22 years ago
*** Bug 159352 has been marked as a duplicate of this bug. ***
Comment 84•22 years ago
Reopening bug; I think this is still an open issue.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Comment 85•22 years ago
sivarikan, would you care to explain in more detail? I assume that you've read
the discussion on this bug and have something insightful to add?
Comment 86•22 years ago
#document
HTML
HEAD
TITLE
#text
#text -- not allowed
SCRIPT
#text
#text
BODY
...
...
When trying to access the script node, I actually got the text node.
We should not have a #text node as a child of the head element.
ref: http://www.w3.org/TR/1998/REC-html40-19980424/struct/global.html#h-7.4.1
Comment 87•22 years ago
This is the HTML parser we're talking about? Or the XML parser parsing an XHTML
document?
Comment 88•22 years ago
HTML parser
Comment 89•22 years ago
OK. Could you please file a bug on the parser module saying that text nodes
should not be created where they are illegal per the DTD? For example, the
following "text" strings should be dropped because there is nowhere in that
document where a textnode would be valid:
<html>
<head>
text
</head>
<body>
text
<dl>
text
</dl>
</body>
</html>
Or did you want to just special-case <head> elements here?
Comment 90•22 years ago
Recommend removing the nsbeta1 keyword as well, as long as we're not certain we
should do this even in Mozilla, much less Netscape.
Keywords: mozilla0.9.9,
mozilla1.0
Comment 91•22 years ago (Reporter)
Re: comment #89 -- if you do file a separate bug, can you make sure to note the
table tags case (that text nodes are not allowed between them)?
I'm cautiously thrilled to think that this discussion could move off of whether
to preserve whitespace at all and on to whether to create text nodes where they
are explicitly forbidden, since that was my original argument for reopening
this bug (comment #3). Having said that, many thanks to everyone who's thought
long and hard about all the issues involved here.
Comment 92•22 years ago
The last comments on this bug are INVALID. While arbitrary text is not allowed
between elements in HTML <head> blocks, text consisting of exclusively
white-space characters _is_ allowed, and no spec that I know of says that this
should not be represented in the DOM.
Status: REOPENED → RESOLVED
Closed: 23 years ago → 22 years ago
Resolution: --- → INVALID
Comment 93•22 years ago
In fact, looking carefully at the spec, I'd say our behavior is mandated for
XML. (I don't think much of it for HTML, but I generally think that representing
HTML4 with DOM is as bozotic as Appendix C, a view that's unlikely to gain
political traction...)
DOM1, Interface Text: "The Text interface represents the textual content (termed
character data in XML) of an Element or Attr."
XML 1.0, section 2.4: "All text that is not markup constitutes the character
data of the document."
(Would it have been that painful for XML to distinguish non-significant
whitespace from character data?!)
Comment 94•22 years ago
Why should it be represented in the DOM when it is not allowed?
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Comment 95•22 years ago
*sigh* HTML follows the rules of SGML, so the content model of the HTML DTD
differentiates between whitespace (the "s" production of SGML) and actual
character data. Whitespace can crop up in the head, even though character data is
not allowed, because of this distinction. XML and DOM don't make the distinction;
to them, everything that is not markup is character data. Therefore, things that
are not character data in HTML/SGML become character data in the DOM. Again, the
problem is that HTML is being forced into a representation (DOM) which is simply
inadequate to represent it, and this is one of the places where it shows. (And if
there's some way we could modify DOM Core to adequately represent SGML-based
HTML, I'm all for it.)
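In XML terms, "whitespace" has a precise definition: the S production in XML 1.0 is (#x20 | #x9 | #xD | #xA)+, so a text node is whitespace-only exactly when every character is one of those four. A minimal sketch of that check (the helper name is hypothetical). Note that JavaScript's `\s` and `String.prototype.trim()` also treat U+00A0 (`&nbsp;`) and other Unicode spaces as whitespace, which would wrongly classify an `&nbsp;`-only cell as ignorable:

```javascript
// True if s matches XML 1.0's S production: one or more of
// space (#x20), tab (#x9), carriage return (#xD) or line feed (#xA).
// Deliberately not /\s/, which would also match U+00A0 (&nbsp;).
function isXmlWhitespace(s) {
  return /^[ \t\r\n]+$/.test(s);
}
```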
Comment 96•22 years ago
Sivarikan, you never answered my question from comment 89... What should the DOM
representation of that document be in Mozilla and why?
Comment 97•22 years ago
Boris,
this is what I am expecting, and yes, I opened a bug on the parser module.
#document
HTML
HEAD
TITLE
#text
SCRIPT
#text
Comment 98•22 years ago
sivarikan, you must be looking at the wrong HTML... see comment 89 again.
Comment 99•22 years ago
Boris, maybe I was not clear before: <head> is just an example. In general I am
talking about all the cases where having a text node is illegal according to the DTD.
Comment 100•22 years ago
Yeah, that's my point too. Text is not allowed in <body> by the DTD. Nor in
<dl>. Nor in various other places where people commonly put it. If we made any
attempt to enforce that, large chunks of the web would start failing to render...
For that matter, why pick on this part of the DTD? Why not use a validating
SGML parser that will completely fail to parse 99% of the pages on the web?
(Think <p> inside <font>, <a> around all sorts of crud, <form> inside <table>,
etc, etc.)
Sorry, but parsing tag soup per DTD is a lost cause.
Comment 101•22 years ago
Not to mention that, as choess and hixie pointed out, whitespace is not CDATA and
is therefore allowed in <head>... The DTD does not say anything about the DOM
that is produced, since there is no real concept of a DOM in SGML-land, again as
choess points out.
Comment 102•22 years ago
This specific bug (whitespace nodes existing in the DOM), as described above, is
invalid. Marking as such.
Status: REOPENED → RESOLVED
Closed: 22 years ago → 22 years ago
Resolution: --- → INVALID
Comment 103•22 years ago (Reporter)
If there was such a thing as "whitespace nodes", then perhaps this would be
invalid. The problem is that this whitespace is being represented as TEXT NODES
in the DOM, in places where text nodes are not allowed. That's the logical
problem. The practical problem is that all these extra text nodes make coding
for Mozilla/Netscape a nightmare. For those of us who thought standards would
save the web... well, I guess we were wrong. :(
Comment 104•22 years ago
> in places where text nodes are not allowed
Why are they not allowed there exactly? PCDATA is not allowed in the source in
those places. If it's present, that's an error in the source. Whitespace _is_
allowed in the source there. The DTD has nothing to do with the DOM representation.
Standards that do not specify behavior (as the DOM standard does not here) can
be a PITA, as you noted. This is a problem with the DOM standard that pops up
all over the place...
FWIW, I have mostly seen this "bug" exhibited throughout the web in a table
context. A valid standard compliant workaround for tables is to replace this:
tableElt.childNodes[0].childNodes[0].childNodes[0];
with this:
tableElt.tBodies[0].rows[0].cells[0];
Of course there aren't workarounds for everything, but from what I've seen, this
should cure many issues.
Comment 106•22 years ago
> The problem is that this whitespace is being represented as TEXT NODES
> in the DOM, in places where text nodes are not allowed.
No spec says text nodes aren't allowed there. The HTML spec isn't defined in
terms of the DOM, and the DOM spec isn't defined in terms of SGML DTDs.
Comment 107•22 years ago
n.b. SGML (well, HyTime) does have a DOM-like standard (closer to XML Infoset,
actually), SGML groves. But that's another story.
Comment 108•22 years ago
http://www.codingforums.com/showthread.php?s=&threadid=7028
The above link has a script or two that can be customized to remove the
whitespace nodes in a document. A few notes:
Re comment 29, if the onus is on the webpage developer (as this bug's invalid
status suggests, and I agree with that), then the webpage developer can simply
move the space inside one of the spans. True, a styling effect like
text-decoration:underline can spoil things a little, but other styling effects
might be able to achieve the same whitespace rendering in the document. The
page developer simply has to be careful.
Status: RESOLVED → VERIFIED
Updated•22 years ago
Whiteboard: WONTFIX? → http://mozilla.org/docs/dom/technote/whitespace/
Comment 109•22 years ago
So here's a thought. Strings have a shared, common empty buffer that is used
whenever a string is empty, to guarantee that a string has a non-null value.
Could we use the same approach here? Have a single shared #text node that
represents all unstyled text nodes in the tree? Making things like .parentNode
work might be ugly, but maybe a simple stack-based wrapper around the real node
in places where you need to access the text node would suffice?
Does this sound feasible enough to file a bug?
Comment 110•22 years ago
*** Bug 178508 has been marked as a duplicate of this bug. ***
Comment 111•22 years ago
*** Bug 179709 has been marked as a duplicate of this bug. ***
Comment 112•22 years ago
Hmm... this bug makes for a very interesting read!
<angry swedish web developer mode>
So the solution IN REAL LIFE (=to get it to work in the 2 major browsers on
earth) is to code everything on one line, so my example from bug 179709 would
look like this:
<table border="1"><tr id="TableRow" onClick="getLastChild(this);"><td id="A">A</td><td id="B">B</td></tr></table>
Yes, Ahaa, Mmm... it really feels like a BIG STEP FORWARD folks! ;)
</angry swedish web developer mode>
I just hope there is a way to solve this "problem" (no, I don't want to have to
use helper functions), because this is really, really bad for Mozilla acceptance
among the "I only code for IE because it's got biggest market share" group. (of
which I'm not a member)
Sorry to have taken up your time!
Comment 113•22 years ago
If you need this for tables only, you can use DOM HTML to your advantage.
var myTable = document.getElementById("myTable");
var lastTBody = myTable.tBodies[myTable.tBodies.length - 1];
var lastRow = lastTBody.rows[lastTBody.rows.length - 1];
var lastCell = lastRow.cells[lastRow.cells.length - 1];
Of course, not every HTML element has a similar API...
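The table code above works only because DOM HTML provides the rows and cells collections. For other elements, a generic fallback is to walk backwards from lastChild until an Element turns up. A minimal sketch; `lastElementChildOf` is a hypothetical name, and later DOM revisions standardized the same idea as the built-in `lastElementChild` property:

```javascript
// Walk backwards from lastChild, skipping #text and comment nodes,
// and return the last Element child, or null if there is none.
function lastElementChildOf(node) {
  let child = node.lastChild;
  while (child && child.nodeType !== 1) { // 1 = Node.ELEMENT_NODE
    child = child.previousSibling;
  }
  return child || null;
}
```

Applied to comment 112's indented <tr>, this returns the <td id="B"> even though the row's lastChild is a whitespace #text node.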
Comment 114•22 years ago
*** Bug 189467 has been marked as a duplicate of this bug. ***
Comment 115•22 years ago
*** Bug 196983 has been marked as a duplicate of this bug. ***
Comment 116•22 years ago
To follow up the <angry swedish web developer mode> comment here is:
<angry dutch web developer mode>
I'm slightly shocked after reading this bug from top to bottom. We have an
expression for this in Dutch which translates as "Operation successful, patient
deceased". It seems that you (the Mozilla developers) have succeeded in
defending your point of view to the point that using the DOM becomes fairly useless.
I would like to use a function like:
function toggle(thingy) {
var elem = thingy.nextSibling;
if(elem.style.display == "none") { display(elem); } else { hide(elem); }
}
on some code like:
<h3 onclick="toggle(this)">Header</h3>
<table>
<tr><td class="desc">Content</td><td>Content2</td></tr>
<tr><td class="desc">Content</td><td>Content2</td></tr>
</table>
Which according to your Vulcan Logic (TM) does not work because there is a text
node between the </h3> and the <table>.
I'd have to write it as:
<h3 onclick="toggle(this)">Header</h3><table>
<tr><td class="desc">Content</td><td>Content2</td></tr>
<tr><td class="desc">Content</td><td>Content2</td></tr>
</table>
Hurray for legibility!!!
Any web developers remember writing code like this to get a background color on
a thin table cell?:
<TD><FONT SIZE=1> </FONT></TD>
Or remember when
<TR>
<TD>
<IMG HEIGHT=10 SRC="bg.gif">
</TD>
</TR>
Had to be written as
<TR><TD><IMG HEIGHT=10 SRC="bg.gif"></TD></TR>
to render without extra white-space?
I thought/hoped we had left those dark ages behind us.
No matter how "right" you are, I predict many, many duplicates for this bug. IE
DOM developers are going to run, not walk, away from Mozilla, since any
next/previousSibling function is useless without a wrapper. This behaviour is
going to bite and discourage the vast majority of beginning DOM developers,
which is unfortunate, as the idea behind the DOM is really neat.
</angry dutch web developer mode>
Sorry for the rant, but I really hope you will reconsider your point of view on
this issue.
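For the toggle() case in comment 116, one way to keep the readable markup is to skip non-element siblings instead of using nextSibling directly. A minimal sketch; `nextElementSiblingOf` is a hypothetical helper, and this `toggle` simply flips `style.display` rather than calling the commenter's own display()/hide() functions. Later DOM revisions added an equivalent built-in, `nextElementSibling`:

```javascript
// Return the next sibling that is an Element (nodeType 1), skipping the
// whitespace #text nodes (and comments) that sit between tags.
function nextElementSiblingOf(node) {
  let sib = node.nextSibling;
  while (sib && sib.nodeType !== 1) {
    sib = sib.nextSibling;
  }
  return sib || null;
}

// Show or hide the element that follows the clicked header.
function toggle(header) {
  const target = nextElementSiblingOf(header);
  if (!target) return;
  // "" clears the inline value, so the stylesheet's display applies again.
  target.style.display = target.style.display === "none" ? "" : "none";
}
```

Resetting to "" rather than to a hard-coded "block" avoids turning a <table> into a block box when it is shown again.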
Comment 117•22 years ago
Which part of: "Either the text nodes are there or the layout may be
incorrect" do people fail to understand? Yes, it sucks. No, the CSS and DOM
specs leave no other choice. Please raise this point with the W3C if you don't
like it.
Comment 118•22 years ago
I reported this duplicate.
http://bugzilla.mozilla.org/show_bug.cgi?id=196983
I think these newlines and spaces are not HTML content.
It is such a problem....
Comment 119•22 years ago
*** Bug 206729 has been marked as a duplicate of this bug. ***
Comment 120•21 years ago
*** Bug 214943 has been marked as a duplicate of this bug. ***
Comment 121•21 years ago
*** Bug 217842 has been marked as a duplicate of this bug. ***
Comment 122•21 years ago
*** Bug 221364 has been marked as a duplicate of this bug. ***
Comment 123•20 years ago
*** Bug 252684 has been marked as a duplicate of this bug. ***
Comment 124•20 years ago
*** Bug 258564 has been marked as a duplicate of this bug. ***
Comment 125•20 years ago
*** Bug 263813 has been marked as a duplicate of this bug. ***
Comment 126•20 years ago
As others have pointed out, this issue will cause much grief for developers
attempting to walk DOM trees.
My primary concern at the moment revolves around XML in an XHTML document (e.g.
something pulled down by an XMLHttpRequest). With no formal XML parser or DTD
validator, these text nodes become both extraneous and erroneous.
If there is eventually to be a formal XML parser, then fine, but if childNodes
and the like are to be used to walk XML as well, then we will all be severely
crippled at best.
For example, as others have pointed out, the following XML:
<myData>
<myChilddata>
<data1>A</data1>
<data2>B</data2>
</myChilddata>
</myData>
...under current circumstances is completely different from the following XML:
<myData><myChilddata><data1>A</data1><data2>B</data2></myChilddata></myData>
Something will definitely need to be done about this -- either within the
context of this bug or a more formalized XML-oriented solution.
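For walking XML such as the myData example, one option is to recurse over Element children only, so the indented and the single-line versions yield the same sequence. A minimal sketch assuming DOM-style `childNodes`/`nodeType`/`nodeName`; `elementNames` is a hypothetical helper. In browsers that implement DOM Level 2 Traversal, `document.createTreeWalker(root, NodeFilter.SHOW_ELEMENT)` gives a similar element-only walk:

```javascript
// Depth-first, pre-order list of element names under (and including) root.
// Text nodes, including indentation whitespace, are ignored entirely.
function elementNames(root) {
  const names = [];
  (function visit(node) {
    names.push(node.nodeName);
    for (let i = 0; i < node.childNodes.length; i++) {
      const child = node.childNodes[i];
      if (child.nodeType === 1) visit(child); // 1 = Node.ELEMENT_NODE
    }
  })(root);
  return names;
}
```

Both versions of the myData document produce ["myData", "myChilddata", "data1", "data2"], even though their raw childNodes lists differ.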
Comment 127•20 years ago
This is a parser versus application/render issue.
With the latest specs it is clear that this bug is INVALID.
An XML parser MUST preserve white space and pass it to the application, according
to both the XML 1.0 and XML 1.1 specs:
http://www.w3.org/TR/REC-xml/#sec-white-space
http://www.w3.org/TR/xml11/#sec-white-space
XHTML 1.0 places the burden of defining white-space handling on CSS2:
http://www.w3.org/TR/xhtml1/#uaconf
Later XHTML languages don't even want to deal with white space, leaving it all
to CSS:
http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/introduction.html#s_intro_formatting
However CSS2 does a pretty poor job at this:
http://www.w3.org/TR/REC-CSS2/text.html#white-space-prop
But this is fixed in the upcoming CSS 2.1 standard:
http://www.w3.org/TR/CSS21/text.html#q8
And made even more complex in CSS3:
http://www.w3.org/TR/css3-text/#white-space-processing
Archaic HTML, too, talks about how to "render" white space, not about how to "parse"
it:
http://www.w3.org/TR/html4/appendix/notes.html#notes-line-breaks
The DOM is simply the parser's output. As such, white space belongs in the DOM
handed to the application, and it is up to CSS to decide what to do with it.
You really can't say 2 documents are identical just because they have the same
markup. They are not. And CSS is the way to show the differences. As such, the DOM
should reflect this difference.
Comment 128•19 years ago
*** Bug 299108 has been marked as a duplicate of this bug. ***
Comment 129•19 years ago
*** Bug 311654 has been marked as a duplicate of this bug. ***
Comment 130•19 years ago
<angry german web developer mode>
HTML is still written by humans and has to be readable to them.
FF's pedantic reading of the specs (which aren't that specific) won't persuade
the other browsers' developers to adopt your (exact and maybe even right) point
of view, but will cause other web developers to ignore FF's interpretation of their
code, resulting in less comfort for FF's users.
</angry german web developer mode>
Does whitespace, and only whitespace, between tags carry so much information
that it is worth it? Worth the protectionism?
Comment 131•19 years ago
*** Bug 315938 has been marked as a duplicate of this bug. ***
Comment 132•19 years ago
My god dudes. Why are we all sitting on our hands about this?
This is far from a minor irritation. It renders nextSibling, lastChild, firstChild, and any kind of indexing useless. Not just in Firefox, but in the Web at large since we develop applications that respect Gecko these days. If that proves too hard then we'll just develop for IE and tell our customers about its larger market share and what it would cost to additionally support Netscape like we're used to.
The fact that a portion of the DOM doesn't make any sense when implemented like this should be enough to stop you worrying if the spec explicitly states it or not. I'll call this comment a spec if it makes you more comfortable.
I'm considering learning C++ so I can fix this, please someone beat me to it.
Comment 133•19 years ago
> I'll call this comment a spec if it makes you more comfortable.
Your comment doesn't make a good spec, since it's self-contradictory.
Comment 134•19 years ago
> Your comment doesn't make a good spec, since it's self-contradictory.
Do excuse the use of language, but I don't think many are giving this the priority it deserves. It's amazing that I try to do a bit of DOM Level 1 coding and find that Mozilla recommends I put a mass of helper functions in to circumvent a bug. Moreover, the bug has been open for ages and half the people who have looked at it don't think it's a bug. It makes some core Level 1 properties unusable in a sane way, of course it's a bug.
This is out in the wild in Firefox 1.5, it's going to become difficult to change soon and it will mean that the DOM standard failed as an interoperable system for coding. We'll have to get back to sniffing to work out which type of implementation we have and versioning the code. I got the impression that you guys were with me in wanting to see the end of that kind of pain.
Just drop the empty text nodes when parsing white-space:normal nodes. There is no functional use for them when programming but a lot of fiddling when writing many kinds of interactive page. Having to join all the lines up on a page to get a portable DOM is synonymous with the kind of bug that made NS4 most developers' least favourite browser.
Comment 135•19 years ago
> It makes some core Level 1 properties unusable in a sane way, of course it's a
> bug.
No, just because the properties are less useful does NOT make the behavior a bug -- the behavior is allowed by the DOM spec and required by other W3C specs.
> it's going to become difficult to change soon
It's not going to be changed. That's why the bug is resolved "invalid". There's no way to change it without breaking basic CSS functionality or violating either the DOM or CSS spec (or both, as in IE).
> Just drop the empty text nodes when parsing white-space:normal nodes.
They don't have a "white-space" value while being parsed. If you don't understand that, I suggest you actually read the DOM and CSS specs until you do.
We're as sorry as you are that IE is breaking the DOM and CSS specs this badly, but there's nothing we can do about IE bugs other than complaining to Microsoft (which I urge you to do).
Comment 136•19 years ago
Is there any definite decision how this will be handled in coming releases of Firefox?!? This behaviour and interpretation seems to be controversial.
We have customers requesting functionality that worked with Firefox 1.0 and complaining about problems because of the changed behaviour in Firefox 1.5.
We have to use kind of excessive DOM-processing in our application, and we need to be sure that we change things for the current release and don't have to fix things for coming releases again.
Comment 137•19 years ago
(In reply to comment #136)
> Is there any definite decision how this will be handled
Yes -- as it is now.
> This behaviour and interpretation seems to be controversial.
It's not.
> We have customers requesting functionality that worked with Firefox 1.0 and
> complaining about problems because of the changed behaviour in Firefox 1.5.
This behavior didn't change and isn't planned to change, so I don't see how what you said is relevant.
Comment 138•19 years ago
(In reply to comment #137)
It did change from Firefox 1.0 to 1.5.
At least in reference to what I originally posted under https://bugzilla.mozilla.org/show_bug.cgi?id=320353
Comment 139•19 years ago
> https://bugzilla.mozilla.org/show_bug.cgi?id=320353
Parsing of <frameset> in particular was buggy on the 1.7 branch; that's been fixed.
Comment 140•19 years ago
*** Bug 324195 has been marked as a duplicate of this bug. ***
Comment 141•19 years ago
*** Bug 324195 has been marked as a duplicate of this bug. ***
Comment 142•19 years ago
*** Bug 326078 has been marked as a duplicate of this bug. ***
Comment 143•19 years ago
(In reply to comment #137)
> > Is there any definite decision how this will be handled
> Yes -- as it is now.
Her famous last words... :-) We (developers) had similar answers about innerHTML way ago (Netscape 6 betas).
Preserving phantom nodes *is* a mistake, and moreover one not directly required by the W3C. You'll have to change it sooner or later, and the later it is, the more expensive it will be, so why not start sooner?
For the transition period you could (as suggested by some) add a flag, preservePhantomNodes, that one could turn on if needed, but off by default.
You could even make preservePhantomNodes == true by default if you really want to.
But the problem needs to be addressed. Right now all your supporters have to write additional filters atop of all node-related DOM methods, which is totally abnormal IMHO. Do you really think it's just fine?
Comment 144•19 years ago
> Her famous last words... :-) We (developers) had similar answers about
> innerHTML way ago (Netscape 6 betas).
Agreed. innerHTML, document.all, the list goes on... Is IE 7.0 going to now start preserving phantom nodes even though prior versions didn't, all in the name of the spec? I doubt it.
I've personally handled a few hundred business cases where phantom nodes weren't wanted or needed. I'm curious - has anyone actually run into a business case where they *did* need them to remain?
Comment 145•19 years ago
(In reply to comment #143)
> But the problem needs to be addressed. Right now all your supporters have to
> write additional filters atop of all node-related DOM methods, which is totally
> abnormal IMHO. Do you really think it's just fine?
Hear hear. I'm a jobbing web developer and getting sick of writing out loops to find the nodes I want when there are supposedly DOM mechanisms. More complicated DOM work turns into a plethora of nested loops and it gets difficult to keep track. Since any code already created is catering for both manifestations of the DOM, getting rid of this bug shouldn't break any web sites. Please please write it out.
Comment 146•19 years ago
*** Bug 329019 has been marked as a duplicate of this bug. ***
Comment 147•19 years ago
*** Bug 332821 has been marked as a duplicate of this bug. ***
Comment 148•19 years ago
I'm talking here about the HTML case; I am not considering XML, and will not.
The DOM tree is not easily walkable in Fx. I urge you guys to reconsider the INVALID resolution.
With so many duplicates, don't you think your position (not removing phantom nodes) is plain wrong?
<p>
<span>one</span>
<span>two</span>
</p>
It's plain and simple: the P tag has ONLY 2 children, so there is no reason for Firefox to create new phantom text nodes. Especially since NO OTHER browser is acting this wrong way.
From my point of view, the white-space:pre issue is not a real issue. You could keep an internal DOM representation for all the CSS rendering issues you encounter, but give us the expected DOM representation without those ugly phantom text nodes. nextSibling and friends are TOTALLY unusable the way things are right now. And the solution of using the (Firefox-only) helper functions from comment #11 is wrong. We don't need those nodes, we don't want them, and worst of all we have to use loops (in Firefox ONLY) to get rid of them.
Why do you force me to use this kind of JS, or the helper functions from comment #11, every time I want to use childNodes and the xxxxxSibling properties of the DOM?
var DOM = DOM || {};
// Removes whitespace-only text nodes from `element` (a node or an element id).
// Descendant elements are cleaned as well when `recursif` is true.
DOM.cleanWhitespace = function(element, recursif)
{
    element = (typeof element == 'string') ? document.getElementById(element) : element;
    // Walk backwards so that removing a node does not shift the ones still to visit.
    for (var i = element.childNodes.length - 1; i >= 0; i--)
    {
        var node = element.childNodes[i];
        if (node.nodeType == Node.TEXT_NODE)
        {
            // Only space, tab, LF, CR and FF count as whitespace; &nbsp; text is kept.
            if (/^[ \t\n\r\f]*$/.test(node.nodeValue))
            {
                element.removeChild(node);
            }
        }
        else if (recursif && node.nodeType == Node.ELEMENT_NODE)
        {
            DOM.cleanWhitespace(node, recursif);
        }
    }
};
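If modifying the document is not acceptable, the same whitespace test can be applied without removing anything. This is only a minimal sketch: significantChildNodes and isWhitespaceText are hypothetical helper names, and the numeric nodeType values (1 for elements, 3 for text) are the standard DOM constants, spelled out so the sketch also runs against simple stand-in objects outside a browser.

```javascript
// Standard DOM nodeType values (same as Node.ELEMENT_NODE / Node.TEXT_NODE).
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// True for a text node made only of HTML whitespace (space, tab, LF, CR, FF).
// Text containing &nbsp; (U+00A0) is NOT treated as whitespace here.
function isWhitespaceText(node) {
  return node.nodeType === TEXT_NODE && /^[ \t\n\r\f]*$/.test(node.nodeValue);
}

// Hypothetical helper: the child nodes of `parent` minus whitespace-only
// text nodes, returned as a new array. The document itself is not modified.
function significantChildNodes(parent) {
  var result = [];
  for (var i = 0; i < parent.childNodes.length; i++) {
    var child = parent.childNodes[i];
    if (!isWhitespaceText(child)) {
      result.push(child);
    }
  }
  return result;
}
```

For the P example above, p.childNodes has five entries (text, SPAN, text, SPAN, text), while significantChildNodes(p) would return just the two SPAN elements. The trade-off is that every caller has to go through the helper instead of using childNodes directly.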
Please reconsider the INVALID resolution. Thanks.
So many Duplicate (31) bugs means something imo. When one or two of my users report a bug, I consider it a specific issue and fix it for them only (with helper functions), but when 31 users report the exact same issue, it's my duty to reconsider my position (even if I'm right) and fix the issue.
147 comments, 31 of them duplicates = 21.09% of the comments
The issue is real; the INVALID state of this bug is just wrong. I thought standards would help us stop writing browser-specific code. You are breaking that here by not resolving the bug :(
#5 Dup 48560
#7 Dup 62269
#13 Dup 65658
#16 Dup 89782
#19 Dup 104785
#23 Dup 114749
#24 Dup 118213
#63 Dup 131169
#65 Dup 144603
#66 Dup 147489
#68 Dup 147487
#83 Dup 159352
#110 Dup 178508
#111 Dup 179709
#114 Dup 189467
#115 Dup 196983
#119 Dup 206729
#120 Dup 214943
#121 Dup 217842
#122 Dup 221364
#123 Dup 252684
#124 Dup 258564
#125 Dup 263813
#128 Dup 299108
#129 Dup 311654
#131 Dup 315938
#140 Dup 324195
#141 Dup 324195
#142 Dup 326078
#146 Dup 329019
#147 Dup 332821
Comment 149•19 years ago
> The issue is real, the invalid state of this bug is just wrong. I though
> standards would help us to stop using browser specific code. You are breaking
> it here by not resolving the bug :(
Fully sustained. "Firefox vs. users" is a confrontation I would not have imagined in my worst nightmares...
Besides the endless dups, just search the Web for relevant blogs and forums. What's wrong with marking it ACTIVE BLOCKING and adding a flag like -moz-preserve-phantom-nodes that we could set to false (leave it true by default, no problem)? What's wrong with doing it in the next minor update rather than fighting for nothing to the last bullet?
(Not sure though if this whole thread is not in a "kill file" anyway)
Comment 150•19 years ago
> (Not sure though if this whole thread is not in a "kill file" anyway)
It pretty much is, since none of the people doing the talking, yourself included, seem to understand what's going on with the relevant standards, much less the changes they want made in the rendering engine.
I will make one more futile attempt to set people straight, however. I advise reading closely.
What you seem to want is a fundamental rewrite of the CSS engine in Gecko so it's not DOM-based (like IE). Then you want the DOM exposed to the web page to not reflect the actual data structures the browser has in memory but rather to depend on the CSS formatting. All of that seems like a poor idea, esp. since it would introduce a lot of IE's CSS bugs; bugs that are basically due to its internal representation NOT being a DOM.
Note that Opera 8.5 behaves like Gecko does on the testcase in this bug and on other testcases I've tried for this behavior, in both standards and quirks mode. So does Konqueror in standards mode (can't test it in quirks mode at the moment). So does Safari, last I checked (I don't have a Safari build on me right now).
Given that, the "Especially since NO OTHER browser is acting this wrong way" crap in comment 148 simply indicates to me that laurent vilday likes to make claims without testing them. In fact, every single modern browser other than IE (we're going to consider IE a modern browser here for the sake of argument) behaves like Gecko does.
I would also like to reply to one other part of comment 148. Specifically, the part where it says "So many Duplicate (31) bugs means something imo." What that means is that 31 of the people out there who like coding to IE's mis-implementation of the DOM spec filed bugs requesting that we introduce the same bugs. No more, no less. Given the number of "IE-only" sites out there even now, this number is not really all that surprising, at least to me.
Comment 151•19 years ago
(In reply to comment #150)
> What you seem to want is a fundamental rewrite of the CSS engine in
> Gecko so it's not DOM-based (like IE).
We don't need to sound so tragic. We (counting at least 31 filed dups, I can say "we") simply want the possibility to /choose/ the most convenient way to handle the DOM.
The W3C box model is more than inconvenient, yet there may be people who just love it and cannot live without it. So Mozilla added -moz-box-sizing so that anyone could "go to hell by his own road" :-) It didn't cause the sky to fall onto the earth, did it?
In the same way, we need the same choice for the DOM tree structure: some -moz-preserve-everything or similar. Whoever likes phantom nodes, and even sees some use for them, is welcome to leave the default "yes". The rest can set it to "false". The parser digs the phantom nodes out of the source's pretty-printing anyway, so the relevant patch would take a line or two (check the flag, then either throw the node away or add it to the tree).
I can imagine some circumstances where one needs to switch a fragment from parsed to <pre> state and back, or to restore the source byte-for-byte as it came from the server. But these are /occasional/ uses as opposed to mass usage, and really shouldn't be the subject of such intense preoccupation. Still, -moz-preserve-everything (or whatever it's called) set to true takes care of even these occasional situations.
> I would also like to reply to one other part of comment 148.
> Specifically, the part where it says "So many Duplicate (31) bugs means something imo."
> What that means is that 31 of the people out there who like coding to IE's
> mis-implementation of the DOM spec filed bugs requesting that we introduce the
> same bugs.
Then my question is: how many bugs do you need filed before you admit that there is something rotten in the kingDOM? 310? 3,100? 31,000? I believe NN6 gave in on innerHTML after 2,000 or so complaints. Do you really need another Chartist Movement? :-))
Also please note that these are not "just 31 people". These are active Firefox supporters who bothered to learn about Bugzilla, open an account, prepare a testcase and file a bug properly. I could easily add behind each of them at least 100 end users who just did not have time for all of that, were not aware of Bugzilla, or simply dropped Firefox. IMHighlyHO.
Comment 152•19 years ago
VK, this bug is about what getFirstChild and getNextSibling return. Those are defined by the DOM spec and return whatever the DOM has. If what you want are separate methods to access only parts of the DOM (similar to what the SVG 1.2 Tiny spec has -- they only see Element nodes), then feel free to file _separate_ bugs on that. Please make sure to clearly define exactly what your proposed methods should return in all cases. Once there's a clear need established and a clear description of what the methods should do (which was the situation with innerHTML), implementing them can actually be discussed in a reasonable way.
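As an illustration of how small such "element-only" methods can be when layered on top of the standard DOM, here is a minimal sketch. firstElementOf and nextElementOf are hypothetical names; the sketch assumes only the standard firstChild/nextSibling properties and the nodeType value 1 for elements. (Later W3C work, the Element Traversal specification, standardized built-in properties along these lines, such as firstElementChild and nextElementSibling.)

```javascript
var ELEMENT_NODE = 1; // Node.ELEMENT_NODE in a browser

// Hypothetical helper: the first child of `parent` that is an element,
// or null if there is none. Text and comment nodes are skipped.
function firstElementOf(parent) {
  var node = parent.firstChild;
  while (node && node.nodeType !== ELEMENT_NODE) {
    node = node.nextSibling;
  }
  return node || null;
}

// Hypothetical helper: the next sibling of `node` that is an element, or null.
function nextElementOf(node) {
  var next = node.nextSibling;
  while (next && next.nodeType !== ELEMENT_NODE) {
    next = next.nextSibling;
  }
  return next || null;
}
```

Walking all element children of a list then becomes a single loop: for (var el = firstElementOf(list); el; el = nextElementOf(el)) { ... }. Note that these helpers leave childNodes, firstChild and nextSibling untouched, which is exactly the "separate methods" approach rather than a change to what the DOM contains.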
Also, if you have issues with the DOM spec you may want to consider raising them with the W3C so that _all_ browsers would implement these methods you want. Unless you plan to write script that only works in Gecko but breaks in Opera or Safari or Konqueror?
Comment 153•19 years ago
(In reply to comment #152)
Phantom nodes filed as Bug #339511
> VK, this bug is about what getFirstChild and getNextSibling return.
Not really; it is about "Mozilla reports existence of phantom text nodes in the DOM", as the bug summary states. Problems with native DOM methods are just one of the outcomes.
> Those are defined by the DOM spec and return whatever the DOM has.
I spent the last two weeks (when I had free time, of course) trying to find these definitive DOM specs, and failed. It seems to be the same for all other researchers. All I see is "somebody said something, and as we did not find any better place, we just dumped it in here as empty text nodes", and I was really careful in reading your arguments in this thread. Yet I may have missed something vitally important.
> then feel free to file _separate_ bugs on that.
Not really a separate bug, but the same old mistake reviewed once again after six years, with new facts and in a whole new situation.
I also linked some testcases which (I think) will be a big surprise to you in application of "DOM specs" ;-)
Comment 154•19 years ago
*** Bug 339511 has been marked as a duplicate of this bug. ***
Comment 155•19 years ago
(In reply to comment #154)
> *** Bug 339511 has been marked as a duplicate of this bug. ***
That was a mistake, which is now corrected. Please note that bug #339511 is a feature request, not a "bug" (i.e., something contradicting the declared behavior).
Comment 156•19 years ago
*** Bug 339511 has been marked as a duplicate of this bug. ***
Comment 157•19 years ago
*** Bug 339766 has been marked as a duplicate of this bug. ***
Comment 158•18 years ago
A friend of mine, Joao Eiras from Portugal, who is a W3C mailing-list member, gave me a valuable hint:
TO WORK WITH TABLES THE BEST APPROACH IS table.rows[y].cells[x]
I think it solves 90% of the trouble... (using the correct method)
The funniest thing is that I developed a Javascript Self Explorer (available at http://sitedosergio.sitesbr.net - inside the Computings > Javascript menu) that shows me that exist "rows" and "cells" properties for table objects... but it's hard to know or remember everything ... :D
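A minimal sketch of the rows/cells approach: an HTML table's rows collection holds only TR elements and each row's cells collection holds only TD/TH elements, so whitespace text nodes between the tags are never visited. The function name tableToArray is hypothetical, and reading each cell's textContent is an assumption about what data is wanted (older IE versions used innerText instead).

```javascript
// Hypothetical helper: copy an HTML table's cell text into a 2-D array.
// table.rows contains only TR elements and row.cells only TD/TH elements,
// so whitespace-only text nodes in the markup are never encountered.
function tableToArray(table) {
  var data = [];
  for (var y = 0; y < table.rows.length; y++) {
    var row = table.rows[y];
    var values = [];
    for (var x = 0; x < row.cells.length; x++) {
      values.push(row.cells[x].textContent);
    }
    data.push(values);
  }
  return data;
}
```

Here tableToArray(table)[y][x] corresponds to table.rows[y].cells[x]. Keep in mind that cells[x] is the x-th cell element in the row, not the x-th visual column, so tables using colspan or rowspan need extra bookkeeping.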
Thanks Joao !
Comment 159•18 years ago
*** Bug 364248 has been marked as a duplicate of this bug. ***
Comment 161•18 years ago
I have seen the following workaround somewhere: embed your readability white space into tags, i.e. use <tag ></tag > instead of <tag> </tag>. This way the source code still contains whitespace and stays readable, but the whitespace is consumed by the tag parser and does not make it into the DOM tree. This trick, however, does not work for HTML comments, which enter the DOM tree as comment nodes. While you have complete control over where you put your comments, it is not possible to hide them from the DOM. And adding a comment to the HTML or removing one can break your script, which is even more unexpected. That is going to happen when a future maintainer decides he needs an annotation here and there!
All in all, it seems there is no reliable way to handle this except for using helper functions.
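Such a helper can treat comment nodes the same way as whitespace-only text, so that adding or removing an annotation in the markup does not change what the script sees. A minimal sketch; isIgnorable and nextSignificantSibling are hypothetical names, and nodeType values 3 (text) and 8 (comment) are the standard DOM constants. Unlike an element-only walk, text nodes that contain real content are still returned.

```javascript
var TEXT_NODE = 3;     // Node.TEXT_NODE
var COMMENT_NODE = 8;  // Node.COMMENT_NODE

// A node is "ignorable" if it is a comment, or a text node that contains
// only HTML whitespace (space, tab, LF, CR, FF).
function isIgnorable(node) {
  if (node.nodeType === COMMENT_NODE) return true;
  return node.nodeType === TEXT_NODE && /^[ \t\n\r\f]*$/.test(node.nodeValue);
}

// Hypothetical helper: the next sibling of `node` that is not ignorable,
// or null if there is none.
function nextSignificantSibling(node) {
  var next = node.nextSibling;
  while (next && isIgnorable(next)) {
    next = next.nextSibling;
  }
  return next || null;
}
```

Because comments are skipped, a maintainer inserting <!-- note --> between two elements would not change the result of nextSignificantSibling, which addresses the fragility described above.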
Comment 162•18 years ago
(In reply to comment #29)
> Which nodes are you proposing to eliminate? Only the ones inside start-tags and
> end-tags, or all of the ones you can (whatever that means)? If the former, how
> do you plan to handle the childNodes array for something like:
>
> <ul>
> <li>foo</li>
> <li>bar</li>
> </ul>
>
> I think the DOM benefits of such an approach are minimal.
>
> If the latter, how do you plan to ensure that <span>two</span>
> <span>words</span> aren't merged together?
>
> Which does IE do?
IE blindly eats up white space between elements and inside elements whether it is appropriate or not:
<http://www.microsoft.com/communities/newsgroups/list/en-us/default.aspx?&lang=en&cr=us&guid=&sloc=en-us&dg=microsoft.public.internetexplorer.general&p=1&tid=925c63ae-5f0b-452b-8b61-a5d5a67a6330&mid=925c63ae-5f0b-452b-8b61-a5d5a67a6330>
>
> In all cases, how do you plan to keep 'white-space: pre' working?
>
Comment 164•17 years ago
My problem with the current implementation relates to loading up XML documents. I can understand that you need to keep the white-space around with HTML so that "white-space: pre" works, however I don't see the need for it when loading up XML documents. Given that an XML document is a representation of data, having the white-space nodes there does not make any sense at all.
However, I'm not just raking up the old arguments again, I do have a question to ask:
I'm using:
XmlDoc = document.implementation.createDocument("", "", null);
followed by:
XmlDoc.load('xmlDoc.xml');
to load up an XML document. Is the load method used by the browser when it is creating the DOM of an HTML page?
Because if not, can the load method just not create white-space nodes? Or take an additional boolean parameter to specify it?
Failing that, I see that the third parameter for the document.implementation.createDocument method is not implemented yet - can it not be a boolean to switch white-space node creation on or off?
Comment 165•17 years ago
You should load XML data using XMLHttpRequest instead.
Component: DOM: Core → DOM: Core & HTML
QA Contact: stummala → general