Closed
Bug 2750
Opened 26 years ago
Closed 18 years ago
Line feeds after start tags and before end tags should be ignored.
Categories
(Core :: DOM: HTML Parser, defect)
RESOLVED
INVALID
People
(Reporter: ian, Unassigned)
Details
(Keywords: testcase, Whiteboard: [Hixie-P5])
Attachments
(1 file: deleted, text/html)
See the test page quoted.
Basically,
<span>
Hello
</span>
Should be parsed as <span>Hello</span> not as <span> Hello </span>.
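To make the reported behavior concrete, here is a minimal sketch using Python's standard `html.parser` module, which is not the Mozilla parser and is used here only as a stand-in. The `TextCollector` class and `span_text` helper are hypothetical names introduced for illustration. Like the behavior described in this report, this parser hands the line feeds around "Hello" through as character data.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects every chunk of character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def span_text(markup):
    """Return all character data found in markup, joined into one string."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


# The text reported for the element still carries both line feeds.
print(repr(span_text("<span>\nHello\n</span>")))
```

Under the reading of HTML4 requested in this report, the reported text would be just "Hello".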
Status: ASSIGNED → RESOLVED
Closed: 26 years ago
Resolution: --- → WONTFIX
I'm denying this bug because you can't really ignore these newlines. For
example, if CSS-PRE was mapped onto a non-PRE element, our behavior would change
and suddenly the newlines would be significant. The real newline handlers are in
layout.
Reporter | ||
Updated•26 years ago
|
Status: RESOLVED → REOPENED
Reporter | ||
Comment 3•26 years ago
|
||
Newlines after start tags and before end tags are *NEVER* significant
according to the HTML 4.0 specification and the SGML standard.
Whether this is inside or outside a PRE or whitespace:pre block is
completely irrelevant. It is a parsing issue.
From HTML4, section B.3.1:
> SGML (see [ISO8879], section 7.6.1) specifies that a line break
> immediately following a start tag must be ignored, as must a
> line break immediately before an end tag. This applies to all HTML
> elements without exception.
My comments stand. This will not be changed in the parser. If there's an
interpretation to be made, it must be handled in the content model and reflow
process.
Reporter | ||
Comment 5•26 years ago
|
||
Rick - Based on our discussion, is this bug to be reopened or not?
I suggest a dual mode of operation, a strictly correct one for HTML4 Strict
DTDs and an "old-style" mode for other DTDs.
Updated•26 years ago
|
QA Contact: 4141
Comment 6•26 years ago
|
||
Reassigning QA contact to gem@netscape.com (HTML parser contact).
Reporter | ||
Updated•26 years ago
|
Status: RESOLVED → REOPENED
Reporter | ||
Updated•26 years ago
|
Status: REOPENED → RESOLVED
Closed: 26 years ago → 26 years ago
Resolution: WONTFIX → REMIND
Reporter | ||
Updated•26 years ago
|
Status: RESOLVED → VERIFIED
Target Milestone: M4
Reporter | ||
Comment 7•26 years ago
|
||
[Missed M4. Removing target milestone.]
Reporter | ||
Comment 10•25 years ago
|
||
Reopening and targeting for post feature-complete milestone.
Reducing priority and severity.
Severity: normal → minor
Status: VERIFIED → REOPENED
OS: Windows 98 → All
Priority: P2 → P3
Hardware: Other → All
Resolution: REMIND → ---
Target Milestone: --- → M18
Comment 11•25 years ago
|
||
Ian -- I think you do brilliant work, so please don't take this personally. But
this is not a parser bug. I will not change this. The layout system MUST handle
this. I'll give this bug to nisheeth over in layout.
Assignee: rickg → nisheeth
Status: REOPENED → NEW
Component: Parser → Layout
Reporter | ||
Comment 12•25 years ago
|
||
This is an HTML-only thing. In XHTML/XML, this does not apply. However, in HTML
(and any other SGML languages we understand), a newline immediately following a
start tag and a newline immediately preceding an end tag should be ignored, as
if it did not exist. It should not even appear in the DOM. This is per HTML4,
section B.3.1:
# SGML (see [ISO8879], section 7.6.1) specifies that a line break
# immediately following a start tag must be ignored, as must a
# line break immediately before an end tag. This applies to all HTML
# elements without exception.
I do not see how Layout can handle this. This has nothing to do with the <pre>
element or the 'white-space' CSS property. For example, the following:
<pre>
line 1
line 2
</pre>
...should be treated as *exactly* the same as the following:
<pre>line 1
line 2</pre>
We discussed this by e-mail in February/March 1999. If you would like me to
quote some of my arguments here for the record I would be happy to do so since I
still have those mail archives.
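The equivalence claimed for the two <pre> examples in the comment above can be sketched as a source-level normalization. This is a rough illustration only: it uses regular expressions rather than a real tokenizer, so it will misfire on comments, scripts, and attribute values containing ">". The function name `drop_tag_adjacent_breaks` is hypothetical.

```python
import re

# One line break in any of the three common encodings.
_LINE_BREAK = r"(?:\r\n|\n|\r)"
# A start tag immediately followed by a single line break.
_AFTER_START = re.compile(r"(<[A-Za-z][^>]*>)" + _LINE_BREAK)
# A single line break immediately followed by an end tag.
_BEFORE_END = re.compile(_LINE_BREAK + r"(</[A-Za-z][^>]*>)")


def drop_tag_adjacent_breaks(markup):
    """Remove one line break after each start tag and one before each end tag."""
    markup = _AFTER_START.sub(r"\1", markup)
    return _BEFORE_END.sub(r"\1", markup)


first = "<pre>\nline 1\nline 2\n</pre>"
second = "<pre>line 1\nline 2</pre>"
print(drop_tag_adjacent_breaks(first) == second)
```

Only a single line break is removed at each position, so a blank line after a start tag still leaves one line break in the content.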
Comment 13•25 years ago
|
||
I'm re-assigning this to you, harish, in light of Ian's comments. If Rick and
Ian agree that this needs to be implemented in layout, please re-assign this to
me. Thanks.
Assignee: nisheeth → harishd
Comment 14•25 years ago
|
||
This bug has been marked "future" because the original Netscape engineer working
on this is over-burdened. If you feel this is an error, that you or another
known resource will be working on this bug, or if it blocks your work in some
way -- please attach your concern to the bug for reconsideration.
Reporter | ||
Comment 15•25 years ago
|
||
*** Bug 20083 has been marked as a duplicate of this bug. ***
Reporter | ||
Comment 16•25 years ago
|
||
Reassigning to Erik as per bug 20083.
Comment 17•25 years ago
|
||
erik is on sabbatical. shanjian, please take a look at this.
Assignee: erik → shanjian
Updated•25 years ago
|
Status: NEW → ASSIGNED
Reporter | ||
Comment 18•24 years ago
|
||
Taking QA per managerial policy.
QA Contact: gem → py8ieh=bugzilla
Comment 20•24 years ago
|
||
It looks like both IE5 and Nav4 behave the same way as we do, per bug 20083.
There seems to be no strong reason to fix this soon. Marking this as Future.
Target Milestone: --- → Future
Comment 21•24 years ago
|
||
> It looks like both IE5 and Nav4 behave the same way
> as we do, per bug 20083.
Which is a bug!
> There seems to be no strong reason to fix
> this soon.
Using XSLT to transform XML to XHTML/HTML/XML with indentation often causes the
start and end tags to be separated from the content, as in the example.
BTW, wouldn't this also affect the DOM?
Updated•24 years ago
|
Keywords: mozilla0.9
Comment 22•24 years ago
|
||
makingtest - ziz@ziz.org
Comment 23•24 years ago
|
||
Comment 24•24 years ago
|
||
testcase - ziz@ziz.org
Reporter | ||
Comment 25•24 years ago
|
||
After all this time, and with the advent of XML, maybe I should just call it
quits and move on and mark this WONTFIX... :-)
Whiteboard: [Hixie-P5]
Reporter | ||
Comment 26•24 years ago
|
||
*** Bug 78061 has been marked as a duplicate of this bug. ***
Comment 27•24 years ago
|
||
erik resigned. Reassigning all his bugs to ftang for now.
Assignee: erik → ftang
Comment 28•24 years ago
|
||
Marking all Future NEW bugs as ASSIGNED after the move from erik to ftang.
Status: NEW → ASSIGNED
If I supply a fix for this, will it be checked in? I'm not saying that I'll be
able to fix this even if I try, but I don't want to spend cycles on this if a
patch will be rejected anyway. This seems like a controversial change that could
cause some problems in layout, but nevertheless a change that is the Right Thing.
I'm not saying that I'll actually be able to fix it, but I would like to
try if it's not a waste of time...
Reporter | ||
Comment 30•24 years ago
|
||
ftang: I'm reassigning to Harish, who should own this bug really.
harishd: see comment above, from Jonas Sicking.
Assignee: ftang → harishd
Status: ASSIGNED → NEW
Component: Layout → Parser
Comment 31•24 years ago
|
||
Jonas: Mozilla promises standards compliance, and if we're not following the
spec then there should be a good reason (though I can't think of any!) why not.
If you could come up with a fix that wouldn't break existing pages,
functionality, or backwards compatibility, then I would be glad to accept your
change.
Removing FUTURE and putting the bug on the 0.9.5 radar.
Target Milestone: Future → mozilla0.9.5
Comment 33•23 years ago
|
||
--> 0.9.7
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.6 → mozilla0.9.7
Comment 34•23 years ago
|
||
Note that the SGML rules for this are slightly more complicated than just
removing line feeds after start tags and before end tags. (And that should
properly be line *feed*; only one line feed is removed per parse.) Some of it
will have to be done in the content sink.
In full:
"If an RS in content is not interpreted as markup, it is ignored."
"The first RE in an element is ignored if no RS, data, or proper subelement
preceded it."
"The last RE in an element is ignored if no data or proper subelement follows it."
"An RE that does not immediately follow an RS or RE is ignored if no data or
proper subelement intervened."
Because of the way we create RE and RS from the entity manager (again, see bug
47078), this is simpler than it sounds; but leave this bug open once we trim
line feeds after start tags and before end tags so that the final, small changes
in the content sink can be made.
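As a rough illustration of the quoted rules, the sketch below models one element's content as a flat list of tokens and applies three of the four rules: dropping RS, dropping the first RE when nothing precedes it, and dropping the last RE when only RS follows it. The fourth rule (an RE that does not immediately follow an RS or RE) is deliberately left out. The `RE` and `RS` sentinels, the token model, and the function name are assumptions made for this example; they do not reflect how the entity manager or content sink represent records.

```python
# Hypothetical sentinels for SGML record end / record start.
RE = object()
RS = object()


def apply_record_rules(content):
    """Apply a simplified subset of the SGML record rules to one element.

    content is a list whose items are RE, RS, or anything else
    (treated as data or a proper subelement).
    """
    tokens = list(content)

    # First RE is ignored if no RS, data, or subelement preceded it,
    # which in this flat model means it is the very first token.
    if tokens and tokens[0] is RE:
        tokens = tokens[1:]

    # Last RE is ignored if no data or subelement follows it,
    # i.e. only RS tokens (or nothing) come after it.
    re_positions = [i for i, tok in enumerate(tokens) if tok is RE]
    if re_positions:
        last = re_positions[-1]
        if all(tok is RS for tok in tokens[last + 1:]):
            del tokens[last]

    # An RS in content that is not markup is ignored.
    return [tok for tok in tokens if tok is not RS]


# <span> RE, RS "Hello" RE, RS </span>  ->  just the data
print(apply_record_rules([RE, RS, "Hello", RE, RS]) == ["Hello"])
```

In this model an interior RE, such as the one between "line 1" and "line 2" in a <pre> block, survives, which matches the point that only one line feed is removed at each end.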
Comment 35•23 years ago
|
||
Don't have the time to fix this for 0.9.7. Moving to 0.9.9.
Target Milestone: mozilla0.9.7 → mozilla0.9.9
Comment 36•23 years ago
|
||
Whoever fixes this please note that bug 17003 has foo in there to strip leading
CRLF / LF on TEXTAREA, which can be happily removed when this lands.
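For readers unfamiliar with the special case mentioned above, this is a minimal sketch of what stripping a single leading CRLF or LF from a TEXTAREA's initial content might look like. It is not the code from bug 17003; the function name is hypothetical.

```python
def strip_leading_line_break(value):
    """Drop one leading CRLF or LF, if present, and leave the rest untouched."""
    if value.startswith("\r\n"):
        return value[2:]
    if value.startswith("\n"):
        return value[1:]
    return value


# Markup such as "<textarea>\nfirst line</textarea>" would yield this value.
print(repr(strip_leading_line_break("\nfirst line")))
```

A general rule for line breaks after start tags would subsume this element-specific handling, which is why the comment above expects it to become removable.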
Comment 37•23 years ago
|
||
Would this remove many content nodes from normal pages? In that case it could
also improve performance.
Comment 38•23 years ago
|
||
*** Bug 131479 has been marked as a duplicate of this bug. ***
Comment 39•23 years ago
|
||
Since there's some dispute over whether the "showing line breaks after img tags
as underscores" bug is a parser or a layout bug and in case this patch is still
being considered, someone should make the point that:
Having the parser discard formatting whitespace in this one specific case but
not in other cases would be inconsistent with the model we've used up to now,
and will break the attempts by the html serializer to be able to reconstruct the
original html. For instance, many people want to be able to use composer to
make changes on an existing document, without losing all their line breaks.
Since these line breaks are legal and make for more readable html code, having
the parser remove them is not a good solution. Much better to make layout
ignore them.
Perhaps the best solution would be to have the parser generate "non-significant
whitespace" nodes instead of text nodes, which layout can then proceed to
ignore. It should generally do this on other whitespace, too. There's another
bug somewhere on that, but it's stalled, perhaps because the parser can't
reliably tell what whitespace is significant or not (since a pre style rule may
or may not apply).
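The "non-significant whitespace" idea from the comment above could look roughly like the following sketch. Line breaks next to tags are kept as separate nodes so that a serializer can reproduce the source exactly, while a layout pass can skip them. Everything here is hypothetical: the `Node` type, the "ws" kind, and the regex-based splitting are assumptions for illustration, and the splitting is far cruder than a real tokenizer.

```python
import re
from dataclasses import dataclass

_LEADING_BREAK = re.compile(r"\r\n|\n|\r")
_TRAILING_BREAK = re.compile(r"(?:\r\n|\n|\r)\Z")


@dataclass
class Node:
    kind: str  # "tag", "text", or "ws" (non-significant whitespace)
    data: str


def build_nodes(markup):
    """Split markup into tag, text, and non-significant whitespace nodes."""
    parts = re.split(r"(<[^>]+>)", markup)  # odd indices hold the tags
    nodes = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            nodes.append(Node("tag", part))
            continue
        prev_tag = parts[i - 1] if i > 0 else ""
        next_tag = parts[i + 1] if i + 1 < len(parts) else ""
        lead = trail = ""
        if re.match(r"<[A-Za-z]", prev_tag):  # text follows a start tag
            m = _LEADING_BREAK.match(part)
            if m:
                lead, part = m.group(), part[m.end():]
        if next_tag.startswith("</"):  # text precedes an end tag
            m = _TRAILING_BREAK.search(part)
            if m:
                trail, part = m.group(), part[:m.start()]
        for kind, data in (("ws", lead), ("text", part), ("ws", trail)):
            if data:
                nodes.append(Node(kind, data))
    return nodes


def serialize(nodes):
    """Reconstruct the source, including non-significant whitespace."""
    return "".join(node.data for node in nodes)


def layout_text(nodes):
    """Text as a layout pass would see it, skipping "ws" nodes."""
    return "".join(node.data for node in nodes if node.kind == "text")


source = "<span>\nHello\n</span>"
nodes = build_nodes(source)
print(serialize(nodes) == source, repr(layout_text(nodes)))
```

The design point is that both concerns raised in this thread are served by one tree: the editor's round trip keeps its line breaks, and layout never sees them.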
Comment 40•23 years ago
|
||
Fixing this in layout makes more sense to me too. Newlines are part of the
document and must not be removed by a browser at any stage; however, the browser
can be smart enough to handle them based on the specs and requirements.
Comment 41•23 years ago
|
||
Please read the relevant specifications before making absurd proclamations. The
line feeds covered by this case, which applies to *HTML 4* documents (a subset
of *SGML*) are *explicitly required* by ISO 8879 to be removed from the document
during parsing. XML requires that "An XML processor must always pass all
characters in a document that are not markup through to the application"; this
does not apply to HTML 4.
Having fulminated, the real problem here is that DOM isn't really suitable for
preserving the original formatting of a source. The DOM is, prima facie,
incapable of accurately representing the entire set of HTML 4 documents,
regardless of what the W3C may say. To do a "null transformation" of an HTML
document (doc1->representation->doc2, doc1 and doc2 are identical) requires
support for the SGML Property Set (a HyTime thing). See
<URL:http://lists.w3.org/Archives/Public/www-dom/1999OctDec/0071.html>.
Since I don't anticipate this appearing in Mozilla any time soon, akkana's
suggestion of "non-significant whitespace" nodes in the DOM seems to be the
best, although we should also be hiding access to them from external sources
(since they aren't really data or part of the DOM, just packed in there for the
convenience of the application). However, given bugs like bug 129508, I have to
question whether retaining these linebreaks is even worth doing, given the
mangling we inflict by normalizing docs when they come back out of DOM.
Comment 42•22 years ago
|
||
Though this is important I hardly have the time to work on it.
--> Futuring
Target Milestone: mozilla1.2beta → Future
Comment 43•22 years ago
|
||
*** Bug 214722 has been marked as a duplicate of this bug. ***
Comment 44•21 years ago
|
||
*** Bug 107927 has been marked as a duplicate of this bug. ***
Comment 45•20 years ago
|
||
*** Bug 287547 has been marked as a duplicate of this bug. ***
Updated•20 years ago
|
Assignee: harishd → parser
Status: ASSIGNED → NEW
Priority: P3 → --
QA Contact: ian → mrbkap
Target Milestone: Future → ---
Comment 46•20 years ago
|
||
*** Bug 287907 has been marked as a duplicate of this bug. ***
Comment 47•20 years ago
|
||
Bug 26179 has mauled through the whitespace-in-DOM issue, which has been settled.
This is now clearly a layout bug and not a parser bug.
HTML itself speaks about "rendering", not "parsing":
http://www.w3.org/TR/html4/appendix/notes.html#notes-line-breaks
XHTML 1.0, places a burden of defining white space rendering onto CSS2:
http://www.w3.org/TR/xhtml1/#uaconf
Being XML, its parser must preserve white space according to both the XML 1.0
and XML 1.1 specs:
http://www.w3.org/TR/REC-xml/#sec-white-space
http://www.w3.org/TR/xml11/#sec-white-space
Further XHTML languages don't even want to deal with whitespace, leaving it all
to CSS:
http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/introduction.html#s_intro_formatting
However CSS2 does a pretty poor job at this:
http://www.w3.org/TR/REC-CSS2/text.html#white-space-prop
But this is fixed in the upcoming CSS 2.1 standard:
http://www.w3.org/TR/CSS21/text.html#q8
And made even more complex in CSS3:
http://www.w3.org/TR/css3-text/#white-space-processing
As such the fixing of this bug should go in accordance with CSS rules, targeting
both CSS 2.1 and CSS3 specs.
Some other bugs that dealt with this, which might need to be duped to this:
bug 36717 - testcase
bug 132561 - testcase and patch
bug 157698 - testcase
bug 197716 - testcase
bug 209073 - and bugs mentioned in it
bug 245926 - testcase
Assignee: parser → nobody
Component: HTML: Parser → Layout
QA Contact: mrbkap → layout
Comment 48•20 years ago
|
||
White space processing defined in CSS3 Text will probably be simplified in the
next version, to be closer to CSS2.1.
Do NOT change the meaning of a 6-year-old bug just because someone erroneously
duped the wrong bug against it.
This bug IS about parsing and nothing else. It affects the DOM and form
submission and therefore can NOT be fixed by changing layout.
If you need a layout bug file a NEW one. DON'T whine in here or argue about it,
it only adds spam. Just file a new bug and dup whatever bugs you feel are
appropriate against it.
Assignee: nobody → parser
Component: Layout → HTML: Parser
QA Contact: layout → mrbkap
Comment 50•20 years ago
|
||
*** Bug 291239 has been marked as a duplicate of this bug. ***
Comment 51•19 years ago
|
||
In the light of HTML 5, and Opera, Internet Explorer and Mozilla agreeing on the
same behavior this should probably just be WONTFIX. (Or INVALID per HTML 5.)
Reporter | ||
Comment 52•18 years ago
|
||
Yeah.
Status: NEW → RESOLVED
Closed: 26 years ago → 18 years ago
Resolution: --- → WONTFIX
Updated•14 years ago
|
Assignee: parser → nobody
QA Contact: mrbkap → parser
Resolution: WONTFIX → INVALID