Closed Bug 2750 Opened 26 years ago Closed 18 years ago

Line feeds after start tags and before end tags should be ignored.

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
minor

Tracking

()

RESOLVED INVALID

People

(Reporter: ian, Unassigned)

References

()

Details

(Keywords: testcase, Whiteboard: [Hixie-P5])

Attachments

(1 file)

See the test page quoted. Basically, this:

<span>
Hello
</span>

should be parsed as <span>Hello</span>, not as a span whose content is a line feed, "Hello", and another line feed.
Status: NEW → ASSIGNED
Setting all current Open/Normal to M4.
Status: ASSIGNED → RESOLVED
Closed: 26 years ago
Resolution: --- → WONTFIX
I'm denying this bug because you can't really ignore these newlines. For example, if CSS-PRE was mapped onto a non-PRE element, our behavior would change and suddenly the newlines would be significant. The real newline handlers are in layout.
Status: RESOLVED → REOPENED
Newlines after start tags and before end tags are *NEVER* significant according to the HTML 4.0 specification and the SGML standard. Whether this is inside or outside a PRE or white-space: pre block is completely irrelevant. It is a parsing issue. From HTML4, section B.3.1:

> SGML (see [ISO8879], section 7.6.1) specifies that a line break
> immediately following a start tag must be ignored, as must a
> line break immediately before an end tag. This applies to all HTML
> elements without exception.
Status: REOPENED → RESOLVED
Closed: 26 years ago → 26 years ago
My comments stand. This will not be changed in the parser. If there's an interpretation to be made, it must be handled in the content model and reflow process.
Rick - Based on our discussion, is this bug to be reopened or not? I suggest a dual mode of operation, a strictly correct one for HTML4 Strict DTDs and an "old-style" mode for other DTDs.
QA Contact: 4141
Reassigning QA contact to gem@netscape.com (HTML parser contact).
Status: RESOLVED → REOPENED
Status: REOPENED → RESOLVED
Closed: 26 years ago → 26 years ago
Resolution: WONTFIX → REMIND
Status: RESOLVED → VERIFIED
Target Milestone: M4
[Missed M4. Removing target milestone.]
Blocks: html4.01
*** Bug 8856 has been marked as a duplicate of this bug. ***
*** Bug 10490 has been marked as a duplicate of this bug. ***
Reopening and targeting for post feature-complete milestone. Reducing priority and severity.
Severity: normal → minor
Status: VERIFIED → REOPENED
OS: Windows 98 → All
Priority: P2 → P3
Hardware: Other → All
Resolution: REMIND → ---
Target Milestone: --- → M18
Ian -- I think you do brilliant work, so please don't take this personally. But this is not a parser bug. I will not change this. The layout system MUST handle this. I'll give this bug to nisheeth over in layout.
Assignee: rickg → nisheeth
Status: REOPENED → NEW
Component: Parser → Layout
This is an HTML-only thing. In XHTML/XML, this does not apply. However, in HTML (and any other SGML languages we understand), a newline immediately following a start tag and a newline immediately preceding an end tag should be ignored, as if it did not exist. It should not even appear in the DOM. This is per HTML4, section B.3.1:

# SGML (see [ISO8879], section 7.6.1) specifies that a line break
# immediately following a start tag must be ignored, as must a
# line break immediately before an end tag. This applies to all HTML
# elements without exception.

I do not see how Layout can handle this. This has nothing to do with the <pre> element or the 'white-space' CSS property. For example, the following:

<pre>
line 1
line 2
</pre>

...should be treated as *exactly* the same as the following:

<pre>line 1
line 2</pre>

We discussed this by e-mail in February/March 1999. If you would like me to quote some of my arguments here for the record, I would be happy to do so, since I still have those mail archives.
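As a rough illustration of the HTML 4 rule quoted above, here is a minimal Python sketch built on the standard library's `html.parser.HTMLParser`. It collects text nodes and drops one line feed directly after a start tag and one directly before an end tag. The names `TextNodeCollector` and `text_nodes` are hypothetical; the input is assumed to use LF line endings, elements declared EMPTY in HTML 4.01 (such as `<br>` and `<img>`) are treated as having no content, and comments are simply treated as boundaries. It is not how Mozilla's parser or content sink is structured.

```python
from html.parser import HTMLParser

# Elements declared EMPTY in HTML 4.01: they have no content, so a line
# break after their start tag belongs to the parent and is kept.
EMPTY_ELEMENTS = {"area", "base", "basefont", "br", "col", "frame", "hr",
                  "img", "input", "isindex", "link", "meta", "param"}


class TextNodeCollector(HTMLParser):
    """Collects text nodes, dropping one LF right after a start tag and one
    LF right before an end tag (HTML 4.01 section B.3.1). Assumes LF endings."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_nodes = []
        self._pending = []            # text seen since the last tag
        self._after_start_tag = False

    def _flush(self, before_end_tag):
        text = "".join(self._pending)
        self._pending = []
        if self._after_start_tag and text.startswith("\n"):
            text = text[1:]           # LF immediately following a start tag
        if before_end_tag and text.endswith("\n"):
            text = text[:-1]          # LF immediately preceding an end tag
        if text:                      # an emptied run produces no node at all
            self.text_nodes.append(text)

    def handle_starttag(self, tag, attrs):
        self._flush(before_end_tag=False)
        self._after_start_tag = tag not in EMPTY_ELEMENTS

    def handle_startendtag(self, tag, attrs):
        self._flush(before_end_tag=False)
        self._after_start_tag = False

    def handle_endtag(self, tag):
        self._flush(before_end_tag=True)
        self._after_start_tag = False

    def handle_comment(self, data):
        # Simplification: a comment ends the "just after a start tag" position.
        self._flush(before_end_tag=False)
        self._after_start_tag = False

    def handle_data(self, data):
        self._pending.append(data)

    def close(self):
        super().close()
        self._flush(before_end_tag=False)


def text_nodes(markup):
    parser = TextNodeCollector()
    parser.feed(markup)
    parser.close()
    return parser.text_nodes
```

With this sketch, `text_nodes("<pre>\nline 1\nline 2\n</pre>")` and `text_nodes("<pre>line 1\nline 2</pre>")` both return `["line 1\nline 2"]`, the equivalence described in the comment above. Only one line feed is removed on each side, so `"<p>\n\nx\n\n</p>"` still yields `["\nx\n"]`.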
I'm re-assigning this to you, harish, in light of Ian's comments. If Rick and Ian agree that this needs to be implemented in layout, please re-assign this to me. Thanks.
Assignee: nisheeth → harishd
Status: NEW → ASSIGNED
Target Milestone: M18 → Future
This bug has been marked "future" because the original Netscape engineer working on this is over-burdened. If you feel this is an error, that you or another known resource will be working on this bug, or if it blocks your work in some way, please attach your concern to the bug for reconsideration.
*** Bug 20083 has been marked as a duplicate of this bug. ***
Reassigning to Erik as per bug 20083.
Assignee: harishd → erik
Status: ASSIGNED → NEW
Keywords: correctness
Target Milestone: Future → M19
erik is on sabbatical. shanjian, please take a look at this.
Assignee: erik → shanjian
Status: NEW → ASSIGNED
Blocks: 38196
Taking QA per managerial policy.
QA Contact: gem → py8ieh=bugzilla
reassign back to erik.
Assignee: shanjian → erik
Status: ASSIGNED → NEW
It looks like both IE5 and Nav4 behave the same way as we do (see bug 20083). There seems to be no strong reason to fix this soon. Marking this as future.
Target Milestone: --- → Future
> It looks both IE5 and Nav4 behave the same way
> as we do from bug 20083.

Which is a bug!

> It seems
> there are no strong reason we should fix
> this soon.

Using XSLT to transform XML to XHTML/HTML/XML with indentation often causes the start and end tags to be separated from the content, as in the example. BTW, wouldn't this also affect the DOM?
makingtest - ziz@ziz.org
testcase - ziz@ziz.org
After all this time, and with the advent of XML, maybe I should just call it quits and move on and mark this WONTFIX... :-)
Whiteboard: [Hixie-P5]
*** Bug 78061 has been marked as a duplicate of this bug. ***
erik resigned. Reassigning all his bugs to ftang for now.
Assignee: erik → ftang
Marking all Future NEW bugs as ASSIGNED after the move from erik to ftang.
Status: NEW → ASSIGNED
If I supply a fix for this, will it be checked in? I don't want to spend cycles on this if a patch will be rejected anyway. This seems like a controversial change that could cause some problems in layout, but it is nevertheless the Right Thing. I'm not saying that I'll actually be able to fix it, but I would like to try if it's not a waste of time...
ftang: I'm reassigning to Harish, who should own this bug really. harishd: see comment above, from Jonas Sicking.
Assignee: ftang → harishd
Status: ASSIGNED → NEW
Component: Layout → Parser
Jonas: Mozilla promises standards, and if we're not following the spec then there should be a good reason (though I can't think of any!) why not. If you can come up with a fix that doesn't break existing pages, functionality, or backwards compatibility, I would be glad to accept your change. Removing FUTURE and putting the bug on the 0.9.5 radar.
Target Milestone: Future → mozilla0.9.5
Keywords: testcase
--> 0.9.6
Target Milestone: mozilla0.9.5 → mozilla0.9.6
--> 0.9.7
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.6 → mozilla0.9.7
Note that the SGML rules for this are slightly more complicated than just removing line feeds after start tags and before end tags. (And that should properly be line *feed*; only one line feed is removed per parse.) Some of it will have to be done in the content sink. In full:

"If an RS in content is not interpreted as markup, it is ignored."
"The first RE in an element is ignored if no RS, data, or proper subelement preceded it."
"The last RE in an element is ignored if no data or proper subelement follows it."
"An RE that does not immediately follow an RS or RE is ignored if no data or proper subelement intervened."

Because of the way we create RE and RS from the entity manager (again, see bug 47078), this is simpler than it sounds; but leave this bug open once we trim line feeds after start tags and before end tags, so that the final, small changes in the content sink can be made.
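The four rules quoted above can be modeled over an already tokenized element body. In this hypothetical Python sketch, each token is the string `"RS"` (record start) or `"RE"` (record end), or a pair: `("data", text)`, `("elem", name)` for a proper subelement, or `("markup", text)` for markup such as a comment that is neither data nor a subelement. The token model and the name `apply_record_rules` are assumptions made for illustration. The fourth rule is read here as "no data or proper subelement since the most recent RS", which is an interpretation of the paraphrase, not a quotation of ISO 8879.

```python
from typing import List, Tuple, Union

# Hypothetical token model: "RS", "RE", or (kind, value) with kind in
# {"data", "elem", "markup"}.
Token = Union[str, Tuple[str, str]]


def is_content(tok: Token) -> bool:
    """True for data or a proper subelement; RS, RE and other markup are not."""
    return isinstance(tok, tuple) and tok[0] in ("data", "elem")


def apply_record_rules(tokens: List[Token]) -> List[Token]:
    re_positions = [i for i, t in enumerate(tokens) if t == "RE"]
    ignored = set()
    if re_positions:
        first, last = re_positions[0], re_positions[-1]
        # Rule 2: first RE ignored if no RS, data, or proper subelement preceded it.
        if not any(t == "RS" or is_content(t) for t in tokens[:first]):
            ignored.add(first)
        # Rule 3: last RE ignored if no data or proper subelement follows it.
        if not any(is_content(t) for t in tokens[last + 1:]):
            ignored.add(last)
    # Rule 4 (assumed reading): an RE not immediately after RS or RE is ignored
    # if no data or proper subelement appeared since the most recent RS.
    for i in re_positions:
        if i > 0 and tokens[i - 1] not in ("RS", "RE"):
            start = max((j for j in range(i) if tokens[j] == "RS"), default=-1)
            if not any(is_content(t) for t in tokens[start + 1:i]):
                ignored.add(i)
    # Rule 1: an RS that is not interpreted as markup is always ignored.
    return [t for i, t in enumerate(tokens) if t != "RS" and i not in ignored]
```

For a `<pre>` whose content spans three source lines, the body `["RE", "RS", ("data", "line 1"), "RE", "RS", ("data", "line 2"), "RE", "RS"]` reduces to `[("data", "line 1"), "RE", ("data", "line 2")]`: only the RE between the two lines survives, while a blank line (an RS followed directly by an RE) is kept.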
Don't have the time to fix this for 0.9.7. Moving to 0.9.9.
Target Milestone: mozilla0.9.7 → mozilla0.9.9
Whoever fixes this please note that bug 17003 has foo in there to strip leading CRLF / LF on TEXTAREA, which can be happily removed when this lands.
Would this remove many content nodes from normal pages? In that case it could also improve performance.
Target Milestone: mozilla0.9.9 → mozilla1.0.1
*** Bug 131479 has been marked as a duplicate of this bug. ***
Blocks: 107927
Target Milestone: mozilla1.0.1 → mozilla1.2beta
Since there's some dispute over whether the "showing line breaks after img tags as underscores" bug is a parser or a layout bug and in case this patch is still being considered, someone should make the point that: Having the parser discard formatting whitespace in this one specific case but not in other cases would be inconsistent with the model we've used up to now, and will break the attempts by the html serializer to be able to reconstruct the original html. For instance, many people want to be able to use composer to make changes on an existing document, without losing all their line breaks. Since these line breaks are legal and make for more readable html code, having the parser remove them is not a good solution. Much better to make layout ignore them. Perhaps the best solution would be to have the parser generate "non-significant whitespace" nodes instead of text nodes, which layout can then proceed to ignore. It should generally do this on other whitespace, too. There's another bug somewhere on that, but it's stalled, perhaps because the parser can't reliably tell what whitespace is significant or not (since a pre style rule may or may not apply).
Fixing this in layout makes more sense to me too. Newlines are part of the document and must not be removed by a browser at any stage; however, the browser can be smart enough to handle them based on the specs and requirements.
Please read the relevant specifications before making absurd proclamations. The line feeds covered by this case, which applies to *HTML 4* documents (a subset of *SGML*), are *explicitly required* by ISO 8879 to be removed from the document during parsing. XML requires that "An XML processor must always pass all characters in a document that are not markup through to the application"; this does not apply to HTML 4.

Having fulminated, the real problem here is that DOM isn't really suitable for preserving the original formatting of a source. The DOM is, prima facie, incapable of accurately representing the entire set of HTML 4 documents, regardless of what the W3C may say. To do a "null transformation" of an HTML document (doc1 -> representation -> doc2, where doc1 and doc2 are identical) requires support for the SGML Property Set (a HyTime thing). See <URL:http://lists.w3.org/Archives/Public/www-dom/1999OctDec/0071.html>.

Since I don't anticipate this appearing in Mozilla any time soon, akkana's suggestion of "non-significant whitespace" nodes in the DOM seems to be the best, although we should also hide access to them from external sources (since they aren't really data or part of the DOM, just packed in there for the convenience of the application). However, given bugs like bug 129508, I have to question whether retaining these line breaks is even worth doing, given the mangling we inflict by normalizing docs when they come back out of DOM.
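The "non-significant whitespace" node idea discussed above (keep the line breaks for serializers such as Composer, but let layout and HTML-4-aware consumers skip them) could look roughly like the following Python sketch. The `TextNode` type with its `significant` flag and the helper names are hypothetical; the sketch only marks the two B.3.1 line breaks around one text run and does not try to decide whether any other whitespace is significant.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextNode:
    data: str
    significant: bool = True  # False marks a hypothetical "non-significant whitespace" node


def split_boundary_line_breaks(text: str, after_start_tag: bool,
                               before_end_tag: bool) -> List[TextNode]:
    """Split one text run into nodes, marking the LF after a start tag and the
    LF before an end tag as non-significant instead of deleting them."""
    nodes: List[TextNode] = []
    if after_start_tag and text.startswith("\n"):
        nodes.append(TextNode("\n", significant=False))
        text = text[1:]
    trailing = before_end_tag and text.endswith("\n")
    if trailing:
        text = text[:-1]
    if text:
        nodes.append(TextNode(text))
    if trailing:
        nodes.append(TextNode("\n", significant=False))
    return nodes


def serialize(nodes: List[TextNode]) -> str:
    # A serializer keeps every node, so the original source round-trips.
    return "".join(n.data for n in nodes)


def rendered_text(nodes: List[TextNode]) -> str:
    # Layout (or any consumer wanting HTML 4 semantics) skips the marked nodes.
    return "".join(n.data for n in nodes if n.significant)
```

For the `<pre>` example earlier in this bug, `split_boundary_line_breaks("\nline 1\nline 2\n", True, True)` gives three nodes; `serialize` reproduces the original text run exactly, while `rendered_text` returns `"line 1\nline 2"`.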
Though this is important I hardly have the time to work on it. --> Futuring
Target Milestone: mozilla1.2beta → Future
*** Bug 214722 has been marked as a duplicate of this bug. ***
*** Bug 107927 has been marked as a duplicate of this bug. ***
*** Bug 287547 has been marked as a duplicate of this bug. ***
Assignee: harishd → parser
Status: ASSIGNED → NEW
Priority: P3 → --
QA Contact: ian → mrbkap
Target Milestone: Future → ---
*** Bug 287907 has been marked as a duplicate of this bug. ***
Bug 26179 has mauled through the whitespace-in-DOM issue, and it has been settled. This is now clearly a layout bug and not a parser bug.

HTML itself speaks about "rendering", not "parsing": http://www.w3.org/TR/html4/appendix/notes.html#notes-line-breaks
XHTML 1.0 places the burden of defining white space rendering onto CSS2: http://www.w3.org/TR/xhtml1/#uaconf
Being XML, its parser must preserve white space according to both the XML 1.0 and XML 1.1 specs:
http://www.w3.org/TR/REC-xml/#sec-white-space
http://www.w3.org/TR/xml11/#sec-white-space
Further XHTML languages don't even want to deal with white space, leaving it all to CSS: http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/introduction.html#s_intro_formatting
However, CSS2 does a pretty poor job at this: http://www.w3.org/TR/REC-CSS2/text.html#white-space-prop
But this is fixed in the upcoming CSS 2.1 standard: http://www.w3.org/TR/CSS21/text.html#q8
And made even more complex in CSS3: http://www.w3.org/TR/css3-text/#white-space-processing

As such, the fix for this bug should follow the CSS rules, targeting both the CSS 2.1 and CSS3 specs.

Some other bugs that dealt with this, which might need to be duped to this one:
bug 36717 - testcase
bug 132561 - testcase and patch
bug 157698 - testcase
bug 197716 - testcase
bug 209073 - and bugs mentioned in it
bug 245926 - testcase
Assignee: parser → nobody
Component: HTML: Parser → Layout
QA Contact: mrbkap → layout
White space processing defined in CSS3 Text will probably be simplified in the next version, to be closer to CSS2.1.
Do NOT change the meaning of a 6-year-old bug just because someone erroneously duped the wrong bug against it. This bug IS about parsing and nothing else. It affects the DOM and form submission and therefore can NOT be fixed by changing layout. If you need a layout bug, file a NEW one. DON'T whine in here or argue about it; it only adds spam. Just file a new bug and dup whatever bugs you feel are appropriate against it.
Assignee: nobody → parser
Component: Layout → HTML: Parser
QA Contact: layout → mrbkap
*** Bug 291239 has been marked as a duplicate of this bug. ***
In the light of HTML 5, and Opera, Internet Explorer and Mozilla agreeing on the same behavior this should probably just be WONTFIX. (Or INVALID per HTML 5.)
Yeah.
Status: NEW → RESOLVED
Closed: 26 years ago → 18 years ago
Resolution: --- → WONTFIX
Assignee: parser → nobody
QA Contact: mrbkap → parser
Resolution: WONTFIX → INVALID
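For comparison with the HTML 4 rule, the HTML 5 parsing algorithm referred to in the resolving comment only drops a single line feed that immediately follows the start tag of `pre`, `listing`, or `textarea`; a line feed before an end tag is kept, and CR LF pairs and lone CRs are normalized to LF before tokenization. A minimal Python sketch of that behavior for the first text run directly after a start tag (the function names are hypothetical, not part of any browser's API):

```python
# Elements whose start tag swallows one immediately following LF during
# HTML5 tree construction.
LEADING_LF_ELEMENTS = {"pre", "listing", "textarea"}


def normalize_newlines(source: str) -> str:
    """Input-stream preprocessing: CR LF and lone CR both become LF."""
    return source.replace("\r\n", "\n").replace("\r", "\n")


def html5_initial_text(tag: str, text: str) -> str:
    """DOM text for the text run that directly follows the start tag of `tag`.

    Only one leading LF is dropped, and only for the elements above;
    a line feed before an end tag is always kept.
    """
    text = normalize_newlines(text)
    if tag.lower() in LEADING_LF_ELEMENTS and text.startswith("\n"):
        return text[1:]
    return text
```

Under this model, the `<span>` example from the original report keeps both of its line feeds, and the `<pre>` example loses only the first one.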