Closed
Bug 30386
Opened 25 years ago
Closed 23 years ago
"&nbsp;" in internal href adds strange "Â" character
Categories
(Core :: Layout, defect, P3)
Tracking
()
VERIFIED
FIXED
Future
People
(Reporter: faught, Assigned: waterson)
Details
(Keywords: relnote, testcase, Whiteboard: relnote-devel)
Attachments
(2 files)
I loaded this test page using Mozilla M14 (build ID 2000022820):
----------cut here
<a href="#x&nbsp;x">Link here</a><p>
<a name="x&nbsp;x">Target here</a>
----------cut here
When I click on the "Link here" link, nothing happens (to test internal links
like this, you have to reduce the height of the browser so it's shorter than the
page, and/or add additional lines between the link and the target). The URL it
was trying to go to was "file:///C|/TEMP/spacelink.html#xÂ x" - it inserted an
extra character before the space.
Note that this name attribute isn't actually valid HTML, but Navigator 4.61 does
handle it properly. At the very least, the extra character inserted into the
URL looks like a bug.
This might be related to bug #29312.
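One possible reading of the symptom, offered as an assumption rather than anything confirmed in this report: "&nbsp;" decodes to U+00A0, whose UTF-8 encoding is the two bytes 0xC2 0xA0. If those bytes are later interpreted as ISO-8859-1, 0xC2 is displayed as "Â" in front of the no-break space. A minimal Python sketch of that mechanism (it does not model Mozilla's actual code path):

```python
import html

# Decode the character entity the way an HTML parser would.
decoded = html.unescape("#x&nbsp;x")

# Encode as UTF-8, then (wrongly) read the bytes back as ISO-8859-1.
utf8_bytes = decoded.encode("utf-8")
misread = utf8_bytes.decode("latin-1")

print(repr(decoded))     # fragment containing U+00A0
print(utf8_bytes)        # U+00A0 becomes the two bytes C2 A0
print(repr(misread))     # C2 now shows up as an extra "Â" character
```

If this is what happened, the extra character would appear immediately before the space, which matches the URL quoted above.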
Comment 1•25 years ago
Comment 2•25 years ago
The attachment is the reporter's testcase; the problem is confirmed with
2000-03-04-17-M15 on WinNT.
The incorrect "#xÂ x" shows up in the status bar after clicking on the
"Link here" link, but not in the URL bar. Each click on that link adds
another blank entry onto the Go menu session history; it takes the same number
of presses on the [Back] button to return to the testcase page.
The HTML 4.0 DTD defines URIs to be CDATA, which can contain character entities.
URIs are further restricted to ASCII characters by the
HTML 4 spec: http://206.51.27.220/html/struct/links.html#h-12.2.1 and
http://www.w3.org/TR/REC-html40/appendix/notes.html#non-ascii-chars
On the other hand, at the latter URL under B.2.2 the use of & as the start
character of a character entity takes precedence over its use as a delimiter
between form fields. The spec makes no mention of character entities within
fragment identifiers, but if they are legal inside the query part of the URI,
they probably should be in the fragment identifier too, especially since
RFC 2396 puts few limits on the legal ASCII characters in a fragment.
The legal characters for URIs are defined in section 2 of RFC 2396
http://www.ietf.org/rfc/rfc2396.txt The fragment identifier is defined in
section 4.1. No characters are reserved in the fragment identifier portion of
a URI reference. The only characters excluded will be those listed in section
2.4.3 - control characters, space, and delimiters. Quoting from Appendix A:
> fragment = *uric
> uric = reserved | unreserved | escaped
> reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | ","
> unreserved = alphanum | mark
> mark = "-" | "_" | "." | "!" | "~" | "*" | "'" |
"(" | ")"
[alphanum means what you'd expect]
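The grammar quoted above can be turned into a quick check. This is a minimal sketch in Python, assuming only the productions quoted here plus `escaped = "%" hex hex` from the same appendix; the function name `is_rfc2396_fragment` is hypothetical.

```python
import re

# Character sets copied from the RFC 2396 productions quoted above.
RESERVED = ";/?:@&=+$,"
MARK = "-_.!~*'()"

# uric = reserved | unreserved | escaped, with escaped = "%" hex hex
URIC = r"(?:[A-Za-z0-9" + re.escape(RESERVED + MARK) + r"]|%[0-9A-Fa-f]{2})"

# fragment = *uric
FRAGMENT = re.compile(URIC + r"*")


def is_rfc2396_fragment(text: str) -> bool:
    """Return True if text matches the RFC 2396 'fragment' production."""
    return FRAGMENT.fullmatch(text) is not None


print(is_rfc2396_fragment("x&nbsp;x"))   # literal "&" and ";" are reserved characters
print(is_rfc2396_fragment("x%A0x"))      # percent-escaped octet
print(is_rfc2396_fragment("x x"))        # a raw space is excluded
```

Under this grammar the literal string "x&nbsp;x" and the escaped form "x%A0x" both qualify, while a raw space or a raw U+00A0 does not.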
Whether the fragment should appear in the URL bar as "x&nbsp;x" or as "x x" is
a separate question, but since the RFC allows those characters and the
HTML 4.01 DTD calls URIs CDATA, it should be as legal as "x;acirc;x".
If character entities are not allowable in fragment identifiers for some
reason given by the HTML spec that I have missed, "x&nbsp;x" should still
be legal as a literal in the fragment identifier only, and that fragment
identifier ought to work.
Section 2.1 of RFC 2396 states that the problem of identifying the correct
character encoding for byte sequences defined by %xx in URIs is one left
for a future version of the spec. Given that, allowing (non-numerical)
character entities in all parts of a URL where they are not otherwise
prohibited seems sensible.
Status: UNCONFIRMED → NEW
Ever confirmed: true
This works for me on NT; chris, can you confirm?
Assignee: rickg → petersen
Comment 4•25 years ago
With the April 12th build, I get the "Â" character appearing in the status bar
of the window when a mouseover occurs on the link. Clicking on the link doesn't
go to the target in the file. I have attached a modified version of the original
testcase with BR elements to separate the link and target.
Comment 5•25 years ago
A bit more data here: I fixed the ReduceEntities() function this weekend, and
used this as a testcase: <a href="foo&nbsp;&cent;&reg;">. The odd thing is that the
sink is doing exactly the right thing, but when you mouse over the link we see
the extra funky "Â" character. I think it's in the HTML attribute handling of
the content model or in the link handling code.
Comment 9•25 years ago
1) Can anyone explain a valid reason *why* someone would use &nbsp; or other
entities as the target of a HREF?
2) Can anyone demonstrate that this usage is common (or even present at all) on
the Web, esp. Top 100?
4xp, relnote, FUTURE unless someone demonstrates this is a real problem for real
sites.
Reporter
Comment 10•25 years ago
This bug report originated from a real web page at
http://www.geraldmweinberg.com. The maintainer changed the offending code when
I suggested that it wasn't compliant with the standard. I don't have any
further data on how common this construct is in general.
Comment 11•24 years ago
It's indeed html attribute handling. Over to Waterson.
Assignee: harishd → waterson
Assignee
Comment 12•24 years ago
marking assigned, unless jst wants to look at it first.
Status: NEW → ASSIGNED
Comment 13•24 years ago
The simplest testcase is already attached. Marking "testcase"
Keywords: testcase
Comment 15•24 years ago
Composer is also seeing some strangeness with entities in links/anchors. CC
Charley so he can followup when he returns from sabbatical.
Updated•24 years ago
Whiteboard: relnote-devel
Comment 16•24 years ago
Nom. nsbeta1 for backward compatibility with existing (****) HTML content.
Keywords: nsbeta1
Comment 17•24 years ago
Reassigning QA Contact for all open and unverified bugs previously under Lorca's
care to Gerardo as per phone conversation this morning.
QA Contact: lorca → gerardok
Comment 19•23 years ago
Both attached test cases are working for me using the Mozilla 0.9.1 build
(Build ID: 2001060713) on Linux. Is this still a bug?
Comment 20•23 years ago
Build 2001061304 win32 installer sea talkback trunk
1 The "strange" character Â no longer appears
2 The "strange" character is now replaced with %A0
3 In the second testcase the link is proven to actually work!
Questions:
Is this correct behavior?
Should the note in:
http://developer.netscape.com/docs/technote/gecko/n6release.html
be corrected?
Comment 21•23 years ago
I get the same results ("#x&nbsp;x" is now "x%A0x", and the link works) using
Mozilla _0.9_ on WinNT.
> Questions:
> Is this correct behavior?
Well, the important thing is that it is not incorrect: "%A0" is (roughly
speaking) a synonym for "&#160;", which is a synonym for "&nbsp;" in HTML's
default character set. More to the point, this is exactly what the HTML 4 spec
says should happen for non-ascii characters:
http://www.w3.org/TR/REC-html40/appendix/notes.html#non-ascii-chars
-- and since character entity references are meant to stand in for single
non-ascii characters, this behaviour is sensible enough.
Unless someone can point to a spec that says otherwise, the difference
(in display or in HREF-matching) between "&#160;" and "%A0" for "&nbsp;" is a
difference that makes no difference.
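The equivalence described above can be checked with a short Python sketch. It assumes the fragment is escaped using ISO-8859-1; which encoding Mozilla actually used is not stated in this thread, and the variable names are illustrative only.

```python
import html
from urllib.parse import quote, unquote

# "&nbsp;" and "&#160;" decode to the same character, U+00A0.
nbsp = html.unescape("&nbsp;")
same_char = html.unescape("&#160;") == nbsp

# Percent-escape the fragment text under two different encodings.
fragment = "x" + nbsp + "x"
latin1_escaped = quote(fragment, encoding="latin-1")
utf8_escaped = quote(fragment, encoding="utf-8")

# Unescaping the ISO-8859-1 form gives back the original fragment.
round_trip = unquote(latin1_escaped, encoding="latin-1")

print(same_char, latin1_escaped, utf8_escaped, round_trip == fragment)
```

Under ISO-8859-1 the no-break space is the single octet 0xA0, hence "%A0"; a UTF-8 based escaper would produce "%C2%A0" instead.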
> Should the note in:
> http://developer.netscape.com/docs/technote/gecko/n6release.html
> be corrected?
No, those notes are for Netscape 6, which is finalized; this bug is
as active as ever in all N6 binaries. Before Netscape 6.next comes out,
I'm sure someone will go through all closed "relnote" bugs and prune notes
accordingly.
Quoting from above:
> --- Additional Comments From ekrock 2000-05-20 23:14 ---
> 1) Can anyone explain a valid reason *why* someone would use &nbsp; or other
> entities as the target of a HREF?
Valid, no, but I'd guess cut-n-paste from a heading element ;->
Calling this FIXED.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 22•23 years ago
Verified on:
build: 2001-07-02-04-Trunk
platform: WinNT
Loaded both the test cases and they load fine.
Status: RESOLVED → VERIFIED
SPAM. HTML Element component deprecated, changing component to Layout. See bug
88132 for details.
Component: HTML Element → Layout