Closed
Bug 172186
Opened 22 years ago
Closed 18 years ago
Don't spell check URLs and email addresses in email
Categories
(Core :: Spelling checker, defect)
Core
Spelling checker
Tracking
()
RESOLVED
FIXED
mozilla1.5beta
People
(Reporter: panemec, Assigned: mscott)
References
Details
(Keywords: fixed1.8.1, verified1.8.1.3)
Attachments
(2 files, 1 obsolete file)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.1) Gecko/20020826
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.1) Gecko/20020826
Spellcheck for URLs is rarely helpful, more often it will identify correct URLs
that are combinations of words.
This is similar to the reasoning why quoted messages should not be checked for
spelling errors.
Reproducible: Always
Steps to Reproduce:
1. Compose new email
2. enter in URL (like http://mozilla.org/ or http://blahx.com/)
3. check spelling
Actual Results:
words in the URL are checked like any other words - Mozilla, org, blahx are all
spelling errors (in the default English dictionary)
Expected Results:
URLs are recognized (like they are in received messages) and not checked for
spelling.
This can be disabled in a control panel setting if this is objectionable.
Comment 2•22 years ago
|
||
Jean-Francois, would know. I spoke briefly with Kin and it is doable but it will
require search the text nodes and possibly breaking them up. Not a uiqkc and
easy task.
Updated•22 years ago
|
OS: Windows XP → All
Hardware: PC → All
To clarify, it should skip any url, not just web addresses. That is, it should
skip:
- any domain name ([host.]domain.tld)
- any e-mail address (user@host.tld)
- any other syntax I'm not thinking of (e.g. 'host.domain.tld:port#', etc.)
At least it could skip anything beginning with http or ftp and anything that
includes an @ or a . (period) inside the word. (Easy for me to say, of course.
;-) )
Comment 4•22 years ago
|
||
Yes, with bug 116242, this should be easy, it should "just work" ;-).
Comment 5•21 years ago
|
||
Yeah ... I even had spell check try to correct "bugzilla-daemon@mozilla.org"
Anyways if I might be so bold as to guess ... but I think part of the solution
should be to modify how the spell-checker parses the word. There is no reason
that something like "mozilla.org" should be parsed as two seperate words ...
there is no "space" between the word
Assignee | ||
Comment 6•21 years ago
|
||
I intend to fix this for thunderbird. Taking.
Assignee: ducarroz → scott
Comment 7•21 years ago
|
||
OK, I'll have to fix the blocking bug during next week, then.
Assignee | ||
Comment 8•21 years ago
|
||
Ben, I don't really need anything as complicated as the interface proposed in
116242. The editor instance already tokenizes the words based on white space.
I just need to a routine I can ask that says "does this string fragment contain
a url". I don't need a range of where it starts/ends, etc. Just a bool.
Input to this routine would look like:
wwww.abc.com,
www.abc.com
http://www.abc.com
www.abc.om;
www.abc.com.
Note the possibility of trailing punctuation characters.
You will never see
John Smith said http://www.abc.com.
Hopefully that makes it easier.
Comment 9•21 years ago
|
||
> I just need to a routine I can ask that says "does this string fragment contain
> a url". I don't need a range of where it starts/ends, etc. Just a bool.
Yes. The interface there is intended for other purposes as well, e.g. for a
context menu item to load the first URL in the selected text, that's why it's
slightly more complicated. However, I don't think that's a problem, because I
also said:
"If an URL has been found, the function returns NS_SUCCESS and fills |start| and
|end| with the indices of the start and end of the first URL found (|end| is the
index of the last charater of the URL, not the char after it). If no URL could
be found, it returns a certain (non-fatal) error code."
So, you'd only have to pass 2 dummy-ints to it, which you can ignore. The "bool"
you need is just NS_SUCCESS of the function. Would that work for you?
If that's really a problem, then I could add another function, but I don't think
it is necessary.
Assignee | ||
Comment 10•21 years ago
|
||
Another thing, this method is going to get called on every word in the document.
So it needs to be extremely fast. A method which just ran through the detection
code and ignored all the code in mozTxtToHTML which trys to calculate the url
boundaries, complete the abbreviated url, etc. would be even better. In theory
there this routine wouldn't require any additional string copies of the passed
in string.
I would rather have something that was really fast but missed 10% of the cases
than something that caught everything but was slower.
ideally we could just switch of the first letter for known protocols
i.e.
switch fragment(0):
case 'w': // look for www.
case 'h': // look for http://
case 'i': // look for imap://
Of course a speedy method like this would have the downside of only recognizing
hard coded schemes and wouldn't be as smart as the existing mozTxtToHtml stuff
for url detection.
Status: NEW → ASSIGNED
Comment 11•21 years ago
|
||
I think you will need the full checks of mozTXTToHTMLConv's URL boundary
recognition functions, because you want to catch the cases with
interpunctionation etc., which is exactly what these functions do (in part).
However, I think we can avoid almost all processing by just searching for a
colon, dot or at-sigh (:.@) in the word. You could do that in the caller, so
you'd never call the function for normal words. (If you want to even avoid
triggering on normal cases of "end of sentence." etc., you could require these
symbols to be in the middle of the word, not the end.)
> I would rather have something that was really fast but missed 10% of the cases
> than something that caught everything but was slower.
Personally, I'd rather have a function that is slower and catches 99.9%, because
the cost of a miss is extremely high in terms of speed, because it opens a
dialog box and asks the user, not?
> ideally we could just switch of the first letter for known protocols
Doesn't work for me@example.com or <http://www.bucksch.org>.
Assignee | ||
Comment 12•21 years ago
|
||
Ben your comment #11 sounds like a good compromise to me. I'll write up the
pre-flighting code in the spell checker if you can give me the routine that will
determine if the passed in string is a url or not.
Let's make this happen =).
Assignee | ||
Comment 13•21 years ago
|
||
Ben, I take it all back. I will need the API to look like what you described.
That is, I need to know the start and end position of the url in the passed in
string.
Assignee | ||
Comment 14•21 years ago
|
||
Ben, I've got the spell checker code standing by. I have everything filled out
except for the call to your routine which determines if it really is a url and
if so, where does the url begin and where does it end.
Hopefully we can nail this bug!
Assignee | ||
Comment 15•21 years ago
|
||
Ben, have you had a chance to work on exposing the API for this via bug 116242
yet? I've got the spell checker code written and ready to go. I just need the
routine to call.
Comment 16•21 years ago
|
||
Not yet, sorry. I am very busy atm. Is this urgent or can it wait e.g. another
week? (Dunno what your plans are, if you want to have it in a certain release
and when that freeze is.)
Assignee | ||
Comment 17•21 years ago
|
||
I wanted to test the spell checking code I wrote in mozEnglishWordUtils to make
sure my code worked. In order to do that, I made a FindURLInPlaintext method of
my own for testing purposes.
So far, things look pretty good with this patch.
Ben, for your version of FindURLInPlaintext, maybe you are ok with the one I
wrote or you can at least use it as a starting point and cleanup / adjust the
arguments and implementation as necessary. I just wrote it up so I had
something to test again. Maybe what I have is good enough or you might decide
to improve on it, feel free!
Assignee | ||
Updated•21 years ago
|
Target Milestone: --- → mozilla1.5beta
Comment 18•21 years ago
|
||
This is mscott's fix, adapted to use the new function created in bug 116242.
Untested for now.
Comment 19•21 years ago
|
||
*** Bug 216455 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 20•21 years ago
|
||
My original testing code had a crasher in it. Just posting an updated patch
which addresses the crash. This patch isn't intended to be checked in.
A variant of Ben's patch is what we would check in. I just haven't had a chance
to review it yet.
Attachment #129842 -
Attachment is obsolete: true
Assignee | ||
Comment 21•21 years ago
|
||
the 8/27 patch is now in the Thunderbird 0.3 build.
Comment 22•21 years ago
|
||
*** Bug 223788 has been marked as a duplicate of this bug. ***
Updated•21 years ago
|
Summary: don't spell check URLs in email → Don't spell check URLs and email addresses in email
Comment 23•21 years ago
|
||
I thik the target milestone mebbe needs to be updated. 1.5 has already shipped.
Comment 24•21 years ago
|
||
I'm not sure if this should be a seperate bug report but it is related. The
spellchecker should not spellcheck signatures. So anything beyond the --[space]
shoudl be ignored by the spellchecker. I got tired of spellchecking my name.
I'm not sure if this creates problem if the signature placement is different to
mine (mine is after the quoted text foa reply message)
Updated•21 years ago
|
Attachment #129881 -
Attachment is obsolete: true
Comment 25•21 years ago
|
||
Comment 26•21 years ago
|
||
Comment on attachment 129881 [details] [diff] [review]
Different, proposed fix (by Ben Bucksch)
This patch is *not* obsolete! It's an alternative to the first patch, and only
the first patch has been obsoleted by the third one.
Attachment #129881 -
Attachment is obsolete: false
Updated•21 years ago
|
Attachment #130509 -
Attachment description: updated patch to fix a crash of the fix that should not be checked in → Original fix, with crash fixed
Updated•21 years ago
|
Attachment #129881 -
Attachment description: Proposed Fix, version 2 → Different, porposed fix (by Ben Bucksch)
Updated•21 years ago
|
Attachment #129881 -
Attachment description: Different, porposed fix (by Ben Bucksch) → Different, proposed fix (by Ben Bucksch)
Comment 27•21 years ago
|
||
*** Bug 235152 has been marked as a duplicate of this bug. ***
Comment 28•21 years ago
|
||
*** Bug 237277 has been marked as a duplicate of this bug. ***
Updated•20 years ago
|
Product: MailNews → Core
Comment 29•19 years ago
|
||
*** Bug 311012 has been marked as a duplicate of this bug. ***
Comment 30•18 years ago
|
||
If this bug refers to the inline spellchecking, it is fixed now. If it is not inline, I'm not sure if it's fixed or not. If not, please reopen and clear the fixed1.8.1 keyword. Fixing it in that case would mean using the new word finding functionality used by the inline spellchecker.
Comment 31•18 years ago
|
||
Verified fixed for 1.8.1.3 using Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.3) Gecko/20070326 Thunderbird/2.0.0.0 Mnenhy/0.7.5.0 ID:2007032620 (Thunderbird 2 RC1), verified after various tests - Email addresses and URL`s are not spell checked.
Keywords: verified1.8.1.3
You need to log in
before you can comment on or make changes to this bug.
Description
•