Closed Bug 172186 Opened 22 years ago Closed 18 years ago

Don't spell check URLs and email addresses in email

Categories

(Core :: Spelling checker, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.5beta

People

(Reporter: panemec, Assigned: mscott)

References

Details

(Keywords: fixed1.8.1, verified1.8.1.3)

Attachments

(2 files, 1 obsolete file)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.1) Gecko/20020826 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.1) Gecko/20020826 Spellcheck for URLs is rarely helpful, more often it will identify correct URLs that are combinations of words. This is similar to the reasoning why quoted messages should not be checked for spelling errors. Reproducible: Always Steps to Reproduce: 1. Compose new email 2. enter in URL (like http://mozilla.org/ or http://blahx.com/) 3. check spelling Actual Results: words in the URL are checked like any other words - Mozilla, org, blahx are all spelling errors (in the default English dictionary) Expected Results: URLs are recognized (like they are in received messages) and not checked for spelling. This can be disabled in a control panel setting if this is objectionable.
Rod--is this a duplicate?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Jean-Francois, would know. I spoke briefly with Kin and it is doable but it will require search the text nodes and possibly breaking them up. Not a uiqkc and easy task.
OS: Windows XP → All
Hardware: PC → All
Blocks: 119232
Depends on: 116242
To clarify, it should skip any url, not just web addresses. That is, it should skip: - any domain name ([host.]domain.tld) - any e-mail address (user@host.tld) - any other syntax I'm not thinking of (e.g. 'host.domain.tld:port#', etc.) At least it could skip anything beginning with http or ftp and anything that includes an @ or a . (period) inside the word. (Easy for me to say, of course. ;-) )
Yes, with bug 116242, this should be easy, it should "just work" ;-).
Yeah ... I even had spell check try to correct "bugzilla-daemon@mozilla.org" Anyways if I might be so bold as to guess ... but I think part of the solution should be to modify how the spell-checker parses the word. There is no reason that something like "mozilla.org" should be parsed as two seperate words ... there is no "space" between the word
I intend to fix this for thunderbird. Taking.
Assignee: ducarroz → scott
OK, I'll have to fix the blocking bug during next week, then.
Ben, I don't really need anything as complicated as the interface proposed in 116242. The editor instance already tokenizes the words based on white space. I just need to a routine I can ask that says "does this string fragment contain a url". I don't need a range of where it starts/ends, etc. Just a bool. Input to this routine would look like: wwww.abc.com, www.abc.com http://www.abc.com www.abc.om; www.abc.com. Note the possibility of trailing punctuation characters. You will never see John Smith said http://www.abc.com. Hopefully that makes it easier.
> I just need to a routine I can ask that says "does this string fragment contain > a url". I don't need a range of where it starts/ends, etc. Just a bool. Yes. The interface there is intended for other purposes as well, e.g. for a context menu item to load the first URL in the selected text, that's why it's slightly more complicated. However, I don't think that's a problem, because I also said: "If an URL has been found, the function returns NS_SUCCESS and fills |start| and |end| with the indices of the start and end of the first URL found (|end| is the index of the last charater of the URL, not the char after it). If no URL could be found, it returns a certain (non-fatal) error code." So, you'd only have to pass 2 dummy-ints to it, which you can ignore. The "bool" you need is just NS_SUCCESS of the function. Would that work for you? If that's really a problem, then I could add another function, but I don't think it is necessary.
Another thing, this method is going to get called on every word in the document. So it needs to be extremely fast. A method which just ran through the detection code and ignored all the code in mozTxtToHTML which trys to calculate the url boundaries, complete the abbreviated url, etc. would be even better. In theory there this routine wouldn't require any additional string copies of the passed in string. I would rather have something that was really fast but missed 10% of the cases than something that caught everything but was slower. ideally we could just switch of the first letter for known protocols i.e. switch fragment(0): case 'w': // look for www. case 'h': // look for http:// case 'i': // look for imap:// Of course a speedy method like this would have the downside of only recognizing hard coded schemes and wouldn't be as smart as the existing mozTxtToHtml stuff for url detection.
Status: NEW → ASSIGNED
I think you will need the full checks of mozTXTToHTMLConv's URL boundary recognition functions, because you want to catch the cases with interpunctionation etc., which is exactly what these functions do (in part). However, I think we can avoid almost all processing by just searching for a colon, dot or at-sigh (:.@) in the word. You could do that in the caller, so you'd never call the function for normal words. (If you want to even avoid triggering on normal cases of "end of sentence." etc., you could require these symbols to be in the middle of the word, not the end.) > I would rather have something that was really fast but missed 10% of the cases > than something that caught everything but was slower. Personally, I'd rather have a function that is slower and catches 99.9%, because the cost of a miss is extremely high in terms of speed, because it opens a dialog box and asks the user, not? > ideally we could just switch of the first letter for known protocols Doesn't work for me@example.com or <http://www.bucksch.org>.
Ben your comment #11 sounds like a good compromise to me. I'll write up the pre-flighting code in the spell checker if you can give me the routine that will determine if the passed in string is a url or not. Let's make this happen =).
Ben, I take it all back. I will need the API to look like what you described. That is, I need to know the start and end position of the url in the passed in string.
Ben, I've got the spell checker code standing by. I have everything filled out except for the call to your routine which determines if it really is a url and if so, where does the url begin and where does it end. Hopefully we can nail this bug!
Ben, have you had a chance to work on exposing the API for this via bug 116242 yet? I've got the spell checker code written and ready to go. I just need the routine to call.
Not yet, sorry. I am very busy atm. Is this urgent or can it wait e.g. another week? (Dunno what your plans are, if you want to have it in a certain release and when that freeze is.)
Attached patch possible fix (obsolete) (deleted) — Splinter Review
I wanted to test the spell checking code I wrote in mozEnglishWordUtils to make sure my code worked. In order to do that, I made a FindURLInPlaintext method of my own for testing purposes. So far, things look pretty good with this patch. Ben, for your version of FindURLInPlaintext, maybe you are ok with the one I wrote or you can at least use it as a starting point and cleanup / adjust the arguments and implementation as necessary. I just wrote it up so I had something to test again. Maybe what I have is good enough or you might decide to improve on it, feel free!
Target Milestone: --- → mozilla1.5beta
This is mscott's fix, adapted to use the new function created in bug 116242. Untested for now.
*** Bug 216455 has been marked as a duplicate of this bug. ***
Attached patch Original fix, with crash fixed (deleted) — Splinter Review
My original testing code had a crasher in it. Just posting an updated patch which addresses the crash. This patch isn't intended to be checked in. A variant of Ben's patch is what we would check in. I just haven't had a chance to review it yet.
Attachment #129842 - Attachment is obsolete: true
Blocks: 219336
the 8/27 patch is now in the Thunderbird 0.3 build.
*** Bug 223788 has been marked as a duplicate of this bug. ***
Summary: don't spell check URLs in email → Don't spell check URLs and email addresses in email
I thik the target milestone mebbe needs to be updated. 1.5 has already shipped.
I'm not sure if this should be a seperate bug report but it is related. The spellchecker should not spellcheck signatures. So anything beyond the --[space] shoudl be ignored by the spellchecker. I got tired of spellchecking my name. I'm not sure if this creates problem if the signature placement is different to mine (mine is after the quoted text foa reply message)
Attachment #129881 - Attachment is obsolete: true
(In reply to comment #24) > The spellchecker should not spellcheck signatures. Bug 223440
Comment on attachment 129881 [details] [diff] [review] Different, proposed fix (by Ben Bucksch) This patch is *not* obsolete! It's an alternative to the first patch, and only the first patch has been obsoleted by the third one.
Attachment #129881 - Attachment is obsolete: false
Attachment #130509 - Attachment description: updated patch to fix a crash of the fix that should not be checked in → Original fix, with crash fixed
Attachment #129881 - Attachment description: Proposed Fix, version 2 → Different, porposed fix (by Ben Bucksch)
Attachment #129881 - Attachment description: Different, porposed fix (by Ben Bucksch) → Different, proposed fix (by Ben Bucksch)
*** Bug 235152 has been marked as a duplicate of this bug. ***
*** Bug 237277 has been marked as a duplicate of this bug. ***
QA Contact: esther → core.spelling-checker
Product: MailNews → Core
Blocks: 234936
Component: MailNews: Composition → Spelling checker
*** Bug 311012 has been marked as a duplicate of this bug. ***
No longer blocks: 234936
If this bug refers to the inline spellchecking, it is fixed now. If it is not inline, I'm not sure if it's fixed or not. If not, please reopen and clear the fixed1.8.1 keyword. Fixing it in that case would mean using the new word finding functionality used by the inline spellchecker.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
Verified fixed for 1.8.1.3 using Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.3) Gecko/20070326 Thunderbird/2.0.0.0 Mnenhy/0.7.5.0 ID:2007032620 (Thunderbird 2 RC1), verified after various tests - Email addresses and URL`s are not spell checked.
Keywords: verified1.8.1.3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: