Closed Bug 37031 Opened 27 years ago Closed 7 years ago

searching message body yields false positives because base64 encoded binary attachments are treated as plaintext

Categories

(MailNews Core :: MIME, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 59.0

People

(Reporter: lord, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: testcase)

Attachments

(1 file, 1 obsolete file)

(This bug imported from BugSplat, Netscape's internal bugsystem. It was known there as bug #70793 http://scopus.netscape.com/bugsplat/show_bug.cgi?id=70793 Imported into Bugzilla on 04/24/00 17:19)

When searching for "WFMT" in my sent box, I get lots of false hits. When I search the (false hit) e-mail, that string does not appear. When I View Source and look for that string, it does appear. Searching for a string should look in the clear text, not the base64 encoded ciphertext. -Bob

------- Additional Comments From jfriend 06/09/97 14:32 -------
And it should know about S/MIME and it should know about charsets and it should know about ... Uhhh, basically it should parse a MIME message. Latering...

------- Additional Comments From phil 06/30/97 17:35 -------
Not a 4.02 candidate

------- Additional Comments From phil 11/25/97 08:57 -------
I'm going to reopen this bug so we can reconsider this whole class of problems for 5.0. I think it would be pretty high cost to fix, but maybe someone will come up with an easy way to use the MIME parser when searching local messages.

------- Additional Comments From mscott 02/12/98 11:35 -------
*** Bug 106269 has been marked as a duplicate of this bug. ***

------- Additional Comments From momoi 02/12/98 12:21 -------
Since the bug I filed (#106269) has been tagged as a duplicate of this bug, and yet what I request as an enhancement is somewhat more specific than the original bug description, let me write in my suggestion here for basic level coverage.

1. Users expect to be able to search viewable texts. Thus, whether or not Base64 encoded, we should be able to search through text/plain or text/html type of files.
2. We should follow a simple method for dealing with different encodings. Once decoded, Base64'ed text files should be searched following the document encoding setting.

2 means that if the folder encoding is set to a Japanese one, search will be done with the assumption that the user wants to look for a Japanese string and not European strings with accents in it. And so on. (Actually this functionality is already available for cleartext in Dogbert 4.05 and later.) I believe this will produce highly usable results.

At the next level of coverage, we might want to pay attention to (among others):
A. charset parameter in the mail or attachment headers -- to speed up search
B. search in S/MIME'ed messages

------- Additional Comments From nhotta 02/12/98 13:20 -------
I think, A. charset parameter in the mail or attachment headers -- is important for Japanese. Because mail body charset and attachment charset is usually different.

------- Additional Comments From phil 02/16/98 10:02 -------
Of course, this is lower priority than the AB work.

------- Additional Comments From jfriend 03/08/98 17:30 -------
Mass change TFV to 4.5 from 5.0.

------- Additional Comments From jfriend 03/31/98 10:13 -------
Moving TFV from 5.0 to 4.5.

------- Additional Comments From laurel 05/01/98 12:07 -------
QA assigned to field.

------- Additional Comments From mscott 07/09/98 18:01 -------
(phil) not going to get to this in Nova. Marking later.

------- Additional Comments From lord 07/09/98 22:35 -------
How does RESOLVED LATER work? Shouldn't we at least assign to 5.0 so it stays on the radar?

------- Additional Comments From phil 07/10/98 10:20 -------
We regularly reopen latered bugs from previous projects. That's how this bug got open for 4.5 :-) I'd prefer not to have open 5.0 bugs that we don't plan to fix anytime soon. Having open 5.0 bugs that we plan to really work on is fine, but I don't think this falls into that category.

------- Additional Comments From marek Mar-01-2000 17:22 -------
mass-changing all Communicator and Communicator Pro bugs filed for Comm. and Nav. versions RESOLVED LATER and REMIND with TFV of 4.5 or greater.
If you feel this was done in error, please reopen the bug.
This bug should be re-opened for Mozilla and tested with IMAP and POP/Local search when these features are implemented.
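The basic coverage requested above (decode Base64 text parts first, then search them in their own charset) can be sketched as follows. This is only an illustration using Python's standard `email` package, not the Mozilla search code; the sample message, the `body_contains` helper, and the choice of `iso-2022-jp` are assumptions made up for the example.

```python
import base64
from email import message_from_bytes, policy

TEXT = "会議は金曜日です"  # hypothetical Japanese body text

# A single-part message whose text body is Base64 encoded (assumed sample).
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="iso-2022-jp"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.b64encode(TEXT.encode("iso-2022-jp"))
    + b"\r\n"
)


def body_contains(raw: bytes, needle: str) -> bool:
    """Search only decoded text/* parts, never the raw transfer encoding."""
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip containers and binary parts
        # get_content() undoes the transfer encoding and decodes the bytes
        # using the charset parameter from the part's own header.
        if needle in part.get_content():
            return True
    return False


if __name__ == "__main__":
    print(body_contains(RAW, "金曜日"))
```

Using the charset declared on each part, rather than a folder-wide setting, corresponds to item A in the comment above: the body and an attachment may legitimately use different charsets.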
Status: RESOLVED → UNCONFIRMED
QA Contact: pmock → momoi
Status: UNCONFIRMED → NEW
Ever confirmed: true
Changing repka to chrisk in cc list.
momoi, does Mozilla actually exhibit this bug in a current release?
steve, search is not implemented yet but when it is, we should test it with B64-encoded text. Bob's original problem was that in 4.x, we matched the search key string with the encoded data itself. In Mozilla, we should not be doing search on encoded data. In this sense, we probably should use this bug as a reminder to check that search works OK with Base64-encoded data. It is expected that this problem should not occur with Mozilla -- I believe whoever is implementing search on the Mail side already thought about this kind of problem.
I would really appreciate it if you would refrain from filing bugs against features that have not been implemented yet. If the feature gets implemented improperly, then filing a bug makes sense. Filing a bug just in case the feature will not be implemented properly does not make sense. A really good way to handle this is to read our QA plan and make sure it includes tests you feel are required.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → INVALID
Re-opening, resolving it as remind and assigning to momoi for tracking.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Keep as a reminder
Assignee: mscott → momoi
Status: REOPENED → NEW
Resolved as remind.
Status: NEW → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → REMIND
I'm going to re-open this bug and assign it to taka. taka, when you are fixing Bug 43221, are you going to take care of B64-encoded body text or not? You can either mark this bug as Wontfix if you're not going to allow search through B64-encoded body text or keep this bug open to check when the other bug is fixed if you plan to cover the B64 case. Re-opening and assigning it to taka -- will do that next. QA contact to ji.
Status: RESOLVED → REOPENED
QA Contact: momoi → ji
Resolution: REMIND → ---
Now that it is re-opened, it can be assigned to taka.
Assignee: momoi → taka
Status: REOPENED → NEW
As jfriend said in the very beginning, the current implementation doesn't recognize MIME multi-part message at all. I recommend to change the summary so that it is more applicable to explain what's really happening. I know what to do, but it's time consuming work, not a simple bug fix.
.
Assignee: taka → ducarroz
QA Contact: ji → stephend
Product: MailNews → Core
*** Bug 98141 has been marked as a duplicate of this bug. ***
*** Bug 298070 has been marked as a duplicate of this bug. ***
Not sure if this bug should stay as core:mailnews:mime or should be moved to core:mailnews:search.

Behavior verified on Mozilla Thunderbird Stable: version 1.5.0.7 (20060909) and Nightly: version 3 alpha 1 (20060926).

If you search the body (either using quicksearch "Entire message", or Ctrl+Shift+F for Body + Contains), it searches within the plaintext encoding of images (and other non-text attachments). This results in false positives.

Reproducible: Always

Steps to Reproduce:
1. Receive an attachment.
2. Search for a string contained in the plaintext encoding of the attachment.

Actual Results: the resulting list displays text contained in the attachment's plaintext representation, even if the attachment is not a text file, and even if the main mail message does not contain the keyword.

Expected Results: List only results that contain the keyword in the message body.

I am limiting my expectations in this Bug to what SHOULD NOT happen. Searching should not return results when keywords are found in the plaintext encoding of non-text attachments.

(updated summary: from "searching looks through base64 encoding, not the cleartext" to "searching message body should not treat base64 encoded binary attachments as plaintext"; default behavior searches message body AND the attachment, so the "not the cleartext" part of the previous summary was incorrect)
Summary: searching looks through base64 encoding, not the cleartext → searching message body should not treat base64 encoded binary attachments as plaintext
Attached file: an email that has an attachment producing the bug. (obsolete) (deleted)
.EML file (possibly not useful until after something like bug 171907 is fixed) some keywords that produce false positives: BaNg ABBA GAWD BBQ kIA logo these keywords are found in the base64 encoded attachment text, but not in the plaintext message body.
an mbox file containing an email (attachment 240168 [details]) that has an attachment producing the bug. can be imported by placing in the proper mail directory of a profile.
*** Bug 136055 has been marked as a duplicate of this bug. ***
Behavior still exists in Thunderbird 2.0 beta 2: version 2 beta 2 (20070116).

The previous "steps to reproduce" may be a bit too ambiguous. Below is an alternate set of steps, with more information about finding a string that will trigger a false positive.

Alternate Steps to Reproduce:
1) Receive any email that has a binary attachment (any non-plaintext file).
2) Select the message in the list pane. View source. Select and copy an arbitrary string within the base64-encoded representation of the attachment. Close the source window.
3) In the list view, do a quicksearch (or ctrl+f search) for "Entire Message".
4) Paste the arbitrary string that was copied earlier.

Actual Results: The search results include the email with the attachment, even if the body of the message does not contain the arbitrary string, i.e. the arbitrary string was only contained in the base64 encoding of the attachment.

Expected Results: Search results should only show messages that contain the keyword in the message body.
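The steps above can be reproduced outside Thunderbird with a small sketch. It uses Python's standard `email` package and is only meant to show the difference between searching the raw message source and searching decoded text parts; the helper names (`build_sample`, `naive_search`, `mime_aware_search`) and the sample attachment are hypothetical and not taken from the Mozilla code.

```python
import base64
from email import message_from_bytes, policy
from email.message import EmailMessage

BLOB = bytes(range(256)) * 4  # stand-in for a binary attachment (assumed data)


def build_sample() -> bytes:
    """Step 1: a message with a plain-text body and one binary attachment."""
    msg = EmailMessage()
    msg["Subject"] = "logo attached"
    msg.set_content("Please find the logo attached.")
    msg.add_attachment(BLOB, maintype="application",
                       subtype="octet-stream", filename="logo.bin")
    return msg.as_bytes()


def naive_search(raw: bytes, needle: str) -> bool:
    """Search the message source as if it were plain text (the buggy way)."""
    return needle.encode("ascii") in raw


def mime_aware_search(raw: bytes, needle: str) -> bool:
    """Search only the decoded text/* parts of the parsed message."""
    msg = message_from_bytes(raw, policy=policy.default)
    return any(
        needle in part.get_content()
        for part in msg.walk()
        if part.get_content_maintype() == "text"
    )


if __name__ == "__main__":
    raw = build_sample()
    # Step 2: copy an arbitrary string from the Base64 text of the attachment.
    needle = base64.b64encode(BLOB)[:12].decode("ascii")
    print("naive:", naive_search(raw, needle))
    print("MIME-aware:", mime_aware_search(raw, needle))
```

The naive search reports a hit because the copied string exists in the message source, while the MIME-aware search does not, which matches the expected result described above.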
OS: Windows 95 → Windows XP
Summary: searching message body should not treat base64 encoded binary attachments as plaintext → searching message body yields false positives because base64 encoded binary attachments are treated as plaintext
See also the very similar bug 132340.
So *that's* why my searches always give me a bunch of irrelevant ****. Confirmed to be present in Thunderbird version 2.0.0.6 (20070728)
Searching messages by [BODY] [contains] "word" does not find any messages with the requested word(s) if the messages are encoded as Base64, even when there are a lot of messages in the mailbox with the requested "word" visible via mail preview. Because of this, the "Message Filters..." tool also does not work correctly with the [BODY] [contains] criterion. Confirmed to be present in Thunderbird version 2.0.0.6 (20070728) and Thunderbird version 3.0a1pre (2007091203)
Note: tested on messages containing plain English text and text/html encoded with base64
For those who are not following up on bug 132340, my current fix partially resolves this bug. It will still yield these false positives if there are base64-encoded messages after the top level of a multipart/* tree. AFAICT, these cases are going to be limited in practical usage to a multipart/mixed with a message/rfc822 content with a binary attachment (e.g., I forward a message as an attachment with an image, say). This kind of case is not easy to implement without some minor refactoring of current nsMsgBodyHandler, but it is on my todo list...
Assignee: ducarroz → nobody
OS: Windows XP → All
QA Contact: stephend → mime
Hardware: PC → All
I have a message filter that inadvertently is executing its action because certain words were found in the sender's signature, which contained an inline image (a company logo). I would like the "body" "contains" filter to not search the base64 encoding of inline images.
Is this fixed by Bug 132340 – Local body search does not work if the body is encoded as Base64 ?
(In reply to comment #28)
> Is this fixed by Bug 132340 – Local body search does not work if the body is
> encoded as Base64 ?

Not completely, see comment 25.
12 years and 400,000 bugs later this still exists.... will it ever get fixed?
(In reply to comment #32)
> 12 years and 400,000 bugs later this still exists.... will it ever get fixed?

This is fixed in the most common and simplest cases thanks to bug 132340; a complete fix is probably waiting on some libmime sanity. Besides, many of those 400k bugs haven't been fixed yet either, and there are older unfixed bugs (bug 11054 and bug 11050 pertain to mailnews, and bug 915 and bug 350 are really old unfixed bugs).
Is the testcase still valid for this bug? Or, was it fixed by bug 132340?
Flags: blocking-thunderbird3?
For the testcase, I don't see the bug anymore in TB3. bug 132340 was marked as fixed on 12/1, so I used that date when downloading nightlies.

bug behavior seen
--------------------
2007-11-30 http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2007/11/2007-11-30-03-trunk/thunderbird-3.0a1pre.en-US.win32.zip version 3.0a1pre (2007113003)

bug behavior not seen
------------------------
2007-12-01 http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2007/12/2007-12-01-03-trunk/thunderbird-3.0a1pre.en-US.win32.zip version 3.0a1pre (2007120103)
2008-07-29 http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-trunk/thunderbird-3.0a2pre.en-US.win32.zip Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1a2pre) Gecko/2008072908 Shredder/3.0a2pre
As the person who fixed bug 132340, let me explain the situation: The fix for bug 132340 did not correct all the cases. The fully correct case would rely on libmime, which is unwieldy for correct MIME usages, which is what we want. I have left this bug open as the bug for fully fixing the problem (otherwise it would have been a dupe of bug 132340).
Product: Core → MailNews Core
As a side effect, a filter "body contains X" can't match message encoded in base64.
Comment on attachment 240168 [details] an email that has an attachment producing the bug. The attachment here was fixed by bug 132340 (as mentioned above). That said, the fix that bug 132340 did only parsed an upper-level multipart, assuming that anything below the second level would be textual, so nested multiparts containing base64 would still trigger this bug. The most common case of this happening that I can come up with right now would be a message with a base64-encoded message (like a Word document) forwarded as an attachment.
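The remaining case described above (a forwarded message attached as message/rfc822 that itself carries a Base64 attachment) can be illustrated with a short sketch. It uses Python's standard `email` package, not nsMsgBodyHandler or libmime; `shallow_texts` is a deliberately limited helper invented here to show what looking only at the outermost parts misses, and `deep_texts` walks the whole MIME tree.

```python
import base64
from email import message_from_bytes, policy
from email.message import EmailMessage

DOC = bytes(range(256))  # stand-in for a binary document (assumed data)

# Inner message: text body plus a binary attachment.
inner = EmailMessage()
inner["Subject"] = "report"
inner.set_content("Quarterly numbers enclosed.")
inner.add_attachment(DOC, maintype="application",
                     subtype="msword", filename="report.doc")

# Outer message: forwards the inner message as a message/rfc822 attachment.
outer = EmailMessage()
outer["Subject"] = "Fwd: report"
outer.set_content("Forwarding as attachment.")
outer.add_attachment(inner)

parsed = message_from_bytes(outer.as_bytes(), policy=policy.default)


def shallow_texts(msg):
    """Decoded text of the outermost parts only (hypothetical limited search)."""
    parts = msg.iter_parts() if msg.is_multipart() else [msg]
    return [p.get_content() for p in parts
            if p.get_content_maintype() == "text"]


def deep_texts(msg):
    """Decoded text of every text/* part, including nested messages."""
    return [p.get_content() for p in msg.walk()
            if p.get_content_maintype() == "text"]


def found(texts, needle):
    return any(needle in t for t in texts)


if __name__ == "__main__":
    b64_fragment = base64.b64encode(DOC)[:12].decode("ascii")
    print(found(shallow_texts(parsed), "Quarterly"))
    print(found(deep_texts(parsed), "Quarterly"))
    print(found(deep_texts(parsed), b64_fragment))
```

Walking the full tree lets the nested text body be searched while the nested Base64 payload is skipped because its content type is not text/*.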
Attachment #240168 - Attachment is obsolete: true
Given updated status in comment #40, I don't think this blocks.
Flags: blocking-thunderbird3? → blocking-thunderbird3-
Priority: P2 → --
Blocks: 471328
No longer blocks: 471328
(In reply to comment #41)
> Given updated status in comment #40, I don't think this blocks.

I don't see why not. This is pretty common (this happened to me right now).
Olivier Faurax wrote : > a filter "body contains X" can't match message encoded in base64 Confirmed with Thunderbird 3.1.7. Spammers seem to take advantage of this bug to avoid filtering on body.
Got the same problem: I'm not sure if the message has an attachment (at least I cannot see any "attachment icon"), however it's a kind of "a multipart message" (as per message's header):

Content-Type: multipart/alternative; boundary="_000_4ADEFF0A9090607..._"
MIME-Version: 1.0

These parts go as follows:

--_000_4ADEFF0A9090607..._
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

RG9i...DQo=

--_000_4ADEFF0A9090607..._
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: base64

PCFE...NCg==

--_000_4ADEFF0A9090607..._--

And when I search for the "bottom" string I got this message as there's a line within the html formatting section as follows:

P { margin-bottom: 0.21cm }

And this is NOT correct.

PS Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.9.2.17) Gecko/20110414 Lightning/1.0b2 Thunderbird/3.1.10 - Build ID: 20110414112900
(In reply to comment #47)
> And when I search for the "bottom" string I got this message as there's a
> line within the html formatting section as follows:
> P { margin-bottom: 0.21cm }
>
> And this is NOT correct.

That is a distinct bug from what is being tracked here, since that is being caused by the body filter erroneously finding a match in <style> (and presumably <script>) tags which aren't filtered out to the screen, as opposed to the search matching the raw base64 encoding string.
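To make the distinction concrete, here is a minimal sketch of extracting only the displayed text from an HTML part, so that the contents of <style> and <script> are not searched. It uses Python's standard `html.parser` module; the `VisibleText` class and `visible_text` helper are hypothetical names for this example and do not describe how Thunderbird's body filter works.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect character data, ignoring anything inside <style> or <script>."""

    SKIP = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the result is easy to search.
    return " ".join(" ".join(parser.chunks).split())


SAMPLE = (
    "<html><head><style>P { margin-bottom: 0.21cm }</style></head>"
    "<body><p>Hello there</p></body></html>"
)

if __name__ == "__main__":
    print(visible_text(SAMPLE))
```

With this approach a search for "bottom" would not match the sample, because that word occurs only inside the style sheet.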
> That is a distinct bug from what is being tracked here, since that is being
> caused by the body filter erroneously finding match in <style> (and
> presumably <script>) tags which aren't filtered out to the screen, as
> opposed to the search matching the raw base64 encoding string.

Should I report a new bug then?
@hawran: I'd say yes.
(In reply to Joshua Cranmer [:jcranmer] from comment #48)
> since that is being caused by the body filter erroneously finding match in <style>
> (and presumably <script>) tags which aren't filtered out to the screen,
(snip)

Bug 628098 (listed in the dependency tree for meta bug 519202) seems to be the report for the <style> case. I recently reopened that bug, because I thought it was wrongly dupe'ed to bug 379988. IIRC, I saw a similar report or comment in one or a few other bugs, but I'm not sure.
The same problem with quoted-printable has been found by bug 481616. (See bug 481616 comment #12, please.) It looks to me as follows.

                                   base64                 quoted-printable
(a) body of non-multipart mail     fixed by bug 132340    found by bug 481616
(b) body part of multipart mail    fixed by bug 132340    found by bug 481616
(c) sub parts of multipart mail    bug 37031 (this bug)   found by bug 481616

Should we keep bug 481616 separately for (c) on quoted-printable, apart from this bug for base64? Or can this bug be extended to cover both the base64 case of (c) and the quoted-printable case of (c)?
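The quoted-printable variant can be shown with a small sketch. It uses Python's standard `quopri` module on an invented sample body; it only illustrates why searching the encoded form misses or distorts matches and is not related to the code in either bug.

```python
import quopri

# Hypothetical quoted-printable body: a soft line break ("=" at end of line)
# splits the word "report", and "é" is written as the escape =C3=A9.
ENCODED = b"The quarterly rep=\r\nort is ready. Caf=C3=A9 closed."

# Undo the transfer encoding, then decode the bytes with the part's charset
# (UTF-8 is assumed here).
DECODED = quopri.decodestring(ENCODED).decode("utf-8")

if __name__ == "__main__":
    print(b"report" in ENCODED)
    print("report" in DECODED)
    print("Café" in DECODED)
```

As with Base64, the search has to run on the decoded text of each part; the soft line break and the escaped bytes both hide words that are visible when the message is displayed.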
I fixed this in bug 1259534.
Status: NEW → RESOLVED
Closed: 24 years ago → 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 59.0