Closed Bug 26915 Opened 25 years ago Closed 25 years ago

Improve performance of TXT->HTML converter for beta1

Categories

(MailNews Core :: MIME, defect, P2)

defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: BenB, Assigned: BenB)

Details

(Keywords: perf, Whiteboard: [PDT+])

Attachments

(7 files)

Reading a 100K plain text and sending a 100K HTML msg currently roughtly takes 10s on an optimized Linux build. Mailnews heads for good performance for M14/beta1. Per Phil's request, I'll try to bring the numbers down, but no promises about quantity.
Status: NEW → ASSIGNED
Priority: P3 → P2
Target Milestone: M14
Keywords: perf
Depends on: 24945
Update: time for a normal (normal english text) 100K msg is 1.5 - 2 s, which I argue is good, considering that we do 3 times as much as 4.x did (URLs + structs + glyphs). 2 other test 100K msgs of mine, one with C++ source code and one with many URLs, don't look as good: 5s and 9s. I'm running out of ideas, but I keep looking.
Blocked by 27335
Depends on: 27335
Update: The 5s/9s may not be caused by the converter: I tried to disable all functions ans still get numbers in that area. Of course, I'm only guessing because of 27335. - Still blocked by 27335. - Did a bit general "performance audit", i.e. looking at the code for perf bugs and possibilities to improve perf. - I'm loosing a lot of time, because the tree is in an inferior state. Currently, I'm not even able to start up mail, because both -mail and menus are broken. - I spent approx. one work week (40hs) on this bug, half of it for organisational issues like building (see above) and communication, the other half for hacking/testing. Because the tree closes soon, I'll attach my current state of work, although I don't know much about the effectivity. Check it in or not.
No longer depends on: 27335
Have some kind of workaround for performance measurement: ScanHTML is fed with the full msg text, so I can indirectly measure the converter speed without download/charset conv/display overhead. I also have a somewhat working Mailnews now, doing last tests. Patch will follow soon.
No longer depends on: 24945
Adding 22960 as dependency. mscott, remove it, if you disagree. Adding beta1 for checkin approval. The patch is against the current tree and includes the fix for 21203. Some performance numbers: Linux debug build from 2000-02-15 22:00. It's the time spent in ScanHTML, which includes ScanTXT, but *not* the gylph substitution (which isn't done on mail send), so add approx. 50%. That are the most exact numbers I have due to 27335. Code version RightRewrite AllPerfWork Mail Size-matters 16s 1.1s English 1.7s 1.1s URLs 1.6s 1.2s Code 14s 5.5s RightRewrite Tree version + fix for 21203 AllPerfWork The (soon to be) attached patch Size-Matters chofmann's statistics English Plain english text (copied from Communicator help) URLs Mixed text containing 700 URLs Code mozTXTToHTMLConv (C++) source code 3 times All mails are ~100K in size. Processing time should be directly proportional to the size now, i.e. a 1K normal msg needs ~10ms, a 1MB msg ~11s. I suggest to check the patch in for M14 (after careful review, of course).
Blocks: 22960
Keywords: beta1
Attached patch Current state of work (deleted) — Splinter Review
Putting on the PDT+ radar for beta1.
Whiteboard: [PDT+]
Putting on the PDT+ radar for beta1.
Whiteboard: [PDT+] → [PDT+] Patch available, waiting for review and check in
Ben I tried to apply your patch but it continually failed. I'm not sure how you generated it, but can you do the following in that directory for me: cvs diff -c > converterPatch.diff Then attach this file to the bug (or email it to me). For some reason patch isn't working on the patch that is currently here. Thanks!
Sorry, I diffed against a local file. Patches now also include the changed comments in mozITXTToHTMLConv.idl and mozTXTToHTMLConv.h and a minor code change in the latter, that I missed earlier.
Ben, are you waiting for a code review?
Yes
mscott? If I understtod it right, M14 will be released today...
Keywords: patch
Yes, this won't make M14. Maybe beta1 though.
I actually attached a patch to Bug #21203 which contains Ben's changes in this bug + some more stuff that I made. That patch would fix the performance characteristics discussed in this bug too.
Narrowing scope of bug.
Summary: Improve performance of TXT->HTML converter → Improve performance of TXT->HTML converter for beta1
Al[My]PerfWork mscott size-matters 1.1s 0.37s large text (english) 1.1s 1.5s large text 100K (clip) 1.2s 3.2s large msg (code) 5.5s 1.5s mscott With mscott's patch recently attached to bug #21203. OtherIdentifers See above.
Sorry, wring identifers. Next try: All[My]PerfWork mscott Size-Matters 1.1s 0.37s English 1.1s 1.5s URLs 1.2s 3.2s Code 5.5s 1.5s
Ben, can you send me more info about the tests you used under URL and English so I can analyze why with my code mixed with yours we are slower? Thanks.
Attached file Test msg "English" (deleted) —
mscott, I attached one of the test msgs. Code and Size-matters shouldn't be a problem. URLs however is my "clipboard" (20 times), so I can't attach it. I think, the most interesting thing about it is one line (20 times of couse) containing 30 email addresses comma-delimited. To run the test, just uncomment |#define DEBUG_BenB_Pref| at the top. You'll see something like "ScanHTML time: xxxx" on the console at the send operation, with xxxx in ms. Again, I don't think, forcing string pointers into this class is the right way. If we need to improve performance further, we could e.g. pass offsets into the string to the functions instead of |mozRightSubstring|s. Or fix bug #23327, which would virtually all of GlyphHit and helper functions (and maybe even StructPhraseHit) out to regexps. For the speed decrease of your patch: It might be caused by your rewrite of (Un)Escape (although I didn't check that). You now walk at least two more times through the whole msg.
I poked a couple more things and ran Ben's English test. Keep in mind that my machine is faster than Ben's so these numbers don't match the numbers he added. Ben, if you have time to do another pass with these tests on this new patch, I'd greatly appreciate it. My 1st Patch This new patch English: 680ms 596ms (By English test, I meant the 1st message in the attachment you added to this bug titled "test: bold, italic").
Ops, sorry, I meant the last, very long (100K) msg, titled "large text". I didn't notice, that there are other msgs in the file. I'm getting 34ms for the msg you tested ("Re: test: *bold*, _italic_") and your first patch (I'll test the newer one later), so there must be something wrong. BTW: I optimized my function for english (or any other western language) text, because I thought and still think, this would be the vastly most important case. That's the reason, why my version is slower for source code.
Ben, I wonder if the differences in the values we are each generating are because of how the performance time is measured. I was using the following line you had in ScanHTML: printf("ScanHTML time: %d ms\n", PR_IntervalNow() - parsing_start); However PR_IntervalNow returns a value which isn't in milliseconds but instead is a number based on your clock speed. So the numbers would vary widely between different machines. I think you need to convert that number into milliseconds by doing the following: printf("ScanHTML time: %d ms\n", PR_IntervalToMilliseconds(PR_IntervalNow() - parsing_start));
With the above mentioned change to properly convert the interval into milliseconds, I now see 10ms with my latest patch to display the text, bold, etc. message 2.1s to display the large english text message (95KB)
Although the changes are very numerous, I'm convinced that this is a great checkin, and we should just land it. Lots of N-squared algorithms removed... great results... and it looks like it was done with the lowest possible risk, and a tremendous gain. When can we see this landed? Please update the status whiteboard with a landing date. Thanks, Jim
The changes I made to Unescape to do it in place was way slow. I rewrote it. With the latest changes and properly converting the interval to milliseconds on my machine, here are the numbers I'm now seeing: My Patch Ben's Original Patch Size Matters msg.(140K) 300ms 600ms Test: bold, italic msg. 9ms 41ms the url msg 1ms 3ms English msg (95k) 1.9s!! 829ms I'm looking into why the english message is slow for me right now. I'm attaching the set of diffs I used to generate these numbers.
Attached patch round 3. (deleted) — Splinter Review
I landed ben's patch + my own tonight yielding the numbers listed in the table above. This has us in the ball park for beta1 so I'm going to mark this fixed. Ben if you want to work on this more or something after beta we can re-open it and move it to M15 or something.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
- What this checked in for M14 or the trunk? - I get unrealistic perf numbers from ScanHTML (< 50 ms) for the test cases URLs and source code. Might be related to bug #27991, but this bug is too old for this problem, the latter didn't show up at 2000-03-01 05:21 (see my comment above). Did somebody change something? - Filed bug #30232 about further perf work.
> I get unrealistic perf numbers from ScanHTML With only my patch, the numbers are ok.
Ben: the stuff I checked in last night only went into the beta1 tip and not the M14 branch.
Who is the new owner of mozTXTToHTMLConv?
Keywords: patch
Whiteboard: [PDT+] Patch available, waiting for review and check in → [PDT+]
BTW: Please, don't load it on Rich (unless he really wants to take it).
suresh - what are your recent timings on this?
Lisa - I do not have numbers for this. I think this bug is about just the time spent in ScanHTML.
Per a mail from Ben...ok to mark Verified
Status: RESOLVED → VERIFIED
No longer blocks: 22960
Product: MailNews → Core
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: