Closed
Bug 26915
Opened 25 years ago
Closed 25 years ago
Improve performance of TXT->HTML converter for beta1
Categories
(MailNews Core :: MIME, defect, P2)
MailNews Core
MIME
Tracking
(Not tracked)
VERIFIED
FIXED
M14
People
(Reporter: BenB, Assigned: BenB)
Details
(Keywords: perf, Whiteboard: [PDT+])
Attachments
(7 files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
message/rfc822
|
Details | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review |
Reading a 100K plain text and sending a 100K HTML msg currently roughtly takes
10s on an optimized Linux build.
Mailnews heads for good performance for M14/beta1. Per Phil's request, I'll try
to bring the numbers down, but no promises about quantity.
Assignee | ||
Updated•25 years ago
|
Status: NEW → ASSIGNED
Priority: P3 → P2
Target Milestone: M14
Assignee | ||
Comment 1•25 years ago
|
||
Update: time for a normal (normal english text) 100K msg is 1.5 - 2 s, which I
argue is good, considering that we do 3 times as much as 4.x did (URLs + structs
+ glyphs).
2 other test 100K msgs of mine, one with C++ source code and one with many URLs,
don't look as good: 5s and 9s.
I'm running out of ideas, but I keep looking.
Assignee | ||
Comment 3•25 years ago
|
||
Update:
The 5s/9s may not be caused by the converter: I tried to disable all functions
ans still get numbers in that area. Of course, I'm only guessing because of
27335.
- Still blocked by 27335.
- Did a bit general "performance audit", i.e. looking at the code for perf bugs
and possibilities to improve perf.
- I'm loosing a lot of time, because the tree is in an inferior state.
Currently, I'm not even able to start up mail, because both -mail and menus are
broken.
- I spent approx. one work week (40hs) on this bug, half of it for
organisational issues like building (see above) and communication, the other
half for hacking/testing.
Because the tree closes soon, I'll attach my current state of work, although I
don't know much about the effectivity. Check it in or not.
Assignee | ||
Comment 4•25 years ago
|
||
Have some kind of workaround for performance measurement: ScanHTML is fed with
the full msg text, so I can indirectly measure the converter speed without
download/charset conv/display overhead.
I also have a somewhat working Mailnews now, doing last tests. Patch will follow
soon.
Assignee | ||
Comment 5•25 years ago
|
||
Adding 22960 as dependency. mscott, remove it, if you disagree.
Adding beta1 for checkin approval.
The patch is against the current tree and includes the fix for 21203.
Some performance numbers:
Linux debug build from 2000-02-15 22:00. It's the time spent in ScanHTML, which
includes ScanTXT, but *not* the gylph substitution (which isn't done on mail
send), so add approx. 50%. That are the most exact numbers I have due to 27335.
Code version RightRewrite AllPerfWork
Mail
Size-matters 16s 1.1s
English 1.7s 1.1s
URLs 1.6s 1.2s
Code 14s 5.5s
RightRewrite Tree version + fix for 21203
AllPerfWork The (soon to be) attached patch
Size-Matters chofmann's statistics
English Plain english text (copied from Communicator help)
URLs Mixed text containing 700 URLs
Code mozTXTToHTMLConv (C++) source code 3 times
All mails are ~100K in size. Processing time should be directly proportional to
the size now, i.e. a 1K normal msg needs ~10ms, a 1MB msg ~11s.
I suggest to check the patch in for M14 (after careful review, of course).
Assignee | ||
Comment 6•25 years ago
|
||
Assignee | ||
Updated•25 years ago
|
Whiteboard: [PDT+] → [PDT+] Patch available, waiting for review and check in
Comment 9•25 years ago
|
||
Ben I tried to apply your patch but it continually failed. I'm not sure how you
generated it, but can you do the following in that directory for me:
cvs diff -c > converterPatch.diff
Then attach this file to the bug (or email it to me).
For some reason patch isn't working on the patch that is currently here. Thanks!
Assignee | ||
Comment 10•25 years ago
|
||
Sorry, I diffed against a local file. Patches now also include the changed
comments in mozITXTToHTMLConv.idl and mozTXTToHTMLConv.h and a minor code change
in the latter, that I missed earlier.
Assignee | ||
Comment 11•25 years ago
|
||
Assignee | ||
Comment 12•25 years ago
|
||
Assignee | ||
Comment 13•25 years ago
|
||
Comment 14•25 years ago
|
||
Ben, are you waiting for a code review?
Assignee | ||
Comment 15•25 years ago
|
||
Yes
Assignee | ||
Comment 16•25 years ago
|
||
mscott? If I understtod it right, M14 will be released today...
Keywords: patch
Comment 17•25 years ago
|
||
Yes, this won't make M14. Maybe beta1 though.
Comment 18•25 years ago
|
||
I actually attached a patch to Bug #21203 which contains Ben's changes in this
bug + some more stuff that I made. That patch would fix the performance
characteristics discussed in this bug too.
Assignee | ||
Comment 19•25 years ago
|
||
Narrowing scope of bug.
Summary: Improve performance of TXT->HTML converter → Improve performance of TXT->HTML converter for beta1
Assignee | ||
Comment 20•25 years ago
|
||
Al[My]PerfWork mscott
size-matters 1.1s 0.37s
large text (english) 1.1s 1.5s
large text 100K (clip) 1.2s 3.2s
large msg (code) 5.5s 1.5s
mscott With mscott's patch recently attached to bug #21203.
OtherIdentifers See above.
Assignee | ||
Comment 21•25 years ago
|
||
Sorry, wring identifers. Next try:
All[My]PerfWork mscott
Size-Matters 1.1s 0.37s
English 1.1s 1.5s
URLs 1.2s 3.2s
Code 5.5s 1.5s
Comment 22•25 years ago
|
||
Ben, can you send me more info about the tests you used under URL and English so
I can analyze why with my code mixed with yours we are slower? Thanks.
Assignee | ||
Comment 23•25 years ago
|
||
Assignee | ||
Comment 24•25 years ago
|
||
mscott,
I attached one of the test msgs. Code and Size-matters shouldn't be a problem.
URLs however is my "clipboard" (20 times), so I can't attach it. I think, the
most interesting thing about it is one line (20 times of couse) containing 30
email addresses comma-delimited.
To run the test, just uncomment |#define DEBUG_BenB_Pref| at the top. You'll see
something like "ScanHTML time: xxxx" on the console at the send operation, with
xxxx in ms.
Again, I don't think, forcing string pointers into this class is the right way.
If we need to improve performance further, we could e.g. pass offsets into the
string to the functions instead of |mozRightSubstring|s. Or fix bug #23327,
which would virtually all of GlyphHit and helper functions (and maybe even
StructPhraseHit) out to regexps.
For the speed decrease of your patch: It might be caused by your rewrite of
(Un)Escape (although I didn't check that). You now walk at least two more times
through the whole msg.
Comment 25•25 years ago
|
||
I poked a couple more things and ran Ben's English test. Keep in mind that my
machine is faster than Ben's so these numbers don't match the numbers he added.
Ben, if you have time to do another pass with these tests on this new patch, I'd
greatly appreciate it.
My 1st Patch This new patch
English: 680ms 596ms
(By English test, I meant the 1st message in the attachment you added to this
bug titled "test: bold, italic").
Comment 26•25 years ago
|
||
Assignee | ||
Comment 27•25 years ago
|
||
Ops, sorry, I meant the last, very long (100K) msg, titled "large text". I
didn't notice, that there are other msgs in the file.
I'm getting 34ms for the msg you tested ("Re: test: *bold*, _italic_") and your
first patch (I'll test the newer one later), so there must be something wrong.
BTW: I optimized my function for english (or any other western language) text,
because I thought and still think, this would be the vastly most important case.
That's the reason, why my version is slower for source code.
Comment 28•25 years ago
|
||
Ben, I wonder if the differences in the values we are each generating are
because of how the performance time is measured. I was using the following line
you had in ScanHTML:
printf("ScanHTML time: %d ms\n", PR_IntervalNow() - parsing_start);
However PR_IntervalNow returns a value which isn't in milliseconds but instead
is a number based on your clock speed. So the numbers would vary widely between
different machines.
I think you need to convert that number into milliseconds by doing the following:
printf("ScanHTML time: %d ms\n", PR_IntervalToMilliseconds(PR_IntervalNow()
- parsing_start));
Comment 29•25 years ago
|
||
With the above mentioned change to properly convert the interval into
milliseconds, I now see
10ms with my latest patch to display the text, bold, etc. message
2.1s to display the large english text message (95KB)
Comment 30•25 years ago
|
||
Although the changes are very numerous, I'm convinced that this is a great
checkin, and we should just land it. Lots of N-squared algorithms removed...
great results... and it looks like it was done with the lowest possible risk,
and a tremendous gain. When can we see this landed? Please update the status
whiteboard with a landing date.
Thanks,
Jim
Comment 31•25 years ago
|
||
The changes I made to Unescape to do it in place was way slow. I rewrote it.
With the latest changes and properly converting the interval to milliseconds on
my machine, here are the numbers I'm now seeing:
My Patch Ben's Original Patch
Size Matters msg.(140K) 300ms 600ms
Test: bold, italic msg. 9ms 41ms
the url msg 1ms 3ms
English msg (95k) 1.9s!! 829ms
I'm looking into why the english message is slow for me right now. I'm attaching
the set of diffs I used to generate these numbers.
Comment 32•25 years ago
|
||
Comment 33•25 years ago
|
||
I landed ben's patch + my own tonight yielding the numbers listed in the table
above. This has us in the ball park for beta1 so I'm going to mark this fixed.
Ben if you want to work on this more or something after beta we can re-open it
and move it to M15 or something.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 34•25 years ago
|
||
- What this checked in for M14 or the trunk?
- I get unrealistic perf numbers from ScanHTML (< 50 ms) for the test cases URLs
and source code. Might be related to bug #27991, but this bug is too old for
this problem, the latter didn't show up at 2000-03-01 05:21 (see my comment
above). Did somebody change something?
- Filed bug #30232 about further perf work.
Assignee | ||
Comment 35•25 years ago
|
||
> I get unrealistic perf numbers from ScanHTML
With only my patch, the numbers are ok.
Comment 36•25 years ago
|
||
Ben: the stuff I checked in last night only went into the beta1 tip and not the
M14 branch.
Assignee | ||
Comment 37•25 years ago
|
||
Who is the new owner of mozTXTToHTMLConv?
Keywords: patch
Whiteboard: [PDT+] Patch available, waiting for review and check in → [PDT+]
Assignee | ||
Comment 38•25 years ago
|
||
BTW: Please, don't load it on Rich (unless he really wants to take it).
Comment 39•25 years ago
|
||
suresh - what are your recent timings on this?
Comment 40•25 years ago
|
||
Lisa - I do not have numbers for this. I think this bug is about just the time
spent in ScanHTML.
Updated•20 years ago
|
Product: MailNews → Core
Updated•16 years ago
|
Product: Core → MailNews Core
You need to log in
before you can comment on or make changes to this bug.
Description
•