Opened 3 years ago

Closed 3 years ago

#25103 closed enhancement (fixed)

Improve webstats performance

Reported by: iwakeh Owned by: metrics-team
Priority: Medium Milestone:
Component: Metrics/Library Version:
Severity: Normal Keywords:
Cc: Actual Points:
Parent ID: #25100 Points:
Reviewer: Sponsor:

Description

See parent for background.

Child Tickets

Change History (15)

comment:1 Changed 3 years ago by iwakeh

Status: newneeds_review

Please review the last commit on the current release branch. The commit avoid preparing the date string prematurely.

comment:2 Changed 3 years ago by karsten

Status: needs_reviewmerge_ready

Commit f16477f looks good. Not merging to master yet, in case there are even more/better changes to improve webstats performance. But if not, this one is good to go.

comment:3 Changed 3 years ago by iwakeh

Please find two more commits for review. The second one needs an adaption in CollecTor; results will be summarized on ticket #25100.

comment:4 Changed 3 years ago by karsten

Status: merge_readyneeds_revision

Commit 841e667 should print 0 as 0, not as -. Basically, it should be this.size >= 0 in two places. Can you prepare a fixup commit for that?

Commit f6cfd4b looks fine.

comment:5 in reply to:  4 Changed 3 years ago by iwakeh

Status: needs_revisionneeds_review

Replying to karsten:

Commit 841e667 should print 0 as 0, not as -. Basically, it should be this.size >= 0 in two places. Can you prepare a fixup commit for that?

Good catch! Please review another commit fixing this.

comment:6 Changed 3 years ago by karsten

Status: needs_reviewmerge_ready

Fixup commit 0a44786 looks good!

comment:7 Changed 3 years ago by iwakeh

Status: merge_readyneeds_review

Please review this commit (based on the branch used above).

This optimization of the log lines memory footprint will also benefit other API users when processing log descriptors. 'WebServerAccessLogLine' collects different items and uses references and is applied for fields that cannot become an enum type, e.g., there are usually only 'HTTP/1.0' and 'HTTP/1.1' as protocols, but we chose to also accommodate/allow others to be valid, too.

I still have tests running importing CollecTor webstats logs. An example of the gain these changes offer: A heap dump of approx. 7.5G containing 73*10^6 WebServerAccessLogLines only contains about
51*10^3 String instances and less than 10^4 LocalDate instances. As the heap is from importing logs with CollecTor most of the LocalDate instances are parts of paths and the dateList contains just the 327 dates of the days available for import.

I'll post more regarding CollecTor's webstat module on #25100.

There is also another commit providing hashCode and equals implementations for future use, but they not used currently.

Last edited 3 years ago by iwakeh (previous) (diff)

comment:8 Changed 3 years ago by karsten

Interesting approach. I'd probably have stored the int value rather than the T reference. But I see how your approach should work as well.

One thing, though: you're never using the int value returned by numFromList other than for immediately looking up the reference in the list. Why not use a HashSet instead and change numFromList to fromSet that returns the T reference directly? Just in case there are more than just a few thousand entries in that list. And even with those, the set might be faster.

comment:9 in reply to:  8 ; Changed 3 years ago by iwakeh

Replying to karsten:

Interesting approach. I'd probably have stored the int value rather than the T reference. But I see how your approach should work as well.

One thing, though: you're never using the int value returned by numFromList other than for immediately looking up the reference in the list. Why not use a HashSet instead and change numFromList to fromSet that returns the T reference directly? Just in case there are more than just a few thousand entries in that list. And even with those, the set might be faster.

My choice of collection was rather a reflex. Maybe, I stared at logs and heap dumps for too long, but how do you retrieve a particular object reference from a Set?

comment:10 in reply to:  9 ; Changed 3 years ago by karsten

Replying to iwakeh:

Replying to karsten:

Interesting approach. I'd probably have stored the int value rather than the T reference. But I see how your approach should work as well.

One thing, though: you're never using the int value returned by numFromList other than for immediately looking up the reference in the list. Why not use a HashSet instead and change numFromList to fromSet that returns the T reference directly? Just in case there are more than just a few thousand entries in that list. And even with those, the set might be faster.

My choice of collection was rather a reflex. Maybe, I stared at logs and heap dumps for too long, but how do you retrieve a particular object reference from a Set?

Fair point. I guess what I meant is a HashMap<T, T>, not a HashSet<T>. Please find commit 0b501e2 in my task-25103 branch that is based on yours. Mostly untested, except for running ant checks test.

comment:11 in reply to:  10 Changed 3 years ago by iwakeh

Replying to karsten:

...

Fair point. I guess what I meant is a HashMap<T, T>, not a HashSet<T>. Please find commit 0b501e2 in my task-25103 branch that is based on yours. Mostly untested, except for running ant checks test.

Good idea! This also avoids the awkward -1 'treatment'.
Please find two commits with a tweak and setting the initial capacities for request and date maps to accommodate a year worth of data before re-hashing. It also adds a null-check to fromMap. The methods that trigger adding to the value map are only used internally and a null value there is a 'pathology'. It would cause wrong log lines or failure later down the processing line. Better to fail early here.

comment:12 Changed 3 years ago by karsten

Both commits, cefee8b and 98eeca2, look good to me!

I didn't start running more tests yet. If you want to run more tests yourself, go for it!

comment:13 in reply to:  12 Changed 3 years ago by iwakeh

Replying to karsten:

Both commits, cefee8b and 98eeca2, look good to me!

I didn't start running more tests yet. If you want to run more tests yourself, go for it!

I am at it. Guess, the test method of just importing logs is save regarding the fact that the developer shouldn't test at the same time.

comment:14 Changed 3 years ago by karsten

Status: needs_reviewmerge_ready

Sounds good! Pushed to master! We can leave this open for additional tests or close it, up to you. Thanks!

comment:15 Changed 3 years ago by iwakeh

Resolution: fixed
Status: merge_readyclosed

If anything new comes up, it should be a new ticket.

Closing. Thanks!

Note: See TracTickets for help on using tickets.