EPOL-19: Parse latency histogram #5
Request to add the following information in the comments:
6b4ef01 to f6caa3d
debug_multiline.log
While rebuilding PROX, I encountered a build failure caused by undeclared PTHREAD symbols in display.c and lconf.c.
f6caa3d to d01ad7c
Added #include <pthread.h> in display.c and lconf.c to resolve undeclared PTHREAD issues.
Can you measure the impact of incorporating this in terms of CPU cycles spent? Also, is it correct to assume that this parsing computation is done by the master process only? Do the measurements under full load and update Confluence with the results:
Get the required infrastructure and a time window to run this. We need to make sure you have exclusive access, the cluster is tuned, and nothing else is running on the cluster while we are measuring.
Brief summary: With the new changes, packet loss with 1 queue stays about the same as without them, around 0.01%. With 3 queues, however, the loss is significantly higher: 0.01% without the changes vs 0.075% with them. The change adds 2 simple getters that are used while handling the lat all stats command, but the impact is observed even when the command has not been invoked. With 3 cores, the Tx/Rx bandwidth increases, but latency also rises: from 14336-16383 TSC to 20480-22527 TSC and higher (shifting from buckets 7 and 8 to buckets 9, 10 and above).
Report for manual testing (with 100 Gb/s NIC yet):
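The TSC ranges quoted in the summary line up with 0-based bucket indices if each bucket is 2048 TSC ticks wide. That width is inferred from the numbers above, not taken from the PROX sources, and the 2.0 GHz TSC frequency in the usage note is purely illustrative. A minimal sketch of the arithmetic, with hypothetical helper names:

```python
# Assumption: bucket width inferred from "14336-16383 TSC" being bucket 7.
BUCKET_WIDTH_TSC = 2048


def bucket_range_tsc(index, width=BUCKET_WIDTH_TSC):
    """Return the inclusive (low, high) TSC range of a 0-based bucket."""
    low = index * width
    return low, low + width - 1


def tsc_to_ns(tsc, tsc_hz):
    """Convert a TSC tick count to nanoseconds for a given TSC frequency."""
    return tsc * 1e9 / tsc_hz
```

For example, `bucket_range_tsc(7)` gives `(14336, 16383)` and `bucket_range_tsc(10)` gives `(20480, 22527)`. On a machine with a 2.0 GHz TSC (an assumed value), `tsc_to_ns(14336, 2_000_000_000)` is 7168.0 ns, so the observed shift would be from roughly 7 µs to 10 µs and above.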
    try:
        # lat all stats sends +-130 lines (128 buckets + stats lines)
        lines_received = 0
        max_expected_lines = 130
If the bucket count increases, this will fail, right?
Why don't we run the loop until chunk is None?
That would cause an infinite loop, since recv() never returns None: it returns empty bytes or raises an exception. It does work with a short timeout, though, so I added a 2-second timeout to collect all histogram data.
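The timeout-based approach from this reply can be sketched as follows. This is not the PR's `_recv_multiline`; the function name `recv_until_quiet`, the buffer size, and the ASCII decoding are assumptions made for illustration. The loop does not depend on a fixed line count, so it keeps working if the number of buckets changes.

```python
import socket


def recv_until_quiet(sock, timeout_s=2.0, bufsize=4096):
    """Read from sock until the peer closes or no data arrives for timeout_s."""
    sock.settimeout(timeout_s)
    chunks = []
    while True:
        try:
            chunk = sock.recv(bufsize)
        except socket.timeout:
            break  # nothing arrived within the window: treat response as complete
        if not chunk:
            break  # empty bytes mean the peer closed the connection
        chunks.append(chunk)
    # Assumption: the response is line-oriented ASCII text.
    return b"".join(chunks).decode("ascii", errors="replace").splitlines()
```

The trade-off is that every call waits at least `timeout_s` after the last line arrives, so a shorter timeout lowers latency but risks cutting off a slow response.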
cmpd_parser.c was modified:
- Added new command "lat tot stats"
  - Added function parse_cmd_lat_tot_stats
  - A copy of "lat all stats", but shows the total packets latency histogram instead of the packets-per-second histogram
- Added new function handle_total_latency_histogram
  - Shows the latency histogram for all packets since reset
  - Called in parse_cmd_lat_tot_stats
- Added new function parse_bucket_size_freq to parse bucket size and TSC frequency
  - Called in handle_stats_and_packets to add info for "lat all stats"
  - Called in handle_total_latency_histogram to add info for "lat tot stats"
stats_latency.h & stats_latency.c were modified:
- Added new function stats_core_lat_total_histogram
  - A copy of stats_core_lat_histogram, but returns tot_lat_test.buckets instead of lat_test.buckets. lat_test collects latency data per second, while tot_lat_test collects total latency data
prox_client.py was modified:
- Added new function _recv_multiline
  - A copy of _recv, but for multi-line parsing
- Added new function _parse_histogram_response
  - Processes multiline responses from the PROX "lat all stats" and "lat tot stats" commands, returns a tuple of histogram_data + bucket_info
  - Called in _get_latency_histogram_stats
- Added new function _get_latency_histogram_stats
  - Collects and returns all histogram and additional data
  - Called in lat_all_stats & lat_tot_stats
- Added new function lat_all_stats
  - Collects the current latency histogram and additional data from PROX, showing packets per second
- Added new function lat_tot_stats
  - Collects the current latency histogram and additional data from PROX, showing the total packet count since reset
display.c and lconf.c were modified:
- Added #include <pthread.h> to resolve undeclared PTHREAD issues.
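The difference between the per-second histogram (lat_test) and the total histogram (tot_lat_test) can be illustrated with a small conceptual model. This is not how PROX implements it: the class, its method names, the 2048-tick bucket width, and the clamping of large values into the last bucket are all assumptions for the sake of the example. Only the 128-bucket count comes from the review snippet above.

```python
class LatencyHistograms:
    """Conceptual model: one histogram per interval, one since reset."""

    def __init__(self, n_buckets=128, bucket_width_tsc=2048):
        self.width = bucket_width_tsc        # assumed bucket width
        self.per_second = [0] * n_buckets    # analogous to lat_test.buckets
        self.total = [0] * n_buckets         # analogous to tot_lat_test.buckets

    def record(self, latency_tsc):
        # Assumption: values beyond the last bucket are clamped into it.
        idx = min(latency_tsc // self.width, len(self.total) - 1)
        self.per_second[idx] += 1
        self.total[idx] += 1

    def end_of_second(self):
        """Return the per-second buckets and start a fresh interval."""
        snapshot = self.per_second
        self.per_second = [0] * len(snapshot)
        return snapshot

    def reset(self):
        """Clear both histograms."""
        self.per_second = [0] * len(self.per_second)
        self.total = [0] * len(self.total)
```

In this model, a "lat all stats" style query would read the per-second buckets, which empty out every interval, while a "lat tot stats" style query would read the total buckets, which only grow until a reset.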
d01ad7c to 933b6bf
cmpd_parser.c was modified.
Added new function parse_bucket_size_freq to parse bucket size and TSC frequency. It is called in handle_stats_and_packets.
prox_client.py was modified.
Added new functions:
_recv_multiline for multi-line parsing
lat_all_stats for parsing latency histogram, bucket size and TSC frequency.