Description
OceanBase version: 4.2.5.3
obdiag version: 3.6.0
I installed obdiag on four machines, ran it from each of them against the same OceanBase cluster, and observed the following:
1. On two of the machines, obdiag is very slow on essentially every check item and eventually hangs. No result is returned even after running for more than 24 hours.
2. On one machine, obdiag is relatively fast at first but eventually hangs and never returns a result.
3. On one machine, obdiag is relatively fast overall and returns the result.
This behavior is consistently reproducible in my environment.
Regarding the first case, with help from the OceanBase team, we determined that it was caused by an excessive number of entries in [~/.ssh/known_hosts] (this file contained more than 7,700 entries). After I cleared this file and reran obdiag, the result came back very quickly.
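Before clearing the file, it may help to check how many entries it holds and keep a copy. Below is a minimal Python sketch of that step. The helper names `count_known_hosts_entries` and `backup_and_clear` and the `.bak` backup name are hypothetical and not part of obdiag, and the sketch assumes every non-empty, non-comment line is one host entry. Note that clearing the file forgets all previously trusted host keys, so SSH may ask to confirm host keys again on the next connection.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def count_known_hosts_entries(path: Path) -> int:
    """Count host entries: every non-empty line that is not a '#' comment."""
    with path.open("r", encoding="utf-8", errors="replace") as f:
        return sum(
            1 for line in f
            if line.strip() and not line.lstrip().startswith("#")
        )


def backup_and_clear(path: Path) -> tuple[int, Path]:
    """Copy the file to '<name>.bak' (hypothetical naming), then truncate it.

    Returns the number of entries that were cleared and the backup path.
    """
    count = count_known_hosts_entries(path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # keep the old keys in case they are needed again
    path.write_text("", encoding="utf-8")
    return count, backup


if __name__ == "__main__":
    # Read-only by default: only report how large the file is.
    kh = Path.home() / ".ssh" / "known_hosts"
    if kh.exists():
        print(f"{kh}: {count_known_hosts_entries(kh)} entries")
```

Calling `backup_and_clear(Path.home() / ".ssh" / "known_hosts")` performs the actual clearing; the original contents stay in `known_hosts.bak`.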
Regarding the second case, the OceanBase team analyzed the logs and found that the [cluster.observer_port] check item was very slow. This item can be skipped by deleting its task file: [~/.obdiag/check/tasks/observer/cluster/observer_port.py].
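Rather than deleting the task file outright, one option is to move it out of the tasks directory and move it back once the underlying slowness is fixed. Below is a minimal Python sketch of that idea. It assumes that obdiag only runs task files located under `~/.obdiag/check/tasks`, so a file moved elsewhere is skipped just like a deleted one. `disable_task`, `restore_task`, and the backup directory are hypothetical names, not obdiag features.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Task file path relative to the obdiag home directory (~/.obdiag).
TASK = Path("check/tasks/observer/cluster/observer_port.py")


def disable_task(obdiag_home: Path, backup_dir: Path) -> Path | None:
    """Move the task file into backup_dir; return its new path, or None if absent."""
    src = obdiag_home / TASK
    if not src.exists():
        return None
    backup_dir.mkdir(parents=True, exist_ok=True)
    dst = backup_dir / src.name
    shutil.move(str(src), str(dst))
    return dst


def restore_task(obdiag_home: Path, backup_dir: Path) -> bool:
    """Move the task file back to its original location; False if no backup exists."""
    src = backup_dir / TASK.name
    if not src.exists():
        return False
    dst = obdiag_home / TASK
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return True
```

For example, `disable_task(Path.home() / ".obdiag", Path.home() / "obdiag_disabled_tasks")` moves the file aside, and `restore_task` with the same arguments puts it back.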