[misc] refactor: extract get_logger utility to reduce code duplication#84
Conversation
CLA Signature Passji-huazhong, thanks for your pull request. All authors of the commits have signed the CLA. 👍 |
CLA Signature Passji-huazhong, thanks for your pull request. All authors of the commits have signed the CLA. 👍 |
CLA Signature Passji-huazhong, thanks for your pull request. All authors of the commits have signed the CLA. 👍 |
CLA Signature Passji-huazhong, thanks for your pull request. All authors of the commits have signed the CLA. 👍 |
CLA Signature Passji-huazhong, thanks for your pull request. All authors of the commits have signed the CLA. 👍 |
CLA Signature Passji-huazhong, thanks for your pull request. All authors of the commits have signed the CLA. 👍 |
CLA Signature Passji-huazhong, thanks for your pull request. All authors of the commits have signed the CLA. 👍 |
Signed-off-by: ji-huazhong <hzji210@gmail.com>
CLA Signature Passji-huazhong, thanks for your pull request. All authors of the commits have signed the CLA. 👍 |
There was a problem hiding this comment.
Pull request overview
Refactors logging setup by extracting a shared get_logger helper and updating modules to use it, reducing duplicated logger configuration code across the project.
Changes:
- Added
transfer_queue.utils.logging_utils.get_logger()to centralize logger creation/formatting/level handling. - Replaced per-module logger boilerplate with
get_logger(__name__)across utils, storage, controller/client, dataloader, and metadata modules. - Updated the single-controller demo to refine logging text and add an “advantage” computation step.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| transfer_queue/utils/logging_utils.py | Introduces shared get_logger utility for consistent logger setup. |
| transfer_queue/utils/zmq_utils.py | Switches module logger initialization to get_logger. |
| transfer_queue/utils/serial_utils.py | Switches module logger initialization to get_logger. |
| transfer_queue/utils/perf_utils.py | Uses get_logger(..., default_level="INFO") for perf logging defaults. |
| transfer_queue/utils/enum_utils.py | Switches module logger initialization to get_logger. |
| transfer_queue/utils/common.py | Switches module logger initialization to get_logger. |
| transfer_queue/storage/simple_backend.py | Switches module logger initialization to get_logger. |
| transfer_queue/storage/managers/base.py | Switches module logger initialization to get_logger. |
| transfer_queue/storage/managers/simple_backend_manager.py | Switches module logger initialization to get_logger. |
| transfer_queue/storage/managers/mooncake_manager.py | Switches module logger initialization to get_logger. |
| transfer_queue/storage/managers/yuanrong_manager.py | Switches module logger initialization to get_logger. |
| transfer_queue/storage/clients/mooncake_client.py | Switches module logger initialization to get_logger. |
| transfer_queue/storage/clients/yuanrong_client.py | Switches module logger initialization to get_logger. |
| transfer_queue/metadata.py | Switches module logger initialization to get_logger. |
| transfer_queue/interface.py | Switches module logger initialization to get_logger. |
| transfer_queue/dataloader/streaming_dataset.py | Switches module logger initialization to get_logger. |
| transfer_queue/dataloader/streaming_dataloader.py | Switches module logger initialization to get_logger. |
| transfer_queue/controller.py | Switches module logger initialization to get_logger. |
| transfer_queue/client.py | Switches module logger initialization to get_logger. |
| recipe/simple_use_case/single_controller_demo.py | Improves demo flow (adds advantage step) and tweaks docstrings/log messages. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """Create and configure a logger with consistent formatting. | ||
|
|
||
| Creates a logger with the specified name, sets its level based on the | ||
| TQ_LOGGING_LEVEL environment variable (or default), and adds a StreamHandler | ||
| if no handlers exist. This is particularly useful for Ray Actor subprocesses | ||
| that may not inherit logging configuration from the parent process. |
There was a problem hiding this comment.
get_logger docstring promises "consistent formatting", but the function only sets a formatter when it adds a new handler. If the logger (or an ancestor) already has handlers, formatting may differ. Consider adjusting the docstring to clarify this behavior (or explicitly enforce the formatter only when adding a handler).
| """Create and configure a logger with consistent formatting. | |
| Creates a logger with the specified name, sets its level based on the | |
| TQ_LOGGING_LEVEL environment variable (or default), and adds a StreamHandler | |
| if no handlers exist. This is particularly useful for Ray Actor subprocesses | |
| that may not inherit logging configuration from the parent process. | |
| """Create and configure a logger, adding a default formatted handler if needed. | |
| Creates a logger with the specified name, sets its level based on the | |
| TQ_LOGGING_LEVEL environment variable (or default), and adds a StreamHandler | |
| using the default format only if the logger does not already have handlers. | |
| If handlers already exist on this logger or an ancestor, their configuration | |
| and formatting are left unchanged. This is particularly useful for Ray Actor | |
| subprocesses that may not inherit logging configuration from the parent | |
| process. |
| def compute_reward(response_ids: torch.Tensor) -> TensorDict: | ||
| """Simulate a reward model that scores each token position in the response. | ||
| Returns a TensorDict with a ``"rm_score"`` field whose shape matches | ||
| ``response_ids`` (i.e. one scalar per response token). | ||
| """ | ||
| time.sleep(1) | ||
| reward = torch.randn_like(response_ids, dtype=torch.float32) | ||
|
|
||
| return TensorDict({"rm_score": reward}, batch_size=response_ids.size(0)) | ||
|
|
There was a problem hiding this comment.
The docstring says the returned TensorDict's shape matches response_ids (one scalar per token), but the batch_size is set to response_ids.size(0) (batch-only). This is misleading and can cause confusion for TensorDict operations that rely on batch_size. Either update the docstring to clarify that only the first dimension is treated as batch, or set batch_size to match the intended per-token shape (e.g., include the sequence length dimension when appropriate).
| def compute_advantage(rewards: torch.Tensor) -> TensorDict: | ||
| """Simulate the process of compute advantage | ||
|
|
||
| Returns a TensorDict with an ``"advantage"`` field whose shape matches | ||
| ``response_ids`` (i.e. one scalar per response token). | ||
| ``rewards`` (i.e. one scalar per reward). | ||
| """ | ||
| time.sleep(1) | ||
| advantage = torch.randn_like(response_ids, dtype=torch.float32) | ||
| return TensorDict({"advantage": advantage}, batch_size=response_ids.size(0)) | ||
| advantage = torch.randn_like(rewards, dtype=torch.float32) | ||
| return TensorDict({"advantage": advantage}, batch_size=rewards.size(0)) |
There was a problem hiding this comment.
Same issue as compute_reward: the docstring implies the output TensorDict shape matches rewards, but batch_size is set to rewards.size(0) (batch-only). Align the docstring with the actual TensorDict batch_size, or adjust batch_size to reflect the full intended shape.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
CLA Signature Guide@0oshowero0 , thanks for your pull request. The following commit(s) are not associated with a signed Contributor License Agreement (CLA).
To sign CLA, click here. To check if your email is configured correctly, refer to the FAQs. Once you've signed the CLA or updating your email, please comment |
get_logger utility to reduce code duplication
as title.
cc @0oshowero0