feat: Real-time map localization & character heading prediction #104
@sourcery-ai review
Reviewer's Guide
Adds an AI-powered navigation stack: a YOLO-Pose based angle predictor, an auto-navigation-by-line controller that combines that predictor with color-based line following, and a two-level SIFT-based map locator with pyramid chunking and feature caching for large maps. All of them are wired into the custom action framework and pipelines.
Sequence diagram for auto navigation by line with AI angle prediction
sequenceDiagram
actor User
participant AgentServer
participant AutoNavigateByLine
participant Context
participant Controller
participant OnnxSession as OnnxRuntimeSession
participant MovementController as Mover
User->>AgentServer: request custom_action auto_navigate_by_line
AgentServer->>AutoNavigateByLine: instantiate and run(context, argv)
AutoNavigateByLine->>AutoNavigateByLine: _build_config(custom_action_param)
AutoNavigateByLine->>Context: get tasker.controller
Context-->>AutoNavigateByLine: Controller
AutoNavigateByLine->>AutoNavigateByLine: _resolve_backend(config.backend)
AutoNavigateByLine->>AutoNavigateByLine: _get_session(backend)
AutoNavigateByLine-->>OnnxSession: create or reuse session
AutoNavigateByLine->>Controller: optional set_mouse_lock_follow(True)
AutoNavigateByLine->>MovementController: create Mover(controller, config)
loop navigation_loop
AutoNavigateByLine->>Controller: post_screencap()
Controller-->>AutoNavigateByLine: frame
AutoNavigateByLine->>AutoNavigateByLine: _get_bgr_frame(controller)
AutoNavigateByLine->>AutoNavigateByLine: _predict_pointer_angle(frame, config, session, input_name)
AutoNavigateByLine->>OnnxSession: run(input)
OnnxSession-->>AutoNavigateByLine: keypoints, scores
AutoNavigateByLine-->>AutoNavigateByLine: compute pointer angle
AutoNavigateByLine->>AutoNavigateByLine: _detect_navigation_line(frame, config)
AutoNavigateByLine-->>AutoNavigateByLine: select best LineDetection
alt goal_reached
AutoNavigateByLine->>AutoNavigateByLine: _is_goal_reached(frame, config)
AutoNavigateByLine-->>AutoNavigateByLine: break loop
else line_detected
AutoNavigateByLine->>MovementController: set_forward(True)
AutoNavigateByLine->>MovementController: set_sprint(hold_sprint)
AutoNavigateByLine->>MovementController: apply_steering(heading_error, reliability)
MovementController->>Controller: post_relative_move or post_key_down/post_key_up
else line_lost
AutoNavigateByLine->>MovementController: set_sprint(False)
AutoNavigateByLine->>MovementController: set_forward(keep_forward?)
AutoNavigateByLine->>MovementController: search_turn(step)
MovementController->>Controller: post_relative_move or turn_keys
end
AutoNavigateByLine->>AutoNavigateByLine: _handle_custom_stuck(frame, config, line, pointer)
AutoNavigateByLine->>AutoNavigateByLine: optional _show_debug_window()
end
AutoNavigateByLine->>MovementController: stop_all()
MovementController->>Controller: release keys
AutoNavigateByLine->>Controller: optional set_mouse_lock_follow(False)
AutoNavigateByLine-->>AgentServer: RunResult(success=True)
AgentServer-->>User: navigation finished
Sequence diagram for pyramid map localization
sequenceDiagram
actor User
participant AgentServer
participant PyramidLocator as MapLocatorPyramid
participant Context
participant Controller
participant SIFT
participant BFMatcher
User->>AgentServer: request custom_action map_locator_pyramid
AgentServer->>PyramidLocator: instantiate and run(context, argv)
PyramidLocator->>PyramidLocator: resolve default_big_map path
PyramidLocator->>PyramidLocator: load original_map
PyramidLocator->>PyramidLocator: build global_map (downscale)
PyramidLocator->>SIFT: detectAndCompute(global_gray)
SIFT-->>PyramidLocator: global_keypoints, global_descriptors
PyramidLocator->>PyramidLocator: load_or_save_global_cache(npz)
loop preload_chunks
PyramidLocator->>PyramidLocator: compute chunk bounds
PyramidLocator->>PyramidLocator: load chunk cache(npz)
alt cache_missing
PyramidLocator->>SIFT: detectAndCompute(chunk_gray)
SIFT-->>PyramidLocator: chunk_keypoints, chunk_descriptors
PyramidLocator->>PyramidLocator: save chunk cache
end
PyramidLocator->>PyramidLocator: store chunk_pts, des_chunk, offsets
end
note over PyramidLocator: runtime loop for realtime localization
loop realtime_localization
PyramidLocator->>Controller: post_screencap()
Controller-->>PyramidLocator: frame
PyramidLocator->>PyramidLocator: crop mini_map_roi
PyramidLocator->>PyramidLocator: apply circular and HSV masks
PyramidLocator->>SIFT: detectAndCompute(mini_gray)
SIFT-->>PyramidLocator: kp_mini, des_mini
alt no_previous_center
PyramidLocator->>BFMatcher: knnMatch(des_mini, global_des)
BFMatcher-->>PyramidLocator: matches
PyramidLocator->>PyramidLocator: filter by ratio
PyramidLocator->>PyramidLocator: estimateAffinePartial2D(minimap->global)
PyramidLocator-->>PyramidLocator: approx_player_point
end
PyramidLocator->>PyramidLocator: select nearest chunk and neighbors
PyramidLocator->>BFMatcher: knnMatch(des_mini, des_chunk)
BFMatcher-->>PyramidLocator: chunk_matches
PyramidLocator->>PyramidLocator: filter by ratio
PyramidLocator->>PyramidLocator: estimateAffinePartial2D(minimap->chunk)
PyramidLocator-->>PyramidLocator: raw_player_point
PyramidLocator->>PyramidLocator: validate jump vs last_center
PyramidLocator-->>PyramidLocator: smoothed player_point or last_center
PyramidLocator->>PyramidLocator: draw polygon and point on global_map
PyramidLocator->>PyramidLocator: show debug window via OpenCV
alt user_presses_q
PyramidLocator-->>PyramidLocator: break loop
end
end
PyramidLocator->>PyramidLocator: destroyAllWindows()
PyramidLocator-->>AgentServer: RunResult(success=True)
AgentServer-->>User: locator finished
Class diagram for new navigation and map localization actions
classDiagram
class AgentServer
class Context
class Controller
class CustomAction
class OnnxRuntimeSession
class MapImage
class PredictAngle {
+Path model_path
+list pointer_roi
+float threshold
+dict _session_cache
+dict _provider_name_map
+run(context, argv) RunResult
-_resolve_backend(custom_action_param) str
-_get_session(backend) tuple
}
class PointerDetection {
+float angle
+float confidence
}
class LineDetection {
+str color_name
+tuple target_point
+float heading_error
+float angle_min
+float angle_max
+float area
+float score
+float reliability
+int edge_points
}
class MovementController {
+Controller controller
+dict config
+int move_key
+int left_key
+int right_key
+str steering_mode
+int~optional~ sprint_key
+int~optional~ current_turn_key
+bool move_pressed
+bool sprint_pressed
+set_forward(active) void
+set_sprint(active) void
+apply_steering(heading_error, reliability) void
+search_turn(step) void
+stop_all() void
-_apply_mouse_steering(heading_error, reliability) void
-_apply_key_steering(heading_error) void
-_set_turn_key(key) void
-_clear_turn_key() void
}
class AutoNavigateByLine {
+Path model_path
+dict _session_cache
+dict _provider_name_map
+dict _debug_state
+run(context, argv) RunResult
-_build_config(custom_action_param) dict
-_parse_param(custom_action_param) dict
-_deep_update(base, override) void
-_get_bgr_frame(controller) ndarray
-_predict_pointer_angle(frame_bgr, config, session, input_name) PointerDetection
-_detect_navigation_line(frame_bgr, config) LineDetection
-_build_color_mask(hsv, ranges) ndarray
-_build_line_ring_mask(roi_shape, ring_cfg) ndarray
-_intersect_binary_masks(left, right) ndarray
-_build_mask_ring_overlay(raw_mask, ring_mask, intersection_mask) ndarray
-_pick_best_line(mask, color_name, anchor, line_cfg) tuple
-_summarize_line_failure(debug_info) str
-_is_goal_reached(frame_bgr, config) bool
-_handle_custom_stuck(frame_bgr, config, line, pointer) void
-_show_debug_window(frame_bgr, config, pointer, line, smoothed_heading_error) void
-_show_debug_analysis_window(config, selected_line_debug) void
-_mask_to_bgr(mask, width, height, label) ndarray
-_default_nav_line_roi(frame_shape) tuple
-_resolve_anchor(anchor, roi) tuple
-_resolve_circle_center(center, width, height) tuple
-_resolve_roi(roi, frame_shape, allow_default) tuple
-_resolve_backend(backend) str
-_get_session(backend) tuple
}
class MapLocator {
+Path abs_path
+str map_name
+Path default_big_map
+run(context, argv) RunResult
}
class MapLocatorPyramid {
+Path abs_path
+str map_name
+Path default_big_map
+run(context, argv) RunResult
}
AgentServer <|.. PredictAngle
AgentServer <|.. AutoNavigateByLine
AgentServer <|.. MapLocator
AgentServer <|.. MapLocatorPyramid
CustomAction <|-- PredictAngle
CustomAction <|-- AutoNavigateByLine
CustomAction <|-- MapLocator
CustomAction <|-- MapLocatorPyramid
Context --> Controller
PredictAngle --> OnnxRuntimeSession
PredictAngle --> Controller
AutoNavigateByLine --> MovementController
AutoNavigateByLine --> PointerDetection
AutoNavigateByLine --> LineDetection
AutoNavigateByLine --> OnnxRuntimeSession
AutoNavigateByLine --> Controller
MapLocator --> MapImage
MapLocatorPyramid --> MapImage
MapLocatorPyramid --> "many" MapImage : chunked_view
Hey - I've found 2 issues, and left some high level feedback:
- There is quite a bit of duplicated ONNX backend/session management logic between `PredictAngle` and `AutoNavigateByLine` (backend resolution, provider maps, session cache); consider extracting this into a shared utility to keep the behavior consistent and make future changes to backend handling easier.
- The two map locator implementations (`MapLocator` and `MapLocatorPyramid`) share substantial SIFT feature, caching, and RANSAC matching logic; factoring the common pieces into reusable helpers (e.g., a base locator or feature-cache module) would reduce complexity and the chance of subtle divergence between the two modes.
- Many key parameters (ROIs, HSV thresholds, radii, smoothing factors) are currently hard-coded in the actions; wiring more of these through `custom_action_param` or a central config would make the navigation and map-location behavior easier to tune for different resolutions or games without code changes.
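The first point could be addressed with a small shared helper. The following is only a sketch of the backend-resolution part: the names `resolve_providers`, `PROVIDER_NAME_MAP`, and `AUTO_ORDER`, as well as the "auto" preference order, are assumptions for illustration and are not taken from the PR. The provider strings are onnxruntime's execution provider names. The list of available providers is passed in so the sketch runs without onnxruntime installed; in practice it would typically come from `onnxruntime.get_available_providers()`.

```python
from typing import Dict, List, Sequence

# Hypothetical backend aliases mapped to onnxruntime execution provider names.
PROVIDER_NAME_MAP: Dict[str, str] = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
    "directml": "DmlExecutionProvider",
}

# Assumed preference order when the backend is "auto".
AUTO_ORDER = ("cuda", "directml", "cpu")


def resolve_providers(backend: str, available: Sequence[str]) -> List[str]:
    """Return the provider list to hand to an inference session.

    Falls back to CPU when the requested backend is unknown or unavailable.
    """
    backend = (backend or "auto").strip().lower()
    candidates = AUTO_ORDER if backend == "auto" else (backend,)
    for name in candidates:
        provider = PROVIDER_NAME_MAP.get(name)
        if provider is not None and provider in available:
            # Keep CPU as a trailing fallback, without listing it twice.
            if provider == "CPUExecutionProvider":
                return [provider]
            return [provider, "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]
```

Both actions could call one such function and share one session cache keyed by the resolved provider list, so a change to backend handling would only need to be made once.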
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There is quite a bit of duplicated ONNX backend/session management logic between `PredictAngle` and `AutoNavigateByLine` (backend resolution, provider maps, session cache); consider extracting this into a shared utility to keep the behavior consistent and make future changes to backend handling easier.
- The two map locator implementations (`MapLocator` and `MapLocatorPyramid`) share substantial SIFT feature, caching, and RANSAC matching logic; factoring the common pieces into reusable helpers (e.g., a base locator or feature-cache module) would reduce complexity and the chance of subtle divergence between the two modes.
- Many key parameters (ROIs, HSV thresholds, radii, smoothing factors) are currently hard-coded in the actions; wiring more of these through `custom_action_param` or a central config would make the navigation and map-location behavior easier to tune for different resolutions or games without code changes.
## Individual Comments
### Comment 1
<location path="agent/custom/action/auto_navigate_by_line.py" line_range="837-844" />
<code_context>
+            return None
+        return min(cx, width - 1), min(cy, height - 1)
+
+    def _resolve_roi(
+        self,
+        roi: Optional[List[int]],
+        frame_shape: Tuple[int, ...],
+        allow_default: bool = False,
+    ) -> Optional[Tuple[int, int, int, int]]:
+        if not isinstance(roi, (list, tuple)) or len(roi) != 4:
+            return None if allow_default else None
+
+        x, y, w, h = [int(v) for v in roi]
</code_context>
<issue_to_address>
**issue (bug_risk):** The `allow_default` flag in `_resolve_roi` currently has no effect and the method always returns `None` when the ROI is invalid.
As written, `allow_default` is a no-op because both branches return `None` for invalid/missing ROIs:
```python
if not isinstance(roi, (list, tuple)) or len(roi) != 4:
    return None if allow_default else None
if min(w, h) <= 0 or x < 0 or y < 0:
    return None if allow_default else None
```
This can hide configuration issues and makes the flag misleading. Either give `allow_default` a distinct behavior (e.g., return a specific sentinel value) or remove the flag and always `return None` for invalid ROIs to keep the API clear.
</issue_to_address>
### Comment 2
<location path="agent/custom/action/map_locator_pyramid.py" line_range="109" />
<code_context>
+ except Exception as e:
+ print(f"全局特征缓存保存失败: {e}")
+
+ print(f"全局视图特征点数: {len(global_points)}")
+
+ # 3. 初始化分块缓存信息
</code_context>
<issue_to_address>
**issue:** Potential crash if global feature extraction fails and `global_points` stays `None`.
If SIFT fails to produce descriptors (e.g. very low-texture or invalid input), `global_des`/`global_points` can remain `None`, so `len(global_points)` will raise a `TypeError`. Consider guarding both `None` and empty cases:
```python
if global_points is None or global_des is None or len(global_points) == 0:
    print("全局视图特征点不足,无法进行定位")
    return CustomAction.RunResult(success=False)
print(f"全局视图特征点数: {len(global_points)}")
```
This makes the handling of insufficient global features explicit and prevents a runtime crash in edge cases.
</issue_to_address>
    def _resolve_roi(
        self,
        roi: Optional[List[int]],
        frame_shape: Tuple[int, ...],
        allow_default: bool = False,
    ) -> Optional[Tuple[int, int, int, int]]:
        if not isinstance(roi, (list, tuple)) or len(roi) != 4:
            return None if allow_default else None

issue (bug_risk): The `allow_default` flag in `_resolve_roi` currently has no effect, and the method always returns `None` when the ROI is invalid.
As written, `allow_default` is a no-op because both branches return `None` for invalid/missing ROIs:

    if not isinstance(roi, (list, tuple)) or len(roi) != 4:
        return None if allow_default else None
    if min(w, h) <= 0 or x < 0 or y < 0:
        return None if allow_default else None

This can hide configuration issues and makes the flag misleading. Either give `allow_default` a distinct behavior (e.g., return a specific sentinel value) or remove the flag and always return `None` for invalid ROIs to keep the API clear.
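A minimal sketch of the first option, under the assumption that `allow_default=True` should fall back to the full frame. The standalone function name `resolve_roi`, the full-frame fallback, and the clamping to the frame bounds are all illustrative choices, not the PR's actual behavior.

```python
from typing import Optional, Sequence, Tuple

Roi = Tuple[int, int, int, int]


def resolve_roi(
    roi: Optional[Sequence[int]],
    frame_shape: Tuple[int, ...],
    allow_default: bool = False,
) -> Optional[Roi]:
    """Validate an (x, y, w, h) ROI and clamp it to the frame.

    Assumed semantics: with allow_default=True an invalid ROI falls back to
    the full frame; otherwise it yields None.
    """
    height, width = frame_shape[:2]
    default: Optional[Roi] = (0, 0, width, height) if allow_default else None

    if not isinstance(roi, (list, tuple)) or len(roi) != 4:
        return default

    x, y, w, h = (int(v) for v in roi)
    if min(w, h) <= 0 or x < 0 or y < 0 or x >= width or y >= height:
        return default

    # Clamp the size so the ROI never extends past the frame.
    return x, y, min(w, width - x), min(h, height - y)
```

With this shape the flag has an observable effect: callers that can work on the whole frame pass `allow_default=True`, while callers that need an explicit ROI still get `None` and can report the misconfiguration.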
    except Exception as e:
        print(f"全局特征缓存保存失败: {e}")

    print(f"全局视图特征点数: {len(global_points)}")

issue: Potential crash if global feature extraction fails and `global_points` stays `None`.
If SIFT fails to produce descriptors (e.g. very low-texture or invalid input), `global_des`/`global_points` can remain `None`, so `len(global_points)` will raise a `TypeError`. Consider guarding both `None` and empty cases:

    if global_points is None or global_des is None or len(global_points) == 0:
        print("全局视图特征点不足,无法进行定位")
        return CustomAction.RunResult(success=False)
    print(f"全局视图特征点数: {len(global_points)}")

This makes the handling of insufficient global features explicit and prevents a runtime crash in edge cases.
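Map Localization and AI Pointer Angle Recognition
This PR adds two core components to the MAA custom action library: a pyramid-based large-map locator (MapLocatorPyramid) and a YOLO26n-Pose based pointer angle predictor (PredictAngle). Together they improve the script's spatial awareness in large, complex scenes.
1. MapLocatorPyramid (pyramid chunked map localization)
This module targets high-performance localization on very high-resolution maps, using a typical coarse-to-fine hierarchical strategy.
Architecture:
Two-level matching: a low-resolution global image first gives a coarse position; once the rough area is locked, precise matching runs in the corresponding high-resolution chunk.
Feature caching: the big map's SIFT keypoints and descriptors are automatically serialized to .npz files (a global cache plus a folder of per-chunk files), which greatly shortens load time on subsequent launches.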
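The hand-off between the two matching levels can be sketched as follows. This is an illustration only: the function name `select_chunks`, the parameters `global_scale`, `chunk_size`, `grid_cols`, and `grid_rows`, and the assumption of square chunks on a regular grid are hypothetical and not taken from the PR. The sketch converts a coarse position on the downscaled global map into the chunk that contains it plus its in-bounds neighbours.

```python
from typing import List, Tuple


def select_chunks(
    coarse_xy: Tuple[float, float],
    global_scale: float,
    chunk_size: int,
    grid_cols: int,
    grid_rows: int,
) -> List[Tuple[int, int]]:
    """Map a coarse position on the downscaled map to candidate chunk indices.

    Returns the (col, row) chunk containing the point first, followed by its
    in-bounds neighbours, so the fine matcher can try them in that order.
    """
    # Convert from downscaled global-map pixels back to full-resolution pixels.
    full_x = coarse_xy[0] / global_scale
    full_y = coarse_xy[1] / global_scale

    # Clamp so points on or beyond the far edge still land in the last chunk.
    col = min(max(int(full_x // chunk_size), 0), grid_cols - 1)
    row = min(max(int(full_y // chunk_size), 0), grid_rows - 1)

    chunks = [(col, row)]
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            n_col, n_row = col + d_col, row + d_row
            if (d_col, d_row) != (0, 0) and 0 <= n_col < grid_cols and 0 <= n_row < grid_rows:
                chunks.append((n_col, n_row))
    return chunks
```

Trying the neighbours as well covers the case where the minimap straddles a chunk boundary, at the cost of matching against more descriptors per frame.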
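Robustness:
RANSAC is used to reject mismatched points.
Coordinate smoothing and jump filtering: coordinates are smoothed by interpolation, and large position jumps are accepted or rejected dynamically based on confidence (the number of inliers).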
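The smoothing and jump-filtering rule described above can be sketched like this. The function name `update_position` and the values of `alpha`, `max_jump`, and `min_inliers_for_jump` are assumptions chosen for illustration; the PR's actual thresholds and interpolation may differ.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


def update_position(
    last: Optional[Point],
    raw: Point,
    inliers: int,
    alpha: float = 0.5,
    max_jump: float = 80.0,
    min_inliers_for_jump: int = 20,
) -> Point:
    """Blend a new fix into the previous position, rejecting unlikely jumps."""
    if last is None:
        return raw  # first fix: nothing to smooth against

    jump = ((raw[0] - last[0]) ** 2 + (raw[1] - last[1]) ** 2) ** 0.5
    if jump > max_jump:
        # A large jump is only trusted when the match is well supported.
        return raw if inliers >= min_inliers_for_jump else last

    # Linear interpolation between the previous and the new position.
    return (
        last[0] + alpha * (raw[0] - last[0]),
        last[1] + alpha * (raw[1] - last[1]),
    )
```

A weakly supported fix far from the last position is dropped in favour of the previous one, while a strongly supported one is accepted outright, which lets the locator recover after a teleport instead of drifting slowly toward the new location.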
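Debugging support: an integrated real-time OpenCV debug window visualizes the current map chunk, the number of matched features, and the live position on the global thumbnail.
2. PredictAngle (AI-based direction prediction)
Uses a deep learning model to extract the navigation pointer's direction precisely; compared with traditional image algorithms, it is more resistant to interference.
Key features:
Model-driven: an ONNX model with the YOLO-Pose architecture identifies three keypoints of the pointer (tip, left tail, right tail), from which the geometric center and the vector angle are computed.
High-performance inference: integrates onnxruntime, with automatic switching and manual configuration across CPU, CUDA (NVIDIA), and DirectML (Windows) backends.
High accuracy: the angle is computed with atan2, and a confidence threshold filters detections to keep the output direction accurate.
Interactivity: a built-in live preview window annotates the detected keypoints, the bounding box, and the computed deflection angle.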
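The keypoint-to-angle step can be illustrated with a short sketch. The function name `pointer_angle` and the angle convention (0 degrees pointing up on screen, increasing clockwise) are assumptions; the PR may use a different reference direction or sign.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def pointer_angle(tip: Point, left_tail: Point, right_tail: Point) -> float:
    """Heading in degrees, 0 = up on screen, increasing clockwise, in [0, 360)."""
    # Centroid of the three keypoints, used as the origin of the direction vector.
    cx = (tip[0] + left_tail[0] + right_tail[0]) / 3.0
    cy = (tip[1] + left_tail[1] + right_tail[1]) / 3.0

    dx = tip[0] - cx
    dy = tip[1] - cy

    # Image y grows downward, so "up" is -dy; atan2(dx, -dy) measures from up, clockwise.
    return math.degrees(math.atan2(dx, -dy)) % 360.0
```

Using `atan2` rather than `atan` keeps the quadrant information, so a pointer facing left and one facing right do not collapse onto the same angle.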
Summary by Sourcery
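Add AI-based navigation and map localization capabilities for large in-game maps.
New Features:
- `PredictAngle` custom action, which uses a YOLO-Pose ONNX model to estimate the navigation pointer's direction and supports multiple backends.
- `AutoNavigateByLine` custom action, which navigates automatically along a configured navigation line in real time, using pointer angle prediction and input control.
- `MapLocator` custom action, which uses SIFT-based global map localization to locate the minimap on the large world map.
- `MapLocatorPyramid` custom action, which implements multi-level, chunked map localization and supports very large maps through feature caching.
Original summary in English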
Summary by Sourcery
Add AI-based navigation and map localization capabilities for large in-game maps.
New Features: