feat: Real-time map localization & character heading prediction #104

Open
1bananachicken wants to merge 13 commits into dev from feature-autonavi

1bananachicken (Owner) commented May 4, 2026

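Map Localization and AI Pointer Angle Recognition

This PR adds two core components to the MAA custom action library: a pyramid-based large-map locator (MapLocatorPyramid) and a YOLO26n-Pose-based pointer angle predictor (PredictAngle). Together they improve the script's spatial awareness in large, complex scenes.

1. MapLocatorPyramid (pyramid chunked map localization)

This module targets high-performance localization on very high-resolution maps, using a typical coarse-to-fine hierarchical strategy.

Architecture:

Two-level matching: a low-resolution global image is first used for coarse localization; once the approximate region is known, precise matching runs against the corresponding high-resolution chunks.

Feature caching: the large map's SIFT keypoints and descriptors are automatically serialized to .npz files (a global cache plus a folder of per-chunk files), which greatly shortens load time on subsequent launches.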
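
As a rough illustration of the two-level idea, the sketch below splits a full-resolution map into overlapping chunks and then ranks chunks by distance to a coarse position found on the downscaled global map. The function names, the chunk size, the overlap, and the scale factor are all assumptions for illustration; they are not taken from the PR's code.

```python
from typing import List, Sequence, Tuple

Bounds = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in full-resolution pixels


def chunk_bounds(width: int, height: int, chunk: int, overlap: int) -> List[Bounds]:
    """Split a width x height map into overlapping chunks, row by row."""
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk")
    step = chunk - overlap
    bounds = []
    for y0 in range(0, max(height - overlap, 1), step):
        for x0 in range(0, max(width - overlap, 1), step):
            # Chunks on the right and bottom edges are clipped to the map size.
            bounds.append((x0, y0, min(x0 + chunk, width), min(y0 + chunk, height)))
    return bounds


def coarse_to_chunks(coarse_xy: Tuple[float, float], scale: float,
                     bounds: Sequence[Bounds], k: int = 1):
    """Map a point on the downscaled global map to full resolution and
    return the indices of the k chunks whose centers are closest to it."""
    fx, fy = coarse_xy[0] / scale, coarse_xy[1] / scale

    def dist2(b: Bounds) -> float:
        cx, cy = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
        return (cx - fx) ** 2 + (cy - fy) ** 2

    order = sorted(range(len(bounds)), key=lambda i: dist2(bounds[i]))
    return (fx, fy), order[:k]
```

With a hypothetical 1000 x 500 map, 400-pixel chunks, and 100 pixels of overlap, this yields six chunks; a coarse hit at (200, 25) on a 0.25x global map corresponds to (800, 100) at full resolution and would select the top-right chunk first.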

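Robustness:

RANSAC is used to reject mismatched points.

Coordinate smoothing and jump filtering: coordinate movement is smoothed by interpolation, and large coordinate jumps are accepted or rejected dynamically based on confidence (the number of inliers).

Debugging support: an integrated real-time OpenCV debug window visualizes the current map chunk, the number of matched features, and the live position on the global thumbnail.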
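
The jump filtering and smoothing described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the function name, the thresholds, and the choice to interpolate even after an accepted jump are all assumptions.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def update_position(last: Optional[Point], raw: Point, inliers: int,
                    alpha: float = 0.5, max_jump: float = 80.0,
                    strong_inliers: int = 25) -> Point:
    """Return the next smoothed position.

    A raw estimate that moves more than max_jump pixels is only accepted
    when it is backed by at least strong_inliers RANSAC inliers
    (all thresholds here are hypothetical).
    """
    if last is None:
        return raw  # first fix: nothing to smooth against
    jump = math.hypot(raw[0] - last[0], raw[1] - last[1])
    if jump > max_jump and inliers < strong_inliers:
        return last  # low-confidence jump: keep the previous position
    # Linear interpolation between the previous and the new estimate.
    return (last[0] + alpha * (raw[0] - last[0]),
            last[1] + alpha * (raw[1] - last[1]))
```

A stricter variant could snap directly to the raw point when the inlier count is very high, which would recover faster after a teleport at the cost of less smoothing.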

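2. PredictAngle (AI-based direction prediction)

Uses a deep-learning model to extract the direction of the navigation pointer, which is more robust to interference than traditional image algorithms.

Key features:

Model-driven: uses an ONNX model with a YOLO-Pose architecture that identifies three keypoints of the pointer (tip, left tail, right tail) to compute the geometric center and the vector angle.

High-performance inference: integrates onnxruntime, with automatic switching between, and manual configuration of, the CPU, CUDA (NVIDIA), and DirectML (Windows) backends.

High precision: computes the mathematical angle with atan2 and filters by a confidence threshold to keep the output direction accurate.

Interactivity: a built-in live preview window annotates the detected keypoints, bounding box, and computed deflection angle.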
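
To make the keypoint-to-angle step concrete, here is a small sketch of one way to turn the three keypoints into a heading with atan2. The angle convention (0 degrees pointing right, counter-clockwise positive, image y axis flipped), the use of the tail midpoint as the vector origin, and the default threshold are assumptions; the PR's actual convention may differ.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def pointer_centroid(tip: Point, tail_left: Point, tail_right: Point) -> Point:
    """Geometric center of the three keypoints."""
    return ((tip[0] + tail_left[0] + tail_right[0]) / 3.0,
            (tip[1] + tail_left[1] + tail_right[1]) / 3.0)


def pointer_angle(tip: Point, tail_left: Point, tail_right: Point,
                  scores: Sequence[float] = (1.0, 1.0, 1.0),
                  threshold: float = 0.5) -> Optional[float]:
    """Heading in degrees in [0, 360), or None if any keypoint is unreliable."""
    if min(scores) < threshold:
        return None
    # Vector from the midpoint of the two tail keypoints to the tip.
    base_x = (tail_left[0] + tail_right[0]) / 2.0
    base_y = (tail_left[1] + tail_right[1]) / 2.0
    dx = tip[0] - base_x
    dy = base_y - tip[1]  # flip sign: image y grows downward
    return math.degrees(math.atan2(dy, dx)) % 360.0
```

Under this convention a pointer whose tip is directly above its tail gives roughly 90 degrees, and one pointing straight down gives roughly 270 degrees.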

Summary by Sourcery

Add AI-based navigation and map localization capabilities for large in-game maps.

New Features:

  • Introduce PredictAngle custom action using a YOLO-Pose ONNX model to estimate navigation pointer direction with multi-backend support.
  • Add AutoNavigateByLine custom action to follow configured navigation lines in real time using pointer angle prediction and input control.
  • Add MapLocator custom action for SIFT-based global map localization from the mini-map to a large world map.
  • Add MapLocatorPyramid custom action implementing a multi-level, chunked map locator with feature caching for very large maps.
  • Register the new navigation and map locator actions in the custom action package and add corresponding pipeline definitions for angle prediction and map location.
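
The multi-backend support mentioned for PredictAngle could be resolved along these lines. This is a sketch under assumptions: the environment variable name `POINTER_BACKEND`, the precedence (explicit parameter first, then environment, then auto), and the auto-selection order are hypothetical. The provider strings are the standard onnxruntime execution provider names, and in real use `available` would come from `onnxruntime.get_available_providers()`.

```python
import os
from typing import Optional, Sequence

# Short backend names mapped to onnxruntime execution provider names.
PROVIDERS = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
    "directml": "DmlExecutionProvider",
}
AUTO_ORDER = ("cuda", "directml", "cpu")  # assumed preference order


def resolve_backend(requested: Optional[str], available: Sequence[str],
                    env_var: str = "POINTER_BACKEND") -> str:
    """Pick a backend name: explicit request, then env var, then auto."""
    choice = (requested or os.environ.get(env_var) or "auto").strip().lower()
    if choice != "auto":
        provider = PROVIDERS.get(choice)
        if provider is None:
            raise ValueError(f"unknown backend: {choice!r}")
        if provider in available:
            return choice
        # Requested backend is not installed: fall through to auto-selection.
    for name in AUTO_ORDER:
        if PROVIDERS[name] in available:
            return name
    return "cpu"
```

Passing the provider list in as an argument keeps the function testable without onnxruntime installed.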

@1bananachicken (Owner, Author)

@sourcery-ai review

sourcery-ai Bot commented May 5, 2026

Reviewer's Guide

Adds an AI-powered navigation stack: a YOLO-Pose based angle predictor, an auto-navigation-by-line controller using that predictor plus color-based line following, and a two-level SIFT-based map locator with pyramid chunking and feature caching for large maps, wiring them into the custom action framework and pipelines.

Sequence diagram for auto navigation by line with AI angle prediction

sequenceDiagram
    actor User
    participant AgentServer
    participant AutoNavigateByLine
    participant Context
    participant Controller
    participant OnnxSession as OnnxRuntimeSession
    participant MovementController as Mover

    User->>AgentServer: request custom_action auto_navigate_by_line
    AgentServer->>AutoNavigateByLine: instantiate and run(context, argv)

    AutoNavigateByLine->>AutoNavigateByLine: _build_config(custom_action_param)
    AutoNavigateByLine->>Context: get tasker.controller
    Context-->>AutoNavigateByLine: Controller

    AutoNavigateByLine->>AutoNavigateByLine: _resolve_backend(config.backend)
    AutoNavigateByLine->>AutoNavigateByLine: _get_session(backend)
    AutoNavigateByLine-->>OnnxSession: create or reuse session

    AutoNavigateByLine->>Controller: optional set_mouse_lock_follow(True)

    AutoNavigateByLine->>MovementController: create Mover(controller, config)

    loop navigation_loop
        AutoNavigateByLine->>Controller: post_screencap()
        Controller-->>AutoNavigateByLine: frame

        AutoNavigateByLine->>AutoNavigateByLine: _get_bgr_frame(controller)

        AutoNavigateByLine->>AutoNavigateByLine: _predict_pointer_angle(frame, config, session, input_name)
        AutoNavigateByLine->>OnnxSession: run(input)
        OnnxSession-->>AutoNavigateByLine: keypoints, scores
        AutoNavigateByLine-->>AutoNavigateByLine: compute pointer angle

        AutoNavigateByLine->>AutoNavigateByLine: _detect_navigation_line(frame, config)
        AutoNavigateByLine-->>AutoNavigateByLine: select best LineDetection

        alt goal_reached
            AutoNavigateByLine->>AutoNavigateByLine: _is_goal_reached(frame, config)
            AutoNavigateByLine-->>AutoNavigateByLine: break loop
        else line_detected
            AutoNavigateByLine->>MovementController: set_forward(True)
            AutoNavigateByLine->>MovementController: set_sprint(hold_sprint)
            AutoNavigateByLine->>MovementController: apply_steering(heading_error, reliability)
            MovementController->>Controller: post_relative_move or post_key_down/post_key_up
        else line_lost
            AutoNavigateByLine->>MovementController: set_sprint(False)
            AutoNavigateByLine->>MovementController: set_forward(keep_forward?)
            AutoNavigateByLine->>MovementController: search_turn(step)
            MovementController->>Controller: post_relative_move or turn_keys
        end

        AutoNavigateByLine->>AutoNavigateByLine: _handle_custom_stuck(frame, config, line, pointer)

        AutoNavigateByLine->>AutoNavigateByLine: optional _show_debug_window()
    end

    AutoNavigateByLine->>MovementController: stop_all()
    MovementController->>Controller: release keys

    AutoNavigateByLine->>Controller: optional set_mouse_lock_follow(False)
    AutoNavigateByLine-->>AgentServer: RunResult(success=True)
    AgentServer-->>User: navigation finished
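
The `apply_steering(heading_error, reliability)` step in the loop above can be pictured with a small sketch. Everything here is an assumption made for illustration: the sign convention (positive error means turn right), the deadzone and gain defaults, and the way reliability scales the mouse movement.

```python
from typing import Optional


def wrap_degrees(angle: float) -> float:
    """Wrap an angle to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def mouse_steering_dx(heading_error: float, reliability: float,
                      deadzone: float = 3.0, gain: float = 4.0,
                      max_dx: int = 120) -> int:
    """Relative horizontal mouse movement for a given heading error."""
    err = wrap_degrees(heading_error)
    if abs(err) <= deadzone:
        return 0  # close enough: do not steer
    # Scale by reliability so weak line detections steer more gently.
    dx = gain * err * max(0.0, min(1.0, reliability))
    return int(max(-max_dx, min(max_dx, round(dx))))


def key_steering(heading_error: float, deadzone: float = 8.0) -> Optional[str]:
    """Which turn key to hold ('left', 'right') or None inside the deadzone."""
    err = wrap_degrees(heading_error)
    if abs(err) <= deadzone:
        return None
    return "right" if err > 0 else "left"
```

Wrapping the error first avoids turning the long way around when the raw difference crosses the 180-degree boundary.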

Sequence diagram for pyramid map localization

sequenceDiagram
    actor User
    participant AgentServer
    participant PyramidLocator as MapLocatorPyramid
    participant Context
    participant Controller
    participant SIFT
    participant BFMatcher

    User->>AgentServer: request custom_action map_locator_pyramid
    AgentServer->>PyramidLocator: instantiate and run(context, argv)

    PyramidLocator->>PyramidLocator: resolve default_big_map path
    PyramidLocator->>PyramidLocator: load original_map
    PyramidLocator->>PyramidLocator: build global_map (downscale)

    PyramidLocator->>SIFT: detectAndCompute(global_gray)
    SIFT-->>PyramidLocator: global_keypoints, global_descriptors
    PyramidLocator->>PyramidLocator: load_or_save_global_cache(npz)

    loop preload_chunks
        PyramidLocator->>PyramidLocator: compute chunk bounds
        PyramidLocator->>PyramidLocator: load chunk cache(npz)
        alt cache_missing
            PyramidLocator->>SIFT: detectAndCompute(chunk_gray)
            SIFT-->>PyramidLocator: chunk_keypoints, chunk_descriptors
            PyramidLocator->>PyramidLocator: save chunk cache
        end
        PyramidLocator->>PyramidLocator: store chunk_pts, des_chunk, offsets
    end

    note over PyramidLocator: runtime loop for realtime localization

    loop realtime_localization
        PyramidLocator->>Controller: post_screencap()
        Controller-->>PyramidLocator: frame
        PyramidLocator->>PyramidLocator: crop mini_map_roi
        PyramidLocator->>PyramidLocator: apply circular and HSV masks
        PyramidLocator->>SIFT: detectAndCompute(mini_gray)
        SIFT-->>PyramidLocator: kp_mini, des_mini

        alt no_previous_center
            PyramidLocator->>BFMatcher: knnMatch(des_mini, global_des)
            BFMatcher-->>PyramidLocator: matches
            PyramidLocator->>PyramidLocator: filter by ratio
            PyramidLocator->>PyramidLocator: estimateAffinePartial2D(minimap->global)
            PyramidLocator-->>PyramidLocator: approx_player_point
        end

        PyramidLocator->>PyramidLocator: select nearest chunk and neighbors
        PyramidLocator->>BFMatcher: knnMatch(des_mini, des_chunk)
        BFMatcher-->>PyramidLocator: chunk_matches
        PyramidLocator->>PyramidLocator: filter by ratio
        PyramidLocator->>PyramidLocator: estimateAffinePartial2D(minimap->chunk)
        PyramidLocator-->>PyramidLocator: raw_player_point

        PyramidLocator->>PyramidLocator: validate jump vs last_center
        PyramidLocator-->>PyramidLocator: smoothed player_point or last_center

        PyramidLocator->>PyramidLocator: draw polygon and point on global_map
        PyramidLocator->>PyramidLocator: show debug window via OpenCV

        alt user_presses_q
            PyramidLocator-->>PyramidLocator: break loop
        end
    end

    PyramidLocator->>PyramidLocator: destroyAllWindows()
    PyramidLocator-->>AgentServer: RunResult(success=True)
    AgentServer-->>User: locator finished
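
The step from `estimateAffinePartial2D(minimap->chunk)` to `raw_player_point` amounts to pushing the minimap center through the estimated 2x3 affine matrix and adding the chunk's offset in the full map. A minimal sketch, assuming the player sits at the minimap center and that the matrix has the 2x3 layout returned by OpenCV's `cv2.estimateAffinePartial2D`; the helper names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def apply_affine(matrix: Sequence[Sequence[float]], point: Point) -> Point:
    """Apply a 2x3 affine matrix [[a, b, tx], [c, d, ty]] to a point."""
    (a, b, tx), (c, d, ty) = matrix
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


def player_in_world(matrix: Sequence[Sequence[float]],
                    minimap_size: Tuple[int, int],
                    chunk_offset: Point = (0.0, 0.0)) -> Point:
    """Project the minimap center into chunk coordinates, then shift by
    the chunk's top-left corner to get full-map coordinates."""
    center = (minimap_size[0] / 2.0, minimap_size[1] / 2.0)
    px, py = apply_affine(matrix, center)
    return (px + chunk_offset[0], py + chunk_offset[1])
```

For the coarse global match, the same projection applies with a zero offset, followed by dividing by the global downscale factor.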

Class diagram for new navigation and map localization actions

classDiagram
    class AgentServer
    class Context
    class Controller
    class CustomAction
    class OnnxRuntimeSession
    class MapImage

    class PredictAngle {
        +Path model_path
        +list pointer_roi
        +float threshold
        +dict _session_cache
        +dict _provider_name_map
        +run(context, argv) RunResult
        -_resolve_backend(custom_action_param) str
        -_get_session(backend) tuple
    }

    class PointerDetection {
        +float angle
        +float confidence
    }

    class LineDetection {
        +str color_name
        +tuple target_point
        +float heading_error
        +float angle_min
        +float angle_max
        +float area
        +float score
        +float reliability
        +int edge_points
    }

    class MovementController {
        +Controller controller
        +dict config
        +int move_key
        +int left_key
        +int right_key
        +str steering_mode
        +int~optional~ sprint_key
        +int~optional~ current_turn_key
        +bool move_pressed
        +bool sprint_pressed
        +set_forward(active) void
        +set_sprint(active) void
        +apply_steering(heading_error, reliability) void
        +search_turn(step) void
        +stop_all() void
        -_apply_mouse_steering(heading_error, reliability) void
        -_apply_key_steering(heading_error) void
        -_set_turn_key(key) void
        -_clear_turn_key() void
    }

    class AutoNavigateByLine {
        +Path model_path
        +dict _session_cache
        +dict _provider_name_map
        +dict _debug_state
        +run(context, argv) RunResult
        -_build_config(custom_action_param) dict
        -_parse_param(custom_action_param) dict
        -_deep_update(base, override) void
        -_get_bgr_frame(controller) ndarray
        -_predict_pointer_angle(frame_bgr, config, session, input_name) PointerDetection
        -_detect_navigation_line(frame_bgr, config) LineDetection
        -_build_color_mask(hsv, ranges) ndarray
        -_build_line_ring_mask(roi_shape, ring_cfg) ndarray
        -_intersect_binary_masks(left, right) ndarray
        -_build_mask_ring_overlay(raw_mask, ring_mask, intersection_mask) ndarray
        -_pick_best_line(mask, color_name, anchor, line_cfg) tuple
        -_summarize_line_failure(debug_info) str
        -_is_goal_reached(frame_bgr, config) bool
        -_handle_custom_stuck(frame_bgr, config, line, pointer) void
        -_show_debug_window(frame_bgr, config, pointer, line, smoothed_heading_error) void
        -_show_debug_analysis_window(config, selected_line_debug) void
        -_mask_to_bgr(mask, width, height, label) ndarray
        -_default_nav_line_roi(frame_shape) tuple
        -_resolve_anchor(anchor, roi) tuple
        -_resolve_circle_center(center, width, height) tuple
        -_resolve_roi(roi, frame_shape, allow_default) tuple
        -_resolve_backend(backend) str
        -_get_session(backend) tuple
    }

    class MapLocator {
        +Path abs_path
        +str map_name
        +Path default_big_map
        +run(context, argv) RunResult
    }

    class MapLocatorPyramid {
        +Path abs_path
        +str map_name
        +Path default_big_map
        +run(context, argv) RunResult
    }

    AgentServer <|.. PredictAngle
    AgentServer <|.. AutoNavigateByLine
    AgentServer <|.. MapLocator
    AgentServer <|.. MapLocatorPyramid

    CustomAction <|-- PredictAngle
    CustomAction <|-- AutoNavigateByLine
    CustomAction <|-- MapLocator
    CustomAction <|-- MapLocatorPyramid

    Context --> Controller

    PredictAngle --> OnnxRuntimeSession
    PredictAngle --> Controller

    AutoNavigateByLine --> MovementController
    AutoNavigateByLine --> PointerDetection
    AutoNavigateByLine --> LineDetection
    AutoNavigateByLine --> OnnxRuntimeSession
    AutoNavigateByLine --> Controller

    MapLocator --> MapImage
    MapLocatorPyramid --> MapImage

    MapLocatorPyramid --> "many" MapImage : chunked_view

File-Level Changes

Change Details Files
Introduce a YOLO-Pose based pointer angle predictor as a reusable custom action and wire it into the agent action exports and pipelines.
  • Register a new PredictAngle CustomAction using onnxruntime to load pointer_model.onnx and infer three keypoints from a fixed ROI.
  • Compute pointer heading angle from predicted tip and tail keypoints using atan2, with confidence thresholding and a live OpenCV preview window.
  • Support multiple ONNX backends (CPU/CUDA/DirectML) with auto-selection and simple backend override via environment variable or custom_action_param.
  • Export PredictAngle from the custom action package for use in pipelines and scripts.
agent/custom/action/predict_angle.py
agent/custom/action/__init__.py
assets/resource/base/pipeline/AnglePredictor.json
Add a pyramid-based large-map locator that does coarse global SIFT matching then refined matching inside precomputed high-res chunks, with on-disk feature caching.
  • Load the full-resolution world map, build a downscaled global view and compute SIFT features, caching them to an .npz file with metadata validation.
  • Pre-split the original map into overlapping chunks, precompute and cache SIFT keypoints/descriptors per chunk in a dedicated cache directory.
  • At runtime, extract and mask the circular minimap ROI, compute SIFT features, perform global coarse matching to estimate approximate player coordinates, then refine using the best nearby chunk via RANSAC affine estimation.
  • Track and smooth player coordinates, reject large jumps using inlier and match-count dependent thresholds, and visualize position and chunk index on a rescaled debug map plus minimap view.
  • Register MapLocatorPyramid as a custom action and export it for use by scripts and pipelines.
agent/custom/action/map_locator_pyramid.py
agent/custom/action/__init__.py
assets/resource/base/pipeline/MapLocator.json
Introduce a simpler single-scale SIFT-based map locator for comparison/testing and integrate it into the action exports.
  • Compute SIFT features on an (optionally downscaled) full map image with contrast enhancement, caching descriptors and keypoints to disk.
  • Per frame, crop and pre-process the minimap ROI, match SIFT features to the global map, run RANSAC to estimate an affine transform, and derive the player coordinate from the minimap center.
  • Apply jump filtering and temporal smoothing similar to the pyramid locator and render the estimated player location and match stats in an OpenCV debug window.
  • Register MapLocator as a custom action and export it for use in scripts.
agent/custom/action/map_locator.py
agent/custom/action/__init__.py
Implement an auto-navigation-by-line controller that uses the angle predictor and color-based line detection to steer the player via keyboard or mouse.
  • Define configuration-driven MovementController to handle forward/sprint movement and steering via A/D key presses or relative mouse movement with deadzone and gain settings.
  • Add a high-level AutoNavigateByLine custom action that builds its config (with JSON overrides), resolves ONNX backend, and loops over frames to estimate pointer angle and navigation line state.
  • Implement pointer angle inference by calling the shared ONNX model over a configurable ROI and converting three-keypoint predictions into a world heading angle with stored debug state.
  • Detect navigation lines within a configurable ROI using HSV color ranges, an optional ring mask, contour analysis, and a scoring heuristic; compute heading error relative to an anchor and reliability from area.
  • Handle goal detection via color-based ROI checks, line-loss behavior (keep-forward duration, scanning), optional user hooks for stuck detection, and rich OpenCV debug visualizations for both pointer and line masks.
  • Export AutoNavigateByLine and wire it into the action package exports.
agent/custom/action/auto_navigate_by_line.py
agent/custom/action/__init__.py
Adjust project wiring and metadata to support the new navigation features.
  • Update agent/custom/action/__init__.py to import and expose PredictAngle, AutoNavigateByLine, MapLocator, and MapLocatorPyramid in __all__.
  • Add new pipeline JSON descriptors for angle prediction and map location under assets/resource/base/pipeline/.
  • Keep requirements.txt effectively unchanged aside from ensuring opencv-python is present and newline formatting is normalized.
  • Update ancillary repo config files (.gitignore, .vscode/settings.json, agent/custom/__init__.py, assets/interface.json) as needed for the new actions/pipelines.
agent/custom/action/__init__.py
requirements.txt
.gitignore
.vscode/settings.json
agent/custom/__init__.py
assets/interface.json
assets/resource/base/pipeline/AnglePredictor.json
assets/resource/base/pipeline/MapLocator.json
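
The table above notes that AutoNavigateByLine builds its config with JSON overrides. A recursive merge of that kind might look like the sketch below; the default keys and values are hypothetical placeholders, and the function names only loosely mirror `_deep_update` and `_build_config` from the class diagram.

```python
import copy
import json
from typing import Any, Dict, Optional, Union

# Hypothetical defaults, for illustration only.
DEFAULTS: Dict[str, Any] = {
    "backend": "auto",
    "steering": {"mode": "mouse", "deadzone": 3.0},
}


def deep_update(base: Dict[str, Any], override: Dict[str, Any]) -> None:
    """Merge override into base in place, recursing into nested dicts."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)
        else:
            base[key] = value


def build_config(custom_action_param: Optional[Union[str, Dict[str, Any]]]) -> Dict[str, Any]:
    """Return the defaults with any user overrides applied on top."""
    config = copy.deepcopy(DEFAULTS)  # never mutate the shared defaults
    if not custom_action_param:
        return config
    override = (json.loads(custom_action_param)
                if isinstance(custom_action_param, str) else custom_action_param)
    if not isinstance(override, dict):
        raise ValueError("custom_action_param must decode to a JSON object")
    deep_update(config, override)
    return config
```

Merging recursively means a caller can override a single nested value, such as the steering deadzone, without restating the rest of that section.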


@sourcery-ai sourcery-ai Bot left a comment



Hey - I've found 2 issues, and left some high level feedback:

  • There is quite a bit of duplicated ONNX backend/session management logic between PredictAngle and AutoNavigateByLine (backend resolution, provider maps, session cache); consider extracting this into a shared utility to keep the behavior consistent and make future changes to backend handling easier.
  • The two map locator implementations (MapLocator and MapLocatorPyramid) share substantial SIFT feature, caching, and RANSAC matching logic; factoring the common pieces into reusable helpers (e.g., a base locator or feature-cache module) would reduce complexity and the chance of subtle divergence between the two modes.
  • Many key parameters (ROIs, HSV thresholds, radii, smoothing factors) are currently hard-coded in the actions; wiring more of these through custom_action_param or a central config would make the navigation and map-location behavior easier to tune for different resolutions or games without code changes.
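
One possible shape for the shared backend/session utility suggested in the first point is sketched below. It is a sketch under stated assumptions, not the PR's code: the short backend keys (`cpu`, `cuda`, `directml`, `auto`), the `resolve_providers` helper, and the `SessionCache` class are hypothetical. The execution-provider strings are the standard onnxruntime names. The session factory is injected so the cache logic can be exercised without a model file.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# onnxruntime execution-provider names; the short backend keys are assumptions.
PROVIDER_MAP: Dict[str, List[str]] = {
    "cpu": ["CPUExecutionProvider"],
    "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
    "directml": ["DmlExecutionProvider", "CPUExecutionProvider"],
}


def resolve_providers(backend: str, available: Sequence[str]) -> List[str]:
    """Map a backend name to the providers that are actually installed."""
    key = backend.lower()
    if key == "auto":
        # Prefer the first accelerated backend whose provider is available.
        key = next(
            (k for k in ("cuda", "directml") if PROVIDER_MAP[k][0] in available),
            "cpu",
        )
    wanted = PROVIDER_MAP.get(key, PROVIDER_MAP["cpu"])
    usable = [p for p in wanted if p in available]
    return usable or ["CPUExecutionProvider"]


class SessionCache:
    """Create each (model, providers) session once and reuse it afterwards."""

    def __init__(
        self,
        factory: Callable[[str, List[str]], Any],
        available: Sequence[str],
    ) -> None:
        self._factory = factory
        self._available = list(available)
        self._sessions: Dict[Tuple[str, Tuple[str, ...]], Any] = {}

    def get(self, model_path: str, backend: str = "auto") -> Any:
        providers = tuple(resolve_providers(backend, self._available))
        key = (model_path, providers)
        if key not in self._sessions:
            self._sessions[key] = self._factory(model_path, list(providers))
        return self._sessions[key]
```

With onnxruntime installed, both actions could share one cache built roughly as `SessionCache(lambda path, providers: onnxruntime.InferenceSession(path, providers=providers), onnxruntime.get_available_providers())`, so that a change to backend handling lands in a single place.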
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- There is quite a bit of duplicated ONNX backend/session management logic between `PredictAngle` and `AutoNavigateByLine` (backend resolution, provider maps, session cache); consider extracting this into a shared utility to keep the behavior consistent and make future changes to backend handling easier.
- The two map locator implementations (`MapLocator` and `MapLocatorPyramid`) share substantial SIFT feature, caching, and RANSAC matching logic; factoring the common pieces into reusable helpers (e.g., a base locator or feature-cache module) would reduce complexity and the chance of subtle divergence between the two modes.
- Many key parameters (ROIs, HSV thresholds, radii, smoothing factors) are currently hard-coded in the actions; wiring more of these through `custom_action_param` or a central config would make the navigation and map-location behavior easier to tune for different resolutions or games without code changes.

## Individual Comments

### Comment 1
<location path="agent/custom/action/auto_navigate_by_line.py" line_range="837-844" />
<code_context>
+            return None
+        return min(cx, width - 1), min(cy, height - 1)
+
+    def _resolve_roi(
+        self,
+        roi: Optional[List[int]],
+        frame_shape: Tuple[int, ...],
+        allow_default: bool = False,
+    ) -> Optional[Tuple[int, int, int, int]]:
+        if not isinstance(roi, (list, tuple)) or len(roi) != 4:
+            return None if allow_default else None
+
+        x, y, w, h = [int(v) for v in roi]
</code_context>
<issue_to_address>
**issue (bug_risk):** The `allow_default` flag in `_resolve_roi` currently has no effect and the method always returns `None` when the ROI is invalid.

As written, `allow_default` is a no-op because both branches return `None` for invalid/missing ROIs:

```python
if not isinstance(roi, (list, tuple)) or len(roi) != 4:
    return None if allow_default else None

if min(w, h) <= 0 or x < 0 or y < 0:
    return None if allow_default else None
```
This can hide configuration issues and makes the flag misleading. Either give `allow_default` a distinct behavior (e.g., return a specific sentinel value) or remove the flag and always `return None` for invalid ROIs to keep the API clear.
</issue_to_address>

### Comment 2
<location path="agent/custom/action/map_locator_pyramid.py" line_range="109" />
<code_context>
+                except Exception as e:
+                    print(f"全局特征缓存保存失败: {e}")
+
+        print(f"全局视图特征点数: {len(global_points)}")
+
+        # 3. 初始化分块缓存信息
</code_context>
<issue_to_address>
**issue:** Potential crash if global feature extraction fails and `global_points` stays `None`.

If SIFT fails to produce descriptors (e.g. very low-texture or invalid input), `global_des`/`global_points` can remain `None`, so `len(global_points)` will raise a `TypeError`. Consider guarding both `None` and empty cases:

```python
if global_points is None or global_des is None or len(global_points) == 0:
    print("全局视图特征点不足,无法进行定位")
    return CustomAction.RunResult(success=False)

print(f"全局视图特征点数: {len(global_points)}")
```

This makes the handling of insufficient global features explicit and prevents a runtime crash in edge cases.
</issue_to_address>


Comment on lines +837 to +844
    def _resolve_roi(
        self,
        roi: Optional[List[int]],
        frame_shape: Tuple[int, ...],
        allow_default: bool = False,
    ) -> Optional[Tuple[int, int, int, int]]:
        if not isinstance(roi, (list, tuple)) or len(roi) != 4:
            return None if allow_default else None


issue (bug_risk): The allow_default flag in _resolve_roi currently has no effect and the method always returns None when the ROI is invalid.

As written, allow_default is a no-op because both branches return None for invalid/missing ROIs:

if not isinstance(roi, (list, tuple)) or len(roi) != 4:
    return None if allow_default else None

if min(w, h) <= 0 or x < 0 or y < 0:
    return None if allow_default else None

This can hide configuration issues and makes the flag misleading. Either give allow_default a distinct behavior (e.g., return a specific sentinel value) or remove the flag and always return None for invalid ROIs to keep the API clear.
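
One way the first option could look is sketched below as a standalone function, with `allow_default` falling back to the full frame. This is a sketch under assumptions rather than the PR's implementation: the full-frame fallback, the `(height, width, ...)` layout of `frame_shape`, the extra out-of-frame check, and the clipping step are all choices made for this example.

```python
from typing import Optional, Sequence, Tuple

Roi = Tuple[int, int, int, int]


def resolve_roi(
    roi: Optional[Sequence[int]],
    frame_shape: Tuple[int, ...],
    allow_default: bool = False,
) -> Optional[Roi]:
    """Validate an [x, y, w, h] ROI against a (height, width, ...) frame shape."""
    height, width = frame_shape[:2]
    # Assumed fallback: the whole frame, and only when the caller opts in.
    default: Optional[Roi] = (0, 0, width, height) if allow_default else None

    if not isinstance(roi, (list, tuple)) or len(roi) != 4:
        return default

    x, y, w, h = (int(v) for v in roi)
    if min(w, h) <= 0 or x < 0 or y < 0 or x >= width or y >= height:
        return default

    # Clip the ROI so it never extends past the frame.
    return x, y, min(w, width - x), min(h, height - y)
```

With this shape, a caller that passes `allow_default=False` can still treat `None` as a configuration error, while a caller that opts in always gets a usable region.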

                except Exception as e:
                    print(f"全局特征缓存保存失败: {e}")

        print(f"全局视图特征点数: {len(global_points)}")


issue: Potential crash if global feature extraction fails and global_points stays None.

If SIFT fails to produce descriptors (e.g. very low-texture or invalid input), global_des/global_points can remain None, so len(global_points) will raise a TypeError. Consider guarding both None and empty cases:

if global_points is None or global_des is None or len(global_points) == 0:
    print("全局视图特征点不足,无法进行定位")
    return CustomAction.RunResult(success=False)

print(f"全局视图特征点数: {len(global_points)}")

This makes the handling of insufficient global features explicit and prevents a runtime crash in edge cases.
