
Update AIE4 app health interface for 0.0.20 FW #1140

Open
NishadSaraf wants to merge 1 commit into amd:main from NishadSaraf:20

@NishadSaraf (Member)

Sync the AIE4 FW message interface to match 0.0.20 firmware:

  • Pack ctx_status and num_uc as 16-bit bitfields in aie4_msg_app_health_report and add runlist_read_idx field
  • Use runlist_read_idx from cached health report in job_timeout() to identify the failing subcmd in a chained command
  • Log runlist_read_idx in async context error health report output
  • Make context switch hysteresis a module parameter (aie4_ctx_hysteresis_us) defaulting to 1000us
  • Update shim test 73 to submit a runlist with timeout in the second subcmd to exercise the runlist_read_idx path

@NishadSaraf (Member, Author)

retest this please

maxzhen previously approved these changes on Mar 3, 2026
Sync the AIE4 FW message interface to match 0.0.20 firmware:

- Pack ctx_status and num_uc as 16-bit bitfields in
  aie4_msg_app_health_report and add runlist_read_idx field
- Use runlist_read_idx from cached health report in job_timeout()
  to identify the failing subcmd in a chained command
- Log runlist_read_idx in async context error health report output
- Make context switch hysteresis a module parameter
  (aie4_ctx_hysteresis_us) defaulting to 1000us
- Update shim test 73 to submit a runlist with timeout in the
  second subcmd to exercise the runlist_read_idx path

Signed-off-by: Nishad Saraf <nishads@amd.com>