Skip to content

Morello: disable VMAP_STACK#24

Open
heshamelmatary wants to merge 1 commit into
codasip-cheri-riscv-7.0from
vmap_stack
Open

Morello: disable VMAP_STACK#24
heshamelmatary wants to merge 1 commit into
codasip-cheri-riscv-7.0from
vmap_stack

Conversation

@heshamelmatary
Copy link
Copy Markdown
Collaborator

VMAP_STACK checks for kernel stack overflows on kernel entry and reports a fault if the checks fail. This was originally disabled when the Morello port was developed. Since then, commit ef6861b (see [1]) enabled VMAP_STACK by default on arm64.

Unfortunately, the VMAP_STACK entry code clobbers [c]sp on entry, causing the stack pointer capability to lose its tag. Since there are not enough scratch registers available at that stage of execution, this commit explicitly disables VMAP_STACK again for Morello.

[1] https://lore.kernel.org/all/20250707-arm64_vmap-v1-0-8de98ca0f91c@debian.org/

VMAP_STACK checks for kernel stack overflows on kernel entry
and reports a fault if the checks fail. This was originally
disabled when the Morello port was developed. Since then,
commit ef6861b (see [1]) enabled VMAP_STACK by default on
arm64.

Unfortunately, the VMAP_STACK entry code clobbers [c]sp
on entry, causing the stack pointer capability to lose its
tag. Since there are not enough scratch registers available
at that stage of execution, this commit explicitly disables
VMAP_STACK again for Morello.

[1] https://lore.kernel.org/all/20250707-arm64_vmap-v1-0-8de98ca0f91c@debian.org/

Signed-off-by: Hesham Almatary <hesham.almatary@cl.cam.ac.uk>
@kevin-brodsky-arm
Copy link
Copy Markdown
Collaborator

I don't think this is sufficient. The patch that enabled VMAP_STACK conditionally is actually part of a series of 8 patches, quite a few things behaved differently without VMAP_STACK: https://lore.kernel.org/all/20250707-arm64_vmap-v1-0-8de98ca0f91c@debian.org/

@kevin-brodsky-arm
Copy link
Copy Markdown
Collaborator

Reverting the whole series isn't practical IMHO, things are likely to go wrong when rebasing. I think there are two options:

  1. The minimal hack that I had in mind was to simply not do the stack overflow check at the beginning of kernel_ventry. This should make no difference under normal conditions. If an overflow does happen, then rather than a nice panic it will go horribly wrong and most likely get stuck in a recursive exception. Not great, but definitely good enough to experiment.
  2. I think there is hope that we can stash CSP somewhere else to keep doing this stack overflow check. Most likely candidate is CTPIDR_EL1 (or CTPIDR_EL2, see set_my_cpu_offset()). It is used by the kernel but its value never changes, so I'm pretty sure we could overwrite it on exception entry and then restore it after the stack overflow check.

@heshamelmatary
Copy link
Copy Markdown
Collaborator Author

Reverting the whole series isn't practical IMHO, things are likely to go wrong when rebasing. I think there are two options:

  1. The minimal hack that I had in mind was to simply not do the stack overflow check at the beginning of kernel_ventry.

This wasn't enough unfortunately. There was another bad stack check during an exit system call and a task clean-up in the kernel's exit_task_stack_account that was triggering another fault/panic if CONFIG_VMAP_STACK is enabled.

This should make no difference under normal conditions. If an overflow does happen, then rather than a nice panic it will go horribly wrong and most likely get stuck in a recursive exception. Not great, but definitely good enough to experiment.
2. I think there is hope that we can stash CSP somewhere else to keep doing this stack overflow check. Most likely candidate is CTPIDR_EL1 (or CTPIDR_EL2, see set_my_cpu_offset()). It is used by the kernel but its value never changes, so I'm pretty sure we could overwrite it on exception entry and then restore it after the stack overflow check.

I can have a look at implementing that later. I'd need to prioritise the purecap Morello kernel though as hybrid will most likely be obsolete. I was just trying to do the bare minimum for hybrid to get a shell.

@kevin-brodsky-arm
Copy link
Copy Markdown
Collaborator

Ah right then you'd need to disable all overflow checks. Unfortunately it looks like this is going to be quite a bit of work to maintain no matter what you do.

I don't think there should be much difference between hybrid and purecap kernel if you were to amend the checks to use e.g. CTPIDR_EL1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants