Needed to fix #1753 (may not be sufficient if there are other unrelated issues).
Range-extension thunks are mostly needed when producing large binaries for RISC architectures such as aarch64.
When a branch instruction has a limited range (e.g. +/- 128MiB) and the target of the branch lies beyond that range, we need to insert a thunk somewhere within range of the branch that then jumps to the actual target.
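To make the range check concrete, here is a minimal sketch for aarch64 `B`/`BL`, whose signed 26-bit immediate is counted in 4-byte instructions, giving a reach of -128MiB to +128MiB - 4. `needs_thunk` is a hypothetical helper for illustration, not an existing function in the linker.

```rust
/// Reach of an aarch64 B/BL instruction: a signed 26-bit immediate scaled by 4,
/// i.e. the displacement must lie in [-128 MiB, +128 MiB - 4].
const BRANCH_MIN: i64 = -(1 << 27);
const BRANCH_MAX: i64 = (1 << 27) - 4;

/// Hypothetical helper: true if a branch at address `place` cannot reach
/// `target` directly and therefore needs a range-extension thunk.
fn needs_thunk(place: u64, target: u64) -> bool {
    // Wrapping subtraction reinterpreted as i64 yields the signed displacement.
    let displacement = target.wrapping_sub(place) as i64;
    !(BRANCH_MIN..=BRANCH_MAX).contains(&displacement)
}
```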
The tricky part is that in order to know whether a thunk is needed, we need to know addresses. In order to figure out addresses, we need to know sizes, but the total size of the thunks depends on how many we need. So there's basically a cycle in the state dependencies.
One option (call it A) would be to have an iterative algorithm e.g.:
- Determine sizes
- Determine addresses
- Scan relocations, looking for any that are out of range. Allocate thunks as needed.
- Adjust sizes based on allocated thunks
- If any adjustments were made, then go back to determining addresses
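The loop in option A can be sketched with a toy model. Everything here is hypothetical rather than the linker's real data structures: `Section`, `Branch` and `layout_iteratively` are invented names, sections are laid out back to back with their thunks appended at the end, each thunk is assumed to be 12 bytes (e.g. ADRP + ADD + BR on aarch64) and able to reach any target, and every section is assumed small enough that its branches can always reach a thunk at its end.

```rust
/// Hypothetical thunk size in bytes (e.g. ADRP + ADD + BR on aarch64).
const THUNK_SIZE: u64 = 12;
/// Reach of an aarch64 B/BL instruction: [-128 MiB, +128 MiB - 4].
const BRANCH_MIN: i64 = -(1 << 27);
const BRANCH_MAX: i64 = (1 << 27) - 4;

/// A branch at `offset` in its section, aimed at `target_offset` in `target_section`.
struct Branch {
    offset: u64,
    target_section: usize,
    target_offset: u64,
}

struct Section {
    size: u64,
    branches: Vec<Branch>,
}

fn in_range(place: u64, target: u64) -> bool {
    let displacement = target.wrapping_sub(place) as i64;
    (BRANCH_MIN..=BRANCH_MAX).contains(&displacement)
}

/// Option A: repeat sizes -> addresses -> scan -> allocate until a pass
/// allocates nothing. Returns which branches got a thunk and the pass count.
fn layout_iteratively(sections: &[Section]) -> (Vec<Vec<bool>>, usize) {
    let mut has_thunk: Vec<Vec<bool>> = sections
        .iter()
        .map(|s| vec![false; s.branches.len()])
        .collect();
    let mut passes = 0;
    loop {
        passes += 1;
        // Determine sizes and addresses: sections are laid out back to back,
        // each followed by the thunks allocated for it so far.
        let mut addresses = Vec::with_capacity(sections.len());
        let mut next = 0u64;
        for (section, thunks) in sections.iter().zip(&has_thunk) {
            addresses.push(next);
            let count = thunks.iter().filter(|&&t| t).count() as u64;
            next += section.size + count * THUNK_SIZE;
        }
        // Scan relocations and allocate a thunk for any out-of-range branch.
        let mut changed = false;
        for (i, section) in sections.iter().enumerate() {
            for (j, b) in section.branches.iter().enumerate() {
                let place = addresses[i] + b.offset;
                let target = addresses[b.target_section] + b.target_offset;
                if !has_thunk[i][j] && !in_range(place, target) {
                    has_thunk[i][j] = true;
                    changed = true;
                }
            }
        }
        // Thunks are only ever added, never removed, so this terminates.
        if !changed {
            return (has_thunk, passes);
        }
    }
}
```

Allocating a thunk grows its section and shifts everything after it, which can push a previously in-range branch out of range. That is why several passes may be needed, each one repeating the full address computation.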
Given that determining addresses currently takes about 5% of link time, this could get expensive.
An alternative approach (B) would be to pessimistically allocate a thunk for every branch that might possibly require one. For an initial implementation, that might be sufficient. We could later add a pass where we eliminate unneeded thunks. An elimination approach has the advantage that we can probably stop after one iteration. While eliminating some thunks might reduce sizes enough that other thunks can then be eliminated, I suspect this would be of limited benefit, so it probably isn't worth it. Most of the gains would come from the first pass.
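Approach B with a single elimination pass can be sketched using the same kind of toy model (all names hypothetical; 12-byte thunks appended after each section; sections assumed small enough to always reach their own thunks). Every branch first gets a thunk, addresses are computed once for that pessimistic layout, and any thunk whose branch would reach its target directly is dropped. Ignoring alignment padding, dropping thunks only removes bytes, so no branch ends up further from its target than it was in the pessimistic layout, and one pass yields a valid result.

```rust
/// Hypothetical thunk size in bytes (e.g. ADRP + ADD + BR on aarch64).
const THUNK_SIZE: u64 = 12;
/// Reach of an aarch64 B/BL instruction: [-128 MiB, +128 MiB - 4].
const BRANCH_MIN: i64 = -(1 << 27);
const BRANCH_MAX: i64 = (1 << 27) - 4;

/// A branch at `offset` in its section, aimed at `target_offset` in `target_section`.
struct Branch {
    offset: u64,
    target_section: usize,
    target_offset: u64,
}

struct Section {
    size: u64,
    branches: Vec<Branch>,
}

fn in_range(place: u64, target: u64) -> bool {
    let displacement = target.wrapping_sub(place) as i64;
    (BRANCH_MIN..=BRANCH_MAX).contains(&displacement)
}

/// Section start addresses when sections are laid out back to back,
/// each followed by one thunk per `true` entry in `has_thunk`.
fn addresses(sections: &[Section], has_thunk: &[Vec<bool>]) -> Vec<u64> {
    let mut next = 0;
    sections
        .iter()
        .zip(has_thunk)
        .map(|(section, thunks)| {
            let start = next;
            let count = thunks.iter().filter(|&&t| t).count() as u64;
            next += section.size + count * THUNK_SIZE;
            start
        })
        .collect()
}

/// Approach B: give every branch a thunk, compute addresses once, then drop
/// each thunk whose branch can reach its target directly in that layout.
fn allocate_then_eliminate(sections: &[Section]) -> Vec<Vec<bool>> {
    let pessimistic: Vec<Vec<bool>> = sections
        .iter()
        .map(|s| vec![true; s.branches.len()])
        .collect();
    let addrs = addresses(sections, &pessimistic);
    sections
        .iter()
        .enumerate()
        .map(|(i, s)| {
            s.branches
                .iter()
                .map(|b| {
                    let place = addrs[i] + b.offset;
                    let target = addrs[b.target_section] + b.target_offset;
                    // Keep the thunk only if the direct branch is out of range.
                    !in_range(place, target)
                })
                .collect::<Vec<bool>>()
        })
        .collect()
}
```

The price of stopping after one pass is that a branch which is only out of range because of thunks that later get dropped keeps its thunk, even though the iterative approach would not have needed one.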
It might be tempting to create a thunk per input symbol ID rather than per relocation. This might work, but there are definitely circumstances in which it wouldn't. For example, a single input symbol might be referenced from two different functions that are placed in very different parts of the file, e.g. because one function is marked as "cold" and placed with other cold functions. In that case there may be no single location for the thunk that is within range of both call sites.
It would also be tempting to create one thunk per symbol per input section; however, the bookkeeping for doing that is likely to be expensive.
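The per-symbol failure mode can be made concrete. A thunk shared by all references to a symbol must lie within reach of every one of those branches, i.e. in the intersection of their windows. The sketch below assumes aarch64 `B`/`BL` reach and a thunk that can itself reach any target; `shared_thunk_window` is a hypothetical helper. The intersection becomes empty once two callers are more than about 256MiB apart.

```rust
/// Reach of an aarch64 B/BL instruction: [-128 MiB, +128 MiB - 4].
const BRANCH_MIN: i64 = -(1 << 27);
const BRANCH_MAX: i64 = (1 << 27) - 4;

/// Hypothetical helper: given the addresses of every branch that references one
/// symbol, returns the inclusive range of addresses where a single shared thunk
/// could be placed so that all of those branches reach it, or None if there is none.
fn shared_thunk_window(callers: &[i64]) -> Option<(i64, i64)> {
    // The lowest usable address is bounded by the highest caller's lower limit...
    let lo = callers.iter().map(|&c| c + BRANCH_MIN).max()?;
    // ...and the highest usable address by the lowest caller's upper limit.
    let hi = callers.iter().map(|&c| c + BRANCH_MAX).min()?;
    (lo <= hi).then_some((lo, hi))
}
```

For a hot caller at address 0 and a cold caller 300MiB away, the window is empty, so at least two thunks for that symbol would be required.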
Even creating a thunk per eligible relocation requires a bit of bookkeeping. We'd need a count of thunks per input section. Storing any data per input section tends to slow us down. Fortunately there are currently a few free bytes that we can use without increasing the size of SectionSlot.
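To illustrate the "free bytes" point: alignment padding often leaves a few unused bytes at the end of a struct, and a small per-section thunk count can live there without growing it. The structs below are invented for illustration and do not match the real `SectionSlot`; the `u16` count and 12-byte thunk size are assumptions.

```rust
use std::mem::size_of;

/// Hypothetical thunk size in bytes (e.g. ADRP + ADD + BR on aarch64).
const THUNK_SIZE: u64 = 12;

/// Invented stand-in for a per-section record; these fields do not match the
/// real SectionSlot. `repr(C)` pins the layout so the padding is predictable.
#[allow(dead_code)]
#[repr(C)]
struct SlotBefore {
    data: u64,  // offset 0
    index: u32, // offset 8
    flags: u8,  // offset 12, followed by 3 bytes of tail padding
}

/// The same record with a thunk count placed in what used to be padding.
#[allow(dead_code)]
#[repr(C)]
struct SlotAfter {
    data: u64,        // offset 0
    index: u32,       // offset 8
    flags: u8,        // offset 12, then 1 padding byte for alignment
    thunk_count: u16, // offset 14
}

// Compile-time check that adding the counter did not grow the struct.
const _: () = assert!(size_of::<SlotAfter>() == size_of::<SlotBefore>());

/// Size of a section once its thunks are appended after its own content.
fn size_with_thunks(base_size: u64, thunk_count: u16) -> u64 {
    base_size + u64::from(thunk_count) * THUNK_SIZE
}
```

With thunks appended after the section's own content, the count alone is enough to recover the section's final size, which is all the layout step needs from it.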