This repository was archived by the owner on Oct 31, 2025. It is now read-only.
Refactoring & improvements to algorithmic complexity of inline.rs. #811
Open
ElectronicRU wants to merge 12 commits into EmbarkStudios:main from
I've tried to explain what exactly I am doing in each commit, and the rationale behind it. This is a second draft, but if some parts seem unclear or I missed something simple, be sure to tell me.
ElectronicRU added a commit to ElectronicRU/rust-gpu that referenced this pull request Nov 30, 2021
khyperia approved these changes Dec 2, 2021
LGTM, just want to wait a bit to see if eddyb wants to review!
…c inlining. By inlining in callee -> caller order, we avoid the need to continue inlining the code we just inlined. A simple reachability test from one of the entry points helps avoid unnecessary work as well. The algorithm, however, remains quadratic in the case where OpAccessChains repeatedly find their way into function parameters. There are two ways out: either a more complex control flow analysis, or conservatively inlining all function calls which reference FunctionParameters as arguments. I don't think either option is worth it.
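The callee -> caller ordering with a reachability filter can be sketched as a post-order walk of the call graph. This is a minimal illustration, not the code from inline.rs: `FuncId`, `inline_order`, and `visit` are hypothetical names, and the call graph is assumed to be given as a plain map and to be acyclic (SPIR-V forbids recursion in the static call graph).

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical function id; the real pass works on SPIR-V result ids.
type FuncId = u32;

/// Returns the functions reachable from `entry_points`, ordered so that
/// every callee appears before any of its callers (post-order DFS).
/// Functions that are never reached are simply absent from the result.
fn inline_order(calls: &HashMap<FuncId, Vec<FuncId>>, entry_points: &[FuncId]) -> Vec<FuncId> {
    let mut visited = HashSet::new();
    let mut order = Vec::new();
    for &entry in entry_points {
        visit(entry, calls, &mut visited, &mut order);
    }
    order
}

fn visit(
    func: FuncId,
    calls: &HashMap<FuncId, Vec<FuncId>>,
    visited: &mut HashSet<FuncId>,
    order: &mut Vec<FuncId>,
) {
    // `insert` returns false if the function was already visited.
    if !visited.insert(func) {
        return;
    }
    if let Some(callees) = calls.get(&func) {
        for &callee in callees {
            visit(callee, calls, visited, order);
        }
    }
    // Emit the function only after all of its callees.
    order.push(func);
}
```

Processing functions in this order means that by the time a caller is handled, its callees are already fully inlined, so the inlined body never needs a second pass.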
We need pointer types, and re-checking all the types to see if we already have one is rather slow; it's better to keep track of them.
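One way to "keep track" is a map from pointee type to pointer type that is consulted before allocating anything new. The sketch below is an assumption about the shape of such a cache, not the PR's code: `PointerTypeCache`, `get_or_create`, and the bare `Word` ids are hypothetical, and a real pass would also emit the OpTypePointer instruction and key on the storage class.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a SPIR-V id.
type Word = u32;

/// Hypothetical cache mapping a pointee type id to its pointer type id.
struct PointerTypeCache {
    by_pointee: HashMap<Word, Word>,
    next_id: Word,
}

impl PointerTypeCache {
    fn new(first_free_id: Word) -> Self {
        PointerTypeCache {
            by_pointee: HashMap::new(),
            next_id: first_free_id,
        }
    }

    /// Returns the known pointer type for `pointee`, or allocates a fresh id.
    /// Each lookup is a single hash probe instead of a scan over all types.
    fn get_or_create(&mut self, pointee: Word) -> Word {
        let next_id = &mut self.next_id;
        *self.by_pointee.entry(pointee).or_insert_with(|| {
            let id = *next_id;
            *next_id += 1;
            id
        })
    }
}
```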
…ned. The functions we are going to delete definitely either need to be inlined, or are never called (so we don't care what to decide about them).
Since during inlining, the only escaping value is the return value, we can calculate and update whether it has an invalid-to-call-with value as well. (Note that this is, strictly speaking, more rigor than get_invalid_values() applies, because it doesn't look behind OpPhis.) As a nice bonus, we got rid of OpLoad/OpStore in favor of OpPhi, which means no type mucking and no work created for mem2reg.
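A rough sketch of that idea, under my own assumptions about the data involved: each return in the inlined callee contributes one (value, predecessor block) pair to a phi that takes over the call's result id, and the result is marked invalid-to-call-with if any incoming value is. `ReturnSite`, `Phi`, and `merge_returns` are hypothetical names, not items from inline.rs.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a SPIR-V id.
type Id = u32;

/// One return in the inlined callee: the block it sits in and the value returned.
struct ReturnSite {
    block: Id,
    value: Id,
}

/// Simplified OpPhi: operands are (value, predecessor block) pairs.
struct Phi {
    result: Id,
    incoming: Vec<(Id, Id)>,
}

/// Builds the phi that replaces the call result and propagates the
/// "invalid to call with" flag from the returned values to that result.
fn merge_returns(call_result: Id, returns: &[ReturnSite], invalid: &mut HashSet<Id>) -> Phi {
    let incoming: Vec<(Id, Id)> = returns.iter().map(|r| (r.value, r.block)).collect();
    if incoming.iter().any(|(value, _)| invalid.contains(value)) {
        invalid.insert(call_result);
    }
    Phi {
        result: call_result,
        incoming,
    }
}
```

Because the values flow through a phi rather than through a temporary variable, no pointer type or load/store pair is needed for the return value.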
Originally, this algorithm walked a linked list by the back-edges, copying and skipping. It is easier to just go with front-edges and gobble up a series of potential blocks at once. The predecessor finding algorithm really just wanted to find 1-to-1 edges (it was split between `compute_all_preds` and `fuse_trivial_branches`), so I made it do just that.
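The forward-edge approach can be illustrated with a simplified CFG. This is a sketch under stated assumptions rather than the actual `fuse_trivial_branches`: `Block`, `Terminator`, and `fuse_trivial_chains` are hypothetical, instructions are plain strings, and merge instructions and OpPhi fix-ups are ignored.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Terminator {
    /// Unconditional branch to a single label.
    Branch(u32),
    /// Anything else (conditional branch, switch, return), with its targets.
    Other(Vec<u32>),
}

#[derive(Debug, Clone, PartialEq)]
struct Block {
    label: u32,
    insts: Vec<String>,
    term: Terminator,
}

fn targets(term: &Terminator) -> Vec<u32> {
    match term {
        Terminator::Branch(t) => vec![*t],
        Terminator::Other(ts) => ts.clone(),
    }
}

/// Follows forward edges from each block and absorbs every successor that is
/// reached by an unconditional branch and has exactly one incoming edge.
fn fuse_trivial_chains(blocks: Vec<Block>) -> Vec<Block> {
    // Count incoming edges per label; only 1-to-1 edges can be fused.
    let mut pred_count: HashMap<u32, usize> = HashMap::new();
    for block in &blocks {
        for target in targets(&block.term) {
            *pred_count.entry(target).or_insert(0) += 1;
        }
    }
    let order: Vec<u32> = blocks.iter().map(|b| b.label).collect();
    let mut by_label: HashMap<u32, Block> = blocks.into_iter().map(|b| (b.label, b)).collect();
    let mut fused = Vec::new();
    for label in order {
        let mut block = match by_label.remove(&label) {
            Some(b) => b,
            None => continue, // already absorbed into an earlier block
        };
        // Gobble up the whole chain of single-predecessor successors at once.
        while let Terminator::Branch(next) = block.term {
            if pred_count.get(&next) != Some(&1) {
                break;
            }
            match by_label.remove(&next) {
                Some(succ) => {
                    block.insts.extend(succ.insts);
                    block.term = succ.term;
                }
                None => break, // successor was already emitted
            }
        }
        fused.push(block);
    }
    fused
}
```

Each block is removed from the map exactly once, so the pass is linear in the number of blocks and edges.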
Just inlining entry points deletes functions from tests and makes everyone sad.
This partially reverts commit 990425b.
Updated the MR and resolved conflicts. @eddyb, do you wish to take a look?
inline.rs was a sore spot for me when reading the codebase because of how many TODOs and quadratic algorithms it had sitting there.
Looking at it, I was able to see some nicer algorithms trying to poke out. Also a little bit more uniformity for doing the same stuff (like finding variable split points), less cloning, and more caching :)
Tests aren't affected at all, which I'd consider a success.
P.S. Idle hands shave yaks, or so they say.