Recently I got interested in the new font rendering system. I'd noticed before that if you put entries in mapping.json for all the fonts, things slow down a lot (at least on Linux), and I ran into that again when I got curious about the font replacements.
So, I dug into it some. My theory is that the way DirectWrite is used is not conducive to the pattern that ToEE uses to render text. Basically everything is immediate mode, so every frame you follow these steps:
1. Push a font.
2. Calculate some size information for a string.
3. Draw the string.
4. Pop the font.
The font replacement system hooks into steps 2 and 3 (measuring and drawing). Drawing a string, for instance, has these steps:
- Use font and styling to calculate layout of the string.
- Draw the laid-out string to the screen.
The layout step is actually done twice for drawing, because drop shadows use a slightly different layout (in some cases). The measurement step has to do layout as well, so layout can happen 3 times per string per frame (at least for some text).
I suspect this is the performance problem, because from what I've read, the layout part can be expensive. There's a default DrawText for simple stuff, but it's recommended to use DrawTextLayout, where you perform layout once and then draw the result repeatedly. Temple+ actually uses DrawTextLayout, but because ToEE isn't structured to hold on to the layouts, it rebuilds the layout on every call, so in effect it does the same work as DrawText.
I did some work to separate layout from drawing more. However, when I went to try it out, I realized that we actually replace the "draw the string" function in one spot, and refer to it elsewhere. That's not a problem for GUI elements we've hooked (they could just refer directly to the new system). But I'm not sure the layout separation can really happen without rewriting every part of the GUI that uses (a lot of) text. For instance, I was going to use the spell dialog as a test case. Right now what it does is:
- When opened, compute the names of all spells that are going to appear in the dialog, storing them in a list.
- Each frame, draw those names to the screen in the right place using the 4 steps above.
- When the dialog is scrolled, update the list.
But what needs to happen is:
- When opened, calculate the names of the spells that are going to appear, and do all font/styling based layout, storing the result in the list (note, though, that the styling information currently lives in the original code for the drawing step below).
- Each frame, draw the pre-laid-out text to the screen in the right place.
- When the dialog is scrolled, update the list.
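To make the difference concrete, here is a rough sketch of the second flow. Everything in it is hypothetical: `LaidOutText`, `LayoutEngine`, and `SpellListView` are stand-ins I made up, not Temple+ types, and the "layout" is a fake that only counts how often it runs. In the real thing the stored object would presumably wrap something like an `IDWriteTextLayout`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a laid-out string; it only records its input.
struct LaidOutText {
    std::string text;
    float width = 0.0f;
};

// Hypothetical layout step. It counts calls so the cost is visible.
struct LayoutEngine {
    std::size_t layoutCalls = 0;

    LaidOutText layout(const std::string& text) {
        ++layoutCalls;
        // Fake metric: pretend every glyph is 8 units wide.
        return LaidOutText{text, 8.0f * static_cast<float>(text.size())};
    }
};

class SpellListView {
public:
    explicit SpellListView(LayoutEngine& engine) : engine_(engine) {}

    // Called when the dialog is opened or scrolled: the only place layout runs.
    void setVisibleSpells(const std::vector<std::string>& names) {
        rows_.clear();
        rows_.reserve(names.size());
        for (const auto& name : names) {
            rows_.push_back(engine_.layout(name));
        }
    }

    // Called every frame: walks the stored layouts without redoing any layout.
    // Returns the number of rows "drawn" (actual drawing is omitted here).
    std::size_t drawFrame() const {
        std::size_t drawn = 0;
        for (const auto& row : rows_) {
            assert(!row.text.empty());
            ++drawn;
        }
        return drawn;
    }

private:
    LayoutEngine& engine_;
    std::vector<LaidOutText> rows_;
};
```

With this shape, opening the dialog with two spells and drawing a hundred frames costs two layout calls instead of several hundred. The catch is the one described above: `setVisibleSpells` needs the font and style at the time the list is built, which is exactly what the hooked-only dialogs don't give us.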
That isn't a problem for spells, because we've actually reimplemented the spell dialog in Temple+. But there are lots of dialogs where we've only hooked the drawing functions, and the original DLL only pre-calculates the strings. So it can't work this way without reimplementation (I think).
So, I was wondering if anyone else had thoughts on this, as redoing every dialog is a bit much. One option would be to put a cache at a lower layer that maps (string, font, style) to the already laid out text. I'm not super familiar with the options for that in C++, though, so I'm not certain how the cost of a lookup compares to just laying out the text again.
I guess another thing I should ask is, do people that actually run Windows also find performance to be bad with full font replacement? I assume that's why it's not on by default. But I'm using Wine, so its DirectWrite implementation could be a lot slower.
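For what it's worth, a minimal sketch of what such a cache could look like with just the standard library is below. All names here (`LayoutKey`, `LayoutKeyHash`, `LayoutCache`, `styleId`) are made up for illustration, and I'm assuming the style can be boiled down to some comparable id or hash. The layout type is a template parameter so the sketch doesn't depend on DirectWrite. The per-frame eviction is also an assumption on my part: since rendering is immediate mode, anything that wasn't requested during a frame is probably no longer on screen.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache key: the (string, font, style) triple.
struct LayoutKey {
    std::string text;
    std::string font;
    int styleId;  // assumed: an id or hash that summarises the style

    bool operator==(const LayoutKey& o) const {
        return styleId == o.styleId && font == o.font && text == o.text;
    }
};

struct LayoutKeyHash {
    std::size_t operator()(const LayoutKey& k) const {
        // Common hash-combine idiom; any reasonable mixing would do.
        std::size_t h = std::hash<std::string>{}(k.text);
        h ^= std::hash<std::string>{}(k.font) + 0x9e3779b9u + (h << 6) + (h >> 2);
        h ^= std::hash<int>{}(k.styleId) + 0x9e3779b9u + (h << 6) + (h >> 2);
        return h;
    }
};

template <typename Layout>
class LayoutCache {
public:
    using Factory = std::function<std::shared_ptr<Layout>(const LayoutKey&)>;

    explicit LayoutCache(Factory factory) : factory_(std::move(factory)) {}

    // Returns the cached layout, building it with the factory on a miss.
    std::shared_ptr<Layout> get(const LayoutKey& key) {
        auto it = entries_.find(key);
        if (it != entries_.end()) {
            it->second.usedThisFrame = true;
            return it->second.layout;
        }
        std::shared_ptr<Layout> layout = factory_(key);
        assert(layout && "factory must return a layout");
        entries_.emplace(key, Entry{layout, true});
        return layout;
    }

    // Call once per frame: drops entries nobody asked for since the last call.
    void endFrame() {
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (!it->second.usedThisFrame) {
                it = entries_.erase(it);
            } else {
                it->second.usedThisFrame = false;
                ++it;
            }
        }
    }

    std::size_t size() const { return entries_.size(); }

private:
    struct Entry {
        std::shared_ptr<Layout> layout;
        bool usedThisFrame;
    };

    Factory factory_;
    std::unordered_map<LayoutKey, Entry, LayoutKeyHash> entries_;
};
```

A hit costs hashing and comparing the string plus one table lookup. I'd expect that to be much cheaper than a full text layout, but I haven't measured it, so that would need profiling. The nice part is that the hooked measure and draw functions could both go through `get`, so the three layouts per string per frame would collapse into one layout on the first frame and lookups afterwards, without touching the original dialogs.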