Would it be possible to add a setting, like the one in ollama and some other tools, that unloads the models after a configurable amount of idle time? With just a P40 GPU, power draw goes from 9 watts at idle to 51 watts at idle with the models loaded. A P4 goes from 7 watts to 25 watts.
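To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is not based on this project's code: `IdleUnloader`, `load`, `unload`, and `timeout_s` are all hypothetical names, and the real load/unload hooks would be whatever the project already uses.

```python
import threading
import time


class IdleUnloader:
    """Hypothetical helper: keeps a model loaded until it has been idle too long."""

    def __init__(self, load, unload, timeout_s, clock=time.monotonic):
        self._load = load            # assumed callable that loads and returns the model
        self._unload = unload        # assumed callable that frees the model (and its VRAM)
        self._timeout_s = timeout_s  # idle time in seconds before unloading
        self._clock = clock          # injectable so the timing logic can be tested
        self._lock = threading.Lock()
        self._model = None
        self._last_used = clock()

    def acquire(self):
        """Return the model, loading it on demand, and reset the idle timer."""
        with self._lock:
            if self._model is None:
                self._model = self._load()
            self._last_used = self._clock()
            return self._model

    def unload_if_idle(self):
        """Unload the model if it has been unused for at least timeout_s seconds."""
        with self._lock:
            idle_for = self._clock() - self._last_used
            if self._model is not None and idle_for >= self._timeout_s:
                self._unload(self._model)
                self._model = None
                return True
            return False
```

A background thread or periodic task would call `unload_if_idle()` every few seconds, and each request would go through `acquire()`, so the model is reloaded automatically the next time it is needed.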