Skip to content

Isolate plugins in out-of-process COM host#40120

Open
benhillis wants to merge 2 commits intomasterfrom
user/benhill/plugin_host
Open

Isolate plugins in out-of-process COM host#40120
benhillis wants to merge 2 commits intomasterfrom
user/benhill/plugin_host

Conversation

@benhillis
Copy link
Copy Markdown
Member

Moves plugin DLLs from LoadLibrary in wslservice.exe to isolated wslpluginhost.exe processes via COM. A crashing plugin kills only its host process — the service logs it and continues. Plugin API is unchanged; Defender and existing plugins work unmodified.

Callbacks (MountFolder, ExecuteBinary) arrive on a different COM thread, so they can't re-enter the recursive mutex. Instead of restructuring every call site, callbacks use a separate \shared_mutex\ — shared for reads, exclusive in _VmTerminate\ before destroying the VM.

Plugin hosts are in a job object for automatic cleanup on service exit. COM activation is SYSTEM-only via AppID. Proxy/stub is consolidated into wslserviceproxystub.dll. One new exe, no new DLLs.

Copilot AI review requested due to automatic review settings April 6, 2026 16:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR moves WSL plugins from in-process LoadLibrary inside wslservice.exe to isolated out-of-process wslpluginhost.exe COM local servers, aiming to keep the service alive even if a plugin crashes.

Changes:

  • Introduces wslpluginhost.exe (COM local server) that loads one plugin DLL and forwards lifecycle events, while proxying plugin API callbacks back to the service.
  • Adds new COM contracts (IWslPluginHost, IWslPluginHostCallback) and consolidates proxy/stub generation into wslserviceproxystub.dll.
  • Updates service-side plugin management and adds a new shared_mutex path intended to avoid re-entrancy/deadlock on COM RPC callback threads.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
src/windows/wslpluginhost/exe/resource.h Adds resource IDs for new host executable.
src/windows/wslpluginhost/exe/PluginHost.h Declares COM class implementing IWslPluginHost plus API callback stubs.
src/windows/wslpluginhost/exe/PluginHost.cpp Implements plugin DLL loading, lifecycle dispatch, and callback forwarding to service.
src/windows/wslpluginhost/exe/main.rc Adds version/icon resources for wslpluginhost.exe.
src/windows/wslpluginhost/exe/main.cpp Implements COM local server entrypoint and class factory registration.
src/windows/wslpluginhost/exe/CMakeLists.txt Adds build target for wslpluginhost.exe.
src/windows/wslpluginhost/CMakeLists.txt Wires new subdirectory into build.
src/windows/service/stub/CMakeLists.txt Adds MIDL proxy/stub sources for WslPluginHost.idl into wslserviceproxystub.
src/windows/service/inc/WslPluginHost.idl Defines out-of-proc COM interfaces for plugin hosting + callbacks.
src/windows/service/inc/CMakeLists.txt Adds new wslpluginhostidl MIDL generation target.
src/windows/service/exe/PluginManager.h Refactors plugin manager to track out-of-proc hosts and adds callback implementation type.
src/windows/service/exe/PluginManager.cpp Implements COM activation of hosts, job object assignment, and service-side callback handlers.
src/windows/service/exe/LxssUserSession.h Adds shared_mutex and makes plugin-callback methods private/friend-only.
src/windows/service/exe/LxssUserSession.cpp Switches plugin callback locking from m_instanceLock to m_callbackLock and gates VM teardown.
src/windows/service/exe/CMakeLists.txt Adds dependency on wslpluginhostidl.
src/windows/common/precomp.h Adds <shared_mutex> include for new locking.
msipackage/package.wix.in Installs wslpluginhost.exe and registers COM AppID/CLSID/interfaces for activation and proxy/stub.
msipackage/CMakeLists.txt Adds wslpluginhost.exe to packaged binaries and build dependencies.
CMakeLists.txt Adds subdirectory for wslpluginhost and adjusts global include directories.
Comments suppressed due to low confidence (2)

src/windows/service/exe/LxssUserSession.cpp:3644

  • CreateLinuxProcess now only takes m_callbackLock, but it calls _RunningInstance(), which is annotated Requires_lock_held(m_instanceLock) and reads m_runningInstances. This is both a locking-contract violation and can race with writers that still use m_instanceLock only. Refactor so callback code can safely read the running-instance map (e.g., provide a callback-safe lookup guarded by m_callbackLock, and ensure all writes to m_runningInstances/m_utilityVm also take m_callbackLock exclusively after m_instanceLock per the stated lock ordering).
    // Shared lock prevents _VmTerminate from destroying the VM or instances
    // while we use them. See MountRootNamespaceFolder for rationale.
    std::shared_lock lock(m_callbackLock);
    RETURN_HR_IF(E_NOT_VALID_STATE, !m_utilityVm);

    if (Distro == nullptr)
    {
        *Socket = m_utilityVm->CreateRootNamespaceProcess(Path, Arguments).release();
    }
    else
    {
        const auto distro = _RunningInstance(Distro);
        THROW_HR_IF(WSL_E_VM_MODE_INVALID_STATE, !distro);

        const auto wsl2Distro = dynamic_cast<WslCoreInstance*>(distro.get());

src/windows/service/exe/LxssUserSession.cpp:2614

  • m_runningInstances is updated here without taking m_callbackLock, but plugin callbacks now read m_runningInstances under m_callbackLock (and intentionally do not take m_instanceLock). Because these are different locks, this doesn’t provide synchronization and can lead to data races/UB when a callback runs concurrently with instance creation/termination. Writers that mutate m_runningInstances (and m_utilityVm if accessed by callbacks) need to also take m_callbackLock (exclusive) in the documented order m_instanceLock → m_callbackLock.
                // This needs to be done before plugins are notified because they might try to run a command inside the distribution.
                m_runningInstances[registration.Id()] = instance;

                if (version == LXSS_WSL_VERSION_2)
                {
                    auto cleanupOnFailure =
                        wil::scope_exit_log(WI_DIAGNOSTICS_INFO, [&]() { m_runningInstances.erase(registration.Id()); });
                    m_pluginManager.OnDistributionStarted(&m_session, instance->DistributionInformation());

Copilot AI review requested due to automatic review settings April 6, 2026 17:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/windows/service/exe/LxssUserSession.cpp:3643

  • CreateLinuxProcess now only takes m_callbackLock, but it reads m_runningInstances via _RunningInstance(), which is annotated Requires_lock_held(m_instanceLock) and the map itself is Guarded_by(m_instanceLock). This is a real race/contract violation (and can also break static analysis). You’ll need a consistent locking strategy for callback threads (e.g., make all accesses/mutations of m_runningInstances + m_utilityVm also take m_callbackLock in the documented order, or refactor callbacks to avoid touching m_runningInstances without m_instanceLock).
    // Shared lock prevents _VmTerminate from destroying the VM or instances
    // while we use them. See MountRootNamespaceFolder for rationale.
    std::shared_lock lock(m_callbackLock);
    RETURN_HR_IF(E_NOT_VALID_STATE, !m_utilityVm);

    if (Distro == nullptr)
    {
        *Socket = m_utilityVm->CreateRootNamespaceProcess(Path, Arguments).release();
    }
    else
    {
        const auto distro = _RunningInstance(Distro);
        THROW_HR_IF(WSL_E_VM_MODE_INVALID_STATE, !distro);

@benhillis benhillis force-pushed the user/benhill/plugin_host branch from 3418f7e to 3f03d4f Compare April 6, 2026 17:48
Copilot AI review requested due to automatic review settings April 6, 2026 18:16
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 9 comments.

Comments suppressed due to low confidence (1)

src/windows/service/exe/LxssUserSession.cpp:3653

  • CreateLinuxProcess now only takes m_callbackLock but calls _RunningInstance(Distro) (which is annotated _Requires_lock_held_(m_instanceLock) and touches m_lockedDistributions). This violates the locking contract and can race with instance state changes or deadlock if someone later adds the required lock. Consider adding a callback-safe lookup that is guarded by m_callbackLock only (and does not call _EnsureNotLocked), or refactoring _RunningInstance / the guarded annotations so callback code never needs m_instanceLock.
    if (Distro == nullptr)
    {
        *Socket = m_utilityVm->CreateRootNamespaceProcess(Path, Arguments).release();
    }
    else
    {
        const auto distro = _RunningInstance(Distro);
        THROW_HR_IF(WSL_E_VM_MODE_INVALID_STATE, !distro);

        const auto wsl2Distro = dynamic_cast<WslCoreInstance*>(distro.get());
        THROW_HR_IF(WSL_E_WSL2_NEEDED, !wsl2Distro);

@benhillis
Copy link
Copy Markdown
Member Author

Hey @benhillis 👋 — Following up on this PR. It currently has merge conflicts that need to be resolved, and the CI build didn't run (shows action_required). There are also 20 unresolved review threads, including some security-relevant findings (TOCTOU on DLL signature validation, PROCESS_ALL_ACCESS over-privilege, potential UAF on g_pluginHost). Is this change still actively being worked on? If so, could you rebase to resolve the conflicts and address the outstanding review feedback?

Copilot AI review requested due to automatic review settings April 7, 2026 04:30
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 5 comments.

@benhillis benhillis force-pushed the user/benhill/plugin_host branch from f32e629 to 5641289 Compare April 7, 2026 04:48
Copilot AI review requested due to automatic review settings April 7, 2026 05:00
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 4 comments.

@benhillis
Copy link
Copy Markdown
Member Author

CI is all green now and conflicts are resolved. Would love to get some eyes on this when someone has a chance — the main design change is moving plugin loading out of wslservice into separate COM hosts so a bad plugin can't take down the service.

Copilot AI review requested due to automatic review settings April 10, 2026 04:07
@benhillis benhillis force-pushed the user/benhill/plugin_host branch from 3492d6b to 57501d9 Compare April 10, 2026 04:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 8 comments.

Comment on lines +114 to 117

wil::unique_socket sock;
auto result = session->CreateLinuxProcess(DistributionId, Path, args.data(), &sock);

Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PluginHostCallbackImpl::ExecuteBinaryInDistribution has the same issue as ExecuteBinary: &sock is a wil::unique_socket* but CreateLinuxProcess expects a SOCKET* out parameter. Adjust the call so the out parameter type matches and the socket is correctly owned/closed on failure.

Copilot uses AI. Check for mistakes.
Comment on lines +268 to +272
std::vector<BYTE> PluginManager::SerializeSid(PSID Sid)
{
const DWORD sidLength = GetLengthSid(Sid);
std::vector<BYTE> buffer(sidLength);
CopySid(sidLength, buffer.data(), Sid);
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SerializeSid ignores failures/invalid SIDs: GetLengthSid can return 0 and CopySid returns BOOL, but the return value is unchecked. Please validate the SID (or at least THROW_IF_WIN32_BOOL_FALSE(CopySid(...))) so we don’t serialize an empty/invalid SID and later reinterpret it as a PSID in the host process.

Copilot uses AI. Check for mistakes.
Comment on lines +807 to 810
/// Reads guarded by m_instanceLock OR m_callbackLock (shared).
/// Mutations require BOTH m_instanceLock AND m_callbackLock (exclusive).
/// </summary>
_Guarded_by_(m_instanceLock) std::map<GUID, std::shared_ptr<LxssRunningInstance>, wsl::windows::common::helpers::GuidLess> m_runningInstances;
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says reads of m_runningInstances are guarded by m_instanceLock OR m_callbackLock, but the field is still annotated Guarded_by(m_instanceLock). With callbacks now intentionally avoiding m_instanceLock, this is inconsistent and will cause static analysis/maintenance confusion. Consider updating annotations (or adding separate helper APIs that explicitly require/hold m_callbackLock) to match the new locking model.

Suggested change
/// Reads guarded by m_instanceLock OR m_callbackLock (shared).
/// Mutations require BOTH m_instanceLock AND m_callbackLock (exclusive).
/// </summary>
_Guarded_by_(m_instanceLock) std::map<GUID, std::shared_ptr<LxssRunningInstance>, wsl::windows::common::helpers::GuidLess> m_runningInstances;
/// Reads require either m_instanceLock or m_callbackLock (shared).
/// Mutations require both m_instanceLock and m_callbackLock (exclusive).
/// This uses a dual-lock access model and therefore is not annotated with a
/// single _Guarded_by_ lock.
/// </summary>
std::map<GUID, std::shared_ptr<LxssRunningInstance>, wsl::windows::common::helpers::GuidLess> m_runningInstances;

Copilot uses AI. Check for mistakes.
Comment on lines +27 to +32
PluginHost* wsl::windows::pluginhost::g_pluginHost = nullptr;

// Thread ID of the thread currently dispatching a plugin hook.
// Only that thread may call PluginError. Using thread ID instead of
// thread_local to avoid TLS initialization issues across DLL/EXE boundaries.
static std::atomic<DWORD> g_hookThreadId{0};
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

g_pluginHost is a process-wide raw pointer that’s read by API stubs from arbitrary plugin threads and written/reset during Initialize/destruction. Without atomic synchronization this is a data race and can lead to use-after-free when the host is shutting down. Consider making g_pluginHost an atomic pointer (or otherwise synchronizing access) and ensuring stub methods can’t observe a partially torn-down instance.

Copilot uses AI. Check for mistakes.
Comment on lines 353 to 361
@@ -354,11 +354,10 @@ class PluginTests
{
WSL1_TEST_ONLY();

constexpr auto ExpectedOutput = LR"(Plugin loaded. TestMode=1)";

// Plugins are not loaded for WSL1-only sessions (no VM, no plugin hooks).
// Verify that WSL1 works without plugins.
ConfigurePlugin(PluginTestType::Success);
StartWsl(0);
ValidateLogFile(ExpectedOutput);
}
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SuccessWSL1 no longer asserts anything about plugin behavior. Since ConfigurePlugin sets up logging, it would be good to assert that the plugin log file is absent/empty (and does not contain "Plugin loaded") to actually verify that plugins are not loaded for WSL1-only sessions.

Copilot uses AI. Check for mistakes.
// Plugins are not loaded for WSL1-only sessions, so a plugin that
// would fail to load on WSL2 has no effect on WSL1.
ConfigurePlugin(PluginTestType::FailToLoad);
StartWsl(0);
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LoadFailureNonFatalWSL1 similarly no longer validates that the plugin wasn’t loaded (or that no plugin failure was recorded). Add an assertion that the log file is absent/empty to ensure WSL1 remains unaffected by a plugin that would fail on WSL2.

Suggested change
StartWsl(0);
StartWsl(0);
VERIFY_IS_TRUE(
!std::filesystem::exists(logFile) || (std::filesystem::file_size(logFile) == 0),
std::format("Expected plugin log file '{}' to be absent or empty for WSL1", logFile).c_str());

Copilot uses AI. Check for mistakes.
Comment on lines 3637 to 3644
HRESULT LxssUserSessionImpl::CreateLinuxProcess(_In_opt_ const GUID* Distro, _In_ LPCSTR Path, _In_ LPCSTR* Arguments, _Out_ SOCKET* Socket)
{
std::lock_guard lock(m_instanceLock);
// Shared lock prevents _VmTerminate from destroying the VM or instances
// while we use them. See MountRootNamespaceFolder for rationale.
std::shared_lock lock(m_callbackLock);
RETURN_HR_IF(E_NOT_VALID_STATE, !m_utilityVm);

if (Distro == nullptr)
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CreateLinuxProcess now intentionally avoids m_instanceLock and only takes m_callbackLock (shared). However, the implementation later calls _RunningInstance (which is annotated Requires_lock_held(m_instanceLock) and reads m_runningInstances). Please align the helper/annotations with the new locking model (e.g., require/hold m_callbackLock for _RunningInstance) to avoid lock-contract violations and potential races.

Copilot uses AI. Check for mistakes.
@benhillis benhillis marked this pull request as ready for review April 10, 2026 15:29
@benhillis benhillis requested a review from a team as a code owner April 10, 2026 15:29
Ben Hillis added 2 commits April 10, 2026 16:03
Plugin DLLs are now loaded in isolated wslpluginhost.exe processes
instead of directly in wslservice.exe via LoadLibrary. This prevents
a buggy or malicious plugin from crashing the WSL service.

Architecture:
- New IWslPluginHost/IWslPluginHostCallback COM interfaces
  (WslPluginHost.idl) for cross-process plugin lifecycle management
- New wslpluginhost.exe: COM local server (REGCLS_SINGLEUSE), one
  per plugin, loads the plugin DLL and dispatches notifications
- Refactored PluginManager: CoCreateInstance replaces LoadLibrary,
  PluginError returned via [out] parameter, crash recovery via
  IsHostCrash() detecting RPC_E_DISCONNECTED/SERVER_DIED

Callback safety:
- Plugin callbacks (MountFolder, ExecuteBinary) arrive on a different
  COM RPC thread and use std::shared_lock(m_callbackLock) instead of
  m_instanceLock to avoid re-entrancy deadlocks
- _VmTerminate takes exclusive m_callbackLock before destroying the
  VM, blocking until in-flight callbacks complete
- Lock ordering: m_instanceLock -> m_callbackLock (never reverse)
- All writes to m_runningInstances take m_callbackLock exclusive to
  prevent data races with concurrent callback reads

Security:
- COM AppID with SYSTEM-only launch/access permissions
- Plugin signature validation (ValidateFileSignature) keeps the file
  handle open until after LoadLibrary to prevent TOCTOU attacks
- Plugin host processes use minimal access rights for handles

Process lifecycle:
- Plugin hosts added to a job object with KILL_ON_JOB_CLOSE for
  automatic cleanup if wslservice exits
- g_pluginHost is process-wide (REGCLS_SINGLEUSE guarantees one
  plugin per process), nulled on destruction to prevent UAF
- std::call_once for thread-safe initialization and job creation

Packaging:
- WslPluginHost.idl compiled into existing wslserviceproxystub.dll
- MSI: COM class/interface registration, AppID security, proxy/stub
- wslpluginhost.exe added to build/signing pipeline, WER crash dump
  list, LSP registration, and test validation

Plugins are not loaded for WSL1-only sessions since all plugin hooks
require a WSL2 VM. WslPluginApi.h is unchanged - existing plugin DLLs
work unmodified.
@benhillis benhillis force-pushed the user/benhill/plugin_host branch from 57501d9 to ed98200 Compare April 10, 2026 23:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants