[Feature] Mac native backend support with vLLM-Metal, MLX, and llama.cpp by ricky-chaoju · Pull Request #8 · Infinirc/lmstack

ricky-chaoju · 2026-02-02T04:49:15Z

Summary

Add Apple Silicon GPU detection for Mac workers
Add vLLM-Metal, MLX, and llama.cpp native backend support
Auto-install dependencies (no manual pip/brew required)
Auto-convert HuggingFace models to MLX/GGUF formats
Download pre-quantized GGUF models directly from HuggingFace
Add deployment progress logs visible in web UI
Add official MLX logo and fix dropdown alignment

- Add official MLX logo (light/dark variants)
- Fix backend dropdown logo alignment with centered icons
- Fix model dropdown tag alignment with fixed width

Copilot

Pull request overview

This pull request adds comprehensive Mac native backend support for LMStack, enabling Apple Silicon GPU acceleration through vLLM-Metal, MLX, and llama.cpp backends. The implementation includes automatic dependency installation, model format conversion, and deployment progress tracking.

Changes:

Added native Mac backend support for vLLM-Metal, MLX-LM, and llama.cpp with automatic installation
Implemented automatic model conversion from HuggingFace to MLX/GGUF formats with caching
Added deployment progress logging visible in the web UI for native deployments
Enhanced Apple Silicon GPU detection and worker capability reporting
Updated frontend UI to support new backends with format compatibility indicators

Reviewed changes

Copilot reviewed 20 out of 22 changed files in this pull request and generated 20 comments.

Show a summary per file

File	Description
worker/routes/converter.py	New API routes for model format conversion (MLX/GGUF)
worker/routes/__init__.py	Added converter router to available routes
worker/native_ops/process_manager.py	Extended process manager with vLLM-Metal support, auto-install for MLX/llama.cpp, and log file management
worker/native_ops/converter.py	New model converter module handling HuggingFace to MLX/GGUF conversions with caching
worker/native_ops/__init__.py	Exported ModelConverter class
worker/docker_ops/gpu.py	Added Apple Silicon GPU detection via system_profiler
worker/agent.py	Integrated converter routes and auto-start Ollama with external access
frontend/src/pages/Deployments.tsx	Updated UI to show MLX/llama.cpp/vLLM-Metal backends for Mac workers with format indicators
frontend/src/components/logos/index.tsx	Added official MLX logo and improved llama.cpp branding
frontend/src/components/ModelFormatCompatibility.tsx	New component showing format compatibility and conversion warnings
frontend/src/components/ModelCompatibilityCheck.tsx	Extended to support new backend types
frontend/src/components/HuggingFaceModelPicker.tsx	Added format filtering (All/MLX Ready/GGUF Ready)
frontend/src/components/DeploymentAdvancedForm.tsx	Excluded MLX/llama.cpp from advanced settings
frontend/src/assets/mlx-logo-dark.png	Added MLX logo asset
frontend/src/api/index.ts	Exported ModelFormatInfo type
frontend/src/api/huggingface.ts	Added format info and MLX/GGUF search endpoints
backend/app/services/local_worker.py	Auto-start Ollama with external access on Mac before Docker worker
backend/app/services/deployer/service.py	Updated to use native deployment for vLLM on Mac (vLLM-Metal)
backend/app/services/deployer/native.py	Added conversion detection and early container_id assignment for log streaming
backend/app/models/worker.py	Updated to always show vLLM/MLX/llama.cpp as available on Mac (auto-installable)
backend/app/api/huggingface.py	Added format info, MLX search, and GGUF search endpoints

Comments suppressed due to low confidence (1)

backend/app/services/local_worker.py:319

This import of module os is redundant, as it was previously imported on line 7.

    import os


Copilot · 2026-02-02T04:54:20Z

+    def _write_log(self, process_id: str, message: str) -> None:
+        """Write a message to a process's log file."""
+        log_file = self._log_dir / f"{process_id}.log"
+        with open(log_file, "a") as f:
+            from datetime import datetime
+
+            timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+            f.write(f"[{timestamp}] {message}\n")
+            f.flush()


The _write_log method opens the log file in append mode without any size limits or rotation. For long-running deployments with verbose output, this could lead to unbounded log file growth. Consider implementing log rotation or size limits to prevent disk space issues.
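One way to bound log growth is the standard library's `logging.handlers.RotatingFileHandler`. The sketch below is illustrative only: the helper name `make_process_logger`, the logger namespace, and the size limits are assumptions, not part of this PR.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def make_process_logger(
    log_dir: Path,
    process_id: str,
    max_bytes: int = 5 * 1024 * 1024,  # assumed cap per log file
    backup_count: int = 3,             # assumed number of rotated files kept
) -> logging.Logger:
    """Return a logger that appends to <log_dir>/<process_id>.log with rotation."""
    logger = logging.getLogger(f"native_process.{process_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep process output out of the root logger
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        log_dir.mkdir(parents=True, exist_ok=True)
        handler = RotatingFileHandler(
            log_dir / f"{process_id}.log",
            maxBytes=max_bytes,
            backupCount=backup_count,
            encoding="utf-8",
        )
        # Same "[timestamp] message" layout that _write_log produces
        handler.setFormatter(
            logging.Formatter("[%(asctime)s] %(message)s", datefmt="%Y-%m-%d %H:%M:%S")
        )
        logger.addHandler(handler)
    return logger
```

With these defaults a single process would keep at most four files (the active log plus three backups), so disk usage per deployment stays bounded at roughly `max_bytes * (backup_count + 1)`.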

Copilot · 2026-02-02T04:54:21Z

+            stderr = await create_venv.stderr.read()
+            raise RuntimeError(f"Failed to create virtual environment: {stderr.decode()}")


The stderr output is read after the subprocess completes, but create_venv.stderr is None unless the subprocess was created with stderr=asyncio.subprocess.PIPE, and calling .read() on None raises an AttributeError. Add a None check before reading stderr to handle this edge case gracefully.

Suggested change

-            stderr = await create_venv.stderr.read()
-            raise RuntimeError(f"Failed to create virtual environment: {stderr.decode()}")
+            stderr_data = b""
+            if create_venv.stderr is not None:
+                stderr_data = await create_venv.stderr.read()
+            error_msg = stderr_data.decode() if stderr_data else "Unknown error"
+            raise RuntimeError(f"Failed to create virtual environment: {error_msg}")
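An alternative to the None check is to request the pipe explicitly and let `communicate()` collect the output. This is a minimal sketch, not the PR's code: `run_checked` is a hypothetical helper name, and how `create_venv` is actually spawned is not shown in this hunk.

```python
import asyncio


async def run_checked(*cmd: str) -> None:
    """Run a command and raise RuntimeError carrying its stderr on failure."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,  # without this, proc.stderr is None
    )
    # communicate() drains both pipes and waits for exit, so a chatty child
    # cannot block on a full pipe buffer.
    _, stderr_data = await proc.communicate()
    if proc.returncode != 0:
        error_msg = stderr_data.decode(errors="replace").strip() or "Unknown error"
        raise RuntimeError(f"Command failed ({proc.returncode}): {error_msg}")
```

A call such as `asyncio.run(run_checked(sys.executable, "-m", "venv", target_dir))` would then surface the child's error text in one place; `target_dir` here is a placeholder.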

Copilot · 2026-02-02T04:54:21Z

+def ensure_ollama_running_on_host(host: str = "0.0.0.0", port: int = OLLAMA_DEFAULT_PORT) -> bool:
+    """Ensure Ollama is running on the host with external access enabled.
+
+    This is called BEFORE starting Docker worker so that the container
+    can access Ollama on the host via localhost (with --network host).
+
+    Args:
+        host: Host to bind to (default 0.0.0.0 for external access)
+        port: Port to bind to (default 11434)
+
+    Returns:
+        True if Ollama is running and accessible
+    """
+    # Only run on macOS
+    if platform.system() != "Darwin":
+        return True  # Not needed on Linux (Docker can use GPU directly)
+
+    # Check if Ollama is installed
+    ollama_path = shutil.which("ollama")
+    if not ollama_path:
+        logger.info("Ollama is not installed on this Mac")
+        return False
+
+    # Check if Ollama is already running
+    try:
+        import httpx
+
+        with httpx.Client(timeout=2.0) as client:
+            response = client.get(f"http://localhost:{port}/api/tags")
+            if response.status_code == 200:
+                logger.info("Ollama service is already running")
+                return True
+    except Exception:
+        pass
+
+    # Ollama not running, start it with external access
+    logger.info(f"Starting Ollama service on {host}:{port}")
+
+    env = os.environ.copy()
+    env["OLLAMA_HOST"] = f"{host}:{port}"
+
+    try:
+        # Start ollama serve in background
+        process = subprocess.Popen(
+            [ollama_path, "serve"],
+            stdout=subprocess.PIPE,
+            stderr=subprocess.STDOUT,
+            env=env,
+            start_new_session=True,
+        )
+        logger.info(f"Started Ollama service (PID {process.pid})")
+
+        # Wait for Ollama to be ready (up to 30 seconds)
+        import httpx
+
+        for _ in range(30):
+            time.sleep(1)
+            try:
+                with httpx.Client(timeout=2.0) as client:
+                    response = client.get(f"http://localhost:{port}/api/tags")
+                    if response.status_code == 200:
+                        logger.info("Ollama service is ready")
+                        return True
+            except Exception:
+                pass
+
+        logger.error("Ollama service failed to start in time")
+        return False
+
+    except Exception as e:
+        logger.error(f"Failed to start Ollama service: {e}")
+        return False


The ensure_ollama_running_on_host function in the backend service is nearly identical to the ensure_ollama_running method in the worker's process manager. This code duplication means any bug fixes or improvements need to be applied in two places. Consider refactoring this into a shared utility module that both the backend and worker can use.
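The readiness loop is the part most easily shared between the backend and the worker. A minimal sketch of such a utility follows; the name `wait_until_ready` and its parameters are hypothetical, and unlike the code above it probes before the first sleep rather than after it.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 30,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() up to `attempts` times; return True as soon as it succeeds."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:  # no point sleeping after the final attempt
            sleep(interval)
    return False
```

Both call sites could pass their own probe (for example, a function that requests `/api/tags` and checks for a 200 response), which keeps the HTTP client choice out of the shared module. The injectable `sleep` exists only to make the loop testable without real delays.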

Copilot · 2026-02-02T04:54:21Z

+    @staticmethod
+    def find_mlx_variant(hf_model_id: str) -> Optional[str]:
+        """Find MLX variant of a HuggingFace model.
+
+        Searches mlx-community for a converted version of the model.
+
+        Args:
+            hf_model_id: Original HuggingFace model ID
+
+        Returns:
+            MLX model ID if found, None otherwise
+        """
+        # Try common naming patterns
+        model_name = hf_model_id.split("/")[-1]
+        patterns = [
+            f"mlx-community/{model_name}",
+            f"mlx-community/{model_name}-mlx",
+            f"mlx-community/{model_name}-4bit",
+            f"mlx-community/{model_name}-8bit",
+        ]
+        return patterns[0] if patterns else None


The find_mlx_variant static method returns only the first pattern if patterns list is not empty, otherwise None. However, the logic doesn't actually check if these variant models exist on HuggingFace - it just constructs potential model IDs. The caller should verify the existence of these models before using them. Consider documenting this behavior or adding actual existence checks.
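A sketch of what an existence check could look like is shown below. The `repo_exists` parameter is a hypothetical injected callable, not something in the PR; in practice it might wrap a Hub lookup such as `huggingface_hub.HfApi().repo_exists`, assuming the installed version provides that method.

```python
from typing import Callable, Optional


def find_mlx_variant(
    hf_model_id: str,
    repo_exists: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate mlx-community repo ID that repo_exists confirms."""
    model_name = hf_model_id.split("/")[-1]
    candidates = [
        f"mlx-community/{model_name}",
        f"mlx-community/{model_name}-mlx",
        f"mlx-community/{model_name}-4bit",
        f"mlx-community/{model_name}-8bit",
    ]
    for candidate in candidates:
        if repo_exists(candidate):
            return candidate
    return None  # no candidate was confirmed; caller falls back to conversion
```

Injecting the checker keeps the function free of network calls in tests and makes the "nothing found" case an explicit `None` instead of an unverified guess.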

Copilot · 2026-02-02T04:54:21Z

+                        f"Download failed: {stdout.decode() if stdout else 'Unknown error'}"
+                    )
+                model_dir = self.cache_dir / "downloads" / hf_model_id.replace("/", "--")
+


The GGUF conversion process downloads the model using huggingface-cli but doesn't validate that the download was successful before attempting conversion. The code checks the return code of the download process (line 370), but if the model directory already exists (line 356), it skips the download and uses the existing directory. This could lead to conversion failures if the existing directory is incomplete or corrupted. Consider validating that the necessary model files exist before proceeding with conversion.

Suggested change

+            # Validate that the model directory contains the necessary files
+            config_file = model_dir / "config.json"
+            has_safetensors = any(model_dir.glob("*.safetensors"))
+            has_bin_weights = any(model_dir.glob("*.bin"))
+            if (
+                not model_dir.exists()
+                or not model_dir.is_dir()
+                or not config_file.exists()
+                or not (has_safetensors or has_bin_weights)
+            ):
+                raise RuntimeError(
+                    f"Model directory '{model_dir}' is missing required files. "
+                    "Please clear the Hugging Face cache or downloads and retry."
+                )

Copilot · 2026-02-02T04:54:24Z

-            raise RuntimeError(
-                "llama-server not found. " "Please install llama.cpp: brew install llama.cpp"
-            )
+        effective_model_path = model_id


This assignment to 'effective_model_path' is unnecessary as it is redefined before this value is used.

Copilot · 2026-02-02T04:54:25Z

+            if response.status_code == 200:
+                logger.info("Ollama service is already running")
+                return True
+    except Exception:


'except' clause does nothing but pass and there is no explanatory comment.

Copilot · 2026-02-02T04:54:25Z

+                    if response.status_code == 200:
+                        logger.info("Ollama service is ready")
+                        return True
+            except Exception:


'except' clause does nothing but pass and there is no explanatory comment.

Copilot · 2026-02-02T04:54:25Z

+                if response.status_code == 200:
+                    logger.info("Ollama service is already running")
+                    return True
+        except Exception:


'except' clause does nothing but pass and there is no explanatory comment.

Copilot · 2026-02-02T04:54:25Z

+                        if response.status_code == 200:
+                            logger.info("Ollama service is ready")
+                            return True
+                except Exception:


'except' clause does nothing but pass and there is no explanatory comment.
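The four empty `except` clauses flagged above all guard the same kind of probe. One possible shape for it, sketched with the standard library's `urllib` instead of `httpx` so the example is self-contained; `is_service_up` is a hypothetical name, and the choice of caught exception types is an assumption about which failures are expected here.

```python
import http.client
import logging
import urllib.request

logger = logging.getLogger(__name__)


def is_service_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException) as exc:
        # Expected while the service is down or still starting.
        # urllib.error.URLError, HTTPError and socket timeouts are all
        # OSError subclasses, so unrelated bugs still propagate.
        logger.debug("Probe of %s failed: %s", url, exc)
        return False
```

Catching a narrow set of exceptions and logging at debug level documents why the failure is ignored, which is what the review comments ask for.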

ricky-chaoju added 7 commits February 2, 2026 09:46

feat: add Apple Silicon GPU detection for Mac workers

05401b2

feat: add MLX and llama.cpp integration with auto model conversion

4c3caf0

feat: add vLLM-Metal support and auto-install for Mac backends

5b0dc3f

fix: add log file support for native Mac backends

9ba3215

fix: download GGUF models directly instead of converting

e692d90

feat: add deployment progress logs for web display

42783dd

feat: add official MLX logo and fix dropdown alignment

03684a0

- Add official MLX logo (light/dark variants)
- Fix backend dropdown logo alignment with centered icons
- Fix model dropdown tag alignment with fixed width

ricky-chaoju requested a review from Copilot February 2, 2026 04:49

Copilot started reviewing on behalf of ricky-chaoju February 2, 2026 04:49

Copilot AI reviewed Feb 2, 2026

ricky-chaoju merged commit 5541d42 into main Feb 2, 2026
10 checks passed

ricky-chaoju merged 7 commits into main from dev
