This document describes the current state of the application — a .NET MAUI app that performs real-time face landmark detection on a live camera preview across Android, iOS, and Windows, using platform-specific MediaPipe bindings.
Live camera (SkiaCamera) → RGBA frame → IFaceLandmarkDetector.EnqueuePreviewDetection() → async callback → FaceLandmarkResult → AppCamera overlay draws landmarks/rectangles/masks on SkiaSharp canvas
- DrawnUI framework: the entire UI is rendered via DrawnUI (`DrawnUi.Maui.Camera` NuGet). The camera preview, overlays, and FPS counter are all SkiaSharp-based DrawnUI controls, not standard MAUI views.
- Shared interface `IFaceLandmarkDetector` with platform-specific implementations.
- Event-driven detection: the interface uses `EnqueuePreviewDetection(byte[] rgbaBytes, PreviewDetectionRequest)` with `PreviewDetectionCompleted` / `PreviewDetectionFailed` events, not a `Task<T>` return.
- Platform files in `Platforms/{Platform}/` folders (compiled only for their platform).
- DI registration in `MauiProgram.cs` via `#if` platform conditionals.
- Landmark visualization via `AppCamera` (extends `SkiaCamera`), which internally uses a `LandmarkDrawable` for rendering overlays on the SkiaSharp canvas.
- No resizing to the model's input size is needed in the detectors: MediaPipe handles that internally. For performance, `AppCamera` still downscales the camera preview frames before sending them to the detector.
| Platform | Package | Version |
|---|---|---|
| All | `DrawnUi.Maui.Camera` | 1.9.7.6 |
| Android | `AppoMobi.Preview.MediaPipeTasksVision.Android` | 0.10.33-preview.14 |
| iOS | `MediaPipeTasksVision.iOS` | 0.10.21 |
| Windows | `Mediapipe.Net` | 0.9.2 |
| Windows | `Mediapipe.Net.Runtime.CPU` | 0.9.1 |

Also: `AndroidStoreUncompressedFileExtensions` is set to `.task` so the model isn't compressed in the APK.
- Android/iOS: `Resources/Raw/face_landmarker.task` (modern MediaPipe Tasks bundle)
- Windows: `Resources/Raw/face_detection_short_range.tflite`, `face_landmark.tflite`, `face_landmark_front_cpu.pbtxt` (legacy MediaPipe graph + models)
- Overlay images: `Resources/Raw/mask_spiderman.png`, `Resources/Raw/hat_cake.png` (shared across all platforms)
See Includes.md for the full asset manifest and per-platform build conditions.
Event-driven interface for live preview detection:
- `EnqueuePreviewDetection(byte[] rgbaBytes, PreviewDetectionRequest request)`: accepts raw RGBA bytes from the camera.
- `PreviewDetectionCompleted` / `PreviewDetectionFailed` events for async results.
- Configurable properties: `MaxFaces`, `MinFaceDetectionConfidence`, `MinFacePresenceConfidence`, `MinTrackingConfidence`.
Shared result model:
- `FaceLandmarkResult` containing `Faces` (list of `DetectedFace`), `ImageWidth`, `ImageHeight`.
- Performance metrics: `ConversionMilliseconds`, `InferenceMilliseconds`, `ResultMappingMilliseconds`, `UsedGpuDelegate`.
- `DetectionType` enum: `Disabled`, `Landmark`, `Rectangle`, `Mask`.
- `MaskConfiguration` class for configuring overlay images (filename, position, width multiplier, Y offset).
- Each `DetectedFace` contains `Landmarks` as normalized `(X, Y)` points.
Global static settings:
- `TryUseGpu`: when true, Android/iOS attempt the GPU delegate (with CPU fallback). Windows is always CPU-only.
- `InitialDetectionType`: the overlay mode shown at startup.

`MainPage`:
- Uses DrawnUI's `AppCanvas` (SkiaSharp-based canvas) as the rendering surface.
- Contains an `AppCamera` control (extends `SkiaCamera`) for live camera preview with face detection overlay.
- Top bar contains a Debug button (toggles Android fast/slow JNI API) and a `Picker` with four modes:
  - `Landmark`: green dots at each landmark point
  - `Rectangle`: bounding box around each face
  - `Mask (Spider-Man)`: Spider-Man mask overlay anchored to face landmarks
  - `Hat (Cake)`: cake hat overlay positioned above the forehead
- `StatusLabel` displays face count and detection benchmark timing.
- `SkiaLabelFps` shows a real-time FPS counter.
- Constructor-injects `IFaceLandmarkDetector`; includes a parameterless constructor fallback for Shell/DataTemplate scenarios.

`AppCamera`:
- Manages the camera-to-detector pipeline: captures preview frames, downscales to a configurable ML dimension, and calls `EnqueuePreviewDetection`.
- Draws the face overlay (landmarks, rectangles, or masks) directly on the SkiaSharp canvas during rendering.
- Implements overlay smoothing (interpolation toward latest landmarks) and a per-landmark deadzone to suppress jitter.
- Exposes events: `PreviewDetectionMeasured`, `PreviewDetectionUpdated`, `PreviewDetectionFailed`.
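
The smoothing and deadzone behaviour described for `AppCamera` can be illustrated with a minimal sketch. It is written in Python for brevity and is not the app's C# code; the function name `smooth_landmarks` and the `alpha` and `deadzone` values are hypothetical, chosen only to show the idea on normalized `(X, Y)` points.

```python
import math

def smooth_landmarks(previous, latest, alpha=0.5, deadzone=0.002):
    """Return displayed landmarks moved toward the latest detection.

    previous, latest: lists of normalized (x, y) tuples.
    alpha: interpolation factor in (0, 1]; 1.0 snaps straight to `latest`.
    deadzone: movements shorter than this (in normalized units) are ignored.
    All parameter values here are illustrative assumptions.
    """
    if len(previous) != len(latest):
        # Landmark count changed (e.g. first frame): nothing to interpolate from.
        return list(latest)

    smoothed = []
    for (px, py), (lx, ly) in zip(previous, latest):
        if math.hypot(lx - px, ly - py) < deadzone:
            # Inside the deadzone: keep the old point to suppress jitter.
            smoothed.append((px, py))
        else:
            # Outside the deadzone: step a fraction of the way to the new point.
            smoothed.append((px + (lx - px) * alpha, py + (ly - py) * alpha))
    return smoothed
```

A smaller `alpha` gives a steadier overlay at the cost of visible lag behind fast head movement, while a larger `deadzone` hides more jitter but can make small motions look stepped.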

`AppCanvas`:
- DrawnUI canvas with XAML hot-reload support. Manages the singleton instance lifecycle to prevent leaks during hot reload.

`LandmarkDrawable`:
- `IDrawable` that renders face overlays using MAUI Graphics:
  - `Landmark` mode: green dots at each landmark position.
  - `Rectangle` mode: bounding box computed from min/max of all landmarks.
  - `Mask` mode: draws a mask/hat image anchored to specific landmark points (nose tip, forehead, chin), scaled to face width, rotated to match face tilt.
- Uses AspectFit math to map normalized landmark coordinates to view bounds.
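
As a rough illustration of the AspectFit mapping and the min/max bounding box mentioned above, the following Python sketch shows the arithmetic. It is not taken from the app's C# source; the helper names `aspect_fit_rect`, `to_view`, and `bounding_box` are hypothetical.

```python
def aspect_fit_rect(image_w, image_h, view_w, view_h):
    """Return (left, top, width, height) of the image drawn AspectFit in the view."""
    # AspectFit uses the smaller scale so the whole image stays visible.
    scale = min(view_w / image_w, view_h / image_h)
    draw_w, draw_h = image_w * scale, image_h * scale
    # Center the scaled image; the leftover space becomes letterbox bars.
    return ((view_w - draw_w) / 2, (view_h - draw_h) / 2, draw_w, draw_h)

def to_view(point, fit_rect):
    """Map one normalized (x, y) landmark into view coordinates."""
    x, y = point
    left, top, width, height = fit_rect
    return (left + x * width, top + y * height)

def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) over all points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))
```

For example, a 640x480 frame shown in a 1000x1000 view is drawn at 1000x750 with 125-pixel bars above and below, so the normalized point `(0.5, 0.5)` lands at the view center `(500, 500)`.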
MauiProgram.cs:
- Registers DrawnUI via `builder.UseDrawnUi(...)` with a portrait desktop window configuration.
- Platform-specific `IFaceLandmarkDetector` registered as singleton via `#if` conditionals:
  - Android: `Platforms.Droid.FaceLandmarkDetector`
  - iOS: `Platforms.iOS.FaceLandmarkDetector`
  - MacCatalyst: `Platforms.MacCatalyst.FaceLandmarkDetector`
  - Windows: `Platforms.Windows.FaceLandmarkDetector`
- `MainPage` registered as transient.

Android (`Platforms.Droid.FaceLandmarkDetector`):
- Uses `AppoMobi.Preview.MediaPipeTasksVision.Android`.
- Loads the `.task` model via `SetModelAssetPath("face_landmarker.task")`.
- Running mode: `LiveStream`, which uses `DetectAsync(mpImage, timestampMs)` with a `ResultListener` callback and `ErrorListener`.
- Converts incoming RGBA bytes to an Android `Bitmap` (reusing a pooled bitmap), then wraps it in an `MPImage` via `BitmapImageBuilder`.
- GPU delegate attempted first; falls back to CPU. Controlled by `DetectionSettings.TryUseGpu`.
- Lazily constructs the `FaceLandmarker` via `GetLandmarker()` (recreated when settings like `MaxFaces` change).
- Two result-mapping paths: a fast bulk JNI accessor (`FaceLandmarksXY`) and a slow per-landmark wrapper path, toggled via the `UseFastApi` static flag (exposed via the Debug button in the UI).
- Reports `ConversionMilliseconds`, `InferenceMilliseconds`, `ResultMappingMilliseconds`, and `UsedGpuDelegate` in results.

iOS (`Platforms.iOS.FaceLandmarkDetector`):
- Uses `MediaPipeTasksVision.iOS`.
- Loads the `.task` model from the app bundle via `NSBundle.MainBundle.PathForResource("face_landmarker", "task")`.
- Running mode: `Video`, which runs `DetectVideoFrame(mpImage, timestampMs)` on a background task so inference stays off the camera callback thread.
- Converts RGBA bytes to `CGImage` → `UIImage` → `MPPImage`, and maps the camera rotation into an explicit `MPPImage` orientation before inference.
- GPU delegate attempted first via `MPPDelegate.Gpu`; falls back to CPU automatically if GPU initialization fails. Controlled by `DetectionSettings.TryUseGpu`.
- Lazily constructs the `MPPFaceLandmarker` via `GetLandmarker()`.

Mac Catalyst (`Platforms.MacCatalyst.FaceLandmarkDetector`):
- Stub that raises `PlatformNotSupportedException` via the `PreviewDetectionFailed` event.
- Also retains a legacy `DetectAsync(Stream)` method (throws `PlatformNotSupportedException`).

Windows (`Platforms.Windows.FaceLandmarkDetector`):
- Uses `Mediapipe.Net` (0.9.2) + `Mediapipe.Net.Runtime.CPU` (0.9.1).
- Loads the legacy `.tflite` models and `.pbtxt` graph config from MAUI raw assets.
- Model loading: Reads the two `.tflite` files via MAUI's `FileSystem.OpenAppPackageFileAsync`, extracts them to disk under `mediapipe-task/face_landmarker/mediapipe/modules/...` so the native C++ graph can find them at the expected relative paths. Uses a `CurrentDirectoryScope` to temporarily set the working directory.
- Live preview: Uses a persistent `LiveGraphSession` that keeps a `CalculatorGraph` running across frames, feeding RGBA→RGB-converted `ImageFrame` packets and polling `face_landmarks` output.
- One-shot detection: Also retains a `DetectAsync(Stream)` method that creates a fresh graph per call, decoding via WinRT `BitmapDecoder` (BGRA8→RGB conversion).
- Observes the internal graph node `face_landmarks` (`NormalizedLandmarkList`) rather than the public vector output `multi_face_landmarks`, bypassing the missing custom `Vector` envelope in `Mediapipe.Net`.
- C/C++ interop stability: Explicit `Dispose()` scopes and typed unmanaged pointers (e.g., `Timestamp(1L)`) prevent the GC from prematurely destroying wrappers while the unmanaged background thread processes frames, resolving Access Violations (`0xc0000005`).
- Landmark count: `with_attention=false` in graph side packets means Windows generates exactly 468 landmarks per face (no iris tracking). Android/iOS generate up to 478 landmarks including iris.
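
The RGBA→RGB conversion that precedes building an `ImageFrame` amounts to dropping every fourth byte. The Python sketch below shows only that byte layout; it is not the app's C# implementation, the function name `rgba_to_rgb` is hypothetical, and it assumes tightly packed rows with no stride padding.

```python
def rgba_to_rgb(rgba: bytes, width: int, height: int) -> bytes:
    """Drop the alpha channel from a tightly packed RGBA buffer."""
    expected = width * height * 4
    if len(rgba) != expected:
        # Assumption: no row padding, so the size must match exactly.
        raise ValueError(f"expected {expected} bytes, got {len(rgba)}")

    rgb = bytearray(width * height * 3)
    # Copy each colour plane with strided slices; alpha (offset 3) is skipped.
    rgb[0::3] = rgba[0::4]  # red
    rgb[1::3] = rgba[1::4]  # green
    rgb[2::3] = rgba[2::4]  # blue
    return bytes(rgb)
```

A buffer with row padding would need a per-row copy that honours the stride instead of one strided slice over the whole frame.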
- `Platforms/iOS/Info.plist` contains both `NSCameraUsageDescription` and `NSPhotoLibraryUsageDescription`.
- Target frameworks: `net10.0-android`, `net10.0-ios`, `net10.0-maccatalyst`, `net10.0-windows10.0.19041.0`.
- Uses `WindowsPackageType=None` (unpackaged).
- Conditional `MauiAsset` item groups per platform (see Includes.md).
- Conditional NuGet package references per platform.
- `AndroidStoreUncompressedFileExtensions` set to `.task` for Android.
- Android/iOS/Windows:
  - Launch the app; the camera preview should start automatically.
  - Confirm the status label shows detected face count and benchmark timing.
  - Select `Landmark`: confirm green dots overlay the face mesh.
  - Select `Rectangle`: confirm a bounding box overlays each face.
  - Select `Mask (Spider-Man)`: confirm the mask anchors to face tilt and proportions.
  - Select `Hat (Cake)`: confirm the hat sits above the forehead.
  - Test with 0 faces, 1 face, multiple faces.
- Mac Catalyst:
  - Launch and confirm a friendly `PlatformNotSupportedException` message for detection.
The architecture (MediaPipe Tasks on mobile, Mediapipe.Net TFLite graphs on Windows) is a generalized pipeline. By swapping the model file and calling a different MediaPipe API class, entirely distinct computer vision tasks can be performed:
- Hand Landmarking (`hand_landmarker.task`): Detects 21 3D knuckles and joints per hand. Used for sign language translation, gesture controls, or virtual finger-tracking.
- Pose Landmarking (`pose_landmarker.task`): Maps 33 major body joints. Used for fitness apps, motion capture, or fall detection.
- Object Detection (`efficientdet.task`): Draws bounding boxes around objects and identifies them from a trained list.
- Image Segmentation (`image_segmenter.task`): Performs pixel-perfect foreground/background separation (the technology behind Zoom/Teams background blur).
- Image Classification (`classifier.task`): Analyzes the whole image to classify it into categories.
Because the app already manages unmanaged C++ memory pointers on Windows and hooks into the native iOS/Android MediaPipe binaries, adding any of these tasks would mainly involve parsing different output data structures.
Face recognition is doable as a two-stage pipeline:
- Stage 1 (Current Engine): Use the current FaceLandmarkDetector to find the face. The Rectangle bounding box (min/max X and Y from landmarks) crops the face out of the original image.
- Stage 2 (New Model): Feed that cropped face image into a TFLite/ONNX Face Recognition model. This outputs a "Face Embedding" (a mathematical vector, usually 128 to 512 floats).
- Comparison: Compare the saved vector with the newly generated vector using cosine distance (1 - cosine similarity) or Euclidean distance. If the distance is below a threshold, it's a match.
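
The cropping and comparison steps can be sketched as follows. This is an illustrative Python sketch, not part of the app: the names `face_crop_rect`, `cosine_distance`, and `is_match`, the `margin` parameter, and the `0.4` threshold are all hypothetical, and a real threshold depends on the recognition model chosen for Stage 2. Cosine distance is taken here as 1 minus cosine similarity, so a smaller value means a closer match.

```python
import math

def face_crop_rect(landmarks, image_w, image_h, margin=0.1):
    """Pixel crop (left, top, right, bottom) around normalized landmarks."""
    xs = [p[0] for p in landmarks]
    ys = [p[1] for p in landmarks]
    # Pad the min/max box by a fraction of its size (assumed margin).
    pad_x = (max(xs) - min(xs)) * margin
    pad_y = (max(ys) - min(ys)) * margin
    # Convert to pixels and clamp to the image bounds.
    left = max(0, int((min(xs) - pad_x) * image_w))
    top = max(0, int((min(ys) - pad_y) * image_h))
    right = min(image_w, math.ceil((max(xs) + pad_x) * image_w))
    bottom = min(image_h, math.ceil((max(ys) + pad_y) * image_h))
    return left, top, right, bottom

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return 1.0 - dot / (norm_a * norm_b)

def is_match(saved, candidate, threshold=0.4):
    """Placeholder threshold: tune it for the actual embedding model."""
    return cosine_distance(saved, candidate) < threshold
```

The embedding itself would come from the Stage 2 model, which this sketch deliberately leaves out.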