    Repositories list

    • ROM

      Public
      The official implementation of our paper "ROM: Real-time Overthinking Mitigation via Streaming Detection and Intervention"
      Python
      MIT License
      1300 · Updated Apr 11, 2026
    • [CCS 2026] The official implementation of our CCS 2026 paper "ReasoningBomb: A Stealthy Denial-of-Service Attack by Inducing Pathologically Long Reasoning in La…
      Python
      Other
      3900 · Updated Apr 10, 2026
    • AgentDyn

      Public
      The official implementation of the paper "AgentDyn: A Dynamic Open-Ended Benchmark for Evaluating Prompt Injection Attacks of Real-World Agent Security System".
      Python
      MIT License
      14500 · Updated Apr 9, 2026
    • DynAuditClaw — A security audit skill that dynamically discovers your OpenClaw agent's real configuration, designs targeted attack scenarios adapted to your spe…
      Python
      11000 · Updated Apr 6, 2026
    • PRISM

      Public
      PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality
      Python
      MIT License
      1700 · Updated Apr 3, 2026
    • A security analysis report of the leaked Claude-Code
      1500 · Updated Apr 3, 2026
    • DRIFT

      Public
      [NeurIPS 2025] The official implementation of the paper "DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents".
      Python
      34710 · Updated Mar 19, 2026
    • seclaw

      Public
      🦾 SeClaw: The Security Armored Personal AI Assistant
      TypeScript
      MIT License
      12900 · Updated Mar 18, 2026
    • llm-armor

      Public
      JavaScript
      0000 · Updated Mar 18, 2026
    • armor

      Public
      Python
      MIT License
      0700 · Updated Mar 18, 2026
    • Official code repository for "A2ASecBench: A Protocol-Aware Security Benchmark for Agent-to-Agent Multi-Agent Systems" at ICLR 2026.
      JavaScript
      MIT License
      0000 · Updated Feb 26, 2026
    • dVLM-AD

      Public
      Official Repo for “dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning”
      Python
      0600 · Updated Feb 22, 2026
    • AdaShield

      Public
      [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting."
      Python
      47350 · Updated Feb 9, 2026
    • DoxBench

      Public
      [ICLR 2026] The official code for "Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models"
      Jupyter Notebook
      Apache License 2.0
      32600 · Updated Feb 7, 2026
    • The homepage of SaFo Lab
      HTML
      MIT License
      0200 · Updated Jan 28, 2026
    • MetaAgent

      Public
      Official Repository of MetaAgent Program
      Python
      84640 · Updated Dec 2, 2025
    • A further improvement for the AutoDAN-Turbo through test-time scaling.
      Python
      MIT License
      41310 · Updated Oct 21, 2025
    • [ICLR 2025 Spotlight] The official implementation of our ICLR 2025 paper "AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs".
      Python
      MIT License
      6135850 · Updated Oct 8, 2025
    • [ACL 2025] The official code for "AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection".
      Python
      13900 · Updated Aug 4, 2025
    • [COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and further assess the robustn…
      Python
      108820 · Updated May 9, 2025
    • OET

      Public
      Python
      MIT License
      11100 · Updated May 5, 2025
    • FIUBench

      Public
      A Task of Fictitious Unlearning for VLMs
      Jupyter Notebook
      22770 · Updated Apr 6, 2025
    • Dolphins

      Public
      [ECCV 2024] The official code for "Dolphins: Multimodal Language Model for Driving"
      Python
      MIT License
      148860 · Updated Feb 10, 2025
    • List of T2I safety papers, updated daily; discussion is welcome in the Discussions tab
      MIT License
      16800 · Updated Aug 12, 2024
    • .github

      Public
      Open-source code from SaFoLab at the University of Wisconsin–Madison
      0100 · Updated Jul 3, 2024