23 changes: 23 additions & 0 deletions malware-detection-with-llm/dynamic_analysis/README.md
@@ -0,0 +1,23 @@
# Dynamic Binary & Source Code Analysis with LLM

This project provides a **dynamic code and binary security analysis framework** enhanced with **LLM reasoning**. It automates the extraction of indicators from binaries and source code, correlates findings, and produces a risk assessment.
For the moment, the setup only tests how logs are saved on the host machine, using bind mounts between the container and the virtual machine, and then virtfs between the virtual machine and the host.

---

## Table of Contents
- [Usage](#usage)

---

### Current issue
The issue I'm hitting with QEMU/KVM on WSL is that KVM requires hardware virtualization extensions (Intel VT-x/AMD-V) to be available to the Linux kernel, but WSL does not appear to expose them to the guest Linux environment in my setup.
WSL itself is a virtualized environment: WSL2 runs Linux in a lightweight VM managed by Hyper-V, so running KVM inside it amounts to nested virtualization. Until that works, the script falls back to pure emulation (`-cpu qemu64`), which is much slower.
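
One way to make the launcher tolerate both situations is to probe for the KVM device node before building the QEMU command line. The sketch below is an assumption about how this could be wired into `script.py`, not something the script currently does; the helper name `qemu_accel_args` is hypothetical. `-enable-kvm` and `-cpu host` are standard QEMU options.

```python
import os


def qemu_accel_args(kvm_device: str = "/dev/kvm") -> list[str]:
    """Return QEMU CPU/acceleration flags depending on KVM availability.

    Hypothetical helper: it only checks that the device node exists and is
    readable and writable by the current user.
    """
    if os.path.exists(kvm_device) and os.access(kvm_device, os.R_OK | os.W_OK):
        # Hardware acceleration is usable: expose the host CPU model to the guest.
        return ["-enable-kvm", "-cpu", "host"]
    # Fall back to pure emulation: slower, but works without VT-x/AMD-V.
    return ["-cpu", "qemu64"]
```

The returned list could replace the hard-coded `"-cpu", "qemu64"` pair in `QEMU_CMD`, so the same script runs on WSL and on hosts where KVM is available.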

## Usage

```bash
wget https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img
qemu-img resize noble-server-cloudimg-amd64.img +20G
cloud-localds seed.iso user-data.yaml meta-data.yaml
python3 ./script.py
```
2 changes: 2 additions & 0 deletions malware-detection-with-llm/dynamic_analysis/meta-data.yaml
@@ -0,0 +1,2 @@
instance-id: iid-local01
local-hostname: autovm
59 changes: 59 additions & 0 deletions malware-detection-with-llm/dynamic_analysis/script.py
@@ -0,0 +1,59 @@
import subprocess
import time
import os

CLOUD_IMAGE = "noble-server-cloudimg-amd64.img"
USER_DATA = "user-data.yaml"
META_DATA = "meta-data.yaml"
SEED_ISO = "seed.iso"

HOST_LOG_DIR = "logs_host"
VM_LOG_DIR = "/mnt/logs"

QEMU_CMD = [
    "qemu-system-x86_64",
    "-m", "4G",
    "-smp", "2",
    # Use "host" if the host CPU features are accessible (requires KVM);
    # otherwise "qemu64" (much slower, but more compatible).
    "-cpu", "qemu64",
    "-drive", f"file={CLOUD_IMAGE},format=qcow2",
    "-drive", f"file={SEED_ISO},format=raw",
    "-netdev", "user,id=net0",
    "-device", "virtio-net-pci,netdev=net0",
    "-virtfs", f"local,path={HOST_LOG_DIR},mount_tag=shared_logs,security_model=passthrough,id=logs",
    "-nographic",
]


def create_seed_iso():
    """Build the cloud-init seed image from user-data and meta-data."""
    print("Creating cloud-init seed.iso...")
    if os.path.exists(SEED_ISO):
        os.remove(SEED_ISO)
    subprocess.check_call([
        "cloud-localds",
        SEED_ISO,
        USER_DATA,
        META_DATA,
    ])


def ensure_host_log_dir():
    """Create the host directory that is shared with the VM over virtfs."""
    print("Checking host log directory...")
    os.makedirs(HOST_LOG_DIR, exist_ok=True)


def start_qemu():
    """Launch the VM and return the QEMU process handle."""
    print("Starting QEMU VM...")
    return subprocess.Popen(QEMU_CMD)


def main():
    ensure_host_log_dir()
    create_seed_iso()
    vm = start_qemu()
    print("The container inside the VM will write logs to:", HOST_LOG_DIR)
    vm.wait()


if __name__ == "__main__":
    main()
17 changes: 17 additions & 0 deletions malware-detection-with-llm/dynamic_analysis/user-data.yaml
@@ -0,0 +1,17 @@
#cloud-config
hostname: autovm
package_update: true
package_upgrade: true

packages:
  - docker.io

# This only tests communication through bind mounts; there is no analysis logic yet.
runcmd:
  - mkdir -p /mnt/logs
  - mount -t 9p -o trans=virtio,version=9p2000.L shared_logs /mnt/logs
  - systemctl enable docker
  - systemctl start docker
  # Single quotes keep $(date) from being expanded once by the outer shell,
  # so every log line gets its own timestamp inside the container.
  - |
    docker run -d \
      --restart=always \
      --name logcollector \
      -v /mnt/logs:/var/log/myapp \
      bash -c 'while true; do echo "$(date) TEST LOG" >> /var/log/myapp/test.log; sleep 1; done'
162 changes: 162 additions & 0 deletions malware-detection-with-llm/proposal.md
@@ -0,0 +1,162 @@
## How to do malware analysis on files to check if they are secure or not?

We cannot risk running untrusted code directly on our machines because it may affect the system irreversibly, corrupt configuration files, steal credentials, or escape into the host environment.
Therefore, all analysis must be performed inside a sandbox environment. The sandbox should be isolated enough that even malicious code cannot break out.

### Sandbox Strategy

If the code appears lightweight and low-risk, we can execute it inside a Docker container. Docker provides OS-level isolation (namespaces, cgroups, etc.) and allows us to disable networking, mount the repository read-only, and restrict capabilities. We can exchange files with the container through bind mounts, so we do not need active networking for protocols such as scp.
If the code seems dangerous or highly suspicious, we must use a QEMU virtual machine, because QEMU virtualizes the full machine and the guest does not share a kernel with the host OS. This protects the host against kernel-level exploits and container-escape vulnerabilities, at the cost of considerably more complexity than the containerized solution.

This approach balances speed (Docker) and security (QEMU), depending on threat level.
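
The container-side restrictions described above (no network, read-only repository mount, dropped capabilities) map onto standard `docker run` flags. The following is a minimal sketch of how the command line could be assembled; the function name `hardened_docker_cmd`, the `/repo` mount point, and the resource limits are illustrative assumptions, not settings taken from this project.

```python
from pathlib import Path


def hardened_docker_cmd(repo_dir: str, image: str, command: list[str]) -> list[str]:
    """Build a `docker run` argv for a low-risk sample (hypothetical helper)."""
    repo = Path(repo_dir).resolve()
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no networking at all
        "--read-only",                          # read-only root file system
        "--cap-drop", "ALL",                    # drop every Linux capability
        "--security-opt", "no-new-privileges",  # block setuid-style escalation
        "--pids-limit", "128",                  # assumed limit against fork bombs
        "--memory", "512m",                     # assumed memory ceiling
        "-v", f"{repo}:/repo:ro",               # sample mounted read-only
        image,
        *command,
    ]
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting issues with untrusted file names.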

### Is static analysis sufficient?

Static analysis is useful, but not sufficient on its own. There are well-known recent cases where malicious behavior stayed undetected until build time or run time:
- Log4Shell in 2021: harmless-looking text strings triggered remote code execution inside the JVM at runtime.
- XZ Utils backdoor in 2024: the payload was deliberately hidden inside complex build scripts and binary test files, and only took shape when the library was built and run.
- Obfuscated Python malware: imports and payloads decrypted only at runtime, invisible to static AST parsing.

Because malware can hide behavior behind obfuscated loaders, runtime-decrypted payloads, malicious environment-variable triggers, delayed execution, and JIT-level behavior (Java, Python, and other runtimes), we must actually run the code in a sandbox and capture its system behavior.

### Data we pass to the agent

We will pass multiple files to the analysis agent, including:
- the original source code (or binary)
- strace logs (syscall-level activity)
- ltrace logs (library-level calls)
- network logs (if networking is allowed, usually not)
- file-system activity logs
- runtime exceptions and crash reports
- metadata about the environment

This enables the agent to reason from both static and dynamic behavior.
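
Before handing these artifacts to the agent, it can help to record which ones were actually produced. The sketch below builds a small manifest with sizes and SHA-256 digests; the file names in `EXPECTED_ARTIFACTS` and the function `build_manifest` are hypothetical and only illustrate the idea.

```python
import hashlib
from pathlib import Path

# Assumed artifact names; the real pipeline may name its outputs differently.
EXPECTED_ARTIFACTS = (
    "strace.log",
    "ltrace.log",
    "network.log",
    "fs_activity.log",
    "crash_report.txt",
    "environment.json",
)


def build_manifest(artifact_dir: str) -> dict:
    """List which expected artifacts exist, with size and SHA-256 digest."""
    root = Path(artifact_dir)
    manifest = {"present": {}, "missing": []}
    for name in EXPECTED_ARTIFACTS:
        path = root / name
        if path.is_file():
            data = path.read_bytes()
            manifest["present"][name] = {
                "size": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            }
        else:
            # A missing log is itself a signal (e.g. the tracer was killed).
            manifest["missing"].append(name)
    return manifest
```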

### Static checks depending on the runtime / language

Malware analysis depends heavily on the type of runtime.

1. If we have a binary (C/C++/Rust/Go build or ELF executable), we can examine it statically using:

- `readelf`: parses ELF headers and detects suspicious sections (encrypted data blobs, RWX memory, injected segments).
- `objdump`: disassembles code and lists symbols, which helps detect ROP gadgets, shellcode signatures, missing stack canaries, suspicious inline assembly, and calls to dangerous libc functions (`system`, `execve`, `popen`).
- `strings`: reveals hidden URLs, IPs, commands, and encryption keys.
- `ldd`: shows dynamic link dependencies, which can reveal unknown libraries, malicious preloads, `LD_LIBRARY_PATH` tricks, dependency hijacking, etc. Run it only inside the sandbox, because `ldd` may execute the inspected binary.
- `Makefile` inspection: look for custom build steps that download remote code, dynamic code generation, and suspicious compiler or linker flags (example: `-z execstack`).
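
The `strings` step can be approximated in pure Python, which is convenient when the result should be fed straight into the agent. This is a simplified sketch, not a replacement for the real tool: it only extracts printable ASCII runs of at least four characters and applies two deliberately loose patterns. The names `extract_strings` and `find_indicators` are hypothetical.

```python
import re

# Runs of at least 4 printable ASCII bytes.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")
# Loose IPv4 pattern: it does not validate that each octet is <= 255.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_strings(data: bytes) -> list[str]:
    """Return printable ASCII runs found in a binary blob."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def find_indicators(data: bytes) -> dict[str, list[str]]:
    """Collect URL and IPv4 candidates from the extracted strings."""
    urls: list[str] = []
    ips: list[str] = []
    for text in extract_strings(data):
        urls.extend(URL_PATTERN.findall(text))
        ips.extend(IPV4_PATTERN.findall(text))
    return {"urls": urls, "ipv4": ips}
```

Hits from this kind of scan are only candidates: version numbers often look like IP addresses, so the agent should treat them as hints to correlate with the dynamic logs.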

2. If the runtime includes Python

Python is much more dangerous because it is extremely dynamic. Static AST parsing helps, but we must also check runtime behavior.

Static checks:
- Parse the AST to detect `eval`, `exec`, and `compile`.
- Detect `__import__` and other dynamic imports.
- Detect suspicious modules (`subprocess`, `ctypes`, `socket`).
- Detect obfuscation (base64 payloads, XOR loops).
- Detect tampering with builtins.

Runtime (sandbox) checks, inside the container:
- monitor file access,
- monitor socket attempts,
- detect heavy CPU loops (crypto-miners),
- detect subprocess spawning,
- intercept ctypes calls (native library loading).

Extra tools:
- `pyinstaller --debug`
- `bandit` for static vulnerability scanning
- `python -X utf8 -X dev -v` for verbose import tracing
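
The static checks listed above can be sketched with the standard `ast` module. The example only parses the source and never executes it. The function name `scan_python_source` and the two sets are illustrative assumptions. It flags direct calls by name, so aliased or attribute-based calls (for example `builtins.exec`) would need extra handling.

```python
import ast

DANGEROUS_CALLS = {"eval", "exec", "compile", "__import__"}
SUSPICIOUS_MODULES = {"subprocess", "ctypes", "socket"}


def scan_python_source(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, finding) pairs for risky calls and imports."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Direct call such as exec(...) or __import__(...).
            if node.func.id in DANGEROUS_CALLS:
                findings.append((node.lineno, f"call:{node.func.id}"))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                root = alias.name.split(".")[0]
                if root in SUSPICIOUS_MODULES:
                    findings.append((node.lineno, f"import:{root}"))
        elif isinstance(node, ast.ImportFrom):
            root = (node.module or "").split(".")[0]
            if root in SUSPICIOUS_MODULES:
                findings.append((node.lineno, f"import:{root}"))
    return sorted(findings)
```

Note what the scan cannot see: if the argument to `exec` is a base64 string, the AST shows the call but not the decoded payload, which is why the runtime checks remain necessary.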

3. If the runtime includes Java

Java malware can hide inside:
- `.class` files
- `.jar` files
- dynamic class loaders
- remote class fetching (dangerous)
- reflection

Static checks:
- Disassemble bytecode with `javap` or decompile it with CFR.
- Detect `Class.forName`, custom class loaders, JNI (native code), dynamic bytecode generation, and embedded shell commands.

Runtime checks, inside the sandbox:
- Run the JVM under `strace` and monitor network connections, file I/O, and subprocess calls via `Runtime.getRuntime().exec()`.

### Dynamic analysis flow ###

After the static analysis scores are available, the dynamic tests are launched inside the sandbox environment. We propose to capture the following.

**Process monitoring**
Capture `strace` syscalls, `ltrace` library calls, file access logs (`inotify`-like), networking attempts (even if blocked), raised signals, and resource usage (CPU and memory).
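
Raw `strace` output is verbose, so a summary is usually more useful to the agent than the full log. The sketch below counts syscalls and keeps the lines for a small watch list. It assumes plain `strace -f` output without timestamp options; the `WATCHED` set and the function name `summarize_strace` are illustrative choices.

```python
import re
from collections import Counter

# Optional "[pid N] " or "N " prefix (from -f), then the syscall name and "(".
SYSCALL_LINE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(?P<name>[a-z_][a-z0-9_]*)\(")
# Assumed watch list of syscalls worth showing to the agent verbatim.
WATCHED = {"execve", "connect", "socket", "ptrace", "unlink", "unlinkat", "chmod"}


def summarize_strace(lines):
    """Return (Counter of syscall names, list of watched lines)."""
    counts = Counter()
    flagged = []
    for line in lines:
        match = SYSCALL_LINE.match(line)
        if not match:
            # Signal lines ("--- SIGCHLD ... ---") and exit markers are skipped.
            continue
        name = match.group("name")
        counts[name] += 1
        if name in WATCHED:
            flagged.append(line.rstrip("\n"))
    return counts, flagged
```

Failed calls are kept on purpose: a `connect` that returns an error because networking is disabled is still evidence that the sample tried to reach the network.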

**Seccomp**
Seccomp allows a process to install a filter that controls which syscalls it may execute (like a policy). We can collect all denied syscall attempts, which are strong malware indicators.
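
For the container tier, such a policy can be expressed as a Docker seccomp profile, a JSON document with a default action and an allowlist. The sketch below generates one. The function name `build_seccomp_profile` and the example allowlist are assumptions; a real allowlist has to be much longer for a container to start at all. `SCMP_ACT_LOG` is used as a log-only mode, assuming the kernel and container runtime support it.

```python
import json


def build_seccomp_profile(allowed_syscalls, log_only=False):
    """Build a Docker-style seccomp profile from an allowlist.

    With log_only=True, syscalls outside the allowlist are logged instead of
    rejected, which is useful while the allowlist is still being tuned.
    """
    default_action = "SCMP_ACT_LOG" if log_only else "SCMP_ACT_ERRNO"
    return {
        "defaultAction": default_action,
        "syscalls": [
            {
                "names": sorted(set(allowed_syscalls)),
                "action": "SCMP_ACT_ALLOW",
            }
        ],
    }


def write_profile(path, profile):
    """Write the profile as JSON so it can be passed to the container runtime."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(profile, handle, indent=2)
```

The written file can then be passed with `docker run --security-opt seccomp=<path>`. In enforcing mode, denied calls show up in the `strace` log as failed syscalls, which ties this back to the process monitoring step.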

### Architecture for dynamic analysis

<img width="494" height="505" alt="image" src="https://github.com/user-attachments/assets/bca6c7d6-b4c5-470d-8616-fbb6381e881d" />


QEMU is a hosted virtual machine monitor: it emulates the machine's processor through dynamic binary translation and provides a set of hardware and device models for the machine, enabling it to run a variety of guest operating systems. It can also be used with KVM to run virtual machines at near-native speed (by taking advantage of hardware extensions such as Intel VT-x). QEMU can also emulate user-level processes, allowing applications compiled for one architecture to run on another.

QEMU can run with or without KVM:
- With KVM, QEMU emulates the peripherals (USB, mouse, keyboard, screen, disk, ...) and KVM runs the guest CPU code directly on the host CPU.
- Without KVM, QEMU emulates both the peripherals and the processor; that is, it executes the guest code itself.

The advantage of KVM is that it uses the hardware virtualization support provided by the CPU. It can do so because it is a kernel module and thus has privileges that QEMU, as a user-space process, does not have.
Related hardware features (an IOMMU such as Intel VT-d) also allow peripherals, for example PCIe devices, to be accessed by the guest in a secure way.
QEMU on its own is a hosted (“type 2”) hypervisor running on top of the host OS; since KVM is part of the Linux kernel, the QEMU+KVM combination does not fit the type 1 / type 2 split cleanly.

The architecture of the test suite consists of a QEMU+KVM virtual machine running a Docker container inside it, with both layers given reduced capabilities to improve security. If we had used only a container, potentially dangerous instructions would have run on the host kernel, and collecting logs during dynamic analysis could have corrupted the system. Although the two-layer design is significantly more complex, it greatly reduces the risks.

Hardening primitives deployed across both sandbox tiers include: ASLR, NX/DEP, PIE-compiled binaries, PaX-style W^X emulation and page-fault–based enforcement, ROP and return-to-libc heuristics, eBPF/JIT restrictions, and vendor-specific hardware protection extensions.
Additional defensive configurations include:
* Intel VMX leveraged for hardware-assisted virtualization of the analysis guest, ensuring that privileged instructions executed by the sample terminate safely within the hypervisor boundary.
* Memory Protection Keys (PKU) within the container to gate process memory accessibility on a per-page basis, minimizing the risk of intra-process memory abuse.
* IOMMU-backed DMA isolation to prevent peripheral-level or emulated-device–level direct memory access attacks from breaching the hypervisor or host.
* Confidential-computing enclaves (Intel SGX, AMD SEV, ARM TrustZone) not as an execution substrate for the specimen, but as a protection layer for sensitive components—such as cryptographic keys used for report attestation or proprietary analysis instrumentation—shielding them even in the presence of partial guest compromise.
This multi-layered, hardware-anchored isolation model substantially elevates the assurance level of the dynamic analysis pipeline while constraining adversarial behavior to strictly sandboxed domains.

| Hardening primitive | Protection target | Info |
|----------|----------|----------|
| ASLR | Memory layout predictability | Makes addresses unpredictable for an attacker |
| NX/DEP | Execution of injected code | Prevents execution from data pages (stack, heap, **.data**) |
| PIE | Fixed executable base address | Lets ASLR also randomize the base address of the executable |
| PaX W^X | Writable and executable memory | Enforces strict separation, blocks code injection |
| ROP heuristics | Code-reuse exploits | Detects suspicious control flow |
| eBPF/JIT restrictions | Kernel attack surface | Prevents misuse of eBPF for kernel compromise |
| RELRO | GOT/PLT overwrites | Full RELRO has a startup cost because all symbols must be resolved at program startup |
| Intel VMX | Hardware-assisted virtualization | Already used by KVM; nothing extra to configure |
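
Whether a sample can benefit from the PIE row of this table can be read from its ELF header. The sketch below checks the `e_type` field. It is a minimal illustration with a hypothetical function name `elf_type`, and it only looks at the first 18 bytes. `ET_DYN` covers shared objects as well as PIE executables, so the result is a hint, not proof.

```python
import struct

ET_EXEC = 2  # executable with a fixed load address
ET_DYN = 3   # shared object or position-independent executable


def elf_type(header: bytes) -> str:
    """Classify an ELF file from the first 18 bytes of its header."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_type is the 16-bit field right after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    if e_type == ET_DYN:
        return "ET_DYN (PIE executable or shared object)"
    if e_type == ET_EXEC:
        return "ET_EXEC (fixed load address, not PIE)"
    return f"other (e_type={e_type})"
```

A typical call would read the first bytes of the sample, for example `elf_type(open(path, "rb").read(18))`, where `path` is whatever file is under analysis.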

<u>**Unikernels** comparison</u>:
Unikernels greatly reduce the attack surface: no package manager, no shell, no users, no systemd (init processes), minimal syscalls, and a small code base. The proposal described above, by contrast, includes a guest Linux kernel, the Docker daemon, a container runtime, the Linux file system, the network stack, etc., so a unikernel would offer a much smaller attack surface. However, this malware-analysis environment relies on hardened sandbox analysis, virtualization boundaries, layered mitigations, and, very importantly, a communication channel (maybe ssh etc.). Also, the toolchain would be too complex and not all applications would run easily.

----

A slash command in Gemini CLI is essentially a custom workflow or task you define for the AI agent.

- example of slash command structure:
```yaml
name: malware-analysis-with-llm
description: "Analyze code for malware or suspicious behavior with LLM reasoning"
inputs:
  - name: code_path
    type: string
    required: true
pre_scripts:
  - ./scripts/static_analysis.sh {{code_path}}
  - ./scripts/dynamic_analysis.sh {{code_path}}
llm_prompt: |
  You are an AI security agent.
  Analyze the following outputs from pre_scripts:
  - Strace & traces logs: {{dyn_analysis_log}}
  - Python AST summary: {{python_ast}}
  - Java decompiled report: {{java_report}}
  Detect suspicious code patterns, unsafe system calls, and potential malware.
  Output a risk level (SAFE / SUSPICIOUS / DANGEROUS) and reasoning report.
post_scripts:
  - ./scripts/format_report.py {{llm_output}}
outputs:
  - risk_level
  - report_path
```
10 changes: 10 additions & 0 deletions malware-detection-with-llm/requirements.txt
@@ -0,0 +1,10 @@
astroid==4.0.2
bandit==1.9.2
capstone==5.0.6
markdown-it-py==4.0.0
mdurl==0.1.2
Pygments==2.19.2
PyYAML==6.0.3
rich==14.2.0
ROPGadget==7.7
stevedore==5.6.0