144 changes: 131 additions & 13 deletions README.md
@@ -1,32 +1,150 @@
# RepoVis

RepoVis scans one or more DNF repositories for recent package updates, extracts changelogs and CVE references, and produces a summary report in **HTML**, **CSV**, or **YAML** format.

Key capabilities:

- Filter by a rolling window (`--days`) or a fixed start date (`--startdate`).
- Extract CVE identifiers from changelogs for a quick security overview.
- Enrich reports with **CVSS v3 scores** when CSAF advisory data is available.
- Support both system and custom repository configurations.

## Requirements

- Python ≥ 3.9 (ships as system Python on Rocky Linux / RHEL 9)
- DNF and RPM libraries (`dnf`, `rpm` Python bindings — pre-installed on RHEL/Rocky/CentOS)

## Quick Start

```bash
python3 repovis.py --days 30 --output html --file report.html <repo-name>
```

## CLI Reference

| Option | Description |
| ------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `-d`, `--days N` | How many days back to search. Mutually exclusive with `--startdate`. |
| `-s`, `--startdate YYYY-MM-DD` | Earliest date to search from. Mutually exclusive with `--days`. |
| `-o`, `--output {html,csv,yaml-cve}` | Output format (default: `html`). |
| `-f`, `--file PATH` | Write report to a file instead of stdout. HTML output also copies `.css` and `.js` assets to the same directory. |
| `-r`, `--repodir PATH` | Alternate directory containing `.repo` files (default: `/etc/yum.repos.d/`). |
| `-c`, `--cveyaml PATH` | Custom YAML file with additional CVE fix data. Mutually exclusive with `--advisory-dir`. |
| `--advisory-dir PATH` | Path to a directory containing CSAF advisory JSON files (searched recursively). Generates supplemental CVE + CVSS data on the fly. Requires `--product-codes`. Mutually exclusive with `--cveyaml`. |
| `--product-codes CODE [CODE ...]` | One or more product codes to filter advisory data (e.g. `lts-9.2`, `rlc-9.2`, `fips-9.2-certified`). Required when `--advisory-dir` is set. |
| `-t`, `--title TEXT` | Report title for HTML and YAML output. |
| `--description TEXT` | Description header (below title) for HTML and YAML output. May contain custom HTML. |
| `repos` (positional) | One or more DNF repository names to scan, as shown in `dnf repolist`. |

> **Note:** You must specify either `--days` or `--startdate`.
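The two time-window options reduce to a single cutoff date. A minimal sketch of that reduction, assuming builds are compared at day granularity; `resolve_cutoff` and `in_window` are hypothetical names, not functions from RepoVis itself:

```python
from datetime import date, datetime, timedelta
from typing import Optional


def resolve_cutoff(days: Optional[int], startdate: Optional[str],
                   today: Optional[date] = None) -> date:
    """Hypothetical helper: turn --days / --startdate into one cutoff date."""
    # Exactly one of the two options must be given (they are mutually exclusive)
    if (days is None) == (startdate is None):
        raise ValueError("specify exactly one of --days or --startdate")
    if startdate is not None:
        # --startdate uses the documented YYYY-MM-DD format
        return datetime.strptime(startdate, "%Y-%m-%d").date()
    # --days is a rolling window counted back from today
    return (today or date.today()) - timedelta(days=days)


def in_window(build_time: datetime, cutoff: date) -> bool:
    """Keep a build only if it happened on or after the cutoff date."""
    return build_time.date() >= cutoff
```

Passing `today` explicitly keeps the rolling-window case reproducible in tests.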

## Examples

### HTML report — system repositories (last 21 days)

```bash
python3 repovis.py \
  --days 21 \
  --output html \
  --file ./update_report/Updates.html \
  --title "Rocky Linux 21 Day History" \
  --description "Rocky package updates (BaseOS/AppStream) from the past 21 days." \
  baseos appstream
```

### CSV report — custom repository with manual CVE YAML

Use `--repodir` when scanning repositories that are not in `/etc/yum.repos.d/`:

```bash
python3 repovis.py \
--startdate 2022-05-11 \
--cveyaml ./tmp/my_fixes.yaml \
--output csv \
--file ./update_since_2022.csv \
--repodir ./repos.tmp/ \
--title "Custom Repo Since May 2022" \
--description "Packages from the custom LTS repository with fixes added." \
custom-lts-repo-8 custom-lts-repo-8-additional
```

### YAML-CVE report — with CSAF advisory directory

When you have a local clone of an advisories repository, use `--advisory-dir` and `--product-codes` instead of `--cveyaml`. This reads CSAF JSON files directly and also extracts CVSS v3 scoring data:

```bash
python3 repovis.py \
  --advisory-dir ../advisories/csaf/advisories \
  --product-codes lts-9.2 rlc-9.2 \
  --repodir .tmp/ \
  --output yaml-cve \
  --startdate 2024-01-01 \
  --file output.yaml \
  --title "Rocky Linux 9.2 LTS CVE Report" \
  --description "CVE summary for Rocky Linux 9.2 CIQ LTS repositories." \
  rlc-9.2-lts.aarch64 rocky-9.2-baseos.aarch64 rocky-9.2-appstream.aarch64 rocky-9.2-extras.aarch64
```

### HTML report — with CSAF advisory directory

Same as above but with interactive HTML output:

```bash
python3 repovis.py \
  --advisory-dir ../advisories/csaf/advisories \
  --product-codes lts-9.2 rlc-9.2 \
  --repodir .tmp/ \
  --output html \
  --startdate 2024-01-01 \
  --file output.html \
  --title "Rocky Linux 9.2 LTS CVE Report" \
  --description "CVE summary for Rocky Linux 9.2 CIQ LTS repositories." \
  rlc-9.2-lts.aarch64 rocky-9.2-baseos.aarch64 rocky-9.2-appstream.aarch64 rocky-9.2-extras.aarch64
```

## CVE Data Sources

RepoVis supports two ways to supply supplemental CVE fix data (in addition to what is extracted from changelogs):

### 1. CSAF Advisory Directory (`--advisory-dir`)

Point to a directory containing CSAF advisory JSON files. RepoVis recursively scans for `*.json` files and extracts:

- CVE identifiers and fix dates
- CVSS v3 base score and severity

Use `--product-codes` to specify which product entries to match (e.g. `lts-9.2`, `rlc-9.2`, `fips-9.2-certified`). Only advisory entries whose product ID matches one of the given codes are included.
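A minimal sketch of product-code filtering over a CSAF 2.0 vulnerability entry, which lists fixed product IDs under `product_status.fixed`. The matching rule is an assumption: it treats a product ID as matching when it equals a code or starts with the code followed by `:`. The real CIQ product-ID format and RepoVis's actual rule may differ, and `matches_product` / `fixed_products_for` are hypothetical names.

```python
from typing import Iterable, List


def matches_product(product_id: str, codes: Iterable[str]) -> bool:
    # ASSUMPTION: product IDs look like "<code>:<rpm>"; the ":" delimiter
    # stops "lts-9.2" from accidentally matching "lts-9.20:..."
    return any(product_id == code or product_id.startswith(code + ":")
               for code in codes)


def fixed_products_for(vuln: dict, codes: List[str]) -> List[str]:
    """Return the fixed product IDs in one CSAF vulnerability that match a code."""
    fixed = vuln.get("product_status", {}).get("fixed", [])
    return [pid for pid in fixed if matches_product(pid, codes)]
```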

> This replaces the previous two-step workflow of running a separate advisory-parsing script and then passing the result via `--cveyaml`.

### 2. Manual CVE YAML (`--cveyaml`)

Changelogs, CVE codes, and other information can be added to or overridden via a custom YAML file:

```yaml
packages:
  openssl:
    cve_fixes:
      '2025-03-01':
        - CVE-2025-1234
        - CVE-2025-5678
```
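Once loaded (for example with PyYAML's `yaml.safe_load`), the file above is a set of nested mappings. A minimal sketch of walking that structure into flat `(package, date, CVE)` triples; `iter_cve_fixes` is a hypothetical helper, not part of RepoVis:

```python
from typing import Iterator, Tuple


def iter_cve_fixes(data: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (package, fix_date, cve_id) triples from --cveyaml-style data."""
    for package, info in data.get("packages", {}).items():
        for fix_date, cves in info.get("cve_fixes", {}).items():
            for cve in cves:
                # Plain strings are the common case; rich dict entries carry "cve_id"
                cve_id = cve if isinstance(cve, str) else cve["cve_id"]
                # str() normalises date keys that a YAML loader parsed as dates
                yield package, str(fix_date), cve_id
```

Quoting the date key, as in the example file, keeps it a string; an unquoted `2025-03-01` is read as a `datetime.date` by YAML 1.1 loaders such as PyYAML, which is why the sketch applies `str()`.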

> **Note:** `--advisory-dir` and `--cveyaml` are mutually exclusive.

## CVSS Scoring

When advisory data is supplied via `--advisory-dir`, CVSS v3 base scores and severities are automatically included in all output formats:

- **HTML** — Inline coloured labels next to each CVE (Critical / High / Medium / Low).
- **CSV** — A `CVSS Scores` column with `CVE-ID:score:SEVERITY` entries.
- **YAML-CVE** — A separate top-level `cvss` section mapping CVE IDs to scores.

When no CVSS data is available, output is identical to the previous behaviour.

See [docs/cvss-scoring.md](docs/cvss-scoring.md) for full details.

## Further Documentation

- [Advisory Directory Option](docs/advisory-dir-option.md) — detailed architecture and data-model documentation for `--advisory-dir`.
- [CVSS Scoring](docs/cvss-scoring.md) — how CVSS data flows through the pipeline and appears in each output format.
195 changes: 195 additions & 0 deletions docs/advisory-dir-option.md
@@ -0,0 +1,195 @@
# Feature: Local Advisory Directory & CVSS Scoring

## Overview

RepoVis can now read CIQ CSAF advisory JSON files directly from a local
directory, eliminating the need for a separate `read_lts_advisories.py`
script and its git-clone step. When advisory data is available, CVSS v3
scoring information (base score and base severity) is also extracted and
carried through the data model for downstream use.

## New CLI Options

### `--advisory-dir <path>`

Path to a local directory containing CSAF advisory JSON files (searched
recursively for `*.json`). The advisory data is processed on the fly to
generate supplemental CVE data — functionally equivalent to passing a
pre-built YAML file via `--cveyaml`.

- **Mutually exclusive** with `--cveyaml`.
- **Requires** `--product-codes`.

### `--product-codes <code> [<code> ...]`

One or more product-code strings used to filter which fixed RPMs in the
advisory JSONs are relevant (e.g. `lts-8.6`, `fipscompliant-8`,
`cbr-7.9`). Only entries whose product ID matches one of these codes
are included.

### Example

```bash
# Using a local clone of the advisories repo:
./repovis.py \
  --advisory-dir ~/advisories/csaf \
  --product-codes lts-9.2 \
  -s 2025-01-01 \
  -o html \
  -f report.html \
  rlc-9.2-lts.x86_64
```

This replaces the previous two-step workflow:

```bash
# Old workflow (no longer needed):
./read_lts_advisories.py lts-9.2 > cves.yaml
./repovis.py -c cves.yaml -s 2025-01-01 -o html rlc-9.2-lts.x86_64
```

## Backward Compatibility

### Existing `--cveyaml` files

The `--cveyaml` option continues to work exactly as before. YAML files
with plain string CVE lists are fully supported:

```yaml
packages:
  openssl:
    cve_fixes:
      '2025-03-01':
        - CVE-2025-1234
        - CVE-2025-5678
```

### Mixed data sources

The `--advisory-dir` and `--cveyaml` options are mutually exclusive.
However, `PackageRead` internally merges advisory-generated data with
any programmatically supplied CVE data, deduplicating by CVE ID across
both plain string entries and rich dict entries.
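Deduplicating "by CVE ID across both plain string entries and rich dict entries" needs one key function that works for either shape. A minimal sketch, assuming rich entries store the ID under `cve_id` (as in the advisory-reader example below); `cve_id_of` and `merge_cve_lists` are hypothetical names, not the actual `_merge_cve_data` implementation:

```python
from typing import Dict, List, Union

CveEntry = Union[str, Dict[str, object]]


def cve_id_of(entry: CveEntry) -> str:
    # Plain entries are the ID itself; rich entries carry it under "cve_id"
    return entry if isinstance(entry, str) else str(entry["cve_id"])


def merge_cve_lists(existing: List[CveEntry], incoming: List[CveEntry]) -> List[CveEntry]:
    """Append incoming entries whose CVE ID is not already present."""
    seen = {cve_id_of(e) for e in existing}
    merged = list(existing)
    for entry in incoming:
        cid = cve_id_of(entry)
        if cid not in seen:
            merged.append(entry)
            seen.add(cid)
    return merged
```

This sketch keeps whichever entry is seen first. A variant that lets a rich dict replace a plain string would avoid discarding CVSS data; the text above does not say which policy RepoVis uses. Mapping `cve_id_of` over the merged list yields the plain-string form used for `cve_dict`.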

## Data Model Changes

### `CvssInfo` (new dataclass in `lib/models.py`)

| Field | Type | Description |
| --------------- | ------- | -------------------------------- |
| `base_score` | `float` | CVSS v3 base score (e.g. `8.1`) |
| `base_severity` | `str` | CVSS v3 severity (e.g. `"HIGH"`) |

### `PackageInfo.cvss_data` (removed)

CVSS data was initially stored per-package on `PackageInfo.cvss_data`.
This has been replaced by a **global** `cvss_map: Dict[str, CvssInfo]`
on `PackageRead`, because CVSS scores are CVE-global (a given CVE always
has the same score regardless of which package ships the fix).

The global map is passed to the `Output` class and used to enrich all
three output formats (HTML, CSV, YAML-CVE).

The existing `cve_dict: Dict[str, List[str]]` field is **unchanged** —
CVE ID lists remain plain strings in all contexts.

## Internal Architecture

### `lib/advisory_read.py` (new module)

- `read_advisories_from_directory(advisory_dir, product_codes)` —
recursively scans a directory for `*.json` advisory files and returns
a data structure matching the supplemental CVE YAML schema.
- `_process_advisory(...)` — processes a single advisory document,
extracting CVE IDs, CVSS v3 scores, fix dates, and SRPM names.
- `_extract_cvss_v3(vuln)` — extracts `baseScore` and `baseSeverity`
from the CSAF `scores[].cvss_v3` block.

Each CVE entry produced by the advisory reader is a dict:

```python
{
    "cve_id": "CVE-2025-1234",
    "base_score": 8.1,
    "base_severity": "HIGH"
}
```
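A minimal sketch of producing entries of that shape from CSAF 2.0 documents, where each item in `vulnerabilities[]` has a `cve` field and optional `scores[].cvss_v3` blocks containing `baseScore` and `baseSeverity`. The function names are hypothetical stand-ins for `read_advisories_from_directory`, `_process_advisory`, and `_extract_cvss_v3`; fix dates, SRPM names, and product filtering are left out because their exact location in CIQ advisories is not described here.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, List, Optional


def iter_advisory_files(advisory_dir: str) -> List[Path]:
    # Recursive *.json scan, sorted so runs are deterministic
    return sorted(Path(advisory_dir).rglob("*.json"))


def extract_cvss_v3(vuln: dict) -> Optional[Dict[str, object]]:
    """Return base score/severity from the first scores[].cvss_v3 block, if any."""
    for score in vuln.get("scores", []):
        cvss = score.get("cvss_v3")
        if cvss and "baseScore" in cvss:
            return {"base_score": float(cvss["baseScore"]),
                    "base_severity": str(cvss.get("baseSeverity", "")).upper()}
    return None


def cve_entries(doc: dict) -> Iterator[Dict[str, object]]:
    """Yield one dict per CVE in a CSAF document; CVSS keys only when present."""
    for vuln in doc.get("vulnerabilities", []):
        cve_id = vuln.get("cve")
        if not cve_id:
            continue
        entry: Dict[str, object] = {"cve_id": cve_id}
        cvss = extract_cvss_v3(vuln)
        if cvss:
            entry.update(cvss)
        yield entry


def load_all(advisory_dir: str) -> Iterator[Dict[str, object]]:
    for path in iter_advisory_files(advisory_dir):
        with path.open(encoding="utf-8") as fh:
            yield from cve_entries(json.load(fh))
```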

### `lib/package_read.py` changes

- **`__init__`** accepts a new optional `cve_data` parameter for
pre-built CVE data (from the advisory reader).
- **`_merge_cve_data()`** merges advisory data into `cve_extra`,
handling both plain string and dict CVE entries when deduplicating.
- **`_get_cves_from_changelog()`** normalises dict entries back to
plain CVE-ID strings for `cve_dict`, preserving backward
compatibility with all output formatters.
- **`_get_cvss_data()`** (retained) extracts CVSS metadata from the
supplemental data for a single package. Superseded at the top level
by `_build_global_cvss_map()`.
- **`_build_global_cvss_map()`** (new) iterates all packages in
`cve_extra` once and builds a flat `Dict[str, CvssInfo]` stored as
`self.cvss_map`.
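A minimal sketch of the global map, assuming `cve_extra` follows the supplemental YAML schema (`packages` → `cve_fixes` → date → list of entries) with rich entries shaped like the advisory-reader dict above. `CvssInfo` mirrors the fields in the data-model table; `build_global_cvss_map` is an illustrative stand-in for `_build_global_cvss_map`, not its actual code.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CvssInfo:
    base_score: float
    base_severity: str


def build_global_cvss_map(cve_extra: dict) -> Dict[str, CvssInfo]:
    """Flatten rich CVE entries from every package into one CVE-keyed map."""
    cvss_map: Dict[str, CvssInfo] = {}
    for info in cve_extra.get("packages", {}).values():
        for entries in info.get("cve_fixes", {}).values():
            for entry in entries:
                # Plain-string entries carry no scoring data
                if isinstance(entry, dict) and "base_score" in entry:
                    # Scores are CVE-global, so the first entry seen wins
                    cvss_map.setdefault(
                        entry["cve_id"],
                        CvssInfo(float(entry["base_score"]),
                                 str(entry.get("base_severity", ""))),
                    )
    return cvss_map
```

Keying by CVE ID rather than by package means a CVE fixed in several packages is stored once, which matches the rationale given for removing `PackageInfo.cvss_data`.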

### `repovis.py` changes

- Imports `read_advisories_from_directory`.
- Adds `--advisory-dir` and `--product-codes` argument definitions.
- Validates mutual exclusivity and required combinations.
- Calls the advisory reader before constructing `PackageRead` and
passes the result via the new `cve_data` parameter.
- Passes `reader.cvss_map` to the `Output` constructor.
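One way to express these validation rules with the standard library's `argparse`, shown only for the options involved; this is a sketch, not necessarily how `repovis.py` defines or checks its arguments. `argparse` has mutually exclusive groups but no built-in "requires" relation, so the `--advisory-dir` → `--product-codes` dependency is checked by hand.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="repovis.py")
    # Exactly one time window is required
    window = parser.add_mutually_exclusive_group(required=True)
    window.add_argument("-d", "--days", type=int)
    window.add_argument("-s", "--startdate")
    # At most one supplemental CVE source
    cve_src = parser.add_mutually_exclusive_group()
    cve_src.add_argument("-c", "--cveyaml")
    cve_src.add_argument("--advisory-dir")
    parser.add_argument("--product-codes", nargs="+")
    parser.add_argument("repos", nargs="+")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.advisory_dir and not args.product_codes:
        parser.error("--advisory-dir requires --product-codes")  # exits with status 2
    return args
```

Because `--product-codes` takes one or more values, it should be followed by another option (as in the examples above) rather than directly by the repository names, or those names would be consumed as product codes.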

## CVSS in Output Formats

When CVSS data is available (via `--advisory-dir` or rich dict entries
in `--cveyaml` source data), scores and severities are included in all
three output formats. When no CVSS data exists, output is identical to
the previous behaviour — graceful degradation throughout.

### HTML

Each CVE in the list gets an inline coloured label:

```html
<li>CVE-2025-1234 <span class="cvss cvss-high">(8.1 HIGH)</span></li>
<li>CVE-2025-9999</li> <!-- no CVSS data: rendered as before -->
```

Severity CSS classes (`cvss-critical`, `cvss-high`, `cvss-medium`,
`cvss-low`, `cvss-none`) are defined in `html_template.html`.
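A minimal sketch of rendering one list item in the format shown above, assuming the class name is `cvss-` plus the lowercased severity. `render_cve_item` is a hypothetical helper; RepoVis's own template code may build the markup differently.

```python
import html
from typing import Optional, Tuple


def render_cve_item(cve_id: str, cvss: Optional[Tuple[float, str]]) -> str:
    """Render one CVE as an <li>, adding a CVSS label only when data exists."""
    if cvss is None:
        # No CVSS data: plain list item, as before
        return f"<li>{html.escape(cve_id)}</li>"
    score, severity = cvss
    # Severity maps onto the cvss-critical/high/medium/low/none classes
    css = "cvss-" + html.escape(severity.lower())
    label = f"({score:.1f} {html.escape(severity.upper())})"
    return f'<li>{html.escape(cve_id)} <span class="cvss {css}">{label}</span></li>'
```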

### CSV

A new `CVSS Scores` column is appended. Entries are positionally
aligned with the CVEs in the `CVE Fixes` column:

```
Package,Version,Module,Build Date,CVE Fixes,CVSS Scores
openssl,3.0.7-28.el9,-, 2025-03-01,CVE-2025-1234 CVE-2025-9999,CVE-2025-1234:8.1:HIGH CVE-2025-9999::
```

CVEs without CVSS data use the placeholder `CVE-ID::` to preserve
positional alignment with the CVE Fixes column.
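A minimal sketch of building that column so the Nth `CVSS Scores` entry always belongs to the Nth CVE. `cvss_column` and `csv_row` are hypothetical helpers, and the map here holds plain `(score, severity)` tuples rather than `CvssInfo` objects to keep the example self-contained.

```python
import csv
import io
from typing import Dict, List, Tuple

Cvss = Tuple[float, str]  # (base_score, base_severity)


def cvss_column(cves: List[str], cvss_map: Dict[str, Cvss]) -> str:
    """Build the space-separated CVSS Scores cell, aligned with CVE Fixes."""
    parts = []
    for cve in cves:
        if cve in cvss_map:
            score, severity = cvss_map[cve]
            parts.append(f"{cve}:{score}:{severity}")
        else:
            parts.append(f"{cve}::")  # placeholder keeps positions aligned
    return " ".join(parts)


def csv_row(package: str, version: str, module: str, build_date: str,
            cves: List[str], cvss_map: Dict[str, Cvss]) -> str:
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(
        [package, version, module, build_date, " ".join(cves), cvss_column(cves, cvss_map)])
    return buf.getvalue()
```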

### YAML-CVE

CVE ID lists in `cve_fixes` remain plain strings (unchanged). A
separate top-level `cvss` section is appended:

```yaml
packages:
  openssl:
    cve_fixes:
      2025-03-01:
        - CVE-2025-1234
        - CVE-2025-9999
cvss:
  CVE-2025-1234:
    base_score: 8.1
    base_severity: HIGH
```

Only CVEs that have CVSS data appear in the `cvss` section. The section
is omitted entirely when no CVSS data is available.
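A minimal sketch of adding that section to an already-built report dict before it is dumped (for example with PyYAML's `yaml.safe_dump(..., sort_keys=False)`). It additionally restricts `cvss` to CVEs that appear in the report; that restriction is an assumption, not something stated above. `add_cvss_section` is a hypothetical helper.

```python
from typing import Dict, Tuple

Cvss = Tuple[float, str]  # (base_score, base_severity)


def add_cvss_section(report: dict, cvss_map: Dict[str, Cvss]) -> dict:
    """Return a copy of report with a top-level 'cvss' mapping for scored CVEs."""
    # ASSUMPTION: only CVEs actually listed under cve_fixes are emitted
    listed = {
        cve
        for info in report.get("packages", {}).values()
        for cves in info.get("cve_fixes", {}).values()
        for cve in cves
    }
    cvss = {
        cve: {"base_score": cvss_map[cve][0], "base_severity": cvss_map[cve][1]}
        for cve in sorted(listed) if cve in cvss_map
    }
    out = dict(report)
    if cvss:  # omit the section entirely when nothing is scored
        out["cvss"] = cvss
    return out
```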