The idea is to create a simple package intended for normalizing threat actor and malware names in threat intelligence: many threat actors and malware families have more than one name, and sometimes you need to quickly answer the question "Does this malware/threat actor have a main (canonical) name?"
The package provides a really simple algorithm based on two kinds of steps: strict matching and fuzzy matching.
The source data about threat actors and malware is gathered from MISP, MITRE, MALPEDIA and other sources.
The main idea is simple:
- Check whether the origin name is a canonical name (strict match).
- If yes, return the canonical name.
- If no, check whether the origin name is similar to a canonical name (fuzzy match).
- If yes, return that canonical name.
- If no, check whether the origin name is similar to a synonym (fuzzy match); if yes, return the canonical name of that synonym.
- If there are no matches in the previous steps, return the origin name unchanged, which means we can't normalize this name for now ¯_(ツ)_/¯
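
The steps above can be sketched roughly as follows. This is only an illustration, not the package's actual implementation: the sample table, the `Match` class, the `normalize` function, the `0.85` threshold and the use of `difflib.SequenceMatcher` for fuzzy matching are all assumptions made for the example. Only the `"Fuzzy match by synonym"` message is taken from the output shown in this README; the other `info` strings are invented.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical sample data. The real package builds its tables
# from MISP, MITRE, MALPEDIA and other sources.
CANONICAL_TO_SYNONYMS = {
    "TrickBot": ["TSPY_TRICKLOAD", "TheTrick", "Totbrick", "TrickLoader", "Trickster"],
    "Emotet": ["Geodo", "Heodo"],
}


@dataclass
class Match:
    canonical_name: str
    info: str


def _similar(a: str, b: str, threshold: float) -> bool:
    # Case-insensitive similarity ratio in [0, 1]; the threshold is an assumption.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def normalize(name: str, table=CANONICAL_TO_SYNONYMS, threshold: float = 0.85) -> Match:
    # Step 1: strict match against canonical names.
    if name in table:
        return Match(name, "Strict match by canonical name")
    # Step 2: fuzzy match against canonical names.
    for canonical in table:
        if _similar(name, canonical, threshold):
            return Match(canonical, "Fuzzy match by canonical name")
    # Step 3: fuzzy match against synonyms; return the synonym's canonical name.
    for canonical, synonyms in table.items():
        for synonym in synonyms:
            if _similar(name, synonym, threshold):
                return Match(canonical, "Fuzzy match by synonym")
    # Step 4: nothing matched, so return the origin name unchanged.
    return Match(name, "No match")
```

With this sample table, `normalize("Totbrick")` falls through to the synonym step and returns `TrickBot`, while a name that matches nothing comes back unchanged.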
As a package:

    import normalizer as norm

    result = norm.normalize_threat_actor_name("NOBELIUM")
    print(result.json())
    ...
    {
        "canonical_name": "UNC2452",
        "synonyms": null,
        "info": "Fuzzy match by synonym"
    }

    result = norm.normalize_malware_name("Totbrick", return_synonyms=True)
    print(result.json())
    ...
    {
        "canonical_name": "TrickBot",
        "synonyms":
        [
            "TSPY_TRICKLOAD",
            "TheTrick",
            "Totbrick",
            "TrickLoader",
            "Trickster"
        ],
        "info": "Fuzzy match by synonym"
    }

Also, have a look at the tests: they are the simplest explanation of how it works.