Temporal Rule Confidence Learning for Temporal Knowledge Graph Forecasting
pip install -r requirements.txt
python rule_based/main.py
Or, to specify a particular dataset:
python rule_based/main.py --params DATASET_NAME tkgl-icews14
options for datasets: tkgl-icews14, tkgl-yago, tkgl-wikiold, tkgl-smallpedia, tkgl-polecat, tkgl-icews18, tkgl-gdelt, tkgl-wikidata, tkgl-icews
This runs all steps:
- loading the dataset
- preparing the example set
- learning the rules and parameters
- applying the rules to test and valid set
- evaluation on test and valid set
The first time you run main.py for a dataset, you are asked in the terminal whether you want to download it. Answer yes to download the dataset and run the code.
All relevant artifacts (the examples E, here called learn_data, plus the rules and rankings) are stored in subfolders of the folder "files".
If you want to run only a subset of the above steps, set the following flags in rule_based/config-default.yaml:
CREATE_LEARN_DATA_FLAG: True # do we want to create the Example data for learning the params for decay functions? if True: yes, if False: load from file
LOAD_PARAMS_FLAG: False # do we want to load the params for decay functions from file? if True: yes (needs to have been precomputed), if False: learn the params
APPLY_RULES_FLAG: True # do we want to apply the rules? if True: yes, if False: no
EVAL_VALSET_FLAG: True # do we want to compute also val mrr and apply the rules on the validation set? if True: yes, if False: no
EVAL_TESTSET_FLAG: False # do we want to compute test mrr? if True: yes, if False: no
Be aware that the steps depend on each other: to apply the rules, they must already exist, so you first need to create the learn data, then learn the rules, and only then can you apply them.
See section "Configurations Guide" below, for an explanation on how to set configs
This project uses a flexible configuration system based on YAML files and command-line arguments.
By default, the system uses a file called config-default.yaml.
If you're a beginner user, you don't need to do anything: simply running the code will pick up all default settings automatically.
The default config contains:
- Dataset name
- Paths for saving results
- Which parts of the method to run (e.g. rule learning, evaluation)
- Number of CPUs
- Default values for hyperparameters
- Recommended values for specific datasets (which override the general defaults)
- Internal/debug settings (not recommended to change unless you're doing ablation studies)
If you want to customize parameters, there are multiple ways to do so.
You can directly edit config-default.yaml.
✅ Easy, but can make version control and reproducibility harder.
Best practice:
- Copy config-default.yaml to a new file (e.g. my-config.yaml).
- Modify only the parameters you want.
- Tell the program to use your custom config:
Example (command line):
python rule_based/main.py --config my-config.yaml
Example (in main.py):
parser.add_argument("--config", type=str, default="my-config.yaml", help="Path to the configuration file")
You can override any config value by passing --params at runtime.
Example: Overriding the dataset
python rule_based/main.py --params DATASET_NAME='tkgl-icews14'
Example: Overriding multiple params
python rule_based/main.py --params DATASET_NAME='tkgl-icews14' Z_RULES_FACTOR=0.2 LEARN_WINDOW_SIZE=100
This is useful for quick experiments, grid search, or scripting.
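To illustrate how KEY=VALUE overrides like the ones above could be turned into typed config values, here is a minimal sketch. It is not the parser used in main.py: the helper name parse_overrides, the nargs="*" declaration, and the literal-guessing via ast.literal_eval are all assumptions made for illustration.

```python
import argparse
import ast


def parse_overrides(pairs):
    """Turn ['KEY=VALUE', ...] into a dict, guessing Python literal types.

    Hypothetical helper, not taken from the repository.
    """
    overrides = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        try:
            # Numbers, booleans and quoted strings become Python values.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything else (e.g. tkgl-icews14) stays a plain string.
            value = raw
        overrides[key] = value
    return overrides


parser = argparse.ArgumentParser()
parser.add_argument("--config", type=str, default="config-default.yaml")
# Assumption: --params collects zero or more KEY=VALUE tokens.
parser.add_argument("--params", nargs="*", default=[])

# The shell strips the quotes around 'tkgl-icews14' before Python sees them.
args = parser.parse_args(
    ["--params", "DATASET_NAME=tkgl-icews14", "Z_RULES_FACTOR=0.2", "LEARN_WINDOW_SIZE=100"]
)
overrides = parse_overrides(args.params)
print(overrides)
```

Guessing types this way keeps numeric hyperparameters numeric without a per-parameter schema; a stricter setup would validate each key against the defaults in the config file.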
You can also pass parameter overrides directly from a Python script using the options_call argument in main().
from main import main
options_call = {
"DATASET_NAME": "tkgl-yago",
"Z_RULES_FACTOR": 0.22352,
"LEARN_PARAMS_OPTION": "static"
}
val_mrr = main(options_call=options_call)
This is ideal when you're using Python to orchestrate multiple experiments or doing hyperparameter sweeps.
You can keep your experiment management clean and reproducible in code without editing config files or writing long command-line calls.
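A hyperparameter sweep built on options_call could look roughly like the sketch below. The sweep helper and the fake_main stand-in are hypothetical and exist only so the example runs without the repository; in practice you would pass the real main from main.py, which, as in the example above, returns the validation MRR.

```python
from itertools import product


def sweep(run, grid):
    """Call run(options_call=...) for every combination in grid.

    Returns a list of (score, options_call) pairs, best score first.
    """
    keys = list(grid)
    results = []
    for values in product(*(grid[k] for k in keys)):
        options_call = dict(zip(keys, values))
        results.append((run(options_call=options_call), options_call))
    results.sort(key=lambda item: item[0], reverse=True)
    return results


def fake_main(options_call):
    # Stand-in for main(): returns a made-up score, not a real MRR.
    return 1.0 - abs(options_call["Z_RULES_FACTOR"] - 0.2)


grid = {
    "DATASET_NAME": ["tkgl-yago"],
    "Z_RULES_FACTOR": [0.1, 0.2, 0.3],
}
ranked = sweep(fake_main, grid)
best_score, best_options = ranked[0]
print(best_options)
```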
We recommend using only one of these override mechanisms at a time:
| Method | Best for |
|---|---|
| Config file | Reproducibility, sharing setups |
| --params | Quick overrides from terminal |
| options_call | Programmatic workflows in Python |
When multiple sources are used, the final configuration is resolved in this order (lowest to highest precedence):
1️⃣ Default parameters
- Defined at the top level of config-default.yaml.
- These are general-purpose starting values.
2️⃣ Dataset-specific overrides
- Located under DATASET_OVERRIDES in config-default.yaml.
- Activated automatically if DATASET_NAME is set (even via later overrides!).
- These overwrite the general defaults to match known good settings for specific datasets.
3️⃣ Command-line overrides (--params)
- Passed as terminal arguments.
- These overwrite both defaults and dataset-specific settings.
- Great for quickly testing changes without editing files.
4️⃣ Programmatic overrides (options_call)
- Passed directly to the main() function in code.
- Highest precedence.
- Ideal for scripts, hyperparameter sweeps, or notebooks.
Special note on DATASET_NAME and dataset-specific overrides
If you change DATASET_NAME via --params or options_call, the system automatically re-applies the corresponding dataset-specific overrides, but only for parameters you did not explicitly set in your overrides.
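The precedence order and the DATASET_NAME special case can be summarized in a few lines of Python. This is a sketch of the described behaviour, not the repository's implementation: the function name resolve_config is hypothetical, and the numeric values are placeholders rather than the recommended settings.

```python
def resolve_config(defaults, dataset_overrides, cli_params=None, options_call=None):
    """Merge the four sources from lowest to highest precedence."""
    cli_params = cli_params or {}
    options_call = options_call or {}

    # Explicit overrides: options_call wins over --params.
    explicit = {**cli_params, **options_call}

    # The dataset is chosen by the final DATASET_NAME, even if set by a later override.
    dataset_name = explicit.get("DATASET_NAME", defaults.get("DATASET_NAME"))

    config = dict(defaults)                                  # 1) defaults
    config.update(dataset_overrides.get(dataset_name, {}))   # 2) dataset-specific
    config.update(explicit)                                  # 3) + 4) explicit overrides
    return config


# Placeholder values for illustration only.
defaults = {"DATASET_NAME": "tkgl-icews14", "Z_RULES_FACTOR": 0.1, "LEARN_WINDOW_SIZE": 200}
dataset_overrides = {"tkgl-yago": {"Z_RULES_FACTOR": 0.5, "LEARN_WINDOW_SIZE": 50}}

config = resolve_config(
    defaults,
    dataset_overrides,
    cli_params={"DATASET_NAME": "tkgl-yago", "LEARN_WINDOW_SIZE": 100},
    options_call={"LEARN_WINDOW_SIZE": 10},
)
print(config)
```

Here Z_RULES_FACTOR comes from the tkgl-yago overrides because it was not set explicitly, while LEARN_WINDOW_SIZE comes from options_call, which outranks both --params and the dataset-specific value.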
- Results are stored in files/results/datasetname-results.csv
- This file contains all relevant config param values as well as hits and mrr values, and runtimes
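To compare runs, the results file can be read with the standard csv module. The sketch below assumes the file has a header row and a numeric column for the metric of interest; the column names and values in the sample (val_mrr and so on) are dummy placeholders, so check the header of your own results file for the actual names.

```python
import csv
import io


def best_run(csv_file, metric):
    """Return the row (as a dict) with the highest value in the given metric column."""
    rows = list(csv.DictReader(csv_file))
    if not rows:
        raise ValueError("results file contains no rows")
    return max(rows, key=lambda row: float(row[metric]))


# Dummy in-memory stand-in for files/results/datasetname-results.csv.
sample = io.StringIO(
    "DATASET_NAME,Z_RULES_FACTOR,val_mrr\n"
    "tkgl-yago,0.1,0.5\n"
    "tkgl-yago,0.2,0.75\n"
)
best = best_run(sample, "val_mrr")
print(best["Z_RULES_FACTOR"])
```

With a real file, replace the StringIO object with open(path, newline="").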
- used so far in existing tkg works, e.g. baseline paper: tkgl-icews14, tkgl-icews18, tkgl-gdelt, tkgl-yago, tkgl-wikiold
- from tgb 2.0: tkgl-icews, tkgl-polecat, tkgl-smallpedia, tkgl-wikidata
- folder tgb/datasets/tkgl-yago
- you can download the datasets by running the following:
from tgb.linkproppred.dataset import LinkPropPredDataset
name = "tkgl-yago"
dir_data = "datasets"  # assumption: TGB's default root folder; adjust to where your datasets live
dataset = LinkPropPredDataset(name=name, root=dir_data, preprocess=True)
- when running this code you will be asked whether the dataset should be downloaded
- entity2id.txt and rel2id.txt contain the mapping from ids to strings; for wiki datasets and gdelt datasets, I fetched the strings from the internet and for gdelt from the cameo database
- node_mapping.csv and rel_mapping.csv contain the original id, the id that is used in tgb internally, and the string.
- the split is done automatically in tgb. you can e.g. access it by
self.train_data = self.all_quads[self.dataset.train_mask]
self.val_data = self.all_quads[self.dataset.val_mask]
self.test_data = self.all_quads[self.dataset.test_mask]
- when loading the datasets in tgb, they automatically contain the inverse triples, i.e. for each triple, sub_id, rel_id, ob_id, the inverse triple ob_id, rel_id+num_rels, sub_id is present.
- in datasetname_edgelist.csv only the original quadruples are present, in the order timestamp,head,tail,relation_type, without inverse.
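The relation between the edgelist file and the quadruples loaded through tgb can be sketched as follows. The helpers read_edgelist and add_inverses are hypothetical and not part of tgb; the sketch assumes a header row and integer ids, and only mirrors the conventions stated above (column order timestamp,head,tail,relation_type and inverse relation id rel_id+num_rels).

```python
import csv
import io


def read_edgelist(csv_file, has_header=True):
    """Read rows ordered timestamp,head,tail,relation_type into (sub, rel, ob, ts) tuples."""
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # skip the header row
    quads = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        timestamp, head, tail, relation = row
        quads.append((int(head), int(relation), int(tail), int(timestamp)))
    return quads


def add_inverses(quads, num_rels):
    """Append (ob, rel + num_rels, sub, ts) for every (sub, rel, ob, ts)."""
    inverses = [(ob, rel + num_rels, sub, ts) for sub, rel, ob, ts in quads]
    return quads + inverses


# Tiny in-memory stand-in for datasetname_edgelist.csv.
sample = io.StringIO("timestamp,head,tail,relation_type\n0,3,7,1\n1,7,4,0\n")
quads = read_edgelist(sample)
all_quads = add_inverses(quads, num_rels=2)
print(all_quads)
```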
- Evaluation is done automatically when running main.py
- The evaluation is conducted using the TGB 2.0 framework (https://tgb.complexdatalab.com/). The relevant code is taken from https://github.com/shenyangHuang/TGB.
- The tgb code is included in the folder /tgb/
- We manually added the datasets tkgl-icews14, tkgl-icews18, tkgl-gdelt, tkgl-yago, tkgl-wikiold and used the same splits as suggested in the evaluation paper by Gastinger et al. (https://dl.acm.org/doi/10.1007/978-3-031-43418-1_32)
- You can also run the evaluation for a given rankings file by running rule_based/eval.py after specifying the path to the rankings file: path_rankings_val = "/files/rankings/filename.txt"
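For orientation, the reported mrr and hits values follow the standard definitions, which can be computed from a list of ranks as in the sketch below. This is not the TGB evaluator and does not read the rankings file (its format is not described here); the function name mrr_and_hits and the example ranks are made up for illustration.

```python
def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """Compute mean reciprocal rank and hits@k from 1-based ranks of the correct answers."""
    if not ranks:
        raise ValueError("need at least one rank")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    hits = {k: sum(r <= k for r in ranks) / n for k in ks}
    return mrr, hits


# Made-up ranks of the correct entity for four test queries.
ranks = [1, 2, 4, 10]
mrr, hits = mrr_and_hits(ranks)
print(mrr, hits)
```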