This mini-repo preprocesses and normalizes large graphs.
Large graphs can be challenging to preprocess. Several graph collections are available online; one of the best known is the SuiteSparse Matrix Collection (link). However, the raw input files usually do not guarantee that vertex indices are contiguous (some indices may never appear among the non-zero entries). While these 'ghost' indices may make sense in a sparse matrix context, on graphs they are meaningless: they only shift the indices of the following vertices without changing the graph topology. Moreover, especially in a GPU scenario where data contiguity is fundamental, these 'ghost' vertices can hurt performance.
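To make the ghost-index problem concrete, here is a minimal Python sketch of index compaction on a tiny edge list. It is an illustration only, not the code used in this repo: the function name `compact_vertex_ids` is hypothetical, and the example assumes 0-based ids held in memory rather than a real mtx file.

```python
def compact_vertex_ids(edges):
    """Relabel vertices to 0..n-1, preserving the relative order of the old ids.

    Hypothetical helper for illustration; not part of this repo.
    """
    # Collect every id that actually appears as an edge endpoint.
    used = sorted({v for edge in edges for v in edge})
    # Map each old id to its new contiguous id.
    old_to_new = {old: new for new, old in enumerate(used)}
    relabeled = [(old_to_new[u], old_to_new[v]) for u, v in edges]
    return relabeled, old_to_new


# Ids 3 and 4 never appear, so they are ghost vertices (0-based ids assumed).
edges = [(0, 1), (1, 2), (2, 5), (5, 6)]
relabeled, old_to_new = compact_vertex_ids(edges)
print(relabeled)   # [(0, 1), (1, 2), (2, 3), (3, 4)]
print(old_to_new)  # {0: 0, 1: 1, 2: 2, 5: 3, 6: 4}
```

The topology is unchanged (still a path of five vertices); only the labels of the vertices after the gap move down.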
This repo implements functions that normalize graphs by deleting ghost vertices and shifting the indices of all subsequent vertices. Since it is often requested, it can also delete parallel edges and self-loops and compute vertex degrees.
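The optional cleanup steps can be sketched in a few lines of Python. This is a hedged illustration, not the repo's implementation: the names `clean_edges` and `vertex_degrees` are hypothetical, and the sketch assumes an undirected graph, so (u, v) and (v, u) count as the same edge. For a directed graph the deduplication key would be the ordered pair instead.

```python
from collections import Counter


def clean_edges(edges):
    """Drop self-loops and parallel edges, treating edges as undirected (assumption)."""
    seen = set()
    cleaned = []
    for u, v in edges:
        if u == v:
            continue  # self-loop
        key = (min(u, v), max(u, v))  # (u, v) and (v, u) are the same edge
        if key in seen:
            continue  # parallel edge
        seen.add(key)
        cleaned.append(key)
    return cleaned


def vertex_degrees(edges, num_vertices):
    """Count how many edges touch each vertex id in 0..num_vertices-1."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return [counts[v] for v in range(num_vertices)]


edges = [(0, 1), (1, 0), (1, 1), (1, 2), (1, 2), (2, 3)]
cleaned = clean_edges(edges)
print(cleaned)                     # [(0, 1), (1, 2), (2, 3)]
print(vertex_degrees(cleaned, 4))  # [1, 2, 2, 1]
```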
You can compile the program by using make:

    make

Since the makefile is system dependent, some variables, such as CUDA_HOME and MPI_HOME, must be adapted to your environment.
The program accepts several command-line arguments to configure its behavior:

    mpirun -np <number_of_processes> ./graph_processor -f <input_file> [options]

- -o <output_file>: Specify the output file prefix.
- -O <output_path>: Specify the output path.
- -m <metadata_path>: Specify the metadata path.
- -f <input_file>: Specify the input file.
- -I <max_iterations>: Specify the maximum number of iterations.
- -M <max_memory>: Specify the maximum memory per task in MB.
For example:

    mpirun -np 4 ./graph_processor -f graph.txt -o output -O results/ -m metadata/ -I 10 -M 1024

The program generates three output files: *_degree.out, *_globalmap.out, and *_mpi.mtx.
- The main output is *_mpi.mtx, which contains the new mtx file in which the vertices have been relabeled and no ghost vertices remain.
- *_globalmap.out contains the global map used to generate *_mpi.mtx; each line includes two numbers representing the new and the old vertex ID.
- If the COMPUTE_DEGREE macro is defined, the *_degree.out file lists the degree of each vertex.
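The global map is what lets you translate results computed on the normalized graph back to the original vertex ids. Below is a minimal Python sketch of that lookup. It assumes each line holds the new id followed by the old id, separated by whitespace, with no header; the exact layout is an assumption, so check a real *_globalmap.out before relying on it. The function name `parse_global_map` and the sample text are hypothetical.

```python
def parse_global_map(text):
    """Return {new_id: old_id} from lines of the form '<new> <old>' (assumed layout)."""
    new_to_old = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        new_id, old_id = int(fields[0]), int(fields[1])
        new_to_old[new_id] = old_id
    return new_to_old


# Made-up sample content, standing in for a real *_globalmap.out file.
sample = "1 1\n2 2\n3 5\n4 6\n"
new_to_old = parse_global_map(sample)

# Translate an edge of the normalized graph back to the original ids.
edge_new = (3, 4)
edge_old = tuple(new_to_old[v] for v in edge_new)
print(edge_old)  # (5, 6)
```

For a real file, read its contents with `open(path).read()` and pass the string to `parse_global_map`.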