Feat: Add Triangle Counting, Threshold Analysis Script, and New Dataset Pipeline #3
Open
hetvi3012 wants to merge 5 commits into ZhengChenCS:main from
Hi @ZhengChenCS and team,
This pull request introduces several extensions and a bug fix, developed as part of a university project exploring the excellent CompressGraph paper and framework. The additions aim to make CompressGraph easier to evaluate and extend.
Summary of Contributions:
Parallel Triangle Counting Algorithm (CPU): Implemented a common graph analytics kernel, triangle counting, to run directly on the compressed graph representation using the Ligra-based CPU framework.
Compression Threshold Analysis Script: Added a shell script (analyze_threshold.sh) to automate the analysis of the trade-off between the rule filter threshold, compression ratio, and BFS execution time.
Pipeline for New Datasets: Created a shell script (run_full_pipeline.sh) to streamline the process of testing CompressGraph on new graph datasets provided in edgelist format, including conversion, compression, filtering, and running analytics. Also includes results from testing on collaboration (ca-GrQc) and road (roadNet-CA) networks.
Bug Fix: Addressed a compilation issue by adding the missing header to the deps/ligra submodule code (utils.h and ligra.h), ensuring compatibility with stricter compilers/newer standards.
Details of Extensions:
Parallel Triangle Counting (CPU)
Goal: Demonstrate the extensibility of the framework by adding a new algorithm.
Implementation: Follows a standard parallel approach: for each vertex u, iterate over its neighbors v > u, then over the neighbors w > v of each such v, and check whether the edge (u, w) exists. Neighbors are traversed on the compressed graph representation through the edgeMap function, and a boolean array indexed by vertex ID serves as a set for constant-time neighbor lookups. An atomic counter accumulates the triangle count safely across threads.
Result: Successfully counts triangles on the cnr-2000 sample graph.
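The loop structure described above can be sketched in C++ as follows. This is a minimal, serial sketch under stated assumptions, not the kernel in this PR: it runs on a plain, uncompressed CSR graph (the hypothetical CSRGraph type) instead of walking the compressed representation through edgeMap, it assumes an undirected graph without self-loops or duplicate edges, and it keeps the outer loop serial so it compiles without Ligra. Each iteration over u is independent apart from the shared counter, so that is the loop a parallel_for would distribute, with one marker array per worker.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical plain CSR graph used only for this sketch; the PR's kernel
// instead iterates neighbors of the CompressGraph representation via edgeMap.
struct CSRGraph {
    std::vector<uint32_t> offsets;    // size n + 1 (must contain at least one entry)
    std::vector<uint32_t> neighbors;  // undirected: each edge stored in both directions
    uint32_t num_vertices() const { return static_cast<uint32_t>(offsets.size() - 1); }
};

// Counts every triangle exactly once by enumerating it as u < v < w.
uint64_t count_triangles(const CSRGraph& g) {
    const uint32_t n = g.num_vertices();
    std::atomic<uint64_t> total{0};
    // Boolean marker array indexed by vertex ID, used as a set for O(1) lookups.
    // A parallel version would give each worker its own copy.
    std::vector<bool> marked(n, false);

    // Serial here; this is the loop a parallel_for would distribute.
    for (uint32_t u = 0; u < n; ++u) {
        // Mark the higher-numbered neighbors of u.
        for (uint32_t i = g.offsets[u]; i < g.offsets[u + 1]; ++i)
            if (g.neighbors[i] > u) marked[g.neighbors[i]] = true;

        uint64_t local = 0;
        for (uint32_t i = g.offsets[u]; i < g.offsets[u + 1]; ++i) {
            const uint32_t v = g.neighbors[i];
            if (v <= u) continue;
            for (uint32_t j = g.offsets[v]; j < g.offsets[v + 1]; ++j) {
                const uint32_t w = g.neighbors[j];
                // w > v and (u, w) exists, so (u, v, w) is a triangle.
                if (w > v && marked[w]) ++local;
            }
        }
        // One atomic update per vertex keeps contention low in a parallel run.
        total.fetch_add(local, std::memory_order_relaxed);

        // Reset the marks so the array can be reused for the next u.
        for (uint32_t i = g.offsets[u]; i < g.offsets[u + 1]; ++i)
            marked[g.neighbors[i]] = false;
    }
    return total.load();
}
```

For example, on the complete graph K4 the function returns 4, while a 4-cycle yields 0. Ordering the enumeration as u < v < w is what prevents each triangle from being counted six times.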
Compression Threshold Analysis Script (analyze_threshold.sh)
Goal: Experimentally verify and quantify the space-vs-time trade-off associated with the rule filter threshold, as discussed in Section 6.5 of the paper.
Implementation: The script loops through specified thresholds (default: 8, 16, 32, 64, 128). For each threshold, it:
Copies the baseline compressed files (csr_vlist.bin, csr_elist.bin, info.bin).
Runs the filter executable.
Measures the resulting size of csr_vlist.bin + csr_elist.bin.
Runs convert2ligra to generate the Ligra graph format.
Runs bfs_cpu -r 3 and extracts the timing for the last run.
Result: Outputs a CSV-formatted table suitable for plotting, showing how compression ratio and BFS time vary with the threshold.
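The per-threshold measurements above reduce to a few small computations. The C++17 sketch below illustrates them under explicit assumptions: the compression ratio is taken as original size divided by compressed size, and the BFS log is assumed to contain lines that begin with a fixed prefix followed by a time in seconds (the exact bfs_cpu output format is not given here). The helper names total_size, compression_ratio, last_timing, and csv_row are hypothetical and are not part of analyze_threshold.sh or CompressGraph.

```cpp
#include <cstdint>
#include <filesystem>
#include <iomanip>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Total on-disk size of the given files, e.g. csr_vlist.bin + csr_elist.bin.
// fs::file_size throws fs::filesystem_error if a file is missing.
std::uintmax_t total_size(const std::vector<fs::path>& files) {
    std::uintmax_t sum = 0;
    for (const auto& f : files) sum += fs::file_size(f);
    return sum;
}

// Assumed definition: original size / compressed size (higher is better).
double compression_ratio(std::uintmax_t original_bytes, std::uintmax_t compressed_bytes) {
    return static_cast<double>(original_bytes) / static_cast<double>(compressed_bytes);
}

// Returns the time on the LAST line starting with `prefix`, mirroring
// "extract the timing for the last run" of bfs_cpu -r 3.
// The prefix is a hypothetical log format; adjust it to the real output.
std::optional<double> last_timing(const std::string& log, const std::string& prefix) {
    std::istringstream in(log);
    std::string line;
    std::optional<double> last;
    while (std::getline(in, line)) {
        if (line.rfind(prefix, 0) == 0) {  // line starts with prefix
            try {
                last = std::stod(line.substr(prefix.size()));  // skips leading spaces
            } catch (const std::exception&) {
                // Not a number after the prefix; ignore this line.
            }
        }
    }
    return last;
}

// One CSV row: threshold,ratio,bfs_seconds
std::string csv_row(int threshold, double ratio, double seconds) {
    std::ostringstream out;
    out << threshold << ',' << std::fixed << std::setprecision(2) << ratio << ','
        << std::setprecision(4) << seconds;
    return out.str();
}
```

Because last_timing keeps overwriting its result, only the final matching line survives, which discards warm-up effects from the earlier rounds.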
Pipeline for New Datasets (run_full_pipeline.sh)
Goal: Evaluate CompressGraph's effectiveness on graph structures different from web graphs and provide a tool for easier testing on new datasets.
Implementation: The script takes a graph name as input (expecting <graph_name>.edgelist in dataset/). It automates the entire workflow: edgelist2csr -> compress -> filter -> (convert2ligra, save_degree, gene_rule_order) -> bfs_cpu -> pagerank_cpu. Results are stored in a dedicated directory (e.g., <graph_name>_results).
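The first stage of this workflow, edgelist2csr, turns a text edge list into CSR arrays. The following is a generic counting-sort sketch of that conversion, not the tool's actual implementation. It assumes whitespace-separated integer pairs, skips '#' comment lines such as the headers in SNAP downloads, builds a directed CSR without symmetrizing or deduplicating edges, and keeps the result in memory instead of writing the binary files the real pipeline produces. CSR and edgelist_to_csr are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct CSR {
    std::vector<uint64_t> offsets;  // size n + 1
    std::vector<uint32_t> targets;  // size m, grouped by source vertex
};

// Builds a directed CSR from "src dst" lines; vertex count = max ID + 1.
CSR edgelist_to_csr(std::istream& in) {
    std::vector<std::pair<uint32_t, uint32_t>> edges;
    uint32_t max_id = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;  // blank or comment line
        std::istringstream ls(line);
        uint32_t u, v;
        if (!(ls >> u >> v)) continue;  // skip malformed lines
        edges.emplace_back(u, v);
        max_id = std::max({max_id, u, v});
    }

    CSR g;
    const std::size_t n = edges.empty() ? 0 : static_cast<std::size_t>(max_id) + 1;
    g.offsets.assign(n + 1, 0);
    for (const auto& e : edges) ++g.offsets[e.first + 1];                   // out-degrees
    for (std::size_t i = 0; i < n; ++i) g.offsets[i + 1] += g.offsets[i];   // prefix sum

    // Scatter each edge into its source vertex's slot range.
    g.targets.resize(edges.size());
    std::vector<uint64_t> cursor(g.offsets.begin(), g.offsets.end() - 1);
    for (const auto& e : edges) g.targets[cursor[e.first]++] = e.second;
    return g;
}
```

For the input "0 1", "0 2", "2 1", this yields offsets {0, 2, 2, 3} and targets {1, 2, 1}: vertex 1 has no outgoing edges, so its range is empty.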
Results on ca-GrQc and roadNet-CA:
The compression ratio was much lower (1.09 for ca-GrQc and 0.97 for roadNet-CA) than on cnr-2000 (~2.37), which is consistent with the technique relying heavily on the structural redundancy common in web and social graphs.
The pagerank_cpu executable terminated with a segmentation fault on both new datasets, which points to a potential issue in how it handles these graph types.
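Assuming the compression ratio is defined as uncompressed size divided by compressed size (so that higher is better, as the comparison above implies), the reported ratios translate directly into relative compressed sizes:

$$
r = \frac{S_{\text{orig}}}{S_{\text{comp}}}
\;\Longrightarrow\;
\frac{S_{\text{comp}}}{S_{\text{orig}}} = \frac{1}{r}:\qquad
\frac{1}{2.37} \approx 0.42,\quad
\frac{1}{1.09} \approx 0.92,\quad
\frac{1}{0.97} \approx 1.03
$$

Under that definition, cnr-2000 shrinks to about 42% of its original size and ca-GrQc to about 92%, while roadNet-CA grows by roughly 3%.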
Bug Fix (deps/ligra/)
Added the missing #include directive to deps/ligra/ligra/utils.h and deps/ligra/ligra/ligra.h to resolve compilation errors caused by undefined fixed-width integer types such as uint32_t on some systems. The fix was committed within the submodule, and the main repository now points to the updated submodule commit.
We believe these additions provide valuable tools for analyzing CompressGraph's performance and demonstrate its application to new algorithms and datasets. We hope these contributions are useful to the project.
Thank you for developing CompressGraph and sharing it with the community!
Best regards, Hetvi Bagdai