
Feat: Add Triangle Counting, Threshold Analysis Script, and New Dataset Pipeline #3

Open
hetvi3012 wants to merge 5 commits into ZhengChenCS:main from hetvi3012:my-project-extension

@hetvi3012

Hi @ZhengChenCS and team,

This pull request introduces several extensions and a bug fix developed as part of a university project exploring the excellent CompressGraph paper and framework. These additions aim to make CompressGraph easier to extend, evaluate, and apply to new datasets.

Summary of Contributions:

Parallel Triangle Counting Algorithm (CPU): Implemented a common graph analytics kernel, triangle counting, to run directly on the compressed graph representation using the Ligra-based CPU framework.

Compression Threshold Analysis Script: Added a shell script (analyze_threshold.sh) to automate the analysis of the trade-off between the rule filter threshold, compression ratio, and BFS execution time.

Pipeline for New Datasets: Created a shell script (run_full_pipeline.sh) that streamlines testing CompressGraph on new graph datasets provided in edgelist format, covering conversion, compression, filtering, and running analytics. The PR also includes results from testing on a collaboration network (ca-GrQc) and a road network (roadNet-CA).

Bug Fix: Fixed a compilation error by adding a missing #include to the deps/ligra submodule code (utils.h and ligra.h), so the code builds with stricter compilers and newer language standards.

Details of Extensions:

  1. Triangle Counting (src/apps_ligra/triangle.cpp, script/cpu/triangle.sh)

Goal: Demonstrate the extensibility of the framework by adding a new algorithm.

Implementation: Follows a standard parallel approach. It iterates over each vertex u, then over each neighbor v > u, then over each neighbor w > v of v, and checks whether the edge (u, w) exists. Restricting the search to u < v < w means each triangle is counted exactly once. The code uses the edgeMap function to iterate over neighbors on the compressed graph representation, and a boolean array serves as a hash set for constant-time neighbor lookups. An atomic counter keeps the running total thread-safe.
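The u < v < w scheme described above can be illustrated with a minimal sketch. The sketch runs on a plain, uncompressed CSR graph rather than the compressed representation, so it does not use edgeMap or any CompressGraph or Ligra API. The `CSRGraph` struct and `count_triangles` function are hypothetical names chosen for illustration.

The sketch also makes several assumptions:
- The graph is simple and undirected (no duplicate edges), with each edge stored in both directions.
- Plain `std::thread` workers stand in for the framework's parallel loop.
- Each worker keeps its own boolean marker array and adds its local count to an atomic total.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical plain CSR graph (not the compressed format): the neighbors of
// vertex v are neighbors[offsets[v] .. offsets[v + 1]).
struct CSRGraph {
    std::vector<uint32_t> offsets;    // n + 1 entries
    std::vector<uint32_t> neighbors;  // each undirected edge stored in both directions
    uint32_t num_vertices() const { return static_cast<uint32_t>(offsets.size() - 1); }
};

// Counts each triangle once by only considering vertex triples u < v < w.
uint64_t count_triangles(const CSRGraph& g, unsigned num_threads) {
    assert(num_threads > 0);
    const uint32_t n = g.num_vertices();
    std::atomic<uint64_t> total{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Per-thread boolean array used as a hash set over N(u).
            std::vector<bool> is_neighbor(n, false);
            uint64_t local = 0;
            for (uint32_t u = t; u < n; u += num_threads) {
                // Mark the higher-numbered neighbors of u.
                for (uint32_t i = g.offsets[u]; i < g.offsets[u + 1]; ++i)
                    if (g.neighbors[i] > u) is_neighbor[g.neighbors[i]] = true;
                // For each neighbor v > u, look for neighbors w > v that are also adjacent to u.
                for (uint32_t i = g.offsets[u]; i < g.offsets[u + 1]; ++i) {
                    const uint32_t v = g.neighbors[i];
                    if (v <= u) continue;
                    for (uint32_t j = g.offsets[v]; j < g.offsets[v + 1]; ++j) {
                        const uint32_t w = g.neighbors[j];
                        if (w > v && is_neighbor[w]) ++local;  // edge (u, w) exists
                    }
                }
                // Clear only the entries touched for u, so cleanup costs O(deg(u)).
                for (uint32_t i = g.offsets[u]; i < g.offsets[u + 1]; ++i)
                    is_neighbor[g.neighbors[i]] = false;
            }
            total.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& th : workers) th.join();
    return total.load();
}
```

Example: the complete graph on four vertices (offsets `{0,3,6,9,12}`, each vertex adjacent to the other three) yields 4 triangles. Giving each thread its own marker array avoids contention on a shared lookup table. The atomic total is then touched only once per thread.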

Result: Successfully counts triangles on the cnr-2000 sample graph.

  2. Threshold Analysis (script/analyze_threshold.sh)

Goal: Experimentally verify and quantify the space-vs-time trade-off associated with the rule filter threshold, as discussed in Section 6.5 of the paper.

Implementation: The script loops through specified thresholds (default: 8, 16, 32, 64, 128). For each threshold, it:

Copies the baseline compressed files (csr_vlist.bin, csr_elist.bin, info.bin).

Runs the filter executable.

Measures the resulting size of csr_vlist.bin + csr_elist.bin.

Runs convert2ligra to generate the Ligra graph format.

Runs bfs_cpu -r 3 and extracts the timing for the last run.

Result: Outputs a CSV-formatted table suitable for plotting, showing how compression ratio and BFS time vary with the threshold.
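The size and ratio measurement steps in the loop above can be expressed as a small C++17 sketch. This illustrates the calculation only; it is not the logic inside analyze_threshold.sh. The helper names and the CSV column layout are hypothetical.

The sketch also assumes a definition for the compression ratio: the size of an uncompressed baseline divided by the combined size of csr_vlist.bin and csr_elist.bin in one threshold's output directory. How the baseline size is obtained is left to the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Compressed size for one threshold's output directory:
// size(csr_vlist.bin) + size(csr_elist.bin).
std::uintmax_t compressed_bytes(const fs::path& dir) {
    return fs::file_size(dir / "csr_vlist.bin") + fs::file_size(dir / "csr_elist.bin");
}

// Assumed definition: baseline (uncompressed) bytes / compressed bytes.
// Values above 1.0 mean the files shrank; values below 1.0 mean they grew.
double compression_ratio(std::uintmax_t baseline_bytes, const fs::path& dir) {
    const std::uintmax_t bytes = compressed_bytes(dir);
    assert(bytes > 0 && "compressed files must not be empty");
    return static_cast<double>(baseline_bytes) / static_cast<double>(bytes);
}

// Hypothetical CSV layout: threshold,compressed_bytes,ratio,bfs_seconds
void append_csv_row(const fs::path& csv, int threshold, std::uintmax_t bytes,
                    double ratio, double bfs_seconds) {
    std::ofstream out(csv, std::ios::app);
    out << threshold << ',' << bytes << ',' << ratio << ',' << bfs_seconds << '\n';
}
```

Keeping the ratio definition in a single helper makes explicit which baseline each row is compared against. That matters when reading ratios close to 1.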

  3. New Dataset Pipeline (script/run_full_pipeline.sh) & Testing

Goal: Evaluate CompressGraph's effectiveness on graph structures different from web graphs and provide a tool for easier testing on new datasets.

Implementation: The script takes a graph name as input (expecting <graph_name>.edgelist in dataset/). It automates the entire workflow: edgelist2csr -> compress -> filter -> (convert2ligra, save_degree, gene_rule_order) -> bfs_cpu -> pagerank_cpu. Results are stored in a dedicated directory (e.g., <graph_name>_results).
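The first stage of the pipeline, edgelist-to-CSR conversion, can be made concrete with a minimal sketch. It is not the repository's edgelist2csr tool, whose input handling and output files may differ. The `CSR` struct and `edgelist_to_csr` function are hypothetical.

The sketch assumes the following about the input:
- Each line holds one whitespace-separated `src dst` pair.
- Lines starting with `#` are comments and are skipped (a common convention in edge-list files).
- Vertex IDs are kept as given, duplicate edges are removed, and the input is treated as directed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical CSR output: neighbors of v are neighbors[offsets[v] .. offsets[v + 1]).
struct CSR {
    std::vector<uint64_t> offsets;    // n + 1 entries
    std::vector<uint32_t> neighbors;  // sorted within each vertex
};

CSR edgelist_to_csr(std::istream& in) {
    std::vector<std::pair<uint32_t, uint32_t>> edges;
    uint32_t max_id = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;  // skip blanks and comments
        std::istringstream ls(line);
        uint32_t s, d;
        if (!(ls >> s >> d)) continue;  // skip malformed lines
        edges.emplace_back(s, d);
        max_id = std::max({max_id, s, d});
    }
    // Sorting by (src, dst) groups edges by source and orders each neighbor list.
    std::sort(edges.begin(), edges.end());
    edges.erase(std::unique(edges.begin(), edges.end()), edges.end());

    const std::size_t n = edges.empty() ? 0 : static_cast<std::size_t>(max_id) + 1;
    CSR g;
    g.offsets.assign(n + 1, 0);
    for (const auto& e : edges) ++g.offsets[static_cast<std::size_t>(e.first) + 1];  // out-degrees
    for (std::size_t v = 0; v < n; ++v) g.offsets[v + 1] += g.offsets[v];            // prefix sum
    g.neighbors.reserve(edges.size());
    for (const auto& e : edges) g.neighbors.push_back(e.second);
    assert(g.offsets[n] == g.neighbors.size());  // every edge accounted for
    return g;
}
```

For an undirected graph, you would insert each edge in both directions before sorting. Sorting the pairs up front is what lets the neighbor array be filled in a single pass.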

Results on ca-GrQc and roadNet-CA:

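The compression ratio was much lower (1.09 for ca-GrQc and 0.97 for roadNet-CA) than for cnr-2000 (~2.37). A ratio below 1 means the compressed roadNet-CA representation is slightly larger than the original. These results are consistent with the technique relying heavily on the structural redundancy common in web and social graphs.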

The pagerank_cpu executable crashed with segmentation faults on both new datasets, which points to a possible bug when it is run on these graph types.

  4. Bug Fix (deps/ligra/)

Added the missing #include directive to deps/ligra/ligra/utils.h and deps/ligra/ligra/ligra.h. This resolves compilation errors about undefined fixed-width integer types such as uint32_t on some systems. The fix was committed inside the submodule, and the main repository now points to the updated submodule commit.

We believe these additions provide valuable tools for analyzing CompressGraph's performance and demonstrate its application to new algorithms and datasets. We hope these contributions are useful to the project.

Thank you for developing CompressGraph and sharing it with the community!

Best regards, Hetvi Bagdai
