This repository contains the source code for Rapidash, an efficient system for detecting violations of denial constraints.
To run this project, ensure that you have the following installed:
- Java 17 or above: The project is built and tested with Java 17. You can download the latest version of Java from the Oracle website or OpenJDK.
- Maven: Maven is used for dependency management and building the project. You can download Maven from the Maven website.
- Verify Java Installation: Ensure that Java 17 or above is installed by running the following command in your terminal:
java -version
The output should display a Java version of 17 or above.
- Verify Maven Installation: Ensure that Maven is installed by running the following command in your terminal:
mvn -version
The output should display the Maven version.
- Clone the Repository: Clone the project repository to your local machine:
git clone https://github.com/ZifanL/Rapidash.git
cd Rapidash
- Build the Project: Use Maven to build the project:
mvn clean install
- The input dataset should be a CSV file.
- Write the constraint to be verified in a file. Each line should be in the form
[column-A] [operator] [column-B]
which represents the predicate s.[column-A] [operator] t.[column-B], where [column-A] and [column-B] are column names (they can be the same or different), s and t are two different rows, and [operator] is one of ==, <>, >, >=, <, <=. For example, to verify NOT (s.Category == t.Category AND s.ID <= t.Amount), we write the following to dc.txt:
Category = Category
ID <= Amount
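To make the meaning of a constraint file concrete, here is a minimal brute-force sketch in Python. It is not Rapidash's algorithm (Rapidash uses range trees or k-d trees); it simply tests every ordered pair of distinct rows. The function names `parse_constraint` and `count_violations` and the toy rows are hypothetical. The sketch also assumes that `=` is accepted as a synonym for `==`, since the example file uses `=`, and that violations are counted as ordered pairs, which may differ from how Rapidash reports its count.

```python
import operator

# Operators listed in the README; "=" is treated as a synonym for "=="
# (an assumption, because the README's example file uses "=").
OPS = {
    "==": operator.eq, "=": operator.eq, "<>": operator.ne,
    ">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le,
}

def parse_constraint(text):
    """Parse lines of '[column-A] [operator] [column-B]' into (col_a, op, col_b) triples."""
    predicates = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        col_a, op, col_b = line.split()
        if op not in OPS:
            raise ValueError(f"unsupported operator: {op}")
        predicates.append((col_a, op, col_b))
    return predicates

def count_violations(rows, predicates, early_stop=False):
    """Count ordered pairs (s, t) of distinct rows that satisfy every predicate.

    Quadratic brute force, for illustration only.
    """
    count = 0
    for i, s in enumerate(rows):
        for j, t in enumerate(rows):
            if i == j:
                continue  # s and t must be two different rows
            if all(OPS[op](s[a], t[b]) for a, op, b in predicates):
                count += 1
                if early_stop:
                    return count  # stop at the first violation
    return count

# Hypothetical toy rows, already parsed into integers.
rows = [
    {"Category": 1, "ID": 1, "Amount": 50},
    {"Category": 1, "ID": 2, "Amount": 1},
    {"Category": 2, "ID": 3, "Amount": 70},
]
dc = parse_constraint("Category = Category\nID <= Amount")
print(count_violations(rows, dc))                   # 2
print(count_violations(rows, dc, early_stop=True))  # 1
```

The first two rows share a category, and in both orders the ID of one is at most the Amount of the other, so the pair violates the constraint twice when counted as ordered pairs.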
java -cp target/rapidash-1.0-SNAPSHOT-jar-with-dependencies.jar org.dc.Main --dataset [path-to-csv-file] --constraint [path-to-constraint-file] --earlystop [earlystop] --treetype [treetype]
- [path-to-csv-file] is the path to the input CSV file.
- [path-to-constraint-file] is the path to the file that contains the denial constraint.
- [earlystop] is either true or false. If it is set to true, the system stops when the first violation is found. If it is set to false, the system outputs the count of the violations. The default value is true.
- [treetype] is either range-tree or kd-tree, which specifies which data structure to use. Refer to the paper for the comparison between the two. range-tree is used by default.
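When scripting several runs, it can help to assemble the command programmatically. The sketch below is one possible wrapper, not part of Rapidash: `build_command` is a hypothetical helper that only uses the flags and default values described above and assumes the jar sits at the path shown in the command.

```python
import shlex

# Path of the jar as it appears in the README's commands.
JAR = "target/rapidash-1.0-SNAPSHOT-jar-with-dependencies.jar"

def build_command(dataset, constraint, earlystop=True, treetype="range-tree"):
    """Assemble the documented Rapidash command line as an argument list."""
    if treetype not in ("range-tree", "kd-tree"):
        raise ValueError(f"treetype must be 'range-tree' or 'kd-tree', got {treetype!r}")
    return [
        "java", "-cp", JAR, "org.dc.Main",
        "--dataset", dataset,
        "--constraint", constraint,
        "--earlystop", "true" if earlystop else "false",
        "--treetype", treetype,
    ]

# Count all violations on the toy dataset using a k-d tree.
cmd = build_command("data/toy.csv", "data/dc.txt", earlystop=False, treetype="kd-tree")
print(shlex.join(cmd))
# To actually run it (requires the built jar): subprocess.run(cmd, check=True)
```

Returning a list rather than a single string avoids shell quoting problems when a path contains spaces.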
To run Rapidash on a toy dataset:
java -cp target/rapidash-1.0-SNAPSHOT-jar-with-dependencies.jar org.dc.Main --dataset data/toy.csv --constraint data/dc.txt
Here are the steps to reproduce the experimental results in the paper:
Download the data and uncompress it. Note that the values in the datasets are encoded as integers, and the order is preserved for numerical values.
java -cp target/rapidash-1.0-SNAPSHOT-jar-with-dependencies.jar org.dc.Main --experiment [experiment-name]
where [experiment-name] should be one of "tax", "tpch", and "ncvoter".
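The note about integer-encoded values matters because inequality predicates such as <= only give the same answers on encoded data if the encoding preserves order. How the experimental datasets were actually encoded is not described here. One common approach is rank-based dictionary encoding, sketched below under that assumption, with a hypothetical helper name `encode_order_preserving`.

```python
def encode_order_preserving(values):
    """Map each distinct value to its rank among the sorted distinct values."""
    codes = {v: rank for rank, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes

# Hypothetical numerical column.
amounts = [19.5, 3.0, 250.0, 3.0]
encoded, codes = encode_order_preserving(amounts)
print(encoded)  # [1, 0, 2, 0]

# Every comparison between original values agrees with the comparison
# between their integer codes, so <, <=, >, >=, ==, <> are all preserved.
for a in amounts:
    for b in amounts:
        assert (a < b) == (codes[a] < codes[b])
        assert (a == b) == (codes[a] == codes[b])
```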
Please cite our paper if you find this repo helpful in your work:
@article{liu2024rapidash,
title={Rapidash: Efficient Detection of Constraint Violations},
author={Liu, Zifan and Deep, Shaleen and Fariha, Anna and Psallidas, Fotis and Tiwari, Ashish and Floratou, Avrilia},
journal={Proceedings of the VLDB Endowment},
volume={17},
number={8},
pages={2009--2021},
year={2024},
publisher={VLDB Endowment}
}