Add Script to tests/ Which Runs SemBench #261

@mdr223

Overview

It would be great to add a script to our tests/ directory which runs our latest implementation(s) of PZ on the SemBench queries (with and without Abacus).

The benchmark consists of six "scenarios", and queries for each scenario are implemented under src/scenario/. For a given scenario, each system that was evaluated in the benchmark paper has its own queries implemented under src/scenario/<scenario_name>/runner/<system_name>_runner/<system_name>_runner.py. For example, the palimpzest queries for the Movie scenario can be found here: https://github.com/SemBench/SemBench/blob/main/src/scenario/movie/runner/palimpzest_runner/palimpzest_runner.py
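As a rough illustration of the layout described above, a helper along these lines could build the runner path for a given scenario and system. This is only a sketch: `runner_path` and the `sembench_root` argument are hypothetical names, and it assumes a local checkout of the SemBench repository.

```python
from pathlib import Path


def runner_path(sembench_root, scenario, system):
    """Return the expected location of a system's runner for one scenario.

    Follows the pattern from the issue text:
    src/scenario/<scenario_name>/runner/<system_name>_runner/<system_name>_runner.py
    `sembench_root` is assumed to be a local checkout of SemBench.
    """
    return (
        Path(sembench_root)
        / "src"
        / "scenario"
        / scenario
        / "runner"
        / f"{system}_runner"
        / f"{system}_runner.py"
    )


if __name__ == "__main__":
    # Example: the palimpzest runner for the movie scenario.
    print(runner_path("SemBench", "movie", "palimpzest").as_posix())
```

The helper only constructs the path; it does not check that the file exists, so a script using it would likely want to call `.is_file()` before importing or executing the runner.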

Most scenarios require some data generation / setup, which can be found at src/scenario/<scenario_name>/preparation/generate_data.py. Executing these setup scripts generally requires:

  1. Having a kaggle token which allows for automatically downloading kaggle datasets
  2. Specifying a scale factor for the dataset size

The Kaggle token (1) can be created for free, and the default scale factors (2) are listed in the arXiv paper.
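Before running any preparation script, the benchmark script could check up front that Kaggle credentials are present. The sketch below assumes the setup scripts rely on the official Kaggle API client, which reads credentials either from the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables or from a `kaggle.json` file (by default under `~/.kaggle/`, or under `KAGGLE_CONFIG_DIR` if set). The function name is hypothetical, and whether SemBench actually uses that client is an assumption to verify against its README.

```python
import os
from pathlib import Path


def kaggle_credentials_available(env=None):
    """Return True if Kaggle API credentials appear to be configured.

    Assumption: the SemBench preparation scripts use the official Kaggle
    client, which looks for env vars or a kaggle.json config file.
    `env` defaults to os.environ; passing a dict makes this easy to test.
    """
    env = os.environ if env is None else env

    # Option 1: credentials supplied through environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True

    # Option 2: credentials stored in a kaggle.json file.
    config_dir = env.get("KAGGLE_CONFIG_DIR")
    if config_dir:
        config_path = Path(config_dir) / "kaggle.json"
    else:
        config_path = Path.home() / ".kaggle" / "kaggle.json"
    return config_path.is_file()


if __name__ == "__main__":
    if not kaggle_credentials_available():
        raise SystemExit("Kaggle credentials not found; cannot download datasets.")
```

Failing early like this avoids discovering a missing token partway through a long data generation run.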

As a starting point for writing a script to execute the queries, you should read through the Benchmark's readme: https://github.com/SemBench/SemBench/blob/main/README.md

Acceptance Criteria

This script should be invoked manually from the root of the repository (e.g., python tests/sembench.py). I do not want to prefix the script name with test_ because, for the moment, I want to run it separately from our unit tests (pytest automatically collects any file under tests/ whose name starts with test_).

A final goal would be to add a step to our .github/workflows/ci.yaml which executes the script when a pull request to main is created. If the script detects significant regression(s) relative to our current performance on the benchmark, it should raise an exception.
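One possible shape for the regression check is sketched below. Everything here is an assumption made for illustration: the per-query scores are taken to be "higher is better" quality metrics, the baseline is a plain dict that would in practice be loaded from a stored file, and the names `find_regressions`, `assert_no_regressions`, `BenchmarkRegressionError`, and the 5% default tolerance are hypothetical.

```python
class BenchmarkRegressionError(Exception):
    """Raised when current benchmark results fall below the stored baseline."""


def find_regressions(baseline, current, tolerance=0.05):
    """Return the sorted names of queries whose score regressed.

    A query counts as regressed if it is missing from `current` or if its
    score is more than `tolerance` (relative) below the baseline score.
    Assumes higher scores are better.
    """
    regressed = []
    for query, base_score in baseline.items():
        score = current.get(query)
        if score is None or score < base_score * (1 - tolerance):
            regressed.append(query)
    return sorted(regressed)


def assert_no_regressions(baseline, current, tolerance=0.05):
    """Raise BenchmarkRegressionError if any query regressed."""
    regressed = find_regressions(baseline, current, tolerance)
    if regressed:
        raise BenchmarkRegressionError(
            f"Regression on {len(regressed)} query(ies): {', '.join(regressed)}"
        )


if __name__ == "__main__":
    # Made-up numbers purely to show the call pattern.
    baseline = {"movie_q1": 0.80, "movie_q2": 0.60}
    current = {"movie_q1": 0.79, "movie_q2": 0.61}
    assert_no_regressions(baseline, current)
    print("No significant regressions detected.")
```

Because an uncaught exception makes the Python process exit with a non-zero status, a CI step that runs the script would fail automatically when a regression is detected. Cost and latency metrics, where lower is better, would need the comparison reversed.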
