diff --git a/.gitignore b/.gitignore index a59ec565..7ec2f1fb 100644 --- a/.gitignore +++ b/.gitignore @@ -6,7 +6,7 @@ cis565_getting_started_generated_kernel* *.vcxproj *.xcodeproj build - +build/ # Created by https://www.gitignore.io/api/linux,osx,sublimetext,windows,jetbrains,vim,emacs,cmake,c++,cuda,visualstudio,webstorm,eclipse,xcode ### Linux ### @@ -25,7 +25,8 @@ build .LSOverride # Icon must end with two \r -Icon +Icon + # Thumbnails ._* diff --git a/README.md b/README.md index 0e38ddb1..5883b398 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,285 @@ -CUDA Stream Compaction -====================== +# CUDA Stream Compaction +* Hi! I am Vismay Churiwala, here are my socials: + * [LinkedIn](https://www.linkedin.com/in/vismay-churiwala-8b0073190/) | [Website](https://vismaychuriwala.com/) +* **System Specs (My Machine):** + * OS: Windows 11 + * CPU: AMD Ryzen 7 5800H with Radeon Graphics (8C/16T, 3.2GHz base) + * RAM: 32GB DDR4 + * GPU: NVIDIA GeForce RTX 3060 Laptop GPU (6GB GDDR6) + * CUDA Toolkit: 13.0 + * Driver Version: 581.15 -**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 2** +--- +This is a CUDA-powered parallel implementation of stream compaction. Stream compaction reduces arrays by retaining only non-zero values. This seems to be a fairly straightforward problem that can be coded sequentially in a couple of minutes, but as we will see, this is extremely slow on the CPU for large arrays and using a GPU to parallelize this process can be orders of magnitude faster. -* (TODO) YOUR NAME HERE - * (TODO) [LinkedIn](), [personal website](), [twitter](), etc. -* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab) +Here are some cool features I have implemented: +* Using **Shared Memory (SM)** for efficient memory access on the GPU (as opposed to using global memory, which is more than 100× slower ([see more here](https://www.ce.jhu.edu/dalrymple/classes/602/Class13.pdf))). 
+* **Hardware optimization** via [bank conflicts](https://forums.developer.nvidia.com/t/how-to-understand-the-bank-conflict-of-shared-mem/260900) prevention. +* Recursive scanning to scan **arrays of arbitrary sizes** (tested up to 1B elements (2^30), which took 3470.59 ms). +* **Radix Sort** using Parallel Scan +* Naive and CPU-based implementations to compare and benchmark techniques. +* Customized testing code to collect, average, and plot GPU and CPU timings. -### (TODO: Your README) +## CUDA Scan + Stream Compaction +Scan is a prefix-sum. I used the work‑efficient Blelloch scan (upsweep/downsweep) on the GPU so the total work stays O(n) and the depth is O(log n). Each block scans its chunk in shared memory. I used strided indexing to dodge bank conflicts, and I kept the loads/stores coalesced. For large inputs, I split the data into chunks and assign them to blocks. The blocks write their partial sums. I scan those recursively (so partial sums that don’t fit in a single block are scanned across multiple blocks), and then add the offsets back (uniform add) so it scales to arbitrary sizes and NPOTs. -Include analysis, etc. (Remember, this is public, so don't put -anything here that you don't want to share with the world.) +Stream compaction rides on top of scan. First I build a bools array (0/1 flag per element, i.e., keep if value != 0). Then I exclusive‑scan the flags to get output indices, and scatter only the keepers. That makes the compaction stable and very GPU‑friendly. I include a naive version, a work‑efficient version, and a CPU baseline so you can sanity‑check correctness and see the speedups on both power‑of‑two and non‑power‑of‑two (NPOT) sizes. +## Radix Sort using CUDA +I implement an LSD integer radix sort on the GPU built on the same work‑efficient Blelloch scans used above. For each bit (0→31), I split elements into 0/1 buckets via bit tests, exclusive‑scan the flags to get stable write indices, and scatter into ping‑pong buffers. 
Blocks use shared memory with coalesced global accesses; NPOT inputs are handled by padding internal scan buffers while keeping counts based on the original length. The algorithm is stable, scales to large arrays, and currently targets non‑negative 32‑bit integers. A CPU `std::sort` baseline is included for correctness and performance comparisons. + +![Radix Sort](img/radix_sort_nvidia.png) + +[Source - NVIDIA](https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda) +## Performance Analysis +I benchmarked each method across a range of sizes (both power‑of‑two and NPOT), averaged over 10 runs. The work‑efficient GPU scan/compaction wins big, especially as input grows; NPOT handling adds minor overhead but stays close. + +Comparing scan and stream compaction for sizes ranging from 260K to 260M elements: + +### Time vs Log(Size) +![Timings (Both)](plots/timings_plot_both.png) + +Here we can see how the timings for the naive GPU method and the CPU method increase much faster than the work‑efficient versions. This is true for both power‑of‑two and non‑power‑of‑two sizes (recall that we pad NPOT sizes to the next power of two). The NPOT cases have nearly the same timings as the power‑of‑two cases, with slight overhead. Below we highlight the differences more clearly by plotting the log‑log over an extended range of sizes. Note that the CPU implementation is not the serial version of the naive scan. I have tried to implement the fastest serial scan and compaction I can to have a good baseline. + +### Log(Time) vs Log(Size) + We plot a log-log plot to get a better idea of the differences below. This includes the whole range of values starting at 256 elements and going all the way to 256M values. Notice how the CPU version is faster until arrays are large enough to overcome the parallelization overhead. Also note how hardware optimizations make the work‑efficient scan much faster than the naive version. 
If I were to use global memory, I suspect my scan and compaction would have been a lot slower (closer to the naive scan). +![Timings (POW2, log‑log)](plots/timings_plot_both_loglog_full.png) + +You can find the raw images and variants in `plots/` if you want the linear‑scale or separate power‑of‑two and NPOT versions. + +### Optimizing Block Size +I ran the work‑efficient compaction for multiple block sizes to get the best block size for my implementation. Here are the timings for 33M elements (2^25), averaged over 10 runs. +![Timings (block sizes)](plots/blocksize_avg.png) +A block size of 128 seems to be a good choice for `Efficient::compact`. + +### Radix Sort + +I implemented radix sort using `Efficient::recursiveScan` to sort positive integer arrays of arbitrary lengths. I used `std::sort` for the CPU version to serve as a baseline. + +![Timings (Radix Sort)](plots/radix_timings_log_linear_gt_2pow18.png) + +Notice that just like our Efficient Scan implementation, the GPU Radix sort is much faster than the CPU version for sufficiently large sizes. + +![Timings (Radix Sort Full)](plots/radix_timings_loglog_full.png) + +The timings for both powers of two and NPOT are very similar. + +## Implementation + +### CPU Scan and Compact +The CPU scan is a serialized scan that runs over the array and adds the previous element to the next elements serially. I made this implementation as simple as possible and haven't tried to replicate Naive scan serially since I wanted a good baseline against the GPU. +The CPU compact has two variants that compact with and without using scan. + +### Naive Scan and Compact +The naive scan follows the implementation below: +![](img/figure-39-2.jpg) + +This is an inclusive scan. I wrote a kernel `Naive::make_exclusive` that shifts the inclusive scan to the right and sets the first element to zero. I used ping-ponging buffers to calculate the scan across steps. 
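The naive scan described above can be modelled on the CPU in a few lines. This is only a sequential sketch of the idea, not the kernels from this repository: the function names `naive_inclusive_scan` and `make_exclusive_list` are hypothetical, and the inner `for` loop stands in for what would be one GPU thread per element.

```python
def naive_inclusive_scan(values):
    """Sequential model of the naive (Hillis-Steele) scan with ping-pong buffers."""
    n = len(values)
    src = list(values)  # buffer read during the current step
    dst = [0] * n       # buffer written during the current step
    offset = 1
    while offset < n:
        for i in range(n):  # on the GPU, each i would be handled by one thread
            if i >= offset:
                dst[i] = src[i] + src[i - offset]
            else:
                dst[i] = src[i]  # elements below the offset are already final
        src, dst = dst, src  # swap buffer roles instead of copying
        offset *= 2
    return src


def make_exclusive_list(inclusive):
    """Shift an inclusive scan right by one and put the identity (0) in front."""
    return [0] + inclusive[:-1] if inclusive else []


if __name__ == "__main__":
    data = [3, 1, 7, 0, 4, 1, 6, 3]
    print(make_exclusive_list(naive_inclusive_scan(data)))
```

Each of the roughly log2(n) steps touches every element, which is where the O(n log n) total work of the naive approach comes from.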
+ +`Naive::compact` creates a bool array using `Common::kernMapToBoolean` and uses `Naive::scan` to get an array of indices. The output is then filled in by using the bool array and the array of indices in `Common::kernScatter`. +We use global memory throughout this process, and there is a lot of wasted work, which makes the naive implementation very slow. + +### Efficient Scan + +`Efficient::recursiveScan` divides the array into chunks and each block uses shared memory. The blocks are scanned using `Efficient::multiscan` (which follows the implementation in [39.2](https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda)). The sum of all elements in a chunk (the last element of the individual scan) is stored inside `blockSums`, which is then scanned again recursively using `Efficient::recursiveScan`. The output is then shifted using the `blockSums` array and this results in a complete scan of arbitrary sized arrays in parallel. + +#### `Efficient::multiscan` + +`Efficient::multiscan` is based on the pre-scan implemented in [39.2.2](https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda)(`Efficient::recursiveScan`). My implementation extends this to accommodate **arbitrary-sized** arrays. `multiscan` takes in arbitrary sized arrays in global memory and uses **shared memory** within each single block. It computes the scan for each chunk and stores the result to the global output. It also stores the last element from each chunk to `blockSums`. I have used strided indexing to **prevent bank conflicts** as suggested in [39.2.3](https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda). 
+ +However, I noticed that the macros provided in 39.2.3 were incorrect, as the number of banks has increased to 32 in recent GPU architectures and the macro for CONFLICT_FREE_OFFSET(n) had been incorrectly calculated (see [blog](https://forums.developer.nvidia.com/t/conflict-free-offset-macro-why-and-how-does-it-work/343324)). + +I replaced them with the correct macros: + +**Correct Macros**: +``` +#define NUM_BANKS 32 +#define LOG_NUM_BANKS 5 +#define CONFLICT_FREE_OFFSET(n) ((n) >> (LOG_NUM_BANKS)) +``` +Incorrect Macros: +``` +// #define NUM_BANKS 16 +// #define LOG_NUM_BANKS 4 +// #define CONFLICT_FREE_OFFSET(n)((n) >> NUM_BANKS + (n) >> (2 * LOG_NUM_BANKS)) +``` + +#### `Efficient::recursiveScan` + +My implementation of `Efficient::multiscan` was working well until I started to scan very large arrays (~2M, i.e. 2^21 elements). This is when I realized that the array of accumulated sums `blockSums` was getting too large to be scanned using a single chunk. I considered using `CPU::scan` or a naive scan to fix this issue, but I knew that the performance hit would be large, especially from moving data between devices or allocation/reallocation. So I decided to use recursion to fix this issue. + +`recursiveScan` accepts arbitrary-sized arrays, calls `Efficient::multiscan` to compute the scan of individual chunks, collects `blockSums`, and recursively calls `Efficient::recursiveScan` to compute the scan of `blockSums`. After getting this scan, the output array (containing the scan of all elements) is shifted by the elements in the scanned `blockSums` using `Efficient::uniformAdd`. This gets around the issue of computing the scan of `blockSums` for large arrays that don't fit inside a single chunk. 
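The control flow of the recursion described above can be sketched sequentially. This is a CPU model under stated assumptions, not the CUDA code from this repository: `block_exclusive_scan` is a plain loop standing in for the shared-memory per-block kernel, `recursive_scan_model` and `chunk_size` are hypothetical names, and the final list comprehension plays the role of the uniform add.

```python
def block_exclusive_scan(chunk):
    """Exclusive scan of one chunk; returns (scanned chunk, chunk total)."""
    out, running = [], 0
    for v in chunk:
        out.append(running)
        running += v
    return out, running


def recursive_scan_model(values, chunk_size=4):
    """Exclusive scan of an arbitrary-length list built from per-chunk scans."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2 so the recursion shrinks")
    if len(values) <= chunk_size:
        return block_exclusive_scan(values)[0]  # fits in a single chunk

    scanned, block_sums = [], []
    for start in range(0, len(values), chunk_size):
        out, total = block_exclusive_scan(values[start:start + chunk_size])
        scanned.append(out)
        block_sums.append(total)  # one partial sum per chunk

    # The partial sums may themselves span several chunks, hence the recursion.
    offsets = recursive_scan_model(block_sums, chunk_size)

    # "Uniform add": every element of a chunk is shifted by that chunk's offset.
    return [x + off for out, off in zip(scanned, offsets) for x in out]


if __name__ == "__main__":
    print(recursive_scan_model(list(range(1, 11)), chunk_size=4))
```

With 37 inputs and a chunk size of 4, the partial sums need two further levels (10 sums, then 3), which mirrors the case where `blockSums` no longer fits in one block.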
+ + +### Thrust Scan +`Thrust::scan` is very simple; the code is as follows: +``` +// copy host -> device +thrust::device_vector<int> d_in(idata, idata + n); +thrust::device_vector<int> d_out(n); + +timer().startGpuTimer(); +// exclusive scan on device +thrust::exclusive_scan(d_in.begin(), d_in.end(), d_out.begin()); +cudaDeviceSynchronize(); // ensure timing is correct +timer().endGpuTimer(); + +// copy device -> host +thrust::copy(d_out.begin(), d_out.end(), odata); +``` +Even though Thrust is CUDA's own implementation of exclusive scan, we can see that our work-efficient scan seems to perform better. This likely comes down to how we record the timings for the scan. There are a few possibilities I think might be causing this: + +* The memory allocation (`cudaMalloc`) for temporary arrays is probably happening after `thrust::exclusive_scan` is called, and is being counted in the GPU timing. This is not the case when we have finer control over where to insert the timer as in our other implementations. + +* Although we pre-copy and allocate the data using `thrust::device_vector` for the thrust scan, the actual allocation might be happening asynchronously inside the timed region. + +* The GPU might not be warmed up and that can have some overhead. Recall how in all other methods we allocate and copy and do operations on arrays before starting the timer. + +I profiled `thrust::exclusive_scan` in Nsight Compute and got the following profile: +![](img/thrust_profile.png) +The operation is memory-limited, which makes sense for a scan. The compute throughput is low for all kernels, which means that the threads are waiting for memory access rather than compute. + +`static_kernel` has especially low compute - this might be happening because the static kernel is allocating temporary arrays while the Scan kernel performs the actual scan, but this is just a guess without looking at the code. 
+ +### Radix Sort +I implemented a stable LSD radix sort that uses scan to partition the array by one bit at a time (least‑significant to most‑significant). Each pass is a stable split into 0‑bucket then 1‑bucket, so after 32 passes (for `int`) the array is fully sorted. + +- Per‑bit flags: For bit `k`, I build `b1[i] = ((unsigned)idata[i] >> k) & 1` with `radix_to_bools`, and `b0[i] = 1 - b1[i]` via `negate_bools_into`. +- Indices via scan: I exclusive‑scan `b1` and `b0` with my work‑efficient `Efficient::recursiveScan` to get write indices `idxOnes` and `idxZeros`. +- Scatter (stable): I compute `totalOnes` from the last prefix element plus the last flag, then `totalZeros = m - totalOnes` (where `m` is the original length, not padded). `assign_indexes` writes zeros to `odata[idxZeros[i]]` and ones to `odata[idxOnes[i] + totalZeros]`, preserving original order within each bucket. +- Ping‑pong buffers: I alternate `dev_bufA`/`dev_bufB` between passes so each pass reads from one and writes to the other. + +NPOT handling: I pad internal scan buffers up to the next power of two (`n = 1 << ilog2ceil(m)`) but keep all counts and scatters based on `m`. I also zero‑fill the tails of `b0`/`b1` when `n > m` so padding never contributes to totals. That keeps behavior identical for both power‑of‑two and NPOT sizes. + +Timing notes: Allocation and the initial H2D copy are outside the GPU timer; the timer wraps only the 32 per‑bit passes. Scans reuse the same recursive work‑efficient routine used in stream compaction, so accesses stay coalesced and use shared memory within blocks. + +Limitations/assumptions: Bit tests cast inputs to `unsigned` for correctness of shifts; my tests generate non‑negative data. If you need signed ascending order with negatives, you can post‑process or adjust the final pass to handle the sign bit specially. + +## Testing and Plotting + +The timings shown in this report are the exact times used by the CPU and GPU in calculating the scan. 
This does not include the time spent allocating memory for the scan function, copying memory between host and device, etc. There are some nuances in the `Efficient::scan` (`recursiveScan`) and `Thrust::scan` timings (discussed in the Implementation section above), where intermediate arrays are allocated within the timed scan, so these are counted. But all other arrays like bools, indices etc. are preallocated and copied, and the output is copied to host after stopping the timers. + +I rewrote some of the code in `main.cpp` to get an output format that is easier to parse and plot. I then redirected `std::cout` to a file, `timings.txt` (see `plots/data`), read that file with Python, averaged the timings over multiple runs, and plotted them (see `plots/code`). I did the same for optimizing block sizes. + +Original Output: +``` +**************** +** SCAN TESTS ** +**************** + [ 18 36 37 45 18 24 5 36 41 3 46 20 41 ... 3 0 ] +==== cpu scan, power-of-two ==== + elapsed time: 15.6374ms (std::chrono Measured) + [ 0 18 54 91 136 154 178 183 219 260 263 309 329 ... 821845107 821845110 ] +==== cpu scan, non-power-of-two ==== + elapsed time: 16.7006ms (std::chrono Measured) + [ 0 18 54 91 136 154 178 183 219 260 263 309 329 ... 821845015 821845057 ] +passed +==== naive scan, power-of-two ==== + elapsed time: 25.4879ms (CUDA Measured) +passed +==== naive scan, non-power-of-two ==== + elapsed time: 25.2602ms (CUDA Measured) +passed +==== work-efficient scan, power-of-two ==== + elapsed time: 3.4472ms (CUDA Measured) +passed +==== work-efficient scan, non-power-of-two ==== + elapsed time: 2.82419ms (CUDA Measured) +passed +==== thrust scan, power-of-two ==== + elapsed time: 1.5953ms (CUDA Measured) +passed +==== thrust scan, non-power-of-two ==== + elapsed time: 1.8985ms (CUDA Measured) +passed + +***************************** +** STREAM COMPACTION TESTS ** +***************************** + [ 1 3 1 2 1 3 2 1 1 1 0 2 1 ... 
2 0 ] +==== cpu compact without scan, power-of-two ==== + elapsed time: 55.6823ms (std::chrono Measured) + [ 1 3 1 2 1 3 2 1 1 1 2 1 2 ... 3 2 ] +passed +==== cpu compact without scan, non-power-of-two ==== + elapsed time: 57.0865ms (std::chrono Measured) + [ 1 3 1 2 1 3 2 1 1 1 2 1 2 ... 2 3 ] +passed +==== cpu compact with scan ==== + elapsed time: 100.3ms (std::chrono Measured) + [ 1 3 1 2 1 3 2 1 1 1 2 1 2 ... 3 2 ] +passed +==== work-efficient compact, power-of-two ==== + elapsed time: 5.62285ms (CUDA Measured) + [ 1 3 1 2 1 3 2 1 1 1 2 1 2 ... 3 2 ] +passed +==== work-efficient compact, non-power-of-two ==== + elapsed time: 5.44666ms (CUDA Measured) + [ 1 3 1 2 1 3 2 1 1 1 2 1 2 ... 2 3 ] +passed +Press any key to continue . . . +``` +Testing Output: +``` +** SCAN TESTS ** +SIZE= 33554432 +17.3174 +16.2627 +25.3695 +25.0887 +3.09696 +2.87846 +1.57798 +1.72134 + +58.9666 +61.3826 +103.195 +5.72349 +5.43437 +``` + +Radix Sort: +``` +********************** +** RADIX SORT TESTS ** +********************** + [ 18284 4031 17105 7014 14969 10298 28093 25039 9467 30141 2902 28383 25462 ... 19308 0 ] +==== cpu sort, power-of-two ==== + elapsed time: 1518.95ms (std::chrono Measured) + [ 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 32767 32767 ] +==== cpu sort, non-power-of-two ==== + elapsed time: 1535.09ms (std::chrono Measured) + [ 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 32767 32767 ] +==== Radix Sort, power-of-two ==== + elapsed time: 310.548ms (CUDA Measured) + [ 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 32767 32767 ] +passed +==== Radix Sort, non-power-of-two ==== + elapsed time: 5.69754ms (CUDA Measured) + [ 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 32767 32767 ] +passed +``` + +Testing Output: +``` +SIZE= 33554432 +1516.88 +1507.03 +287.379 +303.782 +``` +I tried a for loop in C++ to get multiple such runs, but the code was running asynchronously and I was getting incorrect timings. I tried using `cudaDeviceSynchronize()` and/or `cudaDeviceReset()` between calls, but these didn't solve my timings issues. 
+ +Since the number of runs wasn't very large, I ended up running each input multiple times in Visual Studio and appending to `timings.txt` using `iostream`. + +#### Notes + +- Likely Performance Bottlenecks: + - CPU - Serialization, fundamentally what we're trying to solve. + - Naive - Global memory reads and writes, unused threads in blocks. + - Thrust - Likely appears slow due to the difference in how time is measured. + - Work-Efficient - Quite fast - other optimizations include ping-ponging buffers in `recursiveScan` to reduce allocation. +- Memory allocation takes time - scan is quite memory-intensive, curious about ways to reduce memory footprint. Also curious to see how my scan compares to Thrust when comparing wall-clock time. +- Would be interesting to quantify the speedup from reducing bank conflicts compared to traditional indexing, and likewise the speedup from using shared memory. +- Radix sort on a single block seems straightforward, merging multiple blocks seems challenging. diff --git a/img/radix_sort_nvidia.png b/img/radix_sort_nvidia.png new file mode 100644 index 00000000..8ebe1c99 Binary files /dev/null and b/img/radix_sort_nvidia.png differ diff --git a/img/thrust_profile.png b/img/thrust_profile.png new file mode 100644 index 00000000..ce49ca85 Binary files /dev/null and b/img/thrust_profile.png differ diff --git a/plots/blocksize_avg.png b/plots/blocksize_avg.png new file mode 100644 index 00000000..2e5bcaa4 Binary files /dev/null and b/plots/blocksize_avg.png differ diff --git a/plots/code/plot_blocksize_avg.py b/plots/code/plot_blocksize_avg.py new file mode 100644 index 00000000..d99fc906 --- /dev/null +++ b/plots/code/plot_blocksize_avg.py @@ -0,0 +1,99 @@ +#!/usr/bin/env python3 +import argparse +from collections import defaultdict +import math +import sys + +import matplotlib +matplotlib.use("Agg") # Ensure non-interactive backend for headless runs +import matplotlib.pyplot as plt + + +def parse_timings(path: str) -> dict[int, list[float]]: + data: dict[int, list[float]] = 
defaultdict(list) + try: + with open(path, "r", encoding="utf-8") as f: + lines = [ln.strip() for ln in f if ln.strip()] + except FileNotFoundError: + print(f"Input file not found: {path}", file=sys.stderr) + sys.exit(1) + + i = 0 + while i < len(lines): + line = lines[i] + if line.startswith("blockSize"): + # Expect format like: "blockSize= 32" + try: + _, rhs = line.split("=", 1) + block = int(rhs.strip()) + except Exception: + i += 1 + continue + # Next non-empty line should be the timing + if i + 1 < len(lines): + try: + t = float(lines[i + 1]) + data[block].append(t) + except Exception: + pass + i += 2 + continue + i += 1 + + if not data: + print("No timings parsed. Please check input format.", file=sys.stderr) + sys.exit(1) + return data + + +def compute_averages(data: dict[int, list[float]]): + blocks = sorted(data.keys()) + avgs = [sum(data[b]) / len(data[b]) for b in blocks] + return blocks, avgs + + +def plot(blocks, avgs, output_path: str | None): + fig, ax = plt.subplots(figsize=(7, 4)) + ax.plot(blocks, avgs, marker="o", linestyle="-", color="#1f77b4") + ax.set_xlabel("Block Size") + ax.set_ylabel("Average Time (ms)") + ax.set_title("Block Size vs Average Timing(lower is better)") + # Use logarithmic scale for block size (base 2 preferred) + try: + ax.set_xscale("log", base=2) + except TypeError: + # Older Matplotlib versions use 'basex' + ax.set_xscale("log", basex=2) + ax.grid(True, which="both", linestyle=":", linewidth=0.6, alpha=0.7) + ax.set_xticks(blocks) + ax.get_xaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter()) + ax.get_xaxis().set_minor_formatter(matplotlib.ticker.NullFormatter()) + plt.tight_layout() + if output_path: + plt.savefig(output_path, dpi=200) + else: + # Default filename + plt.savefig("blocksize_avg.png", dpi=200) + + +def main(): + parser = argparse.ArgumentParser(description="Plot block size vs average timings (log-scaled x-axis).") + parser.add_argument("--input", "-i", default="../data/blockSize_timings.txt", 
help="Path to timings input file.") + parser.add_argument("--output", "-o", default="../blocksize_avg.png", help="Output PNG path for the plot.") + args = parser.parse_args() + + data = parse_timings(args.input) + blocks, avgs = compute_averages(data) + + # Print summary to stdout + print("Averages (blockSize -> avg time):") + for b, a in zip(blocks, avgs): + print(f"{b} -> {a:.5f}") + + plot(blocks, avgs, args.output) + print(f"Saved plot to: {args.output}") + + +if __name__ == "__main__": + main() + diff --git a/plots/code/plotting.py b/plots/code/plotting.py new file mode 100644 index 00000000..d2f25280 --- /dev/null +++ b/plots/code/plotting.py @@ -0,0 +1,519 @@ +import re +from pathlib import Path +from typing import List, Dict, Any, Tuple + +import pandas as pd +import matplotlib.pyplot as plt +from matplotlib.ticker import FuncFormatter, LogLocator, FixedLocator, LogFormatterMathtext + +# Configuration (resolve paths relative to this file for robustness) +BASE_DIR = Path(__file__).resolve().parent # plots/code +PLOTS_DIR = BASE_DIR.parent # plots +DATA_DIR = PLOTS_DIR / "data" # plots/data + +INPUT_PATH = DATA_DIR / "timings.txt" +OUTPUT_ALL_RUNS_CSV = DATA_DIR / "timings_all_runs_long.csv" +OUTPUT_AVG_CSV = DATA_DIR / "timings_avg.csv" +OUTPUT_PLOT = DATA_DIR / "timings_plot.png" +OUTPUT_PLOT_POW2 = PLOTS_DIR / "timings_plot_pow2.png" +OUTPUT_PLOT_NONPOW2 = PLOTS_DIR / "timings_plot_nonpow2.png" +OUTPUT_PLOT_BOTH = PLOTS_DIR / "timings_plot_both.png" +OUTPUT_PLOT_BOTH_LOG = PLOTS_DIR / "timings_plot_both_loglog.png" +OUTPUT_PLOT_POW2_LOG = PLOTS_DIR / "timings_plot_pow2_loglog.png" +OUTPUT_PLOT_NONPOW2_LOG = PLOTS_DIR / "timings_plot_nonpow2_loglog.png" +OUTPUT_PLOT_BOTH_LOG_FULL = PLOTS_DIR / "timings_plot_both_loglog_full.png" + +# If your numbers are in seconds (very likely), set this to True to convert +# to milliseconds for plotting convenience. 
+CONVERT_TO_MS = True +Y_LABEL = "Time (ms)" if CONVERT_TO_MS else "Time (s)" +MIN_PLOT_SIZE = 2 ** 18 # start plots from 2^18 + +# Limit how many runs per size to include in outputs/plots +MAX_RUNS_PER_SIZE = 10 + +# Expected order in the file (exactly 13 numbers per block): +SCAN_METHODS = [ + "scan_cpu_pow2", + "scan_cpu_non_pow2", + "scan_naive_pow2", + "scan_naive_non_pow2", + "scan_work_efficient_pow2", + "scan_work_efficient_non_pow2", + "scan_thrust_pow2", + "scan_thrust_non_pow2", +] +COMPACT_METHODS = [ + "compact_cpu_without_scan_pow2", + "compact_cpu_without_scan_non_pow2", + "compact_cpu_with_scan", + "compact_work_efficient_pow2", + "compact_work_efficient_non_pow2", +] +N_SCAN = len(SCAN_METHODS) +N_COMPACT = len(COMPACT_METHODS) +N_PER_RUN = N_SCAN + N_COMPACT + +# Plot style for better readability +try: + plt.style.use("seaborn-whitegrid") +except Exception: + pass + + +def _is_float_token(tok: str) -> bool: + return bool( + re.match( + r"^[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?$", + tok.strip(), + ) + ) + + +def parse_timings( + path: Path, + max_runs_per_size: int = MAX_RUNS_PER_SIZE, +) -> Tuple[pd.DataFrame, Dict[int, int], Dict[int, int]]: + """ + Parse timings in the simplified format present in timings.txt: + Optional header lines (e.g., "** SCAN TESTS **") + SIZE= + <8 scan times> (possibly one per line; blanks ignored) + <5 compaction times> (possibly one per line; blanks ignored) + ... repeated blocks for more runs and/or sizes + + Notes: + - There are no explicit RUN lines. We infer run numbers per size + by counting consecutive blocks encountered for that size. 
+ + Returns: + - DataFrame (long format) with columns: size, run, suite, method, time_s + - encountered_counts: dict[size] -> total blocks encountered for that size + - used_counts: dict[size] -> blocks included (capped at max_runs_per_size) + """ + lines = [ln.strip() for ln in path.read_text().splitlines()] + + records: List[Dict[str, Any]] = [] + i = 0 + # Track how many blocks (runs) we've seen per size + encountered_counts: Dict[int, int] = {} + used_counts: Dict[int, int] = {} + + while i < len(lines): + line = lines[i] + if not line or line.startswith("**"): + i += 1 + continue + + # Parse a SIZE block + if line.upper().startswith("SIZE"): + m = re.search(r"SIZE\s*=\s*(\d+)", line, flags=re.IGNORECASE) + if not m: + raise ValueError(f"Bad SIZE line: {line}") + size = int(m.group(1)) + + # Gather the next N_PER_RUN float values (ignore blanks and headers) + times: List[float] = [] + i += 1 + while i < len(lines) and len(times) < N_PER_RUN: + ln = lines[i].strip() + if ln and not ln.upper().startswith("SIZE") and not ln.startswith("**"): + for tok in ln.replace(",", " ").split(): + if _is_float_token(tok): + times.append(float(tok)) + if len(times) >= N_PER_RUN: + break + i += 1 + + if len(times) != N_PER_RUN: + raise ValueError( + f"Expected {N_PER_RUN} times for size {size}, got {len(times)}" + ) + + # Count this block for the size + encountered_counts[size] = encountered_counts.get(size, 0) + 1 + + # Include only the first `max_runs_per_size` blocks per size + if encountered_counts[size] <= max_runs_per_size: + used_counts[size] = used_counts.get(size, 0) + 1 + run_no = used_counts[size] + + # Emit records for scan + for idx, t in enumerate(times[:N_SCAN]): + records.append( + { + "size": size, + "run": run_no, + "suite": "scan", + "method": SCAN_METHODS[idx], + "time_s": t, + } + ) + + # Emit records for compaction + for jdx, t in enumerate(times[N_SCAN:]): + records.append( + { + "size": size, + "run": run_no, + "suite": "compact", + "method": 
COMPACT_METHODS[jdx], + "time_s": t, + } + ) + continue + + # Skip any other lines + i += 1 + + df = pd.DataFrame.from_records(records) + return df, encountered_counts, used_counts + + +def average_across_runs(df: pd.DataFrame) -> pd.DataFrame: + avg = ( + df.groupby(["size", "suite", "method"], as_index=False)["time_s"] + .mean() + ) + return avg + + +def _is_pow2(x: int) -> bool: + return x > 0 and (x & (x - 1)) == 0 + + +def _format_size_ticks(sizes): + # Kept for compatibility; not used when using log10 axis with automatic ticks + return [f"{s:,}" for s in sizes] + + +def plot_by_suite( + avg_df: pd.DataFrame, + out_path: Path, + to_ms: bool = True, + runs_cap: int | None = None, + log_x: bool = True, + log_y: bool = False, + method_subset: str = "all", # one of: 'all', 'pow2', 'nonpow2' + min_size: int | None = MIN_PLOT_SIZE, +) -> None: + # Helper: map internal method keys to plain-English legend labels + def label_for_method(method: str, suite: str, subset: str) -> str: + # Determine base label by method family + base = method + if suite == "scan": + if method.startswith("scan_cpu"): + base = "CPU" + elif method.startswith("scan_naive"): + base = "Naive GPU" + elif method.startswith("scan_work_efficient"): + base = "Work-efficient GPU" + elif method.startswith("scan_thrust"): + base = "Thrust" + # Append operation for clarity + base = f"{base} scan" + else: # compact + if method == "compact_cpu_with_scan": + base = "CPU (with scan)" + elif method.startswith("compact_cpu_without_scan"): + base = "CPU (no scan)" + elif method.startswith("compact_work_efficient"): + base = "Work-efficient GPU" + else: + base = method.replace("_", " ") + base = f"{base}" + + # Determine suffix only when plotting both pow2 and nonpow2 + if subset == "all": + # Important: check the non-power-of-two suffix first because + # strings like "non_pow2" also end with "_pow2". 
+ if method.endswith("_non_pow2"): + return f"{base} (non-power-of-two)" + if method.endswith("_pow2"): + return f"{base} (power-of-two)" + # In pow2-only or nonpow2-only plots, suffix is redundant + return base + + # Prepare values + plot_df = avg_df.copy() + if to_ms: + plot_df["time_val"] = plot_df["time_s"] * 1000.0 + else: + plot_df["time_val"] = plot_df["time_s"] + + # Optional filter on minimum size for plotting (strictly greater than) + if min_size is not None: + plot_df = plot_df[plot_df["size"] > min_size] + + # Use all sizes; split by method family (pow2/nonpow2) + sizes = sorted(plot_df["size"].unique()) + + # Keep method ordering consistent with the lists above + def select_methods(all_methods: List[str]) -> List[str]: + if method_subset == "pow2": + sel = [m for m in all_methods if m.endswith("_pow2")] + # Include methods without explicit suffix (apply to both) + sel += [m for m in all_methods if ("_pow2" not in m and "_non_pow2" not in m)] + return [m for m in all_methods if m in sel] + elif method_subset == "nonpow2": + sel = [m for m in all_methods if m.endswith("_non_pow2")] + sel += [m for m in all_methods if ("_pow2" not in m and "_non_pow2" not in m)] + return [m for m in all_methods if m in sel] + else: + return all_methods + + scan_methods = select_methods(SCAN_METHODS) + compact_methods = select_methods(COMPACT_METHODS) + + fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(13, 5.5), dpi=180) + plt.tight_layout(pad=2.0) + + # Scan subplot + ax = axes[0] + scan_df = plot_df[plot_df["suite"] == "scan"] + for m in scan_methods: + y = [] + for s in sizes: + row = scan_df[(scan_df["size"] == s) & (scan_df["method"] == m)] + y.append(row["time_val"].iloc[0] if not row.empty else float("nan")) + ax.plot( + sizes, + y, + marker="o", + linewidth=2.0, + markersize=4, + label=label_for_method(m, "scan", method_subset), + ) + ax.set_title("Scan") + ax.set_xlabel("Array Size") + ax.set_ylabel(Y_LABEL + (" (log)" if log_y else "")) + if log_x: + try: + 
ax.set_xscale("log", base=10) + except TypeError: + ax.set_xscale("log") + if log_y: + try: + ax.set_yscale("log", base=10) + except TypeError: + ax.set_yscale("log") + # Prefer 10** major ticks for x; don't label every point + if log_x: + ax.xaxis.set_major_locator(LogLocator(base=10.0)) + ax.xaxis.set_major_formatter(LogFormatterMathtext(base=10.0)) + # Thousands separators for y values (linear) + if not log_y: + ax.yaxis.set_major_formatter(FuncFormatter(lambda v, _: f"{v:,.0f}")) + if log_y: + ax.yaxis.set_major_locator(LogLocator(base=10.0)) + ax.yaxis.set_major_formatter(LogFormatterMathtext(base=10.0)) + if min_size is not None and len(sizes) > 0: + ax.set_xlim(left=min_size) + ax.legend(fontsize=8, ncol=2, loc="upper left", bbox_to_anchor=(0, 1.02)) + # ax.grid(True, which="both", linestyle="--", alpha=0.4) + + # Compaction subplot + ax = axes[1] + comp_df = plot_df[plot_df["suite"] == "compact"] + for m in compact_methods: + y = [] + for s in sizes: + row = comp_df[(comp_df["size"] == s) & (comp_df["method"] == m)] + y.append(row["time_val"].iloc[0] if not row.empty else float("nan")) + ax.plot( + sizes, + y, + marker="o", + linewidth=2.0, + markersize=4, + label=label_for_method(m, "compact", method_subset), + ) + ax.set_title("Stream compaction") + ax.set_xlabel("Array Size") + ax.set_ylabel(Y_LABEL + (" (log)" if log_y else "")) + if log_x: + try: + ax.set_xscale("log", base=10) + except TypeError: + ax.set_xscale("log") + if log_y: + try: + ax.set_yscale("log", base=10) + except TypeError: + ax.set_yscale("log") + if log_x: + ax.xaxis.set_major_locator(LogLocator(base=10.0)) + ax.xaxis.set_major_formatter(LogFormatterMathtext(base=10.0)) + if not log_y: + ax.yaxis.set_major_formatter(FuncFormatter(lambda v, _: f"{v:,.0f}")) + if log_y: + ax.yaxis.set_major_locator(LogLocator(base=10.0)) + ax.yaxis.set_major_formatter(LogFormatterMathtext(base=10.0)) + if min_size is not None and len(sizes) > 0: + ax.set_xlim(left=min_size) + ax.legend(fontsize=8, 
ncol=2, loc="upper left", bbox_to_anchor=(0, 1.02)) + # ax.grid(True, which="both", linestyle="--", alpha=0.4) + + # Build a clear, accurate title segment about the size subset. + # If a minimum size filter excludes smaller inputs, reflect that using + # compact binary units (e.g., "256k" instead of 2^18). + overall_min_size = int(avg_df["size"].min()) if not avg_df.empty else None + def _fmt_min(s: int) -> str: + if s is None: + return "" + # Prefer binary multiples: k, M, G + for unit, factor in (("k", 1024), ("M", 1024**2), ("G", 1024**3)): + if s % factor == 0 and s >= factor: + val = s // factor + # Keep it simple (e.g., 256k, 1M, 4G) + return f"{val}{unit}" + return f"{s:,}" + + filtered = ( + min_size is not None + and overall_min_size is not None + and min_size > overall_min_size + ) + base_subset = { + "all": "All sizes" if not filtered else "Sizes", + "pow2": "Power-of-two sizes", + "nonpow2": "Non-power-of-two sizes", + }.get(method_subset, "All sizes" if not filtered else "Sizes") + subset_note = ( + f"{base_subset} > {_fmt_min(min_size)}" if filtered else base_subset + ) + y_unit = "ms" if to_ms else "s" + if log_x and log_y: + scale_note = " [log-x, log-y]" + elif log_x: + scale_note = " [log-x]" + elif log_y: + scale_note = " [log-y]" + else: + scale_note = "" + # English title with clarity about metric + # Example: "Average time (lower is better) — Power-of-two sizes [log-x]" + fig.suptitle( + f"Average time (lower is better) — {subset_note}{scale_note}\n" + f"{runs_cap} runs", + fontsize=12, + ) + fig.tight_layout() + fig.savefig(out_path, bbox_inches="tight") + plt.close(fig) + + +def main() -> None: + if not INPUT_PATH.exists(): + raise SystemExit( + f"Input file not found: {INPUT_PATH.resolve()}" + ) + + df, encountered_counts, used_counts = parse_timings(INPUT_PATH) + # Save all runs (long format) + df_out = df.copy() + df_out["time_ms"] = df_out["time_s"] * 1000.0 + df_out.to_csv(OUTPUT_ALL_RUNS_CSV, index=False) + + # Average across runs + 
avg = average_across_runs(df) + avg_out = avg.copy() + avg_out["time_ms"] = avg_out["time_s"] * 1000.0 + avg_out.to_csv(OUTPUT_AVG_CSV, index=False) + + # Plot + # Plots: both, pow2-only, nonpow2-only + plot_by_suite( + avg, + OUTPUT_PLOT_BOTH, + to_ms=CONVERT_TO_MS, + runs_cap=MAX_RUNS_PER_SIZE, + log_x=True, + log_y=False, + method_subset="all", + min_size=MIN_PLOT_SIZE, + ) + plot_by_suite( + avg, + OUTPUT_PLOT_POW2, + to_ms=CONVERT_TO_MS, + runs_cap=MAX_RUNS_PER_SIZE, + log_x=True, + log_y=False, + method_subset="pow2", + min_size=MIN_PLOT_SIZE, + ) + plot_by_suite( + avg, + OUTPUT_PLOT_NONPOW2, + to_ms=CONVERT_TO_MS, + runs_cap=MAX_RUNS_PER_SIZE, + log_x=True, + log_y=False, + method_subset="nonpow2", + min_size=MIN_PLOT_SIZE, + ) + + # Log-Log versions (separate files) + plot_by_suite( + avg, + OUTPUT_PLOT_BOTH_LOG, + to_ms=CONVERT_TO_MS, + runs_cap=MAX_RUNS_PER_SIZE, + log_x=True, + log_y=True, + method_subset="all", + min_size=MIN_PLOT_SIZE, + ) + plot_by_suite( + avg, + OUTPUT_PLOT_POW2_LOG, + to_ms=CONVERT_TO_MS, + runs_cap=MAX_RUNS_PER_SIZE, + log_x=True, + log_y=True, + method_subset="pow2", + min_size=MIN_PLOT_SIZE, + ) + plot_by_suite( + avg, + OUTPUT_PLOT_NONPOW2_LOG, + to_ms=CONVERT_TO_MS, + runs_cap=MAX_RUNS_PER_SIZE, + log_x=True, + log_y=True, + method_subset="nonpow2", + min_size=MIN_PLOT_SIZE, + ) + + # Log-Log with full range (no min size) for both + plot_by_suite( + avg, + OUTPUT_PLOT_BOTH_LOG_FULL, + to_ms=CONVERT_TO_MS, + runs_cap=MAX_RUNS_PER_SIZE, + log_x=True, + log_y=True, + method_subset="all", + min_size=None, + ) + + print(f"Wrote: {OUTPUT_ALL_RUNS_CSV}") + print(f"Wrote: {OUTPUT_AVG_CSV}") + print(f"Wrote: {OUTPUT_PLOT_BOTH}") + print(f"Wrote: {OUTPUT_PLOT_POW2}") + print(f"Wrote: {OUTPUT_PLOT_NONPOW2}") + print(f"Wrote: {OUTPUT_PLOT_BOTH_LOG}") + print(f"Wrote: {OUTPUT_PLOT_POW2_LOG}") + print(f"Wrote: {OUTPUT_PLOT_NONPOW2_LOG}") + print(f"Wrote: {OUTPUT_PLOT_BOTH_LOG_FULL}") + # Print encountered vs used runs per size + 
print("Runs encountered per size:") + for size in sorted(encountered_counts.keys()): + enc = encountered_counts[size] + used = used_counts.get(size, 0) + print(f" size={size}: encountered={enc}, used={used}") + + +if __name__ == "__main__": + main() diff --git a/plots/code/radix_plotting.py b/plots/code/radix_plotting.py new file mode 100644 index 00000000..53b9e8a2 --- /dev/null +++ b/plots/code/radix_plotting.py @@ -0,0 +1,257 @@ +import re +from pathlib import Path +from typing import List, Dict, Any, Tuple + +import pandas as pd +import matplotlib.pyplot as plt +from matplotlib.ticker import FuncFormatter, LogLocator, LogFormatterMathtext + +# Paths +BASE_DIR = Path(__file__).resolve().parent # plots/code +PLOTS_DIR = BASE_DIR.parent # plots +DATA_DIR = PLOTS_DIR / "data" # plots/data + +INPUT_PATH = DATA_DIR / "RadixSort_timings.txt" +OUTPUT_ALL_RUNS_CSV = DATA_DIR / "radix_timings_all_runs_long.csv" +OUTPUT_AVG_CSV = DATA_DIR / "radix_timings_avg.csv" +OUTPUT_PLOT_LOGLOG_FULL = PLOTS_DIR / "radix_timings_loglog_full.png" +OUTPUT_PLOT_LOGX_LINEAR_GT = PLOTS_DIR / "radix_timings_log_linear_gt_2pow18.png" + +# Data in RadixSort_timings.txt is in milliseconds already +Y_LABEL = "Time (ms)" +MIN_PLOT_SIZE = 2 ** 18 # for the log-x, linear-y plot + +# Limit how many runs per size to include in outputs/plots +MAX_RUNS_PER_SIZE = 10 + +# Methods in order per SIZE block +METHODS = [ + "cpu_pow2", + "cpu_non_pow2", + "radixsort_pow2", + "radixsort_non_pow2", +] +N_PER_RUN = len(METHODS) + +# Style +try: + plt.style.use("seaborn-whitegrid") +except Exception: + pass + + +def _is_float_token(tok: str) -> bool: + return bool( + re.match(r"^[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?$", tok.strip()) + ) + + +def parse_radix_timings( + path: Path, + max_runs_per_size: int = MAX_RUNS_PER_SIZE, +) -> Tuple[pd.DataFrame, Dict[int, int], Dict[int, int]]: + """ + Parse RadixSort_timings.txt blocks of the form: + SIZE= N + cpu_pow2 time (ms) + cpu_non_pow2 time (ms) + radixsort_pow2 time (ms) + radixsort_non_pow2 time (ms) + + Returns: + - DataFrame with columns: size, run, 
method, time_ms + - encountered_counts: dict[size] -> total blocks seen + - used_counts: dict[size] -> blocks included (capped) + """ + lines = [ln.strip() for ln in path.read_text().splitlines()] + + records: List[Dict[str, Any]] = [] + i = 0 + encountered_counts: Dict[int, int] = {} + used_counts: Dict[int, int] = {} + + while i < len(lines): + line = lines[i] + if not line or line.startswith("**"): + i += 1 + continue + + if line.upper().startswith("SIZE"): + m = re.search(r"SIZE\s*=\s*(\d+)", line, flags=re.IGNORECASE) + if not m: + raise ValueError(f"Bad SIZE line: {line}") + size = int(m.group(1)) + + # Collect next N_PER_RUN float values + vals: List[float] = [] + i += 1 + while i < len(lines) and len(vals) < N_PER_RUN: + ln = lines[i].strip() + if ln and not ln.upper().startswith("SIZE") and not ln.startswith("**"): + for tok in ln.replace(",", " ").split(): + if _is_float_token(tok): + vals.append(float(tok)) + if len(vals) >= N_PER_RUN: + break + i += 1 + + if len(vals) != N_PER_RUN: + raise ValueError( + f"Expected {N_PER_RUN} values for size {size}, got {len(vals)}" + ) + + encountered_counts[size] = encountered_counts.get(size, 0) + 1 + if encountered_counts[size] <= max_runs_per_size: + used_counts[size] = used_counts.get(size, 0) + 1 + run_no = used_counts[size] + for idx, t in enumerate(vals): + records.append( + { + "size": size, + "run": run_no, + "method": METHODS[idx], + "time_ms": t, + } + ) + continue + + i += 1 + + df = pd.DataFrame.from_records(records) + return df, encountered_counts, used_counts + + +def average_across_runs(df: pd.DataFrame) -> pd.DataFrame: + return ( + df.groupby(["size", "method"], as_index=False)["time_ms"].mean() + ) + + +def label_for_method(method: str, subset: str) -> str: + base = { + "cpu_pow2": "CPU", + "cpu_non_pow2": "CPU", + "radixsort_pow2": "Radix GPU", + "radixsort_non_pow2": "Radix GPU", + }.get(method, method) + + if subset == "all": + if method.endswith("non_pow2"): + return f"{base} (non-power-of-two)" 
+ if method.endswith("pow2"): + return f"{base} (power-of-two)" + return base + + +def plot_full_loglog(avg_df: pd.DataFrame, out_path: Path) -> None: + sizes = sorted(avg_df["size"].unique()) + fig, ax = plt.subplots(figsize=(7.5, 5.0), dpi=180) + + for m in METHODS: + y = [] + for s in sizes: + row = avg_df[(avg_df["size"] == s) & (avg_df["method"] == m)] + y.append(row["time_ms"].iloc[0] if not row.empty else float("nan")) + ax.plot( + sizes, + y, + marker="o", + linewidth=2.0, + markersize=4, + label=label_for_method(m, subset="all"), + ) + + # Axes scales and formatting + try: + ax.set_xscale("log", base=10) + ax.set_yscale("log", base=10) + except TypeError: + ax.set_xscale("log") + ax.set_yscale("log") + + ax.xaxis.set_major_locator(LogLocator(base=10.0)) + ax.xaxis.set_major_formatter(LogFormatterMathtext(base=10.0)) + ax.yaxis.set_major_locator(LogLocator(base=10.0)) + ax.yaxis.set_major_formatter(LogFormatterMathtext(base=10.0)) + + ax.set_xlabel("Array Size") + ax.set_ylabel(Y_LABEL + " (log)") + ax.set_title("Radix sort timings — full range [log-log]") + ax.legend(fontsize=9) + # ax.grid(True, which="both", linestyle="--", alpha=0.4) + + fig.tight_layout() + fig.savefig(out_path, bbox_inches="tight") + plt.close(fig) + + +def plot_logx_linear_gt(avg_df: pd.DataFrame, min_size: int, out_path: Path) -> None: + df = avg_df[avg_df["size"] > min_size].copy() + sizes = sorted(df["size"].unique()) + fig, ax = plt.subplots(figsize=(7.5, 5.0), dpi=180) + + for m in METHODS: + y = [] + for s in sizes: + row = df[(df["size"] == s) & (df["method"] == m)] + y.append(row["time_ms"].iloc[0] if not row.empty else float("nan")) + ax.plot( + sizes, + y, + marker="o", + linewidth=2.0, + markersize=4, + label=label_for_method(m, subset="all"), + ) + + try: + ax.set_xscale("log", base=10) + except TypeError: + ax.set_xscale("log") + + ax.xaxis.set_major_locator(LogLocator(base=10.0)) + ax.xaxis.set_major_formatter(LogFormatterMathtext(base=10.0)) + 
ax.yaxis.set_major_formatter(FuncFormatter(lambda v, _: f"{v:,.0f}")) + + ax.set_xlabel("Array Size") + ax.set_ylabel(Y_LABEL) + ax.set_title(f"Radix sort timings — sizes > 256k [log-x]") + ax.legend(fontsize=9) + # ax.grid(True, which="both", linestyle="--", alpha=0.4) + + fig.tight_layout() + fig.savefig(out_path, bbox_inches="tight") + plt.close(fig) + + +def main() -> None: + if not INPUT_PATH.exists(): + raise SystemExit(f"Input file not found: {INPUT_PATH.resolve()}") + + df, encountered_counts, used_counts = parse_radix_timings(INPUT_PATH) + + # Save all runs (long format) + df.to_csv(OUTPUT_ALL_RUNS_CSV, index=False) + + # Average across runs + avg = average_across_runs(df) + avg.to_csv(OUTPUT_AVG_CSV, index=False) + + # Plots + plot_full_loglog(avg, OUTPUT_PLOT_LOGLOG_FULL) + plot_logx_linear_gt(avg, MIN_PLOT_SIZE, OUTPUT_PLOT_LOGX_LINEAR_GT) + + print(f"Wrote: {OUTPUT_ALL_RUNS_CSV}") + print(f"Wrote: {OUTPUT_AVG_CSV}") + print(f"Wrote: {OUTPUT_PLOT_LOGLOG_FULL}") + print(f"Wrote: {OUTPUT_PLOT_LOGX_LINEAR_GT}") + print("Runs encountered per size:") + for size in sorted(encountered_counts.keys()): + enc = encountered_counts[size] + used = used_counts.get(size, 0) + print(f" size={size}: encountered={enc}, used={used}") + + +if __name__ == "__main__": + main() + diff --git a/plots/data/RadixSort_timings.txt b/plots/data/RadixSort_timings.txt new file mode 100644 index 00000000..740906e4 --- /dev/null +++ b/plots/data/RadixSort_timings.txt @@ -0,0 +1,1050 @@ +SIZE= 256 +0.0094 +0.0076 +4.70938 +5.80403 +SIZE= 256 +0.0095 +0.0079 +4.58547 +6.47066 +SIZE= 256 +0.0079 +0.006 +4.53222 +6.80858 +SIZE= 256 +0.0095 +0.0083 +4.57114 +6.75328 +SIZE= 256 +0.0089 +0.0065 +5.39341 +6.03546 +SIZE= 256 +0.0078 +0.0063 +4.40227 +5.99142 +SIZE= 256 +0.008 +0.0066 +6.40614 +5.8368 +SIZE= 256 +0.0088 +0.0082 +5.31354 +5.72109 +SIZE= 256 +0.0082 +0.0065 +4.53632 +6.6601 +SIZE= 256 +0.0098 +0.0074 +5.00842 +5.85318 +SIZE= 512 +0.0167 +0.0148 +4.47802 +6.31706 +SIZE= 512 +0.0203 
+0.0171 +4.81485 +5.69549 +SIZE= 512 +0.0191 +0.0178 +5.32173 +5.94842 +SIZE= 512 +0.0201 +0.0179 +4.49024 +5.3289 +SIZE= 512 +0.0203 +0.0174 +4.59162 +6.63757 +SIZE= 512 +0.02 +0.0182 +4.27418 +5.46509 +SIZE= 512 +0.0203 +0.0183 +5.72211 +6.03341 +SIZE= 512 +0.0162 +0.0144 +4.51789 +5.66886 +SIZE= 512 +0.0162 +0.0142 +5.23776 +5.89005 +SIZE= 512 +0.0168 +0.0135 +5.26746 +5.50093 +SIZE= 1024 +0.0417 +0.0386 +8.81971 +10.2728 +SIZE= 1024 +0.0336 +0.0479 +9.42285 +9.07878 +SIZE= 1024 +0.0401 +0.041 +8.76339 +10.1038 +SIZE= 1024 +0.0431 +0.0387 +9.34502 +9.73005 +SIZE= 1024 +0.0337 +0.0316 +7.66157 +9.81709 +SIZE= 1024 +0.0549 +0.0509 +8.4521 +9.71776 +SIZE= 1024 +0.0423 +0.0385 +9.3952 +9.11053 +SIZE= 1024 +0.043 +0.0397 +8.20531 +9.35936 +SIZE= 1024 +0.0336 +0.0317 +8.71629 +10.0772 +SIZE= 1024 +0.0344 +0.0317 +8.26368 +9.7239 +SIZE= 1024 +0.0432 +0.0405 +9.59386 +27.1493 +SIZE= 1024 +0.0418 +0.0393 +8.62106 +10.2257 +SIZE= 2048 +0.0737 +0.0701 +8.03021 +9.72083 +SIZE= 2048 +0.0853 +0.0668 +8.35584 +9.8007 +SIZE= 2048 +0.2276 +0.1111 +9.31942 +9.73517 +SIZE= 2048 +0.0909 +0.0867 +8.6487 +9.82323 +SIZE= 2048 +0.0888 +0.0864 +9.14125 +8.57395 +SIZE= 2048 +0.0905 +0.1036 +7.86842 +9.74746 +SIZE= 2048 +0.0911 +0.0883 +8.03942 +10.2236 +SIZE= 2048 +0.09 +0.089 +8.23808 +10.0628 +SIZE= 2048 +0.0715 +0.0753 +7.67907 +9.89594 +SIZE= 2048 +0.0739 +0.1144 +9.86317 +9.32147 +SIZE= 4096 +0.2001 +0.1917 +8.92006 +9.3225 +SIZE= 4096 +0.1542 +0.1531 +9.27539 +9.99629 +SIZE= 4096 +0.1552 +0.1541 +8.55552 +10.1816 +SIZE= 4096 +0.2469 +0.1849 +8.71629 +9.89389 +SIZE= 4096 +0.1536 +0.1468 +8.68659 +9.69114 +SIZE= 4096 +0.1978 +0.2148 +9.31021 +9.44026 +SIZE= 4096 +0.1546 +0.1542 +9.06957 +9.02042 +SIZE= 4096 +0.1961 +0.1909 +7.99846 +10.7459 +SIZE= 4096 +0.198 +0.1913 +7.69024 +10.4325 +SIZE= 4096 +0.1951 +0.1883 +8.25139 +10.3823 +SIZE= 8192 +0.476 +0.5345 +8.7255 +9.73414 +SIZE= 8192 +0.4344 +0.4173 +8.59136 +9.45152 +SIZE= 8192 +0.422 +0.4191 +8.27392 +10.2144 +SIZE= 8192 +0.4217 
+0.3546 +8.70717 +10.3025 +SIZE= 8192 +0.4119 +0.4722 +9.53037 +9.29792 +SIZE= 8192 +0.3389 +0.3307 +8.83405 +9.68909 +SIZE= 8192 +0.4182 +0.4103 +9.12077 +8.73267 +SIZE= 8192 +0.4297 +0.3378 +8.21862 +9.46586 +SIZE= 8192 +0.4268 +0.4054 +8.99994 +10.3516 +SIZE= 8192 +0.4159 +0.4239 +8.69376 +10.3875 +SIZE= 8192 +0.3319 +0.3304 +8.06912 +9.80787 +SIZE= 8192 +0.3479 +0.3316 +7.98618 +9.84064 +SIZE= 8192 +0.4209 +0.3986 +8.99379 +9.92461 +SIZE= 8192 +0.4342 +0.3783 +7.90118 +9.81094 +SIZE= 8192 +0.3442 +0.5483 +22.6714 +10.1724 +SIZE= 8192 +0.3341 +0.3947 +8.39987 +8.94566 +SIZE= 16384 +0.8558 +0.8202 +8.29952 +9.85805 +SIZE= 16384 +0.9684 +0.7255 +8.99174 +9.76179 +SIZE= 16384 +0.7418 +0.7624 +8.40602 +9.74131 +SIZE= 16384 +0.7385 +0.7352 +8.40192 +8.77363 +SIZE= 16384 +0.9046 +0.7186 +9.09517 +8.92621 +SIZE= 16384 +0.7814 +0.7301 +8.46029 +10.3516 +SIZE= 16384 +0.9294 +0.7808 +8.48384 +9.63072 +SIZE= 16384 +0.8703 +0.7456 +8.79206 +9.728 +SIZE= 16384 +0.9126 +0.7382 +8.23296 +10.8995 +SIZE= 16384 +0.7111 +0.7135 +7.80288 +9.70752 +SIZE= 32768 +2.0392 +1.4586 +8.13907 +10.1763 +SIZE= 32768 +1.7165 +1.5264 +8.54109 +9.51715 +SIZE= 32768 +1.7449 +1.564 +8.82499 +9.99238 +SIZE= 32768 +1.6377 +1.5428 +8.62384 +9.87574 +SIZE= 32768 +1.8314 +1.5287 +8.45795 +10.4574 +SIZE= 32768 +1.566 +1.5696 +9.16515 +9.40928 +SIZE= 32768 +1.6343 +1.5655 +8.86426 +8.87331 +SIZE= 32768 +1.7344 +1.5418 +8.32374 +9.50976 +SIZE= 32768 +1.646 +1.51 +9.10112 +9.59808 +SIZE= 32768 +1.7292 +1.4931 +8.03283 +9.66461 +SIZE= 65536 +3.4814 +3.3578 +7.76166 +9.66013 +SIZE= 65536 +3.26 +3.2471 +8.12717 +9.04381 +SIZE= 65536 +3.5663 +3.3185 +7.71843 +9.98192 +SIZE= 65536 +3.2645 +3.2908 +8.71018 +10.2396 +SIZE= 65536 +3.6521 +3.9321 +9.69318 +10.0577 +SIZE= 65536 +3.4707 +3.2604 +8.57818 +8.81245 +SIZE= 65536 +3.309 +3.2483 +8.09094 +9.34371 +SIZE= 65536 +3.398 +3.2397 +8.13494 +10.0073 +SIZE= 65536 +3.4635 +3.2983 +8.22051 +8.05741 +SIZE= 65536 +3.6104 +3.2547 +7.44486 +9.26691 +SIZE= 131072 +7.2689 
+7.1223 +23.5286 +10.7708 +SIZE= 131072 +6.986 +6.7363 +9.74467 +10.437 +SIZE= 131072 +6.9956 +6.8577 +10.255 +10.4286 +SIZE= 131072 +7.1561 +6.8991 +8.01344 +10.4806 +SIZE= 131072 +6.873 +6.9901 +8.92461 +10.5989 +SIZE= 131072 +6.9476 +6.8527 +8.10896 +9.09939 +SIZE= 131072 +6.8539 +6.8951 +8.59392 +10.4662 +SIZE= 131072 +6.9023 +6.8467 +8.16234 +9.75286 +SIZE= 131072 +6.8614 +6.898 +8.07261 +10.7548 +SIZE= 131072 +6.8861 +6.8388 +8.73133 +10.1843 +SIZE= 262144 +13.7022 +13.6261 +41.0628 +41.5971 +SIZE= 262144 +14.1539 +13.6582 +40.7475 +39.2637 +SIZE= 262144 +13.928 +13.431 +39.2268 +41.5457 +SIZE= 262144 +13.9735 +14.0254 +40.2047 +40.2706 +SIZE= 262144 +13.9074 +13.3946 +38.8272 +41.7732 +SIZE= 262144 +13.458 +13.7582 +39.9342 +40.7908 +SIZE= 262144 +13.5681 +13.6886 +39.9582 +39.2651 +SIZE= 262144 +13.7044 +13.9067 +38.0496 +40.6629 +SIZE= 262144 +13.6174 +13.695 +40.1017 +41.0712 +SIZE= 262144 +13.7482 +13.6265 +40.3364 +40.5949 +SIZE= 524288 +26.2559 +25.7275 +46.9112 +46.1847 +SIZE= 524288 +26.1733 +26.8918 +45.3429 +45.25 +SIZE= 524288 +25.4766 +25.8535 +46.8657 +46.1959 +SIZE= 524288 +26.2441 +25.7427 +46.509 +46.7176 +SIZE= 524288 +26.3006 +26.7468 +47.9771 +50.4236 +SIZE= 524288 +26.7854 +26.0227 +47.1139 +47.5199 +SIZE= 524288 +25.9513 +25.5866 +46.8894 +47.2251 +SIZE= 524288 +25.8295 +26.082 +46.4053 +48.3841 +SIZE= 524288 +26.0114 +25.8691 +50.2007 +46.5084 +SIZE= 524288 +26.3342 +25.7634 +43.8896 +46.1819 +SIZE= 1048576 +51.725 +51.8295 +51.6252 +50.9465 +SIZE= 1048576 +51.2834 +49.5117 +50.2353 +50.6454 +SIZE= 1048576 +50.8451 +51.7985 +50.0359 +51.182 +SIZE= 1048576 +50.7828 +51.3079 +49.7542 +50.2588 +SIZE= 1048576 +50.5902 +49.7813 +50.2263 +50.1767 +SIZE= 1048576 +49.979 +50.7429 +49.4942 +49.4941 +SIZE= 1048576 +51.1298 +52.1358 +50.1065 +51.6395 +SIZE= 1048576 +50.1754 +50.5928 +49.076 +52.1901 +SIZE= 1048576 +50.7922 +50.5043 +50.7409 +50.9494 +SIZE= 1048576 +49.3953 +50.3009 +50.0728 +50.6073 +SIZE= 1048576 +50.9745 +51.8891 +51.1357 
+50.4871 +SIZE= 2097152 +99.2969 +98.844 +57.5874 +56.3668 +SIZE= 2097152 +98.2919 +99.0055 +58.1849 +57.1159 +SIZE= 2097152 +99.7459 +100.664 +56.7805 +56.1347 +SIZE= 2097152 +99.5306 +101.685 +58.3835 +57.0717 +SIZE= 2097152 +100.764 +101.344 +57.3329 +58.6157 +SIZE= 2097152 +102.15 +101.258 +57.9577 +58.5743 +SIZE= 2097152 +98.6016 +98.6839 +56.3813 +58.0832 +SIZE= 2097152 +99.5248 +101.089 +57.4688 +58.6643 +SIZE= 2097152 +100.417 +98.967 +55.6776 +57.4975 +SIZE= 2097152 +102.541 +101.076 +57.8341 +57.5494 +SIZE= 4194304 +195.23 +194.335 +73.6001 +74.0202 +SIZE= 4194304 +195.094 +190.167 +72.3023 +77.5372 +SIZE= 4194304 +197.997 +193.224 +72.8361 +75.0703 +SIZE= 4194304 +195.771 +196.456 +75.5524 +73.361 +SIZE= 4194304 +195.885 +193.249 +74.1706 +73.4458 +SIZE= 4194304 +198.366 +197.27 +73.3776 +72.547 +SIZE= 4194304 +199.973 +195.923 +74.1127 +73.2485 +SIZE= 4194304 +194.349 +195.795 +72.7879 +73.6293 +SIZE= 4194304 +199.44 +196.657 +73.8464 +73.5133 +SIZE= 4194304 +196.2 +195.402 +73.7365 +73.4385 +SIZE= 8388608 +389.795 +401.727 +107.963 +105.163 +SIZE= 8388608 +385.408 +383.691 +105.487 +105.208 +SIZE= 8388608 +391.59 +374.424 +104.287 +107.228 +SIZE= 8388608 +377.539 +395.694 +104.934 +108.62 +SIZE= 8388608 +383.13 +387.813 +105.347 +102.123 +SIZE= 8388608 +393.158 +387.59 +104.947 +105.546 +SIZE= 8388608 +393.783 +398.769 +106.472 +104.916 +SIZE= 8388608 +380.899 +385.233 +104.837 +108.965 +SIZE= 8388608 +388.114 +383.582 +104.838 +107.71 +SIZE= 8388608 +385.488 +378.217 +104.353 +104.079 +SIZE= 16777216 +774.947 +760.568 +169.911 +160.96 +SIZE= 16777216 +767.042 +755.257 +167.325 +159.101 +SIZE= 16777216 +767.151 +761.937 +168.185 +162.628 +SIZE= 16777216 +765.656 +772.266 +167.013 +159.146 +SIZE= 16777216 +770.765 +765.667 +166.513 +160.025 +SIZE= 16777216 +762.95 +770.438 +167.253 +169.067 +SIZE= 16777216 +759.056 +761.054 +167.528 +158.333 +SIZE= 16777216 +753.022 +753.03 +168.375 +162.041 +SIZE= 16777216 +756.446 +764.814 +166.933 +159.813 +SIZE= 
16777216 +775.042 +759.894 +166.249 +161.447 +SIZE= 33554432 +1539.45 +1523.08 +290.056 +279.519 +SIZE= 33554432 +1533.33 +1492.94 +286.482 +278.727 +SIZE= 33554432 +1518.03 +1520.47 +286.96 +279.168 +SIZE= 33554432 +1513.25 +1515.85 +290.399 +279.422 +SIZE= 33554432 +1516.88 +1507.03 +287.379 +303.782 +SIZE= 33554432 +1526.54 +1518.3 +292.845 +279.78 +SIZE= 33554432 +1516.4 +1501.75 +285.407 +278.723 +SIZE= 33554432 +1539.81 +1504.19 +285.582 +278.672 +SIZE= 33554432 +1505.77 +1536.85 +290.809 +305.345 +SIZE= 33554432 +1546.37 +1518.21 +289.364 +280.599 +SIZE= 67108864 +3068.55 +3012.11 +515.612 +519.143 +SIZE= 67108864 +3005.19 +3028.14 +526.227 +515.964 +SIZE= 67108864 +3058.06 +3049.14 +521.026 +515.174 +SIZE= 67108864 +3010.2 +3019.12 +517.42 +516.477 +SIZE= 67108864 +3032.43 +3002.39 +521.633 +516.546 +SIZE= 67108864 +3027.27 +3023.65 +520.2 +517.002 +SIZE= 67108864 +3067.05 +3092.49 +536.546 +518.33 +SIZE= 67108864 +3017.05 +3019.5 +524.98 +519.893 +SIZE= 67108864 +3011.92 +2995.26 +523.85 +517.899 +SIZE= 67108864 +3003.54 +3012.17 +520.796 +519.375 +SIZE= 67108864 +2994.34 +3016.09 +523.982 +518.418 +SIZE= 134217728 +5936.73 +6091.71 +1044.68 +1035.31 +SIZE= 134217728 +6000.72 +5948.19 +1033.53 +1027.97 +SIZE= 134217728 +5981.03 +5984.76 +1032.85 +1028.51 +SIZE= 134217728 +5893.52 +5995.66 +1036.58 +1026.83 +SIZE= 134217728 +5970.86 +6050.97 +1057.49 +1027.12 +SIZE= 134217728 +6114.02 +6219.39 +1049.61 +1082.07 +SIZE= 134217728 +6049.7 +6038.77 +1037.27 +1031.87 +SIZE= 134217728 +6179.4 +6105.52 +1089.13 +1081.17 +SIZE= 134217728 +6135.11 +6102.33 +1062.17 +1052.3 +SIZE= 134217728 +6062.03 +6224.28 +1061 +1054.05 \ No newline at end of file diff --git a/plots/data/blockSize_timings.txt b/plots/data/blockSize_timings.txt new file mode 100644 index 00000000..769425c0 --- /dev/null +++ b/plots/data/blockSize_timings.txt @@ -0,0 +1,192 @@ +** SCAN TESTS ** +blockSize= 32 +11.4302 +** SCAN TESTS ** +blockSize= 32 +10.533 +** SCAN TESTS ** +blockSize= 32 +10.6891 
+** SCAN TESTS ** +blockSize= 32 +10.5644 +** SCAN TESTS ** +blockSize= 32 +10.7888 +** SCAN TESTS ** +blockSize= 32 +10.7962 +** SCAN TESTS ** +blockSize= 32 +10.8693 +** SCAN TESTS ** +blockSize= 32 +10.6384 +** SCAN TESTS ** +blockSize= 32 +10.6246 +** SCAN TESTS ** +blockSize= 32 +10.839 +** SCAN TESTS ** +blockSize= 64 +7.29376 +** SCAN TESTS ** +blockSize= 64 +6.92973 +** SCAN TESTS ** +blockSize= 64 +6.94682 +** SCAN TESTS ** +blockSize= 64 +6.98816 +** SCAN TESTS ** +blockSize= 64 +6.87296 +** SCAN TESTS ** +blockSize= 64 +7.09789 +** SCAN TESTS ** +blockSize= 64 +7.03043 +** SCAN TESTS ** +blockSize= 64 +7.06954 +** SCAN TESTS ** +blockSize= 64 +6.96989 +** SCAN TESTS ** +blockSize= 64 +6.85994 +** SCAN TESTS ** +blockSize= 64 +7.00032 +** SCAN TESTS ** +blockSize= 128 +5.86563 +** SCAN TESTS ** +blockSize= 128 +6.03846 +** SCAN TESTS ** +blockSize= 128 +5.90906 +** SCAN TESTS ** +blockSize= 128 +6.01888 +** SCAN TESTS ** +blockSize= 128 +5.94816 +** SCAN TESTS ** +blockSize= 128 +6.57162 +** SCAN TESTS ** +blockSize= 128 +6.43235 +** SCAN TESTS ** +blockSize= 128 +5.99184 +** SCAN TESTS ** +blockSize= 128 +5.95002 +** SCAN TESTS ** +blockSize= 128 +6.22931 +** SCAN TESTS ** +blockSize= 256 +11.7848 +** SCAN TESTS ** +blockSize= 256 +6.09424 +** SCAN TESTS ** +blockSize= 256 +5.98832 +** SCAN TESTS ** +blockSize= 256 +6.08637 +** SCAN TESTS ** +blockSize= 256 +5.91056 +** SCAN TESTS ** +blockSize= 256 +6.15024 +** SCAN TESTS ** +blockSize= 256 +6.07635 +** SCAN TESTS ** +blockSize= 256 +6.1169 +** SCAN TESTS ** +blockSize= 256 +6.13814 +** SCAN TESTS ** +blockSize= 256 +5.80851 +** SCAN TESTS ** +blockSize= 512 +6.6457 +** SCAN TESTS ** +blockSize= 512 +6.2737 +** SCAN TESTS ** +blockSize= 512 +6.38054 +** SCAN TESTS ** +blockSize= 512 +6.39696 +** SCAN TESTS ** +blockSize= 512 +6.32326 +** SCAN TESTS ** +blockSize= 512 +6.31642 +** SCAN TESTS ** +blockSize= 512 +6.36275 +** SCAN TESTS ** +blockSize= 512 +6.1615 +** SCAN TESTS ** +blockSize= 512 +6.34816 
+** SCAN TESTS ** +blockSize= 512 +6.80384 +** SCAN TESTS ** +blockSize= 512 +6.45811 +** SCAN TESTS ** +blockSize= 512 +6.29962 +** SCAN TESTS ** +blockSize= 512 +6.58864 +** SCAN TESTS ** +blockSize= 1024 +7.60422 +** SCAN TESTS ** +blockSize= 1024 +7.67088 +** SCAN TESTS ** +blockSize= 1024 +7.75478 +** SCAN TESTS ** +blockSize= 1024 +7.63782 +** SCAN TESTS ** +blockSize= 1024 +7.52176 +** SCAN TESTS ** +blockSize= 1024 +7.58723 +** SCAN TESTS ** +blockSize= 1024 +7.83197 +** SCAN TESTS ** +blockSize= 1024 +7.09651 +** SCAN TESTS ** +blockSize= 1024 +7.53715 +** SCAN TESTS ** +blockSize= 1024 +7.72787 \ No newline at end of file diff --git a/plots/data/radix_timings_all_runs_long.csv b/plots/data/radix_timings_all_runs_long.csv new file mode 100644 index 00000000..788fc315 --- /dev/null +++ b/plots/data/radix_timings_all_runs_long.csv @@ -0,0 +1,801 @@ +size,run,method,time_ms +256,1,cpu_pow2,0.0094 +256,1,cpu_non_pow2,0.0076 +256,1,radixsort_pow2,4.70938 +256,1,radixsort_non_pow2,5.80403 +256,2,cpu_pow2,0.0095 +256,2,cpu_non_pow2,0.0079 +256,2,radixsort_pow2,4.58547 +256,2,radixsort_non_pow2,6.47066 +256,3,cpu_pow2,0.0079 +256,3,cpu_non_pow2,0.006 +256,3,radixsort_pow2,4.53222 +256,3,radixsort_non_pow2,6.80858 +256,4,cpu_pow2,0.0095 +256,4,cpu_non_pow2,0.0083 +256,4,radixsort_pow2,4.57114 +256,4,radixsort_non_pow2,6.75328 +256,5,cpu_pow2,0.0089 +256,5,cpu_non_pow2,0.0065 +256,5,radixsort_pow2,5.39341 +256,5,radixsort_non_pow2,6.03546 +256,6,cpu_pow2,0.0078 +256,6,cpu_non_pow2,0.0063 +256,6,radixsort_pow2,4.40227 +256,6,radixsort_non_pow2,5.99142 +256,7,cpu_pow2,0.008 +256,7,cpu_non_pow2,0.0066 +256,7,radixsort_pow2,6.40614 +256,7,radixsort_non_pow2,5.8368 +256,8,cpu_pow2,0.0088 +256,8,cpu_non_pow2,0.0082 +256,8,radixsort_pow2,5.31354 +256,8,radixsort_non_pow2,5.72109 +256,9,cpu_pow2,0.0082 +256,9,cpu_non_pow2,0.0065 +256,9,radixsort_pow2,4.53632 +256,9,radixsort_non_pow2,6.6601 +256,10,cpu_pow2,0.0098 +256,10,cpu_non_pow2,0.0074 +256,10,radixsort_pow2,5.00842 
+256,10,radixsort_non_pow2,5.85318 +512,1,cpu_pow2,0.0167 +512,1,cpu_non_pow2,0.0148 +512,1,radixsort_pow2,4.47802 +512,1,radixsort_non_pow2,6.31706 +512,2,cpu_pow2,0.0203 +512,2,cpu_non_pow2,0.0171 +512,2,radixsort_pow2,4.81485 +512,2,radixsort_non_pow2,5.69549 +512,3,cpu_pow2,0.0191 +512,3,cpu_non_pow2,0.0178 +512,3,radixsort_pow2,5.32173 +512,3,radixsort_non_pow2,5.94842 +512,4,cpu_pow2,0.0201 +512,4,cpu_non_pow2,0.0179 +512,4,radixsort_pow2,4.49024 +512,4,radixsort_non_pow2,5.3289 +512,5,cpu_pow2,0.0203 +512,5,cpu_non_pow2,0.0174 +512,5,radixsort_pow2,4.59162 +512,5,radixsort_non_pow2,6.63757 +512,6,cpu_pow2,0.02 +512,6,cpu_non_pow2,0.0182 +512,6,radixsort_pow2,4.27418 +512,6,radixsort_non_pow2,5.46509 +512,7,cpu_pow2,0.0203 +512,7,cpu_non_pow2,0.0183 +512,7,radixsort_pow2,5.72211 +512,7,radixsort_non_pow2,6.03341 +512,8,cpu_pow2,0.0162 +512,8,cpu_non_pow2,0.0144 +512,8,radixsort_pow2,4.51789 +512,8,radixsort_non_pow2,5.66886 +512,9,cpu_pow2,0.0162 +512,9,cpu_non_pow2,0.0142 +512,9,radixsort_pow2,5.23776 +512,9,radixsort_non_pow2,5.89005 +512,10,cpu_pow2,0.0168 +512,10,cpu_non_pow2,0.0135 +512,10,radixsort_pow2,5.26746 +512,10,radixsort_non_pow2,5.50093 +1024,1,cpu_pow2,0.0417 +1024,1,cpu_non_pow2,0.0386 +1024,1,radixsort_pow2,8.81971 +1024,1,radixsort_non_pow2,10.2728 +1024,2,cpu_pow2,0.0336 +1024,2,cpu_non_pow2,0.0479 +1024,2,radixsort_pow2,9.42285 +1024,2,radixsort_non_pow2,9.07878 +1024,3,cpu_pow2,0.0401 +1024,3,cpu_non_pow2,0.041 +1024,3,radixsort_pow2,8.76339 +1024,3,radixsort_non_pow2,10.1038 +1024,4,cpu_pow2,0.0431 +1024,4,cpu_non_pow2,0.0387 +1024,4,radixsort_pow2,9.34502 +1024,4,radixsort_non_pow2,9.73005 +1024,5,cpu_pow2,0.0337 +1024,5,cpu_non_pow2,0.0316 +1024,5,radixsort_pow2,7.66157 +1024,5,radixsort_non_pow2,9.81709 +1024,6,cpu_pow2,0.0549 +1024,6,cpu_non_pow2,0.0509 +1024,6,radixsort_pow2,8.4521 +1024,6,radixsort_non_pow2,9.71776 +1024,7,cpu_pow2,0.0423 +1024,7,cpu_non_pow2,0.0385 +1024,7,radixsort_pow2,9.3952 +1024,7,radixsort_non_pow2,9.11053 
+1024,8,cpu_pow2,0.043 +1024,8,cpu_non_pow2,0.0397 +1024,8,radixsort_pow2,8.20531 +1024,8,radixsort_non_pow2,9.35936 +1024,9,cpu_pow2,0.0336 +1024,9,cpu_non_pow2,0.0317 +1024,9,radixsort_pow2,8.71629 +1024,9,radixsort_non_pow2,10.0772 +1024,10,cpu_pow2,0.0344 +1024,10,cpu_non_pow2,0.0317 +1024,10,radixsort_pow2,8.26368 +1024,10,radixsort_non_pow2,9.7239 +2048,1,cpu_pow2,0.0737 +2048,1,cpu_non_pow2,0.0701 +2048,1,radixsort_pow2,8.03021 +2048,1,radixsort_non_pow2,9.72083 +2048,2,cpu_pow2,0.0853 +2048,2,cpu_non_pow2,0.0668 +2048,2,radixsort_pow2,8.35584 +2048,2,radixsort_non_pow2,9.8007 +2048,3,cpu_pow2,0.2276 +2048,3,cpu_non_pow2,0.1111 +2048,3,radixsort_pow2,9.31942 +2048,3,radixsort_non_pow2,9.73517 +2048,4,cpu_pow2,0.0909 +2048,4,cpu_non_pow2,0.0867 +2048,4,radixsort_pow2,8.6487 +2048,4,radixsort_non_pow2,9.82323 +2048,5,cpu_pow2,0.0888 +2048,5,cpu_non_pow2,0.0864 +2048,5,radixsort_pow2,9.14125 +2048,5,radixsort_non_pow2,8.57395 +2048,6,cpu_pow2,0.0905 +2048,6,cpu_non_pow2,0.1036 +2048,6,radixsort_pow2,7.86842 +2048,6,radixsort_non_pow2,9.74746 +2048,7,cpu_pow2,0.0911 +2048,7,cpu_non_pow2,0.0883 +2048,7,radixsort_pow2,8.03942 +2048,7,radixsort_non_pow2,10.2236 +2048,8,cpu_pow2,0.09 +2048,8,cpu_non_pow2,0.089 +2048,8,radixsort_pow2,8.23808 +2048,8,radixsort_non_pow2,10.0628 +2048,9,cpu_pow2,0.0715 +2048,9,cpu_non_pow2,0.0753 +2048,9,radixsort_pow2,7.67907 +2048,9,radixsort_non_pow2,9.89594 +2048,10,cpu_pow2,0.0739 +2048,10,cpu_non_pow2,0.1144 +2048,10,radixsort_pow2,9.86317 +2048,10,radixsort_non_pow2,9.32147 +4096,1,cpu_pow2,0.2001 +4096,1,cpu_non_pow2,0.1917 +4096,1,radixsort_pow2,8.92006 +4096,1,radixsort_non_pow2,9.3225 +4096,2,cpu_pow2,0.1542 +4096,2,cpu_non_pow2,0.1531 +4096,2,radixsort_pow2,9.27539 +4096,2,radixsort_non_pow2,9.99629 +4096,3,cpu_pow2,0.1552 +4096,3,cpu_non_pow2,0.1541 +4096,3,radixsort_pow2,8.55552 +4096,3,radixsort_non_pow2,10.1816 +4096,4,cpu_pow2,0.2469 +4096,4,cpu_non_pow2,0.1849 +4096,4,radixsort_pow2,8.71629 
+4096,4,radixsort_non_pow2,9.89389 +4096,5,cpu_pow2,0.1536 +4096,5,cpu_non_pow2,0.1468 +4096,5,radixsort_pow2,8.68659 +4096,5,radixsort_non_pow2,9.69114 +4096,6,cpu_pow2,0.1978 +4096,6,cpu_non_pow2,0.2148 +4096,6,radixsort_pow2,9.31021 +4096,6,radixsort_non_pow2,9.44026 +4096,7,cpu_pow2,0.1546 +4096,7,cpu_non_pow2,0.1542 +4096,7,radixsort_pow2,9.06957 +4096,7,radixsort_non_pow2,9.02042 +4096,8,cpu_pow2,0.1961 +4096,8,cpu_non_pow2,0.1909 +4096,8,radixsort_pow2,7.99846 +4096,8,radixsort_non_pow2,10.7459 +4096,9,cpu_pow2,0.198 +4096,9,cpu_non_pow2,0.1913 +4096,9,radixsort_pow2,7.69024 +4096,9,radixsort_non_pow2,10.4325 +4096,10,cpu_pow2,0.1951 +4096,10,cpu_non_pow2,0.1883 +4096,10,radixsort_pow2,8.25139 +4096,10,radixsort_non_pow2,10.3823 +8192,1,cpu_pow2,0.476 +8192,1,cpu_non_pow2,0.5345 +8192,1,radixsort_pow2,8.7255 +8192,1,radixsort_non_pow2,9.73414 +8192,2,cpu_pow2,0.4344 +8192,2,cpu_non_pow2,0.4173 +8192,2,radixsort_pow2,8.59136 +8192,2,radixsort_non_pow2,9.45152 +8192,3,cpu_pow2,0.422 +8192,3,cpu_non_pow2,0.4191 +8192,3,radixsort_pow2,8.27392 +8192,3,radixsort_non_pow2,10.2144 +8192,4,cpu_pow2,0.4217 +8192,4,cpu_non_pow2,0.3546 +8192,4,radixsort_pow2,8.70717 +8192,4,radixsort_non_pow2,10.3025 +8192,5,cpu_pow2,0.4119 +8192,5,cpu_non_pow2,0.4722 +8192,5,radixsort_pow2,9.53037 +8192,5,radixsort_non_pow2,9.29792 +8192,6,cpu_pow2,0.3389 +8192,6,cpu_non_pow2,0.3307 +8192,6,radixsort_pow2,8.83405 +8192,6,radixsort_non_pow2,9.68909 +8192,7,cpu_pow2,0.4182 +8192,7,cpu_non_pow2,0.4103 +8192,7,radixsort_pow2,9.12077 +8192,7,radixsort_non_pow2,8.73267 +8192,8,cpu_pow2,0.4297 +8192,8,cpu_non_pow2,0.3378 +8192,8,radixsort_pow2,8.21862 +8192,8,radixsort_non_pow2,9.46586 +8192,9,cpu_pow2,0.4268 +8192,9,cpu_non_pow2,0.4054 +8192,9,radixsort_pow2,8.99994 +8192,9,radixsort_non_pow2,10.3516 +8192,10,cpu_pow2,0.4159 +8192,10,cpu_non_pow2,0.4239 +8192,10,radixsort_pow2,8.69376 +8192,10,radixsort_non_pow2,10.3875 +16384,1,cpu_pow2,0.8558 +16384,1,cpu_non_pow2,0.8202 
+16384,1,radixsort_pow2,8.29952 +16384,1,radixsort_non_pow2,9.85805 +16384,2,cpu_pow2,0.9684 +16384,2,cpu_non_pow2,0.7255 +16384,2,radixsort_pow2,8.99174 +16384,2,radixsort_non_pow2,9.76179 +16384,3,cpu_pow2,0.7418 +16384,3,cpu_non_pow2,0.7624 +16384,3,radixsort_pow2,8.40602 +16384,3,radixsort_non_pow2,9.74131 +16384,4,cpu_pow2,0.7385 +16384,4,cpu_non_pow2,0.7352 +16384,4,radixsort_pow2,8.40192 +16384,4,radixsort_non_pow2,8.77363 +16384,5,cpu_pow2,0.9046 +16384,5,cpu_non_pow2,0.7186 +16384,5,radixsort_pow2,9.09517 +16384,5,radixsort_non_pow2,8.92621 +16384,6,cpu_pow2,0.7814 +16384,6,cpu_non_pow2,0.7301 +16384,6,radixsort_pow2,8.46029 +16384,6,radixsort_non_pow2,10.3516 +16384,7,cpu_pow2,0.9294 +16384,7,cpu_non_pow2,0.7808 +16384,7,radixsort_pow2,8.48384 +16384,7,radixsort_non_pow2,9.63072 +16384,8,cpu_pow2,0.8703 +16384,8,cpu_non_pow2,0.7456 +16384,8,radixsort_pow2,8.79206 +16384,8,radixsort_non_pow2,9.728 +16384,9,cpu_pow2,0.9126 +16384,9,cpu_non_pow2,0.7382 +16384,9,radixsort_pow2,8.23296 +16384,9,radixsort_non_pow2,10.8995 +16384,10,cpu_pow2,0.7111 +16384,10,cpu_non_pow2,0.7135 +16384,10,radixsort_pow2,7.80288 +16384,10,radixsort_non_pow2,9.70752 +32768,1,cpu_pow2,2.0392 +32768,1,cpu_non_pow2,1.4586 +32768,1,radixsort_pow2,8.13907 +32768,1,radixsort_non_pow2,10.1763 +32768,2,cpu_pow2,1.7165 +32768,2,cpu_non_pow2,1.5264 +32768,2,radixsort_pow2,8.54109 +32768,2,radixsort_non_pow2,9.51715 +32768,3,cpu_pow2,1.7449 +32768,3,cpu_non_pow2,1.564 +32768,3,radixsort_pow2,8.82499 +32768,3,radixsort_non_pow2,9.99238 +32768,4,cpu_pow2,1.6377 +32768,4,cpu_non_pow2,1.5428 +32768,4,radixsort_pow2,8.62384 +32768,4,radixsort_non_pow2,9.87574 +32768,5,cpu_pow2,1.8314 +32768,5,cpu_non_pow2,1.5287 +32768,5,radixsort_pow2,8.45795 +32768,5,radixsort_non_pow2,10.4574 +32768,6,cpu_pow2,1.566 +32768,6,cpu_non_pow2,1.5696 +32768,6,radixsort_pow2,9.16515 +32768,6,radixsort_non_pow2,9.40928 +32768,7,cpu_pow2,1.6343 +32768,7,cpu_non_pow2,1.5655 +32768,7,radixsort_pow2,8.86426 
+32768,7,radixsort_non_pow2,8.87331 +32768,8,cpu_pow2,1.7344 +32768,8,cpu_non_pow2,1.5418 +32768,8,radixsort_pow2,8.32374 +32768,8,radixsort_non_pow2,9.50976 +32768,9,cpu_pow2,1.646 +32768,9,cpu_non_pow2,1.51 +32768,9,radixsort_pow2,9.10112 +32768,9,radixsort_non_pow2,9.59808 +32768,10,cpu_pow2,1.7292 +32768,10,cpu_non_pow2,1.4931 +32768,10,radixsort_pow2,8.03283 +32768,10,radixsort_non_pow2,9.66461 +65536,1,cpu_pow2,3.4814 +65536,1,cpu_non_pow2,3.3578 +65536,1,radixsort_pow2,7.76166 +65536,1,radixsort_non_pow2,9.66013 +65536,2,cpu_pow2,3.26 +65536,2,cpu_non_pow2,3.2471 +65536,2,radixsort_pow2,8.12717 +65536,2,radixsort_non_pow2,9.04381 +65536,3,cpu_pow2,3.5663 +65536,3,cpu_non_pow2,3.3185 +65536,3,radixsort_pow2,7.71843 +65536,3,radixsort_non_pow2,9.98192 +65536,4,cpu_pow2,3.2645 +65536,4,cpu_non_pow2,3.2908 +65536,4,radixsort_pow2,8.71018 +65536,4,radixsort_non_pow2,10.2396 +65536,5,cpu_pow2,3.6521 +65536,5,cpu_non_pow2,3.9321 +65536,5,radixsort_pow2,9.69318 +65536,5,radixsort_non_pow2,10.0577 +65536,6,cpu_pow2,3.4707 +65536,6,cpu_non_pow2,3.2604 +65536,6,radixsort_pow2,8.57818 +65536,6,radixsort_non_pow2,8.81245 +65536,7,cpu_pow2,3.309 +65536,7,cpu_non_pow2,3.2483 +65536,7,radixsort_pow2,8.09094 +65536,7,radixsort_non_pow2,9.34371 +65536,8,cpu_pow2,3.398 +65536,8,cpu_non_pow2,3.2397 +65536,8,radixsort_pow2,8.13494 +65536,8,radixsort_non_pow2,10.0073 +65536,9,cpu_pow2,3.4635 +65536,9,cpu_non_pow2,3.2983 +65536,9,radixsort_pow2,8.22051 +65536,9,radixsort_non_pow2,8.05741 +65536,10,cpu_pow2,3.6104 +65536,10,cpu_non_pow2,3.2547 +65536,10,radixsort_pow2,7.44486 +65536,10,radixsort_non_pow2,9.26691 +131072,1,cpu_pow2,7.2689 +131072,1,cpu_non_pow2,7.1223 +131072,1,radixsort_pow2,23.5286 +131072,1,radixsort_non_pow2,10.7708 +131072,2,cpu_pow2,6.986 +131072,2,cpu_non_pow2,6.7363 +131072,2,radixsort_pow2,9.74467 +131072,2,radixsort_non_pow2,10.437 +131072,3,cpu_pow2,6.9956 +131072,3,cpu_non_pow2,6.8577 +131072,3,radixsort_pow2,10.255 +131072,3,radixsort_non_pow2,10.4286 
+131072,4,cpu_pow2,7.1561 +131072,4,cpu_non_pow2,6.8991 +131072,4,radixsort_pow2,8.01344 +131072,4,radixsort_non_pow2,10.4806 +131072,5,cpu_pow2,6.873 +131072,5,cpu_non_pow2,6.9901 +131072,5,radixsort_pow2,8.92461 +131072,5,radixsort_non_pow2,10.5989 +131072,6,cpu_pow2,6.9476 +131072,6,cpu_non_pow2,6.8527 +131072,6,radixsort_pow2,8.10896 +131072,6,radixsort_non_pow2,9.09939 +131072,7,cpu_pow2,6.8539 +131072,7,cpu_non_pow2,6.8951 +131072,7,radixsort_pow2,8.59392 +131072,7,radixsort_non_pow2,10.4662 +131072,8,cpu_pow2,6.9023 +131072,8,cpu_non_pow2,6.8467 +131072,8,radixsort_pow2,8.16234 +131072,8,radixsort_non_pow2,9.75286 +131072,9,cpu_pow2,6.8614 +131072,9,cpu_non_pow2,6.898 +131072,9,radixsort_pow2,8.07261 +131072,9,radixsort_non_pow2,10.7548 +131072,10,cpu_pow2,6.8861 +131072,10,cpu_non_pow2,6.8388 +131072,10,radixsort_pow2,8.73133 +131072,10,radixsort_non_pow2,10.1843 +262144,1,cpu_pow2,13.7022 +262144,1,cpu_non_pow2,13.6261 +262144,1,radixsort_pow2,41.0628 +262144,1,radixsort_non_pow2,41.5971 +262144,2,cpu_pow2,14.1539 +262144,2,cpu_non_pow2,13.6582 +262144,2,radixsort_pow2,40.7475 +262144,2,radixsort_non_pow2,39.2637 +262144,3,cpu_pow2,13.928 +262144,3,cpu_non_pow2,13.431 +262144,3,radixsort_pow2,39.2268 +262144,3,radixsort_non_pow2,41.5457 +262144,4,cpu_pow2,13.9735 +262144,4,cpu_non_pow2,14.0254 +262144,4,radixsort_pow2,40.2047 +262144,4,radixsort_non_pow2,40.2706 +262144,5,cpu_pow2,13.9074 +262144,5,cpu_non_pow2,13.3946 +262144,5,radixsort_pow2,38.8272 +262144,5,radixsort_non_pow2,41.7732 +262144,6,cpu_pow2,13.458 +262144,6,cpu_non_pow2,13.7582 +262144,6,radixsort_pow2,39.9342 +262144,6,radixsort_non_pow2,40.7908 +262144,7,cpu_pow2,13.5681 +262144,7,cpu_non_pow2,13.6886 +262144,7,radixsort_pow2,39.9582 +262144,7,radixsort_non_pow2,39.2651 +262144,8,cpu_pow2,13.7044 +262144,8,cpu_non_pow2,13.9067 +262144,8,radixsort_pow2,38.0496 +262144,8,radixsort_non_pow2,40.6629 +262144,9,cpu_pow2,13.6174 +262144,9,cpu_non_pow2,13.695 +262144,9,radixsort_pow2,40.1017 
+262144,9,radixsort_non_pow2,41.0712 +262144,10,cpu_pow2,13.7482 +262144,10,cpu_non_pow2,13.6265 +262144,10,radixsort_pow2,40.3364 +262144,10,radixsort_non_pow2,40.5949 +524288,1,cpu_pow2,26.2559 +524288,1,cpu_non_pow2,25.7275 +524288,1,radixsort_pow2,46.9112 +524288,1,radixsort_non_pow2,46.1847 +524288,2,cpu_pow2,26.1733 +524288,2,cpu_non_pow2,26.8918 +524288,2,radixsort_pow2,45.3429 +524288,2,radixsort_non_pow2,45.25 +524288,3,cpu_pow2,25.4766 +524288,3,cpu_non_pow2,25.8535 +524288,3,radixsort_pow2,46.8657 +524288,3,radixsort_non_pow2,46.1959 +524288,4,cpu_pow2,26.2441 +524288,4,cpu_non_pow2,25.7427 +524288,4,radixsort_pow2,46.509 +524288,4,radixsort_non_pow2,46.7176 +524288,5,cpu_pow2,26.3006 +524288,5,cpu_non_pow2,26.7468 +524288,5,radixsort_pow2,47.9771 +524288,5,radixsort_non_pow2,50.4236 +524288,6,cpu_pow2,26.7854 +524288,6,cpu_non_pow2,26.0227 +524288,6,radixsort_pow2,47.1139 +524288,6,radixsort_non_pow2,47.5199 +524288,7,cpu_pow2,25.9513 +524288,7,cpu_non_pow2,25.5866 +524288,7,radixsort_pow2,46.8894 +524288,7,radixsort_non_pow2,47.2251 +524288,8,cpu_pow2,25.8295 +524288,8,cpu_non_pow2,26.082 +524288,8,radixsort_pow2,46.4053 +524288,8,radixsort_non_pow2,48.3841 +524288,9,cpu_pow2,26.0114 +524288,9,cpu_non_pow2,25.8691 +524288,9,radixsort_pow2,50.2007 +524288,9,radixsort_non_pow2,46.5084 +524288,10,cpu_pow2,26.3342 +524288,10,cpu_non_pow2,25.7634 +524288,10,radixsort_pow2,43.8896 +524288,10,radixsort_non_pow2,46.1819 +1048576,1,cpu_pow2,51.725 +1048576,1,cpu_non_pow2,51.8295 +1048576,1,radixsort_pow2,51.6252 +1048576,1,radixsort_non_pow2,50.9465 +1048576,2,cpu_pow2,51.2834 +1048576,2,cpu_non_pow2,49.5117 +1048576,2,radixsort_pow2,50.2353 +1048576,2,radixsort_non_pow2,50.6454 +1048576,3,cpu_pow2,50.8451 +1048576,3,cpu_non_pow2,51.7985 +1048576,3,radixsort_pow2,50.0359 +1048576,3,radixsort_non_pow2,51.182 +1048576,4,cpu_pow2,50.7828 +1048576,4,cpu_non_pow2,51.3079 +1048576,4,radixsort_pow2,49.7542 +1048576,4,radixsort_non_pow2,50.2588 
+1048576,5,cpu_pow2,50.5902 +1048576,5,cpu_non_pow2,49.7813 +1048576,5,radixsort_pow2,50.2263 +1048576,5,radixsort_non_pow2,50.1767 +1048576,6,cpu_pow2,49.979 +1048576,6,cpu_non_pow2,50.7429 +1048576,6,radixsort_pow2,49.4942 +1048576,6,radixsort_non_pow2,49.4941 +1048576,7,cpu_pow2,51.1298 +1048576,7,cpu_non_pow2,52.1358 +1048576,7,radixsort_pow2,50.1065 +1048576,7,radixsort_non_pow2,51.6395 +1048576,8,cpu_pow2,50.1754 +1048576,8,cpu_non_pow2,50.5928 +1048576,8,radixsort_pow2,49.076 +1048576,8,radixsort_non_pow2,52.1901 +1048576,9,cpu_pow2,50.7922 +1048576,9,cpu_non_pow2,50.5043 +1048576,9,radixsort_pow2,50.7409 +1048576,9,radixsort_non_pow2,50.9494 +1048576,10,cpu_pow2,49.3953 +1048576,10,cpu_non_pow2,50.3009 +1048576,10,radixsort_pow2,50.0728 +1048576,10,radixsort_non_pow2,50.6073 +2097152,1,cpu_pow2,99.2969 +2097152,1,cpu_non_pow2,98.844 +2097152,1,radixsort_pow2,57.5874 +2097152,1,radixsort_non_pow2,56.3668 +2097152,2,cpu_pow2,98.2919 +2097152,2,cpu_non_pow2,99.0055 +2097152,2,radixsort_pow2,58.1849 +2097152,2,radixsort_non_pow2,57.1159 +2097152,3,cpu_pow2,99.7459 +2097152,3,cpu_non_pow2,100.664 +2097152,3,radixsort_pow2,56.7805 +2097152,3,radixsort_non_pow2,56.1347 +2097152,4,cpu_pow2,99.5306 +2097152,4,cpu_non_pow2,101.685 +2097152,4,radixsort_pow2,58.3835 +2097152,4,radixsort_non_pow2,57.0717 +2097152,5,cpu_pow2,100.764 +2097152,5,cpu_non_pow2,101.344 +2097152,5,radixsort_pow2,57.3329 +2097152,5,radixsort_non_pow2,58.6157 +2097152,6,cpu_pow2,102.15 +2097152,6,cpu_non_pow2,101.258 +2097152,6,radixsort_pow2,57.9577 +2097152,6,radixsort_non_pow2,58.5743 +2097152,7,cpu_pow2,98.6016 +2097152,7,cpu_non_pow2,98.6839 +2097152,7,radixsort_pow2,56.3813 +2097152,7,radixsort_non_pow2,58.0832 +2097152,8,cpu_pow2,99.5248 +2097152,8,cpu_non_pow2,101.089 +2097152,8,radixsort_pow2,57.4688 +2097152,8,radixsort_non_pow2,58.6643 +2097152,9,cpu_pow2,100.417 +2097152,9,cpu_non_pow2,98.967 +2097152,9,radixsort_pow2,55.6776 +2097152,9,radixsort_non_pow2,57.4975 
+2097152,10,cpu_pow2,102.541 +2097152,10,cpu_non_pow2,101.076 +2097152,10,radixsort_pow2,57.8341 +2097152,10,radixsort_non_pow2,57.5494 +4194304,1,cpu_pow2,195.23 +4194304,1,cpu_non_pow2,194.335 +4194304,1,radixsort_pow2,73.6001 +4194304,1,radixsort_non_pow2,74.0202 +4194304,2,cpu_pow2,195.094 +4194304,2,cpu_non_pow2,190.167 +4194304,2,radixsort_pow2,72.3023 +4194304,2,radixsort_non_pow2,77.5372 +4194304,3,cpu_pow2,197.997 +4194304,3,cpu_non_pow2,193.224 +4194304,3,radixsort_pow2,72.8361 +4194304,3,radixsort_non_pow2,75.0703 +4194304,4,cpu_pow2,195.771 +4194304,4,cpu_non_pow2,196.456 +4194304,4,radixsort_pow2,75.5524 +4194304,4,radixsort_non_pow2,73.361 +4194304,5,cpu_pow2,195.885 +4194304,5,cpu_non_pow2,193.249 +4194304,5,radixsort_pow2,74.1706 +4194304,5,radixsort_non_pow2,73.4458 +4194304,6,cpu_pow2,198.366 +4194304,6,cpu_non_pow2,197.27 +4194304,6,radixsort_pow2,73.3776 +4194304,6,radixsort_non_pow2,72.547 +4194304,7,cpu_pow2,199.973 +4194304,7,cpu_non_pow2,195.923 +4194304,7,radixsort_pow2,74.1127 +4194304,7,radixsort_non_pow2,73.2485 +4194304,8,cpu_pow2,194.349 +4194304,8,cpu_non_pow2,195.795 +4194304,8,radixsort_pow2,72.7879 +4194304,8,radixsort_non_pow2,73.6293 +4194304,9,cpu_pow2,199.44 +4194304,9,cpu_non_pow2,196.657 +4194304,9,radixsort_pow2,73.8464 +4194304,9,radixsort_non_pow2,73.5133 +4194304,10,cpu_pow2,196.2 +4194304,10,cpu_non_pow2,195.402 +4194304,10,radixsort_pow2,73.7365 +4194304,10,radixsort_non_pow2,73.4385 +8388608,1,cpu_pow2,389.795 +8388608,1,cpu_non_pow2,401.727 +8388608,1,radixsort_pow2,107.963 +8388608,1,radixsort_non_pow2,105.163 +8388608,2,cpu_pow2,385.408 +8388608,2,cpu_non_pow2,383.691 +8388608,2,radixsort_pow2,105.487 +8388608,2,radixsort_non_pow2,105.208 +8388608,3,cpu_pow2,391.59 +8388608,3,cpu_non_pow2,374.424 +8388608,3,radixsort_pow2,104.287 +8388608,3,radixsort_non_pow2,107.228 +8388608,4,cpu_pow2,377.539 +8388608,4,cpu_non_pow2,395.694 +8388608,4,radixsort_pow2,104.934 +8388608,4,radixsort_non_pow2,108.62 
+8388608,5,cpu_pow2,383.13 +8388608,5,cpu_non_pow2,387.813 +8388608,5,radixsort_pow2,105.347 +8388608,5,radixsort_non_pow2,102.123 +8388608,6,cpu_pow2,393.158 +8388608,6,cpu_non_pow2,387.59 +8388608,6,radixsort_pow2,104.947 +8388608,6,radixsort_non_pow2,105.546 +8388608,7,cpu_pow2,393.783 +8388608,7,cpu_non_pow2,398.769 +8388608,7,radixsort_pow2,106.472 +8388608,7,radixsort_non_pow2,104.916 +8388608,8,cpu_pow2,380.899 +8388608,8,cpu_non_pow2,385.233 +8388608,8,radixsort_pow2,104.837 +8388608,8,radixsort_non_pow2,108.965 +8388608,9,cpu_pow2,388.114 +8388608,9,cpu_non_pow2,383.582 +8388608,9,radixsort_pow2,104.838 +8388608,9,radixsort_non_pow2,107.71 +8388608,10,cpu_pow2,385.488 +8388608,10,cpu_non_pow2,378.217 +8388608,10,radixsort_pow2,104.353 +8388608,10,radixsort_non_pow2,104.079 +16777216,1,cpu_pow2,774.947 +16777216,1,cpu_non_pow2,760.568 +16777216,1,radixsort_pow2,169.911 +16777216,1,radixsort_non_pow2,160.96 +16777216,2,cpu_pow2,767.042 +16777216,2,cpu_non_pow2,755.257 +16777216,2,radixsort_pow2,167.325 +16777216,2,radixsort_non_pow2,159.101 +16777216,3,cpu_pow2,767.151 +16777216,3,cpu_non_pow2,761.937 +16777216,3,radixsort_pow2,168.185 +16777216,3,radixsort_non_pow2,162.628 +16777216,4,cpu_pow2,765.656 +16777216,4,cpu_non_pow2,772.266 +16777216,4,radixsort_pow2,167.013 +16777216,4,radixsort_non_pow2,159.146 +16777216,5,cpu_pow2,770.765 +16777216,5,cpu_non_pow2,765.667 +16777216,5,radixsort_pow2,166.513 +16777216,5,radixsort_non_pow2,160.025 +16777216,6,cpu_pow2,762.95 +16777216,6,cpu_non_pow2,770.438 +16777216,6,radixsort_pow2,167.253 +16777216,6,radixsort_non_pow2,169.067 +16777216,7,cpu_pow2,759.056 +16777216,7,cpu_non_pow2,761.054 +16777216,7,radixsort_pow2,167.528 +16777216,7,radixsort_non_pow2,158.333 +16777216,8,cpu_pow2,753.022 +16777216,8,cpu_non_pow2,753.03 +16777216,8,radixsort_pow2,168.375 +16777216,8,radixsort_non_pow2,162.041 +16777216,9,cpu_pow2,756.446 +16777216,9,cpu_non_pow2,764.814 +16777216,9,radixsort_pow2,166.933 
+16777216,9,radixsort_non_pow2,159.813 +16777216,10,cpu_pow2,775.042 +16777216,10,cpu_non_pow2,759.894 +16777216,10,radixsort_pow2,166.249 +16777216,10,radixsort_non_pow2,161.447 +33554432,1,cpu_pow2,1539.45 +33554432,1,cpu_non_pow2,1523.08 +33554432,1,radixsort_pow2,290.056 +33554432,1,radixsort_non_pow2,279.519 +33554432,2,cpu_pow2,1533.33 +33554432,2,cpu_non_pow2,1492.94 +33554432,2,radixsort_pow2,286.482 +33554432,2,radixsort_non_pow2,278.727 +33554432,3,cpu_pow2,1518.03 +33554432,3,cpu_non_pow2,1520.47 +33554432,3,radixsort_pow2,286.96 +33554432,3,radixsort_non_pow2,279.168 +33554432,4,cpu_pow2,1513.25 +33554432,4,cpu_non_pow2,1515.85 +33554432,4,radixsort_pow2,290.399 +33554432,4,radixsort_non_pow2,279.422 +33554432,5,cpu_pow2,1516.88 +33554432,5,cpu_non_pow2,1507.03 +33554432,5,radixsort_pow2,287.379 +33554432,5,radixsort_non_pow2,303.782 +33554432,6,cpu_pow2,1526.54 +33554432,6,cpu_non_pow2,1518.3 +33554432,6,radixsort_pow2,292.845 +33554432,6,radixsort_non_pow2,279.78 +33554432,7,cpu_pow2,1516.4 +33554432,7,cpu_non_pow2,1501.75 +33554432,7,radixsort_pow2,285.407 +33554432,7,radixsort_non_pow2,278.723 +33554432,8,cpu_pow2,1539.81 +33554432,8,cpu_non_pow2,1504.19 +33554432,8,radixsort_pow2,285.582 +33554432,8,radixsort_non_pow2,278.672 +33554432,9,cpu_pow2,1505.77 +33554432,9,cpu_non_pow2,1536.85 +33554432,9,radixsort_pow2,290.809 +33554432,9,radixsort_non_pow2,305.345 +33554432,10,cpu_pow2,1546.37 +33554432,10,cpu_non_pow2,1518.21 +33554432,10,radixsort_pow2,289.364 +33554432,10,radixsort_non_pow2,280.599 +67108864,1,cpu_pow2,3068.55 +67108864,1,cpu_non_pow2,3012.11 +67108864,1,radixsort_pow2,515.612 +67108864,1,radixsort_non_pow2,519.143 +67108864,2,cpu_pow2,3005.19 +67108864,2,cpu_non_pow2,3028.14 +67108864,2,radixsort_pow2,526.227 +67108864,2,radixsort_non_pow2,515.964 +67108864,3,cpu_pow2,3058.06 +67108864,3,cpu_non_pow2,3049.14 +67108864,3,radixsort_pow2,521.026 +67108864,3,radixsort_non_pow2,515.174 +67108864,4,cpu_pow2,3010.2 
+67108864,4,cpu_non_pow2,3019.12 +67108864,4,radixsort_pow2,517.42 +67108864,4,radixsort_non_pow2,516.477 +67108864,5,cpu_pow2,3032.43 +67108864,5,cpu_non_pow2,3002.39 +67108864,5,radixsort_pow2,521.633 +67108864,5,radixsort_non_pow2,516.546 +67108864,6,cpu_pow2,3027.27 +67108864,6,cpu_non_pow2,3023.65 +67108864,6,radixsort_pow2,520.2 +67108864,6,radixsort_non_pow2,517.002 +67108864,7,cpu_pow2,3067.05 +67108864,7,cpu_non_pow2,3092.49 +67108864,7,radixsort_pow2,536.546 +67108864,7,radixsort_non_pow2,518.33 +67108864,8,cpu_pow2,3017.05 +67108864,8,cpu_non_pow2,3019.5 +67108864,8,radixsort_pow2,524.98 +67108864,8,radixsort_non_pow2,519.893 +67108864,9,cpu_pow2,3011.92 +67108864,9,cpu_non_pow2,2995.26 +67108864,9,radixsort_pow2,523.85 +67108864,9,radixsort_non_pow2,517.899 +67108864,10,cpu_pow2,3003.54 +67108864,10,cpu_non_pow2,3012.17 +67108864,10,radixsort_pow2,520.796 +67108864,10,radixsort_non_pow2,519.375 +134217728,1,cpu_pow2,5936.73 +134217728,1,cpu_non_pow2,6091.71 +134217728,1,radixsort_pow2,1044.68 +134217728,1,radixsort_non_pow2,1035.31 +134217728,2,cpu_pow2,6000.72 +134217728,2,cpu_non_pow2,5948.19 +134217728,2,radixsort_pow2,1033.53 +134217728,2,radixsort_non_pow2,1027.97 +134217728,3,cpu_pow2,5981.03 +134217728,3,cpu_non_pow2,5984.76 +134217728,3,radixsort_pow2,1032.85 +134217728,3,radixsort_non_pow2,1028.51 +134217728,4,cpu_pow2,5893.52 +134217728,4,cpu_non_pow2,5995.66 +134217728,4,radixsort_pow2,1036.58 +134217728,4,radixsort_non_pow2,1026.83 +134217728,5,cpu_pow2,5970.86 +134217728,5,cpu_non_pow2,6050.97 +134217728,5,radixsort_pow2,1057.49 +134217728,5,radixsort_non_pow2,1027.12 +134217728,6,cpu_pow2,6114.02 +134217728,6,cpu_non_pow2,6219.39 +134217728,6,radixsort_pow2,1049.61 +134217728,6,radixsort_non_pow2,1082.07 +134217728,7,cpu_pow2,6049.7 +134217728,7,cpu_non_pow2,6038.77 +134217728,7,radixsort_pow2,1037.27 +134217728,7,radixsort_non_pow2,1031.87 +134217728,8,cpu_pow2,6179.4 +134217728,8,cpu_non_pow2,6105.52 +134217728,8,radixsort_pow2,1089.13 
+134217728,8,radixsort_non_pow2,1081.17 +134217728,9,cpu_pow2,6135.11 +134217728,9,cpu_non_pow2,6102.33 +134217728,9,radixsort_pow2,1062.17 +134217728,9,radixsort_non_pow2,1052.3 +134217728,10,cpu_pow2,6062.03 +134217728,10,cpu_non_pow2,6224.28 +134217728,10,radixsort_pow2,1061.0 +134217728,10,radixsort_non_pow2,1054.05 diff --git a/plots/data/radix_timings_avg.csv b/plots/data/radix_timings_avg.csv new file mode 100644 index 00000000..741355a8 --- /dev/null +++ b/plots/data/radix_timings_avg.csv @@ -0,0 +1,81 @@ +size,method,time_ms +256,cpu_non_pow2,0.00713 +256,cpu_pow2,0.00878 +256,radixsort_non_pow2,6.19346 +256,radixsort_pow2,4.945831 +512,cpu_non_pow2,0.01636 +512,cpu_pow2,0.0186 +512,radixsort_non_pow2,5.848578 +512,radixsort_pow2,4.871586 +1024,cpu_non_pow2,0.039029999999999995 +1024,cpu_pow2,0.04004 +1024,radixsort_non_pow2,9.699127 +1024,radixsort_pow2,8.704512 +2048,cpu_non_pow2,0.08917 +2048,cpu_pow2,0.09833 +2048,radixsort_non_pow2,9.690515 +2048,radixsort_pow2,8.518358000000001 +4096,cpu_non_pow2,0.17701 +4096,cpu_pow2,0.18516 +4096,radixsort_non_pow2,9.910680000000001 +4096,radixsort_pow2,8.647372 +8192,cpu_non_pow2,0.41058000000000006 +8192,cpu_pow2,0.41955 +8192,radixsort_non_pow2,9.76272 +8192,radixsort_pow2,8.769546 +16384,cpu_non_pow2,0.7470100000000001 +16384,cpu_pow2,0.84139 +16384,radixsort_non_pow2,9.737833 +16384,radixsort_pow2,8.49664 +32768,cpu_non_pow2,1.53005 +32768,cpu_pow2,1.72796 +32768,radixsort_non_pow2,9.707401 +32768,radixsort_pow2,8.607403999999999 +65536,cpu_non_pow2,3.3447699999999996 +65536,cpu_pow2,3.4475899999999995 +65536,radixsort_non_pow2,9.447094 +65536,radixsort_pow2,8.248005000000001 +131072,cpu_non_pow2,6.893680000000001 +131072,cpu_pow2,6.973090000000001 +131072,radixsort_non_pow2,10.297345 +131072,radixsort_pow2,10.213548 +262144,cpu_non_pow2,13.681030000000002 +262144,cpu_pow2,13.77611 +262144,radixsort_non_pow2,40.68352 +262144,radixsort_pow2,39.84491 +524288,cpu_non_pow2,26.028609999999997 
+524288,cpu_pow2,26.13623 +524288,radixsort_non_pow2,47.05912 +524288,radixsort_pow2,46.81048 +1048576,cpu_non_pow2,50.85056 +1048576,cpu_pow2,50.66982 +1048576,radixsort_non_pow2,50.808980000000005 +1048576,radixsort_pow2,50.13673 +2097152,cpu_non_pow2,100.26164 +2097152,cpu_pow2,100.08637 +2097152,radixsort_non_pow2,57.56735 +2097152,radixsort_pow2,57.35887 +4194304,cpu_non_pow2,194.8478 +4194304,cpu_pow2,196.8305 +4194304,radixsort_non_pow2,73.98111 +4194304,radixsort_pow2,73.63226 +8388608,cpu_non_pow2,387.674 +8388608,cpu_pow2,386.8904 +8388608,radixsort_non_pow2,105.9558 +8388608,radixsort_pow2,105.34649999999999 +16777216,cpu_non_pow2,762.4925000000001 +16777216,cpu_pow2,765.2077 +16777216,radixsort_non_pow2,161.2561 +16777216,radixsort_pow2,167.52849999999998 +33554432,cpu_non_pow2,1513.867 +33554432,cpu_pow2,1525.583 +33554432,radixsort_non_pow2,284.3737 +33554432,radixsort_pow2,288.5283 +67108864,cpu_non_pow2,3025.397 +67108864,cpu_pow2,3030.126 +67108864,radixsort_non_pow2,517.5803 +67108864,radixsort_pow2,522.829 +134217728,cpu_non_pow2,6076.158 +134217728,cpu_pow2,6032.312 +134217728,radixsort_non_pow2,1044.7199999999998 +134217728,radixsort_pow2,1050.431 diff --git a/plots/data/timings.txt b/plots/data/timings.txt new file mode 100644 index 00000000..34aea946 --- /dev/null +++ b/plots/data/timings.txt @@ -0,0 +1,3776 @@ +** SCAN TESTS ** +SIZE= 1048576 +0.4504 +0.4851 +1.36806 +1.63558 +0.2928 +0.119808 +0.591872 +0.871424 + +1.7626 +1.6515 +2.9795 +0.485824 +0.269312 +** SCAN TESTS ** +SIZE= 1048576 +0.5501 +0.4727 +0.99328 +0.896576 +0.991584 +0.15872 +0.504832 +0.50688 + +1.7609 +1.7537 +3.0031 +0.57184 +0.28672 +** SCAN TESTS ** +SIZE= 1048576 +0.4682 +0.5173 +1.0007 +0.8968 +0.33104 +0.129024 +0.555008 +0.534528 + +1.788 +1.7597 +2.905 +0.562208 +0.236544 +** SCAN TESTS ** +SIZE= 1048576 +0.5049 +0.5013 +1.05923 +1.55363 +0.309632 +0.128 +0.5632 +1.18886 + +1.7676 +1.7558 +3.0609 +0.45408 +0.289792 +** SCAN TESTS ** +SIZE= 1048576 +0.5039 +0.4544 
+1.04778 +1.02218 +0.269248 +0.130048 +1.02502 +1.03936 + +1.7631 +1.767 +3.1677 +0.541056 +0.498688 +** SCAN TESTS ** +SIZE= 1048576 +0.5116 +0.5015 +1.12358 +0.924448 +0.275488 +0.126976 +0.576512 +0.52736 + +1.7716 +1.7523 +2.8995 +0.485312 +0.267264 +** SCAN TESTS ** +SIZE= 1048576 +0.6063 +0.472 +1.09014 +0.907424 +0.287744 +0.121856 +0.555008 +0.627712 + +1.7736 +1.7799 +2.9913 +0.481152 +0.231424 +** SCAN TESTS ** +SIZE= 1048576 +0.54 +0.4696 +0.990528 +0.90704 +0.49472 +0.130048 +0.782336 +0.572416 + +1.7738 +1.7718 +2.9814 +0.683872 +0.221184 +** SCAN TESTS ** +SIZE= 1048576 +0.4714 +0.4571 +1.06557 +0.888192 +0.289152 +0.216064 +0.490496 +0.564224 + +1.772 +1.7727 +2.9239 +0.821728 +0.882688 +** SCAN TESTS ** +SIZE= 1048576 +0.6784 +0.473 +1.01606 +0.959392 +0.44688 +0.119808 +0.566272 +0.483328 + +1.7899 +1.7777 +3.0413 +1.20099 +0.2816 +** SCAN TESTS ** +SIZE= 524288 +0.2726 +0.2448 +0.923296 +0.919456 +0.319264 +0.145408 +0.570368 +0.499712 + +0.8899 +0.8792 +1.484 +0.366624 +0.139264 +** SCAN TESTS ** +SIZE= 524288 +0.2597 +3.2616 +1.64019 +0.768576 +0.218208 +0.078848 +0.5376 +0.524288 + +0.8938 +0.8795 +1.4681 +0.3992 +0.1536 +** SCAN TESTS ** +SIZE= 524288 +0.2345 +0.2091 +0.795232 +0.656224 +0.352256 +0.079872 +0.551936 +0.507904 + +0.9156 +0.8925 +1.5577 +0.403616 +0.157696 +** SCAN TESTS ** +SIZE= 524288 +0.2419 +0.2087 +1.56384 +0.628512 +0.778464 +0.1024 +0.520192 +0.49152 + +0.8868 +0.8961 +1.4897 +0.357472 +0.18944 +** SCAN TESTS ** +SIZE= 524288 +0.334 +0.2255 +0.72784 +0.688512 +0.321056 +0.079872 +0.550912 +0.482304 + +0.8902 +0.8859 +1.5687 +0.439872 +0.190464 +** SCAN TESTS ** +SIZE= 524288 +0.3269 +0.2151 +0.756384 +0.582912 +0.224256 +0.11776 +0.618496 +0.505856 + +0.8836 +0.8815 +1.4793 +0.35952 +0.205824 +** SCAN TESTS ** +SIZE= 524288 +0.2721 +0.334 +0.797824 +0.861888 +0.263424 +0.110592 +1.24621 +0.480256 + +0.8919 +0.8867 +1.5136 +0.351904 +0.167936 +** SCAN TESTS ** +SIZE= 524288 +0.2777 +0.2814 +1.23696 +0.777856 +0.535456 
+0.080896 +0.559104 +0.521216 + +0.8926 +0.8845 +1.4883 +0.362944 +0.197632 +** SCAN TESTS ** +SIZE= 524288 +0.2368 +0.3119 +0.775712 +0.759072 +0.58144 +0.105472 +0.560128 +0.482304 + +0.8946 +0.8832 +1.6546 +0.364192 +0.150528 +** SCAN TESTS ** +SIZE= 524288 +0.2817 +0.316 +0.665056 +0.855264 +0.207776 +0.077824 +0.574464 +0.956416 + +0.8979 +0.8796 +1.466 +0.436096 +0.139264 +** SCAN TESTS ** +SIZE= 524288 +0.3001 +0.2419 +0.897216 +0.684736 +0.24432 +0.106496 +0.668672 +0.726016 + +0.886 +0.8807 +1.5079 +0.305024 +0.187392 +** SCAN TESTS ** +SIZE= 524288 +0.2796 +0.2488 +0.63728 +0.867808 +0.216096 +0.08192 +0.600064 +0.521216 + +0.8975 +0.8859 +1.481 +0.371648 +0.181248 +** SCAN TESTS ** +SIZE= 524288 +0.2721 +0.223 +0.832608 +0.90384 +0.226368 +0.08192 +0.525312 +0.451584 + +0.8892 +0.8826 +1.4629 +0.826112 +0.147456 +** SCAN TESTS ** +SIZE= 524288 +0.2353 +0.2863 +1.67933 +0.684672 +0.225216 +0.090112 +0.516096 +0.50688 + +0.8852 +0.8858 +1.5655 +0.390144 +0.192512 +** SCAN TESTS ** +SIZE= 262144 +0.1314 +0.0997 +1.42598 +1.02925 +0.202048 +0.083968 +0.536576 +0.622592 + +0.4531 +0.4483 +0.7781 +0.221024 +0.103424 +** SCAN TESTS ** +SIZE= 262144 +0.1202 +0.1002 +0.5288 +0.388416 +0.202304 +0.062464 +0.549888 +0.546816 + +0.4517 +0.4558 +0.7405 +0.349216 +0.104448 +** SCAN TESTS ** +SIZE= 262144 +0.128 +0.1123 +0.486112 +0.406784 +0.207776 +0.074752 +0.576512 +0.585728 + +0.4498 +0.4545 +0.7724 +0.250848 +0.113664 +** SCAN TESTS ** +SIZE= 262144 +0.1185 +0.124 +0.512032 +0.600384 +0.266528 +0.06144 +0.605184 +0.605184 + +0.4484 +0.4363 +0.7507 +0.309856 +0.11776 +** SCAN TESTS ** +SIZE= 262144 +0.1297 +0.1247 +0.47056 +0.394112 +0.197408 +0.063488 +0.514048 +0.637952 + +0.4533 +0.4422 +0.7578 +0.294272 +0.110592 +** SCAN TESTS ** +SIZE= 262144 +0.1519 +0.1236 +0.498464 +0.505952 +0.202048 +0.095232 +0.470016 +0.448512 + +0.446 +0.4488 +0.7608 +0.287456 +0.101376 +** SCAN TESTS ** +SIZE= 262144 +0.1315 +0.1236 +0.448384 +0.430176 +0.22336 +0.063488 +0.587776 
+0.466944 + +0.4529 +0.4493 +0.7285 +0.354048 +0.101376 +** SCAN TESTS ** +SIZE= 262144 +0.1422 +0.126 +0.46608 +0.38432 +0.199616 +0.063488 +0.598016 +0.49152 + +0.4483 +0.4374 +0.7773 +0.300608 +0.100352 +** SCAN TESTS ** +SIZE= 262144 +0.1484 +0.1519 +0.476 +0.413568 +0.192672 +0.074752 +0.52224 +0.497664 + +0.4477 +0.4446 +0.7858 +0.329856 +0.101376 +** SCAN TESTS ** +SIZE= 262144 +0.1363 +0.1041 +0.54752 +0.397024 +0.19744 +0.063488 +0.562176 +0.454656 + +0.4555 +0.462 +0.7988 +0.277088 +0.118784 +** SCAN TESTS ** +SIZE= 262144 +0.1295 +0.1524 +0.560736 +0.408288 +0.26848 +0.075776 +0.545792 +0.50688 + +0.4531 +0.4515 +0.8054 +0.41584 +0.13824 +** SCAN TESTS ** +SIZE= 262144 +0.1357 +0.1516 +0.47424 +0.82832 +0.226016 +0.0768 +0.560128 +0.544672 + +0.4537 +0.4384 +0.8328 +0.871456 +0.105472 +** SCAN TESTS ** +SIZE= 262144 +0.1322 +0.1218 +0.488352 +0.378112 +0.220448 +0.06256 +0.502784 +0.502784 + +0.4574 +0.4532 +0.8122 +0.343904 +0.11888 +** SCAN TESTS ** +SIZE= 262144 +0.1223 +0.1229 +0.573664 +0.396352 +0.252704 +0.063488 +0.618496 +0.470016 + +0.458 +0.4536 +0.792 +0.3864 +0.115712 +** SCAN TESTS ** +SIZE= 262144 +0.1221 +0.1237 +0.583712 +0.387616 +0.24896 +0.062464 +0.5376 +0.636928 + +0.4624 +0.4436 +0.8024 +0.3528 +0.110592 +** SCAN TESTS ** +SIZE= 262144 +0.12 +0.1097 +0.485696 +0.365184 +0.223264 +0.062464 +0.585728 +0.49664 + +0.4463 +0.4437 +0.7758 +0.808992 +0.1024 +** SCAN TESTS ** +SIZE= 131072 +0.0641 +0.069 +0.423104 +0.610944 +0.197248 +0.056352 +0.14336 +0.047104 + +0.2291 +0.2283 +0.3813 +0.234144 +0.099328 +** SCAN TESTS ** +SIZE= 131072 +0.058 +0.0485 +1.17872 +0.313568 +0.23776 +0.068608 +0.132096 +0.045056 + +0.2289 +0.2181 +0.3837 +0.231712 +0.162816 +** SCAN TESTS ** +SIZE= 131072 +0.0716 +0.0612 +0.571392 +0.338112 +0.177792 +0.057344 +0.160768 +0.048128 + +0.226 +0.2171 +0.3918 +0.23936 +0.08704 +** SCAN TESTS ** +SIZE= 131072 +0.0672 +0.0507 +1.22704 +0.41136 +0.181312 +0.105472 +0.106496 +0.045056 + +0.2274 +0.2205 +0.3919 
+0.240768 +0.130048 +** SCAN TESTS ** +SIZE= 131072 +0.0628 +0.0569 +0.508416 +0.386208 +0.192512 +0.057344 +0.155648 +0.047104 + +0.2477 +0.2164 +0.4028 +0.35344 +0.08704 +** SCAN TESTS ** +SIZE= 131072 +0.0652 +0.0605 +0.52832 +0.337024 +0.177568 +0.067584 +0.103424 +0.046016 + +0.2262 +0.2268 +0.3835 +0.249344 +0.28672 +** SCAN TESTS ** +SIZE= 131072 +0.066 +0.0593 +0.452096 +0.336096 +0.233248 +0.05744 +0.110592 +0.04608 + +0.2286 +0.2323 +0.3869 +0.223744 +0.083968 +** SCAN TESTS ** +SIZE= 131072 +0.0667 +0.0484 +0.71808 +0.406528 +0.228768 +0.059392 +0.105472 +0.083968 + +0.227 +0.2265 +0.3798 +0.210304 +0.088064 +** SCAN TESTS ** +SIZE= 131072 +0.0598 +0.0604 +0.476672 +0.367616 +0.17536 +0.05632 +0.10752 +0.050176 + +0.2286 +0.224 +0.4248 +0.263264 +0.104448 +** SCAN TESTS ** +SIZE= 131072 +0.061 +0.0508 +0.626464 +0.346592 +0.177152 +0.05632 +0.145408 +0.047008 + +0.2259 +0.2186 +0.3903 +0.2552 +0.109568 +** SCAN TESTS ** +SIZE= 131072 +0.0653 +0.075 +0.523136 +0.341312 +0.218432 +0.101376 +0.120832 +0.057344 + +0.2588 +0.2565 +0.439 +0.341088 +0.08704 +** SCAN TESTS ** +SIZE= 131072 +0.057 +0.0485 +0.452832 +1.05754 +0.223168 +0.068608 +0.113664 +0.053248 + +0.2266 +0.2179 +0.3915 +0.283936 +0.101376 +** SCAN TESTS ** +SIZE= 65536 +0.0298 +0.0273 +0.50256 +0.40112 +0.199424 +0.053248 +0.109568 +0.048128 + +0.1155 +0.1101 +0.2443 +0.193216 +0.269312 +** SCAN TESTS ** +SIZE= 65536 +0.0349 +0.034 +0.446688 +0.36672 +0.176256 +0.055296 +0.128 +0.083968 + +0.115 +0.1158 +0.1965 +0.228288 +0.113664 +** SCAN TESTS ** +SIZE= 65536 +0.0291 +0.0269 +0.444896 +0.314976 +0.22944 +0.052224 +0.13824 +0.078848 + +0.118 +0.1188 +0.2145 +0.255872 +0.088064 +** SCAN TESTS ** +SIZE= 65536 +0.0413 +0.0267 +0.467008 +0.281504 +0.220832 +0.053248 +0.105472 +0.050176 + +0.1175 +0.1124 +0.2082 +0.17616 +0.099328 +** SCAN TESTS ** +SIZE= 65536 +0.0282 +0.0271 +0.43968 +0.331424 +0.195552 +0.055296 +0.125952 +0.079872 + +0.1166 +0.1098 +0.2073 +0.256384 +0.082944 +** SCAN TESTS ** 
+SIZE= 65536 +0.0357 +0.0344 +0.572032 +0.274688 +0.204416 +0.053248 +0.104448 +0.04608 + +0.115 +0.1097 +0.217 +0.174432 +0.114688 +** SCAN TESTS ** +SIZE= 65536 +0.0352 +0.0341 +0.571328 +0.437504 +0.22624 +0.08704 +0.108544 +0.049152 + +0.1155 +0.1101 +0.2106 +0.316192 +0.079872 +** SCAN TESTS ** +SIZE= 65536 +0.0299 +0.0288 +0.464736 +0.297248 +0.207424 +0.052224 +0.104448 +0.04608 + +0.113 +0.107 +0.2086 +0.22704 +0.121856 +** SCAN TESTS ** +SIZE= 65536 +0.0312 +0.028 +0.467232 +0.448608 +0.267168 +0.053312 +0.14848 +0.048128 + +0.115 +0.1095 +0.1975 +0.218912 +0.079872 +** SCAN TESTS ** +SIZE= 65536 +0.0291 +0.0279 +1.16371 +0.336096 +0.218432 +0.052224 +0.126976 +0.048128 + +0.1459 +0.1438 +0.2517 +0.889952 +0.079872 +** SCAN TESTS ** +SIZE= 65536 +0.0346 +0.034 +0.567616 +0.3104 +0.202944 +0.053248 +0.137216 +0.093184 + +0.1468 +0.1448 +0.2769 +0.279424 +0.09216 +** SCAN TESTS ** +SIZE= 32768 +0.017 +0.016 +0.399936 +0.355616 +0.179584 +0.08704 +0.10752 +0.044032 + +0.0643 +0.0539 +0.1181 +0.21072 +0.079872 +** SCAN TESTS ** +SIZE= 32768 +0.0151 +0.0139 +0.393152 +0.447936 +0.176512 +0.0512 +0.141312 +0.029696 + +0.0583 +0.0581 +0.1111 +0.227392 +0.077824 +** SCAN TESTS ** +SIZE= 32768 +0.0139 +0.0132 +0.518368 +0.3568 +0.203072 +0.082944 +0.123872 +0.04608 + +0.066 +0.0546 +0.1048 +0.208608 +0.078848 +** SCAN TESTS ** +SIZE= 32768 +0.0167 +0.016 +0.4968 +0.338592 +0.21152 +0.084992 +0.144384 +0.03072 + +0.0595 +0.0586 +0.1106 +0.257408 +0.110592 +** SCAN TESTS ** +SIZE= 32768 +0.0206 +0.0129 +0.474816 +0.27232 +0.203456 +0.0512 +0.124928 +0.0768 + +0.0585 +0.0582 +0.1071 +0.255168 +0.146432 +** SCAN TESTS ** +SIZE= 32768 +0.0141 +0.0131 +0.488864 +0.31312 +0.17968 +0.052256 +0.105472 +0.044032 + +0.0581 +0.0544 +0.1183 +0.18912 +0.108544 +** SCAN TESTS ** +SIZE= 32768 +0.0173 +0.0166 +0.491584 +0.354208 +0.213728 +0.06144 +0.110592 +0.059392 + +0.0707 +0.0709 +0.1106 +0.221376 +0.101376 +** SCAN TESTS ** +SIZE= 32768 +0.0144 +0.0131 +0.482304 +0.740416 
+0.220768 +0.053248 +0.118784 +0.050112 + +0.0594 +0.0588 +0.1067 +0.209408 +0.078848 +** SCAN TESTS ** +SIZE= 32768 +0.0136 +0.0119 +0.38848 +0.322016 +0.191776 +0.078848 +0.146432 +0.044032 + +0.0654 +0.055 +0.1108 +0.208832 +0.180224 +** SCAN TESTS ** +SIZE= 32768 +0.0167 +0.0159 +0.396736 +0.366464 +0.218144 +0.063488 +0.146432 +0.04096 + +0.0724 +0.067 +0.1376 +0.260352 +0.074752 +** SCAN TESTS ** +SIZE= 16384 +0.0082 +0.0076 +0.385024 +0.301056 +0.171008 +0.072736 +0.108544 +0.05632 + +0.0299 +0.0262 +0.0611 +0.182272 +0.142336 +** SCAN TESTS ** +SIZE= 16384 +0.0073 +0.0069 +0.425984 +0.314368 +0.187392 +0.052256 +0.10752 +0.050176 + +0.0377 +0.0319 +0.0798 +0.173056 +0.091136 +** SCAN TESTS ** +SIZE= 16384 +0.0071 +0.007 +0.976896 +0.270336 +0.748544 +0.095232 +0.10752 +0.047104 + +0.0301 +0.0295 +0.0595 +0.168 +0.144384 +** SCAN TESTS ** +SIZE= 16384 +0.0081 +0.0076 +0.369664 +0.251904 +0.158752 +0.0512 +0.098304 +0.0512 + +0.0377 +0.0349 +0.0776 +0.165888 +0.113664 +** SCAN TESTS ** +SIZE= 16384 +0.0079 +0.0061 +1.00147 +0.297984 +0.151552 +0.086016 +0.108544 +0.082944 + +0.0302 +0.0263 +0.1034 +0.141312 +0.136192 +** SCAN TESTS ** +SIZE= 16384 +0.0083 +0.0074 +0.386048 +0.280576 +0.124928 +0.0512 +0.13312 +0.050176 + +0.0418 +0.0351 +0.0744 +0.164864 +0.125952 +** SCAN TESTS ** +SIZE= 16384 +0.0079 +0.0076 +0.413696 +0.251008 +0.178176 +0.0512 +0.150528 +0.108544 + +0.0309 +0.0277 +0.0567 +0.171008 +0.096256 +** SCAN TESTS ** +SIZE= 16384 +0.0066 +0.0061 +0.473088 +0.351232 +0.131072 +0.075776 +0.130048 +0.047104 + +0.0388 +0.0354 +0.0739 +0.149504 +0.075776 +** SCAN TESTS ** +SIZE= 16384 +0.008 +0.0114 +0.366592 +0.377856 +0.18944 +0.060416 +0.106496 +0.055296 + +0.0307 +0.0278 +0.0614 +0.170048 +0.145408 +** SCAN TESTS ** +SIZE= 16384 +0.0082 +0.0076 +0.996352 +0.301056 +0.15872 +0.050272 +0.10752 +0.04608 + +0.0311 +0.0265 +0.0571 +0.157696 +0.14336 +** SCAN TESTS ** +SIZE= 8192 +0.0043 +0.0039 +0.315392 +0.234496 +0.186368 +0.052224 +0.11776 +0.048128 
+ +0.0184 +0.013 +0.0331 +0.22016 +0.10752 +** SCAN TESTS ** +SIZE= 8192 +0.004 +0.0031 +0.350208 +0.226304 +0.208896 +0.050176 +0.101376 +0.058368 + +0.015 +0.011 +0.0261 +0.181248 +0.147456 +** SCAN TESTS ** +SIZE= 8192 +0.0039 +0.0034 +0.93696 +0.306176 +0.13312 +0.053312 +0.124928 +0.039936 + +0.0153 +0.0107 +0.0266 +0.161792 +0.11264 +** SCAN TESTS ** +SIZE= 8192 +0.0046 +0.0038 +0.443392 +0.247808 +0.178176 +0.06656 +0.144384 +0.04608 + +0.0157 +0.0111 +0.0268 +0.159744 +0.123904 +** SCAN TESTS ** +SIZE= 8192 +0.0034 +0.0035 +0.387072 +0.270336 +0.173056 +0.058368 +0.118784 +0.047104 + +0.0153 +0.0107 +0.0265 +0.169984 +0.113664 +** SCAN TESTS ** +SIZE= 8192 +0.0042 +0.0041 +0.486528 +0.315392 +0.152576 +0.09216 +0.114688 +0.06656 + +0.0154 +0.0103 +0.027 +0.154656 +0.106496 +** SCAN TESTS ** +SIZE= 8192 +0.0044 +0.0038 +0.335872 +0.21504 +0.150528 +0.08704 +0.109568 +0.049152 + +0.0153 +0.0105 +0.0273 +0.690176 +0.074752 +** SCAN TESTS ** +SIZE= 8192 +0.0044 +0.0038 +0.345088 +0.325632 +0.131072 +0.0512 +0.126976 +0.045056 + +0.0152 +0.0104 +0.0265 +0.216064 +0.073728 +** SCAN TESTS ** +SIZE= 8192 +0.0042 +0.0039 +0.342016 +0.233472 +0.132096 +0.08704 +0.187392 +0.045056 + +0.0155 +0.0105 +0.0268 +0.18944 +0.139264 +** SCAN TESTS ** +SIZE= 8192 +0.0037 +0.0075 +0.352256 +0.29184 +0.170048 +0.086016 +0.105472 +0.04608 + +0.0155 +0.0106 +0.0265 +0.13312 +0.075776 +** SCAN TESTS ** +SIZE= 4096 +0.0027 +0.0015 +0.946176 +0.27648 +0.177152 +0.083968 +0.118784 +0.041984 + +0.0076 +0.0045 +0.0121 +0.124928 +0.073728 +** SCAN TESTS ** +SIZE= 4096 +0.0022 +0.0028 +0.907264 +0.221184 +0.14336 +0.072704 +0.109568 +0.045056 + +0.0079 +0.0045 +0.014 +0.171008 +0.072704 +** SCAN TESTS ** +SIZE= 4096 +0.0028 +0.002 +0.325632 +0.277504 +0.16384 +0.083968 +0.130048 +0.041984 + +0.008 +0.0046 +0.0143 +0.20992 +0.126976 +** SCAN TESTS ** +SIZE= 4096 +0.0028 +0.0019 +0.411648 +0.282624 +0.16384 +0.08192 +0.125952 +0.058368 + +0.0079 +0.0047 +0.0123 +0.163904 +0.120832 +** SCAN 
TESTS ** +SIZE= 4096 +0.0018 +0.0016 +0.295936 +0.27136 +0.1792 +0.06144 +0.111616 +0.045056 + +0.0078 +0.0044 +0.0132 +0.196608 +0.079872 +** SCAN TESTS ** +SIZE= 4096 +0.002 +0.0016 +0.268288 +0.241664 +0.145408 +0.0512 +0.11264 +0.079872 + +0.008 +0.0047 +0.0127 +0.160768 +0.142336 +** SCAN TESTS ** +SIZE= 4096 +0.0024 +0.002 +0.33792 +0.214016 +0.234496 +0.06144 +0.1024 +0.044032 + +0.0079 +0.0045 +0.0137 +0.151552 +0.109568 +** SCAN TESTS ** +SIZE= 4096 +0.0026 +0.002 +0.2816 +0.238592 +0.264192 +0.050176 +0.104448 +0.067584 + +0.0078 +0.0044 +0.0125 +0.164864 +0.11264 +** SCAN TESTS ** +SIZE= 4096 +0.0027 +0.002 +0.309248 +0.24064 +0.124928 +0.095232 +0.139264 +0.0512 + +0.0101 +0.0061 +0.0153 +0.212992 +0.098304 +** SCAN TESTS ** +SIZE= 4096 +0.0022 +0.0016 +0.340992 +0.26112 +0.157696 +0.086016 +0.100288 +0.057344 + +0.008 +0.0045 +0.015 +0.200704 +0.114688 +** SCAN TESTS ** +SIZE= 2048 +0.0014 +0.001 +0.385088 +0.299008 +0.12288 +0.083968 +0.11264 +0.045056 + +0.0041 +0.0023 +0.0108 +0.186368 +0.099328 +** SCAN TESTS ** +SIZE= 2048 +0.0015 +0.001 +0.953344 +0.288768 +0.196608 +0.053248 +0.116736 +0.049184 + +0.0042 +0.0022 +0.0084 +0.177152 +0.14336 +** SCAN TESTS ** +SIZE= 2048 +0.0012 +0.0009 +0.805888 +0.269312 +0.192512 +0.052224 +0.105472 +0.041984 + +0.0044 +0.0022 +0.0066 +0.187392 +0.142336 +** SCAN TESTS ** +SIZE= 2048 +0.0012 +0.0009 +0.467968 +0.190464 +0.130048 +0.050176 +0.105472 +0.043072 + +0.0041 +0.0022 +0.0063 +0.16384 +0.098304 +** SCAN TESTS ** +SIZE= 2048 +0.0014 +0.001 +0.294912 +0.289792 +0.123904 +0.084992 +0.1024 +0.054272 + +0.0052 +0.0028 +0.0079 +0.167936 +0.0768 +** SCAN TESTS ** +SIZE= 2048 +0.0013 +0.001 +0.387072 +0.29696 +0.177152 +0.686144 +0.113664 +0.04096 + +0.0044 +0.0021 +0.0056 +0.15872 +0.10752 +** SCAN TESTS ** +SIZE= 2048 +0.0013 +0.0008 +0.280576 +0.19456 +0.126048 +0.052224 +0.110592 +0.050176 + +0.0052 +0.0026 +0.0066 +0.290816 +0.080896 +** SCAN TESTS ** +SIZE= 2048 +0.0016 +0.001 +0.335872 +0.219136 +0.19968 
+0.0512 +0.14336 +0.044032 + +0.0038 +0.002 +0.0049 +0.16896 +0.110592 +** SCAN TESTS ** +SIZE= 2048 +0.0014 +0.001 +0.333824 +0.212992 +0.151552 +0.065536 +0.1024 +0.04496 + +0.0042 +0.0022 +0.0054 +0.16384 +0.72704 +** SCAN TESTS ** +SIZE= 2048 +0.0011 +0.0009 +0.319488 +0.221184 +0.185344 +0.08192 +0.103424 +0.045056 + +0.0043 +0.0023 +0.0068 +0.192512 +0.073728 +** SCAN TESTS ** +SIZE= 2048 +0.0014 +0.001 +0.303104 +0.251904 +0.154624 +0.075776 +0.123904 +0.047104 + +0.0041 +0.002 +0.0052 +0.167936 +0.075776 +** SCAN TESTS ** +SIZE= 1024 +0.001 +0.0006 +0.357376 +0.274432 +0.193536 +0.088064 +0.118784 +0.04608 + +0.0019 +0.001 +0.0026 +0.193536 +0.105472 +** SCAN TESTS ** +SIZE= 1024 +0.0012 +0.0005 +0.30208 +0.241664 +0.178176 +0.08704 +0.128 +0.041984 + +0.0022 +0.0011 +0.0031 +0.141312 +0.0768 +** SCAN TESTS ** +SIZE= 1024 +0.0009 +0.0006 +0.406528 +0.272384 +0.152576 +0.084992 +0.123904 +0.055296 + +0.0028 +0.0013 +0.0035 +0.182272 +0.109568 +** SCAN TESTS ** +SIZE= 1024 +0.0009 +0.0005 +0.326656 +0.201728 +0.154624 +0.08192 +0.10656 +0.04816 + +0.0028 +0.0014 +0.0034 +0.220192 +0.074752 +** SCAN TESTS ** +SIZE= 1024 +0.0008 +0.0004 +0.447488 +0.259072 +0.160768 +0.06144 +0.131008 +0.04608 + +0.0021 +0.001 +0.0026 +0.216064 +0.074752 +** SCAN TESTS ** +SIZE= 1024 +0.0011 +0.0004 +0.35328 +0.201728 +0.177152 +0.08192 +0.125952 +0.041984 + +0.0023 +0.0011 +0.0029 +0.17408 +0.10752 +** SCAN TESTS ** +SIZE= 1024 +0.0015 +0.0005 +0.349184 +0.221184 +0.259072 +0.124928 +0.121792 +0.043008 + +0.0021 +0.0009 +0.0027 +0.165888 +0.198656 +** SCAN TESTS ** +SIZE= 1024 +0.0011 +0.0006 +0.305152 +0.146432 +0.1536 +0.088064 +0.123904 +0.029696 + +0.0021 +0.0011 +0.0028 +0.166912 +0.11264 +** SCAN TESTS ** +SIZE= 1024 +0.0008 +0.0006 +0.306176 +0.181248 +0.156672 +0.086016 +0.149504 +0.045056 + +0.0019 +0.001 +0.0028 +0.175104 +0.125952 +** SCAN TESTS ** +SIZE= 1024 +0.0008 +0.0005 +0.754688 +0.1792 +0.156704 +0.049184 +0.116736 +0.04608 + +0.0022 +0.0011 +0.0028 
+0.195584 +0.109568 +** SCAN TESTS ** +SIZE= 1024 +0.001 +0.0006 +0.372736 +0.180224 +0.176128 +0.0512 +0.105472 +0.048128 + +0.0025 +0.0012 +0.0032 +0.176128 +0.126976 +** SCAN TESTS ** +SIZE= 1024 +0.001 +0.0006 +0.340992 +0.186368 +0.193536 +0.049152 +0.106496 +0.045056 + +0.0028 +0.0013 +0.0034 +0.192512 +0.141312 +** SCAN TESTS ** +SIZE= 1024 +0.0007 +0.0005 +0.361472 +0.229376 +0.160768 +0.048128 +0.139264 +0.045056 + +0.0022 +0.001 +0.0029 +0.160768 +0.073728 +** SCAN TESTS ** +SIZE= 512 +0.0006 +0.0003 +0.293888 +0.180224 +0.108544 +0.016384 +0.120832 +0.077824 + +0.0012 +0.0006 +0.0013 +0.134144 +0.0512 +** SCAN TESTS ** +SIZE= 512 +0.0007 +0.0003 +0.297984 +0.192512 +0.063488 +0.027648 +0.101376 +0.045056 + +0.0011 +0.0005 +0.0013 +0.113664 +0.082944 +** SCAN TESTS ** +SIZE= 512 +0.0007 +0.0004 +0.315392 +0.188416 +0.080896 +0.016384 +0.11776 +0.041984 + +0.0013 +0.0006 +0.0016 +0.113664 +0.093184 +** SCAN TESTS ** +SIZE= 512 +0.0006 +0.0003 +0.3072 +0.233472 +0.067584 +0.024576 +0.108544 +0.041984 + +0.0013 +0.0006 +0.0017 +0.156672 +0.08704 +** SCAN TESTS ** +SIZE= 512 +0.001 +0.0003 +0.325632 +0.23552 +0.099328 +0.017408 +0.124928 +0.047168 + +0.0012 +0.0006 +0.0015 +0.10752 +0.089088 +** SCAN TESTS ** +SIZE= 512 +0.0007 +0.0003 +0.27648 +0.253952 +0.099328 +0.033792 +0.152576 +0.043104 + +0.0015 +0.0007 +0.0019 +0.146432 +0.0512 +** SCAN TESTS ** +SIZE= 512 +0.0007 +0.0003 +1.02605 +0.19968 +0.090112 +0.017408 +0.126976 +0.045056 + +0.0013 +0.0005 +0.0015 +0.140288 +0.083968 +** SCAN TESTS ** +SIZE= 512 +0.0007 +0.0003 +0.32256 +0.200704 +0.059392 +0.016384 +0.142336 +0.04608 + +0.0015 +0.0006 +0.0019 +0.144384 +0.050176 +** SCAN TESTS ** +SIZE= 512 +0.0005 +0.0003 +0.26112 +0.226304 +0.075776 +0.016384 +0.119808 +0.045056 + +0.0013 +0.0005 +0.0016 +0.137216 +0.079872 +** SCAN TESTS ** +SIZE= 512 +0.0008 +0.0004 +0.351232 +0.232448 +0.069632 +0.023552 +0.114688 +0.047104 + +0.0013 +0.0006 +0.0016 +0.115712 +0.059392 +** SCAN TESTS ** +SIZE= 512 
+0.0011 +0.0003 +0.268288 +0.217088 +0.098304 +0.023552 +0.128 +0.044032 + +0.0012 +0.0007 +0.0017 +0.106496 +0.050176 +** SCAN TESTS ** +SIZE= 512 +0.0006 +0.0002 +0.297984 +0.226304 +0.096256 +0.024576 +0.124928 +0.044032 + +0.0011 +0.0006 +0.0015 +0.167936 +0.048128 +** SCAN TESTS ** +SIZE= 256 +0.0005 +0.0002 +0.24576 +0.191488 +0.09216 +0.016384 +0.105472 +0.067584 + +0.0007 +0.0004 +0.0009 +0.144384 +0.08192 +** SCAN TESTS ** +SIZE= 256 +0.0004 +0.0002 +0.25088 +0.12288 +0.0768 +0.017408 +0.106496 +0.074752 + +0.0006 +0.0004 +0.0008 +0.14336 +0.050176 +** SCAN TESTS ** +SIZE= 256 +0.0008 +0.0002 +0.316416 +0.229376 +0.078848 +0.0256 +0.125952 +0.034816 + +0.0008 +0.0004 +0.0008 +0.142336 +0.083968 +** SCAN TESTS ** +SIZE= 256 +0.0004 +0.0001 +0.239616 +0.181248 +0.079872 +0.016384 +0.136192 +0.045056 + +0.0007 +0.0004 +0.0009 +0.139264 +0.082944 +** SCAN TESTS ** +SIZE= 256 +0.0004 +0.0002 +0.335872 +0.183296 +0.091136 +0.016384 +0.125952 +0.043008 + +0.0007 +0.0004 +0.0007 +0.195584 +0.065536 +** SCAN TESTS ** +SIZE= 256 +0.0005 +0.0003 +0.33792 +0.150528 +0.083968 +0.023552 +0.10752 +0.045056 + +0.0007 +0.0004 +0.0008 +0.107616 +0.050176 +** SCAN TESTS ** +SIZE= 256 +0.0004 +0.0001 +0.257024 +0.1536 +0.062464 +0.023552 +0.126976 +0.043008 + +0.0009 +0.0004 +0.0011 +0.166912 +0.058368 +** SCAN TESTS ** +SIZE= 256 +0.0006 +0.0002 +0.3072 +0.151552 +0.095232 +0.023552 +0.128 +0.026624 + +0.0006 +0.0004 +0.0008 +0.140288 +0.082944 +** SCAN TESTS ** +SIZE= 256 +0.0005 +0.0003 +0.333824 +0.175104 +0.093184 +0.016384 +0.121856 +0.045056 + +0.0007 +0.0004 +0.0007 +0.166912 +0.049152 +** SCAN TESTS ** +SIZE= 256 +0.0004 +0.0002 +0.241664 +0.186432 +0.063488 +0.016384 +0.129024 +0.077824 + +0.0007 +0.0004 +0.0008 +0.147456 +0.049152 +** SCAN TESTS ** +SIZE= 256 +0.0005 +0.0002 +0.252928 +0.201728 +0.12288 +0.016448 +0.149504 +0.044032 + +0.0006 +0.0004 +0.0008 +0.11776 +0.083968 +** SCAN TESTS ** +SIZE= 256 +0.0005 +0.0002 +0.238688 +0.14336 +0.100352 +0.017408 
+0.135168 +0.039936 + +0.0007 +0.0004 +0.0009 +0.155648 +0.049152 +** SCAN TESTS ** +SIZE= 256 +0.0005 +0.0003 +0.26624 +0.209952 +0.094208 +0.016384 +0.105472 +0.04608 + +0.0007 +0.0004 +0.0008 +0.151552 +0.093184 +** SCAN TESTS ** +SIZE= 256 +0.0004 +0.0002 +0.231424 +0.18432 +0.077824 +0.016416 +0.108544 +0.045056 + +0.0007 +0.0004 +0.0009 +0.134144 +0.083968 +** SCAN TESTS ** +SIZE= 256 +0.0004 +0.0002 +0.246784 +0.216064 +0.095232 +0.018432 +0.13312 +0.055296 + +0.0008 +0.0004 +0.001 +0.113664 +0.049152 +** SCAN TESTS ** +SIZE= 1048576 +0.5552 +0.5048 +1.43517 +1.1567 +0.428256 +0.12288 +0.548864 +0.65024 + +1.7697 +1.7612 +3.2172 +0.527648 +0.292864 +** SCAN TESTS ** +SIZE= 2097152 +1.0637 +1.0598 +1.79869 +1.64061 +0.774624 +0.549888 +0.57856 +0.699392 + +3.3956 +3.5848 +6.3465 +1.32294 +1.38957 +** SCAN TESTS ** +SIZE= 2097152 +1.2786 +1.1046 +1.71523 +2.27069 +0.728864 +0.556032 +0.55296 +0.551936 + +3.534 +3.5509 +6.0547 +1.76403 +1.08339 +** SCAN TESTS ** +SIZE= 2097152 +1.0387 +1.0955 +1.73178 +2.14938 +0.654048 +0.579584 +0.534528 +0.526336 + +3.5031 +3.5363 +6.1338 +1.47238 +1.04346 +** SCAN TESTS ** +SIZE= 2097152 +1.3694 +1.0494 +1.81363 +1.59533 +1.10499 +0.509952 +0.583648 +0.676864 + +3.4984 +3.3388 +6.397 +1.29149 +1.06496 +** SCAN TESTS ** +SIZE= 2097152 +1.028 +0.9806 +1.79773 +1.59802 +0.725536 +0.507904 +0.566272 +0.748544 + +3.5767 +3.5661 +6.0808 +1.38314 +0.920576 +** SCAN TESTS ** +SIZE= 2097152 +1.1422 +1.0384 +2.77859 +1.58848 +0.738944 +0.504832 +0.528384 +0.698368 + +3.5595 +3.4713 +6.2921 +1.29184 +0.965632 +** SCAN TESTS ** +SIZE= 2097152 +1.0823 +1.0753 +2.00064 +1.59347 +0.93792 +0.497664 +0.538624 +1.16835 + +3.5402 +3.5109 +6.1128 +1.22554 +0.991232 +** SCAN TESTS ** +SIZE= 2097152 +1.1428 +1.0534 +1.80038 +1.57702 +1.27027 +0.52224 +0.57856 +0.622592 + +3.5185 +3.3396 +6.7386 +1.35978 +1.12333 +** SCAN TESTS ** +SIZE= 2097152 +0.9588 +0.9766 +1.78499 +2.28851 +0.662272 +0.650272 +0.643072 +0.562176 + +3.5533 +3.5475 +6.1091 
+1.47894 +0.971776 +** SCAN TESTS ** +SIZE= 2097152 +1.0233 +0.9828 +2.16378 +1.69619 +0.733248 +0.596992 +0.548864 +0.559104 + +3.5766 +3.5256 +6.4474 +1.3208 +1.00352 +** SCAN TESTS ** +SIZE= 4194304 +2.18 +2.2165 +3.2249 +3.21805 +0.958336 +0.843776 +0.664576 +0.81408 + +7.2458 +7.1916 +12.6813 +1.58509 +1.31994 +** SCAN TESTS ** +SIZE= 4194304 +1.9014 +2.1376 +3.84598 +3.12982 +1.05837 +1.21754 +0.663552 +0.720896 + +7.1029 +7.1129 +16.594 +1.90944 +1.29434 +** SCAN TESTS ** +SIZE= 4194304 +2.0985 +2.1093 +3.19302 +3.23082 +1.52906 +0.804864 +0.673792 +0.6144 + +6.9448 +7.0111 +12.0462 +1.61491 +1.29126 +** SCAN TESTS ** +SIZE= 4194304 +2.046 +2.0965 +3.24842 +3.21526 +1.00698 +0.615424 +0.674816 +0.684032 + +6.86 +6.8863 +12.5549 +1.55798 +3.02592 +** SCAN TESTS ** +SIZE= 4194304 +2.2949 +2.2026 +3.16701 +3.74506 +0.973408 +0.591872 +0.638976 +0.617472 + +7.2153 +7.1366 +12.6805 +1.60208 +1.62918 +** SCAN TESTS ** +SIZE= 4194304 +2.4326 +1.9851 +3.23459 +3.22314 +1.11504 +0.85504 +0.657408 +1.07827 + +7.1045 +7.126 +12.3457 +1.57347 +1.2288 +** SCAN TESTS ** +SIZE= 4194304 +1.9245 +2.0409 +3.87283 +3.12003 +0.97856 +1.51757 +0.6912 +0.705536 + +6.9868 +7.057 +12.3065 +1.68016 +1.33222 +** SCAN TESTS ** +SIZE= 4194304 +2.0329 +2.3725 +3.29904 +3.73277 +0.882048 +0.758784 +0.75264 +0.8048 + +7.2324 +7.15 +12.6406 +1.60282 +1.28819 +** SCAN TESTS ** +SIZE= 4194304 +2.2467 +2.1563 +3.71978 +3.25091 +1.0224 +0.86528 +0.693248 +0.731136 + +7.0871 +6.7846 +13.0226 +1.77075 +1.31174 +** SCAN TESTS ** +SIZE= 4194304 +1.9668 +2.0215 +3.80122 +3.23866 +0.9808 +0.723968 +0.596992 +0.801792 + +7.4832 +7.1112 +12.6838 +1.99706 +1.28 +** SCAN TESTS ** +SIZE= 8388608 +4.1648 +4.3179 +6.95533 +6.41062 +1.2665 +0.978944 +0.881664 +0.887808 + +14.5249 +14.1581 +24.7806 +2.1585 +1.93437 +** SCAN TESTS ** +SIZE= 8388608 +4.0626 +4.1356 +6.8207 +6.13306 +1.39952 +0.828416 +0.7936 +0.923648 + +14.2262 +13.6921 +24.4571 +2.33824 +1.80736 +** SCAN TESTS ** +SIZE= 8388608 +3.8734 
+4.1889 +6.42112 +6.72486 +1.51571 +1.07008 +1.02605 +0.935936 + +14.1611 +14.0538 +24.7208 +2.23578 +1.94662 +** SCAN TESTS ** +SIZE= 8388608 +3.895 +4.454 +7.00374 +6.92333 +1.72573 +1.01786 +0.949248 +0.948288 + +14.1115 +13.7728 +25.3407 +2.14582 +1.92 +** SCAN TESTS ** +SIZE= 8388608 +4.3684 +3.8767 +6.36963 +6.23763 +1.98704 +1.13664 +0.858112 +0.980992 + +14.0435 +14.336 +24.99 +2.25568 +1.9712 +** SCAN TESTS ** +SIZE= 8388608 +4.5514 +4.1218 +6.35981 +6.27021 +1.64166 +1.11002 +0.843776 +0.896 + +14.28 +14.097 +25.9096 +2.17619 +1.95072 +** SCAN TESTS ** +SIZE= 8388608 +4.0255 +4.1788 +6.96211 +6.14246 +1.6569 +1.00352 +1.08339 +0.902176 + +14.0148 +13.8375 +25.2176 +2.23869 +1.92307 +** SCAN TESTS ** +SIZE= 8388608 +4.5373 +4.2575 +7.18739 +6.75066 +1.49405 +1.07725 +0.8704 +1.27795 + +13.8218 +13.7692 +25.3661 +2.46467 +1.99578 +** SCAN TESTS ** +SIZE= 8388608 +4.1012 +4.176 +6.85757 +6.13942 +1.81488 +1.23187 +0.86016 +0.973824 + +13.9876 +13.8367 +25.9743 +2.13539 +1.87597 +** SCAN TESTS ** +SIZE= 8388608 +4.2186 +4.3114 +6.96698 +6.19072 +1.7815 +1.32608 +0.854016 +0.940032 + +13.6961 +13.8071 +24.1105 +2.30371 +1.95584 +** SCAN TESTS ** +SIZE= 16777216 +8.8192 +8.6034 +13.0046 +12.4661 +1.9721 +1.86061 +1.14483 +1.11514 + +27.9313 +27.4414 +50.0174 +3.39184 +3.04333 +** SCAN TESTS ** +SIZE= 16777216 +8.7636 +7.9199 +12.4361 +12.269 +2.05738 +1.71418 +1.08237 +1.10182 + +27.696 +28.2018 +51.2568 +3.49302 +2.94298 +** SCAN TESTS ** +SIZE= 16777216 +8.0591 +8.1771 +13.0196 +12.4272 +1.9687 +1.97018 +1.16838 +1.09875 + +28.2515 +28.9084 +49.4342 +3.47747 +3.17338 +** SCAN TESTS ** +SIZE= 16777216 +8.332 +8.3742 +13.007 +12.8768 +1.87334 +1.87597 +1.09978 +1.0793 + +28.3906 +27.9081 +52.1602 +3.37898 +3.36691 +** SCAN TESTS ** +SIZE= 16777216 +10.5132 +8.6552 +13.022 +12.322 +1.91459 +1.76435 +1.13869 +1.02605 + +27.9204 +27.608 +49.5117 +3.37341 +3.10477 +** SCAN TESTS ** +SIZE= 16777216 +8.1987 +8.6911 +13.3701 +12.4167 +2.07562 +1.84218 +1.16736 
+1.16227 + +27.908 +27.1709 +50.3292 +3.3727 +3.21331 +** SCAN TESTS ** +SIZE= 16777216 +8.5866 +8.5128 +12.9977 +12.4143 +1.90534 +1.72646 +1.12333 +1.09978 + +28.0414 +27.45 +52.7391 +3.28806 +3.01466 +** SCAN TESTS ** +SIZE= 16777216 +8.1571 +8.2681 +12.8949 +12.4012 +1.90502 +1.72237 +1.08544 +1.08032 + +27.5374 +28.1366 +51.9063 +3.4713 +3.00858 +** SCAN TESTS ** +SIZE= 16777216 +8.5533 +8.4548 +13.0243 +12.3937 +1.87325 +1.82682 +1.05062 +1.07418 + +28.0757 +28.6645 +50.3038 +3.29414 +3.10682 +** SCAN TESTS ** +SIZE= 16777216 +8.3335 +8.0696 +14.0056 +12.3258 +2.14918 +1.89747 +1.08237 +1.08749 + +27.7936 +28.9048 +49.6402 +3.37325 +3.02387 +** SCAN TESTS ** +SIZE= 33554432 +17.1524 +15.4388 +25.4122 +25.2767 +3.368 +3.0464 +1.73869 +1.64352 + +57.015 +55.3494 +100.392 +5.86621 +5.53472 +** SCAN TESTS ** +SIZE= 33554432 +15.7645 +16.3241 +25.4285 +25.3756 +3.17162 +2.83853 +1.64352 +1.55034 + +55.9716 +56.3901 +100.645 +5.6831 +5.45997 +** SCAN TESTS ** +SIZE= 33554432 +17.6581 +18.0834 +25.3882 +25.7363 +3.02618 +2.77094 +1.65581 +1.6169 + +55.8455 +56.1403 +103.037 +5.58765 +5.44358 +** SCAN TESTS ** +SIZE= 33554432 +17.3758 +15.8788 +25.2473 +25.2294 +3.12957 +2.84058 +1.5872 +1.58208 + +56.0168 +55.9438 +101.631 +5.71117 +5.3248 +** SCAN TESTS ** +SIZE= 33554432 +16.7024 +17.1964 +25.4267 +25.2602 +2.93616 +2.95117 +1.61587 +1.60154 + +55.6078 +57.3164 +103.323 +5.97661 +5.48352 +** SCAN TESTS ** +SIZE= 33554432 +16.3218 +16.0544 +25.519 +25.2199 +3.09546 +2.81088 +1.61792 +1.69574 + +57.0773 +55.0733 +99.9008 +5.75434 +5.3719 +** SCAN TESTS ** +SIZE= 33554432 +16.338 +16.5429 +25.4168 +25.3821 +3.30931 +3.11091 +1.72749 +1.71917 + +56.7651 +54.625 +101.518 +6.03466 +5.4313 +** SCAN TESTS ** +SIZE= 33554432 +17.5497 +16.8521 +25.314 +25.3746 +3.28918 +3.06381 +1.65363 +1.73168 + +56.7143 +57.5416 +101.239 +5.62749 +5.3975 +** SCAN TESTS ** +SIZE= 33554432 +15.9771 +17.0679 +25.2897 +25.4526 +2.99638 +2.80986 +1.52269 +1.77152 + +55.4533 +56.7389 +102.083 
+5.82 +5.36576 +** SCAN TESTS ** +SIZE= 33554432 +15.6764 +16.6751 +25.3104 +25.3611 +3.36125 +2.67878 +1.60051 +1.62102 + +57.3079 +55.0642 +103.722 +5.7879 +5.44461 +** SCAN TESTS ** +SIZE= 67108864 +32.9706 +35.5836 +53.0656 +53.5323 +5.12349 +4.9193 +2.74218 +2.75446 + +112.721 +110.711 +199.535 +10.3279 +10.1284 +** SCAN TESTS ** +SIZE= 67108864 +33.5569 +34.8141 +52.5207 +52.7478 +5.1296 +4.95107 +2.75661 +2.61939 + +113.256 +111.616 +201.185 +10.2797 +10.2431 +** SCAN TESTS ** +SIZE= 67108864 +31.8614 +33.4057 +52.6211 +53.2301 +4.82077 +5.04218 +2.57126 +3.23782 + +112.968 +110.388 +202.18 +10.4631 +10.4233 +** SCAN TESTS ** +SIZE= 67108864 +32.7121 +32.178 +53.204 +53.2202 +4.98829 +4.86912 +2.67162 +2.70336 + +112.433 +111.404 +199.558 +10.5103 +10.0844 +** SCAN TESTS ** +SIZE= 67108864 +33.219 +32.6333 +52.8677 +53.5472 +5.17571 +4.91213 +2.81805 +2.78938 + +112.719 +111.592 +200.691 +10.4745 +10.1069 +** SCAN TESTS ** +SIZE= 67108864 +32.5147 +32.7109 +52.6984 +52.8136 +5.25699 +4.9193 +2.55795 +2.75558 + +114.087 +111.741 +201.624 +10.2673 +10.3004 +** SCAN TESTS ** +SIZE= 67108864 +34.0245 +32.2048 +52.9994 +52.2628 +5.212 +5.33606 +2.64192 +2.65523 + +111.849 +113.071 +201.059 +10.4103 +10.0895 +** SCAN TESTS ** +SIZE= 67108864 +31.8586 +34.716 +52.758 +52.8854 +5.27904 +4.89584 +2.5856 +2.69722 + +112.197 +112.028 +205.128 +10.6945 +10.1253 +** SCAN TESTS ** +SIZE= 67108864 +33.727 +32.7404 +52.5358 +52.5147 +5.66061 +4.91315 +2.87334 +2.80371 + +110.564 +111.981 +201.118 +10.483 +9.96454 +** SCAN TESTS ** +SIZE= 67108864 +31.3847 +33.2231 +52.6487 +53.3386 +4.98733 +4.96435 +2.58765 +2.61939 + +113.122 +113.051 +202.496 +10.7417 +10.2738 +** SCAN TESTS ** +SIZE= 134217728 +65.7458 +66.5541 +109.714 +108.931 +9.90403 +9.3952 +4.66637 +4.61414 + +224.717 +226.441 +405.414 +19.2476 +18.6665 +** SCAN TESTS ** +SIZE= 134217728 +71.1686 +70.3108 +110.751 +109.079 +10.0701 +9.49658 +4.77798 +4.64486 + +223.765 +223.462 +400.433 +19.1076 +18.7228 +** SCAN 
TESTS ** +SIZE= 134217728 +63.701 +64.7987 +110.326 +109.048 +9.95459 +9.56422 +4.65818 +4.76262 + +222.645 +222.427 +400.019 +20.2342 +19.4038 +** SCAN TESTS ** +SIZE= 134217728 +64.6291 +72.2041 +110.321 +108.97 +9.69693 +9.44435 +4.67354 +5.21523 + +225.243 +223.965 +407.214 +20.1375 +19.543 +** SCAN TESTS ** +SIZE= 134217728 +66.9426 +67.8383 +110.252 +108.943 +10.3463 +9.47405 +4.67354 +4.864 + +225.264 +222.31 +405.457 +21.0745 +18.9583 +** SCAN TESTS ** +SIZE= 134217728 +71.2926 +64.0346 +110.549 +109.242 +10.0368 +9.54163 +4.64896 +4.71654 + +223.021 +220.288 +403.56 +20.3589 +19.2143 +** SCAN TESTS ** +SIZE= 134217728 +64.5018 +69.2349 +110.357 +108.873 +9.96083 +9.37677 +4.67251 +4.66739 + +227.221 +223.779 +405.394 +20.2519 +18.8559 +** SCAN TESTS ** +SIZE= 134217728 +66.2381 +65.3097 +110.37 +109.331 +9.97939 +9.4935 +4.67456 +4.76262 + +224.317 +228.967 +405.428 +19.0925 +19.115 +** SCAN TESTS ** +SIZE= 134217728 +64.5025 +65.5607 +110.815 +108.996 +9.51082 +8.98867 +4.6848 +4.91622 + +223.964 +222.718 +406.811 +19.3702 +18.9348 +** SCAN TESTS ** +SIZE= 134217728 +70.6062 +70.563 +110.376 +109.062 +10.1392 +9.3911 +5.01862 +4.74931 + +227.643 +226.933 +402.117 +20.1753 +19.0259 +** SCAN TESTS ** +SIZE= 268435456 +132.373 +133.35 +228.608 +227.778 +16.8142 +16.8602 +8.58112 +8.54323 + +451.766 +456.493 +804.737 +39.1622 +36.9736 +** SCAN TESTS ** +SIZE= 268435456 +130.853 +132.751 +230.517 +227.635 +17.5688 +17.0496 +8.2135 +9.22624 + +450.667 +452.019 +842.232 +37.3482 +37.1026 +** SCAN TESTS ** +SIZE= 268435456 +132.292 +138.979 +228.483 +227.452 +17.3796 +16.34 +8.576 +8.63744 + +448.704 +450.355 +803.179 +38.7403 +36.6735 +** SCAN TESTS ** +SIZE= 268435456 +132.747 +128.708 +228.25 +227.803 +17.5412 +16.6175 +8.52582 +8.62413 + +451.018 +450.695 +819.745 +37.7899 +37.3105 +** SCAN TESTS ** +SIZE= 268435456 +143.426 +153.884 +228.478 +227.438 +17.1444 +16.9697 +8.54426 +8.5975 + +453.483 +446.99 +801.387 +39.5788 +37.0872 +** SCAN TESTS ** +SIZE= 
268435456 +136.296 +133.544 +228.354 +227.714 +17.319 +17.1448 +8.91802 +8.66714 + +447.684 +454.203 +804.796 +39.0621 +37.0831 +** SCAN TESTS ** +SIZE= 268435456 +139.047 +134.467 +228.831 +227.896 +17.4343 +17.1387 +8.64563 +8.76954 + +464.444 +453.871 +815.938 +39.8693 +36.8323 +** SCAN TESTS ** +SIZE= 268435456 +128.088 +133.992 +228.497 +227.595 +17.2821 +16.6349 +8.35482 +8.6057 + +449.984 +456.349 +831.611 +38.8659 +36.6664 +** SCAN TESTS ** +SIZE= 268435456 +137.352 +133.549 +228.095 +227.808 +17.6704 +17.0086 +8.76442 +9.07674 + +448.816 +458.047 +834.451 +39.5145 +36.1943 +** SCAN TESTS ** +SIZE= 268435456 +147.094 +134.541 +228.095 +227.397 +17.2193 +17.1213 +8.28723 +8.47053 + +452.239 +452.946 +821.22 +36.6826 +36.8712 +** SCAN TESTS ** +SIZE= 268435456 +131.345 +133.329 +228.696 +227.337 +17.0305 +16.7363 +8.45517 +8.69888 + +453.835 +445.657 +827.376 +39.1469 +36.5681 diff --git a/plots/data/timings_all_runs_long.csv b/plots/data/timings_all_runs_long.csv new file mode 100644 index 00000000..d908ca69 --- /dev/null +++ b/plots/data/timings_all_runs_long.csv @@ -0,0 +1,2731 @@ +size,run,suite,method,time_s,time_ms +1048576,1,scan,scan_cpu_pow2,0.4504,450.40000000000003 +1048576,1,scan,scan_cpu_non_pow2,0.4851,485.09999999999997 +1048576,1,scan,scan_naive_pow2,1.36806,1368.06 +1048576,1,scan,scan_naive_non_pow2,1.63558,1635.58 +1048576,1,scan,scan_work_efficient_pow2,0.2928,292.8 +1048576,1,scan,scan_work_efficient_non_pow2,0.119808,119.80799999999999 +1048576,1,scan,scan_thrust_pow2,0.591872,591.872 +1048576,1,scan,scan_thrust_non_pow2,0.871424,871.424 +1048576,1,compact,compact_cpu_without_scan_pow2,1.7626,1762.6 +1048576,1,compact,compact_cpu_without_scan_non_pow2,1.6515,1651.5 +1048576,1,compact,compact_cpu_with_scan,2.9795,2979.5 +1048576,1,compact,compact_work_efficient_pow2,0.485824,485.82399999999996 +1048576,1,compact,compact_work_efficient_non_pow2,0.269312,269.312 +1048576,2,scan,scan_cpu_pow2,0.5501,550.1 
+1048576,2,scan,scan_cpu_non_pow2,0.4727,472.7 +1048576,2,scan,scan_naive_pow2,0.99328,993.2800000000001 +1048576,2,scan,scan_naive_non_pow2,0.896576,896.576 +1048576,2,scan,scan_work_efficient_pow2,0.991584,991.5840000000001 +1048576,2,scan,scan_work_efficient_non_pow2,0.15872,158.72 +1048576,2,scan,scan_thrust_pow2,0.504832,504.83199999999994 +1048576,2,scan,scan_thrust_non_pow2,0.50688,506.88 +1048576,2,compact,compact_cpu_without_scan_pow2,1.7609,1760.8999999999999 +1048576,2,compact,compact_cpu_without_scan_non_pow2,1.7537,1753.7 +1048576,2,compact,compact_cpu_with_scan,3.0031,3003.1 +1048576,2,compact,compact_work_efficient_pow2,0.57184,571.84 +1048576,2,compact,compact_work_efficient_non_pow2,0.28672,286.71999999999997 +1048576,3,scan,scan_cpu_pow2,0.4682,468.2 +1048576,3,scan,scan_cpu_non_pow2,0.5173,517.3 +1048576,3,scan,scan_naive_pow2,1.0007,1000.6999999999999 +1048576,3,scan,scan_naive_non_pow2,0.8968,896.8000000000001 +1048576,3,scan,scan_work_efficient_pow2,0.33104,331.04 +1048576,3,scan,scan_work_efficient_non_pow2,0.129024,129.024 +1048576,3,scan,scan_thrust_pow2,0.555008,555.0079999999999 +1048576,3,scan,scan_thrust_non_pow2,0.534528,534.528 +1048576,3,compact,compact_cpu_without_scan_pow2,1.788,1788.0 +1048576,3,compact,compact_cpu_without_scan_non_pow2,1.7597,1759.7 +1048576,3,compact,compact_cpu_with_scan,2.905,2905.0 +1048576,3,compact,compact_work_efficient_pow2,0.562208,562.2080000000001 +1048576,3,compact,compact_work_efficient_non_pow2,0.236544,236.544 +1048576,4,scan,scan_cpu_pow2,0.5049,504.90000000000003 +1048576,4,scan,scan_cpu_non_pow2,0.5013,501.29999999999995 +1048576,4,scan,scan_naive_pow2,1.05923,1059.2299999999998 +1048576,4,scan,scan_naive_non_pow2,1.55363,1553.63 +1048576,4,scan,scan_work_efficient_pow2,0.309632,309.632 +1048576,4,scan,scan_work_efficient_non_pow2,0.128,128.0 +1048576,4,scan,scan_thrust_pow2,0.5632,563.2 +1048576,4,scan,scan_thrust_non_pow2,1.18886,1188.8600000000001 
+1048576,4,compact,compact_cpu_without_scan_pow2,1.7676,1767.6000000000001 +1048576,4,compact,compact_cpu_without_scan_non_pow2,1.7558,1755.8 +1048576,4,compact,compact_cpu_with_scan,3.0609,3060.9 +1048576,4,compact,compact_work_efficient_pow2,0.45408,454.08 +1048576,4,compact,compact_work_efficient_non_pow2,0.289792,289.792 +1048576,5,scan,scan_cpu_pow2,0.5039,503.90000000000003 +1048576,5,scan,scan_cpu_non_pow2,0.4544,454.40000000000003 +1048576,5,scan,scan_naive_pow2,1.04778,1047.78 +1048576,5,scan,scan_naive_non_pow2,1.02218,1022.1800000000001 +1048576,5,scan,scan_work_efficient_pow2,0.269248,269.248 +1048576,5,scan,scan_work_efficient_non_pow2,0.130048,130.048 +1048576,5,scan,scan_thrust_pow2,1.02502,1025.02 +1048576,5,scan,scan_thrust_non_pow2,1.03936,1039.3600000000001 +1048576,5,compact,compact_cpu_without_scan_pow2,1.7631,1763.1 +1048576,5,compact,compact_cpu_without_scan_non_pow2,1.767,1767.0 +1048576,5,compact,compact_cpu_with_scan,3.1677,3167.7 +1048576,5,compact,compact_work_efficient_pow2,0.541056,541.0559999999999 +1048576,5,compact,compact_work_efficient_non_pow2,0.498688,498.68800000000005 +1048576,6,scan,scan_cpu_pow2,0.5116,511.6000000000001 +1048576,6,scan,scan_cpu_non_pow2,0.5015,501.49999999999994 +1048576,6,scan,scan_naive_pow2,1.12358,1123.58 +1048576,6,scan,scan_naive_non_pow2,0.924448,924.4480000000001 +1048576,6,scan,scan_work_efficient_pow2,0.275488,275.488 +1048576,6,scan,scan_work_efficient_non_pow2,0.126976,126.976 +1048576,6,scan,scan_thrust_pow2,0.576512,576.5120000000001 +1048576,6,scan,scan_thrust_non_pow2,0.52736,527.36 +1048576,6,compact,compact_cpu_without_scan_pow2,1.7716,1771.6000000000001 +1048576,6,compact,compact_cpu_without_scan_non_pow2,1.7523,1752.3 +1048576,6,compact,compact_cpu_with_scan,2.8995,2899.5 +1048576,6,compact,compact_work_efficient_pow2,0.485312,485.312 +1048576,6,compact,compact_work_efficient_non_pow2,0.267264,267.264 +1048576,7,scan,scan_cpu_pow2,0.6063,606.3 +1048576,7,scan,scan_cpu_non_pow2,0.472,472.0 
+1048576,7,scan,scan_naive_pow2,1.09014,1090.14 +1048576,7,scan,scan_naive_non_pow2,0.907424,907.424 +1048576,7,scan,scan_work_efficient_pow2,0.287744,287.744 +1048576,7,scan,scan_work_efficient_non_pow2,0.121856,121.85600000000001 +1048576,7,scan,scan_thrust_pow2,0.555008,555.0079999999999 +1048576,7,scan,scan_thrust_non_pow2,0.627712,627.7120000000001 +1048576,7,compact,compact_cpu_without_scan_pow2,1.7736,1773.6000000000001 +1048576,7,compact,compact_cpu_without_scan_non_pow2,1.7799,1779.9 +1048576,7,compact,compact_cpu_with_scan,2.9913,2991.2999999999997 +1048576,7,compact,compact_work_efficient_pow2,0.481152,481.15200000000004 +1048576,7,compact,compact_work_efficient_non_pow2,0.231424,231.42399999999998 +1048576,8,scan,scan_cpu_pow2,0.54,540.0 +1048576,8,scan,scan_cpu_non_pow2,0.4696,469.6 +1048576,8,scan,scan_naive_pow2,0.990528,990.528 +1048576,8,scan,scan_naive_non_pow2,0.90704,907.04 +1048576,8,scan,scan_work_efficient_pow2,0.49472,494.71999999999997 +1048576,8,scan,scan_work_efficient_non_pow2,0.130048,130.048 +1048576,8,scan,scan_thrust_pow2,0.782336,782.336 +1048576,8,scan,scan_thrust_non_pow2,0.572416,572.416 +1048576,8,compact,compact_cpu_without_scan_pow2,1.7738,1773.8 +1048576,8,compact,compact_cpu_without_scan_non_pow2,1.7718,1771.8 +1048576,8,compact,compact_cpu_with_scan,2.9814,2981.3999999999996 +1048576,8,compact,compact_work_efficient_pow2,0.683872,683.8720000000001 +1048576,8,compact,compact_work_efficient_non_pow2,0.221184,221.184 +1048576,9,scan,scan_cpu_pow2,0.4714,471.4 +1048576,9,scan,scan_cpu_non_pow2,0.4571,457.1 +1048576,9,scan,scan_naive_pow2,1.06557,1065.57 +1048576,9,scan,scan_naive_non_pow2,0.888192,888.192 +1048576,9,scan,scan_work_efficient_pow2,0.289152,289.15200000000004 +1048576,9,scan,scan_work_efficient_non_pow2,0.216064,216.064 +1048576,9,scan,scan_thrust_pow2,0.490496,490.496 +1048576,9,scan,scan_thrust_non_pow2,0.564224,564.2239999999999 +1048576,9,compact,compact_cpu_without_scan_pow2,1.772,1772.0 
+1048576,9,compact,compact_cpu_without_scan_non_pow2,1.7727,1772.7 +1048576,9,compact,compact_cpu_with_scan,2.9239,2923.9 +1048576,9,compact,compact_work_efficient_pow2,0.821728,821.7280000000001 +1048576,9,compact,compact_work_efficient_non_pow2,0.882688,882.688 +1048576,10,scan,scan_cpu_pow2,0.6784,678.4 +1048576,10,scan,scan_cpu_non_pow2,0.473,473.0 +1048576,10,scan,scan_naive_pow2,1.01606,1016.06 +1048576,10,scan,scan_naive_non_pow2,0.959392,959.392 +1048576,10,scan,scan_work_efficient_pow2,0.44688,446.88 +1048576,10,scan,scan_work_efficient_non_pow2,0.119808,119.80799999999999 +1048576,10,scan,scan_thrust_pow2,0.566272,566.272 +1048576,10,scan,scan_thrust_non_pow2,0.483328,483.328 +1048576,10,compact,compact_cpu_without_scan_pow2,1.7899,1789.9 +1048576,10,compact,compact_cpu_without_scan_non_pow2,1.7777,1777.7 +1048576,10,compact,compact_cpu_with_scan,3.0413,3041.3 +1048576,10,compact,compact_work_efficient_pow2,1.20099,1200.99 +1048576,10,compact,compact_work_efficient_non_pow2,0.2816,281.6 +524288,1,scan,scan_cpu_pow2,0.2726,272.6 +524288,1,scan,scan_cpu_non_pow2,0.2448,244.79999999999998 +524288,1,scan,scan_naive_pow2,0.923296,923.296 +524288,1,scan,scan_naive_non_pow2,0.919456,919.456 +524288,1,scan,scan_work_efficient_pow2,0.319264,319.264 +524288,1,scan,scan_work_efficient_non_pow2,0.145408,145.40800000000002 +524288,1,scan,scan_thrust_pow2,0.570368,570.3679999999999 +524288,1,scan,scan_thrust_non_pow2,0.499712,499.712 +524288,1,compact,compact_cpu_without_scan_pow2,0.8899,889.9 +524288,1,compact,compact_cpu_without_scan_non_pow2,0.8792,879.1999999999999 +524288,1,compact,compact_cpu_with_scan,1.484,1484.0 +524288,1,compact,compact_work_efficient_pow2,0.366624,366.624 +524288,1,compact,compact_work_efficient_non_pow2,0.139264,139.264 +524288,2,scan,scan_cpu_pow2,0.2597,259.7 +524288,2,scan,scan_cpu_non_pow2,3.2616,3261.6 +524288,2,scan,scan_naive_pow2,1.64019,1640.19 +524288,2,scan,scan_naive_non_pow2,0.768576,768.576 
+524288,2,scan,scan_work_efficient_pow2,0.218208,218.20800000000003 +524288,2,scan,scan_work_efficient_non_pow2,0.078848,78.848 +524288,2,scan,scan_thrust_pow2,0.5376,537.6 +524288,2,scan,scan_thrust_non_pow2,0.524288,524.288 +524288,2,compact,compact_cpu_without_scan_pow2,0.8938,893.8000000000001 +524288,2,compact,compact_cpu_without_scan_non_pow2,0.8795,879.5 +524288,2,compact,compact_cpu_with_scan,1.4681,1468.1 +524288,2,compact,compact_work_efficient_pow2,0.3992,399.2 +524288,2,compact,compact_work_efficient_non_pow2,0.1536,153.6 +524288,3,scan,scan_cpu_pow2,0.2345,234.5 +524288,3,scan,scan_cpu_non_pow2,0.2091,209.1 +524288,3,scan,scan_naive_pow2,0.795232,795.2320000000001 +524288,3,scan,scan_naive_non_pow2,0.656224,656.224 +524288,3,scan,scan_work_efficient_pow2,0.352256,352.25600000000003 +524288,3,scan,scan_work_efficient_non_pow2,0.079872,79.872 +524288,3,scan,scan_thrust_pow2,0.551936,551.936 +524288,3,scan,scan_thrust_non_pow2,0.507904,507.904 +524288,3,compact,compact_cpu_without_scan_pow2,0.9156,915.6 +524288,3,compact,compact_cpu_without_scan_non_pow2,0.8925,892.5 +524288,3,compact,compact_cpu_with_scan,1.5577,1557.7 +524288,3,compact,compact_work_efficient_pow2,0.403616,403.616 +524288,3,compact,compact_work_efficient_non_pow2,0.157696,157.696 +524288,4,scan,scan_cpu_pow2,0.2419,241.9 +524288,4,scan,scan_cpu_non_pow2,0.2087,208.7 +524288,4,scan,scan_naive_pow2,1.56384,1563.84 +524288,4,scan,scan_naive_non_pow2,0.628512,628.512 +524288,4,scan,scan_work_efficient_pow2,0.778464,778.464 +524288,4,scan,scan_work_efficient_non_pow2,0.1024,102.4 +524288,4,scan,scan_thrust_pow2,0.520192,520.192 +524288,4,scan,scan_thrust_non_pow2,0.49152,491.52000000000004 +524288,4,compact,compact_cpu_without_scan_pow2,0.8868,886.8000000000001 +524288,4,compact,compact_cpu_without_scan_non_pow2,0.8961,896.1 +524288,4,compact,compact_cpu_with_scan,1.4897,1489.7 +524288,4,compact,compact_work_efficient_pow2,0.357472,357.47200000000004 
+524288,4,compact,compact_work_efficient_non_pow2,0.18944,189.44 +524288,5,scan,scan_cpu_pow2,0.334,334.0 +524288,5,scan,scan_cpu_non_pow2,0.2255,225.5 +524288,5,scan,scan_naive_pow2,0.72784,727.84 +524288,5,scan,scan_naive_non_pow2,0.688512,688.5120000000001 +524288,5,scan,scan_work_efficient_pow2,0.321056,321.056 +524288,5,scan,scan_work_efficient_non_pow2,0.079872,79.872 +524288,5,scan,scan_thrust_pow2,0.550912,550.9119999999999 +524288,5,scan,scan_thrust_non_pow2,0.482304,482.30400000000003 +524288,5,compact,compact_cpu_without_scan_pow2,0.8902,890.2 +524288,5,compact,compact_cpu_without_scan_non_pow2,0.8859,885.9 +524288,5,compact,compact_cpu_with_scan,1.5687,1568.7 +524288,5,compact,compact_work_efficient_pow2,0.439872,439.87199999999996 +524288,5,compact,compact_work_efficient_non_pow2,0.190464,190.464 +524288,6,scan,scan_cpu_pow2,0.3269,326.90000000000003 +524288,6,scan,scan_cpu_non_pow2,0.2151,215.10000000000002 +524288,6,scan,scan_naive_pow2,0.756384,756.3839999999999 +524288,6,scan,scan_naive_non_pow2,0.582912,582.912 +524288,6,scan,scan_work_efficient_pow2,0.224256,224.256 +524288,6,scan,scan_work_efficient_non_pow2,0.11776,117.76 +524288,6,scan,scan_thrust_pow2,0.618496,618.4960000000001 +524288,6,scan,scan_thrust_non_pow2,0.505856,505.856 +524288,6,compact,compact_cpu_without_scan_pow2,0.8836,883.6 +524288,6,compact,compact_cpu_without_scan_non_pow2,0.8815,881.5 +524288,6,compact,compact_cpu_with_scan,1.4793,1479.3 +524288,6,compact,compact_work_efficient_pow2,0.35952,359.52 +524288,6,compact,compact_work_efficient_non_pow2,0.205824,205.824 +524288,7,scan,scan_cpu_pow2,0.2721,272.1 +524288,7,scan,scan_cpu_non_pow2,0.334,334.0 +524288,7,scan,scan_naive_pow2,0.797824,797.824 +524288,7,scan,scan_naive_non_pow2,0.861888,861.888 +524288,7,scan,scan_work_efficient_pow2,0.263424,263.424 +524288,7,scan,scan_work_efficient_non_pow2,0.110592,110.592 +524288,7,scan,scan_thrust_pow2,1.24621,1246.21 +524288,7,scan,scan_thrust_non_pow2,0.480256,480.25600000000003 
+524288,7,compact,compact_cpu_without_scan_pow2,0.8919,891.9 +524288,7,compact,compact_cpu_without_scan_non_pow2,0.8867,886.7 +524288,7,compact,compact_cpu_with_scan,1.5136,1513.6000000000001 +524288,7,compact,compact_work_efficient_pow2,0.351904,351.904 +524288,7,compact,compact_work_efficient_non_pow2,0.167936,167.936 +524288,8,scan,scan_cpu_pow2,0.2777,277.7 +524288,8,scan,scan_cpu_non_pow2,0.2814,281.4 +524288,8,scan,scan_naive_pow2,1.23696,1236.96 +524288,8,scan,scan_naive_non_pow2,0.777856,777.856 +524288,8,scan,scan_work_efficient_pow2,0.535456,535.456 +524288,8,scan,scan_work_efficient_non_pow2,0.080896,80.896 +524288,8,scan,scan_thrust_pow2,0.559104,559.104 +524288,8,scan,scan_thrust_non_pow2,0.521216,521.216 +524288,8,compact,compact_cpu_without_scan_pow2,0.8926,892.5999999999999 +524288,8,compact,compact_cpu_without_scan_non_pow2,0.8845,884.5 +524288,8,compact,compact_cpu_with_scan,1.4883,1488.3 +524288,8,compact,compact_work_efficient_pow2,0.362944,362.944 +524288,8,compact,compact_work_efficient_non_pow2,0.197632,197.632 +524288,9,scan,scan_cpu_pow2,0.2368,236.8 +524288,9,scan,scan_cpu_non_pow2,0.3119,311.90000000000003 +524288,9,scan,scan_naive_pow2,0.775712,775.712 +524288,9,scan,scan_naive_non_pow2,0.759072,759.072 +524288,9,scan,scan_work_efficient_pow2,0.58144,581.4399999999999 +524288,9,scan,scan_work_efficient_non_pow2,0.105472,105.472 +524288,9,scan,scan_thrust_pow2,0.560128,560.1279999999999 +524288,9,scan,scan_thrust_non_pow2,0.482304,482.30400000000003 +524288,9,compact,compact_cpu_without_scan_pow2,0.8946,894.5999999999999 +524288,9,compact,compact_cpu_without_scan_non_pow2,0.8832,883.1999999999999 +524288,9,compact,compact_cpu_with_scan,1.6546,1654.6000000000001 +524288,9,compact,compact_work_efficient_pow2,0.364192,364.192 +524288,9,compact,compact_work_efficient_non_pow2,0.150528,150.528 +524288,10,scan,scan_cpu_pow2,0.2817,281.7 +524288,10,scan,scan_cpu_non_pow2,0.316,316.0 +524288,10,scan,scan_naive_pow2,0.665056,665.0559999999999 
+524288,10,scan,scan_naive_non_pow2,0.855264,855.264 +524288,10,scan,scan_work_efficient_pow2,0.207776,207.77599999999998 +524288,10,scan,scan_work_efficient_non_pow2,0.077824,77.824 +524288,10,scan,scan_thrust_pow2,0.574464,574.4639999999999 +524288,10,scan,scan_thrust_non_pow2,0.956416,956.416 +524288,10,compact,compact_cpu_without_scan_pow2,0.8979,897.9 +524288,10,compact,compact_cpu_without_scan_non_pow2,0.8796,879.6 +524288,10,compact,compact_cpu_with_scan,1.466,1466.0 +524288,10,compact,compact_work_efficient_pow2,0.436096,436.096 +524288,10,compact,compact_work_efficient_non_pow2,0.139264,139.264 +262144,1,scan,scan_cpu_pow2,0.1314,131.39999999999998 +262144,1,scan,scan_cpu_non_pow2,0.0997,99.7 +262144,1,scan,scan_naive_pow2,1.42598,1425.98 +262144,1,scan,scan_naive_non_pow2,1.02925,1029.25 +262144,1,scan,scan_work_efficient_pow2,0.202048,202.048 +262144,1,scan,scan_work_efficient_non_pow2,0.083968,83.968 +262144,1,scan,scan_thrust_pow2,0.536576,536.576 +262144,1,scan,scan_thrust_non_pow2,0.622592,622.592 +262144,1,compact,compact_cpu_without_scan_pow2,0.4531,453.1 +262144,1,compact,compact_cpu_without_scan_non_pow2,0.4483,448.29999999999995 +262144,1,compact,compact_cpu_with_scan,0.7781,778.1 +262144,1,compact,compact_work_efficient_pow2,0.221024,221.024 +262144,1,compact,compact_work_efficient_non_pow2,0.103424,103.424 +262144,2,scan,scan_cpu_pow2,0.1202,120.2 +262144,2,scan,scan_cpu_non_pow2,0.1002,100.2 +262144,2,scan,scan_naive_pow2,0.5288,528.8000000000001 +262144,2,scan,scan_naive_non_pow2,0.388416,388.416 +262144,2,scan,scan_work_efficient_pow2,0.202304,202.304 +262144,2,scan,scan_work_efficient_non_pow2,0.062464,62.464 +262144,2,scan,scan_thrust_pow2,0.549888,549.888 +262144,2,scan,scan_thrust_non_pow2,0.546816,546.8159999999999 +262144,2,compact,compact_cpu_without_scan_pow2,0.4517,451.7 +262144,2,compact,compact_cpu_without_scan_non_pow2,0.4558,455.79999999999995 +262144,2,compact,compact_cpu_with_scan,0.7405,740.5 
+262144,2,compact,compact_work_efficient_pow2,0.349216,349.216 +262144,2,compact,compact_work_efficient_non_pow2,0.104448,104.448 +262144,3,scan,scan_cpu_pow2,0.128,128.0 +262144,3,scan,scan_cpu_non_pow2,0.1123,112.3 +262144,3,scan,scan_naive_pow2,0.486112,486.11199999999997 +262144,3,scan,scan_naive_non_pow2,0.406784,406.784 +262144,3,scan,scan_work_efficient_pow2,0.207776,207.77599999999998 +262144,3,scan,scan_work_efficient_non_pow2,0.074752,74.752 +262144,3,scan,scan_thrust_pow2,0.576512,576.5120000000001 +262144,3,scan,scan_thrust_non_pow2,0.585728,585.7280000000001 +262144,3,compact,compact_cpu_without_scan_pow2,0.4498,449.79999999999995 +262144,3,compact,compact_cpu_without_scan_non_pow2,0.4545,454.5 +262144,3,compact,compact_cpu_with_scan,0.7724,772.4 +262144,3,compact,compact_work_efficient_pow2,0.250848,250.848 +262144,3,compact,compact_work_efficient_non_pow2,0.113664,113.664 +262144,4,scan,scan_cpu_pow2,0.1185,118.5 +262144,4,scan,scan_cpu_non_pow2,0.124,124.0 +262144,4,scan,scan_naive_pow2,0.512032,512.032 +262144,4,scan,scan_naive_non_pow2,0.600384,600.384 +262144,4,scan,scan_work_efficient_pow2,0.266528,266.52799999999996 +262144,4,scan,scan_work_efficient_non_pow2,0.06144,61.440000000000005 +262144,4,scan,scan_thrust_pow2,0.605184,605.1840000000001 +262144,4,scan,scan_thrust_non_pow2,0.605184,605.1840000000001 +262144,4,compact,compact_cpu_without_scan_pow2,0.4484,448.40000000000003 +262144,4,compact,compact_cpu_without_scan_non_pow2,0.4363,436.3 +262144,4,compact,compact_cpu_with_scan,0.7507,750.7 +262144,4,compact,compact_work_efficient_pow2,0.309856,309.856 +262144,4,compact,compact_work_efficient_non_pow2,0.11776,117.76 +262144,5,scan,scan_cpu_pow2,0.1297,129.70000000000002 +262144,5,scan,scan_cpu_non_pow2,0.1247,124.7 +262144,5,scan,scan_naive_pow2,0.47056,470.56 +262144,5,scan,scan_naive_non_pow2,0.394112,394.112 +262144,5,scan,scan_work_efficient_pow2,0.197408,197.408 +262144,5,scan,scan_work_efficient_non_pow2,0.063488,63.488 
+262144,5,scan,scan_thrust_pow2,0.514048,514.048 +262144,5,scan,scan_thrust_non_pow2,0.637952,637.952 +262144,5,compact,compact_cpu_without_scan_pow2,0.4533,453.29999999999995 +262144,5,compact,compact_cpu_without_scan_non_pow2,0.4422,442.2 +262144,5,compact,compact_cpu_with_scan,0.7578,757.8000000000001 +262144,5,compact,compact_work_efficient_pow2,0.294272,294.272 +262144,5,compact,compact_work_efficient_non_pow2,0.110592,110.592 +262144,6,scan,scan_cpu_pow2,0.1519,151.9 +262144,6,scan,scan_cpu_non_pow2,0.1236,123.60000000000001 +262144,6,scan,scan_naive_pow2,0.498464,498.464 +262144,6,scan,scan_naive_non_pow2,0.505952,505.95199999999994 +262144,6,scan,scan_work_efficient_pow2,0.202048,202.048 +262144,6,scan,scan_work_efficient_non_pow2,0.095232,95.232 +262144,6,scan,scan_thrust_pow2,0.470016,470.01599999999996 +262144,6,scan,scan_thrust_non_pow2,0.448512,448.512 +262144,6,compact,compact_cpu_without_scan_pow2,0.446,446.0 +262144,6,compact,compact_cpu_without_scan_non_pow2,0.4488,448.79999999999995 +262144,6,compact,compact_cpu_with_scan,0.7608,760.8000000000001 +262144,6,compact,compact_work_efficient_pow2,0.287456,287.456 +262144,6,compact,compact_work_efficient_non_pow2,0.101376,101.37599999999999 +262144,7,scan,scan_cpu_pow2,0.1315,131.5 +262144,7,scan,scan_cpu_non_pow2,0.1236,123.60000000000001 +262144,7,scan,scan_naive_pow2,0.448384,448.384 +262144,7,scan,scan_naive_non_pow2,0.430176,430.176 +262144,7,scan,scan_work_efficient_pow2,0.22336,223.36 +262144,7,scan,scan_work_efficient_non_pow2,0.063488,63.488 +262144,7,scan,scan_thrust_pow2,0.587776,587.776 +262144,7,scan,scan_thrust_non_pow2,0.466944,466.944 +262144,7,compact,compact_cpu_without_scan_pow2,0.4529,452.90000000000003 +262144,7,compact,compact_cpu_without_scan_non_pow2,0.4493,449.29999999999995 +262144,7,compact,compact_cpu_with_scan,0.7285,728.5 +262144,7,compact,compact_work_efficient_pow2,0.354048,354.048 +262144,7,compact,compact_work_efficient_non_pow2,0.101376,101.37599999999999 
+262144,8,scan,scan_cpu_pow2,0.1422,142.2 +262144,8,scan,scan_cpu_non_pow2,0.126,126.0 +262144,8,scan,scan_naive_pow2,0.46608,466.08 +262144,8,scan,scan_naive_non_pow2,0.38432,384.32 +262144,8,scan,scan_work_efficient_pow2,0.199616,199.61599999999999 +262144,8,scan,scan_work_efficient_non_pow2,0.063488,63.488 +262144,8,scan,scan_thrust_pow2,0.598016,598.016 +262144,8,scan,scan_thrust_non_pow2,0.49152,491.52000000000004 +262144,8,compact,compact_cpu_without_scan_pow2,0.4483,448.29999999999995 +262144,8,compact,compact_cpu_without_scan_non_pow2,0.4374,437.40000000000003 +262144,8,compact,compact_cpu_with_scan,0.7773,777.3 +262144,8,compact,compact_work_efficient_pow2,0.300608,300.608 +262144,8,compact,compact_work_efficient_non_pow2,0.100352,100.352 +262144,9,scan,scan_cpu_pow2,0.1484,148.4 +262144,9,scan,scan_cpu_non_pow2,0.1519,151.9 +262144,9,scan,scan_naive_pow2,0.476,476.0 +262144,9,scan,scan_naive_non_pow2,0.413568,413.568 +262144,9,scan,scan_work_efficient_pow2,0.192672,192.672 +262144,9,scan,scan_work_efficient_non_pow2,0.074752,74.752 +262144,9,scan,scan_thrust_pow2,0.52224,522.24 +262144,9,scan,scan_thrust_non_pow2,0.497664,497.664 +262144,9,compact,compact_cpu_without_scan_pow2,0.4477,447.7 +262144,9,compact,compact_cpu_without_scan_non_pow2,0.4446,444.6 +262144,9,compact,compact_cpu_with_scan,0.7858,785.8000000000001 +262144,9,compact,compact_work_efficient_pow2,0.329856,329.856 +262144,9,compact,compact_work_efficient_non_pow2,0.101376,101.37599999999999 +262144,10,scan,scan_cpu_pow2,0.1363,136.3 +262144,10,scan,scan_cpu_non_pow2,0.1041,104.1 +262144,10,scan,scan_naive_pow2,0.54752,547.52 +262144,10,scan,scan_naive_non_pow2,0.397024,397.024 +262144,10,scan,scan_work_efficient_pow2,0.19744,197.44 +262144,10,scan,scan_work_efficient_non_pow2,0.063488,63.488 +262144,10,scan,scan_thrust_pow2,0.562176,562.176 +262144,10,scan,scan_thrust_non_pow2,0.454656,454.656 +262144,10,compact,compact_cpu_without_scan_pow2,0.4555,455.5 
+262144,10,compact,compact_cpu_without_scan_non_pow2,0.462,462.0 +262144,10,compact,compact_cpu_with_scan,0.7988,798.8 +262144,10,compact,compact_work_efficient_pow2,0.277088,277.088 +262144,10,compact,compact_work_efficient_non_pow2,0.118784,118.784 +131072,1,scan,scan_cpu_pow2,0.0641,64.10000000000001 +131072,1,scan,scan_cpu_non_pow2,0.069,69.0 +131072,1,scan,scan_naive_pow2,0.423104,423.104 +131072,1,scan,scan_naive_non_pow2,0.610944,610.9440000000001 +131072,1,scan,scan_work_efficient_pow2,0.197248,197.24800000000002 +131072,1,scan,scan_work_efficient_non_pow2,0.056352,56.352 +131072,1,scan,scan_thrust_pow2,0.14336,143.35999999999999 +131072,1,scan,scan_thrust_non_pow2,0.047104,47.104 +131072,1,compact,compact_cpu_without_scan_pow2,0.2291,229.1 +131072,1,compact,compact_cpu_without_scan_non_pow2,0.2283,228.3 +131072,1,compact,compact_cpu_with_scan,0.3813,381.29999999999995 +131072,1,compact,compact_work_efficient_pow2,0.234144,234.14399999999998 +131072,1,compact,compact_work_efficient_non_pow2,0.099328,99.328 +131072,2,scan,scan_cpu_pow2,0.058,58.0 +131072,2,scan,scan_cpu_non_pow2,0.0485,48.5 +131072,2,scan,scan_naive_pow2,1.17872,1178.72 +131072,2,scan,scan_naive_non_pow2,0.313568,313.56800000000004 +131072,2,scan,scan_work_efficient_pow2,0.23776,237.76 +131072,2,scan,scan_work_efficient_non_pow2,0.068608,68.608 +131072,2,scan,scan_thrust_pow2,0.132096,132.096 +131072,2,scan,scan_thrust_non_pow2,0.045056,45.056 +131072,2,compact,compact_cpu_without_scan_pow2,0.2289,228.9 +131072,2,compact,compact_cpu_without_scan_non_pow2,0.2181,218.1 +131072,2,compact,compact_cpu_with_scan,0.3837,383.7 +131072,2,compact,compact_work_efficient_pow2,0.231712,231.712 +131072,2,compact,compact_work_efficient_non_pow2,0.162816,162.816 +131072,3,scan,scan_cpu_pow2,0.0716,71.6 +131072,3,scan,scan_cpu_non_pow2,0.0612,61.199999999999996 +131072,3,scan,scan_naive_pow2,0.571392,571.392 +131072,3,scan,scan_naive_non_pow2,0.338112,338.112 
+131072,3,scan,scan_work_efficient_pow2,0.177792,177.792 +131072,3,scan,scan_work_efficient_non_pow2,0.057344,57.344 +131072,3,scan,scan_thrust_pow2,0.160768,160.768 +131072,3,scan,scan_thrust_non_pow2,0.048128,48.128 +131072,3,compact,compact_cpu_without_scan_pow2,0.226,226.0 +131072,3,compact,compact_cpu_without_scan_non_pow2,0.2171,217.1 +131072,3,compact,compact_cpu_with_scan,0.3918,391.79999999999995 +131072,3,compact,compact_work_efficient_pow2,0.23936,239.35999999999999 +131072,3,compact,compact_work_efficient_non_pow2,0.08704,87.04 +131072,4,scan,scan_cpu_pow2,0.0672,67.2 +131072,4,scan,scan_cpu_non_pow2,0.0507,50.7 +131072,4,scan,scan_naive_pow2,1.22704,1227.04 +131072,4,scan,scan_naive_non_pow2,0.41136,411.36 +131072,4,scan,scan_work_efficient_pow2,0.181312,181.312 +131072,4,scan,scan_work_efficient_non_pow2,0.105472,105.472 +131072,4,scan,scan_thrust_pow2,0.106496,106.496 +131072,4,scan,scan_thrust_non_pow2,0.045056,45.056 +131072,4,compact,compact_cpu_without_scan_pow2,0.2274,227.39999999999998 +131072,4,compact,compact_cpu_without_scan_non_pow2,0.2205,220.5 +131072,4,compact,compact_cpu_with_scan,0.3919,391.90000000000003 +131072,4,compact,compact_work_efficient_pow2,0.240768,240.768 +131072,4,compact,compact_work_efficient_non_pow2,0.130048,130.048 +131072,5,scan,scan_cpu_pow2,0.0628,62.8 +131072,5,scan,scan_cpu_non_pow2,0.0569,56.9 +131072,5,scan,scan_naive_pow2,0.508416,508.416 +131072,5,scan,scan_naive_non_pow2,0.386208,386.20799999999997 +131072,5,scan,scan_work_efficient_pow2,0.192512,192.512 +131072,5,scan,scan_work_efficient_non_pow2,0.057344,57.344 +131072,5,scan,scan_thrust_pow2,0.155648,155.648 +131072,5,scan,scan_thrust_non_pow2,0.047104,47.104 +131072,5,compact,compact_cpu_without_scan_pow2,0.2477,247.70000000000002 +131072,5,compact,compact_cpu_without_scan_non_pow2,0.2164,216.4 +131072,5,compact,compact_cpu_with_scan,0.4028,402.8 +131072,5,compact,compact_work_efficient_pow2,0.35344,353.44 
+131072,5,compact,compact_work_efficient_non_pow2,0.08704,87.04 +131072,6,scan,scan_cpu_pow2,0.0652,65.19999999999999 +131072,6,scan,scan_cpu_non_pow2,0.0605,60.5 +131072,6,scan,scan_naive_pow2,0.52832,528.32 +131072,6,scan,scan_naive_non_pow2,0.337024,337.024 +131072,6,scan,scan_work_efficient_pow2,0.177568,177.568 +131072,6,scan,scan_work_efficient_non_pow2,0.067584,67.584 +131072,6,scan,scan_thrust_pow2,0.103424,103.424 +131072,6,scan,scan_thrust_non_pow2,0.046016,46.016 +131072,6,compact,compact_cpu_without_scan_pow2,0.2262,226.20000000000002 +131072,6,compact,compact_cpu_without_scan_non_pow2,0.2268,226.8 +131072,6,compact,compact_cpu_with_scan,0.3835,383.5 +131072,6,compact,compact_work_efficient_pow2,0.249344,249.34400000000002 +131072,6,compact,compact_work_efficient_non_pow2,0.28672,286.71999999999997 +131072,7,scan,scan_cpu_pow2,0.066,66.0 +131072,7,scan,scan_cpu_non_pow2,0.0593,59.3 +131072,7,scan,scan_naive_pow2,0.452096,452.096 +131072,7,scan,scan_naive_non_pow2,0.336096,336.096 +131072,7,scan,scan_work_efficient_pow2,0.233248,233.24800000000002 +131072,7,scan,scan_work_efficient_non_pow2,0.05744,57.44 +131072,7,scan,scan_thrust_pow2,0.110592,110.592 +131072,7,scan,scan_thrust_non_pow2,0.04608,46.080000000000005 +131072,7,compact,compact_cpu_without_scan_pow2,0.2286,228.6 +131072,7,compact,compact_cpu_without_scan_non_pow2,0.2323,232.3 +131072,7,compact,compact_cpu_with_scan,0.3869,386.90000000000003 +131072,7,compact,compact_work_efficient_pow2,0.223744,223.744 +131072,7,compact,compact_work_efficient_non_pow2,0.083968,83.968 +131072,8,scan,scan_cpu_pow2,0.0667,66.69999999999999 +131072,8,scan,scan_cpu_non_pow2,0.0484,48.4 +131072,8,scan,scan_naive_pow2,0.71808,718.08 +131072,8,scan,scan_naive_non_pow2,0.406528,406.528 +131072,8,scan,scan_work_efficient_pow2,0.228768,228.768 +131072,8,scan,scan_work_efficient_non_pow2,0.059392,59.392 +131072,8,scan,scan_thrust_pow2,0.105472,105.472 +131072,8,scan,scan_thrust_non_pow2,0.083968,83.968 
+131072,8,compact,compact_cpu_without_scan_pow2,0.227,227.0 +131072,8,compact,compact_cpu_without_scan_non_pow2,0.2265,226.5 +131072,8,compact,compact_cpu_with_scan,0.3798,379.8 +131072,8,compact,compact_work_efficient_pow2,0.210304,210.304 +131072,8,compact,compact_work_efficient_non_pow2,0.088064,88.06400000000001 +131072,9,scan,scan_cpu_pow2,0.0598,59.8 +131072,9,scan,scan_cpu_non_pow2,0.0604,60.400000000000006 +131072,9,scan,scan_naive_pow2,0.476672,476.67199999999997 +131072,9,scan,scan_naive_non_pow2,0.367616,367.616 +131072,9,scan,scan_work_efficient_pow2,0.17536,175.35999999999999 +131072,9,scan,scan_work_efficient_non_pow2,0.05632,56.32 +131072,9,scan,scan_thrust_pow2,0.10752,107.52000000000001 +131072,9,scan,scan_thrust_non_pow2,0.050176,50.176 +131072,9,compact,compact_cpu_without_scan_pow2,0.2286,228.6 +131072,9,compact,compact_cpu_without_scan_non_pow2,0.224,224.0 +131072,9,compact,compact_cpu_with_scan,0.4248,424.8 +131072,9,compact,compact_work_efficient_pow2,0.263264,263.264 +131072,9,compact,compact_work_efficient_non_pow2,0.104448,104.448 +131072,10,scan,scan_cpu_pow2,0.061,61.0 +131072,10,scan,scan_cpu_non_pow2,0.0508,50.8 +131072,10,scan,scan_naive_pow2,0.626464,626.464 +131072,10,scan,scan_naive_non_pow2,0.346592,346.592 +131072,10,scan,scan_work_efficient_pow2,0.177152,177.15200000000002 +131072,10,scan,scan_work_efficient_non_pow2,0.05632,56.32 +131072,10,scan,scan_thrust_pow2,0.145408,145.40800000000002 +131072,10,scan,scan_thrust_non_pow2,0.047008,47.008 +131072,10,compact,compact_cpu_without_scan_pow2,0.2259,225.89999999999998 +131072,10,compact,compact_cpu_without_scan_non_pow2,0.2186,218.6 +131072,10,compact,compact_cpu_with_scan,0.3903,390.29999999999995 +131072,10,compact,compact_work_efficient_pow2,0.2552,255.2 +131072,10,compact,compact_work_efficient_non_pow2,0.109568,109.568 +65536,1,scan,scan_cpu_pow2,0.0298,29.8 +65536,1,scan,scan_cpu_non_pow2,0.0273,27.3 +65536,1,scan,scan_naive_pow2,0.50256,502.56 
+65536,1,scan,scan_naive_non_pow2,0.40112,401.12 +65536,1,scan,scan_work_efficient_pow2,0.199424,199.42399999999998 +65536,1,scan,scan_work_efficient_non_pow2,0.053248,53.248 +65536,1,scan,scan_thrust_pow2,0.109568,109.568 +65536,1,scan,scan_thrust_non_pow2,0.048128,48.128 +65536,1,compact,compact_cpu_without_scan_pow2,0.1155,115.5 +65536,1,compact,compact_cpu_without_scan_non_pow2,0.1101,110.10000000000001 +65536,1,compact,compact_cpu_with_scan,0.2443,244.29999999999998 +65536,1,compact,compact_work_efficient_pow2,0.193216,193.216 +65536,1,compact,compact_work_efficient_non_pow2,0.269312,269.312 +65536,2,scan,scan_cpu_pow2,0.0349,34.9 +65536,2,scan,scan_cpu_non_pow2,0.034,34.0 +65536,2,scan,scan_naive_pow2,0.446688,446.688 +65536,2,scan,scan_naive_non_pow2,0.36672,366.71999999999997 +65536,2,scan,scan_work_efficient_pow2,0.176256,176.256 +65536,2,scan,scan_work_efficient_non_pow2,0.055296,55.296 +65536,2,scan,scan_thrust_pow2,0.128,128.0 +65536,2,scan,scan_thrust_non_pow2,0.083968,83.968 +65536,2,compact,compact_cpu_without_scan_pow2,0.115,115.0 +65536,2,compact,compact_cpu_without_scan_non_pow2,0.1158,115.8 +65536,2,compact,compact_cpu_with_scan,0.1965,196.5 +65536,2,compact,compact_work_efficient_pow2,0.228288,228.28799999999998 +65536,2,compact,compact_work_efficient_non_pow2,0.113664,113.664 +65536,3,scan,scan_cpu_pow2,0.0291,29.1 +65536,3,scan,scan_cpu_non_pow2,0.0269,26.9 +65536,3,scan,scan_naive_pow2,0.444896,444.896 +65536,3,scan,scan_naive_non_pow2,0.314976,314.976 +65536,3,scan,scan_work_efficient_pow2,0.22944,229.44 +65536,3,scan,scan_work_efficient_non_pow2,0.052224,52.224 +65536,3,scan,scan_thrust_pow2,0.13824,138.24 +65536,3,scan,scan_thrust_non_pow2,0.078848,78.848 +65536,3,compact,compact_cpu_without_scan_pow2,0.118,118.0 +65536,3,compact,compact_cpu_without_scan_non_pow2,0.1188,118.8 +65536,3,compact,compact_cpu_with_scan,0.2145,214.5 +65536,3,compact,compact_work_efficient_pow2,0.255872,255.87199999999999 
+65536,3,compact,compact_work_efficient_non_pow2,0.088064,88.06400000000001 +65536,4,scan,scan_cpu_pow2,0.0413,41.300000000000004 +65536,4,scan,scan_cpu_non_pow2,0.0267,26.700000000000003 +65536,4,scan,scan_naive_pow2,0.467008,467.008 +65536,4,scan,scan_naive_non_pow2,0.281504,281.50399999999996 +65536,4,scan,scan_work_efficient_pow2,0.220832,220.832 +65536,4,scan,scan_work_efficient_non_pow2,0.053248,53.248 +65536,4,scan,scan_thrust_pow2,0.105472,105.472 +65536,4,scan,scan_thrust_non_pow2,0.050176,50.176 +65536,4,compact,compact_cpu_without_scan_pow2,0.1175,117.5 +65536,4,compact,compact_cpu_without_scan_non_pow2,0.1124,112.4 +65536,4,compact,compact_cpu_with_scan,0.2082,208.2 +65536,4,compact,compact_work_efficient_pow2,0.17616,176.16000000000003 +65536,4,compact,compact_work_efficient_non_pow2,0.099328,99.328 +65536,5,scan,scan_cpu_pow2,0.0282,28.2 +65536,5,scan,scan_cpu_non_pow2,0.0271,27.099999999999998 +65536,5,scan,scan_naive_pow2,0.43968,439.68 +65536,5,scan,scan_naive_non_pow2,0.331424,331.424 +65536,5,scan,scan_work_efficient_pow2,0.195552,195.552 +65536,5,scan,scan_work_efficient_non_pow2,0.055296,55.296 +65536,5,scan,scan_thrust_pow2,0.125952,125.95200000000001 +65536,5,scan,scan_thrust_non_pow2,0.079872,79.872 +65536,5,compact,compact_cpu_without_scan_pow2,0.1166,116.6 +65536,5,compact,compact_cpu_without_scan_non_pow2,0.1098,109.8 +65536,5,compact,compact_cpu_with_scan,0.2073,207.3 +65536,5,compact,compact_work_efficient_pow2,0.256384,256.384 +65536,5,compact,compact_work_efficient_non_pow2,0.082944,82.944 +65536,6,scan,scan_cpu_pow2,0.0357,35.7 +65536,6,scan,scan_cpu_non_pow2,0.0344,34.4 +65536,6,scan,scan_naive_pow2,0.572032,572.032 +65536,6,scan,scan_naive_non_pow2,0.274688,274.688 +65536,6,scan,scan_work_efficient_pow2,0.204416,204.416 +65536,6,scan,scan_work_efficient_non_pow2,0.053248,53.248 +65536,6,scan,scan_thrust_pow2,0.104448,104.448 +65536,6,scan,scan_thrust_non_pow2,0.04608,46.080000000000005 
+65536,6,compact,compact_cpu_without_scan_pow2,0.115,115.0 +65536,6,compact,compact_cpu_without_scan_non_pow2,0.1097,109.7 +65536,6,compact,compact_cpu_with_scan,0.217,217.0 +65536,6,compact,compact_work_efficient_pow2,0.174432,174.43200000000002 +65536,6,compact,compact_work_efficient_non_pow2,0.114688,114.688 +65536,7,scan,scan_cpu_pow2,0.0352,35.2 +65536,7,scan,scan_cpu_non_pow2,0.0341,34.1 +65536,7,scan,scan_naive_pow2,0.571328,571.328 +65536,7,scan,scan_naive_non_pow2,0.437504,437.504 +65536,7,scan,scan_work_efficient_pow2,0.22624,226.24 +65536,7,scan,scan_work_efficient_non_pow2,0.08704,87.04 +65536,7,scan,scan_thrust_pow2,0.108544,108.544 +65536,7,scan,scan_thrust_non_pow2,0.049152,49.152 +65536,7,compact,compact_cpu_without_scan_pow2,0.1155,115.5 +65536,7,compact,compact_cpu_without_scan_non_pow2,0.1101,110.10000000000001 +65536,7,compact,compact_cpu_with_scan,0.2106,210.60000000000002 +65536,7,compact,compact_work_efficient_pow2,0.316192,316.19199999999995 +65536,7,compact,compact_work_efficient_non_pow2,0.079872,79.872 +65536,8,scan,scan_cpu_pow2,0.0299,29.9 +65536,8,scan,scan_cpu_non_pow2,0.0288,28.8 +65536,8,scan,scan_naive_pow2,0.464736,464.736 +65536,8,scan,scan_naive_non_pow2,0.297248,297.248 +65536,8,scan,scan_work_efficient_pow2,0.207424,207.424 +65536,8,scan,scan_work_efficient_non_pow2,0.052224,52.224 +65536,8,scan,scan_thrust_pow2,0.104448,104.448 +65536,8,scan,scan_thrust_non_pow2,0.04608,46.080000000000005 +65536,8,compact,compact_cpu_without_scan_pow2,0.113,113.0 +65536,8,compact,compact_cpu_without_scan_non_pow2,0.107,107.0 +65536,8,compact,compact_cpu_with_scan,0.2086,208.6 +65536,8,compact,compact_work_efficient_pow2,0.22704,227.04 +65536,8,compact,compact_work_efficient_non_pow2,0.121856,121.85600000000001 +65536,9,scan,scan_cpu_pow2,0.0312,31.2 +65536,9,scan,scan_cpu_non_pow2,0.028,28.0 +65536,9,scan,scan_naive_pow2,0.467232,467.23199999999997 +65536,9,scan,scan_naive_non_pow2,0.448608,448.608 
+65536,9,scan,scan_work_efficient_pow2,0.267168,267.168 +65536,9,scan,scan_work_efficient_non_pow2,0.053312,53.312 +65536,9,scan,scan_thrust_pow2,0.14848,148.48 +65536,9,scan,scan_thrust_non_pow2,0.048128,48.128 +65536,9,compact,compact_cpu_without_scan_pow2,0.115,115.0 +65536,9,compact,compact_cpu_without_scan_non_pow2,0.1095,109.5 +65536,9,compact,compact_cpu_with_scan,0.1975,197.5 +65536,9,compact,compact_work_efficient_pow2,0.218912,218.912 +65536,9,compact,compact_work_efficient_non_pow2,0.079872,79.872 +65536,10,scan,scan_cpu_pow2,0.0291,29.1 +65536,10,scan,scan_cpu_non_pow2,0.0279,27.900000000000002 +65536,10,scan,scan_naive_pow2,1.16371,1163.71 +65536,10,scan,scan_naive_non_pow2,0.336096,336.096 +65536,10,scan,scan_work_efficient_pow2,0.218432,218.432 +65536,10,scan,scan_work_efficient_non_pow2,0.052224,52.224 +65536,10,scan,scan_thrust_pow2,0.126976,126.976 +65536,10,scan,scan_thrust_non_pow2,0.048128,48.128 +65536,10,compact,compact_cpu_without_scan_pow2,0.1459,145.9 +65536,10,compact,compact_cpu_without_scan_non_pow2,0.1438,143.8 +65536,10,compact,compact_cpu_with_scan,0.2517,251.7 +65536,10,compact,compact_work_efficient_pow2,0.889952,889.952 +65536,10,compact,compact_work_efficient_non_pow2,0.079872,79.872 +32768,1,scan,scan_cpu_pow2,0.017,17.0 +32768,1,scan,scan_cpu_non_pow2,0.016,16.0 +32768,1,scan,scan_naive_pow2,0.399936,399.93600000000004 +32768,1,scan,scan_naive_non_pow2,0.355616,355.616 +32768,1,scan,scan_work_efficient_pow2,0.179584,179.584 +32768,1,scan,scan_work_efficient_non_pow2,0.08704,87.04 +32768,1,scan,scan_thrust_pow2,0.10752,107.52000000000001 +32768,1,scan,scan_thrust_non_pow2,0.044032,44.032000000000004 +32768,1,compact,compact_cpu_without_scan_pow2,0.0643,64.3 +32768,1,compact,compact_cpu_without_scan_non_pow2,0.0539,53.900000000000006 +32768,1,compact,compact_cpu_with_scan,0.1181,118.1 +32768,1,compact,compact_work_efficient_pow2,0.21072,210.72 +32768,1,compact,compact_work_efficient_non_pow2,0.079872,79.872 
+32768,2,scan,scan_cpu_pow2,0.0151,15.100000000000001 +32768,2,scan,scan_cpu_non_pow2,0.0139,13.899999999999999 +32768,2,scan,scan_naive_pow2,0.393152,393.152 +32768,2,scan,scan_naive_non_pow2,0.447936,447.936 +32768,2,scan,scan_work_efficient_pow2,0.176512,176.512 +32768,2,scan,scan_work_efficient_non_pow2,0.0512,51.2 +32768,2,scan,scan_thrust_pow2,0.141312,141.31199999999998 +32768,2,scan,scan_thrust_non_pow2,0.029696,29.696 +32768,2,compact,compact_cpu_without_scan_pow2,0.0583,58.3 +32768,2,compact,compact_cpu_without_scan_non_pow2,0.0581,58.1 +32768,2,compact,compact_cpu_with_scan,0.1111,111.10000000000001 +32768,2,compact,compact_work_efficient_pow2,0.227392,227.39200000000002 +32768,2,compact,compact_work_efficient_non_pow2,0.077824,77.824 +32768,3,scan,scan_cpu_pow2,0.0139,13.899999999999999 +32768,3,scan,scan_cpu_non_pow2,0.0132,13.2 +32768,3,scan,scan_naive_pow2,0.518368,518.368 +32768,3,scan,scan_naive_non_pow2,0.3568,356.8 +32768,3,scan,scan_work_efficient_pow2,0.203072,203.072 +32768,3,scan,scan_work_efficient_non_pow2,0.082944,82.944 +32768,3,scan,scan_thrust_pow2,0.123872,123.872 +32768,3,scan,scan_thrust_non_pow2,0.04608,46.080000000000005 +32768,3,compact,compact_cpu_without_scan_pow2,0.066,66.0 +32768,3,compact,compact_cpu_without_scan_non_pow2,0.0546,54.6 +32768,3,compact,compact_cpu_with_scan,0.1048,104.80000000000001 +32768,3,compact,compact_work_efficient_pow2,0.208608,208.60799999999998 +32768,3,compact,compact_work_efficient_non_pow2,0.078848,78.848 +32768,4,scan,scan_cpu_pow2,0.0167,16.7 +32768,4,scan,scan_cpu_non_pow2,0.016,16.0 +32768,4,scan,scan_naive_pow2,0.4968,496.8 +32768,4,scan,scan_naive_non_pow2,0.338592,338.592 +32768,4,scan,scan_work_efficient_pow2,0.21152,211.52 +32768,4,scan,scan_work_efficient_non_pow2,0.084992,84.992 +32768,4,scan,scan_thrust_pow2,0.144384,144.38400000000001 +32768,4,scan,scan_thrust_non_pow2,0.03072,30.720000000000002 +32768,4,compact,compact_cpu_without_scan_pow2,0.0595,59.5 
+32768,4,compact,compact_cpu_without_scan_non_pow2,0.0586,58.6 +32768,4,compact,compact_cpu_with_scan,0.1106,110.60000000000001 +32768,4,compact,compact_work_efficient_pow2,0.257408,257.408 +32768,4,compact,compact_work_efficient_non_pow2,0.110592,110.592 +32768,5,scan,scan_cpu_pow2,0.0206,20.6 +32768,5,scan,scan_cpu_non_pow2,0.0129,12.9 +32768,5,scan,scan_naive_pow2,0.474816,474.81600000000003 +32768,5,scan,scan_naive_non_pow2,0.27232,272.32 +32768,5,scan,scan_work_efficient_pow2,0.203456,203.456 +32768,5,scan,scan_work_efficient_non_pow2,0.0512,51.2 +32768,5,scan,scan_thrust_pow2,0.124928,124.928 +32768,5,scan,scan_thrust_non_pow2,0.0768,76.8 +32768,5,compact,compact_cpu_without_scan_pow2,0.0585,58.5 +32768,5,compact,compact_cpu_without_scan_non_pow2,0.0582,58.2 +32768,5,compact,compact_cpu_with_scan,0.1071,107.1 +32768,5,compact,compact_work_efficient_pow2,0.255168,255.168 +32768,5,compact,compact_work_efficient_non_pow2,0.146432,146.43200000000002 +32768,6,scan,scan_cpu_pow2,0.0141,14.1 +32768,6,scan,scan_cpu_non_pow2,0.0131,13.100000000000001 +32768,6,scan,scan_naive_pow2,0.488864,488.86400000000003 +32768,6,scan,scan_naive_non_pow2,0.31312,313.12 +32768,6,scan,scan_work_efficient_pow2,0.17968,179.68 +32768,6,scan,scan_work_efficient_non_pow2,0.052256,52.256 +32768,6,scan,scan_thrust_pow2,0.105472,105.472 +32768,6,scan,scan_thrust_non_pow2,0.044032,44.032000000000004 +32768,6,compact,compact_cpu_without_scan_pow2,0.0581,58.1 +32768,6,compact,compact_cpu_without_scan_non_pow2,0.0544,54.4 +32768,6,compact,compact_cpu_with_scan,0.1183,118.3 +32768,6,compact,compact_work_efficient_pow2,0.18912,189.12 +32768,6,compact,compact_work_efficient_non_pow2,0.108544,108.544 +32768,7,scan,scan_cpu_pow2,0.0173,17.3 +32768,7,scan,scan_cpu_non_pow2,0.0166,16.6 +32768,7,scan,scan_naive_pow2,0.491584,491.584 +32768,7,scan,scan_naive_non_pow2,0.354208,354.208 +32768,7,scan,scan_work_efficient_pow2,0.213728,213.728 
+32768,7,scan,scan_work_efficient_non_pow2,0.06144,61.440000000000005 +32768,7,scan,scan_thrust_pow2,0.110592,110.592 +32768,7,scan,scan_thrust_non_pow2,0.059392,59.392 +32768,7,compact,compact_cpu_without_scan_pow2,0.0707,70.7 +32768,7,compact,compact_cpu_without_scan_non_pow2,0.0709,70.9 +32768,7,compact,compact_cpu_with_scan,0.1106,110.60000000000001 +32768,7,compact,compact_work_efficient_pow2,0.221376,221.37599999999998 +32768,7,compact,compact_work_efficient_non_pow2,0.101376,101.37599999999999 +32768,8,scan,scan_cpu_pow2,0.0144,14.4 +32768,8,scan,scan_cpu_non_pow2,0.0131,13.100000000000001 +32768,8,scan,scan_naive_pow2,0.482304,482.30400000000003 +32768,8,scan,scan_naive_non_pow2,0.740416,740.4159999999999 +32768,8,scan,scan_work_efficient_pow2,0.220768,220.768 +32768,8,scan,scan_work_efficient_non_pow2,0.053248,53.248 +32768,8,scan,scan_thrust_pow2,0.118784,118.784 +32768,8,scan,scan_thrust_non_pow2,0.050112,50.111999999999995 +32768,8,compact,compact_cpu_without_scan_pow2,0.0594,59.4 +32768,8,compact,compact_cpu_without_scan_non_pow2,0.0588,58.8 +32768,8,compact,compact_cpu_with_scan,0.1067,106.7 +32768,8,compact,compact_work_efficient_pow2,0.209408,209.40800000000002 +32768,8,compact,compact_work_efficient_non_pow2,0.078848,78.848 +32768,9,scan,scan_cpu_pow2,0.0136,13.6 +32768,9,scan,scan_cpu_non_pow2,0.0119,11.9 +32768,9,scan,scan_naive_pow2,0.38848,388.48 +32768,9,scan,scan_naive_non_pow2,0.322016,322.016 +32768,9,scan,scan_work_efficient_pow2,0.191776,191.776 +32768,9,scan,scan_work_efficient_non_pow2,0.078848,78.848 +32768,9,scan,scan_thrust_pow2,0.146432,146.43200000000002 +32768,9,scan,scan_thrust_non_pow2,0.044032,44.032000000000004 +32768,9,compact,compact_cpu_without_scan_pow2,0.0654,65.4 +32768,9,compact,compact_cpu_without_scan_non_pow2,0.055,55.0 +32768,9,compact,compact_cpu_with_scan,0.1108,110.8 +32768,9,compact,compact_work_efficient_pow2,0.208832,208.832 +32768,9,compact,compact_work_efficient_non_pow2,0.180224,180.224 
+32768,10,scan,scan_cpu_pow2,0.0167,16.7 +32768,10,scan,scan_cpu_non_pow2,0.0159,15.9 +32768,10,scan,scan_naive_pow2,0.396736,396.736 +32768,10,scan,scan_naive_non_pow2,0.366464,366.464 +32768,10,scan,scan_work_efficient_pow2,0.218144,218.144 +32768,10,scan,scan_work_efficient_non_pow2,0.063488,63.488 +32768,10,scan,scan_thrust_pow2,0.146432,146.43200000000002 +32768,10,scan,scan_thrust_non_pow2,0.04096,40.96 +32768,10,compact,compact_cpu_without_scan_pow2,0.0724,72.4 +32768,10,compact,compact_cpu_without_scan_non_pow2,0.067,67.0 +32768,10,compact,compact_cpu_with_scan,0.1376,137.6 +32768,10,compact,compact_work_efficient_pow2,0.260352,260.352 +32768,10,compact,compact_work_efficient_non_pow2,0.074752,74.752 +16384,1,scan,scan_cpu_pow2,0.0082,8.200000000000001 +16384,1,scan,scan_cpu_non_pow2,0.0076,7.6 +16384,1,scan,scan_naive_pow2,0.385024,385.024 +16384,1,scan,scan_naive_non_pow2,0.301056,301.056 +16384,1,scan,scan_work_efficient_pow2,0.171008,171.00799999999998 +16384,1,scan,scan_work_efficient_non_pow2,0.072736,72.73599999999999 +16384,1,scan,scan_thrust_pow2,0.108544,108.544 +16384,1,scan,scan_thrust_non_pow2,0.05632,56.32 +16384,1,compact,compact_cpu_without_scan_pow2,0.0299,29.9 +16384,1,compact,compact_cpu_without_scan_non_pow2,0.0262,26.200000000000003 +16384,1,compact,compact_cpu_with_scan,0.0611,61.1 +16384,1,compact,compact_work_efficient_pow2,0.182272,182.272 +16384,1,compact,compact_work_efficient_non_pow2,0.142336,142.33599999999998 +16384,2,scan,scan_cpu_pow2,0.0073,7.3 +16384,2,scan,scan_cpu_non_pow2,0.0069,6.8999999999999995 +16384,2,scan,scan_naive_pow2,0.425984,425.984 +16384,2,scan,scan_naive_non_pow2,0.314368,314.368 +16384,2,scan,scan_work_efficient_pow2,0.187392,187.392 +16384,2,scan,scan_work_efficient_non_pow2,0.052256,52.256 +16384,2,scan,scan_thrust_pow2,0.10752,107.52000000000001 +16384,2,scan,scan_thrust_non_pow2,0.050176,50.176 +16384,2,compact,compact_cpu_without_scan_pow2,0.0377,37.699999999999996 
+16384,2,compact,compact_cpu_without_scan_non_pow2,0.0319,31.9 +16384,2,compact,compact_cpu_with_scan,0.0798,79.8 +16384,2,compact,compact_work_efficient_pow2,0.173056,173.05599999999998 +16384,2,compact,compact_work_efficient_non_pow2,0.091136,91.136 +16384,3,scan,scan_cpu_pow2,0.0071,7.1000000000000005 +16384,3,scan,scan_cpu_non_pow2,0.007,7.0 +16384,3,scan,scan_naive_pow2,0.976896,976.896 +16384,3,scan,scan_naive_non_pow2,0.270336,270.336 +16384,3,scan,scan_work_efficient_pow2,0.748544,748.544 +16384,3,scan,scan_work_efficient_non_pow2,0.095232,95.232 +16384,3,scan,scan_thrust_pow2,0.10752,107.52000000000001 +16384,3,scan,scan_thrust_non_pow2,0.047104,47.104 +16384,3,compact,compact_cpu_without_scan_pow2,0.0301,30.099999999999998 +16384,3,compact,compact_cpu_without_scan_non_pow2,0.0295,29.5 +16384,3,compact,compact_cpu_with_scan,0.0595,59.5 +16384,3,compact,compact_work_efficient_pow2,0.168,168.0 +16384,3,compact,compact_work_efficient_non_pow2,0.144384,144.38400000000001 +16384,4,scan,scan_cpu_pow2,0.0081,8.1 +16384,4,scan,scan_cpu_non_pow2,0.0076,7.6 +16384,4,scan,scan_naive_pow2,0.369664,369.664 +16384,4,scan,scan_naive_non_pow2,0.251904,251.90400000000002 +16384,4,scan,scan_work_efficient_pow2,0.158752,158.752 +16384,4,scan,scan_work_efficient_non_pow2,0.0512,51.2 +16384,4,scan,scan_thrust_pow2,0.098304,98.304 +16384,4,scan,scan_thrust_non_pow2,0.0512,51.2 +16384,4,compact,compact_cpu_without_scan_pow2,0.0377,37.699999999999996 +16384,4,compact,compact_cpu_without_scan_non_pow2,0.0349,34.9 +16384,4,compact,compact_cpu_with_scan,0.0776,77.60000000000001 +16384,4,compact,compact_work_efficient_pow2,0.165888,165.888 +16384,4,compact,compact_work_efficient_non_pow2,0.113664,113.664 +16384,5,scan,scan_cpu_pow2,0.0079,7.9 +16384,5,scan,scan_cpu_non_pow2,0.0061,6.1000000000000005 +16384,5,scan,scan_naive_pow2,1.00147,1001.47 +16384,5,scan,scan_naive_non_pow2,0.297984,297.98400000000004 +16384,5,scan,scan_work_efficient_pow2,0.151552,151.552 
+16384,5,scan,scan_work_efficient_non_pow2,0.086016,86.01599999999999 +16384,5,scan,scan_thrust_pow2,0.108544,108.544 +16384,5,scan,scan_thrust_non_pow2,0.082944,82.944 +16384,5,compact,compact_cpu_without_scan_pow2,0.0302,30.200000000000003 +16384,5,compact,compact_cpu_without_scan_non_pow2,0.0263,26.3 +16384,5,compact,compact_cpu_with_scan,0.1034,103.4 +16384,5,compact,compact_work_efficient_pow2,0.141312,141.31199999999998 +16384,5,compact,compact_work_efficient_non_pow2,0.136192,136.192 +16384,6,scan,scan_cpu_pow2,0.0083,8.3 +16384,6,scan,scan_cpu_non_pow2,0.0074,7.4 +16384,6,scan,scan_naive_pow2,0.386048,386.048 +16384,6,scan,scan_naive_non_pow2,0.280576,280.57599999999996 +16384,6,scan,scan_work_efficient_pow2,0.124928,124.928 +16384,6,scan,scan_work_efficient_non_pow2,0.0512,51.2 +16384,6,scan,scan_thrust_pow2,0.13312,133.11999999999998 +16384,6,scan,scan_thrust_non_pow2,0.050176,50.176 +16384,6,compact,compact_cpu_without_scan_pow2,0.0418,41.8 +16384,6,compact,compact_cpu_without_scan_non_pow2,0.0351,35.1 +16384,6,compact,compact_cpu_with_scan,0.0744,74.39999999999999 +16384,6,compact,compact_work_efficient_pow2,0.164864,164.864 +16384,6,compact,compact_work_efficient_non_pow2,0.125952,125.95200000000001 +16384,7,scan,scan_cpu_pow2,0.0079,7.9 +16384,7,scan,scan_cpu_non_pow2,0.0076,7.6 +16384,7,scan,scan_naive_pow2,0.413696,413.696 +16384,7,scan,scan_naive_non_pow2,0.251008,251.008 +16384,7,scan,scan_work_efficient_pow2,0.178176,178.176 +16384,7,scan,scan_work_efficient_non_pow2,0.0512,51.2 +16384,7,scan,scan_thrust_pow2,0.150528,150.528 +16384,7,scan,scan_thrust_non_pow2,0.108544,108.544 +16384,7,compact,compact_cpu_without_scan_pow2,0.0309,30.900000000000002 +16384,7,compact,compact_cpu_without_scan_non_pow2,0.0277,27.7 +16384,7,compact,compact_cpu_with_scan,0.0567,56.7 +16384,7,compact,compact_work_efficient_pow2,0.171008,171.00799999999998 +16384,7,compact,compact_work_efficient_non_pow2,0.096256,96.256 +16384,8,scan,scan_cpu_pow2,0.0066,6.6 
+16384,8,scan,scan_cpu_non_pow2,0.0061,6.1000000000000005 +16384,8,scan,scan_naive_pow2,0.473088,473.088 +16384,8,scan,scan_naive_non_pow2,0.351232,351.23199999999997 +16384,8,scan,scan_work_efficient_pow2,0.131072,131.072 +16384,8,scan,scan_work_efficient_non_pow2,0.075776,75.776 +16384,8,scan,scan_thrust_pow2,0.130048,130.048 +16384,8,scan,scan_thrust_non_pow2,0.047104,47.104 +16384,8,compact,compact_cpu_without_scan_pow2,0.0388,38.800000000000004 +16384,8,compact,compact_cpu_without_scan_non_pow2,0.0354,35.4 +16384,8,compact,compact_cpu_with_scan,0.0739,73.89999999999999 +16384,8,compact,compact_work_efficient_pow2,0.149504,149.504 +16384,8,compact,compact_work_efficient_non_pow2,0.075776,75.776 +16384,9,scan,scan_cpu_pow2,0.008,8.0 +16384,9,scan,scan_cpu_non_pow2,0.0114,11.4 +16384,9,scan,scan_naive_pow2,0.366592,366.592 +16384,9,scan,scan_naive_non_pow2,0.377856,377.85600000000005 +16384,9,scan,scan_work_efficient_pow2,0.18944,189.44 +16384,9,scan,scan_work_efficient_non_pow2,0.060416,60.416 +16384,9,scan,scan_thrust_pow2,0.106496,106.496 +16384,9,scan,scan_thrust_non_pow2,0.055296,55.296 +16384,9,compact,compact_cpu_without_scan_pow2,0.0307,30.700000000000003 +16384,9,compact,compact_cpu_without_scan_non_pow2,0.0278,27.799999999999997 +16384,9,compact,compact_cpu_with_scan,0.0614,61.400000000000006 +16384,9,compact,compact_work_efficient_pow2,0.170048,170.048 +16384,9,compact,compact_work_efficient_non_pow2,0.145408,145.40800000000002 +16384,10,scan,scan_cpu_pow2,0.0082,8.200000000000001 +16384,10,scan,scan_cpu_non_pow2,0.0076,7.6 +16384,10,scan,scan_naive_pow2,0.996352,996.352 +16384,10,scan,scan_naive_non_pow2,0.301056,301.056 +16384,10,scan,scan_work_efficient_pow2,0.15872,158.72 +16384,10,scan,scan_work_efficient_non_pow2,0.050272,50.272 +16384,10,scan,scan_thrust_pow2,0.10752,107.52000000000001 +16384,10,scan,scan_thrust_non_pow2,0.04608,46.080000000000005 +16384,10,compact,compact_cpu_without_scan_pow2,0.0311,31.099999999999998 
+16384,10,compact,compact_cpu_without_scan_non_pow2,0.0265,26.5 +16384,10,compact,compact_cpu_with_scan,0.0571,57.1 +16384,10,compact,compact_work_efficient_pow2,0.157696,157.696 +16384,10,compact,compact_work_efficient_non_pow2,0.14336,143.35999999999999 +8192,1,scan,scan_cpu_pow2,0.0043,4.3 +8192,1,scan,scan_cpu_non_pow2,0.0039,3.9 +8192,1,scan,scan_naive_pow2,0.315392,315.392 +8192,1,scan,scan_naive_non_pow2,0.234496,234.496 +8192,1,scan,scan_work_efficient_pow2,0.186368,186.368 +8192,1,scan,scan_work_efficient_non_pow2,0.052224,52.224 +8192,1,scan,scan_thrust_pow2,0.11776,117.76 +8192,1,scan,scan_thrust_non_pow2,0.048128,48.128 +8192,1,compact,compact_cpu_without_scan_pow2,0.0184,18.4 +8192,1,compact,compact_cpu_without_scan_non_pow2,0.013,13.0 +8192,1,compact,compact_cpu_with_scan,0.0331,33.099999999999994 +8192,1,compact,compact_work_efficient_pow2,0.22016,220.16 +8192,1,compact,compact_work_efficient_non_pow2,0.10752,107.52000000000001 +8192,2,scan,scan_cpu_pow2,0.004,4.0 +8192,2,scan,scan_cpu_non_pow2,0.0031,3.1 +8192,2,scan,scan_naive_pow2,0.350208,350.208 +8192,2,scan,scan_naive_non_pow2,0.226304,226.304 +8192,2,scan,scan_work_efficient_pow2,0.208896,208.896 +8192,2,scan,scan_work_efficient_non_pow2,0.050176,50.176 +8192,2,scan,scan_thrust_pow2,0.101376,101.37599999999999 +8192,2,scan,scan_thrust_non_pow2,0.058368,58.368 +8192,2,compact,compact_cpu_without_scan_pow2,0.015,15.0 +8192,2,compact,compact_cpu_without_scan_non_pow2,0.011,11.0 +8192,2,compact,compact_cpu_with_scan,0.0261,26.1 +8192,2,compact,compact_work_efficient_pow2,0.181248,181.248 +8192,2,compact,compact_work_efficient_non_pow2,0.147456,147.45600000000002 +8192,3,scan,scan_cpu_pow2,0.0039,3.9 +8192,3,scan,scan_cpu_non_pow2,0.0034,3.4 +8192,3,scan,scan_naive_pow2,0.93696,936.96 +8192,3,scan,scan_naive_non_pow2,0.306176,306.176 +8192,3,scan,scan_work_efficient_pow2,0.13312,133.11999999999998 +8192,3,scan,scan_work_efficient_non_pow2,0.053312,53.312 
+8192,3,scan,scan_thrust_pow2,0.124928,124.928 +8192,3,scan,scan_thrust_non_pow2,0.039936,39.936 +8192,3,compact,compact_cpu_without_scan_pow2,0.0153,15.299999999999999 +8192,3,compact,compact_cpu_without_scan_non_pow2,0.0107,10.7 +8192,3,compact,compact_cpu_with_scan,0.0266,26.599999999999998 +8192,3,compact,compact_work_efficient_pow2,0.161792,161.792 +8192,3,compact,compact_work_efficient_non_pow2,0.11264,112.64 +8192,4,scan,scan_cpu_pow2,0.0046,4.6 +8192,4,scan,scan_cpu_non_pow2,0.0038,3.8 +8192,4,scan,scan_naive_pow2,0.443392,443.392 +8192,4,scan,scan_naive_non_pow2,0.247808,247.808 +8192,4,scan,scan_work_efficient_pow2,0.178176,178.176 +8192,4,scan,scan_work_efficient_non_pow2,0.06656,66.55999999999999 +8192,4,scan,scan_thrust_pow2,0.144384,144.38400000000001 +8192,4,scan,scan_thrust_non_pow2,0.04608,46.080000000000005 +8192,4,compact,compact_cpu_without_scan_pow2,0.0157,15.7 +8192,4,compact,compact_cpu_without_scan_non_pow2,0.0111,11.1 +8192,4,compact,compact_cpu_with_scan,0.0268,26.8 +8192,4,compact,compact_work_efficient_pow2,0.159744,159.744 +8192,4,compact,compact_work_efficient_non_pow2,0.123904,123.904 +8192,5,scan,scan_cpu_pow2,0.0034,3.4 +8192,5,scan,scan_cpu_non_pow2,0.0035,3.5 +8192,5,scan,scan_naive_pow2,0.387072,387.072 +8192,5,scan,scan_naive_non_pow2,0.270336,270.336 +8192,5,scan,scan_work_efficient_pow2,0.173056,173.05599999999998 +8192,5,scan,scan_work_efficient_non_pow2,0.058368,58.368 +8192,5,scan,scan_thrust_pow2,0.118784,118.784 +8192,5,scan,scan_thrust_non_pow2,0.047104,47.104 +8192,5,compact,compact_cpu_without_scan_pow2,0.0153,15.299999999999999 +8192,5,compact,compact_cpu_without_scan_non_pow2,0.0107,10.7 +8192,5,compact,compact_cpu_with_scan,0.0265,26.5 +8192,5,compact,compact_work_efficient_pow2,0.169984,169.984 +8192,5,compact,compact_work_efficient_non_pow2,0.113664,113.664 +8192,6,scan,scan_cpu_pow2,0.0042,4.2 +8192,6,scan,scan_cpu_non_pow2,0.0041,4.1000000000000005 +8192,6,scan,scan_naive_pow2,0.486528,486.528 
+8192,6,scan,scan_naive_non_pow2,0.315392,315.392 +8192,6,scan,scan_work_efficient_pow2,0.152576,152.576 +8192,6,scan,scan_work_efficient_non_pow2,0.09216,92.16000000000001 +8192,6,scan,scan_thrust_pow2,0.114688,114.688 +8192,6,scan,scan_thrust_non_pow2,0.06656,66.55999999999999 +8192,6,compact,compact_cpu_without_scan_pow2,0.0154,15.4 +8192,6,compact,compact_cpu_without_scan_non_pow2,0.0103,10.3 +8192,6,compact,compact_cpu_with_scan,0.027,27.0 +8192,6,compact,compact_work_efficient_pow2,0.154656,154.65599999999998 +8192,6,compact,compact_work_efficient_non_pow2,0.106496,106.496 +8192,7,scan,scan_cpu_pow2,0.0044,4.4 +8192,7,scan,scan_cpu_non_pow2,0.0038,3.8 +8192,7,scan,scan_naive_pow2,0.335872,335.872 +8192,7,scan,scan_naive_non_pow2,0.21504,215.04000000000002 +8192,7,scan,scan_work_efficient_pow2,0.150528,150.528 +8192,7,scan,scan_work_efficient_non_pow2,0.08704,87.04 +8192,7,scan,scan_thrust_pow2,0.109568,109.568 +8192,7,scan,scan_thrust_non_pow2,0.049152,49.152 +8192,7,compact,compact_cpu_without_scan_pow2,0.0153,15.299999999999999 +8192,7,compact,compact_cpu_without_scan_non_pow2,0.0105,10.5 +8192,7,compact,compact_cpu_with_scan,0.0273,27.3 +8192,7,compact,compact_work_efficient_pow2,0.690176,690.176 +8192,7,compact,compact_work_efficient_non_pow2,0.074752,74.752 +8192,8,scan,scan_cpu_pow2,0.0044,4.4 +8192,8,scan,scan_cpu_non_pow2,0.0038,3.8 +8192,8,scan,scan_naive_pow2,0.345088,345.088 +8192,8,scan,scan_naive_non_pow2,0.325632,325.632 +8192,8,scan,scan_work_efficient_pow2,0.131072,131.072 +8192,8,scan,scan_work_efficient_non_pow2,0.0512,51.2 +8192,8,scan,scan_thrust_pow2,0.126976,126.976 +8192,8,scan,scan_thrust_non_pow2,0.045056,45.056 +8192,8,compact,compact_cpu_without_scan_pow2,0.0152,15.2 +8192,8,compact,compact_cpu_without_scan_non_pow2,0.0104,10.4 +8192,8,compact,compact_cpu_with_scan,0.0265,26.5 +8192,8,compact,compact_work_efficient_pow2,0.216064,216.064 +8192,8,compact,compact_work_efficient_non_pow2,0.073728,73.72800000000001 
+8192,9,scan,scan_cpu_pow2,0.0042,4.2 +8192,9,scan,scan_cpu_non_pow2,0.0039,3.9 +8192,9,scan,scan_naive_pow2,0.342016,342.01599999999996 +8192,9,scan,scan_naive_non_pow2,0.233472,233.472 +8192,9,scan,scan_work_efficient_pow2,0.132096,132.096 +8192,9,scan,scan_work_efficient_non_pow2,0.08704,87.04 +8192,9,scan,scan_thrust_pow2,0.187392,187.392 +8192,9,scan,scan_thrust_non_pow2,0.045056,45.056 +8192,9,compact,compact_cpu_without_scan_pow2,0.0155,15.5 +8192,9,compact,compact_cpu_without_scan_non_pow2,0.0105,10.5 +8192,9,compact,compact_cpu_with_scan,0.0268,26.8 +8192,9,compact,compact_work_efficient_pow2,0.18944,189.44 +8192,9,compact,compact_work_efficient_non_pow2,0.139264,139.264 +8192,10,scan,scan_cpu_pow2,0.0037,3.7 +8192,10,scan,scan_cpu_non_pow2,0.0075,7.5 +8192,10,scan,scan_naive_pow2,0.352256,352.25600000000003 +8192,10,scan,scan_naive_non_pow2,0.29184,291.84 +8192,10,scan,scan_work_efficient_pow2,0.170048,170.048 +8192,10,scan,scan_work_efficient_non_pow2,0.086016,86.01599999999999 +8192,10,scan,scan_thrust_pow2,0.105472,105.472 +8192,10,scan,scan_thrust_non_pow2,0.04608,46.080000000000005 +8192,10,compact,compact_cpu_without_scan_pow2,0.0155,15.5 +8192,10,compact,compact_cpu_without_scan_non_pow2,0.0106,10.6 +8192,10,compact,compact_cpu_with_scan,0.0265,26.5 +8192,10,compact,compact_work_efficient_pow2,0.13312,133.11999999999998 +8192,10,compact,compact_work_efficient_non_pow2,0.075776,75.776 +4096,1,scan,scan_cpu_pow2,0.0027,2.7 +4096,1,scan,scan_cpu_non_pow2,0.0015,1.5 +4096,1,scan,scan_naive_pow2,0.946176,946.176 +4096,1,scan,scan_naive_non_pow2,0.27648,276.48 +4096,1,scan,scan_work_efficient_pow2,0.177152,177.15200000000002 +4096,1,scan,scan_work_efficient_non_pow2,0.083968,83.968 +4096,1,scan,scan_thrust_pow2,0.118784,118.784 +4096,1,scan,scan_thrust_non_pow2,0.041984,41.984 +4096,1,compact,compact_cpu_without_scan_pow2,0.0076,7.6 +4096,1,compact,compact_cpu_without_scan_non_pow2,0.0045,4.5 +4096,1,compact,compact_cpu_with_scan,0.0121,12.1 
+4096,1,compact,compact_work_efficient_pow2,0.124928,124.928 +4096,1,compact,compact_work_efficient_non_pow2,0.073728,73.72800000000001 +4096,2,scan,scan_cpu_pow2,0.0022,2.2 +4096,2,scan,scan_cpu_non_pow2,0.0028,2.8 +4096,2,scan,scan_naive_pow2,0.907264,907.264 +4096,2,scan,scan_naive_non_pow2,0.221184,221.184 +4096,2,scan,scan_work_efficient_pow2,0.14336,143.35999999999999 +4096,2,scan,scan_work_efficient_non_pow2,0.072704,72.70400000000001 +4096,2,scan,scan_thrust_pow2,0.109568,109.568 +4096,2,scan,scan_thrust_non_pow2,0.045056,45.056 +4096,2,compact,compact_cpu_without_scan_pow2,0.0079,7.9 +4096,2,compact,compact_cpu_without_scan_non_pow2,0.0045,4.5 +4096,2,compact,compact_cpu_with_scan,0.014,14.0 +4096,2,compact,compact_work_efficient_pow2,0.171008,171.00799999999998 +4096,2,compact,compact_work_efficient_non_pow2,0.072704,72.70400000000001 +4096,3,scan,scan_cpu_pow2,0.0028,2.8 +4096,3,scan,scan_cpu_non_pow2,0.002,2.0 +4096,3,scan,scan_naive_pow2,0.325632,325.632 +4096,3,scan,scan_naive_non_pow2,0.277504,277.50399999999996 +4096,3,scan,scan_work_efficient_pow2,0.16384,163.84 +4096,3,scan,scan_work_efficient_non_pow2,0.083968,83.968 +4096,3,scan,scan_thrust_pow2,0.130048,130.048 +4096,3,scan,scan_thrust_non_pow2,0.041984,41.984 +4096,3,compact,compact_cpu_without_scan_pow2,0.008,8.0 +4096,3,compact,compact_cpu_without_scan_non_pow2,0.0046,4.6 +4096,3,compact,compact_cpu_with_scan,0.0143,14.3 +4096,3,compact,compact_work_efficient_pow2,0.20992,209.92 +4096,3,compact,compact_work_efficient_non_pow2,0.126976,126.976 +4096,4,scan,scan_cpu_pow2,0.0028,2.8 +4096,4,scan,scan_cpu_non_pow2,0.0019,1.9 +4096,4,scan,scan_naive_pow2,0.411648,411.648 +4096,4,scan,scan_naive_non_pow2,0.282624,282.62399999999997 +4096,4,scan,scan_work_efficient_pow2,0.16384,163.84 +4096,4,scan,scan_work_efficient_non_pow2,0.08192,81.92 +4096,4,scan,scan_thrust_pow2,0.125952,125.95200000000001 +4096,4,scan,scan_thrust_non_pow2,0.058368,58.368 
+4096,4,compact,compact_cpu_without_scan_pow2,0.0079,7.9 +4096,4,compact,compact_cpu_without_scan_non_pow2,0.0047,4.7 +4096,4,compact,compact_cpu_with_scan,0.0123,12.3 +4096,4,compact,compact_work_efficient_pow2,0.163904,163.904 +4096,4,compact,compact_work_efficient_non_pow2,0.120832,120.832 +4096,5,scan,scan_cpu_pow2,0.0018,1.8 +4096,5,scan,scan_cpu_non_pow2,0.0016,1.6 +4096,5,scan,scan_naive_pow2,0.295936,295.936 +4096,5,scan,scan_naive_non_pow2,0.27136,271.36 +4096,5,scan,scan_work_efficient_pow2,0.1792,179.2 +4096,5,scan,scan_work_efficient_non_pow2,0.06144,61.440000000000005 +4096,5,scan,scan_thrust_pow2,0.111616,111.61600000000001 +4096,5,scan,scan_thrust_non_pow2,0.045056,45.056 +4096,5,compact,compact_cpu_without_scan_pow2,0.0078,7.8 +4096,5,compact,compact_cpu_without_scan_non_pow2,0.0044,4.4 +4096,5,compact,compact_cpu_with_scan,0.0132,13.2 +4096,5,compact,compact_work_efficient_pow2,0.196608,196.608 +4096,5,compact,compact_work_efficient_non_pow2,0.079872,79.872 +4096,6,scan,scan_cpu_pow2,0.002,2.0 +4096,6,scan,scan_cpu_non_pow2,0.0016,1.6 +4096,6,scan,scan_naive_pow2,0.268288,268.288 +4096,6,scan,scan_naive_non_pow2,0.241664,241.664 +4096,6,scan,scan_work_efficient_pow2,0.145408,145.40800000000002 +4096,6,scan,scan_work_efficient_non_pow2,0.0512,51.2 +4096,6,scan,scan_thrust_pow2,0.11264,112.64 +4096,6,scan,scan_thrust_non_pow2,0.079872,79.872 +4096,6,compact,compact_cpu_without_scan_pow2,0.008,8.0 +4096,6,compact,compact_cpu_without_scan_non_pow2,0.0047,4.7 +4096,6,compact,compact_cpu_with_scan,0.0127,12.7 +4096,6,compact,compact_work_efficient_pow2,0.160768,160.768 +4096,6,compact,compact_work_efficient_non_pow2,0.142336,142.33599999999998 +4096,7,scan,scan_cpu_pow2,0.0024,2.4 +4096,7,scan,scan_cpu_non_pow2,0.002,2.0 +4096,7,scan,scan_naive_pow2,0.33792,337.92 +4096,7,scan,scan_naive_non_pow2,0.214016,214.01600000000002 +4096,7,scan,scan_work_efficient_pow2,0.234496,234.496 +4096,7,scan,scan_work_efficient_non_pow2,0.06144,61.440000000000005 
+4096,7,scan,scan_thrust_pow2,0.1024,102.4 +4096,7,scan,scan_thrust_non_pow2,0.044032,44.032000000000004 +4096,7,compact,compact_cpu_without_scan_pow2,0.0079,7.9 +4096,7,compact,compact_cpu_without_scan_non_pow2,0.0045,4.5 +4096,7,compact,compact_cpu_with_scan,0.0137,13.700000000000001 +4096,7,compact,compact_work_efficient_pow2,0.151552,151.552 +4096,7,compact,compact_work_efficient_non_pow2,0.109568,109.568 +4096,8,scan,scan_cpu_pow2,0.0026,2.6 +4096,8,scan,scan_cpu_non_pow2,0.002,2.0 +4096,8,scan,scan_naive_pow2,0.2816,281.6 +4096,8,scan,scan_naive_non_pow2,0.238592,238.59199999999998 +4096,8,scan,scan_work_efficient_pow2,0.264192,264.192 +4096,8,scan,scan_work_efficient_non_pow2,0.050176,50.176 +4096,8,scan,scan_thrust_pow2,0.104448,104.448 +4096,8,scan,scan_thrust_non_pow2,0.067584,67.584 +4096,8,compact,compact_cpu_without_scan_pow2,0.0078,7.8 +4096,8,compact,compact_cpu_without_scan_non_pow2,0.0044,4.4 +4096,8,compact,compact_cpu_with_scan,0.0125,12.5 +4096,8,compact,compact_work_efficient_pow2,0.164864,164.864 +4096,8,compact,compact_work_efficient_non_pow2,0.11264,112.64 +4096,9,scan,scan_cpu_pow2,0.0027,2.7 +4096,9,scan,scan_cpu_non_pow2,0.002,2.0 +4096,9,scan,scan_naive_pow2,0.309248,309.24800000000005 +4096,9,scan,scan_naive_non_pow2,0.24064,240.64 +4096,9,scan,scan_work_efficient_pow2,0.124928,124.928 +4096,9,scan,scan_work_efficient_non_pow2,0.095232,95.232 +4096,9,scan,scan_thrust_pow2,0.139264,139.264 +4096,9,scan,scan_thrust_non_pow2,0.0512,51.2 +4096,9,compact,compact_cpu_without_scan_pow2,0.0101,10.1 +4096,9,compact,compact_cpu_without_scan_non_pow2,0.0061,6.1000000000000005 +4096,9,compact,compact_cpu_with_scan,0.0153,15.299999999999999 +4096,9,compact,compact_work_efficient_pow2,0.212992,212.992 +4096,9,compact,compact_work_efficient_non_pow2,0.098304,98.304 +4096,10,scan,scan_cpu_pow2,0.0022,2.2 +4096,10,scan,scan_cpu_non_pow2,0.0016,1.6 +4096,10,scan,scan_naive_pow2,0.340992,340.992 +4096,10,scan,scan_naive_non_pow2,0.26112,261.12 
+4096,10,scan,scan_work_efficient_pow2,0.157696,157.696 +4096,10,scan,scan_work_efficient_non_pow2,0.086016,86.01599999999999 +4096,10,scan,scan_thrust_pow2,0.100288,100.288 +4096,10,scan,scan_thrust_non_pow2,0.057344,57.344 +4096,10,compact,compact_cpu_without_scan_pow2,0.008,8.0 +4096,10,compact,compact_cpu_without_scan_non_pow2,0.0045,4.5 +4096,10,compact,compact_cpu_with_scan,0.015,15.0 +4096,10,compact,compact_work_efficient_pow2,0.200704,200.704 +4096,10,compact,compact_work_efficient_non_pow2,0.114688,114.688 +2048,1,scan,scan_cpu_pow2,0.0014,1.4 +2048,1,scan,scan_cpu_non_pow2,0.001,1.0 +2048,1,scan,scan_naive_pow2,0.385088,385.08799999999997 +2048,1,scan,scan_naive_non_pow2,0.299008,299.008 +2048,1,scan,scan_work_efficient_pow2,0.12288,122.88000000000001 +2048,1,scan,scan_work_efficient_non_pow2,0.083968,83.968 +2048,1,scan,scan_thrust_pow2,0.11264,112.64 +2048,1,scan,scan_thrust_non_pow2,0.045056,45.056 +2048,1,compact,compact_cpu_without_scan_pow2,0.0041,4.1000000000000005 +2048,1,compact,compact_cpu_without_scan_non_pow2,0.0023,2.3 +2048,1,compact,compact_cpu_with_scan,0.0108,10.8 +2048,1,compact,compact_work_efficient_pow2,0.186368,186.368 +2048,1,compact,compact_work_efficient_non_pow2,0.099328,99.328 +2048,2,scan,scan_cpu_pow2,0.0015,1.5 +2048,2,scan,scan_cpu_non_pow2,0.001,1.0 +2048,2,scan,scan_naive_pow2,0.953344,953.3439999999999 +2048,2,scan,scan_naive_non_pow2,0.288768,288.76800000000003 +2048,2,scan,scan_work_efficient_pow2,0.196608,196.608 +2048,2,scan,scan_work_efficient_non_pow2,0.053248,53.248 +2048,2,scan,scan_thrust_pow2,0.116736,116.736 +2048,2,scan,scan_thrust_non_pow2,0.049184,49.184 +2048,2,compact,compact_cpu_without_scan_pow2,0.0042,4.2 +2048,2,compact,compact_cpu_without_scan_non_pow2,0.0022,2.2 +2048,2,compact,compact_cpu_with_scan,0.0084,8.4 +2048,2,compact,compact_work_efficient_pow2,0.177152,177.15200000000002 +2048,2,compact,compact_work_efficient_non_pow2,0.14336,143.35999999999999 +2048,3,scan,scan_cpu_pow2,0.0012,1.2 
+2048,3,scan,scan_cpu_non_pow2,0.0009,0.9 +2048,3,scan,scan_naive_pow2,0.805888,805.888 +2048,3,scan,scan_naive_non_pow2,0.269312,269.312 +2048,3,scan,scan_work_efficient_pow2,0.192512,192.512 +2048,3,scan,scan_work_efficient_non_pow2,0.052224,52.224 +2048,3,scan,scan_thrust_pow2,0.105472,105.472 +2048,3,scan,scan_thrust_non_pow2,0.041984,41.984 +2048,3,compact,compact_cpu_without_scan_pow2,0.0044,4.4 +2048,3,compact,compact_cpu_without_scan_non_pow2,0.0022,2.2 +2048,3,compact,compact_cpu_with_scan,0.0066,6.6 +2048,3,compact,compact_work_efficient_pow2,0.187392,187.392 +2048,3,compact,compact_work_efficient_non_pow2,0.142336,142.33599999999998 +2048,4,scan,scan_cpu_pow2,0.0012,1.2 +2048,4,scan,scan_cpu_non_pow2,0.0009,0.9 +2048,4,scan,scan_naive_pow2,0.467968,467.968 +2048,4,scan,scan_naive_non_pow2,0.190464,190.464 +2048,4,scan,scan_work_efficient_pow2,0.130048,130.048 +2048,4,scan,scan_work_efficient_non_pow2,0.050176,50.176 +2048,4,scan,scan_thrust_pow2,0.105472,105.472 +2048,4,scan,scan_thrust_non_pow2,0.043072,43.072 +2048,4,compact,compact_cpu_without_scan_pow2,0.0041,4.1000000000000005 +2048,4,compact,compact_cpu_without_scan_non_pow2,0.0022,2.2 +2048,4,compact,compact_cpu_with_scan,0.0063,6.3 +2048,4,compact,compact_work_efficient_pow2,0.16384,163.84 +2048,4,compact,compact_work_efficient_non_pow2,0.098304,98.304 +2048,5,scan,scan_cpu_pow2,0.0014,1.4 +2048,5,scan,scan_cpu_non_pow2,0.001,1.0 +2048,5,scan,scan_naive_pow2,0.294912,294.91200000000003 +2048,5,scan,scan_naive_non_pow2,0.289792,289.792 +2048,5,scan,scan_work_efficient_pow2,0.123904,123.904 +2048,5,scan,scan_work_efficient_non_pow2,0.084992,84.992 +2048,5,scan,scan_thrust_pow2,0.1024,102.4 +2048,5,scan,scan_thrust_non_pow2,0.054272,54.272 +2048,5,compact,compact_cpu_without_scan_pow2,0.0052,5.2 +2048,5,compact,compact_cpu_without_scan_non_pow2,0.0028,2.8 +2048,5,compact,compact_cpu_with_scan,0.0079,7.9 +2048,5,compact,compact_work_efficient_pow2,0.167936,167.936 
+2048,5,compact,compact_work_efficient_non_pow2,0.0768,76.8 +2048,6,scan,scan_cpu_pow2,0.0013,1.3 +2048,6,scan,scan_cpu_non_pow2,0.001,1.0 +2048,6,scan,scan_naive_pow2,0.387072,387.072 +2048,6,scan,scan_naive_non_pow2,0.29696,296.96 +2048,6,scan,scan_work_efficient_pow2,0.177152,177.15200000000002 +2048,6,scan,scan_work_efficient_non_pow2,0.686144,686.144 +2048,6,scan,scan_thrust_pow2,0.113664,113.664 +2048,6,scan,scan_thrust_non_pow2,0.04096,40.96 +2048,6,compact,compact_cpu_without_scan_pow2,0.0044,4.4 +2048,6,compact,compact_cpu_without_scan_non_pow2,0.0021,2.1 +2048,6,compact,compact_cpu_with_scan,0.0056,5.6 +2048,6,compact,compact_work_efficient_pow2,0.15872,158.72 +2048,6,compact,compact_work_efficient_non_pow2,0.10752,107.52000000000001 +2048,7,scan,scan_cpu_pow2,0.0013,1.3 +2048,7,scan,scan_cpu_non_pow2,0.0008,0.8 +2048,7,scan,scan_naive_pow2,0.280576,280.57599999999996 +2048,7,scan,scan_naive_non_pow2,0.19456,194.56 +2048,7,scan,scan_work_efficient_pow2,0.126048,126.04799999999999 +2048,7,scan,scan_work_efficient_non_pow2,0.052224,52.224 +2048,7,scan,scan_thrust_pow2,0.110592,110.592 +2048,7,scan,scan_thrust_non_pow2,0.050176,50.176 +2048,7,compact,compact_cpu_without_scan_pow2,0.0052,5.2 +2048,7,compact,compact_cpu_without_scan_non_pow2,0.0026,2.6 +2048,7,compact,compact_cpu_with_scan,0.0066,6.6 +2048,7,compact,compact_work_efficient_pow2,0.290816,290.81600000000003 +2048,7,compact,compact_work_efficient_non_pow2,0.080896,80.896 +2048,8,scan,scan_cpu_pow2,0.0016,1.6 +2048,8,scan,scan_cpu_non_pow2,0.001,1.0 +2048,8,scan,scan_naive_pow2,0.335872,335.872 +2048,8,scan,scan_naive_non_pow2,0.219136,219.136 +2048,8,scan,scan_work_efficient_pow2,0.19968,199.68 +2048,8,scan,scan_work_efficient_non_pow2,0.0512,51.2 +2048,8,scan,scan_thrust_pow2,0.14336,143.35999999999999 +2048,8,scan,scan_thrust_non_pow2,0.044032,44.032000000000004 +2048,8,compact,compact_cpu_without_scan_pow2,0.0038,3.8 +2048,8,compact,compact_cpu_without_scan_non_pow2,0.002,2.0 
+2048,8,compact,compact_cpu_with_scan,0.0049,4.8999999999999995 +2048,8,compact,compact_work_efficient_pow2,0.16896,168.96 +2048,8,compact,compact_work_efficient_non_pow2,0.110592,110.592 +2048,9,scan,scan_cpu_pow2,0.0014,1.4 +2048,9,scan,scan_cpu_non_pow2,0.001,1.0 +2048,9,scan,scan_naive_pow2,0.333824,333.824 +2048,9,scan,scan_naive_non_pow2,0.212992,212.992 +2048,9,scan,scan_work_efficient_pow2,0.151552,151.552 +2048,9,scan,scan_work_efficient_non_pow2,0.065536,65.536 +2048,9,scan,scan_thrust_pow2,0.1024,102.4 +2048,9,scan,scan_thrust_non_pow2,0.04496,44.96 +2048,9,compact,compact_cpu_without_scan_pow2,0.0042,4.2 +2048,9,compact,compact_cpu_without_scan_non_pow2,0.0022,2.2 +2048,9,compact,compact_cpu_with_scan,0.0054,5.4 +2048,9,compact,compact_work_efficient_pow2,0.16384,163.84 +2048,9,compact,compact_work_efficient_non_pow2,0.72704,727.04 +2048,10,scan,scan_cpu_pow2,0.0011,1.1 +2048,10,scan,scan_cpu_non_pow2,0.0009,0.9 +2048,10,scan,scan_naive_pow2,0.319488,319.488 +2048,10,scan,scan_naive_non_pow2,0.221184,221.184 +2048,10,scan,scan_work_efficient_pow2,0.185344,185.34400000000002 +2048,10,scan,scan_work_efficient_non_pow2,0.08192,81.92 +2048,10,scan,scan_thrust_pow2,0.103424,103.424 +2048,10,scan,scan_thrust_non_pow2,0.045056,45.056 +2048,10,compact,compact_cpu_without_scan_pow2,0.0043,4.3 +2048,10,compact,compact_cpu_without_scan_non_pow2,0.0023,2.3 +2048,10,compact,compact_cpu_with_scan,0.0068,6.8 +2048,10,compact,compact_work_efficient_pow2,0.192512,192.512 +2048,10,compact,compact_work_efficient_non_pow2,0.073728,73.72800000000001 +1024,1,scan,scan_cpu_pow2,0.001,1.0 +1024,1,scan,scan_cpu_non_pow2,0.0006,0.6 +1024,1,scan,scan_naive_pow2,0.357376,357.37600000000003 +1024,1,scan,scan_naive_non_pow2,0.274432,274.432 +1024,1,scan,scan_work_efficient_pow2,0.193536,193.536 +1024,1,scan,scan_work_efficient_non_pow2,0.088064,88.06400000000001 +1024,1,scan,scan_thrust_pow2,0.118784,118.784 +1024,1,scan,scan_thrust_non_pow2,0.04608,46.080000000000005 
+1024,1,compact,compact_cpu_without_scan_pow2,0.0019,1.9 +1024,1,compact,compact_cpu_without_scan_non_pow2,0.001,1.0 +1024,1,compact,compact_cpu_with_scan,0.0026,2.6 +1024,1,compact,compact_work_efficient_pow2,0.193536,193.536 +1024,1,compact,compact_work_efficient_non_pow2,0.105472,105.472 +1024,2,scan,scan_cpu_pow2,0.0012,1.2 +1024,2,scan,scan_cpu_non_pow2,0.0005,0.5 +1024,2,scan,scan_naive_pow2,0.30208,302.08000000000004 +1024,2,scan,scan_naive_non_pow2,0.241664,241.664 +1024,2,scan,scan_work_efficient_pow2,0.178176,178.176 +1024,2,scan,scan_work_efficient_non_pow2,0.08704,87.04 +1024,2,scan,scan_thrust_pow2,0.128,128.0 +1024,2,scan,scan_thrust_non_pow2,0.041984,41.984 +1024,2,compact,compact_cpu_without_scan_pow2,0.0022,2.2 +1024,2,compact,compact_cpu_without_scan_non_pow2,0.0011,1.1 +1024,2,compact,compact_cpu_with_scan,0.0031,3.1 +1024,2,compact,compact_work_efficient_pow2,0.141312,141.31199999999998 +1024,2,compact,compact_work_efficient_non_pow2,0.0768,76.8 +1024,3,scan,scan_cpu_pow2,0.0009,0.9 +1024,3,scan,scan_cpu_non_pow2,0.0006,0.6 +1024,3,scan,scan_naive_pow2,0.406528,406.528 +1024,3,scan,scan_naive_non_pow2,0.272384,272.384 +1024,3,scan,scan_work_efficient_pow2,0.152576,152.576 +1024,3,scan,scan_work_efficient_non_pow2,0.084992,84.992 +1024,3,scan,scan_thrust_pow2,0.123904,123.904 +1024,3,scan,scan_thrust_non_pow2,0.055296,55.296 +1024,3,compact,compact_cpu_without_scan_pow2,0.0028,2.8 +1024,3,compact,compact_cpu_without_scan_non_pow2,0.0013,1.3 +1024,3,compact,compact_cpu_with_scan,0.0035,3.5 +1024,3,compact,compact_work_efficient_pow2,0.182272,182.272 +1024,3,compact,compact_work_efficient_non_pow2,0.109568,109.568 +1024,4,scan,scan_cpu_pow2,0.0009,0.9 +1024,4,scan,scan_cpu_non_pow2,0.0005,0.5 +1024,4,scan,scan_naive_pow2,0.326656,326.656 +1024,4,scan,scan_naive_non_pow2,0.201728,201.72799999999998 +1024,4,scan,scan_work_efficient_pow2,0.154624,154.62400000000002 +1024,4,scan,scan_work_efficient_non_pow2,0.08192,81.92 
+1024,4,scan,scan_thrust_pow2,0.10656,106.56 +1024,4,scan,scan_thrust_non_pow2,0.04816,48.160000000000004 +1024,4,compact,compact_cpu_without_scan_pow2,0.0028,2.8 +1024,4,compact,compact_cpu_without_scan_non_pow2,0.0014,1.4 +1024,4,compact,compact_cpu_with_scan,0.0034,3.4 +1024,4,compact,compact_work_efficient_pow2,0.220192,220.192 +1024,4,compact,compact_work_efficient_non_pow2,0.074752,74.752 +1024,5,scan,scan_cpu_pow2,0.0008,0.8 +1024,5,scan,scan_cpu_non_pow2,0.0004,0.4 +1024,5,scan,scan_naive_pow2,0.447488,447.488 +1024,5,scan,scan_naive_non_pow2,0.259072,259.072 +1024,5,scan,scan_work_efficient_pow2,0.160768,160.768 +1024,5,scan,scan_work_efficient_non_pow2,0.06144,61.440000000000005 +1024,5,scan,scan_thrust_pow2,0.131008,131.008 +1024,5,scan,scan_thrust_non_pow2,0.04608,46.080000000000005 +1024,5,compact,compact_cpu_without_scan_pow2,0.0021,2.1 +1024,5,compact,compact_cpu_without_scan_non_pow2,0.001,1.0 +1024,5,compact,compact_cpu_with_scan,0.0026,2.6 +1024,5,compact,compact_work_efficient_pow2,0.216064,216.064 +1024,5,compact,compact_work_efficient_non_pow2,0.074752,74.752 +1024,6,scan,scan_cpu_pow2,0.0011,1.1 +1024,6,scan,scan_cpu_non_pow2,0.0004,0.4 +1024,6,scan,scan_naive_pow2,0.35328,353.28 +1024,6,scan,scan_naive_non_pow2,0.201728,201.72799999999998 +1024,6,scan,scan_work_efficient_pow2,0.177152,177.15200000000002 +1024,6,scan,scan_work_efficient_non_pow2,0.08192,81.92 +1024,6,scan,scan_thrust_pow2,0.125952,125.95200000000001 +1024,6,scan,scan_thrust_non_pow2,0.041984,41.984 +1024,6,compact,compact_cpu_without_scan_pow2,0.0023,2.3 +1024,6,compact,compact_cpu_without_scan_non_pow2,0.0011,1.1 +1024,6,compact,compact_cpu_with_scan,0.0029,2.9 +1024,6,compact,compact_work_efficient_pow2,0.17408,174.08 +1024,6,compact,compact_work_efficient_non_pow2,0.10752,107.52000000000001 +1024,7,scan,scan_cpu_pow2,0.0015,1.5 +1024,7,scan,scan_cpu_non_pow2,0.0005,0.5 +1024,7,scan,scan_naive_pow2,0.349184,349.18399999999997 +1024,7,scan,scan_naive_non_pow2,0.221184,221.184 
+1024,7,scan,scan_work_efficient_pow2,0.259072,259.072 +1024,7,scan,scan_work_efficient_non_pow2,0.124928,124.928 +1024,7,scan,scan_thrust_pow2,0.121792,121.792 +1024,7,scan,scan_thrust_non_pow2,0.043008,43.007999999999996 +1024,7,compact,compact_cpu_without_scan_pow2,0.0021,2.1 +1024,7,compact,compact_cpu_without_scan_non_pow2,0.0009,0.9 +1024,7,compact,compact_cpu_with_scan,0.0027,2.7 +1024,7,compact,compact_work_efficient_pow2,0.165888,165.888 +1024,7,compact,compact_work_efficient_non_pow2,0.198656,198.656 +1024,8,scan,scan_cpu_pow2,0.0011,1.1 +1024,8,scan,scan_cpu_non_pow2,0.0006,0.6 +1024,8,scan,scan_naive_pow2,0.305152,305.152 +1024,8,scan,scan_naive_non_pow2,0.146432,146.43200000000002 +1024,8,scan,scan_work_efficient_pow2,0.1536,153.6 +1024,8,scan,scan_work_efficient_non_pow2,0.088064,88.06400000000001 +1024,8,scan,scan_thrust_pow2,0.123904,123.904 +1024,8,scan,scan_thrust_non_pow2,0.029696,29.696 +1024,8,compact,compact_cpu_without_scan_pow2,0.0021,2.1 +1024,8,compact,compact_cpu_without_scan_non_pow2,0.0011,1.1 +1024,8,compact,compact_cpu_with_scan,0.0028,2.8 +1024,8,compact,compact_work_efficient_pow2,0.166912,166.912 +1024,8,compact,compact_work_efficient_non_pow2,0.11264,112.64 +1024,9,scan,scan_cpu_pow2,0.0008,0.8 +1024,9,scan,scan_cpu_non_pow2,0.0006,0.6 +1024,9,scan,scan_naive_pow2,0.306176,306.176 +1024,9,scan,scan_naive_non_pow2,0.181248,181.248 +1024,9,scan,scan_work_efficient_pow2,0.156672,156.672 +1024,9,scan,scan_work_efficient_non_pow2,0.086016,86.01599999999999 +1024,9,scan,scan_thrust_pow2,0.149504,149.504 +1024,9,scan,scan_thrust_non_pow2,0.045056,45.056 +1024,9,compact,compact_cpu_without_scan_pow2,0.0019,1.9 +1024,9,compact,compact_cpu_without_scan_non_pow2,0.001,1.0 +1024,9,compact,compact_cpu_with_scan,0.0028,2.8 +1024,9,compact,compact_work_efficient_pow2,0.175104,175.104 +1024,9,compact,compact_work_efficient_non_pow2,0.125952,125.95200000000001 +1024,10,scan,scan_cpu_pow2,0.0008,0.8 +1024,10,scan,scan_cpu_non_pow2,0.0005,0.5 
+1024,10,scan,scan_naive_pow2,0.754688,754.688 +1024,10,scan,scan_naive_non_pow2,0.1792,179.2 +1024,10,scan,scan_work_efficient_pow2,0.156704,156.704 +1024,10,scan,scan_work_efficient_non_pow2,0.049184,49.184 +1024,10,scan,scan_thrust_pow2,0.116736,116.736 +1024,10,scan,scan_thrust_non_pow2,0.04608,46.080000000000005 +1024,10,compact,compact_cpu_without_scan_pow2,0.0022,2.2 +1024,10,compact,compact_cpu_without_scan_non_pow2,0.0011,1.1 +1024,10,compact,compact_cpu_with_scan,0.0028,2.8 +1024,10,compact,compact_work_efficient_pow2,0.195584,195.584 +1024,10,compact,compact_work_efficient_non_pow2,0.109568,109.568 +512,1,scan,scan_cpu_pow2,0.0006,0.6 +512,1,scan,scan_cpu_non_pow2,0.0003,0.3 +512,1,scan,scan_naive_pow2,0.293888,293.888 +512,1,scan,scan_naive_non_pow2,0.180224,180.224 +512,1,scan,scan_work_efficient_pow2,0.108544,108.544 +512,1,scan,scan_work_efficient_non_pow2,0.016384,16.384 +512,1,scan,scan_thrust_pow2,0.120832,120.832 +512,1,scan,scan_thrust_non_pow2,0.077824,77.824 +512,1,compact,compact_cpu_without_scan_pow2,0.0012,1.2 +512,1,compact,compact_cpu_without_scan_non_pow2,0.0006,0.6 +512,1,compact,compact_cpu_with_scan,0.0013,1.3 +512,1,compact,compact_work_efficient_pow2,0.134144,134.144 +512,1,compact,compact_work_efficient_non_pow2,0.0512,51.2 +512,2,scan,scan_cpu_pow2,0.0007,0.7 +512,2,scan,scan_cpu_non_pow2,0.0003,0.3 +512,2,scan,scan_naive_pow2,0.297984,297.98400000000004 +512,2,scan,scan_naive_non_pow2,0.192512,192.512 +512,2,scan,scan_work_efficient_pow2,0.063488,63.488 +512,2,scan,scan_work_efficient_non_pow2,0.027648,27.648 +512,2,scan,scan_thrust_pow2,0.101376,101.37599999999999 +512,2,scan,scan_thrust_non_pow2,0.045056,45.056 +512,2,compact,compact_cpu_without_scan_pow2,0.0011,1.1 +512,2,compact,compact_cpu_without_scan_non_pow2,0.0005,0.5 +512,2,compact,compact_cpu_with_scan,0.0013,1.3 +512,2,compact,compact_work_efficient_pow2,0.113664,113.664 +512,2,compact,compact_work_efficient_non_pow2,0.082944,82.944 
+512,3,scan,scan_cpu_pow2,0.0007,0.7 +512,3,scan,scan_cpu_non_pow2,0.0004,0.4 +512,3,scan,scan_naive_pow2,0.315392,315.392 +512,3,scan,scan_naive_non_pow2,0.188416,188.416 +512,3,scan,scan_work_efficient_pow2,0.080896,80.896 +512,3,scan,scan_work_efficient_non_pow2,0.016384,16.384 +512,3,scan,scan_thrust_pow2,0.11776,117.76 +512,3,scan,scan_thrust_non_pow2,0.041984,41.984 +512,3,compact,compact_cpu_without_scan_pow2,0.0013,1.3 +512,3,compact,compact_cpu_without_scan_non_pow2,0.0006,0.6 +512,3,compact,compact_cpu_with_scan,0.0016,1.6 +512,3,compact,compact_work_efficient_pow2,0.113664,113.664 +512,3,compact,compact_work_efficient_non_pow2,0.093184,93.184 +512,4,scan,scan_cpu_pow2,0.0006,0.6 +512,4,scan,scan_cpu_non_pow2,0.0003,0.3 +512,4,scan,scan_naive_pow2,0.3072,307.2 +512,4,scan,scan_naive_non_pow2,0.233472,233.472 +512,4,scan,scan_work_efficient_pow2,0.067584,67.584 +512,4,scan,scan_work_efficient_non_pow2,0.024576,24.576 +512,4,scan,scan_thrust_pow2,0.108544,108.544 +512,4,scan,scan_thrust_non_pow2,0.041984,41.984 +512,4,compact,compact_cpu_without_scan_pow2,0.0013,1.3 +512,4,compact,compact_cpu_without_scan_non_pow2,0.0006,0.6 +512,4,compact,compact_cpu_with_scan,0.0017,1.7 +512,4,compact,compact_work_efficient_pow2,0.156672,156.672 +512,4,compact,compact_work_efficient_non_pow2,0.08704,87.04 +512,5,scan,scan_cpu_pow2,0.001,1.0 +512,5,scan,scan_cpu_non_pow2,0.0003,0.3 +512,5,scan,scan_naive_pow2,0.325632,325.632 +512,5,scan,scan_naive_non_pow2,0.23552,235.52 +512,5,scan,scan_work_efficient_pow2,0.099328,99.328 +512,5,scan,scan_work_efficient_non_pow2,0.017408,17.408 +512,5,scan,scan_thrust_pow2,0.124928,124.928 +512,5,scan,scan_thrust_non_pow2,0.047168,47.168 +512,5,compact,compact_cpu_without_scan_pow2,0.0012,1.2 +512,5,compact,compact_cpu_without_scan_non_pow2,0.0006,0.6 +512,5,compact,compact_cpu_with_scan,0.0015,1.5 +512,5,compact,compact_work_efficient_pow2,0.10752,107.52000000000001 +512,5,compact,compact_work_efficient_non_pow2,0.089088,89.088 
+512,6,scan,scan_cpu_pow2,0.0007,0.7 +512,6,scan,scan_cpu_non_pow2,0.0003,0.3 +512,6,scan,scan_naive_pow2,0.27648,276.48 +512,6,scan,scan_naive_non_pow2,0.253952,253.952 +512,6,scan,scan_work_efficient_pow2,0.099328,99.328 +512,6,scan,scan_work_efficient_non_pow2,0.033792,33.792 +512,6,scan,scan_thrust_pow2,0.152576,152.576 +512,6,scan,scan_thrust_non_pow2,0.043104,43.104000000000006 +512,6,compact,compact_cpu_without_scan_pow2,0.0015,1.5 +512,6,compact,compact_cpu_without_scan_non_pow2,0.0007,0.7 +512,6,compact,compact_cpu_with_scan,0.0019,1.9 +512,6,compact,compact_work_efficient_pow2,0.146432,146.43200000000002 +512,6,compact,compact_work_efficient_non_pow2,0.0512,51.2 +512,7,scan,scan_cpu_pow2,0.0007,0.7 +512,7,scan,scan_cpu_non_pow2,0.0003,0.3 +512,7,scan,scan_naive_pow2,1.02605,1026.05 +512,7,scan,scan_naive_non_pow2,0.19968,199.68 +512,7,scan,scan_work_efficient_pow2,0.090112,90.112 +512,7,scan,scan_work_efficient_non_pow2,0.017408,17.408 +512,7,scan,scan_thrust_pow2,0.126976,126.976 +512,7,scan,scan_thrust_non_pow2,0.045056,45.056 +512,7,compact,compact_cpu_without_scan_pow2,0.0013,1.3 +512,7,compact,compact_cpu_without_scan_non_pow2,0.0005,0.5 +512,7,compact,compact_cpu_with_scan,0.0015,1.5 +512,7,compact,compact_work_efficient_pow2,0.140288,140.28799999999998 +512,7,compact,compact_work_efficient_non_pow2,0.083968,83.968 +512,8,scan,scan_cpu_pow2,0.0007,0.7 +512,8,scan,scan_cpu_non_pow2,0.0003,0.3 +512,8,scan,scan_naive_pow2,0.32256,322.56 +512,8,scan,scan_naive_non_pow2,0.200704,200.704 +512,8,scan,scan_work_efficient_pow2,0.059392,59.392 +512,8,scan,scan_work_efficient_non_pow2,0.016384,16.384 +512,8,scan,scan_thrust_pow2,0.142336,142.33599999999998 +512,8,scan,scan_thrust_non_pow2,0.04608,46.080000000000005 +512,8,compact,compact_cpu_without_scan_pow2,0.0015,1.5 +512,8,compact,compact_cpu_without_scan_non_pow2,0.0006,0.6 +512,8,compact,compact_cpu_with_scan,0.0019,1.9 +512,8,compact,compact_work_efficient_pow2,0.144384,144.38400000000001 
+512,8,compact,compact_work_efficient_non_pow2,0.050176,50.176 +512,9,scan,scan_cpu_pow2,0.0005,0.5 +512,9,scan,scan_cpu_non_pow2,0.0003,0.3 +512,9,scan,scan_naive_pow2,0.26112,261.12 +512,9,scan,scan_naive_non_pow2,0.226304,226.304 +512,9,scan,scan_work_efficient_pow2,0.075776,75.776 +512,9,scan,scan_work_efficient_non_pow2,0.016384,16.384 +512,9,scan,scan_thrust_pow2,0.119808,119.80799999999999 +512,9,scan,scan_thrust_non_pow2,0.045056,45.056 +512,9,compact,compact_cpu_without_scan_pow2,0.0013,1.3 +512,9,compact,compact_cpu_without_scan_non_pow2,0.0005,0.5 +512,9,compact,compact_cpu_with_scan,0.0016,1.6 +512,9,compact,compact_work_efficient_pow2,0.137216,137.216 +512,9,compact,compact_work_efficient_non_pow2,0.079872,79.872 +512,10,scan,scan_cpu_pow2,0.0008,0.8 +512,10,scan,scan_cpu_non_pow2,0.0004,0.4 +512,10,scan,scan_naive_pow2,0.351232,351.23199999999997 +512,10,scan,scan_naive_non_pow2,0.232448,232.44799999999998 +512,10,scan,scan_work_efficient_pow2,0.069632,69.632 +512,10,scan,scan_work_efficient_non_pow2,0.023552,23.552 +512,10,scan,scan_thrust_pow2,0.114688,114.688 +512,10,scan,scan_thrust_non_pow2,0.047104,47.104 +512,10,compact,compact_cpu_without_scan_pow2,0.0013,1.3 +512,10,compact,compact_cpu_without_scan_non_pow2,0.0006,0.6 +512,10,compact,compact_cpu_with_scan,0.0016,1.6 +512,10,compact,compact_work_efficient_pow2,0.115712,115.71199999999999 +512,10,compact,compact_work_efficient_non_pow2,0.059392,59.392 +256,1,scan,scan_cpu_pow2,0.0005,0.5 +256,1,scan,scan_cpu_non_pow2,0.0002,0.2 +256,1,scan,scan_naive_pow2,0.24576,245.76000000000002 +256,1,scan,scan_naive_non_pow2,0.191488,191.488 +256,1,scan,scan_work_efficient_pow2,0.09216,92.16000000000001 +256,1,scan,scan_work_efficient_non_pow2,0.016384,16.384 +256,1,scan,scan_thrust_pow2,0.105472,105.472 +256,1,scan,scan_thrust_non_pow2,0.067584,67.584 +256,1,compact,compact_cpu_without_scan_pow2,0.0007,0.7 +256,1,compact,compact_cpu_without_scan_non_pow2,0.0004,0.4 
+256,1,compact,compact_cpu_with_scan,0.0009,0.9 +256,1,compact,compact_work_efficient_pow2,0.144384,144.38400000000001 +256,1,compact,compact_work_efficient_non_pow2,0.08192,81.92 +256,2,scan,scan_cpu_pow2,0.0004,0.4 +256,2,scan,scan_cpu_non_pow2,0.0002,0.2 +256,2,scan,scan_naive_pow2,0.25088,250.88 +256,2,scan,scan_naive_non_pow2,0.12288,122.88000000000001 +256,2,scan,scan_work_efficient_pow2,0.0768,76.8 +256,2,scan,scan_work_efficient_non_pow2,0.017408,17.408 +256,2,scan,scan_thrust_pow2,0.106496,106.496 +256,2,scan,scan_thrust_non_pow2,0.074752,74.752 +256,2,compact,compact_cpu_without_scan_pow2,0.0006,0.6 +256,2,compact,compact_cpu_without_scan_non_pow2,0.0004,0.4 +256,2,compact,compact_cpu_with_scan,0.0008,0.8 +256,2,compact,compact_work_efficient_pow2,0.14336,143.35999999999999 +256,2,compact,compact_work_efficient_non_pow2,0.050176,50.176 +256,3,scan,scan_cpu_pow2,0.0008,0.8 +256,3,scan,scan_cpu_non_pow2,0.0002,0.2 +256,3,scan,scan_naive_pow2,0.316416,316.416 +256,3,scan,scan_naive_non_pow2,0.229376,229.376 +256,3,scan,scan_work_efficient_pow2,0.078848,78.848 +256,3,scan,scan_work_efficient_non_pow2,0.0256,25.6 +256,3,scan,scan_thrust_pow2,0.125952,125.95200000000001 +256,3,scan,scan_thrust_non_pow2,0.034816,34.816 +256,3,compact,compact_cpu_without_scan_pow2,0.0008,0.8 +256,3,compact,compact_cpu_without_scan_non_pow2,0.0004,0.4 +256,3,compact,compact_cpu_with_scan,0.0008,0.8 +256,3,compact,compact_work_efficient_pow2,0.142336,142.33599999999998 +256,3,compact,compact_work_efficient_non_pow2,0.083968,83.968 +256,4,scan,scan_cpu_pow2,0.0004,0.4 +256,4,scan,scan_cpu_non_pow2,0.0001,0.1 +256,4,scan,scan_naive_pow2,0.239616,239.61599999999999 +256,4,scan,scan_naive_non_pow2,0.181248,181.248 +256,4,scan,scan_work_efficient_pow2,0.079872,79.872 +256,4,scan,scan_work_efficient_non_pow2,0.016384,16.384 +256,4,scan,scan_thrust_pow2,0.136192,136.192 +256,4,scan,scan_thrust_non_pow2,0.045056,45.056 +256,4,compact,compact_cpu_without_scan_pow2,0.0007,0.7 
+256,4,compact,compact_cpu_without_scan_non_pow2,0.0004,0.4 +256,4,compact,compact_cpu_with_scan,0.0009,0.9 +256,4,compact,compact_work_efficient_pow2,0.139264,139.264 +256,4,compact,compact_work_efficient_non_pow2,0.082944,82.944 +256,5,scan,scan_cpu_pow2,0.0004,0.4 +256,5,scan,scan_cpu_non_pow2,0.0002,0.2 +256,5,scan,scan_naive_pow2,0.335872,335.872 +256,5,scan,scan_naive_non_pow2,0.183296,183.296 +256,5,scan,scan_work_efficient_pow2,0.091136,91.136 +256,5,scan,scan_work_efficient_non_pow2,0.016384,16.384 +256,5,scan,scan_thrust_pow2,0.125952,125.95200000000001 +256,5,scan,scan_thrust_non_pow2,0.043008,43.007999999999996 +256,5,compact,compact_cpu_without_scan_pow2,0.0007,0.7 +256,5,compact,compact_cpu_without_scan_non_pow2,0.0004,0.4 +256,5,compact,compact_cpu_with_scan,0.0007,0.7 +256,5,compact,compact_work_efficient_pow2,0.195584,195.584 +256,5,compact,compact_work_efficient_non_pow2,0.065536,65.536 +256,6,scan,scan_cpu_pow2,0.0005,0.5 +256,6,scan,scan_cpu_non_pow2,0.0003,0.3 +256,6,scan,scan_naive_pow2,0.33792,337.92 +256,6,scan,scan_naive_non_pow2,0.150528,150.528 +256,6,scan,scan_work_efficient_pow2,0.083968,83.968 +256,6,scan,scan_work_efficient_non_pow2,0.023552,23.552 +256,6,scan,scan_thrust_pow2,0.10752,107.52000000000001 +256,6,scan,scan_thrust_non_pow2,0.045056,45.056 +256,6,compact,compact_cpu_without_scan_pow2,0.0007,0.7 +256,6,compact,compact_cpu_without_scan_non_pow2,0.0004,0.4 +256,6,compact,compact_cpu_with_scan,0.0008,0.8 +256,6,compact,compact_work_efficient_pow2,0.107616,107.616 +256,6,compact,compact_work_efficient_non_pow2,0.050176,50.176 +256,7,scan,scan_cpu_pow2,0.0004,0.4 +256,7,scan,scan_cpu_non_pow2,0.0001,0.1 +256,7,scan,scan_naive_pow2,0.257024,257.024 +256,7,scan,scan_naive_non_pow2,0.1536,153.6 +256,7,scan,scan_work_efficient_pow2,0.062464,62.464 +256,7,scan,scan_work_efficient_non_pow2,0.023552,23.552 +256,7,scan,scan_thrust_pow2,0.126976,126.976 +256,7,scan,scan_thrust_non_pow2,0.043008,43.007999999999996 
+256,7,compact,compact_cpu_without_scan_pow2,0.0009,0.9 +256,7,compact,compact_cpu_without_scan_non_pow2,0.0004,0.4 +256,7,compact,compact_cpu_with_scan,0.0011,1.1 +256,7,compact,compact_work_efficient_pow2,0.166912,166.912 +256,7,compact,compact_work_efficient_non_pow2,0.058368,58.368 +256,8,scan,scan_cpu_pow2,0.0006,0.6 +256,8,scan,scan_cpu_non_pow2,0.0002,0.2 +256,8,scan,scan_naive_pow2,0.3072,307.2 +256,8,scan,scan_naive_non_pow2,0.151552,151.552 +256,8,scan,scan_work_efficient_pow2,0.095232,95.232 +256,8,scan,scan_work_efficient_non_pow2,0.023552,23.552 +256,8,scan,scan_thrust_pow2,0.128,128.0 +256,8,scan,scan_thrust_non_pow2,0.026624,26.624 +256,8,compact,compact_cpu_without_scan_pow2,0.0006,0.6 +256,8,compact,compact_cpu_without_scan_non_pow2,0.0004,0.4 +256,8,compact,compact_cpu_with_scan,0.0008,0.8 +256,8,compact,compact_work_efficient_pow2,0.140288,140.28799999999998 +256,8,compact,compact_work_efficient_non_pow2,0.082944,82.944 +256,9,scan,scan_cpu_pow2,0.0005,0.5 +256,9,scan,scan_cpu_non_pow2,0.0003,0.3 +256,9,scan,scan_naive_pow2,0.333824,333.824 +256,9,scan,scan_naive_non_pow2,0.175104,175.104 +256,9,scan,scan_work_efficient_pow2,0.093184,93.184 +256,9,scan,scan_work_efficient_non_pow2,0.016384,16.384 +256,9,scan,scan_thrust_pow2,0.121856,121.85600000000001 +256,9,scan,scan_thrust_non_pow2,0.045056,45.056 +256,9,compact,compact_cpu_without_scan_pow2,0.0007,0.7 +256,9,compact,compact_cpu_without_scan_non_pow2,0.0004,0.4 +256,9,compact,compact_cpu_with_scan,0.0007,0.7 +256,9,compact,compact_work_efficient_pow2,0.166912,166.912 +256,9,compact,compact_work_efficient_non_pow2,0.049152,49.152 +256,10,scan,scan_cpu_pow2,0.0004,0.4 +256,10,scan,scan_cpu_non_pow2,0.0002,0.2 +256,10,scan,scan_naive_pow2,0.241664,241.664 +256,10,scan,scan_naive_non_pow2,0.186432,186.432 +256,10,scan,scan_work_efficient_pow2,0.063488,63.488 +256,10,scan,scan_work_efficient_non_pow2,0.016384,16.384 +256,10,scan,scan_thrust_pow2,0.129024,129.024 
+256,10,scan,scan_thrust_non_pow2,0.077824,77.824 +256,10,compact,compact_cpu_without_scan_pow2,0.0007,0.7 +256,10,compact,compact_cpu_without_scan_non_pow2,0.0004,0.4 +256,10,compact,compact_cpu_with_scan,0.0008,0.8 +256,10,compact,compact_work_efficient_pow2,0.147456,147.45600000000002 +256,10,compact,compact_work_efficient_non_pow2,0.049152,49.152 +2097152,1,scan,scan_cpu_pow2,1.0637,1063.7 +2097152,1,scan,scan_cpu_non_pow2,1.0598,1059.8000000000002 +2097152,1,scan,scan_naive_pow2,1.79869,1798.6899999999998 +2097152,1,scan,scan_naive_non_pow2,1.64061,1640.61 +2097152,1,scan,scan_work_efficient_pow2,0.774624,774.624 +2097152,1,scan,scan_work_efficient_non_pow2,0.549888,549.888 +2097152,1,scan,scan_thrust_pow2,0.57856,578.56 +2097152,1,scan,scan_thrust_non_pow2,0.699392,699.392 +2097152,1,compact,compact_cpu_without_scan_pow2,3.3956,3395.6 +2097152,1,compact,compact_cpu_without_scan_non_pow2,3.5848,3584.8 +2097152,1,compact,compact_cpu_with_scan,6.3465,6346.5 +2097152,1,compact,compact_work_efficient_pow2,1.32294,1322.94 +2097152,1,compact,compact_work_efficient_non_pow2,1.38957,1389.57 +2097152,2,scan,scan_cpu_pow2,1.2786,1278.6 +2097152,2,scan,scan_cpu_non_pow2,1.1046,1104.6000000000001 +2097152,2,scan,scan_naive_pow2,1.71523,1715.23 +2097152,2,scan,scan_naive_non_pow2,2.27069,2270.69 +2097152,2,scan,scan_work_efficient_pow2,0.728864,728.8639999999999 +2097152,2,scan,scan_work_efficient_non_pow2,0.556032,556.0319999999999 +2097152,2,scan,scan_thrust_pow2,0.55296,552.96 +2097152,2,scan,scan_thrust_non_pow2,0.551936,551.936 +2097152,2,compact,compact_cpu_without_scan_pow2,3.534,3534.0 +2097152,2,compact,compact_cpu_without_scan_non_pow2,3.5509,3550.9 +2097152,2,compact,compact_cpu_with_scan,6.0547,6054.700000000001 +2097152,2,compact,compact_work_efficient_pow2,1.76403,1764.03 +2097152,2,compact,compact_work_efficient_non_pow2,1.08339,1083.39 +2097152,3,scan,scan_cpu_pow2,1.0387,1038.7 +2097152,3,scan,scan_cpu_non_pow2,1.0955,1095.5 
+2097152,3,scan,scan_naive_pow2,1.73178,1731.7800000000002 +2097152,3,scan,scan_naive_non_pow2,2.14938,2149.3799999999997 +2097152,3,scan,scan_work_efficient_pow2,0.654048,654.048 +2097152,3,scan,scan_work_efficient_non_pow2,0.579584,579.584 +2097152,3,scan,scan_thrust_pow2,0.534528,534.528 +2097152,3,scan,scan_thrust_non_pow2,0.526336,526.336 +2097152,3,compact,compact_cpu_without_scan_pow2,3.5031,3503.1 +2097152,3,compact,compact_cpu_without_scan_non_pow2,3.5363,3536.3 +2097152,3,compact,compact_cpu_with_scan,6.1338,6133.8 +2097152,3,compact,compact_work_efficient_pow2,1.47238,1472.38 +2097152,3,compact,compact_work_efficient_non_pow2,1.04346,1043.46 +2097152,4,scan,scan_cpu_pow2,1.3694,1369.3999999999999 +2097152,4,scan,scan_cpu_non_pow2,1.0494,1049.4 +2097152,4,scan,scan_naive_pow2,1.81363,1813.63 +2097152,4,scan,scan_naive_non_pow2,1.59533,1595.33 +2097152,4,scan,scan_work_efficient_pow2,1.10499,1104.99 +2097152,4,scan,scan_work_efficient_non_pow2,0.509952,509.95199999999994 +2097152,4,scan,scan_thrust_pow2,0.583648,583.6479999999999 +2097152,4,scan,scan_thrust_non_pow2,0.676864,676.864 +2097152,4,compact,compact_cpu_without_scan_pow2,3.4984,3498.4 +2097152,4,compact,compact_cpu_without_scan_non_pow2,3.3388,3338.8 +2097152,4,compact,compact_cpu_with_scan,6.397,6397.0 +2097152,4,compact,compact_work_efficient_pow2,1.29149,1291.49 +2097152,4,compact,compact_work_efficient_non_pow2,1.06496,1064.9599999999998 +2097152,5,scan,scan_cpu_pow2,1.028,1028.0 +2097152,5,scan,scan_cpu_non_pow2,0.9806,980.6 +2097152,5,scan,scan_naive_pow2,1.79773,1797.73 +2097152,5,scan,scan_naive_non_pow2,1.59802,1598.02 +2097152,5,scan,scan_work_efficient_pow2,0.725536,725.536 +2097152,5,scan,scan_work_efficient_non_pow2,0.507904,507.904 +2097152,5,scan,scan_thrust_pow2,0.566272,566.272 +2097152,5,scan,scan_thrust_non_pow2,0.748544,748.544 +2097152,5,compact,compact_cpu_without_scan_pow2,3.5767,3576.7000000000003 +2097152,5,compact,compact_cpu_without_scan_non_pow2,3.5661,3566.1 
+2097152,5,compact,compact_cpu_with_scan,6.0808,6080.8 +2097152,5,compact,compact_work_efficient_pow2,1.38314,1383.14 +2097152,5,compact,compact_work_efficient_non_pow2,0.920576,920.5759999999999 +2097152,6,scan,scan_cpu_pow2,1.1422,1142.2 +2097152,6,scan,scan_cpu_non_pow2,1.0384,1038.4 +2097152,6,scan,scan_naive_pow2,2.77859,2778.5899999999997 +2097152,6,scan,scan_naive_non_pow2,1.58848,1588.4799999999998 +2097152,6,scan,scan_work_efficient_pow2,0.738944,738.9440000000001 +2097152,6,scan,scan_work_efficient_non_pow2,0.504832,504.83199999999994 +2097152,6,scan,scan_thrust_pow2,0.528384,528.384 +2097152,6,scan,scan_thrust_non_pow2,0.698368,698.3679999999999 +2097152,6,compact,compact_cpu_without_scan_pow2,3.5595,3559.5 +2097152,6,compact,compact_cpu_without_scan_non_pow2,3.4713,3471.2999999999997 +2097152,6,compact,compact_cpu_with_scan,6.2921,6292.099999999999 +2097152,6,compact,compact_work_efficient_pow2,1.29184,1291.8400000000001 +2097152,6,compact,compact_work_efficient_non_pow2,0.965632,965.6320000000001 +2097152,7,scan,scan_cpu_pow2,1.0823,1082.3 +2097152,7,scan,scan_cpu_non_pow2,1.0753,1075.3 +2097152,7,scan,scan_naive_pow2,2.00064,2000.64 +2097152,7,scan,scan_naive_non_pow2,1.59347,1593.47 +2097152,7,scan,scan_work_efficient_pow2,0.93792,937.92 +2097152,7,scan,scan_work_efficient_non_pow2,0.497664,497.664 +2097152,7,scan,scan_thrust_pow2,0.538624,538.624 +2097152,7,scan,scan_thrust_non_pow2,1.16835,1168.35 +2097152,7,compact,compact_cpu_without_scan_pow2,3.5402,3540.2 +2097152,7,compact,compact_cpu_without_scan_non_pow2,3.5109,3510.9 +2097152,7,compact,compact_cpu_with_scan,6.1128,6112.8 +2097152,7,compact,compact_work_efficient_pow2,1.22554,1225.54 +2097152,7,compact,compact_work_efficient_non_pow2,0.991232,991.232 +2097152,8,scan,scan_cpu_pow2,1.1428,1142.8 +2097152,8,scan,scan_cpu_non_pow2,1.0534,1053.3999999999999 +2097152,8,scan,scan_naive_pow2,1.80038,1800.38 +2097152,8,scan,scan_naive_non_pow2,1.57702,1577.02 
+2097152,8,scan,scan_work_efficient_pow2,1.27027,1270.27 +2097152,8,scan,scan_work_efficient_non_pow2,0.52224,522.24 +2097152,8,scan,scan_thrust_pow2,0.57856,578.56 +2097152,8,scan,scan_thrust_non_pow2,0.622592,622.592 +2097152,8,compact,compact_cpu_without_scan_pow2,3.5185,3518.5 +2097152,8,compact,compact_cpu_without_scan_non_pow2,3.3396,3339.6 +2097152,8,compact,compact_cpu_with_scan,6.7386,6738.6 +2097152,8,compact,compact_work_efficient_pow2,1.35978,1359.78 +2097152,8,compact,compact_work_efficient_non_pow2,1.12333,1123.33 +2097152,9,scan,scan_cpu_pow2,0.9588,958.8 +2097152,9,scan,scan_cpu_non_pow2,0.9766,976.6 +2097152,9,scan,scan_naive_pow2,1.78499,1784.99 +2097152,9,scan,scan_naive_non_pow2,2.28851,2288.51 +2097152,9,scan,scan_work_efficient_pow2,0.662272,662.2719999999999 +2097152,9,scan,scan_work_efficient_non_pow2,0.650272,650.2719999999999 +2097152,9,scan,scan_thrust_pow2,0.643072,643.072 +2097152,9,scan,scan_thrust_non_pow2,0.562176,562.176 +2097152,9,compact,compact_cpu_without_scan_pow2,3.5533,3553.3 +2097152,9,compact,compact_cpu_without_scan_non_pow2,3.5475,3547.5 +2097152,9,compact,compact_cpu_with_scan,6.1091,6109.099999999999 +2097152,9,compact,compact_work_efficient_pow2,1.47894,1478.9399999999998 +2097152,9,compact,compact_work_efficient_non_pow2,0.971776,971.776 +2097152,10,scan,scan_cpu_pow2,1.0233,1023.3000000000001 +2097152,10,scan,scan_cpu_non_pow2,0.9828,982.8 +2097152,10,scan,scan_naive_pow2,2.16378,2163.78 +2097152,10,scan,scan_naive_non_pow2,1.69619,1696.19 +2097152,10,scan,scan_work_efficient_pow2,0.733248,733.248 +2097152,10,scan,scan_work_efficient_non_pow2,0.596992,596.992 +2097152,10,scan,scan_thrust_pow2,0.548864,548.864 +2097152,10,scan,scan_thrust_non_pow2,0.559104,559.104 +2097152,10,compact,compact_cpu_without_scan_pow2,3.5766,3576.6 +2097152,10,compact,compact_cpu_without_scan_non_pow2,3.5256,3525.6 +2097152,10,compact,compact_cpu_with_scan,6.4474,6447.4 +2097152,10,compact,compact_work_efficient_pow2,1.3208,1320.8 
+2097152,10,compact,compact_work_efficient_non_pow2,1.00352,1003.52 +4194304,1,scan,scan_cpu_pow2,2.18,2180.0 +4194304,1,scan,scan_cpu_non_pow2,2.2165,2216.5 +4194304,1,scan,scan_naive_pow2,3.2249,3224.9 +4194304,1,scan,scan_naive_non_pow2,3.21805,3218.0499999999997 +4194304,1,scan,scan_work_efficient_pow2,0.958336,958.336 +4194304,1,scan,scan_work_efficient_non_pow2,0.843776,843.776 +4194304,1,scan,scan_thrust_pow2,0.664576,664.5759999999999 +4194304,1,scan,scan_thrust_non_pow2,0.81408,814.08 +4194304,1,compact,compact_cpu_without_scan_pow2,7.2458,7245.8 +4194304,1,compact,compact_cpu_without_scan_non_pow2,7.1916,7191.6 +4194304,1,compact,compact_cpu_with_scan,12.6813,12681.300000000001 +4194304,1,compact,compact_work_efficient_pow2,1.58509,1585.0900000000001 +4194304,1,compact,compact_work_efficient_non_pow2,1.31994,1319.9399999999998 +4194304,2,scan,scan_cpu_pow2,1.9014,1901.4 +4194304,2,scan,scan_cpu_non_pow2,2.1376,2137.6 +4194304,2,scan,scan_naive_pow2,3.84598,3845.98 +4194304,2,scan,scan_naive_non_pow2,3.12982,3129.82 +4194304,2,scan,scan_work_efficient_pow2,1.05837,1058.3700000000001 +4194304,2,scan,scan_work_efficient_non_pow2,1.21754,1217.54 +4194304,2,scan,scan_thrust_pow2,0.663552,663.552 +4194304,2,scan,scan_thrust_non_pow2,0.720896,720.896 +4194304,2,compact,compact_cpu_without_scan_pow2,7.1029,7102.9 +4194304,2,compact,compact_cpu_without_scan_non_pow2,7.1129,7112.9 +4194304,2,compact,compact_cpu_with_scan,16.594,16594.0 +4194304,2,compact,compact_work_efficient_pow2,1.90944,1909.44 +4194304,2,compact,compact_work_efficient_non_pow2,1.29434,1294.3400000000001 +4194304,3,scan,scan_cpu_pow2,2.0985,2098.5 +4194304,3,scan,scan_cpu_non_pow2,2.1093,2109.3 +4194304,3,scan,scan_naive_pow2,3.19302,3193.02 +4194304,3,scan,scan_naive_non_pow2,3.23082,3230.82 +4194304,3,scan,scan_work_efficient_pow2,1.52906,1529.0600000000002 +4194304,3,scan,scan_work_efficient_non_pow2,0.804864,804.864 +4194304,3,scan,scan_thrust_pow2,0.673792,673.7919999999999 
+4194304,3,scan,scan_thrust_non_pow2,0.6144,614.4 +4194304,3,compact,compact_cpu_without_scan_pow2,6.9448,6944.8 +4194304,3,compact,compact_cpu_without_scan_non_pow2,7.0111,7011.099999999999 +4194304,3,compact,compact_cpu_with_scan,12.0462,12046.2 +4194304,3,compact,compact_work_efficient_pow2,1.61491,1614.91 +4194304,3,compact,compact_work_efficient_non_pow2,1.29126,1291.26 +4194304,4,scan,scan_cpu_pow2,2.046,2045.9999999999998 +4194304,4,scan,scan_cpu_non_pow2,2.0965,2096.5 +4194304,4,scan,scan_naive_pow2,3.24842,3248.42 +4194304,4,scan,scan_naive_non_pow2,3.21526,3215.2599999999998 +4194304,4,scan,scan_work_efficient_pow2,1.00698,1006.98 +4194304,4,scan,scan_work_efficient_non_pow2,0.615424,615.424 +4194304,4,scan,scan_thrust_pow2,0.674816,674.8159999999999 +4194304,4,scan,scan_thrust_non_pow2,0.684032,684.0319999999999 +4194304,4,compact,compact_cpu_without_scan_pow2,6.86,6860.0 +4194304,4,compact,compact_cpu_without_scan_non_pow2,6.8863,6886.3 +4194304,4,compact,compact_cpu_with_scan,12.5549,12554.9 +4194304,4,compact,compact_work_efficient_pow2,1.55798,1557.98 +4194304,4,compact,compact_work_efficient_non_pow2,3.02592,3025.92 +4194304,5,scan,scan_cpu_pow2,2.2949,2294.9 +4194304,5,scan,scan_cpu_non_pow2,2.2026,2202.6 +4194304,5,scan,scan_naive_pow2,3.16701,3167.0099999999998 +4194304,5,scan,scan_naive_non_pow2,3.74506,3745.06 +4194304,5,scan,scan_work_efficient_pow2,0.973408,973.408 +4194304,5,scan,scan_work_efficient_non_pow2,0.591872,591.872 +4194304,5,scan,scan_thrust_pow2,0.638976,638.976 +4194304,5,scan,scan_thrust_non_pow2,0.617472,617.472 +4194304,5,compact,compact_cpu_without_scan_pow2,7.2153,7215.3 +4194304,5,compact,compact_cpu_without_scan_non_pow2,7.1366,7136.599999999999 +4194304,5,compact,compact_cpu_with_scan,12.6805,12680.5 +4194304,5,compact,compact_work_efficient_pow2,1.60208,1602.08 +4194304,5,compact,compact_work_efficient_non_pow2,1.62918,1629.18 +4194304,6,scan,scan_cpu_pow2,2.4326,2432.6 
+4194304,6,scan,scan_cpu_non_pow2,1.9851,1985.1000000000001 +4194304,6,scan,scan_naive_pow2,3.23459,3234.5899999999997 +4194304,6,scan,scan_naive_non_pow2,3.22314,3223.14 +4194304,6,scan,scan_work_efficient_pow2,1.11504,1115.04 +4194304,6,scan,scan_work_efficient_non_pow2,0.85504,855.0400000000001 +4194304,6,scan,scan_thrust_pow2,0.657408,657.408 +4194304,6,scan,scan_thrust_non_pow2,1.07827,1078.27 +4194304,6,compact,compact_cpu_without_scan_pow2,7.1045,7104.5 +4194304,6,compact,compact_cpu_without_scan_non_pow2,7.126,7126.0 +4194304,6,compact,compact_cpu_with_scan,12.3457,12345.7 +4194304,6,compact,compact_work_efficient_pow2,1.57347,1573.47 +4194304,6,compact,compact_work_efficient_non_pow2,1.2288,1228.8 +4194304,7,scan,scan_cpu_pow2,1.9245,1924.5 +4194304,7,scan,scan_cpu_non_pow2,2.0409,2040.9 +4194304,7,scan,scan_naive_pow2,3.87283,3872.83 +4194304,7,scan,scan_naive_non_pow2,3.12003,3120.0299999999997 +4194304,7,scan,scan_work_efficient_pow2,0.97856,978.56 +4194304,7,scan,scan_work_efficient_non_pow2,1.51757,1517.5700000000002 +4194304,7,scan,scan_thrust_pow2,0.6912,691.2 +4194304,7,scan,scan_thrust_non_pow2,0.705536,705.5360000000001 +4194304,7,compact,compact_cpu_without_scan_pow2,6.9868,6986.799999999999 +4194304,7,compact,compact_cpu_without_scan_non_pow2,7.057,7057.0 +4194304,7,compact,compact_cpu_with_scan,12.3065,12306.5 +4194304,7,compact,compact_work_efficient_pow2,1.68016,1680.16 +4194304,7,compact,compact_work_efficient_non_pow2,1.33222,1332.22 +4194304,8,scan,scan_cpu_pow2,2.0329,2032.9 +4194304,8,scan,scan_cpu_non_pow2,2.3725,2372.5 +4194304,8,scan,scan_naive_pow2,3.29904,3299.0400000000004 +4194304,8,scan,scan_naive_non_pow2,3.73277,3732.77 +4194304,8,scan,scan_work_efficient_pow2,0.882048,882.048 +4194304,8,scan,scan_work_efficient_non_pow2,0.758784,758.784 +4194304,8,scan,scan_thrust_pow2,0.75264,752.64 +4194304,8,scan,scan_thrust_non_pow2,0.8048,804.8 +4194304,8,compact,compact_cpu_without_scan_pow2,7.2324,7232.400000000001 
+4194304,8,compact,compact_cpu_without_scan_non_pow2,7.15,7150.0 +4194304,8,compact,compact_cpu_with_scan,12.6406,12640.599999999999 +4194304,8,compact,compact_work_efficient_pow2,1.60282,1602.82 +4194304,8,compact,compact_work_efficient_non_pow2,1.28819,1288.19 +4194304,9,scan,scan_cpu_pow2,2.2467,2246.7000000000003 +4194304,9,scan,scan_cpu_non_pow2,2.1563,2156.2999999999997 +4194304,9,scan,scan_naive_pow2,3.71978,3719.78 +4194304,9,scan,scan_naive_non_pow2,3.25091,3250.9100000000003 +4194304,9,scan,scan_work_efficient_pow2,1.0224,1022.4 +4194304,9,scan,scan_work_efficient_non_pow2,0.86528,865.2800000000001 +4194304,9,scan,scan_thrust_pow2,0.693248,693.2479999999999 +4194304,9,scan,scan_thrust_non_pow2,0.731136,731.136 +4194304,9,compact,compact_cpu_without_scan_pow2,7.0871,7087.1 +4194304,9,compact,compact_cpu_without_scan_non_pow2,6.7846,6784.6 +4194304,9,compact,compact_cpu_with_scan,13.0226,13022.6 +4194304,9,compact,compact_work_efficient_pow2,1.77075,1770.75 +4194304,9,compact,compact_work_efficient_non_pow2,1.31174,1311.74 +4194304,10,scan,scan_cpu_pow2,1.9668,1966.8000000000002 +4194304,10,scan,scan_cpu_non_pow2,2.0215,2021.5 +4194304,10,scan,scan_naive_pow2,3.80122,3801.22 +4194304,10,scan,scan_naive_non_pow2,3.23866,3238.66 +4194304,10,scan,scan_work_efficient_pow2,0.9808,980.8 +4194304,10,scan,scan_work_efficient_non_pow2,0.723968,723.968 +4194304,10,scan,scan_thrust_pow2,0.596992,596.992 +4194304,10,scan,scan_thrust_non_pow2,0.801792,801.7919999999999 +4194304,10,compact,compact_cpu_without_scan_pow2,7.4832,7483.2 +4194304,10,compact,compact_cpu_without_scan_non_pow2,7.1112,7111.2 +4194304,10,compact,compact_cpu_with_scan,12.6838,12683.8 +4194304,10,compact,compact_work_efficient_pow2,1.99706,1997.06 +4194304,10,compact,compact_work_efficient_non_pow2,1.28,1280.0 +8388608,1,scan,scan_cpu_pow2,4.1648,4164.799999999999 +8388608,1,scan,scan_cpu_non_pow2,4.3179,4317.9 +8388608,1,scan,scan_naive_pow2,6.95533,6955.33 
+8388608,1,scan,scan_naive_non_pow2,6.41062,6410.62 +8388608,1,scan,scan_work_efficient_pow2,1.2665,1266.5 +8388608,1,scan,scan_work_efficient_non_pow2,0.978944,978.9440000000001 +8388608,1,scan,scan_thrust_pow2,0.881664,881.664 +8388608,1,scan,scan_thrust_non_pow2,0.887808,887.808 +8388608,1,compact,compact_cpu_without_scan_pow2,14.5249,14524.900000000001 +8388608,1,compact,compact_cpu_without_scan_non_pow2,14.1581,14158.099999999999 +8388608,1,compact,compact_cpu_with_scan,24.7806,24780.6 +8388608,1,compact,compact_work_efficient_pow2,2.1585,2158.5 +8388608,1,compact,compact_work_efficient_non_pow2,1.93437,1934.37 +8388608,2,scan,scan_cpu_pow2,4.0626,4062.6 +8388608,2,scan,scan_cpu_non_pow2,4.1356,4135.6 +8388608,2,scan,scan_naive_pow2,6.8207,6820.700000000001 +8388608,2,scan,scan_naive_non_pow2,6.13306,6133.06 +8388608,2,scan,scan_work_efficient_pow2,1.39952,1399.5200000000002 +8388608,2,scan,scan_work_efficient_non_pow2,0.828416,828.416 +8388608,2,scan,scan_thrust_pow2,0.7936,793.6 +8388608,2,scan,scan_thrust_non_pow2,0.923648,923.648 +8388608,2,compact,compact_cpu_without_scan_pow2,14.2262,14226.2 +8388608,2,compact,compact_cpu_without_scan_non_pow2,13.6921,13692.1 +8388608,2,compact,compact_cpu_with_scan,24.4571,24457.100000000002 +8388608,2,compact,compact_work_efficient_pow2,2.33824,2338.24 +8388608,2,compact,compact_work_efficient_non_pow2,1.80736,1807.3600000000001 +8388608,3,scan,scan_cpu_pow2,3.8734,3873.4 +8388608,3,scan,scan_cpu_non_pow2,4.1889,4188.900000000001 +8388608,3,scan,scan_naive_pow2,6.42112,6421.12 +8388608,3,scan,scan_naive_non_pow2,6.72486,6724.86 +8388608,3,scan,scan_work_efficient_pow2,1.51571,1515.7099999999998 +8388608,3,scan,scan_work_efficient_non_pow2,1.07008,1070.08 +8388608,3,scan,scan_thrust_pow2,1.02605,1026.05 +8388608,3,scan,scan_thrust_non_pow2,0.935936,935.936 +8388608,3,compact,compact_cpu_without_scan_pow2,14.1611,14161.099999999999 +8388608,3,compact,compact_cpu_without_scan_non_pow2,14.0538,14053.800000000001 
+8388608,3,compact,compact_cpu_with_scan,24.7208,24720.8 +8388608,3,compact,compact_work_efficient_pow2,2.23578,2235.78 +8388608,3,compact,compact_work_efficient_non_pow2,1.94662,1946.6200000000001 +8388608,4,scan,scan_cpu_pow2,3.895,3895.0 +8388608,4,scan,scan_cpu_non_pow2,4.454,4454.0 +8388608,4,scan,scan_naive_pow2,7.00374,7003.74 +8388608,4,scan,scan_naive_non_pow2,6.92333,6923.33 +8388608,4,scan,scan_work_efficient_pow2,1.72573,1725.73 +8388608,4,scan,scan_work_efficient_non_pow2,1.01786,1017.86 +8388608,4,scan,scan_thrust_pow2,0.949248,949.2479999999999 +8388608,4,scan,scan_thrust_non_pow2,0.948288,948.288 +8388608,4,compact,compact_cpu_without_scan_pow2,14.1115,14111.5 +8388608,4,compact,compact_cpu_without_scan_non_pow2,13.7728,13772.8 +8388608,4,compact,compact_cpu_with_scan,25.3407,25340.699999999997 +8388608,4,compact,compact_work_efficient_pow2,2.14582,2145.82 +8388608,4,compact,compact_work_efficient_non_pow2,1.92,1920.0 +8388608,5,scan,scan_cpu_pow2,4.3684,4368.400000000001 +8388608,5,scan,scan_cpu_non_pow2,3.8767,3876.7 +8388608,5,scan,scan_naive_pow2,6.36963,6369.63 +8388608,5,scan,scan_naive_non_pow2,6.23763,6237.63 +8388608,5,scan,scan_work_efficient_pow2,1.98704,1987.04 +8388608,5,scan,scan_work_efficient_non_pow2,1.13664,1136.64 +8388608,5,scan,scan_thrust_pow2,0.858112,858.112 +8388608,5,scan,scan_thrust_non_pow2,0.980992,980.992 +8388608,5,compact,compact_cpu_without_scan_pow2,14.0435,14043.5 +8388608,5,compact,compact_cpu_without_scan_non_pow2,14.336,14336.0 +8388608,5,compact,compact_cpu_with_scan,24.99,24990.0 +8388608,5,compact,compact_work_efficient_pow2,2.25568,2255.68 +8388608,5,compact,compact_work_efficient_non_pow2,1.9712,1971.2 +8388608,6,scan,scan_cpu_pow2,4.5514,4551.400000000001 +8388608,6,scan,scan_cpu_non_pow2,4.1218,4121.8 +8388608,6,scan,scan_naive_pow2,6.35981,6359.81 +8388608,6,scan,scan_naive_non_pow2,6.27021,6270.21 +8388608,6,scan,scan_work_efficient_pow2,1.64166,1641.6599999999999 
+8388608,6,scan,scan_work_efficient_non_pow2,1.11002,1110.02 +8388608,6,scan,scan_thrust_pow2,0.843776,843.776 +8388608,6,scan,scan_thrust_non_pow2,0.896,896.0 +8388608,6,compact,compact_cpu_without_scan_pow2,14.28,14280.0 +8388608,6,compact,compact_cpu_without_scan_non_pow2,14.097,14097.0 +8388608,6,compact,compact_cpu_with_scan,25.9096,25909.600000000002 +8388608,6,compact,compact_work_efficient_pow2,2.17619,2176.19 +8388608,6,compact,compact_work_efficient_non_pow2,1.95072,1950.72 +8388608,7,scan,scan_cpu_pow2,4.0255,4025.5 +8388608,7,scan,scan_cpu_non_pow2,4.1788,4178.8 +8388608,7,scan,scan_naive_pow2,6.96211,6962.11 +8388608,7,scan,scan_naive_non_pow2,6.14246,6142.46 +8388608,7,scan,scan_work_efficient_pow2,1.6569,1656.9 +8388608,7,scan,scan_work_efficient_non_pow2,1.00352,1003.52 +8388608,7,scan,scan_thrust_pow2,1.08339,1083.39 +8388608,7,scan,scan_thrust_non_pow2,0.902176,902.1759999999999 +8388608,7,compact,compact_cpu_without_scan_pow2,14.0148,14014.8 +8388608,7,compact,compact_cpu_without_scan_non_pow2,13.8375,13837.5 +8388608,7,compact,compact_cpu_with_scan,25.2176,25217.600000000002 +8388608,7,compact,compact_work_efficient_pow2,2.23869,2238.69 +8388608,7,compact,compact_work_efficient_non_pow2,1.92307,1923.0700000000002 +8388608,8,scan,scan_cpu_pow2,4.5373,4537.3 +8388608,8,scan,scan_cpu_non_pow2,4.2575,4257.5 +8388608,8,scan,scan_naive_pow2,7.18739,7187.389999999999 +8388608,8,scan,scan_naive_non_pow2,6.75066,6750.66 +8388608,8,scan,scan_work_efficient_pow2,1.49405,1494.0500000000002 +8388608,8,scan,scan_work_efficient_non_pow2,1.07725,1077.25 +8388608,8,scan,scan_thrust_pow2,0.8704,870.4 +8388608,8,scan,scan_thrust_non_pow2,1.27795,1277.9499999999998 +8388608,8,compact,compact_cpu_without_scan_pow2,13.8218,13821.8 +8388608,8,compact,compact_cpu_without_scan_non_pow2,13.7692,13769.199999999999 +8388608,8,compact,compact_cpu_with_scan,25.3661,25366.1 +8388608,8,compact,compact_work_efficient_pow2,2.46467,2464.67 
+8388608,8,compact,compact_work_efficient_non_pow2,1.99578,1995.7800000000002 +8388608,9,scan,scan_cpu_pow2,4.1012,4101.200000000001 +8388608,9,scan,scan_cpu_non_pow2,4.176,4176.0 +8388608,9,scan,scan_naive_pow2,6.85757,6857.57 +8388608,9,scan,scan_naive_non_pow2,6.13942,6139.42 +8388608,9,scan,scan_work_efficient_pow2,1.81488,1814.88 +8388608,9,scan,scan_work_efficient_non_pow2,1.23187,1231.8700000000001 +8388608,9,scan,scan_thrust_pow2,0.86016,860.1600000000001 +8388608,9,scan,scan_thrust_non_pow2,0.973824,973.8240000000001 +8388608,9,compact,compact_cpu_without_scan_pow2,13.9876,13987.6 +8388608,9,compact,compact_cpu_without_scan_non_pow2,13.8367,13836.7 +8388608,9,compact,compact_cpu_with_scan,25.9743,25974.3 +8388608,9,compact,compact_work_efficient_pow2,2.13539,2135.3900000000003 +8388608,9,compact,compact_work_efficient_non_pow2,1.87597,1875.97 +8388608,10,scan,scan_cpu_pow2,4.2186,4218.6 +8388608,10,scan,scan_cpu_non_pow2,4.3114,4311.4 +8388608,10,scan,scan_naive_pow2,6.96698,6966.9800000000005 +8388608,10,scan,scan_naive_non_pow2,6.19072,6190.719999999999 +8388608,10,scan,scan_work_efficient_pow2,1.7815,1781.5 +8388608,10,scan,scan_work_efficient_non_pow2,1.32608,1326.08 +8388608,10,scan,scan_thrust_pow2,0.854016,854.016 +8388608,10,scan,scan_thrust_non_pow2,0.940032,940.0319999999999 +8388608,10,compact,compact_cpu_without_scan_pow2,13.6961,13696.1 +8388608,10,compact,compact_cpu_without_scan_non_pow2,13.8071,13807.1 +8388608,10,compact,compact_cpu_with_scan,24.1105,24110.5 +8388608,10,compact,compact_work_efficient_pow2,2.30371,2303.71 +8388608,10,compact,compact_work_efficient_non_pow2,1.95584,1955.84 +16777216,1,scan,scan_cpu_pow2,8.8192,8819.2 +16777216,1,scan,scan_cpu_non_pow2,8.6034,8603.400000000001 +16777216,1,scan,scan_naive_pow2,13.0046,13004.6 +16777216,1,scan,scan_naive_non_pow2,12.4661,12466.1 +16777216,1,scan,scan_work_efficient_pow2,1.9721,1972.1 +16777216,1,scan,scan_work_efficient_non_pow2,1.86061,1860.6100000000001 
+16777216,1,scan,scan_thrust_pow2,1.14483,1144.83 +16777216,1,scan,scan_thrust_non_pow2,1.11514,1115.14 +16777216,1,compact,compact_cpu_without_scan_pow2,27.9313,27931.3 +16777216,1,compact,compact_cpu_without_scan_non_pow2,27.4414,27441.4 +16777216,1,compact,compact_cpu_with_scan,50.0174,50017.4 +16777216,1,compact,compact_work_efficient_pow2,3.39184,3391.84 +16777216,1,compact,compact_work_efficient_non_pow2,3.04333,3043.33 +16777216,2,scan,scan_cpu_pow2,8.7636,8763.6 +16777216,2,scan,scan_cpu_non_pow2,7.9199,7919.900000000001 +16777216,2,scan,scan_naive_pow2,12.4361,12436.1 +16777216,2,scan,scan_naive_non_pow2,12.269,12269.0 +16777216,2,scan,scan_work_efficient_pow2,2.05738,2057.38 +16777216,2,scan,scan_work_efficient_non_pow2,1.71418,1714.18 +16777216,2,scan,scan_thrust_pow2,1.08237,1082.3700000000001 +16777216,2,scan,scan_thrust_non_pow2,1.10182,1101.82 +16777216,2,compact,compact_cpu_without_scan_pow2,27.696,27696.0 +16777216,2,compact,compact_cpu_without_scan_non_pow2,28.2018,28201.8 +16777216,2,compact,compact_cpu_with_scan,51.2568,51256.799999999996 +16777216,2,compact,compact_work_efficient_pow2,3.49302,3493.02 +16777216,2,compact,compact_work_efficient_non_pow2,2.94298,2942.98 +16777216,3,scan,scan_cpu_pow2,8.0591,8059.1 +16777216,3,scan,scan_cpu_non_pow2,8.1771,8177.099999999999 +16777216,3,scan,scan_naive_pow2,13.0196,13019.6 +16777216,3,scan,scan_naive_non_pow2,12.4272,12427.199999999999 +16777216,3,scan,scan_work_efficient_pow2,1.9687,1968.6999999999998 +16777216,3,scan,scan_work_efficient_non_pow2,1.97018,1970.18 +16777216,3,scan,scan_thrust_pow2,1.16838,1168.3799999999999 +16777216,3,scan,scan_thrust_non_pow2,1.09875,1098.75 +16777216,3,compact,compact_cpu_without_scan_pow2,28.2515,28251.5 +16777216,3,compact,compact_cpu_without_scan_non_pow2,28.9084,28908.4 +16777216,3,compact,compact_cpu_with_scan,49.4342,49434.2 +16777216,3,compact,compact_work_efficient_pow2,3.47747,3477.47 
+16777216,3,compact,compact_work_efficient_non_pow2,3.17338,3173.3799999999997 +16777216,4,scan,scan_cpu_pow2,8.332,8332.0 +16777216,4,scan,scan_cpu_non_pow2,8.3742,8374.2 +16777216,4,scan,scan_naive_pow2,13.007,13007.0 +16777216,4,scan,scan_naive_non_pow2,12.8768,12876.8 +16777216,4,scan,scan_work_efficient_pow2,1.87334,1873.34 +16777216,4,scan,scan_work_efficient_non_pow2,1.87597,1875.97 +16777216,4,scan,scan_thrust_pow2,1.09978,1099.78 +16777216,4,scan,scan_thrust_non_pow2,1.0793,1079.3 +16777216,4,compact,compact_cpu_without_scan_pow2,28.3906,28390.6 +16777216,4,compact,compact_cpu_without_scan_non_pow2,27.9081,27908.100000000002 +16777216,4,compact,compact_cpu_with_scan,52.1602,52160.200000000004 +16777216,4,compact,compact_work_efficient_pow2,3.37898,3378.98 +16777216,4,compact,compact_work_efficient_non_pow2,3.36691,3366.91 +16777216,5,scan,scan_cpu_pow2,10.5132,10513.199999999999 +16777216,5,scan,scan_cpu_non_pow2,8.6552,8655.2 +16777216,5,scan,scan_naive_pow2,13.022,13022.0 +16777216,5,scan,scan_naive_non_pow2,12.322,12322.0 +16777216,5,scan,scan_work_efficient_pow2,1.91459,1914.59 +16777216,5,scan,scan_work_efficient_non_pow2,1.76435,1764.3500000000001 +16777216,5,scan,scan_thrust_pow2,1.13869,1138.69 +16777216,5,scan,scan_thrust_non_pow2,1.02605,1026.05 +16777216,5,compact,compact_cpu_without_scan_pow2,27.9204,27920.4 +16777216,5,compact,compact_cpu_without_scan_non_pow2,27.608,27608.0 +16777216,5,compact,compact_cpu_with_scan,49.5117,49511.7 +16777216,5,compact,compact_work_efficient_pow2,3.37341,3373.41 +16777216,5,compact,compact_work_efficient_non_pow2,3.10477,3104.77 +16777216,6,scan,scan_cpu_pow2,8.1987,8198.7 +16777216,6,scan,scan_cpu_non_pow2,8.6911,8691.1 +16777216,6,scan,scan_naive_pow2,13.3701,13370.1 +16777216,6,scan,scan_naive_non_pow2,12.4167,12416.7 +16777216,6,scan,scan_work_efficient_pow2,2.07562,2075.62 +16777216,6,scan,scan_work_efficient_non_pow2,1.84218,1842.1799999999998 +16777216,6,scan,scan_thrust_pow2,1.16736,1167.36 
+16777216,6,scan,scan_thrust_non_pow2,1.16227,1162.27 +16777216,6,compact,compact_cpu_without_scan_pow2,27.908,27908.0 +16777216,6,compact,compact_cpu_without_scan_non_pow2,27.1709,27170.899999999998 +16777216,6,compact,compact_cpu_with_scan,50.3292,50329.2 +16777216,6,compact,compact_work_efficient_pow2,3.3727,3372.7 +16777216,6,compact,compact_work_efficient_non_pow2,3.21331,3213.31 +16777216,7,scan,scan_cpu_pow2,8.5866,8586.6 +16777216,7,scan,scan_cpu_non_pow2,8.5128,8512.800000000001 +16777216,7,scan,scan_naive_pow2,12.9977,12997.7 +16777216,7,scan,scan_naive_non_pow2,12.4143,12414.300000000001 +16777216,7,scan,scan_work_efficient_pow2,1.90534,1905.3400000000001 +16777216,7,scan,scan_work_efficient_non_pow2,1.72646,1726.46 +16777216,7,scan,scan_thrust_pow2,1.12333,1123.33 +16777216,7,scan,scan_thrust_non_pow2,1.09978,1099.78 +16777216,7,compact,compact_cpu_without_scan_pow2,28.0414,28041.399999999998 +16777216,7,compact,compact_cpu_without_scan_non_pow2,27.45,27450.0 +16777216,7,compact,compact_cpu_with_scan,52.7391,52739.1 +16777216,7,compact,compact_work_efficient_pow2,3.28806,3288.0600000000004 +16777216,7,compact,compact_work_efficient_non_pow2,3.01466,3014.6600000000003 +16777216,8,scan,scan_cpu_pow2,8.1571,8157.099999999999 +16777216,8,scan,scan_cpu_non_pow2,8.2681,8268.1 +16777216,8,scan,scan_naive_pow2,12.8949,12894.9 +16777216,8,scan,scan_naive_non_pow2,12.4012,12401.199999999999 +16777216,8,scan,scan_work_efficient_pow2,1.90502,1905.02 +16777216,8,scan,scan_work_efficient_non_pow2,1.72237,1722.37 +16777216,8,scan,scan_thrust_pow2,1.08544,1085.44 +16777216,8,scan,scan_thrust_non_pow2,1.08032,1080.32 +16777216,8,compact,compact_cpu_without_scan_pow2,27.5374,27537.4 +16777216,8,compact,compact_cpu_without_scan_non_pow2,28.1366,28136.600000000002 +16777216,8,compact,compact_cpu_with_scan,51.9063,51906.3 +16777216,8,compact,compact_work_efficient_pow2,3.4713,3471.2999999999997 +16777216,8,compact,compact_work_efficient_non_pow2,3.00858,3008.58 
+16777216,9,scan,scan_cpu_pow2,8.5533,8553.3 +16777216,9,scan,scan_cpu_non_pow2,8.4548,8454.800000000001 +16777216,9,scan,scan_naive_pow2,13.0243,13024.300000000001 +16777216,9,scan,scan_naive_non_pow2,12.3937,12393.7 +16777216,9,scan,scan_work_efficient_pow2,1.87325,1873.25 +16777216,9,scan,scan_work_efficient_non_pow2,1.82682,1826.8200000000002 +16777216,9,scan,scan_thrust_pow2,1.05062,1050.6200000000001 +16777216,9,scan,scan_thrust_non_pow2,1.07418,1074.1799999999998 +16777216,9,compact,compact_cpu_without_scan_pow2,28.0757,28075.7 +16777216,9,compact,compact_cpu_without_scan_non_pow2,28.6645,28664.5 +16777216,9,compact,compact_cpu_with_scan,50.3038,50303.8 +16777216,9,compact,compact_work_efficient_pow2,3.29414,3294.14 +16777216,9,compact,compact_work_efficient_non_pow2,3.10682,3106.8199999999997 +16777216,10,scan,scan_cpu_pow2,8.3335,8333.5 +16777216,10,scan,scan_cpu_non_pow2,8.0696,8069.599999999999 +16777216,10,scan,scan_naive_pow2,14.0056,14005.599999999999 +16777216,10,scan,scan_naive_non_pow2,12.3258,12325.8 +16777216,10,scan,scan_work_efficient_pow2,2.14918,2149.18 +16777216,10,scan,scan_work_efficient_non_pow2,1.89747,1897.47 +16777216,10,scan,scan_thrust_pow2,1.08237,1082.3700000000001 +16777216,10,scan,scan_thrust_non_pow2,1.08749,1087.49 +16777216,10,compact,compact_cpu_without_scan_pow2,27.7936,27793.600000000002 +16777216,10,compact,compact_cpu_without_scan_non_pow2,28.9048,28904.800000000003 +16777216,10,compact,compact_cpu_with_scan,49.6402,49640.2 +16777216,10,compact,compact_work_efficient_pow2,3.37325,3373.25 +16777216,10,compact,compact_work_efficient_non_pow2,3.02387,3023.87 +33554432,1,scan,scan_cpu_pow2,17.1524,17152.4 +33554432,1,scan,scan_cpu_non_pow2,15.4388,15438.800000000001 +33554432,1,scan,scan_naive_pow2,25.4122,25412.199999999997 +33554432,1,scan,scan_naive_non_pow2,25.2767,25276.7 +33554432,1,scan,scan_work_efficient_pow2,3.368,3368.0 +33554432,1,scan,scan_work_efficient_non_pow2,3.0464,3046.4 
+33554432,1,scan,scan_thrust_pow2,1.73869,1738.69 +33554432,1,scan,scan_thrust_non_pow2,1.64352,1643.52 +33554432,1,compact,compact_cpu_without_scan_pow2,57.015,57015.0 +33554432,1,compact,compact_cpu_without_scan_non_pow2,55.3494,55349.4 +33554432,1,compact,compact_cpu_with_scan,100.392,100392.0 +33554432,1,compact,compact_work_efficient_pow2,5.86621,5866.21 +33554432,1,compact,compact_work_efficient_non_pow2,5.53472,5534.72 +33554432,2,scan,scan_cpu_pow2,15.7645,15764.5 +33554432,2,scan,scan_cpu_non_pow2,16.3241,16324.100000000002 +33554432,2,scan,scan_naive_pow2,25.4285,25428.5 +33554432,2,scan,scan_naive_non_pow2,25.3756,25375.6 +33554432,2,scan,scan_work_efficient_pow2,3.17162,3171.62 +33554432,2,scan,scan_work_efficient_non_pow2,2.83853,2838.53 +33554432,2,scan,scan_thrust_pow2,1.64352,1643.52 +33554432,2,scan,scan_thrust_non_pow2,1.55034,1550.3400000000001 +33554432,2,compact,compact_cpu_without_scan_pow2,55.9716,55971.600000000006 +33554432,2,compact,compact_cpu_without_scan_non_pow2,56.3901,56390.1 +33554432,2,compact,compact_cpu_with_scan,100.645,100645.0 +33554432,2,compact,compact_work_efficient_pow2,5.6831,5683.099999999999 +33554432,2,compact,compact_work_efficient_non_pow2,5.45997,5459.97 +33554432,3,scan,scan_cpu_pow2,17.6581,17658.100000000002 +33554432,3,scan,scan_cpu_non_pow2,18.0834,18083.4 +33554432,3,scan,scan_naive_pow2,25.3882,25388.2 +33554432,3,scan,scan_naive_non_pow2,25.7363,25736.3 +33554432,3,scan,scan_work_efficient_pow2,3.02618,3026.1800000000003 +33554432,3,scan,scan_work_efficient_non_pow2,2.77094,2770.94 +33554432,3,scan,scan_thrust_pow2,1.65581,1655.81 +33554432,3,scan,scan_thrust_non_pow2,1.6169,1616.9 +33554432,3,compact,compact_cpu_without_scan_pow2,55.8455,55845.5 +33554432,3,compact,compact_cpu_without_scan_non_pow2,56.1403,56140.3 +33554432,3,compact,compact_cpu_with_scan,103.037,103037.0 +33554432,3,compact,compact_work_efficient_pow2,5.58765,5587.65 +33554432,3,compact,compact_work_efficient_non_pow2,5.44358,5443.58 
+33554432,4,scan,scan_cpu_pow2,17.3758,17375.800000000003 +33554432,4,scan,scan_cpu_non_pow2,15.8788,15878.8 +33554432,4,scan,scan_naive_pow2,25.2473,25247.3 +33554432,4,scan,scan_naive_non_pow2,25.2294,25229.399999999998 +33554432,4,scan,scan_work_efficient_pow2,3.12957,3129.57 +33554432,4,scan,scan_work_efficient_non_pow2,2.84058,2840.58 +33554432,4,scan,scan_thrust_pow2,1.5872,1587.2 +33554432,4,scan,scan_thrust_non_pow2,1.58208,1582.08 +33554432,4,compact,compact_cpu_without_scan_pow2,56.0168,56016.8 +33554432,4,compact,compact_cpu_without_scan_non_pow2,55.9438,55943.8 +33554432,4,compact,compact_cpu_with_scan,101.631,101631.0 +33554432,4,compact,compact_work_efficient_pow2,5.71117,5711.17 +33554432,4,compact,compact_work_efficient_non_pow2,5.3248,5324.8 +33554432,5,scan,scan_cpu_pow2,16.7024,16702.4 +33554432,5,scan,scan_cpu_non_pow2,17.1964,17196.4 +33554432,5,scan,scan_naive_pow2,25.4267,25426.7 +33554432,5,scan,scan_naive_non_pow2,25.2602,25260.2 +33554432,5,scan,scan_work_efficient_pow2,2.93616,2936.1600000000003 +33554432,5,scan,scan_work_efficient_non_pow2,2.95117,2951.17 +33554432,5,scan,scan_thrust_pow2,1.61587,1615.87 +33554432,5,scan,scan_thrust_non_pow2,1.60154,1601.54 +33554432,5,compact,compact_cpu_without_scan_pow2,55.6078,55607.799999999996 +33554432,5,compact,compact_cpu_without_scan_non_pow2,57.3164,57316.4 +33554432,5,compact,compact_cpu_with_scan,103.323,103323.0 +33554432,5,compact,compact_work_efficient_pow2,5.97661,5976.61 +33554432,5,compact,compact_work_efficient_non_pow2,5.48352,5483.52 +33554432,6,scan,scan_cpu_pow2,16.3218,16321.8 +33554432,6,scan,scan_cpu_non_pow2,16.0544,16054.400000000001 +33554432,6,scan,scan_naive_pow2,25.519,25519.0 +33554432,6,scan,scan_naive_non_pow2,25.2199,25219.899999999998 +33554432,6,scan,scan_work_efficient_pow2,3.09546,3095.46 +33554432,6,scan,scan_work_efficient_non_pow2,2.81088,2810.88 +33554432,6,scan,scan_thrust_pow2,1.61792,1617.92 +33554432,6,scan,scan_thrust_non_pow2,1.69574,1695.74 
+33554432,6,compact,compact_cpu_without_scan_pow2,57.0773,57077.3 +33554432,6,compact,compact_cpu_without_scan_non_pow2,55.0733,55073.3 +33554432,6,compact,compact_cpu_with_scan,99.9008,99900.8 +33554432,6,compact,compact_work_efficient_pow2,5.75434,5754.34 +33554432,6,compact,compact_work_efficient_non_pow2,5.3719,5371.900000000001 +33554432,7,scan,scan_cpu_pow2,16.338,16338.000000000002 +33554432,7,scan,scan_cpu_non_pow2,16.5429,16542.899999999998 +33554432,7,scan,scan_naive_pow2,25.4168,25416.8 +33554432,7,scan,scan_naive_non_pow2,25.3821,25382.100000000002 +33554432,7,scan,scan_work_efficient_pow2,3.30931,3309.31 +33554432,7,scan,scan_work_efficient_non_pow2,3.11091,3110.91 +33554432,7,scan,scan_thrust_pow2,1.72749,1727.49 +33554432,7,scan,scan_thrust_non_pow2,1.71917,1719.17 +33554432,7,compact,compact_cpu_without_scan_pow2,56.7651,56765.1 +33554432,7,compact,compact_cpu_without_scan_non_pow2,54.625,54625.0 +33554432,7,compact,compact_cpu_with_scan,101.518,101518.0 +33554432,7,compact,compact_work_efficient_pow2,6.03466,6034.66 +33554432,7,compact,compact_work_efficient_non_pow2,5.4313,5431.3 +33554432,8,scan,scan_cpu_pow2,17.5497,17549.7 +33554432,8,scan,scan_cpu_non_pow2,16.8521,16852.1 +33554432,8,scan,scan_naive_pow2,25.314,25314.0 +33554432,8,scan,scan_naive_non_pow2,25.3746,25374.600000000002 +33554432,8,scan,scan_work_efficient_pow2,3.28918,3289.18 +33554432,8,scan,scan_work_efficient_non_pow2,3.06381,3063.81 +33554432,8,scan,scan_thrust_pow2,1.65363,1653.6299999999999 +33554432,8,scan,scan_thrust_non_pow2,1.73168,1731.68 +33554432,8,compact,compact_cpu_without_scan_pow2,56.7143,56714.3 +33554432,8,compact,compact_cpu_without_scan_non_pow2,57.5416,57541.600000000006 +33554432,8,compact,compact_cpu_with_scan,101.239,101239.0 +33554432,8,compact,compact_work_efficient_pow2,5.62749,5627.49 +33554432,8,compact,compact_work_efficient_non_pow2,5.3975,5397.5 +33554432,9,scan,scan_cpu_pow2,15.9771,15977.1 +33554432,9,scan,scan_cpu_non_pow2,17.0679,17067.9 
+33554432,9,scan,scan_naive_pow2,25.2897,25289.7 +33554432,9,scan,scan_naive_non_pow2,25.4526,25452.6 +33554432,9,scan,scan_work_efficient_pow2,2.99638,2996.3799999999997 +33554432,9,scan,scan_work_efficient_non_pow2,2.80986,2809.86 +33554432,9,scan,scan_thrust_pow2,1.52269,1522.69 +33554432,9,scan,scan_thrust_non_pow2,1.77152,1771.52 +33554432,9,compact,compact_cpu_without_scan_pow2,55.4533,55453.299999999996 +33554432,9,compact,compact_cpu_without_scan_non_pow2,56.7389,56738.9 +33554432,9,compact,compact_cpu_with_scan,102.083,102083.0 +33554432,9,compact,compact_work_efficient_pow2,5.82,5820.0 +33554432,9,compact,compact_work_efficient_non_pow2,5.36576,5365.76 +33554432,10,scan,scan_cpu_pow2,15.6764,15676.4 +33554432,10,scan,scan_cpu_non_pow2,16.6751,16675.100000000002 +33554432,10,scan,scan_naive_pow2,25.3104,25310.4 +33554432,10,scan,scan_naive_non_pow2,25.3611,25361.100000000002 +33554432,10,scan,scan_work_efficient_pow2,3.36125,3361.25 +33554432,10,scan,scan_work_efficient_non_pow2,2.67878,2678.78 +33554432,10,scan,scan_thrust_pow2,1.60051,1600.51 +33554432,10,scan,scan_thrust_non_pow2,1.62102,1621.02 +33554432,10,compact,compact_cpu_without_scan_pow2,57.3079,57307.899999999994 +33554432,10,compact,compact_cpu_without_scan_non_pow2,55.0642,55064.2 +33554432,10,compact,compact_cpu_with_scan,103.722,103722.0 +33554432,10,compact,compact_work_efficient_pow2,5.7879,5787.9 +33554432,10,compact,compact_work_efficient_non_pow2,5.44461,5444.61 +67108864,1,scan,scan_cpu_pow2,32.9706,32970.6 +67108864,1,scan,scan_cpu_non_pow2,35.5836,35583.6 +67108864,1,scan,scan_naive_pow2,53.0656,53065.600000000006 +67108864,1,scan,scan_naive_non_pow2,53.5323,53532.3 +67108864,1,scan,scan_work_efficient_pow2,5.12349,5123.490000000001 +67108864,1,scan,scan_work_efficient_non_pow2,4.9193,4919.3 +67108864,1,scan,scan_thrust_pow2,2.74218,2742.18 +67108864,1,scan,scan_thrust_non_pow2,2.75446,2754.46 +67108864,1,compact,compact_cpu_without_scan_pow2,112.721,112721.0 
+67108864,1,compact,compact_cpu_without_scan_non_pow2,110.711,110711.0 +67108864,1,compact,compact_cpu_with_scan,199.535,199535.0 +67108864,1,compact,compact_work_efficient_pow2,10.3279,10327.9 +67108864,1,compact,compact_work_efficient_non_pow2,10.1284,10128.4 +67108864,2,scan,scan_cpu_pow2,33.5569,33556.9 +67108864,2,scan,scan_cpu_non_pow2,34.8141,34814.100000000006 +67108864,2,scan,scan_naive_pow2,52.5207,52520.7 +67108864,2,scan,scan_naive_non_pow2,52.7478,52747.799999999996 +67108864,2,scan,scan_work_efficient_pow2,5.1296,5129.6 +67108864,2,scan,scan_work_efficient_non_pow2,4.95107,4951.07 +67108864,2,scan,scan_thrust_pow2,2.75661,2756.6099999999997 +67108864,2,scan,scan_thrust_non_pow2,2.61939,2619.3900000000003 +67108864,2,compact,compact_cpu_without_scan_pow2,113.256,113256.0 +67108864,2,compact,compact_cpu_without_scan_non_pow2,111.616,111616.0 +67108864,2,compact,compact_cpu_with_scan,201.185,201185.0 +67108864,2,compact,compact_work_efficient_pow2,10.2797,10279.7 +67108864,2,compact,compact_work_efficient_non_pow2,10.2431,10243.1 +67108864,3,scan,scan_cpu_pow2,31.8614,31861.4 +67108864,3,scan,scan_cpu_non_pow2,33.4057,33405.700000000004 +67108864,3,scan,scan_naive_pow2,52.6211,52621.1 +67108864,3,scan,scan_naive_non_pow2,53.2301,53230.1 +67108864,3,scan,scan_work_efficient_pow2,4.82077,4820.77 +67108864,3,scan,scan_work_efficient_non_pow2,5.04218,5042.18 +67108864,3,scan,scan_thrust_pow2,2.57126,2571.26 +67108864,3,scan,scan_thrust_non_pow2,3.23782,3237.82 +67108864,3,compact,compact_cpu_without_scan_pow2,112.968,112968.0 +67108864,3,compact,compact_cpu_without_scan_non_pow2,110.388,110388.0 +67108864,3,compact,compact_cpu_with_scan,202.18,202180.0 +67108864,3,compact,compact_work_efficient_pow2,10.4631,10463.1 +67108864,3,compact,compact_work_efficient_non_pow2,10.4233,10423.3 +67108864,4,scan,scan_cpu_pow2,32.7121,32712.1 +67108864,4,scan,scan_cpu_non_pow2,32.178,32177.999999999996 +67108864,4,scan,scan_naive_pow2,53.204,53204.0 
+67108864,4,scan,scan_naive_non_pow2,53.2202,53220.2 +67108864,4,scan,scan_work_efficient_pow2,4.98829,4988.29 +67108864,4,scan,scan_work_efficient_non_pow2,4.86912,4869.12 +67108864,4,scan,scan_thrust_pow2,2.67162,2671.62 +67108864,4,scan,scan_thrust_non_pow2,2.70336,2703.36 +67108864,4,compact,compact_cpu_without_scan_pow2,112.433,112433.0 +67108864,4,compact,compact_cpu_without_scan_non_pow2,111.404,111404.0 +67108864,4,compact,compact_cpu_with_scan,199.558,199558.0 +67108864,4,compact,compact_work_efficient_pow2,10.5103,10510.300000000001 +67108864,4,compact,compact_work_efficient_non_pow2,10.0844,10084.4 +67108864,5,scan,scan_cpu_pow2,33.219,33219.0 +67108864,5,scan,scan_cpu_non_pow2,32.6333,32633.3 +67108864,5,scan,scan_naive_pow2,52.8677,52867.7 +67108864,5,scan,scan_naive_non_pow2,53.5472,53547.2 +67108864,5,scan,scan_work_efficient_pow2,5.17571,5175.71 +67108864,5,scan,scan_work_efficient_non_pow2,4.91213,4912.13 +67108864,5,scan,scan_thrust_pow2,2.81805,2818.0499999999997 +67108864,5,scan,scan_thrust_non_pow2,2.78938,2789.38 +67108864,5,compact,compact_cpu_without_scan_pow2,112.719,112719.0 +67108864,5,compact,compact_cpu_without_scan_non_pow2,111.592,111592.0 +67108864,5,compact,compact_cpu_with_scan,200.691,200691.0 +67108864,5,compact,compact_work_efficient_pow2,10.4745,10474.5 +67108864,5,compact,compact_work_efficient_non_pow2,10.1069,10106.9 +67108864,6,scan,scan_cpu_pow2,32.5147,32514.699999999997 +67108864,6,scan,scan_cpu_non_pow2,32.7109,32710.9 +67108864,6,scan,scan_naive_pow2,52.6984,52698.4 +67108864,6,scan,scan_naive_non_pow2,52.8136,52813.6 +67108864,6,scan,scan_work_efficient_pow2,5.25699,5256.99 +67108864,6,scan,scan_work_efficient_non_pow2,4.9193,4919.3 +67108864,6,scan,scan_thrust_pow2,2.55795,2557.95 +67108864,6,scan,scan_thrust_non_pow2,2.75558,2755.58 +67108864,6,compact,compact_cpu_without_scan_pow2,114.087,114087.0 +67108864,6,compact,compact_cpu_without_scan_non_pow2,111.741,111741.0 
+67108864,6,compact,compact_cpu_with_scan,201.624,201624.0 +67108864,6,compact,compact_work_efficient_pow2,10.2673,10267.300000000001 +67108864,6,compact,compact_work_efficient_non_pow2,10.3004,10300.4 +67108864,7,scan,scan_cpu_pow2,34.0245,34024.5 +67108864,7,scan,scan_cpu_non_pow2,32.2048,32204.8 +67108864,7,scan,scan_naive_pow2,52.9994,52999.4 +67108864,7,scan,scan_naive_non_pow2,52.2628,52262.799999999996 +67108864,7,scan,scan_work_efficient_pow2,5.212,5212.0 +67108864,7,scan,scan_work_efficient_non_pow2,5.33606,5336.0599999999995 +67108864,7,scan,scan_thrust_pow2,2.64192,2641.9199999999996 +67108864,7,scan,scan_thrust_non_pow2,2.65523,2655.23 +67108864,7,compact,compact_cpu_without_scan_pow2,111.849,111849.0 +67108864,7,compact,compact_cpu_without_scan_non_pow2,113.071,113071.0 +67108864,7,compact,compact_cpu_with_scan,201.059,201059.0 +67108864,7,compact,compact_work_efficient_pow2,10.4103,10410.3 +67108864,7,compact,compact_work_efficient_non_pow2,10.0895,10089.5 +67108864,8,scan,scan_cpu_pow2,31.8586,31858.6 +67108864,8,scan,scan_cpu_non_pow2,34.716,34716.0 +67108864,8,scan,scan_naive_pow2,52.758,52758.0 +67108864,8,scan,scan_naive_non_pow2,52.8854,52885.399999999994 +67108864,8,scan,scan_work_efficient_pow2,5.27904,5279.04 +67108864,8,scan,scan_work_efficient_non_pow2,4.89584,4895.84 +67108864,8,scan,scan_thrust_pow2,2.5856,2585.6 +67108864,8,scan,scan_thrust_non_pow2,2.69722,2697.2200000000003 +67108864,8,compact,compact_cpu_without_scan_pow2,112.197,112197.0 +67108864,8,compact,compact_cpu_without_scan_non_pow2,112.028,112028.0 +67108864,8,compact,compact_cpu_with_scan,205.128,205128.0 +67108864,8,compact,compact_work_efficient_pow2,10.6945,10694.5 +67108864,8,compact,compact_work_efficient_non_pow2,10.1253,10125.3 +67108864,9,scan,scan_cpu_pow2,33.727,33727.0 +67108864,9,scan,scan_cpu_non_pow2,32.7404,32740.4 +67108864,9,scan,scan_naive_pow2,52.5358,52535.8 +67108864,9,scan,scan_naive_non_pow2,52.5147,52514.7 
+67108864,9,scan,scan_work_efficient_pow2,5.66061,5660.610000000001 +67108864,9,scan,scan_work_efficient_non_pow2,4.91315,4913.15 +67108864,9,scan,scan_thrust_pow2,2.87334,2873.3399999999997 +67108864,9,scan,scan_thrust_non_pow2,2.80371,2803.71 +67108864,9,compact,compact_cpu_without_scan_pow2,110.564,110564.0 +67108864,9,compact,compact_cpu_without_scan_non_pow2,111.981,111981.0 +67108864,9,compact,compact_cpu_with_scan,201.118,201118.0 +67108864,9,compact,compact_work_efficient_pow2,10.483,10483.0 +67108864,9,compact,compact_work_efficient_non_pow2,9.96454,9964.539999999999 +67108864,10,scan,scan_cpu_pow2,31.3847,31384.699999999997 +67108864,10,scan,scan_cpu_non_pow2,33.2231,33223.100000000006 +67108864,10,scan,scan_naive_pow2,52.6487,52648.7 +67108864,10,scan,scan_naive_non_pow2,53.3386,53338.6 +67108864,10,scan,scan_work_efficient_pow2,4.98733,4987.33 +67108864,10,scan,scan_work_efficient_non_pow2,4.96435,4964.349999999999 +67108864,10,scan,scan_thrust_pow2,2.58765,2587.65 +67108864,10,scan,scan_thrust_non_pow2,2.61939,2619.3900000000003 +67108864,10,compact,compact_cpu_without_scan_pow2,113.122,113122.0 +67108864,10,compact,compact_cpu_without_scan_non_pow2,113.051,113051.0 +67108864,10,compact,compact_cpu_with_scan,202.496,202496.0 +67108864,10,compact,compact_work_efficient_pow2,10.7417,10741.699999999999 +67108864,10,compact,compact_work_efficient_non_pow2,10.2738,10273.8 +134217728,1,scan,scan_cpu_pow2,65.7458,65745.8 +134217728,1,scan,scan_cpu_non_pow2,66.5541,66554.1 +134217728,1,scan,scan_naive_pow2,109.714,109714.0 +134217728,1,scan,scan_naive_non_pow2,108.931,108931.0 +134217728,1,scan,scan_work_efficient_pow2,9.90403,9904.03 +134217728,1,scan,scan_work_efficient_non_pow2,9.3952,9395.2 +134217728,1,scan,scan_thrust_pow2,4.66637,4666.37 +134217728,1,scan,scan_thrust_non_pow2,4.61414,4614.14 +134217728,1,compact,compact_cpu_without_scan_pow2,224.717,224717.0 +134217728,1,compact,compact_cpu_without_scan_non_pow2,226.441,226441.0 
+134217728,1,compact,compact_cpu_with_scan,405.414,405414.0 +134217728,1,compact,compact_work_efficient_pow2,19.2476,19247.6 +134217728,1,compact,compact_work_efficient_non_pow2,18.6665,18666.5 +134217728,2,scan,scan_cpu_pow2,71.1686,71168.59999999999 +134217728,2,scan,scan_cpu_non_pow2,70.3108,70310.8 +134217728,2,scan,scan_naive_pow2,110.751,110751.0 +134217728,2,scan,scan_naive_non_pow2,109.079,109079.0 +134217728,2,scan,scan_work_efficient_pow2,10.0701,10070.1 +134217728,2,scan,scan_work_efficient_non_pow2,9.49658,9496.58 +134217728,2,scan,scan_thrust_pow2,4.77798,4777.9800000000005 +134217728,2,scan,scan_thrust_non_pow2,4.64486,4644.860000000001 +134217728,2,compact,compact_cpu_without_scan_pow2,223.765,223765.0 +134217728,2,compact,compact_cpu_without_scan_non_pow2,223.462,223462.0 +134217728,2,compact,compact_cpu_with_scan,400.433,400433.0 +134217728,2,compact,compact_work_efficient_pow2,19.1076,19107.600000000002 +134217728,2,compact,compact_work_efficient_non_pow2,18.7228,18722.8 +134217728,3,scan,scan_cpu_pow2,63.701,63701.0 +134217728,3,scan,scan_cpu_non_pow2,64.7987,64798.7 +134217728,3,scan,scan_naive_pow2,110.326,110326.0 +134217728,3,scan,scan_naive_non_pow2,109.048,109048.0 +134217728,3,scan,scan_work_efficient_pow2,9.95459,9954.59 +134217728,3,scan,scan_work_efficient_non_pow2,9.56422,9564.220000000001 +134217728,3,scan,scan_thrust_pow2,4.65818,4658.179999999999 +134217728,3,scan,scan_thrust_non_pow2,4.76262,4762.62 +134217728,3,compact,compact_cpu_without_scan_pow2,222.645,222645.0 +134217728,3,compact,compact_cpu_without_scan_non_pow2,222.427,222427.0 +134217728,3,compact,compact_cpu_with_scan,400.019,400019.0 +134217728,3,compact,compact_work_efficient_pow2,20.2342,20234.2 +134217728,3,compact,compact_work_efficient_non_pow2,19.4038,19403.8 +134217728,4,scan,scan_cpu_pow2,64.6291,64629.09999999999 +134217728,4,scan,scan_cpu_non_pow2,72.2041,72204.09999999999 +134217728,4,scan,scan_naive_pow2,110.321,110321.0 
+134217728,4,scan,scan_naive_non_pow2,108.97,108970.0 +134217728,4,scan,scan_work_efficient_pow2,9.69693,9696.93 +134217728,4,scan,scan_work_efficient_non_pow2,9.44435,9444.35 +134217728,4,scan,scan_thrust_pow2,4.67354,4673.54 +134217728,4,scan,scan_thrust_non_pow2,5.21523,5215.2300000000005 +134217728,4,compact,compact_cpu_without_scan_pow2,225.243,225243.0 +134217728,4,compact,compact_cpu_without_scan_non_pow2,223.965,223965.0 +134217728,4,compact,compact_cpu_with_scan,407.214,407214.0 +134217728,4,compact,compact_work_efficient_pow2,20.1375,20137.5 +134217728,4,compact,compact_work_efficient_non_pow2,19.543,19543.0 +134217728,5,scan,scan_cpu_pow2,66.9426,66942.6 +134217728,5,scan,scan_cpu_non_pow2,67.8383,67838.3 +134217728,5,scan,scan_naive_pow2,110.252,110252.0 +134217728,5,scan,scan_naive_non_pow2,108.943,108943.0 +134217728,5,scan,scan_work_efficient_pow2,10.3463,10346.3 +134217728,5,scan,scan_work_efficient_non_pow2,9.47405,9474.05 +134217728,5,scan,scan_thrust_pow2,4.67354,4673.54 +134217728,5,scan,scan_thrust_non_pow2,4.864,4864.0 +134217728,5,compact,compact_cpu_without_scan_pow2,225.264,225264.0 +134217728,5,compact,compact_cpu_without_scan_non_pow2,222.31,222310.0 +134217728,5,compact,compact_cpu_with_scan,405.457,405457.0 +134217728,5,compact,compact_work_efficient_pow2,21.0745,21074.5 +134217728,5,compact,compact_work_efficient_non_pow2,18.9583,18958.300000000003 +134217728,6,scan,scan_cpu_pow2,71.2926,71292.59999999999 +134217728,6,scan,scan_cpu_non_pow2,64.0346,64034.6 +134217728,6,scan,scan_naive_pow2,110.549,110549.0 +134217728,6,scan,scan_naive_non_pow2,109.242,109242.0 +134217728,6,scan,scan_work_efficient_pow2,10.0368,10036.8 +134217728,6,scan,scan_work_efficient_non_pow2,9.54163,9541.63 +134217728,6,scan,scan_thrust_pow2,4.64896,4648.96 +134217728,6,scan,scan_thrust_non_pow2,4.71654,4716.54 +134217728,6,compact,compact_cpu_without_scan_pow2,223.021,223021.0 +134217728,6,compact,compact_cpu_without_scan_non_pow2,220.288,220288.0 
+134217728,6,compact,compact_cpu_with_scan,403.56,403560.0 +134217728,6,compact,compact_work_efficient_pow2,20.3589,20358.899999999998 +134217728,6,compact,compact_work_efficient_non_pow2,19.2143,19214.300000000003 +134217728,7,scan,scan_cpu_pow2,64.5018,64501.8 +134217728,7,scan,scan_cpu_non_pow2,69.2349,69234.9 +134217728,7,scan,scan_naive_pow2,110.357,110357.0 +134217728,7,scan,scan_naive_non_pow2,108.873,108873.0 +134217728,7,scan,scan_work_efficient_pow2,9.96083,9960.83 +134217728,7,scan,scan_work_efficient_non_pow2,9.37677,9376.77 +134217728,7,scan,scan_thrust_pow2,4.67251,4672.51 +134217728,7,scan,scan_thrust_non_pow2,4.66739,4667.39 +134217728,7,compact,compact_cpu_without_scan_pow2,227.221,227221.0 +134217728,7,compact,compact_cpu_without_scan_non_pow2,223.779,223779.0 +134217728,7,compact,compact_cpu_with_scan,405.394,405394.0 +134217728,7,compact,compact_work_efficient_pow2,20.2519,20251.899999999998 +134217728,7,compact,compact_work_efficient_non_pow2,18.8559,18855.899999999998 +134217728,8,scan,scan_cpu_pow2,66.2381,66238.1 +134217728,8,scan,scan_cpu_non_pow2,65.3097,65309.700000000004 +134217728,8,scan,scan_naive_pow2,110.37,110370.0 +134217728,8,scan,scan_naive_non_pow2,109.331,109331.0 +134217728,8,scan,scan_work_efficient_pow2,9.97939,9979.390000000001 +134217728,8,scan,scan_work_efficient_non_pow2,9.4935,9493.5 +134217728,8,scan,scan_thrust_pow2,4.67456,4674.5599999999995 +134217728,8,scan,scan_thrust_non_pow2,4.76262,4762.62 +134217728,8,compact,compact_cpu_without_scan_pow2,224.317,224317.0 +134217728,8,compact,compact_cpu_without_scan_non_pow2,228.967,228967.0 +134217728,8,compact,compact_cpu_with_scan,405.428,405428.0 +134217728,8,compact,compact_work_efficient_pow2,19.0925,19092.5 +134217728,8,compact,compact_work_efficient_non_pow2,19.115,19115.0 +134217728,9,scan,scan_cpu_pow2,64.5025,64502.5 +134217728,9,scan,scan_cpu_non_pow2,65.5607,65560.7 +134217728,9,scan,scan_naive_pow2,110.815,110815.0 
+134217728,9,scan,scan_naive_non_pow2,108.996,108996.0 +134217728,9,scan,scan_work_efficient_pow2,9.51082,9510.820000000002 +134217728,9,scan,scan_work_efficient_non_pow2,8.98867,8988.67 +134217728,9,scan,scan_thrust_pow2,4.6848,4684.8 +134217728,9,scan,scan_thrust_non_pow2,4.91622,4916.22 +134217728,9,compact,compact_cpu_without_scan_pow2,223.964,223964.0 +134217728,9,compact,compact_cpu_without_scan_non_pow2,222.718,222718.0 +134217728,9,compact,compact_cpu_with_scan,406.811,406811.0 +134217728,9,compact,compact_work_efficient_pow2,19.3702,19370.2 +134217728,9,compact,compact_work_efficient_non_pow2,18.9348,18934.8 +134217728,10,scan,scan_cpu_pow2,70.6062,70606.2 +134217728,10,scan,scan_cpu_non_pow2,70.563,70563.0 +134217728,10,scan,scan_naive_pow2,110.376,110376.0 +134217728,10,scan,scan_naive_non_pow2,109.062,109062.0 +134217728,10,scan,scan_work_efficient_pow2,10.1392,10139.2 +134217728,10,scan,scan_work_efficient_non_pow2,9.3911,9391.1 +134217728,10,scan,scan_thrust_pow2,5.01862,5018.62 +134217728,10,scan,scan_thrust_non_pow2,4.74931,4749.31 +134217728,10,compact,compact_cpu_without_scan_pow2,227.643,227643.0 +134217728,10,compact,compact_cpu_without_scan_non_pow2,226.933,226933.0 +134217728,10,compact,compact_cpu_with_scan,402.117,402117.0 +134217728,10,compact,compact_work_efficient_pow2,20.1753,20175.3 +134217728,10,compact,compact_work_efficient_non_pow2,19.0259,19025.9 +268435456,1,scan,scan_cpu_pow2,132.373,132373.0 +268435456,1,scan,scan_cpu_non_pow2,133.35,133350.0 +268435456,1,scan,scan_naive_pow2,228.608,228608.0 +268435456,1,scan,scan_naive_non_pow2,227.778,227778.0 +268435456,1,scan,scan_work_efficient_pow2,16.8142,16814.2 +268435456,1,scan,scan_work_efficient_non_pow2,16.8602,16860.2 +268435456,1,scan,scan_thrust_pow2,8.58112,8581.12 +268435456,1,scan,scan_thrust_non_pow2,8.54323,8543.23 +268435456,1,compact,compact_cpu_without_scan_pow2,451.766,451766.0 +268435456,1,compact,compact_cpu_without_scan_non_pow2,456.493,456493.0 
+268435456,1,compact,compact_cpu_with_scan,804.737,804737.0 +268435456,1,compact,compact_work_efficient_pow2,39.1622,39162.2 +268435456,1,compact,compact_work_efficient_non_pow2,36.9736,36973.6 +268435456,2,scan,scan_cpu_pow2,130.853,130853.00000000001 +268435456,2,scan,scan_cpu_non_pow2,132.751,132751.0 +268435456,2,scan,scan_naive_pow2,230.517,230517.0 +268435456,2,scan,scan_naive_non_pow2,227.635,227635.0 +268435456,2,scan,scan_work_efficient_pow2,17.5688,17568.8 +268435456,2,scan,scan_work_efficient_non_pow2,17.0496,17049.600000000002 +268435456,2,scan,scan_thrust_pow2,8.2135,8213.5 +268435456,2,scan,scan_thrust_non_pow2,9.22624,9226.24 +268435456,2,compact,compact_cpu_without_scan_pow2,450.667,450667.0 +268435456,2,compact,compact_cpu_without_scan_non_pow2,452.019,452019.0 +268435456,2,compact,compact_cpu_with_scan,842.232,842232.0 +268435456,2,compact,compact_work_efficient_pow2,37.3482,37348.2 +268435456,2,compact,compact_work_efficient_non_pow2,37.1026,37102.600000000006 +268435456,3,scan,scan_cpu_pow2,132.292,132292.0 +268435456,3,scan,scan_cpu_non_pow2,138.979,138979.0 +268435456,3,scan,scan_naive_pow2,228.483,228483.0 +268435456,3,scan,scan_naive_non_pow2,227.452,227452.0 +268435456,3,scan,scan_work_efficient_pow2,17.3796,17379.6 +268435456,3,scan,scan_work_efficient_non_pow2,16.34,16340.0 +268435456,3,scan,scan_thrust_pow2,8.576,8576.0 +268435456,3,scan,scan_thrust_non_pow2,8.63744,8637.44 +268435456,3,compact,compact_cpu_without_scan_pow2,448.704,448704.0 +268435456,3,compact,compact_cpu_without_scan_non_pow2,450.355,450355.0 +268435456,3,compact,compact_cpu_with_scan,803.179,803179.0 +268435456,3,compact,compact_work_efficient_pow2,38.7403,38740.299999999996 +268435456,3,compact,compact_work_efficient_non_pow2,36.6735,36673.5 +268435456,4,scan,scan_cpu_pow2,132.747,132747.0 +268435456,4,scan,scan_cpu_non_pow2,128.708,128708.0 +268435456,4,scan,scan_naive_pow2,228.25,228250.0 +268435456,4,scan,scan_naive_non_pow2,227.803,227803.0 
+268435456,4,scan,scan_work_efficient_pow2,17.5412,17541.2 +268435456,4,scan,scan_work_efficient_non_pow2,16.6175,16617.5 +268435456,4,scan,scan_thrust_pow2,8.52582,8525.82 +268435456,4,scan,scan_thrust_non_pow2,8.62413,8624.13 +268435456,4,compact,compact_cpu_without_scan_pow2,451.018,451018.0 +268435456,4,compact,compact_cpu_without_scan_non_pow2,450.695,450695.0 +268435456,4,compact,compact_cpu_with_scan,819.745,819745.0 +268435456,4,compact,compact_work_efficient_pow2,37.7899,37789.9 +268435456,4,compact,compact_work_efficient_non_pow2,37.3105,37310.5 +268435456,5,scan,scan_cpu_pow2,143.426,143426.0 +268435456,5,scan,scan_cpu_non_pow2,153.884,153884.0 +268435456,5,scan,scan_naive_pow2,228.478,228478.0 +268435456,5,scan,scan_naive_non_pow2,227.438,227438.0 +268435456,5,scan,scan_work_efficient_pow2,17.1444,17144.4 +268435456,5,scan,scan_work_efficient_non_pow2,16.9697,16969.7 +268435456,5,scan,scan_thrust_pow2,8.54426,8544.26 +268435456,5,scan,scan_thrust_non_pow2,8.5975,8597.5 +268435456,5,compact,compact_cpu_without_scan_pow2,453.483,453483.0 +268435456,5,compact,compact_cpu_without_scan_non_pow2,446.99,446990.0 +268435456,5,compact,compact_cpu_with_scan,801.387,801387.0 +268435456,5,compact,compact_work_efficient_pow2,39.5788,39578.8 +268435456,5,compact,compact_work_efficient_non_pow2,37.0872,37087.200000000004 +268435456,6,scan,scan_cpu_pow2,136.296,136296.0 +268435456,6,scan,scan_cpu_non_pow2,133.544,133544.0 +268435456,6,scan,scan_naive_pow2,228.354,228354.0 +268435456,6,scan,scan_naive_non_pow2,227.714,227714.0 +268435456,6,scan,scan_work_efficient_pow2,17.319,17319.0 +268435456,6,scan,scan_work_efficient_non_pow2,17.1448,17144.8 +268435456,6,scan,scan_thrust_pow2,8.91802,8918.02 +268435456,6,scan,scan_thrust_non_pow2,8.66714,8667.14 +268435456,6,compact,compact_cpu_without_scan_pow2,447.684,447684.0 +268435456,6,compact,compact_cpu_without_scan_non_pow2,454.203,454203.0 +268435456,6,compact,compact_cpu_with_scan,804.796,804796.0 
+268435456,6,compact,compact_work_efficient_pow2,39.0621,39062.1 +268435456,6,compact,compact_work_efficient_non_pow2,37.0831,37083.1 +268435456,7,scan,scan_cpu_pow2,139.047,139047.0 +268435456,7,scan,scan_cpu_non_pow2,134.467,134467.0 +268435456,7,scan,scan_naive_pow2,228.831,228831.0 +268435456,7,scan,scan_naive_non_pow2,227.896,227896.0 +268435456,7,scan,scan_work_efficient_pow2,17.4343,17434.3 +268435456,7,scan,scan_work_efficient_non_pow2,17.1387,17138.7 +268435456,7,scan,scan_thrust_pow2,8.64563,8645.630000000001 +268435456,7,scan,scan_thrust_non_pow2,8.76954,8769.539999999999 +268435456,7,compact,compact_cpu_without_scan_pow2,464.444,464444.0 +268435456,7,compact,compact_cpu_without_scan_non_pow2,453.871,453871.0 +268435456,7,compact,compact_cpu_with_scan,815.938,815938.0 +268435456,7,compact,compact_work_efficient_pow2,39.8693,39869.3 +268435456,7,compact,compact_work_efficient_non_pow2,36.8323,36832.299999999996 +268435456,8,scan,scan_cpu_pow2,128.088,128088.0 +268435456,8,scan,scan_cpu_non_pow2,133.992,133992.0 +268435456,8,scan,scan_naive_pow2,228.497,228497.0 +268435456,8,scan,scan_naive_non_pow2,227.595,227595.0 +268435456,8,scan,scan_work_efficient_pow2,17.2821,17282.1 +268435456,8,scan,scan_work_efficient_non_pow2,16.6349,16634.899999999998 +268435456,8,scan,scan_thrust_pow2,8.35482,8354.82 +268435456,8,scan,scan_thrust_non_pow2,8.6057,8605.7 +268435456,8,compact,compact_cpu_without_scan_pow2,449.984,449984.0 +268435456,8,compact,compact_cpu_without_scan_non_pow2,456.349,456349.0 +268435456,8,compact,compact_cpu_with_scan,831.611,831611.0 +268435456,8,compact,compact_work_efficient_pow2,38.8659,38865.9 +268435456,8,compact,compact_work_efficient_non_pow2,36.6664,36666.4 +268435456,9,scan,scan_cpu_pow2,137.352,137352.0 +268435456,9,scan,scan_cpu_non_pow2,133.549,133549.0 +268435456,9,scan,scan_naive_pow2,228.095,228095.0 +268435456,9,scan,scan_naive_non_pow2,227.808,227808.0 +268435456,9,scan,scan_work_efficient_pow2,17.6704,17670.4 
+268435456,9,scan,scan_work_efficient_non_pow2,17.0086,17008.600000000002 +268435456,9,scan,scan_thrust_pow2,8.76442,8764.42 +268435456,9,scan,scan_thrust_non_pow2,9.07674,9076.74 +268435456,9,compact,compact_cpu_without_scan_pow2,448.816,448816.0 +268435456,9,compact,compact_cpu_without_scan_non_pow2,458.047,458047.0 +268435456,9,compact,compact_cpu_with_scan,834.451,834451.0 +268435456,9,compact,compact_work_efficient_pow2,39.5145,39514.5 +268435456,9,compact,compact_work_efficient_non_pow2,36.1943,36194.299999999996 +268435456,10,scan,scan_cpu_pow2,147.094,147094.0 +268435456,10,scan,scan_cpu_non_pow2,134.541,134541.0 +268435456,10,scan,scan_naive_pow2,228.095,228095.0 +268435456,10,scan,scan_naive_non_pow2,227.397,227397.0 +268435456,10,scan,scan_work_efficient_pow2,17.2193,17219.3 +268435456,10,scan,scan_work_efficient_non_pow2,17.1213,17121.300000000003 +268435456,10,scan,scan_thrust_pow2,8.28723,8287.23 +268435456,10,scan,scan_thrust_non_pow2,8.47053,8470.53 +268435456,10,compact,compact_cpu_without_scan_pow2,452.239,452239.0 +268435456,10,compact,compact_cpu_without_scan_non_pow2,452.946,452946.0 +268435456,10,compact,compact_cpu_with_scan,821.22,821220.0 +268435456,10,compact,compact_work_efficient_pow2,36.6826,36682.6 +268435456,10,compact,compact_work_efficient_non_pow2,36.8712,36871.200000000004 diff --git a/plots/data/timings_avg.csv b/plots/data/timings_avg.csv new file mode 100644 index 00000000..c0e32c4a --- /dev/null +++ b/plots/data/timings_avg.csv @@ -0,0 +1,274 @@ +size,suite,method,time_s,time_ms +256,compact,compact_cpu_with_scan,0.00083,0.83 +256,compact,compact_cpu_without_scan_non_pow2,0.0004,0.4 +256,compact,compact_cpu_without_scan_pow2,0.0007099999999999999,0.71 +256,compact,compact_work_efficient_non_pow2,0.06543360000000001,65.43360000000001 +256,compact,compact_work_efficient_pow2,0.1494112,149.4112 +256,scan,scan_cpu_non_pow2,0.0002,0.2 +256,scan,scan_cpu_pow2,0.00049,0.49 +256,scan,scan_naive_non_pow2,0.1725504,172.5504 
+256,scan,scan_naive_pow2,0.2866176,286.6176 +256,scan,scan_thrust_non_pow2,0.0502784,50.2784 +256,scan,scan_thrust_pow2,0.12134400000000001,121.34400000000001 +256,scan,scan_work_efficient_non_pow2,0.0195584,19.5584 +256,scan,scan_work_efficient_pow2,0.0817152,81.7152 +512,compact,compact_cpu_with_scan,0.00159,1.59 +512,compact,compact_cpu_without_scan_non_pow2,0.00058,0.58 +512,compact,compact_cpu_without_scan_pow2,0.0013,1.3 +512,compact,compact_work_efficient_non_pow2,0.07280640000000001,72.80640000000001 +512,compact,compact_work_efficient_pow2,0.1309696,130.96959999999999 +512,scan,scan_cpu_non_pow2,0.00031999999999999997,0.31999999999999995 +512,scan,scan_cpu_pow2,0.0007,0.7 +512,scan,scan_naive_non_pow2,0.2143232,214.32319999999999 +512,scan,scan_naive_pow2,0.3777538,377.75379999999996 +512,scan,scan_thrust_non_pow2,0.048041600000000004,48.0416 +512,scan,scan_thrust_pow2,0.1229824,122.98240000000001 +512,scan,scan_work_efficient_non_pow2,0.020992,20.992 +512,scan,scan_work_efficient_pow2,0.08140800000000001,81.408 +1024,compact,compact_cpu_with_scan,0.00292,2.92 +1024,compact,compact_cpu_without_scan_non_pow2,0.0010999999999999998,1.0999999999999999 +1024,compact,compact_cpu_without_scan_pow2,0.00224,2.2399999999999998 +1024,compact,compact_work_efficient_non_pow2,0.109568,109.568 +1024,compact,compact_work_efficient_pow2,0.18309440000000002,183.0944 +1024,scan,scan_cpu_non_pow2,0.00052,0.5199999999999999 +1024,scan,scan_cpu_pow2,0.00101,1.01 +1024,scan,scan_naive_non_pow2,0.21790720000000002,217.90720000000002 +1024,scan,scan_naive_pow2,0.3908608,390.8608 +1024,scan,scan_thrust_non_pow2,0.0443424,44.3424 +1024,scan,scan_thrust_pow2,0.12461440000000001,124.61440000000002 +1024,scan,scan_work_efficient_non_pow2,0.0833568,83.35679999999999 +1024,scan,scan_work_efficient_pow2,0.174288,174.288 +2048,compact,compact_cpu_with_scan,0.00693,6.930000000000001 +2048,compact,compact_cpu_without_scan_non_pow2,0.00229,2.29 
+2048,compact,compact_cpu_without_scan_pow2,0.00439,4.39 +2048,compact,compact_work_efficient_non_pow2,0.1659904,165.99040000000002 +2048,compact,compact_work_efficient_pow2,0.18575360000000002,185.7536 +2048,scan,scan_cpu_non_pow2,0.00095,0.95 +2048,scan,scan_cpu_pow2,0.00134,1.34 +2048,scan,scan_naive_non_pow2,0.24821759999999998,248.21759999999998 +2048,scan,scan_naive_pow2,0.4564032,456.4032 +2048,scan,scan_thrust_non_pow2,0.0458752,45.8752 +2048,scan,scan_thrust_pow2,0.111616,111.61600000000001 +2048,scan,scan_work_efficient_non_pow2,0.1261632,126.1632 +2048,scan,scan_work_efficient_pow2,0.16057280000000002,160.57280000000003 +4096,compact,compact_cpu_with_scan,0.01351,13.51 +4096,compact,compact_cpu_without_scan_non_pow2,0.00469,4.6899999999999995 +4096,compact,compact_cpu_without_scan_pow2,0.0081,8.1 +4096,compact,compact_work_efficient_non_pow2,0.10516479999999999,105.16479999999999 +4096,compact,compact_work_efficient_pow2,0.1757248,175.7248 +4096,scan,scan_cpu_non_pow2,0.0019,1.9 +4096,scan,scan_cpu_pow2,0.00242,2.42 +4096,scan,scan_naive_non_pow2,0.2525184,252.51839999999999 +4096,scan,scan_naive_pow2,0.44247040000000004,442.47040000000004 +4096,scan,scan_thrust_non_pow2,0.053248000000000004,53.248000000000005 +4096,scan,scan_thrust_pow2,0.1155008,115.5008 +4096,scan,scan_work_efficient_non_pow2,0.07280640000000001,72.80640000000001 +4096,scan,scan_work_efficient_pow2,0.17541120000000002,175.4112 +8192,compact,compact_cpu_with_scan,0.02732,27.32 +8192,compact,compact_cpu_without_scan_non_pow2,0.010879999999999999,10.879999999999999 +8192,compact,compact_cpu_without_scan_pow2,0.01566,15.66 +8192,compact,compact_work_efficient_non_pow2,0.10751999999999999,107.52 +8192,compact,compact_work_efficient_pow2,0.22763840000000002,227.63840000000002 +8192,scan,scan_cpu_non_pow2,0.00408,4.08 +8192,scan,scan_cpu_pow2,0.00411,4.11 +8192,scan,scan_naive_non_pow2,0.2666496,266.64959999999996 +8192,scan,scan_naive_pow2,0.4294784,429.47839999999997 
+8192,scan,scan_thrust_non_pow2,0.049152,49.152 +8192,scan,scan_thrust_pow2,0.1251328,125.13279999999999 +8192,scan,scan_work_efficient_non_pow2,0.0684096,68.4096 +8192,scan,scan_work_efficient_pow2,0.1615936,161.5936 +16384,compact,compact_cpu_with_scan,0.07049,70.49 +16384,compact,compact_cpu_without_scan_non_pow2,0.03013,30.13 +16384,compact,compact_cpu_without_scan_pow2,0.033889999999999997,33.88999999999999 +16384,compact,compact_work_efficient_non_pow2,0.1214464,121.4464 +16384,compact,compact_work_efficient_pow2,0.1643648,164.3648 +16384,scan,scan_cpu_non_pow2,0.00753,7.53 +16384,scan,scan_cpu_pow2,0.00776,7.760000000000001 +16384,scan,scan_naive_non_pow2,0.2997376,299.7376 +16384,scan,scan_naive_pow2,0.5794813999999999,579.4813999999999 +16384,scan,scan_thrust_non_pow2,0.0594944,59.494400000000006 +16384,scan,scan_thrust_pow2,0.11581440000000001,115.8144 +16384,scan,scan_work_efficient_non_pow2,0.0646304,64.63040000000001 +16384,scan,scan_work_efficient_pow2,0.21995840000000003,219.95840000000004 +32768,compact,compact_cpu_with_scan,0.11356999999999999,113.57 +32768,compact,compact_cpu_without_scan_non_pow2,0.05895,58.95 +32768,compact,compact_cpu_without_scan_pow2,0.06326,63.26 +32768,compact,compact_work_efficient_non_pow2,0.1037312,103.7312 +32768,compact,compact_work_efficient_pow2,0.22483840000000002,224.83840000000004 +32768,scan,scan_cpu_non_pow2,0.01426,14.26 +32768,scan,scan_cpu_pow2,0.01594,15.94 +32768,scan,scan_naive_non_pow2,0.3867488,386.7488 +32768,scan,scan_naive_pow2,0.453104,453.104 +32768,scan,scan_thrust_non_pow2,0.0465856,46.5856 +32768,scan,scan_thrust_pow2,0.1269728,126.97279999999999 +32768,scan,scan_work_efficient_non_pow2,0.0666656,66.66560000000001 +32768,scan,scan_work_efficient_pow2,0.199824,199.824 +65536,compact,compact_cpu_with_scan,0.21562,215.62 +65536,compact,compact_cpu_without_scan_non_pow2,0.1147,114.7 +65536,compact,compact_cpu_without_scan_pow2,0.1187,118.7 
+65536,compact,compact_work_efficient_non_pow2,0.1129472,112.9472 +65536,compact,compact_work_efficient_pow2,0.2936448,293.6448 +65536,scan,scan_cpu_non_pow2,0.02952,29.52 +65536,scan,scan_cpu_pow2,0.032440000000000004,32.440000000000005 +65536,scan,scan_naive_non_pow2,0.3489888,348.98879999999997 +65536,scan,scan_naive_pow2,0.553987,553.987 +65536,scan,scan_thrust_non_pow2,0.057856,57.855999999999995 +65536,scan,scan_thrust_pow2,0.1200128,120.0128 +65536,scan,scan_work_efficient_non_pow2,0.056735999999999995,56.736 +65536,scan,scan_work_efficient_pow2,0.2145184,214.51839999999999 +131072,compact,compact_cpu_with_scan,0.39168,391.67999999999995 +131072,compact,compact_cpu_without_scan_non_pow2,0.22286,222.86 +131072,compact,compact_cpu_without_scan_pow2,0.22954,229.54 +131072,compact,compact_work_efficient_non_pow2,0.12390399999999999,123.90399999999998 +131072,compact,compact_work_efficient_pow2,0.250128,250.12800000000001 +131072,scan,scan_cpu_non_pow2,0.056569999999999995,56.56999999999999 +131072,scan,scan_cpu_pow2,0.06423999999999999,64.24 +131072,scan,scan_naive_non_pow2,0.3854048,385.40479999999997 +131072,scan,scan_naive_pow2,0.6710304,671.0304 +131072,scan,scan_thrust_non_pow2,0.050569600000000006,50.56960000000001 +131072,scan,scan_thrust_pow2,0.12707839999999998,127.07839999999997 +131072,scan,scan_work_efficient_non_pow2,0.0642176,64.2176 +131072,scan,scan_work_efficient_pow2,0.197872,197.87199999999999 +262144,compact,compact_cpu_with_scan,0.76507,765.07 +262144,compact,compact_cpu_without_scan_non_pow2,0.44792,447.91999999999996 +262144,compact,compact_cpu_without_scan_pow2,0.45067,450.67 +262144,compact,compact_work_efficient_non_pow2,0.10731519999999999,107.31519999999999 +262144,compact,compact_work_efficient_pow2,0.2974272,297.4272 +262144,scan,scan_cpu_non_pow2,0.11900999999999999,119.00999999999999 +262144,scan,scan_cpu_pow2,0.13381,133.81 +262144,scan,scan_naive_non_pow2,0.4949986,494.9986 
+262144,scan,scan_naive_pow2,0.5859932000000001,585.9932000000001 +262144,scan,scan_thrust_non_pow2,0.5357568,535.7568 +262144,scan,scan_thrust_pow2,0.5522432,552.2432 +262144,scan,scan_work_efficient_non_pow2,0.070656,70.65599999999999 +262144,scan,scan_work_efficient_pow2,0.20912000000000003,209.12000000000003 +524288,compact,compact_cpu_with_scan,1.517,1517.0 +524288,compact,compact_cpu_without_scan_non_pow2,0.8848699999999999,884.8699999999999 +524288,compact,compact_cpu_without_scan_pow2,0.89369,893.6899999999999 +524288,compact,compact_work_efficient_non_pow2,0.1691648,169.1648 +524288,compact,compact_work_efficient_pow2,0.384144,384.144 +524288,scan,scan_cpu_non_pow2,0.56081,560.8100000000001 +524288,scan,scan_cpu_pow2,0.27379000000000003,273.79 +524288,scan,scan_naive_non_pow2,0.7498272,749.8272000000001 +524288,scan,scan_naive_pow2,0.9882334,988.2334000000001 +524288,scan,scan_thrust_non_pow2,0.5451775999999999,545.1776 +524288,scan,scan_thrust_pow2,0.628941,628.9409999999999 +524288,scan,scan_work_efficient_non_pow2,0.0978944,97.8944 +524288,scan,scan_work_efficient_pow2,0.38016,380.16 +1048576,compact,compact_cpu_with_scan,2.9953600000000002,2995.36 +1048576,compact,compact_cpu_without_scan_non_pow2,1.75421,1754.21 +1048576,compact,compact_cpu_without_scan_pow2,1.7723099999999998,1772.3099999999997 +1048576,compact,compact_work_efficient_non_pow2,0.3465216,346.5216 +1048576,compact,compact_work_efficient_pow2,0.6288062,628.8062 +1048576,scan,scan_cpu_non_pow2,0.48040000000000005,480.40000000000003 +1048576,scan,scan_cpu_pow2,0.52852,528.52 +1048576,scan,scan_naive_non_pow2,1.0591262000000001,1059.1262000000002 +1048576,scan,scan_naive_pow2,1.0754928,1075.4928 +1048576,scan,scan_thrust_non_pow2,0.6916092,691.6092 +1048576,scan,scan_thrust_pow2,0.6210556,621.0556 +1048576,scan,scan_work_efficient_non_pow2,0.1380352,138.0352 +1048576,scan,scan_work_efficient_pow2,0.3988288,398.8288 +2097152,compact,compact_cpu_with_scan,6.27128,6271.28 
+2097152,compact,compact_cpu_without_scan_non_pow2,3.49718,3497.1800000000003 +2097152,compact,compact_cpu_without_scan_pow2,3.52559,3525.5899999999997 +2097152,compact,compact_work_efficient_non_pow2,1.0557446000000001,1055.7446000000002 +2097152,compact,compact_work_efficient_pow2,1.391088,1391.0880000000002 +2097152,scan,scan_cpu_non_pow2,1.04164,1041.6399999999999 +2097152,scan,scan_cpu_pow2,1.11278,1112.7800000000002 +2097152,scan,scan_naive_non_pow2,1.7997699999999999,1799.77 +2097152,scan,scan_naive_pow2,1.9385439999999998,1938.5439999999999 +2097152,scan,scan_thrust_non_pow2,0.6813662,681.3662 +2097152,scan,scan_thrust_pow2,0.5653471999999999,565.3471999999999 +2097152,scan,scan_work_efficient_non_pow2,0.547536,547.5360000000001 +2097152,scan,scan_work_efficient_pow2,0.8330716,833.0716 +4194304,compact,compact_cpu_with_scan,12.955610000000002,12955.610000000002 +4194304,compact,compact_cpu_without_scan_non_pow2,7.05673,7056.73 +4194304,compact,compact_cpu_without_scan_pow2,7.1262799999999995,7126.28 +4194304,compact,compact_work_efficient_non_pow2,1.500159,1500.159 +4194304,compact,compact_work_efficient_pow2,1.689376,1689.376 +4194304,scan,scan_cpu_non_pow2,2.13388,2133.88 +4194304,scan,scan_cpu_pow2,2.1124300000000003,2112.4300000000003 +4194304,scan,scan_naive_non_pow2,3.310452,3310.452 +4194304,scan,scan_naive_pow2,3.460679,3460.6789999999996 +4194304,scan,scan_thrust_non_pow2,0.7572414000000001,757.2414000000001 +4194304,scan,scan_thrust_pow2,0.67072,670.72 +4194304,scan,scan_work_efficient_non_pow2,0.8794118000000001,879.4118000000001 +4194304,scan,scan_work_efficient_pow2,1.0505002,1050.5002000000002 +8388608,compact,compact_cpu_with_scan,25.08673,25086.73 +8388608,compact,compact_cpu_without_scan_non_pow2,13.936029999999999,13936.029999999999 +8388608,compact,compact_cpu_without_scan_pow2,14.08675,14086.75 +8388608,compact,compact_work_efficient_non_pow2,1.928093,1928.093 +8388608,compact,compact_work_efficient_pow2,2.245267,2245.2670000000003 
+8388608,scan,scan_cpu_non_pow2,4.20186,4201.86 +8388608,scan,scan_cpu_pow2,4.17982,4179.820000000001 +8388608,scan,scan_naive_non_pow2,6.392297,6392.2970000000005 +8388608,scan,scan_naive_pow2,6.790438,6790.438 +8388608,scan,scan_thrust_non_pow2,0.9666653999999999,966.6653999999999 +8388608,scan,scan_thrust_pow2,0.9020416000000001,902.0416000000001 +8388608,scan,scan_work_efficient_non_pow2,1.078068,1078.068 +8388608,scan,scan_work_efficient_pow2,1.628349,1628.3490000000002 +16777216,compact,compact_cpu_with_scan,50.72989,50729.89 +16777216,compact,compact_cpu_without_scan_non_pow2,28.03945,28039.449999999997 +16777216,compact,compact_cpu_without_scan_pow2,27.954590000000003,27954.590000000004 +16777216,compact,compact_work_efficient_non_pow2,3.0998609999999998,3099.861 +16777216,compact,compact_work_efficient_pow2,3.3914169999999997,3391.4169999999995 +16777216,scan,scan_cpu_non_pow2,8.372620000000001,8372.62 +16777216,scan,scan_cpu_pow2,8.63163,8631.63 +16777216,scan,scan_naive_non_pow2,12.43128,12431.279999999999 +16777216,scan,scan_naive_pow2,13.078190000000001,13078.19 +16777216,scan,scan_thrust_non_pow2,1.09251,1092.51 +16777216,scan,scan_thrust_pow2,1.114317,1114.317 +16777216,scan,scan_work_efficient_non_pow2,1.820059,1820.0590000000002 +16777216,scan,scan_work_efficient_pow2,1.969452,1969.452 +33554432,compact,compact_cpu_with_scan,101.74908,101749.08 +33554432,compact,compact_cpu_without_scan_non_pow2,56.018299999999996,56018.299999999996 +33554432,compact,compact_cpu_without_scan_pow2,56.37746,56377.46 +33554432,compact,compact_work_efficient_non_pow2,5.425766,5425.7660000000005 +33554432,compact,compact_work_efficient_pow2,5.784913,5784.9130000000005 +33554432,scan,scan_cpu_non_pow2,16.61139,16611.39 +33554432,scan,scan_cpu_pow2,16.65162,16651.620000000003 +33554432,scan,scan_naive_non_pow2,25.36685,25366.85 +33554432,scan,scan_naive_pow2,25.37528,25375.28 +33554432,scan,scan_thrust_non_pow2,1.653351,1653.351 
+33554432,scan,scan_thrust_pow2,1.636333,1636.333 +33554432,scan,scan_work_efficient_non_pow2,2.892186,2892.186 +33554432,scan,scan_work_efficient_pow2,3.168311,3168.311 +67108864,compact,compact_cpu_with_scan,201.4574,201457.4 +67108864,compact,compact_cpu_without_scan_non_pow2,111.7583,111758.3 +67108864,compact,compact_cpu_without_scan_pow2,112.5916,112591.6 +67108864,compact,compact_work_efficient_non_pow2,10.173964,10173.964 +67108864,compact,compact_work_efficient_pow2,10.46523,10465.23 +67108864,scan,scan_cpu_non_pow2,33.42099,33420.990000000005 +67108864,scan,scan_cpu_pow2,32.78295,32782.95 +67108864,scan,scan_naive_non_pow2,53.00927,53009.270000000004 +67108864,scan,scan_naive_pow2,52.79194,52791.939999999995 +67108864,scan,scan_thrust_non_pow2,2.763554,2763.554 +67108864,scan,scan_thrust_pow2,2.680618,2680.618 +67108864,scan,scan_work_efficient_non_pow2,4.97225,4972.25 +67108864,scan,scan_work_efficient_pow2,5.1633830000000005,5163.383000000001 +134217728,compact,compact_cpu_with_scan,404.18469999999996,404184.69999999995 +134217728,compact,compact_cpu_without_scan_non_pow2,224.129,224129.0 +134217728,compact,compact_cpu_without_scan_pow2,224.78000000000003,224780.00000000003 +134217728,compact,compact_work_efficient_non_pow2,19.04403,19044.03 +134217728,compact,compact_work_efficient_pow2,19.90502,19905.02 +134217728,scan,scan_cpu_non_pow2,67.64089,67640.89 +134217728,scan,scan_cpu_pow2,66.93283,66932.83 +134217728,scan,scan_naive_non_pow2,109.04749999999999,109047.49999999999 +134217728,scan,scan_naive_pow2,110.38309999999998,110383.09999999999 +134217728,scan,scan_thrust_non_pow2,4.7912930000000005,4791.293000000001 +134217728,scan,scan_thrust_pow2,4.714906,4714.906 +134217728,scan,scan_work_efficient_non_pow2,9.416607,9416.607 +134217728,scan,scan_work_efficient_pow2,9.959899,9959.899 +268435456,compact,compact_cpu_with_scan,817.9296,817929.6000000001 +268435456,compact,compact_cpu_without_scan_non_pow2,453.1968,453196.8 
+268435456,compact,compact_cpu_without_scan_pow2,451.88050000000004,451880.50000000006 +268435456,compact,compact_work_efficient_non_pow2,36.87947,36879.47 +268435456,compact,compact_work_efficient_pow2,38.66138,38661.380000000005 +268435456,scan,scan_cpu_non_pow2,135.7765,135776.5 +268435456,scan,scan_cpu_pow2,135.9568,135956.8 +268435456,scan,scan_naive_non_pow2,227.6516,227651.6 +268435456,scan,scan_naive_pow2,228.6208,228620.8 +268435456,scan,scan_thrust_non_pow2,8.721819,8721.819 +268435456,scan,scan_thrust_pow2,8.541082,8541.081999999999 +268435456,scan,scan_work_efficient_non_pow2,16.88853,16888.53 +268435456,scan,scan_work_efficient_pow2,17.33733,17337.33 diff --git a/plots/radix_timings_log_linear_gt_2pow18.png b/plots/radix_timings_log_linear_gt_2pow18.png new file mode 100644 index 00000000..436a6a00 Binary files /dev/null and b/plots/radix_timings_log_linear_gt_2pow18.png differ diff --git a/plots/radix_timings_loglog_full.png b/plots/radix_timings_loglog_full.png new file mode 100644 index 00000000..9faa8650 Binary files /dev/null and b/plots/radix_timings_loglog_full.png differ diff --git a/plots/timings_plot_both.png b/plots/timings_plot_both.png new file mode 100644 index 00000000..d16fea4a Binary files /dev/null and b/plots/timings_plot_both.png differ diff --git a/plots/timings_plot_both_loglog.png b/plots/timings_plot_both_loglog.png new file mode 100644 index 00000000..a10302e8 Binary files /dev/null and b/plots/timings_plot_both_loglog.png differ diff --git a/plots/timings_plot_both_loglog_full.png b/plots/timings_plot_both_loglog_full.png new file mode 100644 index 00000000..ba244ef2 Binary files /dev/null and b/plots/timings_plot_both_loglog_full.png differ diff --git a/plots/timings_plot_incorrect.png b/plots/timings_plot_incorrect.png new file mode 100644 index 00000000..f600f150 Binary files /dev/null and b/plots/timings_plot_incorrect.png differ diff --git a/plots/timings_plot_nonpow2.png b/plots/timings_plot_nonpow2.png new file mode 
100644 index 00000000..da2f0719 Binary files /dev/null and b/plots/timings_plot_nonpow2.png differ diff --git a/plots/timings_plot_nonpow2_loglog.png b/plots/timings_plot_nonpow2_loglog.png new file mode 100644 index 00000000..e2d6ce83 Binary files /dev/null and b/plots/timings_plot_nonpow2_loglog.png differ diff --git a/plots/timings_plot_pow2.png b/plots/timings_plot_pow2.png new file mode 100644 index 00000000..32cccafe Binary files /dev/null and b/plots/timings_plot_pow2.png differ diff --git a/plots/timings_plot_pow2_loglog.png b/plots/timings_plot_pow2_loglog.png new file mode 100644 index 00000000..d8d5ae75 Binary files /dev/null and b/plots/timings_plot_pow2_loglog.png differ diff --git a/src/main.cpp b/src/main.cpp index 3d5c8820..8040f9cb 100644 --- a/src/main.cpp +++ b/src/main.cpp @@ -6,18 +6,25 @@ * @copyright University of Pennsylvania */ +#define SHOW 1 +int flush = 0; +#if SHOW + #include #include #include #include #include +#include #include "testing_helpers.hpp" +#include -const int SIZE = 1 << 8; // feel free to change the size of array +const int SIZE = 1 << 20; // feel free to change the size of array const int NPOT = SIZE - 3; // Non-Power-Of-Two int *a = new int[SIZE]; int *b = new int[SIZE]; int *c = new int[SIZE]; +int* d = new int[NPOT]; int main(int argc, char* argv[]) { // Scan tests @@ -137,18 +144,225 @@ int main(int argc, char* argv[]) { printDesc("work-efficient compact, power-of-two"); count = StreamCompaction::Efficient::compact(SIZE, c, a); printElapsedTime(StreamCompaction::Efficient::timer().getGpuElapsedTimeForPreviousOperation(), "(CUDA Measured)"); - //printArray(count, c, true); + printArray(count, c, true); printCmpLenResult(count, expectedCount, b, c); zeroArray(SIZE, c); printDesc("work-efficient compact, non-power-of-two"); count = StreamCompaction::Efficient::compact(NPOT, c, a); printElapsedTime(StreamCompaction::Efficient::timer().getGpuElapsedTimeForPreviousOperation(), "(CUDA Measured)"); - //printArray(count, c, 
true); + printArray(count, c, true); printCmpLenResult(count, expectedNPOT, b, c); + + + printf("\n"); + printf("**********************\n"); + printf("** RADIX SORT TESTS **\n"); + printf("**********************\n"); + + genArray(SIZE - 1, a, INT_MAX); // Leave a 0 at the end to test that edge case + a[SIZE - 1] = 0; + printArray(SIZE, a, true); + + // initialize b using StreamCompaction::CPU::sort you implement + // We use b for further comparison. Make sure your StreamCompaction::CPU::sort is correct. + // At first all cases passed because b && c are all zeroes. + zeroArray(SIZE, b); + printDesc("cpu sort, power-of-two"); + StreamCompaction::CPU::sort(SIZE, b, a); + printElapsedTime(StreamCompaction::CPU::timer().getCpuElapsedTimeForPreviousOperation(), "(std::chrono Measured)"); + printArray(SIZE, b, true); + + zeroArray(NPOT, d); + printDesc("cpu sort, non-power-of-two"); + StreamCompaction::CPU::sort(NPOT, d, a); + printElapsedTime(StreamCompaction::CPU::timer().getCpuElapsedTimeForPreviousOperation(), "(std::chrono Measured)"); + printArray(NPOT, d, true); + //printCmpResult(NPOT, b, c); + + zeroArray(SIZE, c); + printDesc("Radix Sort, power-of-two"); + StreamCompaction::RadixSort::sort(SIZE, c, a); + printElapsedTime(StreamCompaction::RadixSort::timer().getGpuElapsedTimeForPreviousOperation(), "(CUDA Measured)"); + printArray(SIZE, c, true); + printCmpResult(SIZE, b, c); + + + zeroArray(SIZE, c); + printDesc("Radix Sort, non-power-of-two"); + StreamCompaction::RadixSort::sort(NPOT, c, a); + printElapsedTime(StreamCompaction::RadixSort::timer().getGpuElapsedTimeForPreviousOperation(), "(CUDA Measured)"); + printArray(NPOT, c, true); + printCmpResult(NPOT, d, c); + system("pause"); // stop Win32 console from closing on exit delete[] a; delete[] b; delete[] c; + delete[] d; } + +#else + +#include +#include +#include +#include +#include +#include "testing_helpers.hpp" +#include +#include + +const int TWO_POW = 25; +//Testing +int main(int argc, char* argv[]) { +
if (flush == 1) { + FILE* f = freopen("blockSize_timings.txt", "a", stdout); + } + const int SIZE = 1 << TWO_POW; // feel free to change the size of array + const int NPOT = SIZE - 3; // Non-Power-Of-Two + int* a = new int[SIZE]; + int* b = new int[SIZE]; + int* c = new int[SIZE]; + // Scan tests + + //printf("\n"); + //printf("****************\n"); + printf("** SCAN TESTS **\n"); + //printf("****************\n"); + + std::cout << "SIZE= " << SIZE; + //std::cout << "blockSize= " << blockSize; + //for (int i = 1; i < 10; i++) { + printf("\n"); + //std::cout << "RUN#" << i << std::endl; + + genArray(SIZE - 1, a, 50); // Leave a 0 at the end to test that edge case + a[SIZE - 1] = 0; + //printArray(SIZE, a, true); + + // initialize b using StreamCompaction::CPU::scan you implement + // We use b for further comparison. Make sure your StreamCompaction::CPU::scan is correct. + // At first all cases passed because b && c are all zeroes. + zeroArray(SIZE, b); + //printDesc("cpu scan, power-of-two"); + StreamCompaction::CPU::scan(SIZE, b, a); + printElapsedTime(StreamCompaction::CPU::timer().getCpuElapsedTimeForPreviousOperation()); + //printArray(SIZE, b, true); + + zeroArray(SIZE, c); + //printDesc("cpu scan, non-power-of-two"); + StreamCompaction::CPU::scan(NPOT, c, a); + printElapsedTime(StreamCompaction::CPU::timer().getCpuElapsedTimeForPreviousOperation()); + //printArray(NPOT, c, true); + printCmpResult(NPOT, b, c); + + zeroArray(SIZE, c); + //printDesc("naive scan, power-of-two"); + StreamCompaction::Naive::scan(SIZE, c, a); + printElapsedTime(StreamCompaction::Naive::timer().getGpuElapsedTimeForPreviousOperation()); + //printArray(SIZE, c, true); + printCmpResult(SIZE, b, c); + + /* For bug-finding only: Array of 1s to help find bugs in stream compaction or scan + onesArray(SIZE, c); + printDesc("1s array for finding bugs"); + StreamCompaction::Naive::scan(SIZE, c, a); + printArray(SIZE, c, true); */ + + zeroArray(SIZE, c); + //printDesc("naive scan, 
non-power-of-two"); + StreamCompaction::Naive::scan(NPOT, c, a); + printElapsedTime(StreamCompaction::Naive::timer().getGpuElapsedTimeForPreviousOperation()); + //printArray(SIZE, c, true); + printCmpResult(NPOT, b, c); + + zeroArray(SIZE, c); + //printDesc("work-efficient scan, power-of-two"); + StreamCompaction::Efficient::scan(SIZE, c, a); + printElapsedTime(StreamCompaction::Efficient::timer().getGpuElapsedTimeForPreviousOperation()); + //printArray(SIZE, c, true); + printCmpResult(SIZE, b, c); + + zeroArray(SIZE, c); + //printDesc("work-efficient scan, non-power-of-two"); + StreamCompaction::Efficient::scan(NPOT, c, a); + printElapsedTime(StreamCompaction::Efficient::timer().getGpuElapsedTimeForPreviousOperation()); + //printArray(NPOT, c, true); + printCmpResult(NPOT, b, c); + + zeroArray(SIZE, c); + //printDesc("thrust scan, power-of-two"); + StreamCompaction::Thrust::scan(SIZE, c, a); + printElapsedTime(StreamCompaction::Thrust::timer().getGpuElapsedTimeForPreviousOperation()); + //printArray(SIZE, c, true); + printCmpResult(SIZE, b, c); + + zeroArray(SIZE, c); + //printDesc("thrust scan, non-power-of-two"); + StreamCompaction::Thrust::scan(NPOT, c, a); + printElapsedTime(StreamCompaction::Thrust::timer().getGpuElapsedTimeForPreviousOperation()); + //printArray(NPOT, c, true); + printCmpResult(NPOT, b, c); + + printf("\n"); + //printf("*****************************\n"); + //printf("** STREAM COMPACTION TESTS **\n"); + //printf("*****************************\n"); + + // Compaction tests + + genArray(SIZE - 1, a, 4); // Leave a 0 at the end to test that edge case + a[SIZE - 1] = 0; + //printArray(SIZE, a, true); + + int count, expectedCount, expectedNPOT; + + // initialize b using StreamCompaction::CPU::compactWithoutScan you implement + // We use b for further comparison. Make sure your StreamCompaction::CPU::compactWithoutScan is correct. 
+ zeroArray(SIZE, b); + //printDesc("cpu compact without scan, power-of-two"); + count = StreamCompaction::CPU::compactWithoutScan(SIZE, b, a); + printElapsedTime(StreamCompaction::CPU::timer().getCpuElapsedTimeForPreviousOperation()); + expectedCount = count; + //printArray(count, b, true); + printCmpLenResult(count, expectedCount, b, b); + + zeroArray(SIZE, c); + //printDesc("cpu compact without scan, non-power-of-two"); + count = StreamCompaction::CPU::compactWithoutScan(NPOT, c, a); + printElapsedTime(StreamCompaction::CPU::timer().getCpuElapsedTimeForPreviousOperation()); + expectedNPOT = count; + //printArray(count, c, true); + printCmpLenResult(count, expectedNPOT, b, c); + + zeroArray(SIZE, c); + //printDesc("cpu compact with scan"); + count = StreamCompaction::CPU::compactWithScan(SIZE, c, a); + printElapsedTime(StreamCompaction::CPU::timer().getCpuElapsedTimeForPreviousOperation()); + //printArray(count, c, true); + printCmpLenResult(count, expectedCount, b, c); + + zeroArray(SIZE, c); + //printDesc("work-efficient compact, power-of-two"); + count = StreamCompaction::Efficient::compact(SIZE, c, a); + printElapsedTime(StreamCompaction::Efficient::timer().getGpuElapsedTimeForPreviousOperation()); + //printArray(count, c, true); + //printCmpLenResult(count, expectedCount, b, c); + + zeroArray(SIZE, c); + //printDesc("work-efficient compact, non-power-of-two"); + count = StreamCompaction::Efficient::compact(NPOT, c, a); + printElapsedTime(StreamCompaction::Efficient::timer().getGpuElapsedTimeForPreviousOperation()); + //printArray(count, c, true); + printCmpLenResult(count, expectedNPOT, b, c); + + //std::this_thread::sleep_for(std::chrono::milliseconds(1000)); + delete[] a; + delete[] b; + delete[] c; + //system("pause"); // stop Win32 console from closing on exit +} + +#endif \ No newline at end of file diff --git a/src/testing_helpers.hpp b/src/testing_helpers.hpp index 025e94aa..097b562b 100644 --- a/src/testing_helpers.hpp +++ b/src/testing_helpers.hpp 
@@ -6,6 +6,8 @@ #include #include +int TESTING = 0; + template<typename T> int cmpArrays(int n, T *a, T *b) { for (int i = 0; i < n; i++) { @@ -23,8 +25,11 @@ void printDesc(const char *desc) { template<typename T> void printCmpResult(int n, T *a, T *b) { - printf(" %s \n", - cmpArrays(n, a, b) ? "FAIL VALUE" : "passed"); + const char* ans = cmpArrays(n, a, b) ? "FAIL VALUE" : "passed"; + if (!TESTING || std::string(ans) != "passed") { // compare contents, not pointers + printf("%s \n", + ans); + } } template<typename T> @@ -32,9 +37,12 @@ void printCmpLenResult(int n, int expN, T *a, T *b) { if (n != expN) { printf(" expected %d elements, got %d\n", expN, n); } - printf(" %s \n", - (n == -1 || n != expN) ? "FAIL COUNT" : - cmpArrays(n, a, b) ? "FAIL VALUE" : "passed"); + const char* ans = (n == -1 || n != expN) ? "FAIL COUNT" : + cmpArrays(n, a, b) ? "FAIL VALUE" : "passed"; + if (!TESTING || std::string(ans) != "passed") { // compare contents, not pointers + printf("%s \n", + ans); + } } void zeroArray(int n, int *a) { @@ -72,5 +80,10 @@ void printArray(int n, int *a, bool abridged = false) { template<typename T> void printElapsedTime(T time, std::string note = "") { - std::cout << " elapsed time: " << time << "ms " << note << std::endl; -} + if (!TESTING) { + std::cout << " elapsed time: " << time << "ms " << note << std::endl; + } + else { + std::cout << time << std::endl; + } +} \ No newline at end of file diff --git a/stream_compaction/CMakeLists.txt b/stream_compaction/CMakeLists.txt index 19511caa..eababdbe 100644 --- a/stream_compaction/CMakeLists.txt +++ b/stream_compaction/CMakeLists.txt @@ -4,6 +4,7 @@ set(headers "naive.h" "efficient.h" "thrust.h" + "radix_sort.h" ) set(sources @@ -12,6 +13,7 @@ set(sources "naive.cu" "efficient.cu" "thrust.cu" + "radix_sort.cu" ) list(SORT headers) @@ -19,10 +21,11 @@ list(SORT sources) source_group(Headers FILES ${headers}) source_group(Sources FILES ${sources}) - +find_package(CCCL REQUIRED) add_library(stream_compaction ${sources} ${headers}) +target_link_libraries(stream_compaction CCCL::Thrust) if(CMAKE_VERSION VERSION_LESS "3.23.0") -
set_target_properties(stream_compaction} PROPERTIES CUDA_ARCHITECTURES OFF) + set_target_properties(stream_compaction PROPERTIES CUDA_ARCHITECTURES OFF) elseif(CMAKE_VERSION VERSION_LESS "3.24.0") set_target_properties(stream_compaction PROPERTIES CUDA_ARCHITECTURES all-major) else() diff --git a/stream_compaction/common.cu b/stream_compaction/common.cu index 2ed6d630..ec533aef 100644 --- a/stream_compaction/common.cu +++ b/stream_compaction/common.cu @@ -23,7 +23,11 @@ namespace StreamCompaction { * which map to 0 will be removed, and elements which map to 1 will be kept. */ __global__ void kernMapToBoolean(int n, int *bools, const int *idata) { - // TODO + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index >= n) { + return; + } + bools[index] = (idata[index] != 0); } /** @@ -32,8 +36,14 @@ namespace StreamCompaction { */ __global__ void kernScatter(int n, int *odata, const int *idata, const int *bools, const int *indices) { - // TODO + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index >= n) { + return; + } + if (bools[index] == 1) { + odata[indices[index]] = idata[index]; + } } } -} +} \ No newline at end of file diff --git a/stream_compaction/common.h b/stream_compaction/common.h index d2c1fed9..297bb4d5 100644 --- a/stream_compaction/common.h +++ b/stream_compaction/common.h @@ -12,6 +12,7 @@ #define FILENAME (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__) #define checkCUDAError(msg) checkCUDAErrorFn(msg, FILENAME, __LINE__) +#define blockSize 256 /** * Check for CUDA errors; print and exit if there was a problem. diff --git a/stream_compaction/cpu.cu b/stream_compaction/cpu.cu index 719fa115..97da8dc5 100644 --- a/stream_compaction/cpu.cu +++ b/stream_compaction/cpu.cu @@ -1,7 +1,8 @@ #include #include "cpu.h" - #include "common.h" +#include +#include namespace StreamCompaction { namespace CPU { @@ -17,9 +18,16 @@ namespace StreamCompaction { * For performance analysis, this is supposed to be a simple for loop. 
* (Optional) For better understanding before starting moving to GPU, you can simulate your GPU scan in this function first. */ - void scan(int n, int *odata, const int *idata) { + void scan(int n, int* odata, const int* idata) { timer().startCpuTimer(); - // TODO + if (n == 0) { + timer().endCpuTimer(); + return; + } + odata[0] = 0; + for (int k = 1; k < n; k++) { + odata[k] = odata[k - 1] + idata[k - 1]; + } timer().endCpuTimer(); } @@ -28,11 +36,30 @@ namespace StreamCompaction { * * @returns the number of elements remaining after compaction. */ - int compactWithoutScan(int n, int *odata, const int *idata) { + int compactWithoutScan(int n, int* odata, const int* idata) { timer().startCpuTimer(); - // TODO + int k_i = 0; + for (int k = 0; k < n; k++) { + int idat = idata[k]; + if (idat == 0) { + continue; + } + odata[k_i] = idata[k]; + k_i++; + } timer().endCpuTimer(); - return -1; + return k_i; + } + + // Untimed Scan function (timed CPU::scan starts and stops the object timer) + void untimed_scan(int n, int* odata, const int* idata) { + if (n == 0) { + return; + } + odata[0] = 0; + for (int k = 1; k < n; k++) { + odata[k] = odata[k - 1] + idata[k - 1]; + } } /** @@ -40,11 +67,47 @@ namespace StreamCompaction { * * @returns the number of elements remaining after compaction. 
*/ - int compactWithScan(int n, int *odata, const int *idata) { + int compactWithScan(int n, int* odata, const int* idata) { + if (n == 0) { + return 0; + } + int* bool_arr = new int[n]; + timer().startCpuTimer(); + for (int k = 0; k < n; k++) { + int idat = idata[k]; + if (idat == 0) { + bool_arr[k] = 0; + } + else { + bool_arr[k] = 1; + } + } + + CPU::untimed_scan(n, odata, bool_arr); // Using odata as index array to save space + int count = odata[n - 1] + bool_arr[n - 1]; // Read before the in-place scatter can overwrite odata[n - 1] + for (int k = 0; k < n; k++) { + int bool_ = bool_arr[k]; + if (bool_ == 0) { + continue; + } + else { + int idat = idata[k]; + int index = odata[k]; + odata[index] = idat; + } + } + timer().endCpuTimer(); + delete[] bool_arr; + return count; + } + void sort(int n, int* odata, const int* idata) { + // Copy input into output + std::copy(idata, idata + n, odata); + + // Sort the copy timer().startCpuTimer(); - // TODO + std::sort(odata, odata + n); timer().endCpuTimer(); - return -1; } } } diff --git a/stream_compaction/cpu.h b/stream_compaction/cpu.h index 873c0476..222b77a3 100644 --- a/stream_compaction/cpu.h +++ b/stream_compaction/cpu.h @@ -11,5 +11,7 @@ namespace StreamCompaction { int compactWithoutScan(int n, int *odata, const int *idata); int compactWithScan(int n, int *odata, const int *idata); + + void sort(int n, int* odata, const int* idata); } } diff --git a/stream_compaction/efficient.cu b/stream_compaction/efficient.cu index 2db346ee..2c619acd 100644 --- a/stream_compaction/efficient.cu +++ b/stream_compaction/efficient.cu @@ -3,6 +3,10 @@ #include "common.h" #include "efficient.h" +#define NUM_BANKS 32 +#define LOG_NUM_BANKS 5 +//#define CONFLICT_FREE_OFFSET(n)((n) >> NUM_BANKS + (n) >> (2 * LOG_NUM_BANKS)) +#define CONFLICT_FREE_OFFSET(n) ((n) >>(LOG_NUM_BANKS)) namespace StreamCompaction { namespace Efficient { using StreamCompaction::Common::PerformanceTimer; @@ -12,13 +16,226 @@ namespace StreamCompaction { return timer; } + __global__ void prescan(int n, int* g_odata, const
int* g_idata) + { + int thid = threadIdx.x; + int offset = 1; + extern __shared__ int temp[]; + int ai = thid; + int bi = thid + (n / 2); + int bankOffsetA = CONFLICT_FREE_OFFSET(ai); + int bankOffsetB = CONFLICT_FREE_OFFSET(bi); + temp[ai + bankOffsetA] = + g_idata[ai]; + temp[bi + bankOffsetB] = g_idata[bi]; + for (int d = n >> 1; d > 0; d >>= 1) + // build sum in place up the tree + { + __syncthreads(); + if (thid < d) + { + int ai = offset * (2 * thid + 1) - 1; + int bi = offset * (2 * thid + 2) - 1; + ai += CONFLICT_FREE_OFFSET(ai); + bi += CONFLICT_FREE_OFFSET(bi); + temp[bi] += temp[ai]; + } + offset <<= 1; + } + if (thid == 0) + { + temp[n - 1 + CONFLICT_FREE_OFFSET(n - 1)] = 0; + } // clear the last element + for (int d = 1; d < n; d <<= 1) // traverse down tree & build scan + { + offset >>= 1; + __syncthreads(); + if (thid < d) + { + int ai = offset * (2 * thid + 1) - 1; + int bi = offset * (2 * thid + 2) - 1; + ai += CONFLICT_FREE_OFFSET(ai); + bi += CONFLICT_FREE_OFFSET(bi); + int t = temp[ai]; + temp[ai] = temp[bi]; + temp[bi] += t; + } + } + __syncthreads(); + g_odata[ai] = temp[ai + bankOffsetA]; + g_odata[bi] = temp[bi + bankOffsetB]; + } + + __global__ void multi_scan(int global_n, int B, int* g_odata, const int* g_idata, int* blockSums) + { + int thid = threadIdx.x; + int base = B * blockIdx.x; + int offset = 1; + extern __shared__ int temp[]; + int ai = thid; + int bi = thid + (B / 2); + int ga = base + ai; // Global indexes + int gb = base + bi; + int bankOffsetA = CONFLICT_FREE_OFFSET(ai); + int bankOffsetB = CONFLICT_FREE_OFFSET(bi); + temp[ai + bankOffsetA] = + g_idata[ga]; + temp[bi + bankOffsetB] = g_idata[gb]; + for (int d = B >> 1; d > 0; d >>= 1) + // build sum in place up the tree + { + __syncthreads(); + if (thid < d) + { + int ai = offset * (2 * thid + 1) - 1; + int bi = offset * (2 * thid + 2) - 1; + ai += CONFLICT_FREE_OFFSET(ai); + bi += CONFLICT_FREE_OFFSET(bi); + temp[bi] += temp[ai]; + } + offset <<= 1; + } + if (thid == 0) + { 
+ blockSums[blockIdx.x] = + temp[B - 1 + CONFLICT_FREE_OFFSET(B - 1)]; + temp[B - 1 + CONFLICT_FREE_OFFSET(B - 1)] = 0; + } // clear the last element + for (int d = 1; d < B; d <<= 1) // traverse down tree & build scan + { + offset >>= 1; + __syncthreads(); + if (thid < d) + { + int ai = offset * (2 * thid + 1) - 1; + int bi = offset * (2 * thid + 2) - 1; + ai += CONFLICT_FREE_OFFSET(ai); + bi += CONFLICT_FREE_OFFSET(bi); + int t = temp[ai]; + temp[ai] = temp[bi]; + temp[bi] += t; + } + } + __syncthreads(); + g_odata[ga] = temp[ai + bankOffsetA]; + g_odata[gb] = temp[bi + bankOffsetB]; + } + + __global__ void make_exclusive(int n, int* odata, const int* idata) { + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index >= n) { + return; + } + if (index == 0) { + odata[index] = 0; + return; + } + odata[index] = idata[index - 1]; + } + + __global__ void uniformAdd(int n, + int* odata, + const int* blockIncr, + int B) { + int base = blockIdx.x * B; + int offset = blockIncr[blockIdx.x]; // scanned block sums + + int i = base + threadIdx.x; + int j = base + threadIdx.x + (B / 2); + + if (i < n) odata[i] += offset; + if (j < n) odata[j] += offset; + } + + void recursiveScan(int n, int* d_out, const int* d_in) { + // Number of blocks and threads + int B = 2 * blockSize; + int numBlocks = (n + B - 1) / B; // ceil(n / B) + dim3 fullBlocksPerGrid(numBlocks); + int sharedMemBytes = (B + CONFLICT_FREE_OFFSET(B)) * sizeof(int); + int num_blocks_next_power_2 = 1 << ilog2ceil(numBlocks); + int*blockSums = nullptr; // Per-block Sums + + // Allocate and copy Memory + cudaMalloc((void**)&blockSums, num_blocks_next_power_2 * sizeof(int)); + checkCUDAErrorFn("cudaMalloc dev_buf_blockSums in scan failed!"); + + if (num_blocks_next_power_2 > numBlocks) { + cudaMemset(blockSums + numBlocks, 0, + (num_blocks_next_power_2 - numBlocks) * sizeof(int)); + } + + // Block-Wise Multi-Scan + Efficient::multi_scan << >> (n, B, d_out, d_in, blockSums); + checkCUDAErrorFn("multi-scan 
failed!"); + cudaDeviceSynchronize(); + if (numBlocks > 1) { + int* blockIncr = nullptr; // Per-block sums scan + cudaMalloc((void**)&blockIncr, num_blocks_next_power_2 * sizeof(int)); + checkCUDAErrorFn("cudaMalloc dev_buf_blockIncr in scan failed!"); + Efficient::recursiveScan (num_blocks_next_power_2, blockIncr, blockSums); + checkCUDAErrorFn("prescan of offsets failed!"); + cudaDeviceSynchronize(); + uniformAdd <<> > (n, d_out, blockIncr, B); + checkCUDAErrorFn("Uniform Add failed!"); + cudaDeviceSynchronize(); + cudaFree(blockIncr); + checkCUDAErrorFn("CudaFree blockIncr in scan failed!"); + } + cudaFree(blockSums); + checkCUDAErrorFn("CudaFree blockSums in scan failed!"); + } /** * Performs prefix-sum (aka scan) on idata, storing the result into odata. */ void scan(int n, int *odata, const int *idata) { + if (n <= 0) { + return; + } + if (n == 1) { // handle trivial case without GPU work + odata[0] = 0; + return; + } + int m = n; + bool is_pow_two = (n & (n - 1)) == 0; + if (!is_pow_two) { + n = 1 << ilog2ceil(n); + } + + + int* dev_buf_i = nullptr; //Input buffer + int* dev_buf_o = nullptr; //Output buffer + + // Multi Scan + // Allocate and copy Memory + cudaMalloc((void**)&dev_buf_i, n * sizeof(int)); + checkCUDAErrorFn("cudaMalloc dev_buf_i in scan failed!"); + + cudaMalloc((void**)&dev_buf_o, n * sizeof(int)); + checkCUDAErrorFn("cudaMalloc dev_buf_o in scan failed!"); + + cudaMemcpy(dev_buf_i, idata, m * sizeof(int), cudaMemcpyHostToDevice); + checkCUDAErrorFn("MemCpy dev_buf_i failed!"); + + + // Inter-Block Accumulation + + //// Number of blocks and threads + + if (!is_pow_two) { + cudaMemset(dev_buf_i + m, 0, (n - m) * sizeof(int)); + } + timer().startGpuTimer(); - // TODO + Efficient::recursiveScan(n, dev_buf_o, dev_buf_i); timer().endGpuTimer(); + + cudaMemcpy(odata, dev_buf_o, m * sizeof(int), cudaMemcpyDeviceToHost); + checkCUDAErrorFn("MemCpy dev_buf_o in scan failed!"); + cudaFree(dev_buf_o); + checkCUDAErrorFn("CudaFree dev_buf_o in scan 
failed!"); + cudaFree(dev_buf_i); + checkCUDAErrorFn("CudaFree dev_buf_i in scan failed!"); } /** @@ -30,11 +247,88 @@ namespace StreamCompaction { * @param idata The array of elements to compact. * @returns The number of elements remaining after compaction. */ - int compact(int n, int *odata, const int *idata) { - timer().startGpuTimer(); - // TODO - timer().endGpuTimer(); - return -1; + int compact(int n, int* odata, const int* idata) { + //Check trivial cases: + if (n <= 0) { + return 0; + } + if (n == 1) { + if (idata[0] != 0) + { + odata[0] = idata[0]; + return 1; + } + return 0; + } + dim3 fullBlocksPerGrid((n + blockSize - 1) / blockSize); + int m = n; + n = 1 << ilog2ceil(n); + + // Allocate and assign input + int* dev_buf_i = nullptr; + cudaMalloc((void**)&dev_buf_i, n * sizeof(int)); + checkCUDAErrorFn("cudaMalloc dev_buf_i failed!"); + cudaMemcpy(dev_buf_i, idata, m * sizeof(int), cudaMemcpyHostToDevice); + checkCUDAErrorFn("MemCpy dev_buf_i failed!"); + + // Allocate bools buffer + int* dev_buf_bools = nullptr; + cudaMalloc((void**)&dev_buf_bools, n * sizeof(int)); + checkCUDAErrorFn("cudaMalloc dev_buf_bools failed!"); + + //Allocate output buffer + int* dev_buf_o = nullptr; + cudaMalloc((void**)&dev_buf_o, m * sizeof(int)); + checkCUDAErrorFn("cudaMalloc dev_buf_o failed!"); + + //Allocate indices buffer + int* dev_buf_indices = nullptr; + cudaMalloc((void**)&dev_buf_indices, n * sizeof(int)); + checkCUDAErrorFn("cudaMalloc dev_buf_indices failed!"); + + cudaMemset(dev_buf_bools + m, 0, (n - m) * sizeof(int)); // Padding zeroes to nearest power of two + checkCUDAErrorFn("CudaMemset zeroes failed!"); + + + timer().startGpuTimer(); + //------------------------------------GPU-------------------------------------- + // Fill bools buffer + Common::kernMapToBoolean <<> > (m, dev_buf_bools, dev_buf_i); + cudaDeviceSynchronize(); + + // Scan bools to output + Efficient::recursiveScan(n, dev_buf_indices, dev_buf_bools); + + // Compact now + Common::kernScatter 
<< < fullBlocksPerGrid, blockSize>>> (m, dev_buf_o, dev_buf_i, dev_buf_bools, dev_buf_indices); + checkCUDAErrorFn("efficient scatter failed!"); + cudaDeviceSynchronize(); + //---------------------------------------------------------------------------- + timer().endGpuTimer(); + + // Compute size of compacted array + int last_index; + cudaMemcpy(&last_index, dev_buf_indices + m - 1, sizeof(int), cudaMemcpyDeviceToHost); + + int last_bool; + cudaMemcpy(&last_bool, dev_buf_bools + m - 1, sizeof(int), cudaMemcpyDeviceToHost); + + long long int count = (last_index + last_bool); + + // Copy output to CPU + cudaMemcpy(odata, dev_buf_o, count * sizeof(int), cudaMemcpyDeviceToHost); + checkCUDAErrorFn("MemCpy dev_buf_o failed! (Copying output to cpu)"); + + // Free data + cudaFree(dev_buf_indices); + checkCUDAErrorFn("CudaFree dev_buf_indices failed!"); + cudaFree(dev_buf_i); + checkCUDAErrorFn("CudaFree dev_buf_i failed!"); + cudaFree(dev_buf_bools); + checkCUDAErrorFn("CudaFree dev_buf_bools failed!"); + cudaFree(dev_buf_o); + checkCUDAErrorFn("CudaFree dev_buf_o failed!"); + return count; } } } diff --git a/stream_compaction/efficient.h b/stream_compaction/efficient.h index 803cb4fe..d1eb9232 100644 --- a/stream_compaction/efficient.h +++ b/stream_compaction/efficient.h @@ -7,7 +7,7 @@ namespace StreamCompaction { StreamCompaction::Common::PerformanceTimer& timer(); void scan(int n, int *odata, const int *idata); - int compact(int n, int *odata, const int *idata); + void recursiveScan(int n, int* d_out, const int* d_in); } } diff --git a/stream_compaction/naive.cu b/stream_compaction/naive.cu index 43088769..14c0412a 100644 --- a/stream_compaction/naive.cu +++ b/stream_compaction/naive.cu @@ -11,15 +11,78 @@ namespace StreamCompaction { static PerformanceTimer timer; return timer; } - // TODO: __global__ + __global__ void onestep(int n, int* odata, const int* idata,int d) { + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index >= n) { + return; + } + int 
idata_at_index = idata[index]; + int two_pow_d_minus_one = 1 << (d - 1); + if (index >= two_pow_d_minus_one) { + odata[index] = idata[index - two_pow_d_minus_one] + idata_at_index; + } + else { + odata[index] = idata_at_index; + } + } + __global__ void make_exclusive(int n, int* odata, const int* idata) { + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index >= n) { + return; + } + if (index == 0) { + odata[index] = 0; + } + else { + odata[index] = idata[index - 1]; + } + } /** * Performs prefix-sum (aka scan) on idata, storing the result into odata. */ void scan(int n, int *odata, const int *idata) { + if (n <= 0) { + return; + } + if (n == 1) { // handle trivial case without GPU work + odata[0] = 0; + return; + } + dim3 fullBlocksPerGrid((n + blockSize - 1) / blockSize); + int* dev_bufA = nullptr; + int* dev_bufB = nullptr; + + cudaMalloc((void**)&dev_bufA, n * sizeof(int)); + checkCUDAErrorFn("cudaMalloc dev_buf_A failed!"); + + cudaMalloc((void**)&dev_bufB, n * sizeof(int)); + checkCUDAErrorFn("cudaMalloc dev_buf_B failed!"); + + cudaMemcpy(dev_bufA, idata, n * sizeof(int), cudaMemcpyHostToDevice); + checkCUDAErrorFn("MemCpy dev_buf_A failed!"); + + int num_iter = ilog2ceil(n); timer().startGpuTimer(); - // TODO + + for (int d = 1; d <= num_iter; d++) { + onestep << > > (n, dev_bufB, dev_bufA, d); + checkCUDAErrorFn("onestep Naive failed!"); + cudaDeviceSynchronize(); + std::swap(dev_bufA, dev_bufB); // Output in dev_buf_A + } + // Inclusive Scan to Exclusive + make_exclusive<<> > (n, dev_bufB, dev_bufA); // Exclusive scan in dev_buf_B + checkCUDAErrorFn("make_exclusive failed!"); + cudaDeviceSynchronize(); timer().endGpuTimer(); + + cudaMemcpy(odata, dev_bufB, n * sizeof(int), cudaMemcpyDeviceToHost); + checkCUDAErrorFn("MemCpy dev_buf_B failed!"); + cudaFree(dev_bufA); + checkCUDAErrorFn("CudaFree dev_buf_A failed!"); + cudaFree(dev_bufB); + checkCUDAErrorFn("CudaFree dev_buf_B failed!"); } } } diff --git a/stream_compaction/radix_sort.cu 
b/stream_compaction/radix_sort.cu new file mode 100644 index 00000000..b234a134 --- /dev/null +++ b/stream_compaction/radix_sort.cu @@ -0,0 +1,167 @@ +#include +#include +#include "common.h" +#include "efficient.h" +#include "radix_sort.h" + +namespace StreamCompaction { + namespace RadixSort { + using StreamCompaction::Common::PerformanceTimer; + + PerformanceTimer& timer() { + static PerformanceTimer t; + return t; + } + + __global__ void uniformAdd(int n, + int* odata, + const int num) { + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index >= n) { + return; + } + odata[index] += num; + } + + __global__ void negate_bools_into(int n, int* out, const int* in) { + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index >= n) { + return; + } + out[index] = 1 - in[index]; + } + + __global__ void radix_to_bools(int n, int* b, const int* idata, int bit) { + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index >= n) { + return; + } + unsigned int u = static_cast(idata[index]); + b[index] = (u >> bit) & 1; + } + + __global__ void assign_indexes(int n, + int* odata, const int* idata, const int* isOne, const int* idxOnes, const int* idxZeros, int totalZeroes) { + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index >= n) { + return; + } + int one = isOne[index]; + if (one) { + odata[idxOnes[index] + totalZeroes] = idata[index]; + } + else { + odata[idxZeros[index]] = idata[index]; + } + } + + void onestep(int n, int m, int* odata, const int* idata, + int* idxOnes, int* idxZeros, int* bOne, int* bZero, int bit) { + dim3 fullBlocksPerGrid((n + blockSize - 1) / blockSize); + + // To bools + RadixSort::radix_to_bools << > > (m, bOne, idata, bit); + cudaDeviceSynchronize(); + if (n > m) { + cudaMemset(bOne + m, 0, (n - m) * sizeof(int)); + } + cudaDeviceSynchronize(); + + RadixSort::negate_bools_into <<> > (m, bZero, bOne); + cudaDeviceSynchronize(); + if (n > m) { + cudaMemset(bZero + m, 0, (n - m) * sizeof(int)); + } + 
cudaDeviceSynchronize(); + + // Scan ones -> idxOnes (exclusive) + Efficient::recursiveScan(n, idxOnes, bOne); + cudaDeviceSynchronize(); + + int onesBeforeLast = 0, lastOneBit = 0; + cudaMemcpy(&onesBeforeLast, idxOnes + n - 1, sizeof(int), cudaMemcpyDeviceToHost); + checkCUDAErrorFn("cudaMemcpy onesBeforeLast failed!"); + cudaMemcpy(&lastOneBit, bOne + n - 1, sizeof(int), cudaMemcpyDeviceToHost); + checkCUDAErrorFn("cudaMemcpy lastOneBit failed!"); + int totalOnes = onesBeforeLast + lastOneBit; + int totalZeros = m - totalOnes; // use m so padding never contributes + // Scan zeros -> idxZeros (exclusive) + Efficient::recursiveScan(n, idxZeros, bZero); + cudaDeviceSynchronize(); + // Scatter: use original isOne flags + RadixSort::assign_indexes << > > (m, odata, idata, bOne, idxOnes, idxZeros, totalZeros); + checkCUDAErrorFn("Assign indexes failed!"); + cudaDeviceSynchronize(); + } + + void sort(int n, int* odata, const int* idata) { + if (n <= 0) { + return; + } + if (n == 1) { // handle trivial case without GPU work + odata[0] = idata[0]; + return; + } + int m = n; + int pow2 = 1 << ilog2ceil(n); + if (n != pow2) { + n = pow2; + } + dim3 fullBlocksPerGrid((n + blockSize - 1) / blockSize); + int* dev_bufA = nullptr; + int* dev_bufB = nullptr; + int* t = nullptr; + int* f = nullptr; + int* b0 = nullptr; + int* b1 = nullptr; + + cudaMalloc((void**)&dev_bufA, m * sizeof(int)); + checkCUDAErrorFn("cudaMalloc dev_buf_A failed!"); + + cudaMalloc((void**)&dev_bufB, m * sizeof(int)); + checkCUDAErrorFn("cudaMalloc dev_buf_B failed!"); + + cudaMalloc((void**)&t, n * sizeof(int)); + checkCUDAErrorFn("cudaMalloc t failed!"); + + cudaMalloc((void**)&f, n * sizeof(int)); + checkCUDAErrorFn("cudaMalloc f failed!"); + + cudaMalloc((void**)&b0, n * sizeof(int)); + checkCUDAErrorFn("cudaMalloc b failed!"); + cudaMalloc((void**)&b1, n * sizeof(int)); + checkCUDAErrorFn("cudaMalloc b failed!"); + + cudaMemcpy(dev_bufA, idata, m * sizeof(int), cudaMemcpyHostToDevice); + 
checkCUDAErrorFn("MemCpy dev_buf_A failed!"); + + int num_iter = sizeof(int) * 8; + + // ------------------GPU----------------------------------- + timer().startGpuTimer(); + for (int bit = 0; bit < num_iter; bit++) { + RadixSort::onestep(n, m, dev_bufB, dev_bufA, t, f, b1, b0, bit); + checkCUDAErrorFn("onestep Radix failed!"); + cudaDeviceSynchronize(); + std::swap(dev_bufA, dev_bufB); // Output now in dev_buf_A + } + timer().endGpuTimer(); + // ------------------GPU----------------------------------- + + cudaMemcpy(odata, dev_bufA, m * sizeof(int), cudaMemcpyDeviceToHost); + cudaDeviceSynchronize(); + cudaFree(dev_bufA); + checkCUDAErrorFn("CudaFree dev_buf_A failed!"); + cudaFree(dev_bufB); + checkCUDAErrorFn("CudaFree dev_buf_B failed!"); + cudaFree(t); + checkCUDAErrorFn("CudaFree t failed!"); + cudaFree(f); + checkCUDAErrorFn("CudaFree f failed!"); + cudaFree(b0); + checkCUDAErrorFn("CudaFree b0 failed!"); + cudaFree(b1); + checkCUDAErrorFn("CudaFree b1 failed!"); + } + } +} diff --git a/stream_compaction/radix_sort.h b/stream_compaction/radix_sort.h new file mode 100644 index 00000000..fc0e13e9 --- /dev/null +++ b/stream_compaction/radix_sort.h @@ -0,0 +1,15 @@ +#pragma once + +#include "common.h" + +namespace StreamCompaction { + namespace RadixSort { + // Reuse the same timer interface as other modules + StreamCompaction::Common::PerformanceTimer& timer(); + + // Stable LSD radix sort + // Sorts idata into odata. Length is n. + void sort(int n, int* odata, const int* idata); + } +} + diff --git a/stream_compaction/thrust.cu b/stream_compaction/thrust.cu index 1def45e7..59172634 100644 --- a/stream_compaction/thrust.cu +++ b/stream_compaction/thrust.cu @@ -17,12 +17,21 @@ namespace StreamCompaction { /** * Performs prefix-sum (aka scan) on idata, storing the result into odata. 
*/ - void scan(int n, int *odata, const int *idata) { + void scan(int n, int* odata, const int* idata) { + + + // copy host -> device + thrust::device_vector<int> d_in(idata, idata + n); + thrust::device_vector<int> d_out(n); + timer().startGpuTimer(); - // TODO use `thrust::exclusive_scan` - // example: for device_vectors dv_in and dv_out: - // thrust::exclusive_scan(dv_in.begin(), dv_in.end(), dv_out.begin()); + // exclusive scan on device + thrust::exclusive_scan(d_in.begin(), d_in.end(), d_out.begin()); + cudaDeviceSynchronize(); // ensure timing is correct timer().endGpuTimer(); + + // copy device -> host + thrust::copy(d_out.begin(), d_out.end(), odata); } } }