This program automates the tedious process of combining laboratory sample racks into optimal batches. Using a heuristic bin-packing algorithm, it takes hundreds of individual racks (each containing 1-96 samples) and efficiently combines them into near-capacity destination racks, maximizing the utilization space while keeping the total number of processing batches reasonable.
- Intelligent batch optimization following laboratory constraints
- Automated CSV export for easy integration with existing workflows
- Comprehensive testing suite with 11 test cases
- Built-in debugging functions for validation
In summer 2024, a biorepository researcher approached me with a frustrating manual lab task that was eating up hours of their time. They regularly receive hundreds of sample racks containing anywhere from 1 to 96 samples each, and must manually figure out how to combine these into full or near-full racks while minimizing processing batches.
The Challenge: The process isn't as simple as just filling up racks - there are strict operational constraints:
- Rack Types: Each batch consists of "source" racks (being emptied) and "destination" racks (being filled)
- Batch Size Limit: Each processing batch can contain a maximum of 20 racks total (source + destination)
- Capacity: Each destination rack holds up to 96 samples
- Balance: The above constraints mean that for every batch with X source racks, you have up to (20 - X) destination racks and a total capacity of (20 - X) * 96 samples
Example: If you have 18 source racks in a batch, you can only have 2 destination racks, giving you a total sample capacity of 2 × 96 = 192 samples
- Windows environment (tested with Visual Studio's build system)
- Visual Studio with "Desktop development with C++" workload installed
- Clone the repository and open Rack_Final.vcxproj in Visual Studio
- Build the project using Debug configuration
- Prepare your input file:
- Create a .txt file with your rack data
- Format: Each line should contain [Rack_ID] [Sample_Count]
- Sample counts must be between 1 and 96
- Place your input file in the "inputs" folder, located in the same directory as the Rack_Final.vcxproj file
- Run the program and follow the prompts
- Find your results in the generated CSV file in the "results" folder
RACK001 45
RACK002 23
RACK003 67
RACK004 12
The algorithm begins by reading rack data and populating a sample_frequencies array that tracks how many racks contain each sample number (1-95). A testing_array vector is used to temporarily store and manipulate sample numbers while finding optimal batch combinations. As sample numbers are added to or removed from the testing array, the frequencies are updated in real-time to reflect availability.
The program distributes all source racks using two main strategies:
- Uses create_new_batch() to find optimal combinations from abundant choices
- Continues until fewer than 19 source racks remain
- Uses distribute_remainder() to efficiently pack remaining racks
- Creates batches by adding as many racks as possible without exceeding destination capacity
- Continues until all racks are distributed
- choose_num_sources() finds the maximum number of source racks that can fit in one batch using the following process:
- Start with the largest available sample number
- Incrementally add smallest sources until destination capacity would be exceeded
- this ensures that at least one of the large numbers is included with the small numbers, spreading out the distribution more evenly
- Saves a backup_array as the last valid configuration before exceeding capacity
- Clears the testing array and rebuilds it strategically:
- Adds the largest available sample number first
- Fills remaining spots proportionally based on sample frequencies (for balanced distribution)
- Fills any remaining spots with the smallest available values
- Leaves exactly one spot open for the "ideal last sample"
- Calculates the ideal_last_spot by subtracting current total from destination capacity
- If ideal value is too small (<1): Decreases total by replacing larger values with smaller ones
- If ideal value is too large (>95) or unavailable: Increases total by replacing smaller values with larger ones
- Fallback mechanism: If adjustments fail, uses the pre-calculated backup_array
- Adds the final calculated sample number to complete the batch
- Creates a new Batch object and assigns actual source racks based on testing array values
- Removes used racks from the available pool
For the final racks, the algorithm uses a simpler heuristic approach:
- Processes remaining source racks in their current order (how they were read in)
- For each potential source rack, updates required destination racks based on cumulative sample count
- Stops adding source racks when number of source racks + destination racks in the batch exceeds 20
- Creates new batches as needed until all remaining racks are distributed
After distribution is complete, the program:
- Provides an optional summary showing batch counts, source/destination ratios, and capacity utilization
- Exports detailed results to a CSV file with columns: Rack ID, Sample Number, Batch Number, Number Sources, Number Destinations, and Total Samples In Batch (with one row per source rack)
- The generated CSV file can be found in the "results" folder located in the same folder as Rack_Final.vcxproj
The program includes 11 comprehensive test cases designed to validate the algorithm's performance across different data distributions and edge cases. These test files help ensure the algorithm works correctly under various real-world scenarios.
What's included:
- Test input files: rack_input_1.txt through rack_input_11.txt
- Test result files: result_1.csv through result_11.csv
- Test documentation: input_descriptions.txt provides detailed descriptions of each test case and what it's designed to validate
Input files are located in the "inputs" folder, in the same directory as Rack_Final.vcxproj
- The algorithm is optimized for typical laboratory distributions, which are generally skewed toward lower-numbered samples based on real-world usage patterns
- Although the provided test files include some non-standard data distributions, performance may vary with these less typical cases
- The program contains little to no error-checking for invalid formats, so please ensure your inputs are formatted correctly