Skip to content

Added merge sort (clones way too much)#1

Open
timando wants to merge 1 commit intojonhoo:masterfrom
timando:merge_sort
Open

Added merge sort (clones way too much)#1
timando wants to merge 1 commit intojonhoo:masterfrom
timando:merge_sort

Conversation

@timando
Copy link

@timando timando commented Nov 15, 2020

This works, but does too many allocations. Also, merge sort needs to keep a copy of the array, so I added a CloneSorter trait that adds the requirement that T is Clone. It might make more sense to change Sorter to Sorter<T> and let the sort implementations choose what traits to require (radix sort for example isn't a comparison sort, but should work for anything that can be converted into a byte array that compares the same way as the underlying type (integers, strings, composite structs, etc.)), but that's out of the scope of this pull request. An alternative is to require T: Ord+Clone on sorter directly.

@jhinch
Copy link

jhinch commented Nov 15, 2020

Here is a implementation which does not require Clone and avoids recursion based on a bottom up strategy. It does require unsafe and MaybeUninit. I'm not an expert in unsafe code so I'm not sure if it is actually all memory safe but it appears to be.

use super::Sorter;
use std::mem::MaybeUninit;

pub struct MergeSort;

impl Sorter for MergeSort {
    fn sort<T>(&self, slice: &mut [T])
    where
        T: Ord,
    {
        let n = slice.len();
        let mut auxiliary = vec_with_len::<T>(n);
        // safety: MaybeUninit is transparent so it is safe to cast `&mut [T]` to `&mut [MaybeUninit<T>]`
        let mut a = unsafe { &mut *(slice as *mut [T] as *mut [std::mem::MaybeUninit<T>]) };
        let mut b = auxiliary.as_mut_slice();
        let mut move_at_end = false;

        let mut width = 1;
        while width < n {
            for i in (0..n).step_by(width * 2) {
                let left_and_right = &mut a[i..std::cmp::min(i + width * 2, n)];
                let (mut left, mut right) =
                    left_and_right.split_at_mut(std::cmp::min(width, left_and_right.len()));
                for dest in b.iter_mut().skip(i).take(width * 2) {
                    // safety: `a` always contains the initialized memory and the indexes
                    // are incremented when they are swapped into the uninitialized slice
                    if !left.is_empty()
                        && (right.is_empty() || unsafe { *left[0].as_ptr() <= *right[0].as_ptr() })
                    {
                        std::mem::swap(dest, &mut left[0]);
                        left = &mut left[1..];
                    } else {
                        std::mem::swap(dest, &mut right[0]);
                        right = &mut right[1..];
                    }
                }
            }

            // now, b contains all the initialized memory, and b all the uninitialized
            // so we need to swap them
            std::mem::swap(&mut a, &mut b);
            move_at_end = !move_at_end;
            width *= 2;
        }
        if move_at_end {
            for i in 0..n {
                std::mem::swap(&mut b[i], &mut a[i]);
            }
        }
    }
}

fn vec_with_len<T>(len: usize) -> Vec<MaybeUninit<T>> {
    let mut vec = Vec::<MaybeUninit<T>>::with_capacity(len);
    // safety: MaybeUninit guards all access to `T` via an unsafe block and len == capacity
    unsafe {
        vec.set_len(len);
    }
    vec
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants