
Use random-shuffle list shuffling instead of random >= 1.3#169

Merged
mchav merged 6 commits intoDataHaskell:mainfrom
jisantuc:maint/js/downgrade-random
Feb 27, 2026

Conversation

@jisantuc
Contributor

Overview

This PR downgrades the bound on random from >= 1.3 to >= 1.2 && < 1.3, since random 1.3+ plays poorly with some packages in the ecosystem.
Instead of uniformShuffleList, it uses shuffle' from random-shuffle, which was already a test dependency.
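
For reference, the bound change expressed in .cabal terms would look roughly like this (illustrative only, not the actual dataframe.cabal contents):

```cabal
build-depends:
    random >= 1.2 && < 1.3
  , random-shuffle
```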

@jisantuc
Contributor Author

Per the contributing guidelines, I tried to add a label (I assumed that was what "A tag (usually feat, documentation, refactor etc)" meant?) but it seems like I'm not allowed to do that 🤔


 shuffledIndices :: (RandomGen g) => g -> Int -> VU.Vector Int
-shuffledIndices pureGen k = VU.fromList (fst (uniformShuffleList [0 .. (k - 1)] pureGen))
+shuffledIndices pureGen k = VU.fromList (shuffle' [0 .. (k - 1)] k pureGen)
Contributor Author


Pretty unclear to me how to test this. I thought about a shuffle/un-shuffle identity test, but unshuffle didn't get me very far in search results 😅

Any thoughts?

Contributor


There are a couple of things I would test then.

  1. Test that shuffling does nothing other than shuffle: sort both the shuffled and the unshuffled lists and check they are equal. This ensures shuffling only permutes the indices.

  2. Check that shuffling with equal seeds results in the same shuffle.

That's about all I can think of.

Member


Yeah, no need to round trip. Checking that shuffling preserves length (even when there are duplicates) is probably important, plus that different seeds produce different shuffle orders.
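
The properties suggested above can be written as plain predicates over any seeded shuffle. A minimal sketch follows; `rotateShuffle` is a hypothetical stand-in so the snippet runs without the random-shuffle package, and a real test would pass shuffle' instead:

```haskell
import Data.List (sort)

-- A shuffle must be a permutation of its input: sorting both sides
-- should give the same list (this also covers duplicates).
isPermutationOf :: Ord a => [a] -> [a] -> Bool
isPermutationOf a b = sort a == sort b

-- Shuffling must not add or drop elements.
preservesLength :: [a] -> [b] -> Bool
preservesLength a b = length a == length b

-- Equal seeds must produce equal shuffles for any shuffle function.
sameSeedSameShuffle :: Eq a => (Int -> [a] -> [a]) -> Int -> [a] -> Bool
sameSeedSameShuffle shuf seed xs = shuf seed xs == shuf seed xs

-- Hypothetical stand-in shuffle: rotate the list by the seed.
rotateShuffle :: Int -> [a] -> [a]
rotateShuffle _ [] = []
rotateShuffle seed xs = drop k xs ++ take k xs
  where k = seed `mod` length xs
```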

Member


Also, on second thought, the intermediate list allocation is wasteful. I'll add implementing Fisher-Yates here as a GSoC task.
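
For context, an in-place Fisher-Yates shuffle avoids the intermediate list by swapping within a mutable array. A self-contained sketch follows; the xorshift-style `step` function is a hypothetical stand-in for a real PRNG (not the generator from the random package), and the array type comes from the array boot library rather than vector:

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, getElems, newListArray, readArray, writeArray)
import Data.Bits (shiftR, xor)
import Data.List (sort)
import Data.Word (Word64)

-- Hypothetical PRNG step (xorshift-flavoured); a stand-in only.
step :: Word64 -> Word64
step s =
  let a = s `xor` (s `shiftR` 13)
  in a `xor` (a * 2685821657736338717) + 2654435769

-- In-place Fisher-Yates: walk i from the end, swap element i with a
-- uniformly chosen element j in [0, i].
fisherYates :: Word64 -> [Int] -> [Int]
fisherYates seed xs = runST $ do
  let n = length xs
  arr <- newListArray (0, n - 1) xs
  let swap :: STUArray s Int Int -> Int -> Int -> ST s ()
      swap a i j = do
        vi <- readArray a i
        vj <- readArray a j
        writeArray a i vj
        writeArray a j vi
  let go g i
        | i <= 0 = pure ()
        | otherwise = do
            let g' = step g
                j = fromIntegral (g' `mod` fromIntegral (i + 1)) :: Int
            swap arr i j
            go g' (i - 1)
  go seed (n - 1)
  getElems arr

-- Permutation check: a shuffle must only reorder its input.
isPermutation :: [Int] -> [Int] -> Bool
isPermutation a b = sort a == sort b
```

A production version would instead mutate an unboxed vector and take the generator from the random package, but the swap loop is the same.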

Contributor


That task shouldn't be GSoC-only; it should be open to anyone! Also, mwc-random does that, I think.

Member


Why not? It seems simple and self-contained enough, since it's just reading the algorithm and implementing it.

Contributor


There could be some task pipeline for people who aren't interested only in GSoC, but more generally in contributing!

@jisantuc
Contributor Author

I used this branch's package in the plot survey branch I started, with no problems other than needing to roll my own list generator with replicateM and state: jisantuc/goofing-off@7c9b310#diff-206b9ce276ab5971a2489d75eb1b12999d4bf3843b7988cbe8d687cfde61dea0R4
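
The replicateM-plus-state pattern mentioned above can be sketched like so; `nextInt` is a hypothetical stand-in for a pure generator step such as random's uniformR, here a toy LCG producing values in [0, 100):

```haskell
import Control.Monad (replicateM)
import Control.Monad.Trans.State (State, evalState, state)

-- Hypothetical generator step: returns a value and the next seed.
nextInt :: Int -> (Int, Int)
nextInt s =
  let s' = 6364136223846793005 * s + 1442695040888963407
  in (s' `mod` 100, s')

-- Generate n values by threading the seed through the State monad:
-- `state nextInt` lifts the step, replicateM runs it n times.
randomList :: Int -> Int -> [Int]
randomList n seed = evalState (replicateM n (state nextInt)) seed
```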

@mchav
Member

mchav commented Feb 27, 2026

@jisantuc I think there's an issue with the test module name. Once that's fixed, this will be good to go.

@jisantuc
Contributor Author

@mchav yeah, I forgot to rename the module after I understood the test naming convention a little better. Fixed now, though.

@mchav mchav merged commit 03e34ab into DataHaskell:main Feb 27, 2026
7 checks passed
@jisantuc
Contributor Author

Per the contributing guidelines, I tried to add a label (I assumed that was what "A tag (usually feat, documentation, refactor etc)" meant?) but it seems like I'm not allowed to do that 🤔

🤦🏻 @mchav I just noticed other commits on main -- you meant a tag in the commit header itself, like in the conventional commits style?

@jisantuc jisantuc deleted the maint/js/downgrade-random branch February 27, 2026 05:30
@Ai-Ya-Ya
Contributor

Ai-Ya-Ya commented Feb 27, 2026

@mchav What's your timeline for the next Hackage release? nixpkgs CI should pull from there; the hope is it'll be unbroken.

@mchav
Member

mchav commented Feb 27, 2026

@Ai-Ya-Ya released: https://hackage.haskell.org/package/dataframe-0.5.0.0

@juhp

juhp commented Mar 2, 2026

This will prevent dataframe 0.5 from going into Stackage unless you relax the random bounds...
It is a bounds regression with respect to 0.4.

juhp added a commit to commercialhaskell/stackage that referenced this pull request Mar 2, 2026
@Ai-Ya-Ya
Contributor

Ai-Ya-Ya commented Mar 2, 2026

Nix won't have a problem if random 1.3 is allowed in the Cabal file, since the default version of random is the one in Stackage (1.2 for LTS 24). I agree it would be better to relax the bounds on random.

random 1.3+ plays poorly with some packages in the ecosystem

@jisantuc Is this just from Nix's diamond dependency problems from 1.3, or is this present in other package environments as well?

@jisantuc
Contributor Author

jisantuc commented Mar 2, 2026

@Ai-Ya-Ya it was just the diamond dependency problem and Nix. Because I had to pin random, other things that depended on random needed to be rebuilt; Nix runs tests in builds, and time-compat's tests pin random < 1.3. If you don't have to build time-compat's tests to build the package, then it should be fine to relax the upper bound.

Also TIL "bounds regression"

juhp added a commit to commercialhaskell/stackage that referenced this pull request Mar 3, 2026