“Everyday” Random Shuffling with Formulas

Home
/
Blog
/
Excel Functions & Formulas
/
“Everyday” Random Shuffling with Formulas

In this article, we’ll talk about how to complete random shuffling with only formulas. Shuffling methods via VBA or a dime a dozen. But it’s much harder to perform a formula-based random shuffle since formulas don’t really allow for iteration. Non-VBA methods therefore require we think about these problems differently (what I describe in Dashboards as “Thinking Outside the Cell”). I call this the “everyday” method because it’s technically imperfect. From a statistical standpoint, the results are not uniformly distributed. That means the method isn’t perfectly random at all times. We’ll attend to that later in the article.

Step 1. Lay it all out

For now, let’s take a look at a spreadsheet showing our deck of cards (the Excel file is available for download at the end of this article).

Though it’s slightly cut off in the picture, there is a complete 52-card deck.

Step 2. Add Some Randomness

Because this is a shuffle, we need to add some randomness to reorganize the cards in the deck. Perhaps the easiest method to do this would be to use RANDBETWEEN. Here, we’ll use =RANDBETWEEN(1,52) since there are 52 cards in total. In the next image, I’ve dragged this formula down so that it’s used 52 times.

Anyone who has attempted shuffling in Excel before should immediately realize the problem with using RANDBETWEEN as is. First, there are no guarantees RANDBETWEEN will produce every single number between 1 and 52. Second, there are no guarantees RANDBEETWEEN won’t produce a series of repeated numbers. In the image above, the cells highlighted in peach are duplicates! Clearly, our method needs a few more steps!

Step 3. Add Some Noise

The way we solve both issues presented above is by adding noise. Noise is a small amount of “error” we add to add each randomly generated number. This small amount of error will ensure we also produce a unique list of random numbers. In the image below, I’ve added a small amount to each random number. The small amount is essentially the current card index divided by an arbitrarily large number. As you know, small numbers divided by big numbers produce very small numbers. Using the card index ensures the small amount of noise added is always different, since no cards have the same index. Alternatively, I could have used the volatile function ROW instead of the card index.
If you look at rows 11, 18 and 22 in the picture, you’ll see the RANDBETWEEN resulted in a 50 for each. But because we’ve added some noise, each item has a different resulting value.

Step 4. Re-RANK ‘em

What we need to do now is to transform the values produced into numbers between 1 and 52. We can do this using the RANK function. Since each random number is now unique, we know that each number must have a unique rank within the entire range of random numbers produced. For instance if we were just to consider the first three numbers produced in the image above–21.00002, 49.00003 and 28.00004–these numbers, when ranked in descending largest-to-smallest order would yield rank results of 3 (21.0002 is the lowest numbers on the list), 1 (49.00003 is the greatest), and 2 (28.0004 is in the middle). We’ll do this RANKing across all cards (shown below).

Step 5. Use `INDEX` to Retrieve the Results

If our list of random numbers is unique then the their ranks must also be unique. So we’ve now addressed the two issues described above. We now have a resulting list of numbers that fully spans the range of integer numbers from 1 to 52. The last step is to use this resulting list as an index back onto the original card set.

And that’s it! You now have a shuffled deck.

Discussion

This setup was likely easier than you though to build. However, it does come with some unfortunate tradeoffs. The biggest trade off is that it’s not perfectly random. Here’s what I mean: let’s say two random numbers have a collision (i.e. the same random number is picked between them). In the image above, rows 7 and 14 collide at random number 42. We’ve seemingly resolved this issue by making each cell unique. However, because we’re using RANK, the item in row 7 will always be closer to the top of the deck than the item in row 14. This is because the noise is not randomly generated. Collisions in this method will always give sorting preference to the numbers that collide first. If we ran this random sorter a few thousand times, the resulting frequency of each card would demonstrate the sort order is not completely random.
The easy away around this is to have the random numbers collide less often. We use RANDBETWEEN(1,52) in this example because it makes thinking about the problem more approachable. But, since we’re checking the rank of unique numbers in a list, the mathematics do not dictate the random numbers be in any specific range. In fact, we could just as easily replace the random number generating part of the formula to something like RAND()*10000. Now that the range of possible numbers is bigger, it’s less likely they’ll be a collision in our shuffle. However, if we’re thinking broadly about shuffling items, the more items you have to shuffle the more likely they’ll be collisions. That’s just because the potential for collisions increases with each new random number generated.

Alternatively, you could implement a Fisher-Yates shuffle. This method will always generate random numbers, and it is indeed possible to use only formulas to implement. However, it’s a bit more complicated and perhaps best served as an article for another day (but you’re welcome to try it on your own). Until then, this method will suffice for most demonstrations.
Download the worksheet used for this article: Unique Random Sorting

Author
Recent Posts

Jordan Goldmeier

Jordan Goldmeier is an accomplished data professional with a wealth of experience across various industries. He currently serves as a consultant at Anarchy Data, where he assists businesses in maximizing the capabilities of Excel for financial planning and analysis. Jordan is also an instructor at Full Stack Modeller and a former Adjunct Instructor in Analytics at Wake Forest University. His extensive career has seen him hold positions as the Chief Operations Officer at Excel.TV, Data Science Manager at DataKind, Data Scientist at Dealer Tire and EY, Analytics & Data Vis Developer at The Perduco Group, and Operations Research Analyst at Booz Allen Hamilton. Jordan's background in data analytics and his passion for Excel make him a valuable resource for businesses seeking to improve their data-driven decision-making processes.

Latest posts by Jordan Goldmeier (see all)

All Excel LOOKUPs Explained - May 26, 2020
How to: Power Query File From Folder - April 21, 2020
Oz’s Excel Tip: Keep a Workbook for Random Data in Excel - January 23, 2020

What Function Helps You Determine Monthly Loan Payments – Excel Challenge

November 22, 2015