How to select a seed for simulation or randomization

If you need to generate a randomization list for a clinical trial, do some simulations or perhaps perform a huge bootstrap analysis, you need a way to draw random numbers. Putting many pieces of paper in a hat and drawing them is possible in theory, but you will probably be using a computer for doing this. The computer, however, does not generate random numbers. It generates pseudo random numbers. They look and feel almost like real random numbers, but they are not random. Each number in the sequence is calculated from its predecessor, so the sequence has to begin somewhere;  it begins in the seed – the first number in the sequence.

Knowing the seed is a good idea. It enables reproducing the analysis, the simulation or the randomization list. If you run a clinical trial, reproducibility is crucial. You must know at the end of the trial which patient was randomized to each treatment; otherwise you will throw all your data to the garbage. During the years I worked at Teva Pharmaceuticals, we took every possible safety measure: We burnt the randomization lists, the randomization SAS code and the randomization seed on a CD and kept it in a fire-proof safe. We also kept all this information in analog media. Yes, we printed the lists, the SAS code and the seed on paper, and these were also kept in the safe.

Using the same seed every time is not a good idea. If you use the same seed every time, you get the same sequence of pseudo-random numbers every time, and therefore your numbers are not pseudo-random anymore. Selecting a different seed every time is good practice.

How do you select the seed? Taking a number out of your head is still not a good idea. Our heads are biased. Like passwords, people tend to use some seeds more often than other possible seeds. Don’t be surprised if you see codes with seeds like 123, 999 or 31415.

The best practice is to choose a random seed, but this creates a magic circle. You can still turn to your old hat and some pieces of paper, but there is a simple alternative: generate the seed from the computer clock.

This is how you do it in R by using the Sys.time() function: Get the system time, convert it to an integer, and you’re done. In practice, I take only the last 5 digits of that integer. And off course, I keep the chosen seed on record. I also save the Sys.time value and its integer value, just in case. Finally, I hardcode the seed I get. This is the R code:

> # set the initial seed as the computer time
> initial_seed=Sys.time()
> print (initial_seed)
[1] "2019-03-16 10:56:37 IST"

> # convert the time into a numeric variable
> initial_seed=as.integer(initial_seed)
> print (initial_seed)
[1] 1552726597

> # take the last five digits f the initial seed
> the_seed=initial_seed %% 100000
> print(the_seed) # 
[1] 26597

> # do your simulation
> set.seed(26597)
> print(rnorm(3))
[1] -0.4759375 -1.9624872 -0.1601236

> # reproduce your simulation
> set.seed(26597)
> print(rnorm(3))
[1] -0.4759375 -1.9624872 -0.1601236

6 thoughts on “How to select a seed for simulation or randomization”

  1. In the second line, don’t you mean “initial_seed=as.integer(Sys.time())” rather than “initial_seed=as.integer(initial_seed)”?


    1. You will get different seeds in different runs. However, since there are only 100000 possible seeds that this code can generate, there is a really good chance that someone, even you, will get the same seed like the one you’ve got. (Easy riddle: what is this chance?)


You are welcome to comment

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.