This is a Monte Carlo solution: we simulate drawing the tiles a large number of times and then calculate what proportion of the simulated draws would let us form the given word. I've written the solution in R, but you could use any other programming language, say Python or Ruby.
I'll first describe how to simulate one draw. Let's start by defining the tile frequencies.
# The tile frequency used in English Scrabble, using "_" for blank.
tile_freq <- c(2, 9, 2, 2, 4, 12, 2, 3, 2, 9, 1, 1, 4, 2, 6, 8, 2, 1, 6, 4, 6, 4, 2, 2, 1, 2, 1)
tile_names <- as.factor(c("_", letters))
tiles <- rep(tile_names, tile_freq)
## [1] _ _ a a a a a a a a a b b c c d d d d e e e e e e
## [26] e e e e e e f f g g g h h i i i i i i i i i j k l
## [51] l l l m m n n n n n n o o o o o o o o p p q r r r
## [76] r r r s s s s t t t t t t u u u u v v w w x y y z
## 27 Levels: _ a b c d e f g h i j k l m n o p q r ... z
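As a quick sanity check on the frequencies, here is a minimal sketch (not part of the original answer) that confirms there is one count per tile type and that the counts add up to the 100 tiles of an English Scrabble set. Naming the vector by tile is an addition made here for readability.

```r
# Same frequencies as above, with names attached for easier lookup.
tile_freq <- c(2, 9, 2, 2, 4, 12, 2, 3, 2, 9, 1, 1, 4, 2, 6, 8, 2, 1,
               6, 4, 6, 4, 2, 2, 1, 2, 1)
tile_names <- c("_", letters)
names(tile_freq) <- tile_names

# One frequency per tile type, and 100 tiles in total.
stopifnot(length(tile_freq) == length(tile_names))
stopifnot(sum(tile_freq) == 100)

# Look up a few counts by tile name.
tile_freq[c("_", "e", "q", "z")]
```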
Then encode the word as a vector of letter counts.
word <- "boot"
# A vector of the counts of the letters in the word
word_vector <- table(factor(strsplit(word, "")[[1]], levels = tile_names))
## _ a b c d e f g h i j k l m n o p q r s t u v w x y z
## 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 1 0 0 0 0 0 0
Now draw a sample of seven tiles and encode them in the same way as the word.
tile_sample <- table(sample(tiles, size=7))
## _ a b c d e f g h i j k l m n o p q r s t u v w x y z
## 1 0 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 1 0 1 0 0 0
Finally, calculate which letters are missing...
missing <- word_vector - tile_sample
missing <- ifelse(missing < 0, 0, missing)
## _ a b c d e f g h i j k l m n o p q r s t u v w x y z
## 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0
... and sum the number of missing letters and subtract the number of available blanks. If the result is zero or less we succeeded in spelling the word.
sum(missing) - tile_sample["_"] <= 0
## FALSE
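The single-draw check can also be wrapped in a small helper so it can be tried on hand-picked racks. This is only a sketch under a few assumptions: the names `count_letters` and `can_spell` are made up for illustration, counts are kept as plain named integer vectors rather than tables, and `pmax` is swapped in for the `ifelse` call as another way to clip negative counts to zero.

```r
# Hypothetical helpers for illustration; the names are not from the original answer.
tile_names <- c("_", letters)

# Count how many times each tile type occurs in a character vector of tiles.
count_letters <- function(chars) {
  counts <- tabulate(match(chars, tile_names), nbins = length(tile_names))
  names(counts) <- tile_names
  counts
}

# TRUE if the rack can spell the word, using blanks for any missing letters.
can_spell <- function(word, rack) {
  word_vector <- count_letters(strsplit(word, "")[[1]])
  rack_vector <- count_letters(rack)
  # Letters the word needs beyond what the rack holds; negatives are clipped to 0.
  missing <- pmax(word_vector - rack_vector, 0)
  # The word itself contains no "_", so blanks never count as missing.
  sum(missing) <= rack_vector[["_"]]
}

# The draw shown above: three letters missing, only one blank.
can_spell("boot", c("_", "e", "l", "n", "o", "u", "w"))
# A rack where the single blank stands in for the second "o".
can_spell("boot", c("b", "o", "_", "t", "e", "a", "r"))
```

The first call reproduces the failed draw from the walkthrough; the second succeeds because the blank covers the one missing letter.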
The sampled draw shown earlier didn't let us spell the word, though. Now we just need to repeat this many times and calculate the proportion of successful draws. All this is done by the following R function:
word_prob <- function(word, reps = 50000) {
  tile_freq <- c(2, 9, 2, 2, 4, 12, 2, 3, 2, 9, 1, 1, 4, 2, 6, 8, 2, 1, 6, 4, 6, 4, 2, 2, 1, 2, 1)
  tile_names <- as.factor(c("_", letters))
  tiles <- rep(tile_names, tile_freq)
  word_vector <- table(factor(strsplit(word, "")[[1]], levels = tile_names))
  successful_draws <- replicate(reps, {
    tile_sample <- table(sample(tiles, size = 7))
    missing <- word_vector - tile_sample
    missing <- ifelse(missing < 0, 0, missing)
    sum(missing) - tile_sample["_"] <= 0
  })
  mean(successful_draws)
}
Here reps is the number of simulated draws. Now we can try it out on a number of different words.
> word_prob("boot")
[1] 0.0072
> word_prob("red")
[1] 0.07716
> word_prob("axe")
[1] 0.05088
> word_prob("zoology")
[1] 2e-05
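Since every simulated draw is an independent success-or-failure trial, the precision of these estimates can be gauged with the usual binomial standard error, sqrt(p * (1 - p) / reps). The sketch below is an addition, not part of the original solution, and the helper name `mc_std_error` is made up for illustration. It suggests the estimate for "zoology" is very rough: 2e-05 out of 50000 draws corresponds to a single successful draw.

```r
# Hypothetical helper: approximate standard error of a Monte Carlo proportion.
mc_std_error <- function(p_hat, reps) {
  sqrt(p_hat * (1 - p_hat) / reps)
}

mc_std_error(0.0072, 50000)  # "boot"
mc_std_error(2e-05, 50000)   # "zoology": about as large as the estimate itself
```

For rare words, increasing reps is the simplest way to tighten the estimate, since the standard error shrinks with the square root of the number of draws.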
from Digg Top Stories http://stats.stackexchange.com/questions/74468/probability-of-drawing-a-given-word-from-a-bag-of-letters-in-scrabble