Scrabble Dig Deeper approaches

The Dig Deeper sections on the exercises are a great way of checking out and understanding different approaches to the problems.

On the Scrabble exercise, the Dig Deeper solutions go over two possible ways to solve it: using a map, or using a switch statement, with the switch being the faster one.

I was wondering if there is any place for the array-of-ints solution, which should be even faster and is still quite ‘vanilla’, at least in C/C++. Is it not considered idiomatic or good practice in Go, and therefore not recommended?

Here is a quick example of what I mean. The tmpLetter variable is superfluous in the code below; we can add to the score directly, making it even a bit faster.

package scrabble

// pointsTable stores the points for each letter. The first position is the points for the letter a,
// the second for letter b, and so forth.
var pointsTable = [26]int{
//  A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q,  R, S, T, U, V, W, X, Y, Z
    1, 3, 3, 2, 1, 4, 2, 4, 1, 8, 5, 1, 3, 1, 1, 3, 10, 1, 1, 1, 1, 4, 4, 8, 4, 10}

// Score returns the score of a given word in scrabble.
func Score(word string) int {
    var score int
    var tmpLetter byte
    for i := 0; i < len(word); i++ {
        if word[i] > 'Z' {
            tmpLetter = word[i] - 'a'
        } else {
            tmpLetter = word[i] - 'A'
        }
        score += pointsTable[tmpLetter]
    }
    return score
}

Have you compared the performance of array vs map lookups? This is essentially a map lookup with a custom hashing function.
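
A minimal benchmark for that kind of comparison, runnable with go test -bench=. (a sketch; the sample words are arbitrary picks of mine, not the exercise’s test data):

package scrabble

import "testing"

// benchWords are arbitrary sample inputs; any representative words would do.
var benchWords = []string{"cabbage", "quixotic", "OXYPHENBUTAZONE", "za"}

func BenchmarkScore(b *testing.B) {
	for i := 0; i < b.N; i++ {
		for _, w := range benchWords {
			Score(w)
		}
	}
}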

Hmm, interesting.

With this one, I get around 90ns/op:
BenchmarkScore-8 13100623 87.07 ns/op 0 B/op 0 allocs/op

package scrabble

// pointsTable stores the points for each letter. The first position is the points for the letter a,
// the second for letter b, and so forth.
var pointsTable = [26]int{
//  A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q,  R, S, T, U, V, W, X, Y, Z
    1, 3, 3, 2, 1, 4, 2, 4, 1, 8, 5, 1, 3, 1, 1, 3, 10, 1, 1, 1, 1, 4, 4, 8, 4, 10}

// Score returns the score of a given word in scrabble.
func Score(word string) int {
    var score int
    for i := 0; i < len(word); i++ {
        if word[i] > 'Z' {
            score += pointsTable[word[i]-'a']
        } else {
            score += pointsTable[word[i]-'A']
        }
    }
    return score
}

With the proposed hashmap, around 950ns/op:
BenchmarkScore-8 1251448 950.3 ns/op 0 B/op 0 allocs/op

// Package scrabble scores a scrabble word.
package scrabble

import "unicode"

var lookup = map[rune]int{
	'a': 1, 'e': 1, 'i': 1, 'o': 1, 'u': 1, 'l': 1, 'n': 1, 'r': 1, 's': 1, 't': 1,
	'd': 2, 'g': 2,
	'b': 3, 'c': 3, 'm': 3, 'p': 3,
	'f': 4, 'h': 4, 'v': 4, 'w': 4, 'y': 4,
	'k': 5,
	'j': 8, 'x': 8,
	'q': 10, 'z': 10,
}

// Score takes a word and returns its scrabble score.
func Score(word string) (score int) {
	for _, ltr := range word {
		score += lookup[unicode.ToLower(ltr)]
	}
	return score
}

Since they use different loops, and one uses runes instead of bytes, converts case, etc., the comparison is not fair. Here is my attempt to make it a bit fairer.

I’m getting around 750ns/op.
BenchmarkScore-8 1573298 747.1 ns/op 0 B/op 0 allocs/op

// Package scrabble scores a scrabble word.
package scrabble

var lookup = map[rune]int{
	'a': 1, 'e': 1, 'i': 1, 'o': 1, 'u': 1, 'l': 1, 'n': 1, 'r': 1, 's': 1, 't': 1,
	'A': 1, 'E': 1, 'I': 1, 'O': 1, 'U': 1, 'L': 1, 'N': 1, 'R': 1, 'S': 1, 'T': 1,
	'd': 2, 'g': 2,
	'D': 2, 'G': 2,
	'b': 3, 'c': 3, 'm': 3, 'p': 3,
	'B': 3, 'C': 3, 'M': 3, 'P': 3,
	'f': 4, 'h': 4, 'v': 4, 'w': 4, 'y': 4,
	'F': 4, 'H': 4, 'V': 4, 'W': 4, 'Y': 4,
	'k': 5, 'K': 5,
	'j': 8, 'x': 8,
	'J': 8, 'X': 8,
	'q': 10, 'z': 10,
	'Q': 10, 'Z': 10,
}

// Score takes a word and returns its scrabble score.
func Score(word string) (score int) {
	for _, ltr := range word {
		score += lookup[ltr]
	}
	return score
}

Curiously, changing the lookup to map[byte]int and looping over it with a ‘traditional’ for loop, it balloons to around 1400ns/op. I’m curious as to why, as I thought it should be faster than the range loop, since it does not make any copies.
BenchmarkScore-8 832183 1433 ns/op 0 B/op 0 allocs/op

// Score takes a word and returns its scrabble score.
func Score(word string) (score int) {
	for i := 0; i < len(word); i++ {
		score += lookup[word[i]]
	}
	return score
}

But the array solution seems to be much faster.

If speed is your goal, why not eliminate all the comparisons by using a bigger lookup table? Make it handle the full ASCII range, with scores for both uppercase and lowercase letters. Then you can just iterate over the characters of the string, doing a table lookup and an addition for each.

Below is my 8th version, as I don’t use Go, but you probably get the idea:

[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 3, 2,
  1, 4, 2, 4, 1, 8, 5, 1, 3, 1, 1, 3, 10, 1, 1, 1, 1, 4, 4, 8, 4, 10, 0,
  0, 0, 0, 0, 0, 1, 3, 3, 2, 1, 4, 2, 4, 1, 8, 5, 1, 3, 1, 1, 3, 10, 1,
  1, 1, 1, 4, 4, 8, 4, 10 ] ( 0 >r swap ( a:@ n:r+ ) s:each! drop r> ) curry: score \ s -- n
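
For readers who don’t speak 8th, a rough Go equivalent of the same big-table idea could look like this (a sketch of my own, not the poster’s code; Go’s indexed composite literal leaves all unnamed entries at zero, so only the letter positions need spelling out):

package scrabble

// bigTable has one entry per possible byte value; every non-letter byte
// scores zero, and both cases of each letter carry the same score.
var bigTable = [256]int{
	'A': 1, 'B': 3, 'C': 3, 'D': 2, 'E': 1, 'F': 4, 'G': 2, 'H': 4, 'I': 1,
	'J': 8, 'K': 5, 'L': 1, 'M': 3, 'N': 1, 'O': 1, 'P': 3, 'Q': 10, 'R': 1,
	'S': 1, 'T': 1, 'U': 1, 'V': 4, 'W': 4, 'X': 8, 'Y': 4, 'Z': 10,
	'a': 1, 'b': 3, 'c': 3, 'd': 2, 'e': 1, 'f': 4, 'g': 2, 'h': 4, 'i': 1,
	'j': 8, 'k': 5, 'l': 1, 'm': 3, 'n': 1, 'o': 1, 'p': 3, 'q': 10, 'r': 1,
	's': 1, 't': 1, 'u': 1, 'v': 4, 'w': 4, 'x': 8, 'y': 4, 'z': 10,
}

// Score returns the scrabble score of word with no comparison per byte.
// A nice side effect: a byte can never exceed 255, so the compiler can
// drop the bounds check on the [256]int index.
func Score(word string) (score int) {
	for i := 0; i < len(word); i++ {
		score += bigTable[word[i]]
	}
	return score
}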

Does that show any measured speed improvement over the simpler array solution?

It’s the same method but using a bigger lookup table. I have not measured it, but it should be faster, as the bigger lookup table eliminates all the comparisons and requires just a table lookup and an addition per character.

It would be good to actually benchmark it to see if the difference is measurable 🙂 It’s easy to assert that dropping a conditional will make a difference … but that’s not the same as actually demonstrating that to be the case.


It also eliminates the need to subtract character values before the table lookup, and it handles both uppercase and lowercase characters, so it saves more than just a conditional.

Yup. Integer subtraction is very low cost, though. Handling both upper- and lowercase in the same lookup means a larger table, which might have a negative impact on performance if you take the table creation into account 🙂


But the smaller array might be cache-friendlier.
On modern platforms with many processes, multiple CPUs, multiple levels of caches, prefetching, instruction reordering, and speculative execution, it’s hard to predict the effects of these kinds of micro-optimizations.
You have to benchmark, and even doing that right is not easy.


It’s still a very small lookup table, so cache issues are unlikely. It’s also precalculated and gets compiled directly into a word in my 8th version, so table creation cost should not be a big problem.

I used a lookup table of the same size to calculate prime hashes for words when finding anagrams, to eliminate the need to sort, and with a large dictionary the speed difference was easily noticeable.
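
As an aside, here is that prime-hash trick in a Go sketch (my illustration, assuming lowercase ASCII words; for long words the uint64 product can overflow, which turns exact equality into a probabilistic check): each letter maps to a distinct prime, and because multiplication is commutative and prime factorization is unique, two words share a product exactly when they are anagrams.

// letterPrimes assigns a distinct prime to each letter 'a' through 'z'.
var letterPrimes = [26]uint64{
	2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
	43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101,
}

// primeHash multiplies the primes of a word's letters. Two words get the
// same product exactly when they are anagrams, so no sorting is needed.
func primeHash(word string) uint64 {
	h := uint64(1)
	for i := 0; i < len(word); i++ {
		h *= letterPrimes[word[i]-'a']
	}
	return h
}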

That’s still a whole lot of conjecture and not much data. The larger lookup table may be more efficient. It may be less efficient. It may be very close. The best way to tell if it makes any difference - in any direction - would be some benchmarks.


Okay, I made a real-life test using a wordlist, and it shows that using a bigger lookup table for this task makes the program about two times faster.

First, the version with the small lookup table:

var total-score

[ 1, 3, 3, 2, 1, 4, 2, 4, 1, 8, 5, 1, 3, 1, 1, 3, 10, 1, 1, 1, 1, 4, 4, 8, 4, 10 ]
( 0 >r swap ( dup 'Z n:> if 'a else 'A then n:- a:@ 0 ?: n:r+ ) s:each! drop r> ) curry: score \ s -- n


: app:main
  "kotus_sanat.txt" f:slurp ( score total-score n:+! ) s:eachline
  total-score @ . cr ;
ohjaus@raspberrypi:~ $ time /opt/8th/bin/rpi64/8th scrabble2.8th 
1633402

real	0m0.523s
user	0m0.498s
sys	0m0.026s
ohjaus@raspberrypi:~ $

And the version with the bigger lookup table:

var total-score

[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 3, 2,
  1, 4, 2, 4, 1, 8, 5, 1, 3, 1, 1, 3, 10, 1, 1, 1, 1, 4, 4, 8, 4, 10, 0,
  0, 0, 0, 0, 0, 1, 3, 3, 2, 1, 4, 2, 4, 1, 8, 5, 1, 3, 1, 1, 3, 10, 1,
  1, 1, 1, 4, 4, 8, 4, 10 ] ( 0 >r swap ( a:@ 0 ?: n:r+ ) s:each! drop r> ) curry: score \ s -- n

: app:main
  "kotus_sanat.txt" f:slurp ( score total-score n:+! ) s:eachline
  total-score @ . cr ;
ohjaus@raspberrypi:~ $ time /opt/8th/bin/rpi64/8th scrabble.8th 
1633402

real	0m0.264s
user	0m0.253s
sys	0m0.009s
ohjaus@raspberrypi:~ $

Did you run multiple iterations?

Also note, it might be worth benchmarking smaller inputs, too! It’s easy to optimize for a single test case, but forget that it might negatively impact other inputs 🙂
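
In Go, that could be done with sub-benchmarks, roughly like this (a sketch; the inputs are arbitrary picks of mine, not the exercise’s test data):

package scrabble

import "testing"

func BenchmarkScoreSizes(b *testing.B) {
	inputs := map[string]string{
		"short":  "za",
		"medium": "quixotic",
		"long":   "OXYPHENBUTAZONE",
	}
	for name, word := range inputs {
		word := word // capture for the closure (needed before Go 1.22)
		b.Run(name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				Score(word)
			}
		})
	}
}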

It’s easy to claim X is faster. “Faster” can be complex and messy and needs to account for many different things. If we’re going to assert that it is faster in the Digging Deeper docs, we should actually test the hypothesis.

If an approach is faster for some inputs and not others, that’s also worth being explicit about. Otherwise it introduces churn where someone else runs tests with different inputs and shows that the fastest solution is actually slower (in a different situation)!

Yes, the version using the bigger lookup table was about 0.2 seconds faster on average. I used a Finnish-language dictionary as test input and counted the total Scrabble score for every word in the dictionary. With small test data, both versions are fast enough to make measuring hard.

I’m getting roughly a 20% speed improvement with the benchmarks that the Go exercise ships with when using the bigger lookup table.

Small table:
BenchmarkScore-4         4893446               241.4 ns/op             0 B/op          0 allocs/op

Big table:
BenchmarkScore-4         6060932               198.1 ns/op             0 B/op          0 allocs/op
        // Small table
        if word[i] > 'Z' {
            score += smallTable[word[i]-'a']
        } else {
            score += smallTable[word[i]-'A']
        }

        // vs big table
        score += bigTable[word[i]]

Going from the map to the small table is a 15x speedup for me.
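
For completeness, a sketch of that last comparison as one file (my arrangement: the variants are renamed scoreSmall and scoreBig, and smallTable and bigTable are assumed to be the 26- and 256-entry arrays from earlier in the thread):

package scrabble

import "testing"

// sink keeps the compiler from optimizing the benchmarked calls away.
var sink int

// scoreSmall uses the 26-entry table plus one case test per byte.
func scoreSmall(word string) (score int) {
	for i := 0; i < len(word); i++ {
		if word[i] > 'Z' {
			score += smallTable[word[i]-'a']
		} else {
			score += smallTable[word[i]-'A']
		}
	}
	return score
}

// scoreBig uses the 256-entry table: one lookup and one add per byte.
func scoreBig(word string) (score int) {
	for i := 0; i < len(word); i++ {
		score += bigTable[word[i]]
	}
	return score
}

func BenchmarkScoreSmall(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = scoreSmall("OXYPHENBUTAZONE")
	}
}

func BenchmarkScoreBig(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = scoreBig("OXYPHENBUTAZONE")
	}
}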