Hamming distance in R

I’m more familiar with Python, and I don’t understand why this solution doesn’t work in R:

hamming <- function(strand1, strand2) {
  total = 0
  if (nchar(strand1) <= 1 | nchar(strand2) <= 1){
    if (strand1 == strand2){
      return(0)
      } else {
      return(1)
    }
  }
  for (i in strand1) {
    for (j in strand2){
      if (i == j){
        next
      } else {
        total = total + 1
      }
    }
  return(total)
  }
}

Can somebody help me?

What about it doesn’t work? Is there an error or failed test? Can you share that output?

From a quick glance, the final return statement seems to be inside the first loop, for (i in strand1).

hamming <- function(strand1, strand2){
  total = 0
  if (nchar(strand1) <= 1 | nchar(strand2) <= 1){
    if (strand1 == strand2){
      return(0)
      } else {
      return(1)
    }
  }
  for (i in strand1) {
    for (j in strand2){
      if (i == j){
        next
      } else {
        total = total + 1
      }
    }
return(total)
  }
}

Test passed 🥇
Test passed 🥇
Test passed 🥇
── Failure: complete distance in small strands ─────────────────────────────────
hamming(strand1, strand2) not equal to 2.
1/1 mismatches
[1] 1 - 2 == -1

Error: Test failed
Execution halted

The loops are not doing what you think. In Python, strings are designed to be iterable, so you can loop over characters like this. Many (most?) other languages work differently.

In R there are no scalars: everything is a vector. If you want a Python comparison, think of NumPy arrays, rather than anything in base Python. So what looks like a string is really a length-1 vector of strings internally:

> seq <- "ACTG"
> seq
[1] "ACTG"
> seq == c("ACTG")
[1] TRUE
> length(seq)
[1] 1
> for (i in seq) print(i)
[1] "ACTG"

Thus, your loops iterate over strings in the vector, not characters in each string.

I suspect your tests are passing for single-base inputs like “A”, but failing for anything longer. To solve the exercise, you will need a way to convert the string to a vector of single characters.
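As a minimal sketch of that conversion, one option in base R is strsplit() with an empty separator. The variable names here are only illustrative, and this is not necessarily the approach the exercise expects:

```r
# Split one string into a vector of single characters.
# strsplit() returns a list with one element per input string,
# so [[1]] extracts the character vector for the first (and only) string.
chars <- strsplit("ACTG", "")[[1]]

print(chars)          # [1] "A" "C" "T" "G"
print(length(chars))  # [1] 4

# A for loop now visits one character at a time.
for (base in chars) print(base)
```

unlist(strsplit(x, "")) gives the same result for a single string; the [[1]] form just makes it explicit that only the first list element is used.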

You will then hit the problem that strings of unequal length cannot be compared and should raise an error. Remember that the instructions for practice exercises are just a brief summary. You have to read the tests carefully to understand the full problem specification.
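To illustrate what "raise an error" can look like in R, here is a rough sketch using stop(). The helper name check_same_length and the message text are made up for this example; the tests may only check that some error is signalled. Note that returning a string is not the same as raising an error.

```r
# Hypothetical helper, for illustration only: signal an error with stop()
# when the two strings differ in length.
check_same_length <- function(a, b) {
  if (nchar(a) != nchar(b)) {
    stop("strands must have the same length")
  }
  invisible(TRUE)
}

check_same_length("ACT", "AGT")  # equal lengths: passes silently

# tryCatch() lets us look at the error message without halting the script.
result <- tryCatch(
  check_same_length("ACT", "AG"),
  error = function(e) conditionMessage(e)
)
print(result)
```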

I should add that most functions in R are designed to work with vector inputs, like this:

> seqs <- c("C", "ATG", "GCAATG")
> nchar(seqs)
[1] 1 3 6

Writing your own functions this way is considered good practice, and we often encourage it for students wanting to learn idiomatic R.

For compatibility with other tracks, only a few R exercises enforce this (and Hamming is probably not the best example).
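As a small illustration of that vectorized style, comparison operators such as != also work element by element on vectors of equal length, and sum() counts the TRUE values. The vectors a and b below are just example data:

```r
# Two character vectors of equal length (example data only).
a <- c("G", "A", "T")
b <- c("G", "C", "T")

# != compares position by position and returns a logical vector.
mismatch <- a != b
print(mismatch)  # FALSE, TRUE, FALSE

# TRUE counts as 1 and FALSE as 0, so sum() gives the number of mismatches.
n_mismatch <- sum(mismatch)
print(n_mismatch)  # [1] 1
```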

Thank you very much for the explanations, @colinleach.

I believe I’m closer now… The problem is probably my for loop, but I’m having difficulty finding a simpler solution…

hamming <- function(strand1, strand2){
  strand1_vc <- unlist(strsplit(strand1, ""))
  strand2_vc <- unlist(strsplit(strand2, ""))
  total = 0
  if (length(strand1_vc) <= 1 | length(strand2_vc) <= 1){
    if (strand1_vc == strand2_vc){
      return(0)
      } else {
      return(1)
    }
  }
  if (length(strand1_vc) != length(strand2_vc)){
    return("error! strands must have the same length!")
  }
  for (i in range(1, length(strand1_vc))){
      if (strand1_vc[i] == strand2_vc[i]){
        next
      } else {
        total = total + 1
      }
    }
return(total)
}

Test passed 🥇
Test passed 🥇
Test passed 🥇
Test passed 🥇
Test passed 🥇
── Failure: small distance ─────────────────────────────────────────────────────
hamming(strand1, strand2) not equal to 1.
1/1 mismatches
[1] 0 - 1 == -1

Error: Test failed
Execution halted
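One likely cause of this failure, judging from the code rather than from the test file itself: R's range() is not Python's range(). In R it returns the minimum and maximum of its arguments, so the loop above only visits the first and last positions. A short sketch of the difference, with seq_len() as one possible replacement:

```r
# range() returns c(min, max), not a sequence.
idx_range <- range(1, 5)
print(idx_range)  # [1] 1 5

# seq_len(n) produces 1, 2, ..., n (and an empty vector when n is 0).
idx_seq <- seq_len(5)
print(idx_seq)    # [1] 1 2 3 4 5

# Looping over range(1, 5) therefore checks only two positions.
for (i in idx_range) print(i)
```

1:n also gives a full sequence, but 1:0 counts down to c(1, 0), which is why seq_len() or seq_along() is often preferred for loop indices.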

Finally worked! 😊

hamming <- function(strand1, strand2){
  strand1_vc <- unlist(strsplit(strand1, ""))
  strand2_vc <- unlist(strsplit(strand2, ""))
  x <- length(strand1_vc)
  total = 0
  if (length(strand1_vc) == 0 | length(strand2_vc) == 0){
    if (sum(c(length(strand1_vc),length(strand2_vc))) == 0) {
      return(0)
      } else {
      return(1)
    }
  }
  if (length(strand1_vc) != length(strand2_vc)){
    stop('error! strands must have the same length!')
  }
  for (i in c(1:x)){
      if (strand1_vc[i] == strand2_vc[i]){
        next
      } else {
        total = total + 1
      }
    }
return(total)
}