[bank-account] potential BUG

local go version: go1.20.6 linux/amd64
exercise: Bank Account in Go

I may have found an infrastructure bug.
My tests timed out on the server, but I couldn’t find where I was messing up, even when comparing against community solutions. I requested mentoring, and bernot-dev replied that he thinks it’s an infrastructure issue. We both ran the tests locally and they pass.

Let me know what other information I may provide.

I’m not sure if this will give you access: direct mentoring link to my attempt at this exercise, with the source copied below.

My code, copied from iteration 5:

EDIT: I initially didn’t include all the code by accident

package account

import (
	"sync"
)

// Define the Account type here.
type Account struct {
	sync.RWMutex
	balance int64
	active  bool
}

func Open(amount int64) *Account {
	if amount < 0 {
		return nil
	}

	return &Account{
		balance: amount,
		active:  true,
	}
}

// Balance returns the balance of the account
func (a *Account) Balance() (int64, bool) {
	a.RLock()
	defer a.RUnlock()

	// fail early in lieu of returning invalid state on API
	if a.balance > 0 && !a.active {
		panic("an inactive account has nonzero balance")
	} else if a.balance < 0 {
		panic("account found with negative balance")
	}

	return a.balance, a.active
}

func (a *Account) Deposit(amount int64) (int64, bool) {
	a.Lock()
	defer a.Unlock()

	if !a.active || a.balance+amount < 0 {
		return a.balance, false
	} else {
		a.balance += amount
		return a.balance, true
	}
}

func (a *Account) Close() (int64, bool) {
	a.Lock()
	defer func() {
		if !a.active {
			a.balance = 0
		}
		a.Unlock()
	}()

	if !a.active {
		return a.balance, false
	} else {
		a.active = false
		return a.balance, true
	}
}

Hello :slight_smile:

I presume that’s not your full file, as there’s an uneven number of brackets. Could you update it to the full iteration, please?

@junedev @andrerfcsantos Any ideas?

Hey!

From the code you are showing, there’s a missing bracket at the end of the Open function. Not sure if it’s missing from your iteration too or you just didn’t copy it.

Also note that the Open function returns an *Account, but you are not returning anything in the last return statement.

Sorry that took a bit, not sure what happened the first time.

It seems the bug is still present: my code ran into the same timeout, so I put panic() back in all the functions, with the same result.

It doesn’t pass after reverting to the exercise stub either, so I suspect something in the tests is taking too long or looping.

Consider sharing your code if you’d like help with something. Without seeing what you’re doing, it’s pretty hard to guess what may be wrong.

Sure thing. As I said, I put panic() statements in all functions just to see what happened.

package account

import "sync"

// Account structure
type Account struct {
    runningAmount int64        // The current amount
    accountMutex  *sync.Mutex  // To prevent race conditions; the lock is per-account,
                               // so transactions on different accounts can run in parallel
    accountOpen   bool         // Is the account live
}

// Open a new account in the bank
func Open(amount int64) *Account {
    panic("Troubleshooting")
    // Can't open with negative amount
    if amount < 0 {
        return nil
    }
    var mu sync.Mutex
    newAccount := Account{
        runningAmount: amount,
        accountOpen:   true,
        accountMutex:  &mu,
    }
    return &newAccount
}

// Get the balance of the account
func (a *Account) Balance() (int64, bool) {
    panic("Troubleshooting")
    // nil
    if a == nil {
        return 0, false
    }
    a.accountMutex.Lock()
    // Account was closed
    if !a.accountOpen {
        a.accountMutex.Unlock()
        return 0, false
    }
    balance := a.runningAmount
    a.accountMutex.Unlock()
    return balance, true
}

// Deposit (amount > 0) or withdraw (amount < 0) money from the account
func (a *Account) Deposit(amount int64) (int64, bool) {
    panic("Troubleshooting")
    if a == nil {
        return 0, false
    }
    a.accountMutex.Lock()
    // Account is closed
    if !a.accountOpen {
        a.accountMutex.Unlock()
        return 0, false
    }
    // No amount
    if amount == 0 {
        balance := a.runningAmount
        a.accountMutex.Unlock()        
        return balance, false
    }
    // Not enough money
    if -amount > a.runningAmount {
        a.accountMutex.Unlock()
        return a.runningAmount, false
    }
    // Process the transaction
    a.runningAmount += amount
    balance := a.runningAmount
    a.accountMutex.Unlock()
    return a.runningAmount, true
}

// Close the account
func (a *Account) Close() (int64, bool) {
    panic("Troubleshooting")
    if a == nil {
        return 0, false
    }
    // Account already closed
    if !a.accountOpen {
        return 0, false
    }
    a.accountMutex.Lock()
    balance := a.runningAmount
    a.runningAmount = 0
    a.accountOpen = false
    a.accountMutex.Unlock()
    return balance, true
}

I just found this thread while preparing to make my own post about the issue. I’m using the in-browser editor. Unlike other exercises, where one can force a failure with panic() statements or by leaving out the functions the tests expect, this exercise always results in a timeout, even with a one-line file containing only package account.

Ultimately it isn’t a big deal if it stays broken, but it would be nice to get 100% on the Go track.

Are you sure it is a bug? There are 1200 solutions to this exercise on the Go track, and while not 100% of them pass the tests, a large majority do.

Can you show your code and the feedback you’re getting? (Code in a fenced block, as text, rather than a picture. But if the response is on a web page, I suppose we can manage with a picture of that.)

Thank you for the quick reply. As a user of the site and not a maintainer, I can’t really be sure whether there’s a bug, but all the evidence does point to a problem with the infrastructure (and not necessarily the tests themselves).

As you can see from the screenshot, any request for a solution to be tested, even with no executable code, results in a timeout. For many of these exercises, as a form of TDD, I build my solution by first running the tests, watching them fail, and then incrementally getting them to pass one at a time. I’ve never been able to get this exercise to return any result other than a timeout. Are you able to get test results? Maybe it’s something on my end?

By way of update, I was curious to see what would happen if I ran the tests against submitted community solutions, which I understand must have passed the tests at the time of upload. I only tried three, but all of them timed out multiple times, which suggests to me that something has changed with the testing setup since then. They were also nearly identical to my solution, so if the community solutions were passing, mine should be too, and vice versa (which matches what I observed, since they all time out).

Surprisingly, I did manage to get a passing test after running the suite nearly 50 times, so I must have luckily beaten the timeout by a narrow margin.

Sorry for the delay on seeing this.

It does seem to be a bug, as I’ve tried to run my solution that previously passed the tests, and now it times out. My solution is also not doing anything out of the ordinary, like calling panic or missing functions.

I’ll take a look at this.

I did some investigation and found the likely cause.

The issue seems to be that for this exercise, we are using the -race flag, which activates the Go data race detector.

The key here is that this flag increases the execution time, and I believe it’s also having a big impact on the compilation time. This explains the behavior @teamsonic and @jfgobin are seeing, where calls to panic() or unimplemented functions still cause a timeout: the issue is not just the runtime, but the compile time.

For instance, the program @jfgobin posted here doesn’t even compile, because the variable balance is declared and not used, and the test runner does try to compile the program before running the tests. So, if the problem were just the runtime, this should be a fail-fast situation for the test runner. But it isn’t. I was able to confirm this by building and running the test runner image locally: it took ~8 seconds to run the tests regardless of whether the program failed to compile or ran all the tests and passed. Disabling the -race flag made the tests run in ~2 seconds. This means the -race flag is slowing down the tests by ~4x!

It’s also worth mentioning that while building and running the tests using the test runner image locally, the test runner always produced the expected results. This indicates to me that there’s nothing fundamentally wrong with the exercise, the test runner, or the infrastructure; the tests just take a while due to the -race flag.

I see two possible solutions for this:

  1. We could increase the timeout for the Go test runner to account for exercises that use the data race detector. We could try 30 seconds. The fact that the tests sometimes pass with the current 20-second default timeout is a good indication that adding 10 seconds will probably be enough for the vast majority of the affected solutions. (WDYT? @ErikSchierboom, @iHiD)
  2. We could remove the -race flag for this exercise. This would not be ideal, as the flag was added to fix this issue.

Opened a pull request about this: Increase Go test runner timeout to 30 seconds by andrerfcsantos · Pull Request #77 · exercism/tooling-invoker · GitHub

cc @iHiD @ErikSchierboom

Hey. Sorry for not replying before. Looks like @ErikSchierboom and I both missed this. I’m not convinced I want to take the test runner to 30s. That effectively blocks a whole machine for a while, reducing our throughput quite a bit.

Could we do a two-pass process? One run without the race flag (fast), and then a second run with the race flag if the first passes? Or even just a first pass that checks it compiles and a second that runs the tests? Would that help at all?

How many exercises need the race flag? Is it something globally necessary or just for some exercises?

Or could the race test be optional?

I think this would be more of a band-aid fix. Running the tests without the flag first to see if they fail, and re-running them with the flag if they pass:

  • It would help tests fail fast when they don’t compile, but it doesn’t help when the code is correct. In that case, we would have to run the tests twice, potentially making the timeout problem worse when the code compiles, runs, and is correct.
  • It would require changes to the test runner

Currently we are using this flag only for this exercise. Even here, the flag isn’t strictly necessary, but it’s a nice feature to show students, and it catches code that seems correct because it passes the tests but has a potential data race, like the one reported here.

The tests for this exercise already try to catch some concurrency issues, but some data races are tricky to reliably detect when using regular tests. The usefulness of the -race flag comes from it being able to catch more potential data races by instrumenting the code.

The docs for this exercise do talk about this flag, so it’s already presented as an option to CLI users. We could remove the use of this flag for this exercise on the test-runner side. The downside is that a few solutions that pass the tests might have data races in them just waiting to happen. I guess this is something that can be discussed in mentoring, but it would be nice if the tests could just fail.

If increasing the timeout of the test runner is not a viable solution, I think it’s probably best to just remove the flag from the exercise.

I would personally probably just either:

  • have the test runner ignore the flag
  • have the CLI users opt-into the flag

I think doing one implies doing the other.

For the test runner to ignore the flag, we can remove it from the custom.testingFlags key of the .meta/config.json of the exercise, which means that students will still be able to run the tests with this flag locally if they run go test manually. The docs for the exercise do mention this flag and how to use it.
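
For reference, the change amounts to dropping the flag from that key. The key name custom.testingFlags is as described above; the exact value shape below is an assumption on my part, not copied from the track:

```json
{
  "custom": {
    "testingFlags": ["-race"]
  }
}
```

Removing the "-race" entry (or the key) would make the test runner invoke plain go test, while local go test -race runs keep working.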

So, making the flag opt-in does mean removing it from the exercise config, which will make the test runner run without it.

I’ll create a PR to remove the flag from the exercise config. We can always re-evaluate if we want to introduce it again at some point in the future. This will make a small number of solutions that have subtle data races pass the tests, but I think not having timeouts in potentially all submissions is more important at this point than having the tests cover 100% of the cases.