Should "All Your Base" exercise allow an empty array as input?

Borderite · May 6, 2025, 7:36pm

A couple of years ago, there was a discussion on the “All Your Base” exercise. The essence of the discussion was that, because zero is zero regardless of the base, the rebased output for zero should be zero, not an empty string (in Bash). In the same spirit, shouldn’t the empty array be ruled out for the digits input?

The problem specification states:

    "1. Zero is always represented in outputs as [0] instead of [].",
    "2. In no other instances are leading zeroes present in any outputs.",
    "3. Leading zeroes are accepted in inputs.",
    "4. An empty sequence of input digits is considered zero, rather than an error.",
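For readers new to the exercise, here is a minimal Python sketch of how the four quoted rules might fit together. The name `rebase` follows the `property` field used in the test data; the signature, the use of `ValueError`, and the error messages are assumptions for illustration, not any track's actual stub.

```python
from typing import List


def rebase(input_base: int, digits: List[int], output_base: int) -> List[int]:
    """Convert digits (most significant first) from input_base to output_base.

    Sketch of rules 1-4; the error messages are illustrative only.
    """
    if input_base < 2:
        raise ValueError("input base must be >= 2")
    if output_base < 2:
        raise ValueError("output base must be >= 2")

    # Rules 3 and 4: leading zeroes and the empty sequence are accepted;
    # both simply contribute nothing to the accumulated value.
    value = 0
    for d in digits:
        if not 0 <= d < input_base:
            raise ValueError("all digits must satisfy 0 <= d < input base")
        value = value * input_base + d

    # Rule 1: zero is always emitted as [0], never [].
    if value == 0:
        return [0]

    # Rule 2: repeated division yields no leading zeroes.
    out: List[int] = []
    while value > 0:
        value, rem = divmod(value, output_base)
        out.append(rem)
    return out[::-1]  # remainders come out least significant first
```

In this sketch rule 4 costs nothing: an empty loop leaves the value at 0, and rule 1 then turns it into `[0]`. For example, `rebase(2, [1, 1, 1], 3)` returns `[2, 1]`.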

I propose that item 1 also be applied to the digits input and that item 4 be removed. Only two tests conflict with the proposed change. One of them, which explicitly tests whether [] is handled correctly, can be removed. In the other, which tests whether an input base of zero leads to an error, [] can be replaced with any nonempty array, say [0].

If there is no objection, I would like to submit a PR to the problem-specifications repository. I expect no existing solution in any track to fail because of the proposed change.

IsaacG · May 6, 2025, 7:41pm

Could you clarify what you mean by this? What exactly are you proposing be changed?

SleeplessByte · May 6, 2025, 7:52pm

What’s the point of this change? Could you elaborate in words how this makes the specification better, less confusing, and/or fix a bug?

Borderite · May 6, 2025, 8:58pm

@IsaacG @SleeplessByte

Thanks for your responses. In the “All Your Base” exercise, students are asked to write a function that converts a series of digits representing a number in a given base (the input base) to a series of digits representing the same number in a different base (the output base). For example, [1, 1, 1] represents 7 if the base is 2. If the output base is 3, the function converts it to [2, 1].
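The arithmetic behind that example, written with a subscript marking the base (a notation assumed here for illustration): the input digits are evaluated as a polynomial in the input base, and the value is then divided repeatedly by the output base, with the remainders read last to first.

$$
[1,1,1]_2 = 1 \cdot 2^2 + 1 \cdot 2 + 1 = 7, \qquad 7 = 2 \cdot 3 + 1, \quad 2 = 0 \cdot 3 + 2 \;\Rightarrow\; [2,1]_3
$$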

A question arises when the number (not its representation) is zero. For example, suppose that the base is 3. Then [0], [0, 0], [0, 0, 0], … all represent the same zero:

$$0 = 3 \times 0 + 0 = 3^2 \times 0 + 3 \times 0 + 0 = \dots$$

The current problem specification requires that the output from the function represent zero by [0]. I think that this is a reasonable choice and guarantees that the representation is unique.

For the input to the function, uniqueness is not very important, because the input is not compared against the expected answer. But I feel that the empty array for zero is too exotic (though I am fully aware that it can be justified in one way or another). It is like saying that the value written on a piece of paper is zero if nothing is written on it. I am just proposing ruling out the representation of zero by the empty array in the input to the function.

@SleeplessByte Is this a big deal? No. But in my opinion it makes the exercise less confusing.

IsaacG · May 6, 2025, 9:07pm

It’s arbitrary, yes. I wouldn’t call it “exotic”. There’s often a need to set arbitrary rules to round out specs and remove ambiguity. If you don’t define the meaning of [], you’re left with undefined behavior, which isn’t great. Alternatively, [] could be an “error: undefined”, but adding error handling to exercises doesn’t, on the whole, add value to them. A third approach would be to explicitly state that inputs will always contain at least one digit and that the empty [] does not need handling. But that’s just going to make some people want to add checks for it anyhow.

Arbitrarily defining [] as 0 simplifies the exercise to some extent. I’m not understanding how removing arbitrary or “exotic” definitions improves the exercise.
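The first two options for [] (define it as 0, or treat it as an error) can be sketched as a single switch. `digits_to_int` and its `empty_is_zero` flag are hypothetical names for illustration; no track is assumed to expose such a parameter.

```python
from typing import List


def digits_to_int(digits: List[int], base: int, empty_is_zero: bool = True) -> int:
    """Evaluate digits (most significant first) in the given base.

    empty_is_zero=True follows rule 4 ([] means 0);
    empty_is_zero=False treats [] as an input error instead.
    """
    if not digits and not empty_is_zero:
        raise ValueError("digits must not be empty")
    value = 0
    for d in digits:
        value = value * base + d  # an empty list leaves value at 0
    return value
```

Under the default, `digits_to_int([], 2)` returns 0, while `digits_to_int([], 2, empty_is_zero=False)` raises `ValueError`. The third option, a documented precondition that the digits are never empty, needs no code at all.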

SleeplessByte · May 6, 2025, 9:27pm

We also have to consider the burden such a change to the description places on tracks, as there are tracks relying on [] meaning [0]. If this change were made, which only marginally improves the description (paraphrasing your words), all tracks implementing this exercise would need to decide whether they:

- want to drop the test cases dealing with []
- want to add the new test cases replacing it with [0]
- want to add a track-specific insert re-adding the line that [] means [0] or that [] should error.

If there were a bug or a big improvement, I would help you advocate for this, but it seems like you just don’t like that someone decided to equate [] with [0], which I can understand. It’s mathematically not nice, but in my opinion it is very clear.

Be aware that not all implementing tracks have array types, and that there are languages in which [] does in fact equal false and languages in which [] equals [0].

If we choose to go ahead with this change, we first need to find out why the decision was made to have the “arbitrary” rule 4. A lot of problem-spec origin is safeguarded by tseng on GitHub, and I find it likely we’ll be able to find a prior discussion, PR, or issue.

It’s, unfortunately, not as simple as PRing a change and being done with it, unless it fixes a bug. I hope I have explained well enough why this is the case.

Borderite · May 6, 2025, 10:52pm

@IsaacG Thanks for your comments. I fully agree with your analysis, except for the educational value of error handling. By and large, my analysis was very similar to yours. I thought that removing the test directly involving [] and changing [] to [0] in the other test would be the least problematic, because solutions that can handle the current set of tests should have no problem handling the modified set. Anyway, thanks for your time.

Borderite · May 6, 2025, 11:14pm

@SleeplessByte Thanks for your opinion. I am aware that how the sequence of digits is expressed depends on the track. For example, Bash uses a character sequence containing space-separated numbers, where [] is translated to an empty string “”.

I agree with you that rule 4 is intriguing, especially because [] appears only twice in the entire test suite:

...
    {
      "uuid": "d68788f7-66dd-43f8-a543-f15b6d233f83",
      "description": "empty list",
      "property": "rebase",
      "input": {
        "inputBase": 2,
        "digits": [],
        "outputBase": 10
      },
      "expected": [0]
    },
...
    {
      "uuid": "e21a693a-7a69-450b-b393-27415c26a016",
      "description": "input base is zero",
      "property": "rebase",
      "input": {
        "inputBase": 0,
        "digits": [],
        "outputBase": 10
      },
...

Because the second one’s focus is on the zero value of “inputBase”, not on the empty “digits”, rule 4 matters only to the first one.

Anyway, this is what I know. I hope that someone monitoring this forum can tell us what we don’t know.

IsaacG · May 6, 2025, 11:33pm

There is high educational value in learning to do error handling. There are a number of exercises that require students to write error handling code. However, we found that too many exercises required error handling, so a large percentage of the solution code across exercises was somewhat repetitive error handling. All that error handling isn’t super useful after the third time and can distract from what is actually interesting about the exercises. As a result, it was decided to dial back the error handling requirements and keep them in only a few exercises.

Borderite · May 7, 2025, 7:05pm

@IsaacG I see your point. Putting too many topics in a single exercise does not help students. We probably already have enough exercises covering error handling.

Borderite · May 7, 2025, 7:17pm

@SleeplessByte I have tried to find out where rule 4 comes from. An earlier commit to the problem specification of “All Your Base” in 2016 shows some interesting exchanges. It seems that my view was not an outlier, though plenty of other opinions were expressed, too. The discussion then converged on leaving many decisions to the tracks.

In 2017, another commit was merged to make the current form of the rules. Unfortunately, I could not find any rationale for rule 4.

SleeplessByte · May 7, 2025, 7:55pm

I have read the discussion once again, thank you for finding it.

The rationale for rule 4 is given in that thread. Whilst multiple people agree that [] should error, multiple people also agree that [] signals 0.

With the current state of problem specs, we allow both approaches to exist among tracks. Tracks can decide whether they want to implement test cases that lead to an input error or test cases that handle it gracefully.

Changing the problem specs to remove either is therefore not in the spirit of problem specs (anymore). Whereas problem-specs previously had to be adhered to 100%, all tests are now optional for all tracks.

My personal opinion on this is now formed. Because the proposed change does not fix a bug, I don’t think it should be applied.

Does that make sense?

Borderite · May 7, 2025, 10:22pm

@SleeplessByte OK. Thanks for your time.

mk-mxp · May 11, 2025, 8:53am

This reminds me of discussions of the robustness principle. It is common to accept more on the input side than is considered valid on the output side (the [] here), but the downside is that implementations become incompatible with each other for these inputs if they are not clearly specified.

So, in fact it is not “exotic” or “uncommon” to have differing input and output “validity”. Having it specified as we have here overcomes the downside, so to me that is the right thing to do.

Do we need to test for it? To ensure compatibility between implementations the contract needs those tests, too. So I welcome the tests for it.
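The input/output asymmetry described in this post can be sketched as a normalization step: many spellings of a number are accepted, and exactly one is produced. `canonical` is a hypothetical helper for illustration, not part of the exercise's specified interface.

```python
from typing import List


def canonical(digits: List[int]) -> List[int]:
    """Map any accepted input spelling to the single canonical output form.

    Lenient side: [], [0], [0, 0] and [0, 0, 4, 2] are all accepted.
    Strict side: the result has no leading zeroes, and zero is [0].
    """
    i = 0
    while i < len(digits) and digits[i] == 0:
        i += 1  # skip leading zeroes
    return list(digits[i:]) or [0]  # an all-zero or empty input becomes [0]
```

With this helper, `canonical([0, 0, 4, 2])` gives `[4, 2]` and `canonical([])` gives `[0]`, which is the contract that tests for both the lenient inputs and the strict outputs would pin down.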

Borderite · May 11, 2025, 1:25pm

I’m a resilience principle person rather than a robustness principle person.

mk-mxp · May 11, 2025, 3:47pm

It’s a bit off-topic, but @Borderite, do you have a reference for the resilience principle you refer to? I don’t know of one. I know a bit about Resilience engineering, where the robustness principle helps build resilient systems (by tolerating a fair amount of change before something breaks). Normally, strictly narrowed input validation is a cause of non-adaptive, non-resilient systems…

Borderite · May 11, 2025, 5:56pm

Your interpretation is right. I probably should have said “resilience person” (though that sounds weird) or “resilient control person”. You are right that the two concepts are closely related. But in my understanding, resilient control puts more stress on recovery from errors (caused by various factors, including protocol violations).