New exercise idea

A program for balancing chemical equations, for example:
Input:
“H2 + O2 > H2O”
The number of atoms of each element on the left must equal the number after the >, so the output should be: “2H2 + O2 > 2H2O”
That is just an idea of what the output could look like. It could just as easily be an array, for example one with a left side and a right side. Balancing reactions with an algorithm is much better than doing it in your head, because that is sometimes hard.

Does 2H2O have 2 H and 2 O? Or 4 H and 2 O? And how would you differentiate?

4 H and 2 O. There is effectively a parenthesis around the whole compound, and the number in front of it (the so-called coefficient) multiplies everything inside. The exercise could write the parentheses explicitly to make this clearer, but normally you don’t write them in chemistry.

There is simply a number in front of the compound, and a 1 is not written. Then come the elements, each followed by a number, and a 1 is not written there either; only numbers higher than one appear.
Example: 1(H2O2), simplified: H2O2
Or H2O1, simplified: H2O
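
The notation rules above can be sketched as a small parser. This is only a minimal sketch, not a proposed reference solution: the name `count_atoms` and both regular expressions are assumptions made for illustration, and parentheses inside formulas, charges, and isotopes are deliberately not handled.

```python
import re
from collections import Counter

# Optional leading coefficient, then one or more "element + optional count" groups.
TERM = re.compile(r"(\d*)((?:[A-Z][a-z]?\d*)+)")
ATOM = re.compile(r"([A-Z][a-z]?)(\d*)")


def count_atoms(term):
    """Return {element: total atoms} for a term such as '2H2O' (hypothetical helper)."""
    match = TERM.fullmatch(term.strip())
    if match is None:
        raise ValueError(f"cannot parse term: {term!r}")
    coefficient = int(match.group(1) or 1)  # an unwritten coefficient means 1
    counts = Counter()
    for symbol, number in ATOM.findall(match.group(2)):
        # an unwritten atom count also means 1
        counts[symbol] += coefficient * int(number or 1)
    return dict(counts)


print(count_atoms("2H2O"))  # {'H': 4, 'O': 2}
print(count_atoms("H2O2"))  # {'H': 2, 'O': 2}
```

The coefficient multiplies every atom count in the compound, which is why 2H2O comes out as 4 H and 2 O.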

How would people know how to read or solve it if they don’t know chemistry beforehand?

I’m a bit wary about exercises like this. Consider the problems we used to have with Scale Generator, which involved music concepts. Even though it didn’t require understanding music theory and the final solution isn’t that complicated, it still confused people quite a lot because they just couldn’t grasp the music parts, and it ended up being deprecated.

Chemistry also uses subscripts to make it clearer what is a molecule coefficient vs an atom multiplier. Without subscripts, it might be hard to tell the two apart when written that way.

Parsing “Mg2O3” into something like ["Mg", "Mg", "O", "O", "O"] is in and of itself a fair bit of work (and is also similar to the Run Length Encoding decoder). What if we removed the parsing aspects and instead used a list or dict for the inputs?

got = solve_coefficients({"H": 2}, {"O": 2}, {"H": 2, "O": 1})
assert got == [2, 1, 2]
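
For illustration, here is one way a dict-based version could be brute-forced. It is a sketch under assumptions: `balance_counts` is a hypothetical name, and unlike the call above it takes reactants and products as two separate lists so the function knows which side each compound is on. The `max_coefficient` limit is also an assumption.

```python
from itertools import product


def balance_counts(reactants, products, max_coefficient=9):
    """Try every combination of small coefficients; return the first that balances."""
    elements = {e for compound in reactants + products for e in compound}
    n_left = len(reactants)
    total = n_left + len(products)
    for coefficients in product(range(1, max_coefficient + 1), repeat=total):
        left, right = coefficients[:n_left], coefficients[n_left:]
        balanced = all(
            sum(k * c.get(e, 0) for k, c in zip(left, reactants))
            == sum(k * c.get(e, 0) for k, c in zip(right, products))
            for e in elements
        )
        if balanced:
            return list(coefficients)
    return None  # nothing balances within the coefficient limit


got = balance_counts([{"H": 2}, {"O": 2}], [{"H": 2, "O": 1}])
assert got == [2, 1, 2]
```

Because the combinations are tried in increasing order, the first hit is the smallest set of coefficients when the equation has a single balanced form.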

I’ll leave it to others to argue about whether this is a good exercise. Maybe I can clarify chemistry notation for a bit of background (I had a multi-decade career as a research chemist).

There are 4 corners around an atomic symbol. Right-subscript is always used for the atom count: H₂O. If you really want to include parsing in the exercise, maybe use a LaTeX-style underscore, like H_2O or (for multi-digit numbers) C_{10}H_{21}OH (C₁₀H₂₁OH, decanol).

Less relevant, but I’ll say it for completeness:

  • Right-superscript is for charge: Mg^{2+}, a magnesium ion.
  • Left-superscript is for isotope: ^{14}C, used for radiocarbon dating.

Even the most experienced chemist would struggle to guess that 2H2O2 is 2H_2O_2, two molecules of hydrogen peroxide.

I’d better stop now…

Haha, I didn’t know about the isotope notation, and I’d known but forgotten about charge :D Thanks for the color!

It’s not that difficult for them to understand: it’s just multiplying the numbers on the left and right sides of the equation and adding coefficients in front of the individual compounds. It’s simpler than the exercise you mentioned.

Thanks for suggesting the exercise!

What’s the programming skill we’re nurturing in this exercise? What is someone doing that’s valuable from a programming education perspective? What interesting solutions/approaches will people likely come up with?

For example some methods:

  1. String parsing – breaking chemical formulas down into elements and atom counts

  2. Data representation – e.g. dictionaries, matrices, lists (element → number)

  3. Brute force algorithm – generating and testing combinations of coefficients – equations will be short, so the user can check them by hand

  4. Symbolic calculations – working with an expression instead of specific values (e.g. SymPy)

  5. Recursion/backtracking – alternative to brute force

  6. Input and output validation – ensuring balance and correctness of the notation
    The tests would not use huge equations with 30 terms, but rather smaller ones, so that the user can check the results and debug errors.
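
As a middle ground between the matrix representation and the symbolic tools in the list above, a solution could also solve the balance conditions as a linear system. The sketch below swaps in plain Gauss-Jordan elimination on exact fractions from the standard library rather than SymPy. The name `balance_by_elimination` and the list-of-dicts input are assumptions, and the sketch only handles equations with exactly one balanced form.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def balance_by_elimination(reactants, products):
    """Solve the element-balance system exactly (hypothetical helper)."""
    compounds = reactants + products
    elements = sorted({e for c in compounds for e in c})
    n = len(compounds)
    # One row per element, one column per compound; products count negatively.
    rows = [
        [Fraction(c.get(e, 0)) * (1 if i < len(reactants) else -1)
         for i, c in enumerate(compounds)]
        for e in elements
    ]
    pivots, r = [], 0
    for col in range(n):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [x / rows[r][col] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    free = [c for c in range(n) if c not in pivots]
    if len(free) != 1:
        raise ValueError("expected exactly one degree of freedom")
    solution = [Fraction(0)] * n
    solution[free[0]] = Fraction(1)
    for row_index, pivot_col in enumerate(pivots):
        solution[pivot_col] = -rows[row_index][free[0]]
    # Scale the fractions up to the smallest whole numbers.
    scale = reduce(lambda a, b: a * b // gcd(a, b), (x.denominator for x in solution), 1)
    ints = [int(x * scale) for x in solution]
    divisor = reduce(gcd, ints)
    ints = [v // divisor for v in ints]
    if any(v <= 0 for v in ints):
        raise ValueError("no balanced form with positive coefficients")
    return ints


propane = balance_by_elimination(
    [{"C": 3, "H": 8}, {"O": 2}], [{"C": 1, "O": 2}, {"H": 2, "O": 1}]
)
print(propane)  # [1, 5, 3, 4]
```

Unlike trying combinations, this does not depend on the coefficients being small, at the cost of noticeably more code.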

OK, so I’m tentatively :+1: on this with a few questions/caveats.

Firstly, I think we have to use subscript numbers if we want to do this.

Secondly, it’s not clear to me at all what “H2 + O2 > H2O” means. You say this is a chemical reaction. What does the > mean in this situation? ChatGPT tells me it’s a non-balanced chemical equation, and that 2H2O is the balanced version. Why are we showing a non-balanced version in the first place? We’ll need to be able to explain this whole concept to someone in a simple paragraph for me to get fully on board with this :slight_smile:

Of course, the coefficients (the numbers that get filled in) will be kept small so that results can be checked and debugged quickly and the exercise is not too demanding.

The unbalanced equation will be the input to some function, and the balanced equation will be the output.
Explanation:
A chemical equation like H2 + O2 > H2O shows what reacts (on the left) and what is formed (on the right). > means “changes into.” For this to make sense, the number of atoms of each element (here H and O) must be the same on the left as on the right. That is why equations get balanced.
Example: “1Fe + 1O2 > 1Fe2O3”
It works like multiplication in mathematics: the number in front of a compound multiplies every element in that compound. On the right side of the equation we have 1×Fe2 = 2 atoms of iron (Fe) and 1×O3 = 3 atoms of oxygen (O).
To make the number of atoms on the left match the number on the right, the output will be: “4Fe + 3O2 > 2Fe2O3”. Only the numbers in front of the compounds change; the numbers inside a compound are not adjusted.
Iron: 4×Fe on the left, and 2×Fe2 on the right, which is 4×Fe.
Oxygen: 3×O2 on the left, which is 6×O, and 2×O3 on the right, which is also 6×O. That is how the equation is balanced.
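
The same arithmetic can be written out as a short check. This is just an illustrative sketch: `side_totals` is a hypothetical helper, and each term is given as a (coefficient, atom counts) pair rather than parsed from text.

```python
from collections import Counter


def side_totals(terms):
    """Sum coefficient × atom count per element for one side of an equation."""
    totals = Counter()
    for coefficient, atoms in terms:
        for element, count in atoms.items():
            totals[element] += coefficient * count
    return totals


# 4Fe + 3O2 > 2Fe2O3
left = [(4, {"Fe": 1}), (3, {"O": 2})]
right = [(2, {"Fe": 2, "O": 3})]

print(side_totals(left) == side_totals(right))  # True: 4 Fe and 6 O on both sides
```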

Thanks. So the input is an unbalanced equation and the output is a balanced equation?

Exactly, I’m glad you could understand.

Could you give me some examples of a few (5?) inputs and outputs please. Ideally covering the range of possible things people need to work out.

Of course, here are some examples. All coefficients are less than 10 to avoid timeouts in some solutions. These examples should cover the requirements:
1.)
Input: Na + Cl2 → NaCl
Output: 2Na + Cl2 → 2NaCl
2.)
Input: N2 + H2 → NH3
Output: N2 + 3H2 → 2NH3
3.)
Input: C3H8 + O2 → CO2 + H2O
Output: C3H8 + 5O2 → 3CO2 + 4H2O
4.)
Input: Fe + O2 → Fe2O3
Output: 4Fe + 3O2 → 2Fe2O3
5.)
Input: C2H6 + O2 → CO2 + H2O
Output: 2C2H6 + 7O2 → 4CO2 + 6H2O
6.)
Input: Al + O2 → Al2O3
Output: 4Al + 3O2 → 2Al2O3
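
To show how the outputs above relate to the inputs, here is a sketch of the final formatting step, where a coefficient of 1 is left unwritten. The function name `render_equation` and the idea of passing the coefficients as a flat list in left-to-right order are assumptions for illustration only.

```python
def render_equation(equation, coefficients, arrow="→"):
    """Put coefficients in front of each compound, leaving out any 1 (hypothetical helper)."""
    left, right = equation.split(arrow)
    numbers = iter(coefficients)

    def with_coefficient(term):
        k = next(numbers)
        return term if k == 1 else f"{k}{term}"

    # The left side is rendered first so coefficients are consumed in order.
    new_left = " + ".join(with_coefficient(t.strip()) for t in left.split("+"))
    new_right = " + ".join(with_coefficient(t.strip()) for t in right.split("+"))
    return f"{new_left} {arrow} {new_right}"


print(render_equation("C3H8 + O2 → CO2 + H2O", [1, 5, 3, 4]))
# C3H8 + 5O2 → 3CO2 + 4H2O
```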

Thanks. I think this has the potential to be interesting enough to add as an exercise. @ErikSchierboom Any thoughts with your multi-language head on. Anyone else want to chime in with an opinion?

So should I still reach out to @erikschierboom, or is there anything else you want to help with?