Elixir Enumerable question

SoldierCoder · May 23, 2023, 10:26pm

Say you are processing a list using one of the Enum.______ functions. How does Elixir know when you have reached the end of the list and need to stop trying to process with that Enum.____ function? I’m not looking for “just have faith in the language” type of answers, please. I want to know how Elixir knows to stop. Is it checking something internally, and if so, what?

MatthijsBlom · May 23, 2023, 10:56pm

The source code is here: elixir/enum.ex at v1.14.5 · elixir-lang/elixir · GitHub

I hardly know Elixir, but it looks to me like each data structure is required to provide their own implementation, or else use a generic default implementation. In the case of lists and Enum.map, it looks like evaluation is delegated to :lists.map, which I do not know how to find the source code of. However, as Elixir’s lists are singly linked lists, I expect it to repeatedly pattern match against [head | tail] until this fails.

I do not know how exactly Elixir represents its lists, but it is probably essentially ~~a struct (element_pointer, tail_pointer), with the null pointer indicating an empty list~~ a struct (tag, element_pointer, tail_pointer) where the tag indicates whether this struct represents an empty list or a cons cell.

padeso · May 23, 2023, 11:00pm

As MatthijsBlom says, it’s up to the enumerated data type to supply that information. See here: Enum — Elixir v1.14.5. “In Elixir, an enumerable is any data type that implements the Enumerable protocol.”

aifrak · May 24, 2023, 9:13am

To add more details to @MatthijsBlom’s answer, lists.map is using recursion (Source).

As result, at some points, the list that you have passed as argument to the Enum.______ function will be empty and it will stop looping. It uses recursion because data are immutable in Elixir:

Due to immutability, loops in Elixir are written differently from imperative languages.

Source

MatthijsBlom · May 24, 2023, 9:25am

The source code is

map(F, List) when is_function(F, 1) ->
    case List of
        [Hd | Tail] -> [F(Hd) | map_1(F, Tail)];
        [] -> []
    end.

map_1(F, [Hd | Tail]) ->
    [F(Hd) | map_1(F, Tail)];
map_1(_F, []) ->
    [].

Do you know why map pattern matches first, instead of immediately delegating to map_1? That is, why is the implementation not

map(F, List) when is_function(F, 1) ->
    map_1(F, List)

map_1(F, [Hd | Tail]) ->
    [F(Hd) | map_1(F, Tail)];
map_1(_F, []) ->
    [].

aifrak · May 24, 2023, 9:50am

According to this commit title, the reason is for optimization, but I don’t know more than that.

Previously, the map function looked more simple:

map(F, [H|T]) ->
    [F(H)|map(F, T)];
map(F, []) when is_function(F, 1) -> [].

SoldierCoder · May 28, 2023, 8:14pm

So it “knows” its done when it patterrn matches the empty tail?

MatthijsBlom · May 28, 2023, 9:51pm

map is implemented as a recursive function, so technically no: it does not itself pattern match on the tail, but instead delegates that to the recursive call. But otherwise yes: a pattern match on the empty list signals the end of operations.