A step into data?

Is Exercism's vision to be purely about programming languages? If so, what is the definition of a “programming language”? Is there any room to look at data processing and its associated languages and technologies? How does it fit with the plans for the next year?

We already have a PL/SQL track, but that looks at the procedural abilities of the “query language”. Transact-SQL could offer the same. The current problems fit that approach better, but perhaps a new set of exercises could be created to help us learn how to explore, and perhaps update, data in various languages?

Thinking about it I think it would be possible to define a range of problems and collect small sample datasets to learn things like

Has this been discussed before? If not, what do you think about the idea?

We also have jq and Awk, possibly among others.

What do you imagine Exercism would add to what is already available?

The same as with the tracks that already exist. Some of the languages featured have plenty of online resources, and yet Exercism brings:

  • mentoring
  • polyglot community
  • ability to do the same exercise in multiple languages
  • gamified and test-driven learning experience
  • automated feedback

So I think data querying, even in things like Pandas, could benefit, though RDF would probably benefit more than Pandas.

Currently, we support Turing-complete languages.

I’m generally quite up for exploring other topics via Exercism (e.g. I’d like a dedicated course on functional programming or machine learning), but we need to get higher priority things sorted first (first and foremost being financially sustainable) before we’d have resource to think about expanding beyond what we have.

I am struggling to imagine what this would entail.

Maybe the problem is that I am not clear on what ‘data processing’ can be.

The following paragraph is a product of a plausibly very limited imagination. Please shed light on my blind spots if any are evident to you.

This seems boring. Do you expect different languages to offer substantially different ways of processing data into the same result? I fear lots of different words for merely the same thing. We already have that with many programming languages (e.g. if … then … else … vs. … ? … : … vs. if … { … } else { … } etc.) but we tend to not talk about that stuff a lot.

Maybe I am too quick to worry about boringness. I am easily bored.

How, hypothetically, might one test some act of data processing, other than by essentially comparing (e.g. ==) the end result to some golden end-product?

I do not doubt there are interesting insights to be had about data processing. However, the ones I can imagine (math, essentially) seem like poor fits for the Exercism formula.

Please tell me how I’m wrong. I’d like to be: it would be really nice if there were ways of learning about data processing other than by reading math books + library docs.

There are folks who are interested in this, but don’t necessarily have the time or inclination to lead it (me, @BethanyG, …)

I am actively interested in doing this too, but I think doing it properly should be different from a standard Exercism track. Or at least have some serious thought before we dive into that model.

Hi @MatthijsBlom, I will try! There is so much more to data processing than using if statements! Many data querying languages are declarative, and if statements are not used that often. A classic problem you may want to solve is to discover pairs of politicians who co-starred in films (there are a few!). To solve it, you have to express the problem in a language of your choosing. In SQL you could use some joins, possibly with Common Table Expressions. In Cypher you would describe a pattern, and likewise in RDF, but the two would look very different.
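To make the SQL variant concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`appearance`, `politician`), the function name `costar_pairs`, and the sample rows are all invented for illustration; a real exercise would ship its own dataset and schema.

```python
import sqlite3


def costar_pairs(appearances, politicians):
    """Return sorted pairs of politicians who appear in at least one film together.

    `appearances` is a list of (person, film) tuples and `politicians` is a
    list of names. Both layouts are assumptions made for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE appearance (person TEXT, film TEXT)")
    conn.execute("CREATE TABLE politician (person TEXT)")
    conn.executemany("INSERT INTO appearance VALUES (?, ?)", appearances)
    conn.executemany(
        "INSERT INTO politician VALUES (?)", [(name,) for name in politicians]
    )
    query = """
        WITH political_cast AS (          -- CTE: film credits of politicians only
            SELECT a.person, a.film
            FROM appearance AS a
            JOIN politician AS p ON p.person = a.person
        )
        SELECT DISTINCT x.person, y.person
        FROM political_cast AS x
        JOIN political_cast AS y          -- self-join on the shared film
          ON x.film = y.film AND x.person < y.person
        ORDER BY x.person, y.person
    """
    pairs = conn.execute(query).fetchall()
    conn.close()
    return pairs


# Hypothetical sample data: "Cy" is an actor but not a politician.
APPEARANCES = [
    ("Ada", "Film One"), ("Bo", "Film One"), ("Cy", "Film One"),
    ("Ada", "Film Two"), ("Dee", "Film Two"),
]
POLITICIANS = ["Ada", "Bo", "Dee"]
```

The `x.person < y.person` condition keeps each pair once and drops self-pairs. A Cypher or SPARQL solution would state the same "two people, one film" pattern directly instead of joining tables.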

How about discovering whether you can travel between two cities by rail, given a list of services with their stops? To solve it, you have to find whether at least one path of existing connections links the two. You could use graph algorithms in an imperative language, or you could describe the patterns in a more declarative language.
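As a rough illustration of the imperative route, the sketch below runs a breadth-first search over the stops. It assumes each service is an ordered list of stops and that trains run in both directions along it; the function name `can_travel` and the sample services are hypothetical.

```python
from collections import defaultdict, deque


def can_travel(services, origin, destination):
    """Return True if `destination` is reachable from `origin` by rail.

    Assumption for this sketch: every service connects consecutive stops
    in both directions, and changing trains at a shared stop is allowed.
    """
    # Build an undirected graph linking consecutive stops of each service.
    graph = defaultdict(set)
    for stops in services:
        for a, b in zip(stops, stops[1:]):
            graph[a].add(b)
            graph[b].add(a)

    # Breadth-first search outward from the origin.
    seen = {origin}
    queue = deque([origin])
    while queue:
        city = queue.popleft()
        if city == destination:
            return True
        for neighbour in graph[city] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return False


# Hypothetical services: two connected lines and one isolated line.
SERVICES = [["A", "B", "C"], ["C", "D"], ["E", "F"]]
```

With this data, `can_travel(SERVICES, "A", "D")` is true because the traveller can change at `C`, while `"A"` to `"F"` is not reachable. A declarative language would express the same question as a variable-length path pattern rather than an explicit queue.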

I think that mentoring problems like that would be far from boring! And you could solve the same problem in different languages because the solutions will be different too. Still, familiarity with the problem and underlying data would make the learning easier.

Yes, it could be by comparing datasets. It could be done by checking if the data returned conforms to a given “shape”. Some languages support schemas that could be used to verify the correctness of the result.
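Both kinds of check can be sketched in a few lines of Python. This is only an assumption about what a test runner might do, not how any existing Exercism track works; `same_rows`, `conforms`, and the `SHAPE` mapping are invented names.

```python
from collections import Counter


def same_rows(actual, expected):
    """Compare two result sets as multisets of rows, ignoring row order."""
    return Counter(map(tuple, actual)) == Counter(map(tuple, expected))


def conforms(rows, shape):
    """Check that every row is a dict with exactly the fields and types in `shape`."""
    return all(
        isinstance(row, dict)
        and row.keys() == shape.keys()
        and all(isinstance(row[field], kind) for field, kind in shape.items())
        for row in rows
    )


# Hypothetical expected "shape" of a result row.
SHAPE = {"city": str, "stops": int}
```

Comparing as multisets avoids failing a correct query just because it returns rows in a different order, and the shape check accepts any values as long as the fields and types match. Languages with their own schema support could replace `conforms` with a native validation step.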

Perhaps we have different assumptions about what “data processing” is. I’m not talking about Digital Signal Processing, or (image) categorisation problems from Machine Learning. I’m talking about working with existing, structured datasets and extracting information from them, or modifying them.

Definitely it will need serious thought, but I don’t think it will need to be very different from a standard Exercism track.

I’m not sure how much time I have, but I’m looking for meaningful ways to contribute to a project I enjoyed using a lot. In my day-to-day work I use, and mentor people in, these kinds of languages as much as those that are currently more present on Exercism, so perhaps that’s an area where I could help… I’ll have to think more about it, and consider the discussions I have already started elsewhere in this forum.

The jq and awk tracks might be an existing precedent or at least a starting point as to what is possible and what are the challenges ahead.

I’ll keep thinking…