The definitive guide to learning data science languages

So far in #12in23 we’ve been compiling great resources for learning functional programming and systems languages. Now, for April, we’re exploring data science languages.

What are the best resources for learning data science and the languages and libraries that make it possible?

Let’s pool the things that have helped us! :slight_smile:

Python:

For those of us who learned it 15-20 years ago, the core-language classics were Lutz, “Learning Python” and “Programming Python”. There are probably a million better options online now, and I’ll leave it to younger people to point them out.

Other core-language books include Ramalho, “Fluent Python”, Reitz & Schlusser, “The Hitchhiker’s Guide to Python” and Danjou, “Serious Python”, all aimed at mid-level programmers looking to improve.

It’s important to remember that core Python is not a data science language! It was originally created as a glue language (competing with shell and Perl) to help sysadmins keep their servers running, and now it is a glue language for a set of third-party packages that contain all the science capability. As Bjarne Stroustrup pointed out in his discussion yesterday, the performance-critical parts of these packages are largely written in C/C++: numpy, scipy, jupyter, pandas, matplotlib, seaborn, numba, scikit-learn, tensorflow, pytorch… They are all developed by different project teams with different release schedules.

I found VanderPlas, “Python Data Science Handbook” valuable. It’s available in print and also as Jupyter notebooks on GitHub. A 2nd edition was published recently, but early reviews are mixed (and terrible for the Kindle edition).

People who complain about Python being slow are often using it wrongly. Gorelick & Ozsvald, “High Performance Python” may be instructive.
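A common example of “using it wrongly” is looping over numbers in pure Python when a library call could do the loop in compiled code. The minimal sketch below (not taken from Gorelick & Ozsvald; the function names are hypothetical) computes the same sum of squares both ways, assuming NumPy is installed. Exact timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python loop: every iteration runs in the interpreter."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(arr):
    """Vectorized: the loop runs inside NumPy's compiled code."""
    return float(np.dot(arr, arr))


if __name__ == "__main__":
    data = np.random.default_rng(0).random(1_000_000)
    as_list = data.tolist()  # plain Python floats for the loop version

    loop_t = timeit.timeit(lambda: sum_of_squares_loop(as_list), number=3)
    vec_t = timeit.timeit(lambda: sum_of_squares_numpy(data), number=3)
    print(f"loop: {loop_t:.3f}s  numpy: {vec_t:.3f}s")

    # Both agree up to floating-point rounding (summation order differs).
    print(np.isclose(sum_of_squares_loop(as_list), sum_of_squares_numpy(data)))
```

The design point is that the Python code stays the “glue”: it hands a whole array to NumPy once, instead of making a million interpreter-level multiplications.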

For testing, Okken, “Python Testing with pytest” is a standard.
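For a flavour of the pytest style, here is a minimal sketch (the file name and the `mean` function are hypothetical). pytest collects files named `test_*.py` and functions whose names start with `test`, and plain `assert` statements are enough, because pytest rewrites them to report detailed failure information.

```python
# test_stats.py: pytest collects files named test_*.py and functions named test_*


def mean(values):
    """Hypothetical function under test."""
    if not values:
        raise ValueError("mean() of empty sequence")
    return sum(values) / len(values)


def test_mean_of_integers():
    assert mean([1, 2, 3, 4]) == 2.5


def test_mean_of_single_value():
    assert mean([7]) == 7


def test_mean_rejects_empty_input():
    # Spelled out with try/except so the file runs without importing pytest.
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")
```

Run it with `pytest test_stats.py`. In real pytest code, the last test would more idiomatically use `with pytest.raises(ValueError):`.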

For ML, options include Chollet, “Deep Learning with Python” for TensorFlow and Stevens et al, “Deep Learning with PyTorch” for an alternative popular package.

This thread (started by Isaac) has some good resources. So do the Python track docs:

  • How to learn Python
  • Useful Python Resources
  • Problem-solving Resources (not Python-specific, but more general)


Kaggle (owned by Google) has some lightweight starter tutorials for ML/AI available for free: Kaggle Tutorials

Allen Downey’s blog has some great material on statistics. He’s also the author of “Think Python”. He makes all of his books and notebooks available for free online.


For deep learning, the fast.ai courses are excellent (see the fast.ai site).
Their Intro to Machine Learning for Coders course gets rave reviews as well.


Coursera has a Machine Learning Specialization based on Andrew Ng’s classes. You can audit the courses for free, but long-term access and certificates are paid-only.

Andrew Ng also runs DeepLearning.ai, where he has developed other courses and related materials. Some are paid, and some are free to audit. His Stanford course CS229 is also up on YouTube.

Anaconda, the developer and maintainer of the Anaconda distribution of Python, has many learning resources, though the bulk of them are paid.

Project Jupyter has a nifty Try Jupyter section to get your feet wet with notebooks and notebook-like interfaces.

scikit-learn has some solid starter tutorials as part of its documentation.

Julia:

The official documentation is copious and detailed: not always beginner-friendly, but a good place to start.

At least the julialang.org website is up to date, unlike parts of the web. Julia v1.0 was released in 2018, and before that there were a number of breaking changes in the language. Unfortunately, Google never forgets, so beware of outdated pre-1.0 material in search results.

With post-1.0 stability, better print books are now appearing.

  • Lauwens & Downey “Think Julia” is for absolute beginners.
  • Engheim, “Julia as a Second Language” is probably more appropriate for Exercism users; Amazon says it will be published in May (US edition), but I already have the PDF from Manning and my print copy is on a FedEx truck heading this way.
  • Kamiński, “Julia for Data Analysis” came out recently and is getting good reviews.
  • Phillips, “Practical Julia” will be out later this year, from a publisher with a good track record.

For a quick introduction to any language, the Learn X in Y Minutes site can be helpful, and it has a Julia page. If you want a bit more, Root, “The Julia Language Handbook” is small and cheap.

Performance: Julia can be exceptionally fast, but not if you write it as Python with slightly different syntax. If you care, be sure to read the performance tips and take them seriously. The syntax is (deliberately) somewhere between Matlab and Python, but this is a very different language under the hood.

R:

Fewer resources from me here - I mostly learned R through Coursera years ago, and don’t have as many reference points. The linked specialization is OK, but there might be better resources available elsewhere.

Interesting that the Coursera/Cape Town course is now so good. I was a beta tester, years ago, and it was full of errors. I guess they fixed it!

Maybe? I did not take the course, so I can’t personally say. I am going off of ratings, and those always have to be taken with a grain of salt (or more!).

But I also find that many online resources for Julia, Python, and R have… largely good info, but then egregious errors in places. So it’s very difficult to steer folx to the “best resources”. You have to develop a ‘nose’ for what might be current or good for your use case in a given resource, and go from there. It’s a moving target…

Edited to add: I also came across this 2022 course for Julia, which looked interesting, but haven’t dug into it yet.

And just found this online for free! Lovely of them to provide it.