Linux CLI for Data Science 2020

Last modified at: 2020-10-20 09:00:00+02:00

The 2020 episode at the Faculty of Mathematics, Physics and Informatics of Comenius University.

Lectures (B)
Tuesday, 11:00 - 13:40
Labs (H-6)
Monday, 16:30 - 18:00 (voluntary)


The goal of this lab is to

  • show you the cool things (your) computers are capable of
  • get you acquainted with UNIX-like operating systems, the tradition which powers much of modern computing
  • be a fun break from other classes

What you are studying is non-trivial already. It is not our job to punish you for choosing to do that but to give you some practical skills that will let you apply it straight away.

Lab Lectures

Lecture 1: Intro to Command Line

Discussed material:
  • History of UNIX-like operating systems
  • Text console, Shell and Secure Shell (SSH)
  • Shell Commands (short intro)
  • ... and more in the first set of slides
Supplementary resources:
  • The TTY demystified: so what exactly is this teletype that has been mentioned a few times? This article starts with a caveat that it is not particularly elegant, but once you read through it, you'll get a much more thorough understanding of (modern) UNIX-like systems and of UNIX history as well.
  • The History of Unix by Rob Pike: it is not every day that you get an important piece of (computing) history described by someone who helped with making it. Well worth the watch!

Lecture 2: Files and Directories

Discussed material:
  • UNIX-style file system
  • Directory tree and its important parts
  • Navigating the filesystem
  • Complete and autocomplete in BASH
  • ... and more in the second set of slides
Supplementary resources:
  • How dotfiles came to be: A short story (by Rob Pike once again) about how dotfiles (you know, the hidden files that start with a dot) came to be and what it says about the unintended effects of cutting corners and just "hacking around" a problem.
  • The history of the /usr split: A different story but a very similar moral. Read through it to find out how the /bin vs. /usr/bin split happened, how irrelevant it is these days, and how one needs to fight bad ideas so that they do not propagate.
  • Linux Filesystem Hierarchy: A deeper discussion of the standard Linux filesystem, describing the various directories in much greater detail than the slides ever could.

Lecture 3: Standard I/O, Pipes and Text Processing

Discussed material:
  • Standard Input/Output
  • Pipes
  • Introduction to Text Processing
  • ... and more in the third set of slides
Supplementary resources:
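A minimal sketch of how these topics fit together, given as an illustration rather than as material from the slides: each command writes to its standard output, and the pipe (|) feeds that into the standard input of the next command. The word list is made up for the example.

```shell
# Hypothetical input: one word per line.
words() {
    printf '%s\n' apple banana apple cherry banana apple
}

# sort groups equal lines together, uniq -c counts each group,
# sort -rn puts the highest count first, head keeps the top line,
# and awk reorders the columns to "word count".
top_word=$(words | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2, $1}')
echo "$top_word"
```

Running the pipeline one stage at a time (first words | sort, then adding uniq -c, and so on) is a good way to see what each tool contributes.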

Lecture 4: Processes and Signals

Discussed material:
Supplementary resources:
  • An introduction to UNIX processes: This piece gives you "yet another" rundown of what UNIX processes are about. What's interesting about it is the part about fork and exec, which we only went over quickly in the lecture. I would very much recommend taking a look at it.
  • Two great signals: SIGSTOP and SIGCONT: What do you do when you've got a long-running script that you cannot afford to (or just don't want to) stop but would very much like to at least pause? This article will tell you a bit about that.
  • Should you be scared of Unix signals?: A short attempt at making Unix signals look a bit less scary. It's a bit technical, but if you'd like to go a bit deeper, it is still well worth reading.
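
The pause-and-resume idea behind SIGSTOP and SIGCONT can be tried out with a few lines of shell. This is a minimal sketch under some assumptions: sleep 60 stands in for your long-running script, and the state letter printed by ps (T for a stopped process) is the one used by the common Linux and BSD implementations of ps.

```shell
# Start a placeholder long-running job in the background.
sleep 60 &
pid=$!

# Pause it. SIGSTOP cannot be caught or ignored by the process.
kill -STOP "$pid"
sleep 1
state_paused=$(ps -o stat= -p "$pid" | tr -d ' ')
echo "after SIGSTOP: $state_paused"

# Let it continue where it left off.
kill -CONT "$pid"
sleep 1
state_resumed=$(ps -o stat= -p "$pid" | tr -d ' ')
echo "after SIGCONT: $state_resumed"

# Clean up: terminate the placeholder job and reap it.
kill "$pid"
wait "$pid" 2>/dev/null || true
```

In an interactive shell, Ctrl+Z followed by fg or bg does something very similar, except that Ctrl+Z sends SIGTSTP, which a program is allowed to handle.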

Lecture 5: Users, Groups and Regular Expressions

Discussed material:
Supplementary resources:
  • Ken Thompson's Unix password: A story on how the password of one of the old-timers was cracked nearly 40 years later and why "shadowing" is generally not a bad idea.
  • The origins of grep: Brian Kernighan, one of the forefathers of UNIX, discusses how grep came to be, and it makes for a rather interesting story!
  • When it comes to regular expressions, it helps a lot to visualize what they match and how. There are two tools we recommend in this regard:
    1. Regex101 which is basically an integrated development environment (IDE) for regular expressions
    2. Regexper which nicely visualizes regular expressions as "proto programs". Here is a sample visualization.
  • If you'd like to play with regular expressions a bit, there is RegexGolf, Regex Tuesday or RegexCrossword. We recommend them all!
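
If you would rather try a regular expression straight in the terminal, here is a minimal sketch using grep -E (extended regular expressions). The sample lines are made up, and the pattern is deliberately simple: it only checks for a "something@something.tld" shape and is nowhere near a complete e-mail validator.

```shell
# Hypothetical sample data: one candidate per line.
sample_lines() {
    printf '%s\n' 'alice@example.com' 'not-an-email' 'bob@uni.sk'
}

# ^ and $ anchor the pattern to the whole line.
# [[:alnum:]._-]+   one or more letters, digits, dots, underscores or dashes
# @                 a literal at sign
# [[:alnum:].-]+    the domain part
# \.[[:alpha:]]{2,} a literal dot followed by at least two letters
pattern='^[[:alnum:]._-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}$'

matches=$(sample_lines | grep -E "$pattern")
echo "$matches"
```

Pasting the same pattern into one of the visualizers mentioned above shows which part of each line every piece of the expression consumes.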



The LISA conference (part of USENIX, the old UNIX organization) has had a workshop called Linux Productivity Tools. It's basically "zero to hero" in 89 slides. It's well worth checking out, especially if you are in a hurry.

Linux Productivity Tools slides

Historical Books

If you like books, here are two worth reading:

UNIX: A History and a Memoir by Brian W Kernighan

A historical account of how UNIX came to be by someone who was there when it happened. It will help you paint the proper picture of what is meant when people say stuff like "UNIX legacy" or "the UNIX era".

The Cuckoo's Egg: Tracking a Spy Through the Maze of Computer Espionage by Cliff Stoll

Strangely enough, this is a novel: a true story of a physicist who tracked one of the first documented "hackers" (cracker would really be a better term here, but I digress) whom he found snooping around his systems. The best part is that it's all real, down to the (obviously UNIX) commands that were used. Well worth a read!


Assignments: 50%
Exam: 50%

There will be one assignment per week. Each of them is worth 5% (plus some bonuses). You have up to a week to finish them, but most people manage to do it during the lab.

The exam will cover the content discussed at the Tuesday lectures.

Points Grade
(90, inf) A
(80, 90] B
(70, 80] C
(60, 70] D
(50, 60] E
[0, 50] FX
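
The table can also be read as a small shell function. This is only a sketch under two assumptions: points are whole numbers, and exactly 50 points counts as FX (every other band excludes its lower bound and includes its upper bound). The function name grade is made up for the example.

```shell
# Map a whole number of points to a grade, following the table above.
grade() {
    points=$1
    if   [ "$points" -gt 90 ]; then echo A
    elif [ "$points" -gt 80 ]; then echo B
    elif [ "$points" -gt 70 ]; then echo C
    elif [ "$points" -gt 60 ]; then echo D
    elif [ "$points" -gt 50 ]; then echo E
    else echo FX   # assumption: 50 points and below
    fi
}

grade 85
grade 90
```

Because each band includes its upper bound, 90 points is still a B, and an A needs at least 91.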