Course
Parallel Computing and Big Data
Teacher(s): Jeroen Engelberts
Research field: Data Science
Dates: Period 2 - Oct 25, 2021 to Dec 17, 2021
Course type: Field
Program year: First
Credits: 3
Course description
Nowadays, even mobile phones and tablets have multi-core central processing units (CPUs), as do the
simplest laptops and desktop PCs. Using their combined compute power, however, is not trivial. This is as true for
small systems as it is for the world's largest compute systems. In data science, making efficient use of all
available compute power is a required skill that needs to be learned. In this course you will be taught how to have all cores
take part in a single task, or to have each core work on its own share of the total task.
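As an illustration of the second approach, where each core works on its own share of the total task, the following is a minimal sketch using Python's standard multiprocessing module. It is not taken from the course material: the task (summing squares) and the helper names sum_of_squares, split and parallel_sum_of_squares are hypothetical and chosen only for this example.

```python
import os
from multiprocessing import Pool


def sum_of_squares(chunk):
    """Work done by one worker: sum the squares of its share of the data."""
    return sum(x * x for x in chunk)


def split(data, n_parts):
    """Split data into n_parts contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum_of_squares(data, n_workers=None):
    """Hand one chunk to each worker process, then combine the partial sums."""
    n_workers = n_workers or os.cpu_count() or 1
    with Pool(processes=n_workers) as pool:
        partial_sums = pool.map(sum_of_squares, split(data, n_workers))
    return sum(partial_sums)


if __name__ == "__main__":
    numbers = list(range(100_000))
    print(parallel_sum_of_squares(numbers))
```

Each worker process receives one chunk and returns a partial result, and the parent process combines the partial results. Whether this is actually faster than a plain loop depends on how much work each chunk represents compared with the cost of starting processes and copying data to them.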
Modern-day researchers quite often have to rely on systems larger than their own. In the Netherlands, many use
the national supercomputer clusters, Lisa and Cartesius, at SURFsara. Like most other large shared computer
systems in research, these systems run UNIX or Linux as their operating system. On top of that, many of
them use a batch system to give multiple users a fair share of the total resources. During the course,
students will get hands-on experience with UNIX and batch systems.
After practicing with the batch system, students will be taught the different types of parallel programming, with Python
as the programming language. Although C and Fortran are very common in high-performance computing (HPC), it is
also possible to use parallelism in Python, the language of choice for many researchers in the data science field.
The course comprises a Bash (Unix shell) course, a Python recap, an introduction to Jupyter
Notebooks and a programming course on working with different parallel modules and packages in
Python. For the latter, the “Python Parallel Programming Cookbook” is used. Although referred to as a cookbook,
it contains a fair amount of theory, which builds a foundation for a deeper understanding of parallel paradigms.
Prerequisites
Programming Basics, Mathematics, Statistics
Course literature
The following mandatory readings (presented in alphabetical order) are considered essential for your learning experience. These readings are also part of the exam material. Changes to the reading list will be communicated on CANVAS.
Book:
Zaccone (2019) – Python Parallel Programming Cookbook, 2nd Edition, Packt Publishing, ISBN-13: 978-1-78953-373-6 (https://learning.oreilly.com/library/view/python-parallel-programming/9781789533736/)