Cloud-Based Data Processing

Information

The lectures and tutorials for this course will be held in person on Thursdays.

Content

This course introduces how modern data centers and clouds work. We will discuss the latest technologies and emerging trends that are driving the evolution of modern data centers, the fundamental design principles behind building scalable systems, and how to manage data efficiently in a cloud-native way and at large scale.

The special focus will be on cloud-based data processing, and in particular on how to design and build scalable services in light of the latest trends in data centers, from the infrastructure side (e.g., resource disaggregation, heterogeneous hardware, high-performance networks) to system stack support (e.g., virtualization, containers, resource management and scheduling) and modern workload demands (e.g., data management, ML, streaming, security and privacy).


Organization

  • 5 ECTS
  • SWS 2V + 2Ü
  • Lectures are held in English
  • Lectures are held on Thursdays from 2pm to 4pm.
  • The tutorial is held on Thursdays after the lecture, from 4pm to 5pm.
  • Lecture Room: Hörsaal im Galileo 8120.EG.001
  • For the GitLab repository see Moodle.
  • For the Mattermost channel see Moodle.
  • Students without TUM credentials can still access our Moodle page for the material; please send an email to michalis.georgoulakis@tum.de to obtain the password.

Prerequisites

This course is aimed at master-level students who have already taken some of the following (or similar) courses:

  • IN0008 Fundamentals of Databases
  • IN0009 Basic Principles: Operating Systems and Systems Software
  • IN0010 Introduction to Computer Networks and Distributed Systems

Material

Slides

The slides will be uploaded shortly before each lecture.

Assignments and Project work

We will apply many of the concepts covered in the lecture as part of hands-on homework assignments. The last 5-6 weeks of the semester will be reserved for project work.

Literature

This is not a standard course (i.e., there is no real textbook). Most material is taken from research papers, which will be referenced in the slides. However, the following list can be useful as either background or complementary reading.

  • "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems" by Martin Kleppmann
  • "Distributed Systems" by Maarten van Steen and Andrew S. Tanenbaum
  • "Principles of Distributed Database Systems" by M. Tamer Ozsu and Patrick Valduriez