CS 6291: Embedded Systems Optimization

Instructional Team

Santosh Pande
Creator, Instructor
Catherine Gamboa
Course Developer
Tyson Bailey
Head TA


In the 21st century, embedded systems are the systems of the future, with cellular phones, smartphones, and tablets becoming the dominant platforms for computing and communication. The ubiquity of information, and the need for the computation that accompanies it, is driving this revolution, which new paradigms such as the Internet of Things (IoT) will only accelerate. These platforms have distinctive processing requirements: real-time operation, high performance at low energy, compact code and data segments, and, most importantly, an ever-changing software stack. These requirements have led to a complete redesign of both the hardware and the software stack from the ground up. For example, new processors such as ARM, DSPs, and network processors were developed, along with new virtual machines such as Dalvik, new operating systems such as Android, and new programming models and compiler optimizations. The goal of this course is to take a holistic view of the embedded system stack, focusing on processor architectures, instruction sets, and the advanced compiler optimizations that exploit them. The course covers the following segments:

Part I: Embedded Processor Architectures

  1. Introduction to instruction-level parallelism: Pipelining, RISC vs CISC, Very Long Instruction Word (VLIW) instruction sets, Hardware complexity (Superscalars) vs Compiler Optimizations (VLIWs) Tradeoffs
  2. Design of Instruction Set Architectures: VLIW encoding, Exposing vs Hiding Architectural Details, RISC vs CISC ISAs, Opportunities for compilers, Dependences and Independences, Instruction bundling for VLIW, Compact instruction representation
  3. Embedded Micro-architectures: Scratch-pad (software-managed) memory, clustered register files, special arithmetic, addressing modes for special needs (DSPs), branches in embedded domains: speculation and predication, unbundling branches

Part II: Software Optimizations

  1. Introduction to Compiler phases: Overall working of the compiler, overview of phases, intermediate representation, backend code generation issues
  2. Register Allocation Foundation: RISC philosophy (load-store architecture), Live range analysis, Interference Graph, Graph Coloring Based Register Allocation, Live Range Splitting
  3. Register Allocation for Embedded Processors: Post-pass register allocation, Allocation gaps and register reuse, Energy reduction due to reduced memory accesses, Differential register allocation, Register encoding, Hardware support, Increase in exposed registers, Software pipelining and energy reduction
  4. Data Layouts for Embedded Processors: Auto addressing mode, Data layouts, Simple and general offset assignment problems, Address sequence optimizations, Memory coalescing, Data and code segment minimization
  5. Data and Code Compaction: X-Y memory, Parallelizing Loads/Stores in DSPs, Data replication, Performance vs Data Segment Size, ARM vs Thumb code generation, Mixed code generation, Frequent values in embedded programs and their encoding, Data cache optimization via compaction
  6. Network Processors: Processing in the network, Network processors, Dual Bank Register Allocation for Network Processors, Multi-threading in network processors, Context switch and latency, Register allocation across threads to minimize latency

This course counts towards the following specialization(s):
Computing Systems

Foundational Course
Computing Systems Specialization Elective

Course Goals

  • To understand the tight coupling and synergies that exist between hardware and software in embedded processors, as exposed through the abstraction of instruction sets, including those of DSPs and VLIW machines.
  • To understand machine-specific features of the underlying processor (such as data paths and register and memory banks) and the machine-specific code optimizations built on them for high performance, compact code, and low energy.

Sample Syllabi

Fall 2020 syllabus (PDF)
Summer 2020 syllabus (PDF)
Summer 2020 schedule (PDF)
Summer 2019 syllabus (PDF)
Summer 2019 schedule (PDF)

Note: Sample syllabi are provided for informational purposes only. For the most up-to-date information, consult the official course documentation.

Course Videos

You can view the lecture videos for this course here.

Before Taking This Class...

Suggested Background Knowledge

It is recommended that students who take this course have previously taken at least an undergraduate-level course in computer architecture. In addition, students must have a strong background in C and/or C++. A basic-to-intermediate understanding of Python and basic knowledge of Git are also required.

Technical Requirements and Software
  • Students will be required to purchase a Raspberry Pi for one of the projects.
  • Browser and connection speed: An up-to-date version of Chrome or Firefox is strongly recommended, along with a connection speed of 2 Mbps or higher.
  • Operating system:
    • PC: Windows 7 or higher
    • Mac: OS X 10.10.5 or higher
  • Minimum of 2 GB free space
  • Dual-core 2.4 GHz CPU or better
  • Webcam resolution of 800x600 or better
  • Working microphone
  • Virtual Machine: You will use Vagrant with VirtualBox, which provides a consistent environment for development and grading across every assignment. Instructions for downloading and installing the Vagrant setup are on the course GitHub.

Academic Integrity

All Georgia Tech students are expected to uphold the Georgia Tech Academic Honor Code. This course may impose additional academic integrity stipulations; consult the official course documentation for more information.