In the 21st century, embedded systems are the systems of the future, with cellular phones, smartphones, and tablets becoming the dominant platforms for computing and communication. The ubiquity of information, and the need for the computation that accompanies it, is driving this revolution, which new paradigms such as the Internet of Things (IoT) will only accelerate. These platforms have distinctive processing requirements: real-time constraints, high performance at low energy, compact code and data segments, and, most importantly, an ever-changing software stack. These requirements have led to a complete redesign and reinvention of both the hardware and the software stack from the ground up. For example, new processors such as ARM, DSPs, and network processors were introduced, along with new virtual machines such as Dalvik, new operating systems such as Android, and new programming models and compiler optimizations. The goal of this course is to take a holistic view of the embedded system stack, with a focus on processor architectures, instruction sets, and the advanced compiler optimizations that take advantage of them. The course covers the following segments:
Part I: Embedded Processor Architectures
- Introduction to instruction-level parallelism: Pipelining, RISC vs CISC, Very Long Instruction Word (VLIW) instruction sets, Hardware complexity (Superscalars) vs Compiler Optimizations (VLIWs) tradeoffs
- Design of Instruction Set Architectures: VLIW encoding, Exposing vs Hiding Architectural Details, RISC vs CISC ISAs, Opportunities for compilers, Dependences and Independences, Instruction bundling for VLIW, Compact instruction representation.
- Embedded Micro-architectures: Scratch-pad (software-managed) memory, clustered register files, special arithmetic, addressing modes for special needs (DSPs), branches in embedded domains: speculation and predication, unbundling branches
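As an illustration of the instruction bundling topic listed above, the following is a minimal sketch, not course material, of how a compiler might pack independent operations into fixed-width VLIW bundles. The `Op` record, the `bundle` function, and the register names are hypothetical. The sketch assumes unit latency, ignores memory dependences, and conservatively places any dependent operation (RAW, WAW, or WAR) in a strictly later bundle.

```python
from collections import namedtuple

# Hypothetical operation record: a name, one destination register, source registers.
Op = namedtuple("Op", ["name", "dest", "srcs"])

def depends(later, earlier):
    """True if `later` must issue in a later bundle than `earlier`."""
    raw = earlier.dest in later.srcs                                # read after write
    waw = earlier.dest is not None and earlier.dest == later.dest   # write after write
    war = later.dest in earlier.srcs                                # write after read
    return raw or waw or war

def bundle(ops, width):
    """Greedily pack ops (given in program order) into bundles of at most `width` ops."""
    cycle_of = []   # bundle index chosen for each op, in program order
    bundles = []    # bundles[c] is the list of op names issued in cycle c
    for j, op in enumerate(ops):
        # The earliest legal bundle comes after every earlier op this one depends on.
        earliest = 0
        for i in range(j):
            if depends(op, ops[i]):
                earliest = max(earliest, cycle_of[i] + 1)
        # Skip bundles that are already full.
        c = earliest
        while c < len(bundles) and len(bundles[c]) >= width:
            c += 1
        while len(bundles) <= c:
            bundles.append([])
        bundles[c].append(op.name)
        cycle_of.append(c)
    return bundles

ops = [
    Op("a", "r1", ("r0",)),
    Op("b", "r2", ("r0",)),
    Op("c", "r3", ("r1", "r2")),
    Op("d", "r4", ("r0",)),
]
schedule = bundle(ops, width=2)
print(schedule)
```

Here `c` reads the results of `a` and `b`, so it cannot share their bundle, while `d` is independent and fills the remaining slot next to `c`. A superscalar processor discovers this kind of independence in hardware at run time; a VLIW compiler encodes it in the instruction stream.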
Part II: Software Optimizations
- Introduction to Compiler phases: Overall working of the compiler, overview of phases, intermediate representation, backend code generation issues
- Register Allocation Foundation: RISC philosophy (load-store architecture), Live range analysis, Interference Graph, Graph Coloring Based Register Allocation, Live Range Splitting
- Register Allocation for Embedded Processors: Post-pass register allocation, Allocation gaps and register reuse, Energy reduction due to reduced memory accesses, Differential register allocation, Register encoding, Hardware support, Increase in exposed registers, Software pipelining and energy reduction
- Data Layouts for Embedded Processors: Auto addressing mode, Data layouts, Simple and general offset assignment problems, Address sequence optimizations, Memory coalescing, Data and code segment minimization
- Data and Code Compaction: X-Y memory, Parallelizing Load/Stores in DSPs, Data replication, Performance vs data segment size, ARM vs Thumb code generation, Mixed code generation, Frequent values in embedded programs and their encoding, Data cache optimization via compaction
- Network Processors: Processing in the network, Network processors, Dual Bank Register Allocation for Network Processors, Multi-threading in network processors, Context switch and latency, Register allocation across threads to minimize latency
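To make the register allocation topics above concrete, here is a minimal sketch, not taken from the course, of the pipeline from live range analysis to an interference graph to graph coloring. All function and variable names are hypothetical. It assumes straight-line code in which every variable is defined before it is used, and it uses a simple greedy, highest-degree-first coloring rather than a full Chaitin-style allocator with spill code insertion and live range splitting.

```python
def live_sets(instrs):
    """instrs: list of (dest, srcs). Returns the variables live after each instruction."""
    live = set()                      # assumption: nothing is live at block exit
    live_out = [None] * len(instrs)
    for idx in range(len(instrs) - 1, -1, -1):
        dest, srcs = instrs[idx]
        live_out[idx] = set(live)
        live.discard(dest)            # a definition ends the live range going backward
        live.update(srcs)             # a use makes the variable live before this point
    return live_out

def interference_graph(instrs):
    """Two variables interfere if one is defined while the other is live."""
    graph = {}
    for dest, srcs in instrs:
        graph.setdefault(dest, set())
        for s in srcs:
            graph.setdefault(s, set())
    for (dest, _), out in zip(instrs, live_sets(instrs)):
        for v in out:
            if v != dest:
                graph[dest].add(v)
                graph[v].add(dest)
    return graph

def color(graph, k):
    """Greedy coloring with k registers; variables that get no register are spilled."""
    assignment, spilled = {}, []
    for v in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {assignment[n] for n in graph[v] if n in assignment}
        free = [r for r in range(k) if r not in used]
        if free:
            assignment[v] = free[0]
        else:
            spilled.append(v)
    return assignment, spilled

# Hypothetical block: a = ..., b = ..., c = a + b, d = c + a, e = d
instrs = [("a", ()), ("b", ()), ("c", ("a", "b")), ("d", ("c", "a")), ("e", ("d",))]
graph = interference_graph(instrs)
assignment, spilled = color(graph, k=2)
print(graph, assignment, spilled)
```

In this block, `a` is live across the definitions of `b` and `c`, so it interferes with both, while `b` and `c` never overlap and can share a register. With two registers nothing spills; with one, `b` and `c` would be spilled to memory, which is the kind of extra memory traffic the energy-oriented allocation topics aim to reduce.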
This course counts towards the following specialization(s):
By the end of this course, students will:
- Understand various assembly-level optimization techniques.
- Have a deeper understanding of the impact of system constraints on scheduling.
- Understand how to balance tradeoffs between code size and runtime.
Note: Sample syllabi are provided for informational purposes only. For the most up-to-date information, consult the official course documentation.
You can view the lecture videos for this course here.
Before Taking This Class...
Suggested Background Knowledge
It is recommended that students who take this course have previously taken at least an undergraduate-level course in computer architecture. In addition, students must have a strong background in C and/or C++. A basic to intermediate understanding of Python and basic knowledge of Git are also required.
Technical Requirements and Software
- Students will be required to purchase a Raspberry Pi for one of the projects.
- Browser and connection speed: An up-to-date version of Chrome or Firefox is strongly recommended. 2+ Mbps is recommended.
- Operating system:
  - PC: Windows 7 or higher
  - Mac: OS X 10.10.5 or higher
- Minimum of 2 GB free space
- Dual-core 2.4 GHz CPU or better
- Webcam resolution of 800x600 or better
- Working microphone
- Virtual Machine: You will use Vagrant along with VirtualBox, which provides a consistent environment for development and grading across all assignments. Details for downloading and installing the Vagrant system are located on the course GitHub.
All Georgia Tech students are expected to uphold the Georgia Tech Academic Honor Code. This course may impose additional academic integrity stipulations; consult the official course documentation for more information.