GPU Parallel Programming: Chapter 1

Published: Aug 7, 2021 by malloc(42)

Chapter 1: Introduction to CPU Parallel Programming

The promise of Moore’s Law is dead (though not Moore’s Law itself). The number of transistors on a chip still roughly doubles every two years, but clock frequency has hit a wall due to power consumption and heat. So new chips come with more cores instead, and for now it’s up to programmers to get the best out of these multi-core systems.

More cores don’t directly mean more performance

It is imperative to understand that making your program parallel doesn’t automatically make it faster. It is important to correctly orchestrate the work the threads are doing. Also, because memory access is slow relative to the cores, your program can become memory bound, i.e. your cores sit underutilized, waiting for data to be ready. So parallel programming is more than just throwing more threads at your problem (unless it’s embarrassingly parallel).
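To make the memory-bound idea concrete, here is a toy model (my own simplification, not from the book): assume compute time divides evenly across threads while memory traffic is capped by a fixed bandwidth, and the run takes whichever is longer. The function name and the workload numbers are made up for illustration.

```python
def estimated_runtime_s(compute_s, bytes_moved, bandwidth_bytes_per_s, threads):
    """Toy model: compute scales with thread count, memory traffic does not.

    Assumption: compute and memory transfers overlap perfectly, so the
    runtime is simply the larger of the two. Real programs are messier.
    """
    compute_time = compute_s / threads
    memory_time = bytes_moved / bandwidth_bytes_per_s
    return max(compute_time, memory_time)


if __name__ == "__main__":
    # Hypothetical workload: 8 s of single-threaded compute, 40 GB of memory
    # traffic, and 20 GB/s of RAM bandwidth (low end of the RAM range above).
    for threads in (1, 2, 4, 8, 16):
        t = estimated_runtime_s(8.0, 40e9, 20e9, threads)
        print(f"{threads:2d} threads -> {t:.1f} s")
```

In this sketch the runtime stops improving after 4 threads: from there on the cores are just waiting on memory, so extra threads buy nothing.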

Data bandwidth for different devices

  • Network Interface Card : 1 Gbps (Gigabits per second)
  • HDD connected over SATA3 bus : 1-2 Gbps (6 Gbps max possible)
  • USB 3 : Max 10 Gbps
  • SSD over SATA3 bus : 4-5 Gbps (6 Gbps max possible)
  • RAM : 20-60 GBps (Gigabytes per second), i.e. 160-480 Gbps
  • GPU Internal Memory : 10-1000 GBps
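The list mixes gigabits (Gbps) and gigabytes (GBps), which is easy to trip over. A minimal sketch of the conversion, plus a rough transfer-time estimate; the helper names are mine, and the estimate assumes the link runs at its full rated speed with no protocol overhead, which real devices rarely manage.

```python
BITS_PER_BYTE = 8


def gigabytes_to_gigabits_per_s(gbytes_per_s):
    """Convert a bandwidth in GBps (gigabytes/s) to Gbps (gigabits/s)."""
    return gbytes_per_s * BITS_PER_BYTE


def transfer_time_s(size_gigabytes, link_gigabits_per_s):
    """Idealized time to move `size_gigabytes` over a link rated in Gbps."""
    return size_gigabytes * BITS_PER_BYTE / link_gigabits_per_s


if __name__ == "__main__":
    # RAM range from the list: 20-60 GBps expressed in Gbps.
    print(gigabytes_to_gigabits_per_s(20), gigabytes_to_gigabits_per_s(60))
    # Moving 1 GB over a 1 Gbps network card vs. 160 Gbps RAM.
    print(transfer_time_s(1, 1), transfer_time_s(1, 160))
```

Under these assumptions, 1 GB takes about 8 seconds over the network card but only a few hundredths of a second from RAM.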

Interesting Note from the book

We can get less noisy performance/benchmark data if our data fits in cache, because once the data is in cache, execution is fairly deterministic. However, if the data spills out of the cache, the non-deterministic nature of memory access from RAM (due to other programs running on the OS and OS overheads) makes the benchmark data noisier!
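One way to put a number on "noisy" is to repeat a measurement and look at the spread relative to the mean. This is a rough sketch with hypothetical helper names; the array sizes are guesses at "fits in cache" vs. "doesn’t", and in Python the interpreter adds its own noise, so treat the output as illustrative only.

```python
import statistics
import time


def time_runs(fn, repeats=10):
    """Return the wall-clock duration (seconds) of each of `repeats` calls."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def relative_spread(durations):
    """Standard deviation divided by the mean: higher means noisier."""
    return statistics.pstdev(durations) / statistics.mean(durations)


if __name__ == "__main__":
    small = list(range(1_000))      # assumed small enough to stay in cache
    large = list(range(2_000_000))  # assumed to spill out of cache
    for name, data in (("small", small), ("large", large)):
        runs = time_runs(lambda: sum(data))
        print(f"{name}: relative spread = {relative_spread(runs):.3f}")
```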

Ref: As seen on kshitij12345.github.io
