Yeah pretty much. If you think about mapping something like matrix-multiply to a specific hardware device, details like how the data is laid out in memory, utilizing the cache hierarchy effectively, efficiently moving data around the system, etc are important for performance.
Thanks! To make sure I’m following, does optimization help just by improving utilization?
Yeah pretty much. If you think about mapping something like matrix-multiply to a specific hardware device, details like how the data is laid out in memory, utilizing the cache hierarchy effectively, efficiently moving data around the system, etc are important for performance.