dorothyjung/MatrixDecryption

MatrixDecryption

  1. Use SSE Instructions (see lab 7): DONE

Load C[jn to jn+n] into a register in the outermost loop (j).
- Store C[jn to jn+n] back into memory (SSE).

Load A[kn to kn+m] into a register in the 2nd loop (k).
- Store A[kn to kn+m] back into memory (SSE).

Leave the innermost loop (i) as is.
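A minimal sketch of how the SSE step might look, not the project's actual kernel. It assumes square n x n column-major matrices (element (i, j) stored at `M[i + j*n]`), an update of the form C += A * B, n divisible by 4, and an x86 target with SSE. The function name `matmul_sse` and the operand `B` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_set1_ps, ... */

/* Hypothetical kernel: C += A * B, all matrices n x n, column-major. */
void matmul_sse(size_t n, const float *A, const float *B, float *C)
{
    assert(n % 4 == 0);                       /* no fringe handling here */
    for (size_t j = 0; j < n; j++) {
        float *c_col = C + j * n;             /* column j of C */
        for (size_t k = 0; k < n; k++) {
            const float *a_col = A + k * n;   /* column k of A */
            /* Broadcast B(k, j) into all four lanes once per k iteration. */
            __m128 b = _mm_set1_ps(B[k + j * n]);
            for (size_t i = 0; i < n; i += 4) {
                __m128 c = _mm_loadu_ps(c_col + i);
                __m128 a = _mm_loadu_ps(a_col + i);
                c = _mm_add_ps(c, _mm_mul_ps(a, b));
                _mm_storeu_ps(c_col + i, c);  /* write 4 floats back */
            }
        }
    }
}
```

The unaligned load and store variants are used so the sketch does not depend on how the caller allocated the matrices.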

  2. Optimize loop ordering (see lab 5): DONE. Loop order, outermost to innermost: j, k, i.
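The j, k, i ordering can be illustrated with a scalar sketch. It assumes n x n column-major matrices (element (i, j) at `M[i + j*n]`) and an update C += A * B; the function name `matmul_jki` and the operand `B` are hypothetical. With this order the innermost loop walks down one column of C and one column of A with stride 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: C += A * B, all matrices n x n, column-major. */
void matmul_jki(size_t n, const float *A, const float *B, float *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t j = 0; j < n; j++) {
        for (size_t k = 0; k < n; k++) {
            float b = B[k + j * n];           /* constant over the i loop */
            for (size_t i = 0; i < n; i++) {
                /* Consecutive i touch consecutive addresses in C and A. */
                C[i + j * n] += A[i + k * n] * b;
            }
        }
    }
}
```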

  3. Implement Register Blocking (load data into a register once and then use it several times). Keep values in registers instead of going to cache every time; use Intel instructions and store the data as vectors.
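One possible form of register blocking, sketched under assumptions: n x n column-major matrices, an update C += A * B, n divisible by 4, and an x86 target with SSE. Here a 4-float strip of a C column is loaded once, accumulated in a register across the whole k loop, and stored once. Note that this variant runs k innermost for each strip, which is a choice made for the illustration. The name `matmul_regblock` and the operand `B` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical kernel: C += A * B, all matrices n x n, column-major. */
void matmul_regblock(size_t n, const float *A, const float *B, float *C)
{
    assert(n % 4 == 0);                       /* no fringe handling here */
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < n; i += 4) {
            /* Load the strip C[i..i+3, j] once and keep it in a register. */
            __m128 c = _mm_loadu_ps(C + i + j * n);
            for (size_t k = 0; k < n; k++) {
                __m128 a = _mm_loadu_ps(A + i + k * n);
                __m128 b = _mm_set1_ps(B[k + j * n]);
                c = _mm_add_ps(c, _mm_mul_ps(a, b));
            }
            /* Store the strip once, after all n updates. */
            _mm_storeu_ps(C + i + j * n, c);
        }
    }
}
```

Compared with loading and storing C inside the innermost loop, each strip of C moves between memory and registers once instead of n times.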

  4. Implement Loop Unrolling (see lab 7): do first

Use hadd to unroll loop further; i.e. more iterations covered by horizontal addition

Increment each unrolled loop by 4 * (number of unrolled iterations). Unroll iterations of i (innermost loop).

Fringe case: use the same method as lab07 (sum.c); add an extra check so that the loop variable is less than height/width: DONE
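The unrolling and fringe handling can be sketched on the innermost column update alone. This is an illustration under assumptions, not the lab's code: the function name `axpy_unrolled` is hypothetical, the unroll factor is 2 vector iterations (step 4 * 2 = 8 floats), and an x86 target with SSE is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical inner loop: c[i] += a[i] * b for i in [0, m). */
void axpy_unrolled(size_t m, float b, const float *a, float *c)
{
    assert(a != NULL && c != NULL);
    __m128 vb = _mm_set1_ps(b);
    size_t i = 0;
    /* Main loop: 2 unrolled vector iterations, so i advances by 4 * 2. */
    for (; i + 8 <= m; i += 8) {
        __m128 c0 = _mm_loadu_ps(c + i);
        __m128 c1 = _mm_loadu_ps(c + i + 4);
        c0 = _mm_add_ps(c0, _mm_mul_ps(_mm_loadu_ps(a + i), vb));
        c1 = _mm_add_ps(c1, _mm_mul_ps(_mm_loadu_ps(a + i + 4), vb));
        _mm_storeu_ps(c + i, c0);
        _mm_storeu_ps(c + i + 4, c1);
    }
    /* Fringe: the extra bound check keeps i below m for leftover elements. */
    for (; i < m; i++) {
        c[i] += a[i] * b;
    }
}
```

The condition `i + 8 <= m` stops the vector loop before it can read past the end, and the scalar loop finishes the remaining 0 to 7 elements.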

  5. Cache Blocking: next. To find the optimal block size, run a script that increases and tests different block sizes. 64-byte block = 512-bit block = 4 vectors/block = 16 floats/block.
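A scalar sketch of cache blocking with a tunable block size, so that different sizes can be timed against each other. It assumes n x n column-major matrices and an update C += A * B; the name `matmul_blocked`, the parameter `bs`, and the operand `B` are hypothetical. For reference, the arithmetic above works out as 64 bytes * 8 = 512 bits, 512 / 128 bits per SSE vector = 4 vectors, and 64 / 4 bytes per float = 16 floats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: C += A * B, n x n column-major, bs x bs tiles. */
void matmul_blocked(size_t n, size_t bs,
                    const float *A, const float *B, float *C)
{
    assert(bs > 0);
    for (size_t jj = 0; jj < n; jj += bs) {
        size_t j_end = (jj + bs < n) ? jj + bs : n;   /* clamp at the edge */
        for (size_t kk = 0; kk < n; kk += bs) {
            size_t k_end = (kk + bs < n) ? kk + bs : n;
            for (size_t ii = 0; ii < n; ii += bs) {
                size_t i_end = (ii + bs < n) ? ii + bs : n;
                /* Work on one tile with the same j, k, i order as before. */
                for (size_t j = jj; j < j_end; j++) {
                    for (size_t k = kk; k < k_end; k++) {
                        float b = B[k + j * n];
                        for (size_t i = ii; i < i_end; i++) {
                            C[i + j * n] += A[i + k * n] * b;
                        }
                    }
                }
            }
        }
    }
}
```

A tuning script could call this with `bs` set to 16, 32, 64, and so on, and record the run time for each value.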

  6. Compiler Tricks (minor modifications to your source code can cause the compiler to produce a faster program)
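As one example of what such a modification might look like (an assumption, since the list does not name specific tricks): marking pointers `restrict` and hoisting loop-invariant address arithmetic out of the inner loop can make it easier for the compiler to keep values in registers and vectorize. The function name `column_update` and the operand `B` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical column update: C(:, j) += A(:, k) * B(k, j), column-major.
 * restrict promises the compiler that A, B and C do not overlap. */
void column_update(size_t n, size_t j, size_t k,
                   const float *restrict A, const float *restrict B,
                   float *restrict C)
{
    assert(j < n && k < n);
    /* Hoisted out of the loop: column base pointers and the B element. */
    const float *a_col = A + k * n;
    float *c_col = C + j * n;
    const float b = B[k + j * n];
    for (size_t i = 0; i < n; i++) {
        c_col[i] += a_col[i] * b;
    }
}
```

Whether this helps depends on the compiler and optimization flags, so it is worth timing both versions.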

Languages

  • C 90.8%
  • Makefile 9.2%