Changed offload table from per memory model to per device.
Corrected offload tables for small memory model.
Updated offload tables.
Merge branch 'master' of git.ti.com:dense-linear-algebra-libraries/linalg
Sync with external git. Should be the other way around.
1. Clean up ARM wrapper code. 2. Update readme files. 3. Update offload tables for K2H after level 3 optimization.
Code cleanup and documentation.
Corrected errors in BLIS test code.
Level 3 optimization.
Merge remote-tracking branch 'origin/klockwork'