Skip to content

hshi/tensor_lib_hao_0

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

44 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

tensor_lib_hao
==============

The c++ tensor library, it supports many dimension array and uses lapack blas libaray.

This project is licensed under the terms of the MIT license.

Note to myself for developing or updating the library.
1. If we want to pass any "const Tensor_hao&", it is better to pass "const Tensor_core&", then this argument can be "const Tensor_hao_ref&"
2. If we want to pass any "Tensor_hao&&", do not use "Tensor_core&&"! Tensor_hao_ref does not own the memory.



================================================================
Output about timing, use as a guideline about when to use MAGMA.
================================================================ 
processor:   2 x quad-core Xeon X5672, 3.2 GHz, 12 MB L3 cache
Main Memory: 48 GB 1333 MHz
GPU: 2 x NVIDIA Tesla M2075, 1.15 GHz, 448 CUDA cores
Run 8 job in one nodes, each CPU job use one core, four GPU job share on NVIDIA Tesla M2075.

This timing program compares time cost between CPU blas lapack and MAGMA blas lapack.
The flag represents number of difference elements between results from CPU and these from MAGMA.
It requires large memory and long computational time if -DUSE_MKL, else does nothing.
Please submit the job if you are using a cluster, it takes ~11 minutes.

=======Start timing=======
Timing gmm float:
               M               N               K        cpu_time      magma_time            flag
               8               8               8        0.029664        0.267781               0
              16              16              16     2.69413e-05     0.000491142               0
              32              32              32     2.81334e-05     0.000458002               0
              64              64              64     0.000128984     0.000522137               0
             128             128             128     0.000702858     0.000284195               0
             256             256             256      0.00451899      0.00109792               0
             512             512             512         0.04352      0.00341702               0
            1024            1024            1024        0.260779      0.00931907               0
            1088            1088            1088        0.312131       0.0106292               0
            2112            2112            2112          1.4493       0.0435479             255
            3136            3136            3136          4.1678        0.124761           10348




Timing gmm double:
               M               N               K        cpu_time      magma_time            flag
               8               8               8       0.0148809     0.000470161               0
              16              16              16     9.05991e-06     0.000306129               0
              32              32              32     1.71661e-05      0.00030899               0
              64              64              64     0.000102997     0.000114918               0
             128             128             128     0.000617981     0.000488997               0
             256             256             256        0.004668      0.00126815               0
             512             512             512        0.038507      0.00400686               0
            1024            1024            1024        0.297791        0.015028               0
            1088            1088            1088        0.358939       0.0170059               0
            2112            2112            2112         2.58739       0.0875349               0
            3136            3136            3136         8.44806        0.254067               0




Timing gmm complexfloat:
               M               N               K        cpu_time      magma_time            flag
               8               8               8     3.79086e-05     0.000481129               0
              16              16              16     1.40667e-05      0.00032711               0
              32              32              32     5.72205e-05      0.00030899               0
              64              64              64     0.000149965     0.000118017               0
             128             128             128      0.00106287      0.00051713               0
             256             256             256      0.00827503      0.00135016               0
             512             512             512       0.0652132      0.00418901               0
            1024            1024            1024        0.508125       0.0177181              39
            1088            1088            1088         0.60681       0.0204711              81
            2112            2112            2112         4.39413        0.117302           32714
            3136            3136            3136         14.3744        0.354939          408456




Timing gmm complexdouble:
               M               N               K        cpu_time      magma_time            flag
               8               8               8        0.028296     0.000516891               0
              16              16              16       0.0225189     0.000332117               0
              32              32              32     4.60148e-05     0.000339985               0
              64              64              64      0.00027585     0.000170946               0
             128             128             128      0.00850916     0.000732899               0
             256             256             256        0.019469      0.00238109               0
             512             512             512        0.150224      0.00750113               0
            1024            1024            1024         1.18507       0.0374861               0
            1088            1088            1088         1.42799        0.044102               0
            2112            2112            2112          10.395        0.265618               0
            3136            3136            3136         34.0296        0.818458               0




Timing eigen double:
               N        cpu_time      magma_time            flag
             210        0.016289       0.0207419               0
             410        0.083559       0.0690041               0
             610        0.229068        0.125923               0
             810         0.49889        0.221038               0
            1088          1.1457        0.375571               0
            2112         7.77459         1.35423               0
            3136         24.7302         3.03174               0




Timing eigen complex double:
               N        cpu_time      magma_time            flag
             210       0.0381498       0.0305982               0
             410         0.23754          0.1088               0
             610        0.742494        0.216568               0
             810         1.70182        0.382711               0
            1088         4.44783        0.695835               0
            2112         33.7621         2.65436               0
            3136         109.931         6.22334               0




Timing LUconstruct complex double:
               N        cpu_time      magma_time            flag
              16      2.7895e-05     0.000509977               0
             144      0.00172496      0.00283098               0
             272      0.00927806      0.00539207               0
             400       0.0274091      0.00946307               0
             528       0.0618901       0.0155981               0
             656        0.116441       0.0231309               0
             784        0.199657        0.031332               0
             912        0.308218       0.0416481               0
            1040           0.454       0.0529828               0
            1088        0.517252        0.057548               0
            2112         3.66667        0.205451               3
            3136         11.8538        0.459047             480




Timing inverse complex double:
               N        cpu_time      magma_time            flag
              16     3.60012e-05      0.00205183               0
             144      0.00303102      0.00406098               0
             272       0.0177591      0.00923109               0
             400       0.0554998       0.0160861               0
             528        0.155749       0.0229189               0
             656        0.233678       0.0338371               0
             784        0.394181       0.0499558               0
             912        0.630588       0.0595381               0
            1040        0.906257       0.0804019               0
            1088         1.03275        0.084446               0
            2112         7.44178        0.445542               0
            3136         24.2217         1.25105               0




Timing solve_lineq complex double:
               N               M        cpu_time      magma_time            flag
               8               8     2.28882e-05     0.000449181               0
              16              16     1.40667e-05     0.000438929               0
              32              32     7.10487e-05     0.000442982               0
              64              64      0.00041604     0.000396967               0
             128             128       0.0029211      0.00134897               0
             256             256        0.021852      0.00471115               0
             512             512        0.162889       0.0156472               0
            1024            1024         1.25093       0.0697229               0
            1088            1088         1.48932        0.074893               0
            2112            2112         10.6841        0.372659               0
            3136            3136         34.8013         1.03091               0




Timing QRMatrix complex double:
               N               M        cpu_time      magma_time            flag
               8               8     5.10216e-05     0.000155926               0
              16              16      3.8147e-05     8.58307e-05               0
              32              32     0.000150919     0.000282049               0
              64              64       0.0132039      0.00149608               0
             128             128      0.00508595      0.00324893               0
             256             256       0.0346851      0.00964499               0
             512             512        0.239866        0.035392               0
            1024            1024         1.77424        0.155761               0
            1088            1088         2.12011        0.178346               0
            2112            2112         14.7644        0.862217               0
            3136            3136         47.5984         2.30914               0




Timing SVDMatrix complex double:
               N        cpu_time      magma_time            flag
               8     0.000168085     0.000457048               0
              16     0.000209808     0.000222206               0
              32     0.000712156     0.000741959               0
              64       0.0032208      0.00343299               0
             128       0.0189631       0.0138059               0
             256        0.116102       0.0651138               0
             512        0.755722        0.277497               0
            1024         5.51238          1.3711               0
            1088         6.54182         1.52588               0
            2112         45.9335          7.5319               0
            3136         147.975         21.0427               0

About

The c++ tensor library, it supports many dimension array and use lapack blas libaray.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published