The c++ tensor library, it supports many dimension array and use lapack blas libaray.
License
hshi/tensor_lib_hao_0
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
tensor_lib_hao ============== The c++ tensor library, it supports many dimension array and uses lapack blas libaray. This project is licensed under the terms of the MIT license. Note to myself for developing or updating the library. 1. If we want to pass any "const Tensor_hao&", it is better to pass "const Tensor_core&", then this argument can be "const Tensor_hao_ref&" 2. If we want to pass any "Tensor_hao&&", do not use "Tensor_core&&"! Tensor_hao_ref does not own the memory. ================================================================ Output about timing, use as a guideline about when to use MAGMA. ================================================================ processor: 2 x quad-core Xeon X5672, 3.2 GHz, 12 MB L3 cache Main Memory: 48 GB 1333 MHz GPU: 2 x NVIDIA Tesla M2075, 1.15 GHz, 448 CUDA cores Run 8 job in one nodes, each CPU job use one core, four GPU job share on NVIDIA Tesla M2075. This timing program compares time cost between CPU blas lapack and MAGMA blas lapack. The flag represents number of difference elements between results from CPU and these from MAGMA. It requires large memory and long computational time if -DUSE_MKL, else does nothing. Please submit the job if you are using a cluster, it takes ~11 minutes. =======Start timing======= Timing gmm float: M N K cpu_time magma_time flag 8 8 8 0.029664 0.267781 0 16 16 16 2.69413e-05 0.000491142 0 32 32 32 2.81334e-05 0.000458002 0 64 64 64 0.000128984 0.000522137 0 128 128 128 0.000702858 0.000284195 0 256 256 256 0.00451899 0.00109792 0 512 512 512 0.04352 0.00341702 0 1024 1024 1024 0.260779 0.00931907 0 1088 1088 1088 0.312131 0.0106292 0 2112 2112 2112 1.4493 0.0435479 255 3136 3136 3136 4.1678 0.124761 10348 Timing gmm double: M N K cpu_time magma_time flag 8 8 8 0.0148809 0.000470161 0 16 16 16 9.05991e-06 0.000306129 0 32 32 32 1.71661e-05 0.00030899 0 64 64 64 0.000102997 0.000114918 0 128 128 128 0.000617981 0.000488997 0 256 256 256 0.004668 0.00126815 0 512 512 512 0.038507 0.00400686 0 1024 1024 1024 0.297791 0.015028 0 1088 1088 1088 0.358939 0.0170059 0 2112 2112 2112 2.58739 0.0875349 0 3136 3136 3136 8.44806 0.254067 0 Timing gmm complexfloat: M N K cpu_time magma_time flag 8 8 8 3.79086e-05 0.000481129 0 16 16 16 1.40667e-05 0.00032711 0 32 32 32 5.72205e-05 0.00030899 0 64 64 64 0.000149965 0.000118017 0 128 128 128 0.00106287 0.00051713 0 256 256 256 0.00827503 0.00135016 0 512 512 512 0.0652132 0.00418901 0 1024 1024 1024 0.508125 0.0177181 39 1088 1088 1088 0.60681 0.0204711 81 2112 2112 2112 4.39413 0.117302 32714 3136 3136 3136 14.3744 0.354939 408456 Timing gmm complexdouble: M N K cpu_time magma_time flag 8 8 8 0.028296 0.000516891 0 16 16 16 0.0225189 0.000332117 0 32 32 32 4.60148e-05 0.000339985 0 64 64 64 0.00027585 0.000170946 0 128 128 128 0.00850916 0.000732899 0 256 256 256 0.019469 0.00238109 0 512 512 512 0.150224 0.00750113 0 1024 1024 1024 1.18507 0.0374861 0 1088 1088 1088 1.42799 0.044102 0 2112 2112 2112 10.395 0.265618 0 3136 3136 3136 34.0296 0.818458 0 Timing eigen double: N cpu_time magma_time flag 210 0.016289 0.0207419 0 410 0.083559 0.0690041 0 610 0.229068 0.125923 0 810 0.49889 0.221038 0 1088 1.1457 0.375571 0 2112 7.77459 1.35423 0 3136 24.7302 3.03174 0 Timing eigen complex double: N cpu_time magma_time flag 210 0.0381498 0.0305982 0 410 0.23754 0.1088 0 610 0.742494 0.216568 0 810 1.70182 0.382711 0 1088 4.44783 0.695835 0 2112 33.7621 2.65436 0 3136 109.931 6.22334 0 Timing LUconstruct complex double: N cpu_time magma_time flag 16 2.7895e-05 0.000509977 0 144 0.00172496 0.00283098 0 272 0.00927806 0.00539207 0 400 0.0274091 0.00946307 0 528 0.0618901 0.0155981 0 656 0.116441 0.0231309 0 784 0.199657 0.031332 0 912 0.308218 0.0416481 0 1040 0.454 0.0529828 0 1088 0.517252 0.057548 0 2112 3.66667 0.205451 3 3136 11.8538 0.459047 480 Timing inverse complex double: N cpu_time magma_time flag 16 3.60012e-05 0.00205183 0 144 0.00303102 0.00406098 0 272 0.0177591 0.00923109 0 400 0.0554998 0.0160861 0 528 0.155749 0.0229189 0 656 0.233678 0.0338371 0 784 0.394181 0.0499558 0 912 0.630588 0.0595381 0 1040 0.906257 0.0804019 0 1088 1.03275 0.084446 0 2112 7.44178 0.445542 0 3136 24.2217 1.25105 0 Timing solve_lineq complex double: N M cpu_time magma_time flag 8 8 2.28882e-05 0.000449181 0 16 16 1.40667e-05 0.000438929 0 32 32 7.10487e-05 0.000442982 0 64 64 0.00041604 0.000396967 0 128 128 0.0029211 0.00134897 0 256 256 0.021852 0.00471115 0 512 512 0.162889 0.0156472 0 1024 1024 1.25093 0.0697229 0 1088 1088 1.48932 0.074893 0 2112 2112 10.6841 0.372659 0 3136 3136 34.8013 1.03091 0 Timing QRMatrix complex double: N M cpu_time magma_time flag 8 8 5.10216e-05 0.000155926 0 16 16 3.8147e-05 8.58307e-05 0 32 32 0.000150919 0.000282049 0 64 64 0.0132039 0.00149608 0 128 128 0.00508595 0.00324893 0 256 256 0.0346851 0.00964499 0 512 512 0.239866 0.035392 0 1024 1024 1.77424 0.155761 0 1088 1088 2.12011 0.178346 0 2112 2112 14.7644 0.862217 0 3136 3136 47.5984 2.30914 0 Timing SVDMatrix complex double: N cpu_time magma_time flag 8 0.000168085 0.000457048 0 16 0.000209808 0.000222206 0 32 0.000712156 0.000741959 0 64 0.0032208 0.00343299 0 128 0.0189631 0.0138059 0 256 0.116102 0.0651138 0 512 0.755722 0.277497 0 1024 5.51238 1.3711 0 1088 6.54182 1.52588 0 2112 45.9335 7.5319 0 3136 147.975 21.0427 0
About
The c++ tensor library, it supports many dimension array and use lapack blas libaray.
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published