Exemplos de magma_chtodhe em C++ (Cpp)

Linguagem de programação: C++ (Cpp)

Método / Função: magma_chtodhe

Exemplos em hotexamples.com: 2

magma_chtodhe em C++ (Cpp) - 2 exemplos encontrados. Esses são os exemplos do mundo real mais bem avaliados de magma_chtodhe em C++ (Cpp) extraídos de projetos de código aberto. Você pode avaliar os exemplos para nos ajudar a melhorar a qualidade deles.

Exemplo n.º 1

0

Exibir arquivo

Arquivo: chetrd_mgpu.cpp Projeto: EmergentOrder/magma

/** Purpose ------- CHETRD reduces a complex Hermitian matrix A to real symmetric tridiagonal form T by an orthogonal similarity transformation: Q\*\*H * A * Q = T. Arguments --------- @param[in] num_gpus INTEGER The number of GPUs. num_gpus > 0. @param[in] num_streams INTEGER The number of GPU streams used for update. 10 >= num_streams > 0. @param[in] uplo magma_uplo_t - = MagmaUpper: Upper triangle of A is stored; - = MagmaLower: Lower triangle of A is stored. @param[in] n INTEGER The order of the matrix A. N >= 0. @param[in,out] A COMPLEX array, dimension (LDA,N) On entry, the Hermitian matrix A. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced. On exit, if UPLO = MagmaUpper, the diagonal and first superdiagonal of A are overwritten by the corresponding elements of the tridiagonal matrix T, and the elements above the first superdiagonal, with the array TAU, represent the orthogonal matrix Q as a product of elementary reflectors; if UPLO = MagmaLower, the diagonal and first subdiagonal of A are over- written by the corresponding elements of the tridiagonal matrix T, and the elements below the first subdiagonal, with the array TAU, represent the orthogonal matrix Q as a product of elementary reflectors. See Further Details. @param[in] lda INTEGER The leading dimension of the array A. LDA >= max(1,N). @param[out] d COMPLEX array, dimension (N) The diagonal elements of the tridiagonal matrix T: D(i) = A(i,i). @param[out] e COMPLEX array, dimension (N-1) The off-diagonal elements of the tridiagonal matrix T: E(i) = A(i,i+1) if UPLO = MagmaUpper, E(i) = A(i+1,i) if UPLO = MagmaLower. @param[out] tau COMPLEX array, dimension (N-1) The scalar factors of the elementary reflectors (see Further Details). @param[out] work (workspace) COMPLEX array, dimension (MAX(1,LWORK)) On exit, if INFO = 0, WORK(1) returns the optimal LWORK. @param[in] lwork INTEGER The dimension of the array WORK. LWORK >= 1. For optimum performance LWORK >= N*NB, where NB is the optimal blocksize. \n If LWORK = -1, then a workspace query is assumed; the routine only calculates the optimal size of the WORK array, returns this value as the first entry of the WORK array, and no error message related to LWORK is issued by XERBLA. @param[out] info INTEGER - = 0: successful exit - < 0: if INFO = -i, the i-th argument had an illegal value Further Details --------------- If UPLO = MagmaUpper, the matrix Q is represented as a product of elementary reflectors Q = H(n-1) . . . H(2) H(1). Each H(i) has the form H(i) = I - tau * v * v' where tau is a complex scalar, and v is a complex vector with v(i+1:n) = 0 and v(i) = 1; v(1:i-1) is stored on exit in A(1:i-1,i+1), and tau in TAU(i). If UPLO = MagmaLower, the matrix Q is represented as a product of elementary reflectors Q = H(1) H(2) . . . H(n-1). Each H(i) has the form H(i) = I - tau * v * v' where tau is a complex scalar, and v is a complex vector with v(1:i) = 0 and v(i+1) = 1; v(i+2:n) is stored on exit in A(i+2:n,i), and tau in TAU(i). The contents of A on exit are illustrated by the following examples with n = 5: if UPLO = MagmaUpper: if UPLO = MagmaLower: ( d e v2 v3 v4 ) ( d ) ( d e v3 v4 ) ( e d ) ( d e v4 ) ( v1 e d ) ( d e ) ( v1 v2 e d ) ( d ) ( v1 v2 v3 e d ) where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector defining H(i). @ingroup magma_cheev_comp ********************************************************************/ extern "C" magma_int_t magma_chetrd_mgpu( magma_int_t num_gpus, magma_int_t num_streams, magma_uplo_t uplo, magma_int_t n, magmaFloatComplex *A, magma_int_t lda, float *d, float *e, magmaFloatComplex *tau, magmaFloatComplex *work, magma_int_t lwork, magma_int_t *info) { #define A(i, j) (A + (j)*lda + (i)) #define dA(id, i, j) (dA[(id)] + (j)*ldda + (i)) #define dW(id, i, j) (dwork[(id)] + (j)*ldda + (i)) const char* uplo_ = lapack_uplo_const( uplo ); magma_int_t ln, ldda; magma_int_t nb = magma_get_chetrd_nb(n), ib; magmaFloatComplex c_neg_one = MAGMA_C_NEG_ONE; magmaFloatComplex c_one = MAGMA_C_ONE; float d_one = MAGMA_D_ONE; float mv_time = 0.0; #ifdef PROFILE_SY2RK float up_time = 0.0; #endif magma_int_t kk, nx; magma_int_t i = 0, ii, iii, j, did, i_n; magma_int_t iinfo; magma_int_t ldwork, lddwork, lwkopt, ldwork2; magma_int_t lquery; magma_queue_t stream[MagmaMaxGPUs][10]; magmaFloatComplex *dx[MagmaMaxGPUs], *dy[MagmaMaxGPUs], *hwork; magmaFloatComplex *dwork2[MagmaMaxGPUs]; *info = 0; int upper = (uplo == MagmaUpper); lquery = (lwork == -1); if (! upper && uplo != MagmaLower) { printf( " uplo = %c\n",uplo ); *info = -1; } else if (n < 0) { *info = -2; } else if (lda < max(1,n)) { *info = -4; } else if (lwork < nb*n && ! lquery) { *info = -9; } else if ( num_streams > 2 ) { *info = 2; } /* Determine the block size. */ ldwork = lddwork = n; lwkopt = n * nb; if (*info == 0) { work[0] = MAGMA_C_MAKE( lwkopt, 0 ); } if (*info != 0) { magma_xerbla( __func__, -(*info) ); return MAGMA_ERR_ILLEGAL_VALUE; } else if (lquery) return 0; /* Quick return if possible */ if (n == 0) { work[0] = c_one; return 0; } magmaFloatComplex *dA[MagmaMaxGPUs]; magmaFloatComplex *dwork[MagmaMaxGPUs]; float times[11]; for( did=0; did < 11; did++ ) times[did] = 0; //#define PROFILE_SY2RK #ifdef PROFILE_SY2RK magma_event_t start, stop; float etime; magma_setdevice(0); magma_event_create( &start ); magma_event_create( &stop ); #endif ldda = lda; ln = ((nb*(1+n/(nb*num_gpus))+31)/32)*32; ldwork2 = (1+ n / nb + (n % nb != 0)) * ldda; for( did=0; did < num_gpus; did++ ) { magma_setdevice(did); // TODO fix memory leak if ( MAGMA_SUCCESS != magma_cmalloc(&dA[did], ln*ldda+3*lddwork*nb) || MAGMA_SUCCESS != magma_cmalloc(&dx[did], num_streams*n) || MAGMA_SUCCESS != magma_cmalloc(&dy[did], num_streams*n) || MAGMA_SUCCESS != magma_cmalloc(&dwork2[did], ldwork2 ) ) { for( i=0; i < did; i++ ) { magma_setdevice(i); magma_free(dA[i]); magma_free(dx[i]); magma_free(dy[i]); } *info = MAGMA_ERR_DEVICE_ALLOC; return *info; } dwork[did] = dA[did] + ln*ldda; for( kk=0; kk < num_streams; kk++ ) magma_queue_create(&stream[did][kk]); } magma_setdevice(0); // TODO fix memory leak dwork2 if ( MAGMA_SUCCESS != magma_cmalloc_pinned( &hwork, num_streams*num_gpus*n ) ) { for( i=0; i < num_gpus; i++ ) { magma_setdevice(i); magma_free(dA[i]); magma_free(dx[i]); magma_free(dy[i]); } *info = MAGMA_ERR_HOST_ALLOC; return *info; } if (n < 2048) nx = n; else nx = 512; if (upper) { /* Copy the matrix to the GPU */ if (1 <= n-nx) { magma_chtodhe(num_gpus, uplo, n, nb, A, lda, dA, ldda, stream, &iinfo ); } /* Reduce the upper triangle of A. Columns 1:kk are handled by the unblocked method. */ for (i = nb*((n-1)/nb); i >= nx; i -= nb) { ib = min(nb, n-i); ii = nb*(i/(nb*num_gpus)); did = (i/nb)%num_gpus; /* wait for the next panel */ if (i != nb*((n-1)/nb)) { magma_setdevice(did); magma_queue_sync(stream[did][0]); } magma_clatrd_mgpu(num_gpus, uplo, n, i+ib, ib, nb, A(0, 0), lda, e, tau, work, ldwork, dA, ldda, 0, dwork, i+ib, dwork2, ldwork2, 1, dx, dy, hwork, stream, times); magma_cher2k_mgpu(num_gpus, MagmaUpper, MagmaNoTrans, nb, i, ib, c_neg_one, dwork, i+ib, 0, d_one, dA, ldda, 0, num_streams, stream); /* get the next panel */ if (i-nb >= nx ) { ib = min(nb, n-(i-nb)); ii = nb*((i-nb)/(nb*num_gpus)); did = ((i-nb)/nb)%num_gpus; magma_setdevice(did); magma_cgetmatrix_async( (i-nb)+ib, ib, dA(did, 0, ii), ldda, A(0, i-nb), lda, stream[did][0] ); } /* Copy superdiagonal elements back into A, and diagonal elements into D */ for (j = i; j < i+ib; ++j) { if ( j > 0 ) { *A(j-1,j) = MAGMA_C_MAKE( e[j - 1], 0 ); } d[j] = MAGMA_C_REAL( *A(j, j) ); } } /* end of for i=... */ if ( nx > 0 ) { if (1 <= n-nx) { /* else A is already on CPU */ for (i=0; i < nx; i += nb) { ib = min(nb, n-i); ii = nb*(i/(nb*num_gpus)); did = (i/nb)%num_gpus; magma_setdevice(did); magma_cgetmatrix_async( nx, ib, dA(did, 0, ii), ldda, A(0, i), lda, stream[did][0] ); } } for( did=0; did < num_gpus; did++ ) { magma_setdevice(did); magma_queue_sync(stream[did][0]); } /* Use unblocked code to reduce the last or only block */ lapackf77_chetd2(uplo_, &nx, A(0, 0), &lda, d, e, tau, &iinfo); } } else { trace_init( 1, num_gpus, num_streams, (CUstream_st**)stream ); /* Copy the matrix to the GPU */ if (1 <= n-nx) { magma_chtodhe(num_gpus, uplo, n, nb, A, lda, dA, ldda, stream, &iinfo ); } /* Reduce the lower triangle of A */ for (i = 0; i < n-nx; i += nb) { ib = min(nb, n-i); ii = nb*(i/(nb*num_gpus)); did = (i/nb)%num_gpus; /* Reduce columns i:i+ib-1 to tridiagonal form and form the matrix W which is needed to update the unreduced part of the matrix */ /* Get the current panel (no need for the 1st iteration) */ if (i != 0) { magma_setdevice(did); trace_gpu_start( did, 0, "comm", "get" ); magma_cgetmatrix_async( n-i, ib, dA(did, i, ii), ldda, A(i,i), lda, stream[did][0] ); trace_gpu_end( did, 0 ); magma_queue_sync(stream[did][0]); magma_setdevice(0); } mv_time += magma_clatrd_mgpu(num_gpus, uplo, n, n-i, ib, nb, A(i, i), lda, &e[i], &tau[i], work, ldwork, dA, ldda, i, dwork, (n-i), dwork2, ldwork2, 1, dx, dy, hwork, stream, times ); #ifdef PROFILE_SY2RK magma_setdevice(0); if ( i > 0 ) { cudaEventElapsedTime(&etime, start, stop); up_time += (etime/1000.0); } magma_event_record(start, 0); #endif magma_cher2k_mgpu(num_gpus, MagmaLower, MagmaNoTrans, nb, n-i-ib, ib, c_neg_one, dwork, n-i, ib, d_one, dA, ldda, i+ib, num_streams, stream); #ifdef PROFILE_SY2RK magma_setdevice(0); magma_event_record(stop, 0); #endif /* Copy subdiagonal elements back into A, and diagonal elements into D */ for (j = i; j < i+ib; ++j) { if ( j+1 < n ) { *A(j+1,j) = MAGMA_C_MAKE( e[j], 0 ); } d[j] = MAGMA_C_REAL( *A(j, j) ); } } /* for i=... */ /* Use unblocked code to reduce the last or only block */ if ( i < n ) { iii = i; i_n = n-i; if ( i > 0 ) { for (; i < n; i += nb) { ib = min(nb, n-i); ii = nb*(i/(nb*num_gpus)); did = (i/nb)%num_gpus; magma_setdevice(did); magma_cgetmatrix_async( i_n, ib, dA(did, iii, ii), ldda, A(iii, i), lda, stream[did][0] ); } for( did=0; did < num_gpus; did++ ) { magma_setdevice(did); magma_queue_sync(stream[did][0]); } } lapackf77_chetrd(uplo_, &i_n, A(iii, iii), &lda, &d[iii], &e[iii], &tau[iii], work, &lwork, &iinfo); } } #ifdef PROFILE_SY2RK magma_setdevice(0); if ( n > nx ) { cudaEventElapsedTime(&etime, start, stop); up_time += (etime/1000.0); } magma_event_destroy( start ); magma_event_destroy( stop ); #endif trace_finalize( "chetrd.svg", "trace.css" ); for( did=0; did < num_gpus; did++ ) { magma_setdevice(did); for( kk=0; kk < num_streams; kk++ ) magma_queue_sync(stream[did][kk]); for( kk=0; kk < num_streams; kk++ ) magma_queue_destroy(stream[did][kk]); magma_free(dA[did]); magma_free(dx[did]); magma_free(dy[did]); magma_free(dwork2[did]); } magma_setdevice(0); magma_free_pinned(hwork); work[0] = MAGMA_C_MAKE( lwkopt, 0 ); #ifdef PROFILE_SY2RK printf( " n=%d nb=%d\n",n,nb ); printf( " Time in CLARFG: %.2e seconds\n",times[0] ); printf( " Time in CHEMV : %.2e seconds\n",mv_time ); printf( " Time in CHER2K: %.2e seconds\n",up_time ); #endif return MAGMA_SUCCESS; } /* magma_chetrd */

Exemplo n.º 2

0

Exibir arquivo

Arquivo: chetrd_mgpu.cpp Projeto: xulunfan/magma

/** Purpose ------- CHETRD reduces a complex Hermitian matrix A to real symmetric tridiagonal form T by an orthogonal similarity transformation: Q**H * A * Q = T. Arguments --------- @param[in] ngpu INTEGER Number of GPUs to use. ngpu > 0. @param[in] nqueue INTEGER The number of GPU queues used for update. 10 >= nqueue > 0. @param[in] uplo magma_uplo_t - = MagmaUpper: Upper triangle of A is stored; - = MagmaLower: Lower triangle of A is stored. @param[in] n INTEGER The order of the matrix A. N >= 0. @param[in,out] A COMPLEX array, dimension (LDA,N) On entry, the Hermitian matrix A. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced. On exit, if UPLO = MagmaUpper, the diagonal and first superdiagonal of A are overwritten by the corresponding elements of the tridiagonal matrix T, and the elements above the first superdiagonal, with the array TAU, represent the orthogonal matrix Q as a product of elementary reflectors; if UPLO = MagmaLower, the diagonal and first subdiagonal of A are over- written by the corresponding elements of the tridiagonal matrix T, and the elements below the first subdiagonal, with the array TAU, represent the orthogonal matrix Q as a product of elementary reflectors. See Further Details. @param[in] lda INTEGER The leading dimension of the array A. LDA >= max(1,N). @param[out] d COMPLEX array, dimension (N) The diagonal elements of the tridiagonal matrix T: D(i) = A(i,i). @param[out] e COMPLEX array, dimension (N-1) The off-diagonal elements of the tridiagonal matrix T: E(i) = A(i,i+1) if UPLO = MagmaUpper, E(i) = A(i+1,i) if UPLO = MagmaLower. @param[out] tau COMPLEX array, dimension (N-1) The scalar factors of the elementary reflectors (see Further Details). @param[out] work (workspace) COMPLEX array, dimension (MAX(1,LWORK)) On exit, if INFO = 0, WORK[0] returns the optimal LWORK. @param[in] lwork INTEGER The dimension of the array WORK. LWORK >= N*NB, where NB is the optimal blocksize given by magma_get_chetrd_nb(). \n If LWORK = -1, then a workspace query is assumed; the routine only calculates the optimal size of the WORK array, returns this value as the first entry of the WORK array, and no error message related to LWORK is issued by XERBLA. @param[out] info INTEGER - = 0: successful exit - < 0: if INFO = -i, the i-th argument had an illegal value Further Details --------------- If UPLO = MagmaUpper, the matrix Q is represented as a product of elementary reflectors Q = H(n-1) . . . H(2) H(1). Each H(i) has the form H(i) = I - tau * v * v' where tau is a complex scalar, and v is a complex vector with v(i+1:n) = 0 and v(i) = 1; v(1:i-1) is stored on exit in A(1:i-1,i+1), and tau in TAU(i). If UPLO = MagmaLower, the matrix Q is represented as a product of elementary reflectors Q = H(1) H(2) . . . H(n-1). Each H(i) has the form H(i) = I - tau * v * v' where tau is a complex scalar, and v is a complex vector with v(1:i) = 0 and v(i+1) = 1; v(i+2:n) is stored on exit in A(i+2:n,i), and tau in TAU(i). The contents of A on exit are illustrated by the following examples with n = 5: if UPLO = MagmaUpper: if UPLO = MagmaLower: ( d e v2 v3 v4 ) ( d ) ( d e v3 v4 ) ( e d ) ( d e v4 ) ( v1 e d ) ( d e ) ( v1 v2 e d ) ( d ) ( v1 v2 v3 e d ) where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector defining H(i). @ingroup magma_cheev_comp ********************************************************************/ extern "C" magma_int_t magma_chetrd_mgpu( magma_int_t ngpu, magma_int_t nqueue, magma_uplo_t uplo, magma_int_t n, magmaFloatComplex *A, magma_int_t lda, float *d, float *e, magmaFloatComplex *tau, magmaFloatComplex *work, magma_int_t lwork, magma_int_t *info) { #define A(i, j) (A + (j)*lda + (i)) #define dA(id, i, j) (dA[(id)] + (j)*ldda + (i)) #define dW(id, i, j) (dW[(id)] + (j)*ldda + (i)) /* Constants */ const magmaFloatComplex c_neg_one = MAGMA_C_NEG_ONE; const magmaFloatComplex c_one = MAGMA_C_ONE; const float d_one = MAGMA_D_ONE; /* Local variables */ const char* uplo_ = lapack_uplo_const( uplo ); magma_int_t nlocal, ldda; magma_int_t nb = magma_get_chetrd_nb(n), ib, ib2; #ifdef PROFILE_SY2RK float mv_time = 0.0; float up_time = 0.0; #endif magma_int_t kk, nx; magma_int_t i, ii, iii, j, dev, i_n; magma_int_t iinfo; magma_int_t ldwork, lddw, lwkopt, ldwork2, lhwork; // set pointers to NULL so it is safe to goto CLEANUP if any malloc fails. magma_queue_t queues[MagmaMaxGPUs][10] = { { NULL, NULL } }; magma_queue_t queues0[MagmaMaxGPUs] = { NULL }; magmaFloatComplex *hwork = NULL; magmaFloatComplex_ptr dwork2[MagmaMaxGPUs] = { NULL }; magmaFloatComplex_ptr dA[MagmaMaxGPUs] = { NULL }; magmaFloatComplex_ptr dW[MagmaMaxGPUs] = { NULL }; *info = 0; bool upper = (uplo == MagmaUpper); bool lquery = (lwork == -1); if (! upper && uplo != MagmaLower) { *info = -1; } else if (n < 0) { *info = -2; } else if (lda < max(1,n)) { *info = -4; } else if (lwork < nb*n && ! lquery) { *info = -9; } else if ( nqueue > 2 ) { *info = 2; // TODO fix } /* Determine the block size. */ ldwork = n; lwkopt = n * nb; if (*info == 0) { work[0] = magma_cmake_lwork( lwkopt ); } if (*info != 0) { magma_xerbla( __func__, -(*info) ); return *info; } else if (lquery) { return *info; } /* Quick return if possible */ if (n == 0) { work[0] = c_one; return *info; } magma_device_t orig_dev; magma_getdevice( &orig_dev ); //#define PROFILE_SY2RK #ifdef PROFILE_SY2RK float times[11] = { 0 }; magma_event_t start, stop; float etime; magma_setdevice( 0 ); magma_event_create( &start ); magma_event_create( &stop ); #endif ldda = magma_roundup( lda, 32 ); lddw = ldda; nlocal = nb*(1 + n/(nb*ngpu)); ldwork2 = ldda*( magma_ceildiv( n, nb ) + 1); // i.e., ldda*(blocks + 1) for( dev=0; dev < ngpu; dev++ ) { magma_setdevice( dev ); // TODO fix memory leak if ( MAGMA_SUCCESS != magma_cmalloc( &dA[dev], nlocal*ldda + 3*lddw*nb ) || MAGMA_SUCCESS != magma_cmalloc( &dwork2[dev], ldwork2 ) ) { *info = MAGMA_ERR_DEVICE_ALLOC; goto CLEANUP; } dW[dev] = dA[dev] + nlocal*ldda; for( kk=0; kk < nqueue; kk++ ) { magma_device_t cdev; magma_getdevice( &cdev ); magma_queue_create( cdev, &queues[dev][kk] ); } queues0[dev] = queues[dev][0]; } lhwork = nqueue*ngpu*n; if ( MAGMA_SUCCESS != magma_cmalloc_pinned( &hwork, lhwork ) ) { *info = MAGMA_ERR_HOST_ALLOC; goto CLEANUP; } // nx <= n is required // use LAPACK for n < 3000, otherwise switch at 512 if (n < 3000) nx = n; else nx = 512; if (upper) { /* Copy the matrix to the GPU */ if (1 <= n-nx) { magma_chtodhe( ngpu, uplo, n, nb, A, lda, dA, ldda, queues, &iinfo ); } /* Reduce the upper triangle of A. Columns 1:kk are handled by the unblocked method. */ for (i = nb*((n-1)/nb); i >= nx; i -= nb) { ib = min(nb, n-i); ii = nb*(i/(nb*ngpu)); dev = (i/nb)%ngpu; /* wait for the next panel */ if (i != nb*((n-1)/nb)) { magma_setdevice( dev ); magma_queue_sync( queues[dev][0] ); } magma_clatrd_mgpu( ngpu, uplo, i+ib, ib, nb, A(0, 0), lda, e, tau, work, ldwork, dA, ldda, 0, dW, i+ib, hwork, lhwork, dwork2, ldwork2, queues0 ); magma_cher2k_mgpu( ngpu, MagmaUpper, MagmaNoTrans, nb, i, ib, c_neg_one, dW, i+ib, 0, d_one, dA, ldda, 0, nqueue, queues ); /* get the next panel */ if (i-nb >= nx ) { ib2 = min(nb, n-(i-nb)); ii = nb*((i-nb)/(nb*ngpu)); dev = ((i-nb)/nb)%ngpu; magma_setdevice( dev ); magma_cgetmatrix_async( (i-nb)+ib2, ib2, dA(dev, 0, ii), ldda, A(0, i-nb), lda, queues[dev][0] ); } /* Copy superdiagonal elements back into A, and diagonal elements into D */ for (j = i; j < i+ib; ++j) { if ( j > 0 ) { *A(j-1,j) = MAGMA_C_MAKE( e[j - 1], 0 ); } d[j] = MAGMA_C_REAL( *A(j, j) ); } } /* end of for i=... */ if ( nx > 0 ) { if (1 <= n-nx) { /* else A is already on CPU */ for (i=0; i < nx; i += nb) { ib = min(nb, n-i); ii = nb*(i/(nb*ngpu)); dev = (i/nb)%ngpu; magma_setdevice( dev ); magma_cgetmatrix_async( nx, ib, dA(dev, 0, ii), ldda, A(0, i), lda, queues[dev][0] ); } } for( dev=0; dev < ngpu; dev++ ) { magma_setdevice( dev ); magma_queue_sync( queues[dev][0] ); } /* Use CPU code to reduce the last or only block */ lapackf77_chetrd( uplo_, &nx, A(0, 0), &lda, d, e, tau, work, &lwork, &iinfo ); } } else { trace_init( 1, ngpu, nqueue, queues ); /* Copy the matrix to the GPU */ if (1 <= n-nx) { magma_chtodhe( ngpu, uplo, n, nb, A, lda, dA, ldda, queues, &iinfo ); } /* Reduce the lower triangle of A */ for (i = 0; i < n-nx; i += nb) { ib = min(nb, n-i); ii = nb*(i/(nb*ngpu)); dev = (i/nb)%ngpu; /* Reduce columns i:i+ib-1 to tridiagonal form and form the matrix W which is needed to update the unreduced part of the matrix */ /* Get the current panel (no need for the 1st iteration) */ if (i != 0) { magma_setdevice( dev ); trace_gpu_start( dev, 0, "comm", "get" ); magma_cgetmatrix_async( n-i, ib, dA(dev, i, ii), ldda, A(i,i), lda, queues[dev][0] ); trace_gpu_end( dev, 0 ); magma_queue_sync( queues[dev][0] ); magma_setdevice( 0 ); } magma_clatrd_mgpu( ngpu, uplo, n-i, ib, nb, A(i, i), lda, &e[i], &tau[i], work, ldwork, dA, ldda, i, dW, n-i, hwork, lhwork, dwork2, ldwork2, queues0 ); #ifdef PROFILE_SY2RK magma_setdevice( 0 ); if ( i > 0 ) { cudaEventElapsedTime( &etime, start, stop ); up_time += (etime/1000.0); } magma_event_record( start, 0 ); #endif magma_cher2k_mgpu( ngpu, MagmaLower, MagmaNoTrans, nb, n-i-ib, ib, c_neg_one, dW, n-i, ib, d_one, dA, ldda, i+ib, nqueue, queues ); #ifdef PROFILE_SY2RK magma_setdevice( 0 ); magma_event_record( stop, 0 ); #endif /* Copy subdiagonal elements back into A, and diagonal elements into D */ for (j = i; j < i+ib; ++j) { if ( j+1 < n ) { *A(j+1,j) = MAGMA_C_MAKE( e[j], 0 ); } d[j] = MAGMA_C_REAL( *A(j, j) ); } } /* for i=... */ /* Use CPU code to reduce the last or only block */ if ( i < n ) { iii = i; i_n = n-i; if ( i > 0 ) { for (; i < n; i += nb) { ib = min(nb, n-i); ii = nb*(i/(nb*ngpu)); dev = (i/nb)%ngpu; magma_setdevice( dev ); magma_cgetmatrix_async( i_n, ib, dA(dev, iii, ii), ldda, A(iii, i), lda, queues[dev][0] ); } for( dev=0; dev < ngpu; dev++ ) { magma_setdevice( dev ); magma_queue_sync( queues[dev][0] ); } } lapackf77_chetrd( uplo_, &i_n, A(iii, iii), &lda, &d[iii], &e[iii], &tau[iii], work, &lwork, &iinfo ); } } for( dev=0; dev < ngpu; dev++ ) { magma_setdevice( dev ); for( kk=0; kk < nqueue; kk++ ) { magma_queue_sync( queues[dev][kk] ); } } #ifdef PROFILE_SY2RK magma_setdevice( 0 ); if ( n > nx ) { cudaEventElapsedTime( &etime, start, stop ); up_time += (etime/1000.0); } magma_event_destroy( start ); magma_event_destroy( stop ); #endif trace_finalize( "chetrd.svg", "trace.css" ); #ifdef PROFILE_SY2RK printf( " n=%d nb=%d\n", n, nb ); printf( " Time in CLARFG: %.2e seconds\n", times[0] ); //printf( " Time in CHEMV : %.2e seconds\n", mv_time ); printf( " Time in CHER2K: %.2e seconds\n", up_time ); #endif CLEANUP: for( dev=0; dev < ngpu; dev++ ) { magma_setdevice( dev ); for( kk=0; kk < nqueue; kk++ ) { magma_queue_destroy( queues[dev][kk] ); } magma_free( dA[dev] ); magma_free( dwork2[dev] ); } magma_free_pinned( hwork ); magma_setdevice( orig_dev ); work[0] = magma_cmake_lwork( lwkopt ); return *info; } /* magma_chetrd */