| CUTLASS: CUDA Templates for Linear Algebra Subroutines and Solvers |
#include <mma_sm60.h>
| Public Types | |
| using | FragmentA = Array< half_t, Shape::kMK > | 
| A operand storage. | |
| using | FragmentB = Array< half_t, Shape::kKN > | 
| B operand storage. | |
| using | FragmentC = Array< half_t, Shape::kMN > | 
| C operand storage. | |
| Public Member Functions | |
| CUTLASS_HOST_DEVICE void | operator() (FragmentC &D, FragmentA const &A, FragmentB const &B, FragmentC const &C) | 
| Computes a matrix product D = A * B + C. | |
| using cutlass::gemm::thread::detail::Mma_HFMA2< Shape, layout::RowMajor, layout::RowMajor, layout::RowMajor, true >::FragmentA = Array<half_t, Shape::kMK> | 
| using cutlass::gemm::thread::detail::Mma_HFMA2< Shape, layout::RowMajor, layout::RowMajor, layout::RowMajor, true >::FragmentB = Array<half_t, Shape::kKN> | 
| using cutlass::gemm::thread::detail::Mma_HFMA2< Shape, layout::RowMajor, layout::RowMajor, layout::RowMajor, true >::FragmentC = Array<half_t, Shape::kMN> | 
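The fragment sizes above follow from the extents of `Shape`: `kMK`, `kKN`, and `kMN` are the element counts of the A, B, and C tiles. A minimal host-only sketch of that relationship is shown below. `ShapeSketch` and `FragmentsSketch` are hypothetical names used for illustration only, `std::array` stands in for `Array`, and `float` stands in for `half_t` so the example compiles without CUDA.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the Shape template parameter (not the CUTLASS type).
// M, N, K are the extents of the thread-level GEMM tile.
template <int M, int N, int K>
struct ShapeSketch {
  static constexpr int kM = M;
  static constexpr int kN = N;
  static constexpr int kK = K;
  static constexpr int kMK = M * K;  // element count of the A tile
  static constexpr int kKN = K * N;  // element count of the B tile
  static constexpr int kMN = M * N;  // element count of the C and D tiles
};

// Hypothetical fragment aliases mirroring FragmentA, FragmentB, and FragmentC.
// Assumption: float replaces half_t, std::array replaces Array.
template <typename Shape>
struct FragmentsSketch {
  using FragmentA = std::array<float, Shape::kMK>;
  using FragmentB = std::array<float, Shape::kKN>;
  using FragmentC = std::array<float, Shape::kMN>;
};
```

With a 2x4x8 shape, for example, the A fragment holds 16 elements, the B fragment 32, and the C fragment 8.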
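| CUTLASS_HOST_DEVICE void cutlass::gemm::thread::detail::Mma_HFMA2< Shape, layout::RowMajor, layout::RowMajor, layout::RowMajor, true >::operator() (FragmentC &D, FragmentA const &A, FragmentB const &B, FragmentC const &C) | inline |
Computes the matrix product D = A * B + C. The output D is first initialized with the input C; a 1x2x1 HFMA2 sequence is then used for the bulk of the computation.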
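The two steps of `operator()` can be illustrated with a host-only sketch. This is not the CUTLASS implementation: `Pair`, `fma2`, and `mma_rowmajor_sketch` are hypothetical names, `float` stands in for `half_t`, and the loop order and row-major indexing are assumptions. Each inner step is a 1x2x1 update: one element of A is broadcast to both lanes, multiplied by two adjacent elements of a row of B, and accumulated into two adjacent elements of D, which is what a single packed HFMA2 instruction would do on the GPU.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a packed pair of half values (a half2 on the GPU).
struct Pair {
  float x, y;
};

// Emulates one packed fused multiply-add, lane by lane: result = a * b + c.
inline Pair fma2(Pair a, Pair b, Pair c) {
  return {std::fma(a.x, b.x, c.x), std::fma(a.y, b.y, c.y)};
}

// Sketch of D = A * B + C for row-major A (MxK), B (KxN), and C/D (MxN).
// Assumption: N is even so each row of D splits cleanly into pairs.
template <std::size_t M, std::size_t N, std::size_t K>
void mma_rowmajor_sketch(std::array<float, M * N>& D,
                         const std::array<float, M * K>& A,
                         const std::array<float, K * N>& B,
                         const std::array<float, M * N>& C) {
  static_assert(N % 2 == 0, "N must be even to form pairs");

  // Step 1: initialize output with input.
  D = C;

  // Step 2: accumulate with a sequence of 1x2x1 packed multiply-adds.
  for (std::size_t k = 0; k < K; ++k) {
    for (std::size_t m = 0; m < M; ++m) {
      // Broadcast A(m, k) into both lanes.
      const Pair a{A[m * K + k], A[m * K + k]};
      for (std::size_t n = 0; n < N; n += 2) {
        const Pair b{B[k * N + n], B[k * N + n + 1]};
        const Pair d = fma2(a, b, Pair{D[m * N + n], D[m * N + n + 1]});
        D[m * N + n] = d.x;
        D[m * N + n + 1] = d.y;
      }
    }
  }
}
```

Calling `mma_rowmajor_sketch<2, 2, 2>(D, A, B, C)` with four-element arrays computes a 2x2x2 product plus the accumulator. Pairing along N suits the row-major layout because the two elements of B and D touched by each step are adjacent in memory.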