CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers

Directories | |
| directory | arch |
| directory | epilogue |
| directory | gemm |
| directory | layout |
| directory | platform |
| directory | reduction |
| directory | thread |
| directory | transform |
| directory | util |
Files | |
| file | aligned_buffer.h [code] |
| AlignedBuffer is a container for trivially copyable elements suitable for use in unions and shared memory. | |
| file | array.h [code] |
| Statically sized array of elements that accommodates all CUTLASS-supported numeric types and is safe to use in a union. | |
| file | array_subbyte.h [code] |
| Statically sized array of elements that accommodates all CUTLASS-supported numeric types and is safe to use in a union. | |
| file | complex.h [code] |
| file | coord.h [code] |
| A Coord is a coordinate of arbitrary rank into a tensor or matrix. | |
| file | core_io.h [code] |
| Helpers for printing cutlass/core objects. | |
| file | cutlass.h [code] |
| Basic include for CUTLASS. | |
| file | device_kernel.h [code] |
| Template for generic CUTLASS kernel. | |
| file | fast_math.h [code] |
| Math utilities. | |
| file | functional.h [code] |
| Define basic numeric operators with specializations for Array<T, N>. SIMD-ize where possible. | |
| file | half.h [code] |
| Defines a class for using IEEE half-precision floating-point types in host or device code. | |
| file | integer_subbyte.h [code] |
| Defines a class for using integer types smaller than one byte in host or device code. | |
| file | kernel_launch.h [code] |
| Defines structures and helpers to launch CUDA kernels within CUTLASS. | |
| file | matrix_coord.h [code] |
| Defines a canonical coordinate for rank=2 matrices offering named indices. | |
| file | matrix_shape.h [code] |
| Defines a Shape template for matrix tiles. | |
| file | matrix_traits.h [code] |
| Defines properties of matrices used to denote layout and operands to GEMM kernels. | |
| file | numeric_conversion.h [code] |
| Boost-like numeric conversion operator for CUTLASS numeric types. | |
| file | numeric_types.h [code] |
| Top-level include for all CUTLASS numeric types. | |
| file | predicate_vector.h [code] |
| Defines container classes and iterators for managing a statically sized vector of boolean predicates. | |
| file | real.h [code] |
| file | relatively_equal.h [code] |
| file | semaphore.h [code] |
| Implementation of a CTA-wide semaphore for inter-CTA synchronization. | |
| file | subbyte_reference.h [code] |
| Provides a mechanism for packing and unpacking elements smaller than one byte. | |
| file | tensor_coord.h [code] |
| Defines a canonical coordinate for rank=4 tensors offering named indices. | |
| file | tensor_ref.h [code] |
| Defines a structure containing strides and a pointer to tensor data. | |
| file | tensor_view.h [code] |
| Defines a structure containing strides, bounds, and a pointer to tensor data. | |
| file | wmma_array.h [code] |
| Statically sized array of elements that accommodates all CUTLASS-supported numeric types and is safe to use in a union. | |
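
The entry for subbyte_reference.h describes packing and unpacking elements smaller than one byte. The idea can be illustrated with a minimal, self-contained sketch in plain C++. `PackedU4View` is a hypothetical name used only for illustration and is not part of CUTLASS; it assumes unsigned 4-bit elements stored two per byte with the low nibble first, which may differ from the layout CUTLASS actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a CUTLASS type): a view over unsigned 4-bit
// elements packed two per byte, low nibble first.
struct PackedU4View {
  std::uint8_t* storage;  // backing bytes, each holding two elements

  // Unpack element i; the result is in the range [0, 15].
  std::uint8_t get(std::size_t i) const {
    const std::uint8_t byte = storage[i / 2];
    return (i % 2 == 0) ? static_cast<std::uint8_t>(byte & 0x0F)
                        : static_cast<std::uint8_t>(byte >> 4);
  }

  // Pack value into element i, leaving the neighbouring element that
  // shares the same byte unchanged.
  void set(std::size_t i, std::uint8_t value) {
    assert(value <= 0x0F);  // precondition: value must fit in 4 bits
    std::uint8_t& byte = storage[i / 2];
    if (i % 2 == 0) {
      byte = static_cast<std::uint8_t>((byte & 0xF0) | (value & 0x0F));
    } else {
      byte = static_cast<std::uint8_t>((byte & 0x0F) | ((value & 0x0F) << 4));
    }
  }
};
```

Writing through such a view is a read-modify-write of the containing byte, so two threads updating neighbouring elements of the same byte would race. That is one reason a dedicated reference type is useful for sub-byte data.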