[WIP] optimize padding with buffer_load_if/buffer_store_if #76
carlushuang wants to merge 12 commits into fmha_attemp_async_copy_unify from
Conversation
```diff
  index_t src_thread_element_offset,
- index_t src_element_space_size)
+ index_t src_element_space_size,
+ index_t is_valid_element = 0)
```
Is is_valid_element actually a bool parameter? (I guess you only used index_t for the POC.)
Oh, you are right, this should be a bool.
```cpp
template <> struct t2s<ck::bf8_t> { static constexpr const char * name = "bf8"; };
// clang-format on

__host__ static std::string GetName()
```
I think GetName() can be implemented by calling something like miopen::get_type_name(). And we need another name for this function (maybe GetEncodedName()?)
The purpose of the naming inside C++ is to print out something that helps with debugging. This is not the symbol name (though we can mock a symbol name inside generate.py). The name should carry all the information needed to distinguish between different kinds of kernels, so it could be quite long. The benefit of keeping this inside the kernel template is that we can reuse it even when not using our generate.py system.
And yes, using GetEncodedName() is OK.
```cpp
    make_tuple(Number<FmhaPipeline::kM0>{}, Number<FmhaPipeline::kN1>{}),
    {i_m0, i_n1});

// o_dram_window.foo();
```
Oh, will remove it :)