`data_cf` distributed compaction needs `DB::Get` from the default cf in its `CompactionFilter`, so we bulk load all hash keys from the default cf and save them to a shared filesystem for reading in dcompact workers.
When the compaction input of `data_cf` is small but the hash keys in the default cf are large, this is a very big waste --- this is likely to happen when compacting the upper levels of `data_cf`.
So we should check the data size in the default cf before starting a remote compaction; if it reaches a threshold percentage of the compaction input of `data_cf`, we should fall back to local compaction.
- Use `Compaction::column_family_data()` to get the default cf handle and the DB ptr; this needs a global `std::map`
- Use `DB::GetApproximateSizes()` to get the size in the default cf.
Thus a customized `CompactionExecutorFactory` should be defined: it should reference the DcompactEtcd factory and forward its methods; the key point is to override `ShouldRunLocal(compaction)`.