
map/set/zset/list data_cf distributed compaction #15

@rockeet

Description


data_cf distributed compaction needs DB::Get from the default cf inside the CompactionFilter, so we bulk-load all hash keys from the default cf and save them to a shared filesystem for reading in dcompact workers.

When the compaction input of data_cf is small but the hash keys in the default cf are large, this is very wasteful --- and it is likely to happen when compacting the upper levels of data_cf.

So we should check the data size in the default cf before starting a remote compaction; if it reaches a threshold percentage of the compaction input of data_cf, we should fall back to local compaction.

  1. Use Compaction::column_family_data() to get the default cf handle and the DB pointer; this needs a global std::map.
  2. Use DB::GetApproximateSizes() to get the data size in the default cf.
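The global std::map from step 1 could be sketched as below. This is only an illustration under assumptions: the names (DbContext, RegisterDb, LookupDb) are hypothetical, and the pointers are typed void* here to keep the sketch self-contained; in the real code they would be rocksdb::DB* and rocksdb::ColumnFamilyHandle*.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical sketch of the global registry mentioned in step 1: map each
// DB (keyed by its path/name) to the pointers needed inside the
// CompactionFilter. None of these names are existing ToplingDB symbols.
struct DbContext {
  void* db;          // rocksdb::DB* in real code
  void* default_cf;  // rocksdb::ColumnFamilyHandle* in real code
};

static std::mutex g_db_map_mtx;
static std::map<std::string, DbContext> g_db_map;

// Called when the DB is opened, before any compaction can run.
void RegisterDb(const std::string& name, void* db, void* default_cf) {
  std::lock_guard<std::mutex> lk(g_db_map_mtx);
  g_db_map[name] = DbContext{db, default_cf};
}

// Called from the CompactionFilter / executor factory to reach the default cf.
DbContext* LookupDb(const std::string& name) {
  std::lock_guard<std::mutex> lk(g_db_map_mtx);
  auto it = g_db_map.find(name);
  return it == g_db_map.end() ? nullptr : &it->second;
}
```

The mutex matters because compactions run on background threads concurrently with DB open/close.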

Thus a customized CompactionExecutorFactory should be defined: it should reference the DcompactEtcd factory and forward its methods; the key point is to override ShouldRunLocal(compaction).
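The decision inside the overridden ShouldRunLocal could look like the sketch below. This is an assumption-laden illustration: the issue only names ShouldRunLocal(compaction); the free-function form, the parameter names, and threshold_pct are all hypothetical. In real code, default_cf_size would come from DB::GetApproximateSizes() and compaction_input_size from the compaction's input files.

```cpp
#include <cstdint>

// Hypothetical sketch of the ShouldRunLocal decision. Sizes are in bytes,
// threshold_pct is a percentage (e.g. 40 means 40%).
bool ShouldRunLocal(uint64_t default_cf_size,
                    uint64_t compaction_input_size,
                    uint64_t threshold_pct) {
  // Fall back to local compaction when the default-cf hash keys that would
  // have to be bulk-loaded to the shared filesystem exceed threshold_pct
  // percent of the data_cf compaction input. The multiplication avoids
  // integer division truncation.
  return default_cf_size * 100 > compaction_input_size * threshold_pct;
}
```

When this returns false, the wrapper factory would simply forward to the DcompactEtcd factory for remote execution.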
