Add CLI Option to Purge Empty Fields in ES #286

@learnitall

Currently we don't check whether a field is non-empty before shipping a document off to ES. Depending on the use case and the benchmark, this can result in a lot of empty fields being sent. For instance, here is a document from a Uperf CI test run by ripsaw, where most of the fields are populated through environment variables:

{
    "_index" : "ripsaw-uperf-results-000002",
    "_type" : "_doc",
    "_id" : "297e5bb466870ad7f90916e68f60c440a43be6dbe0707ab6f3c4d7e30007e807",
    "_score" : 7.654378,
    "_source" : {
        "workload" : "uperf",
        "uuid" : "79690e49-479e-593a-8a51-0a1ef032de88",
        "user" : "ripsaw",
        "cluster_name" : "myk8scluster",
        "hostnetwork" : "True",
        "iteration" : 2,
        "remote_ip" : "10.0.133.30",
        "client_ips" : "10.0.173.62 10.130.0.1 ",
        "uperf_ts" : "2021-06-01T22:32:39.594000",
        "service_ip" : "False",
        "bytes" : 519864320,
        "norm_byte" : 258605056,
        "ops" : 1015360,
        "norm_ops" : 505088,
        "norm_ltcy" : 2.3778479412250935,
        "kind" : "pod",
        "client_node" : "ip-10-0-173-62.us-west-2.compute.internal",
        "server_node" : "unknown",
        "num_pairs" : "1",
        "multus_client" : "",
        "networkpolicy" : "",
        "density" : "1",
        "nodes_in_iter" : "1",
        "step_size" : "",
        "colocate" : "False",
        "density_range" : [ ],
        "node_range" : [ ],
        "pod_id" : "0",
        "test_type" : "stream",
        "protocol" : "udp",
        "message_size" : 512,
        "read_message_size" : 512,
        "num_threads" : 2,
        "duration" : 3,
        "run_id" : "NA"
    }
}

And here is a document exported by running the command run_snafu --tool uperf --user ryan --uuid 1234 --proto tcp --remoteip localhost -w iperf.xml --resourcetype container -s 1 --verbose:

{
    "_index": "snafu-uperf-results",
    "_op_type": "create",
    "_source": {
        "test_type": "",
        "protocol": "tcp",
        "message_size": null,
        "read_message_size": null,
        "num_threads": 1,
        "duration": 31,
        "kind": "container",
        "hostnetwork": "False",
        "remote_ip": "localhost",
        "client_ips": "",
        "service_ip": "False",
        "client_node": "",
        "server_node": "",
        "num_pairs": "",
        "multus_client": "",
        "networkpolicy": "",
        "density": "",
        "nodes_in_iter": "",
        "step_size": "",
        "colocate": "",
        "density_range": "",
        "node_range": "",
        "pod_id": null,
        "uperf_ts": "2021-06-28T14:42:37.066000",
        "bytes": 48083435520,
        "norm_byte": 1505771520,
        "ops": 5869560,
        "norm_ops": 183810,
        "norm_ltcy": 6.534098878427453,
        "iteration": 1,
        "user": "ryan",
        "uuid": "1234",
        "workload": "uperf",
        "run_id": "NA"
    },
    "_id": "ae6d9dfc7083e94c569d1999c2eb2ae1dce4a77fc5c7052c128103783f9acc70",
    "run_id": "NA"
}

I think it would be cool to add a CLI option, called --no-empty-fields or something similar, that removes any field whose value is null or an empty string from exported documents. That way teams get only the fields and data they care about, without the extra fields that we use as a team.
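
To make the proposal concrete, here is a rough sketch of what the filtering could look like. It is only an illustration: the names purge_empty_fields, build_parser, and prepare_document are hypothetical and are not existing snafu functions, and the flag name is just the one suggested above. It drops only null and empty-string values, so empty lists, 0, and "False" are left alone.

```python
import argparse
from typing import Any, Dict


def purge_empty_fields(source: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `source` without fields that are None or an empty string.

    Other "falsy" values (0, False, empty lists) are kept on purpose, since
    the proposal only covers null and empty-string fields.
    """
    return {
        key: value
        for key, value in source.items()
        if value is not None and value != ""
    }


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing how the flag might be declared."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument(
        "--no-empty-fields",
        action="store_true",
        help="remove null and empty-string fields from exported documents",
    )
    return parser


def prepare_document(document: Dict[str, Any], no_empty_fields: bool) -> Dict[str, Any]:
    """Apply the purge to the document's `_source` only when the flag is set."""
    if not no_empty_fields:
        return document
    cleaned = dict(document)  # shallow copy so the original is not modified
    cleaned["_source"] = purge_empty_fields(document.get("_source", {}))
    return cleaned
```

With something like this in place, the exporter would call prepare_document(doc, args.no_empty_fields) on each document just before it is sent to ES. Applied to the second document above, fields such as test_type, message_size, and pod_id would be dropped, while protocol, num_threads, and the metrics would be kept.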
