nullptr dereference in WASM code path #81

@jelmervdl

Description

Bug description

When trying to run the WASM-compiled version of bergamot-translator, some models (specifically, the student.base models from browsermt/students) produce invalid output: typically a bunch of repetitions of a single word or character per sentence, not unlike this bug report.

In an attempt to figure out what was going on, I basically compiled the WASM code path into a native app so I could run it through lldb. This caught a nullptr dereference inside intgemm, ultimately caused by the nullptr that is passed where a const float* input_bias is expected on this line:

int8PrepareBias((const int8_t *)b->data(), scale_a, 0.0 /*zero_point_a*/, scale_b, 0.0 /*zero_point_b*/, rows(b), cols(b), nullptr/*input_bias*/, val_->data());

This then translates/binds/magics into

  float unquant_factor = (-1) * ((127.0f / scale_A) * (127.0f / scale_B)) / (127.0f);
  intgemm::Int8Shift::PrepareBias(
      input_B_prepared,
      width,
      cols_B,
      intgemm::callbacks::UnquantizeAndAddBiasAndWrite(unquant_factor, input_bias, output));
}

That nullptr then ends up here as config.bias_addr:

template <> class CallbackImpl<CPUType::CPU_NAME, UnquantizeAndAddBiasAndWrite> {
  ...
    auto result = kernels::unquantize(input, mult_reg);
    result = kernels::add_bias(result, config.bias_addr, info.col_idx);
    kernels::write(result, config.output_addr, info.row_idx * info.cols + info.col_idx);
  ...
};

And I'm not sure which of these implementations of kernels::add_bias it ends up at, but none of them seem happy with a nullptr.
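As far as I can tell, the add_bias kernels all boil down to a SIMD load from bias_addr followed by an add, which is why a null pointer crashes immediately. A rough sketch of the shape of the AVX2 variant (paraphrased, not the actual intgemm source; the function name is mine):

#include <cstddef>
#include <immintrin.h>

// Sketch only: the kernel unconditionally loads a vector of bias values from
// bias_addr + offset, so bias_addr == nullptr dereferences address `offset`.
static inline __m256 add_bias_sketch(__m256 unquantized, const float* bias_addr, std::size_t offset) {
  __m256 bias_term = _mm256_loadu_ps(bias_addr + offset);  // crashes when bias_addr is nullptr
  return _mm256_add_ps(unquantized, bias_term);
}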

As an experiment, I re-implemented the fallback function to handle the nullptr and call the callback without the bias term if the bias was null:

extern "C" void int8PrepareBias(const int8_t* input_B_prepared,
                                        float scale_A,
                                        float zero_point_A,
                                        float scale_B,
                                        float zero_point_B,
                                        Index width,
                                        Index cols_B,
                                        const float* input_bias,
                                        float* output) {
  float unquant_factor = (-1) * ((127.0f / scale_A) * (127.0f / scale_B)) / (127.0f);
  if (input_bias == nullptr) {
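    // No bias to add: use the plain unquantize-and-write callback so the
    // kernels never touch a bias pointer.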
    intgemm::Int8Shift::PrepareBias(
        input_B_prepared,
        width,
        cols_B,
        intgemm::callbacks::UnquantizeAndWrite(unquant_factor, output));
  } else {
    intgemm::Int8Shift::PrepareBias(
        input_B_prepared,
        width,
        cols_B,
        intgemm::callbacks::UnquantizeAndAddBiasAndWrite(unquant_factor, input_bias, output));
  }
}

That fixes both the nullptr dereference error and the broken model output for my non-wasm wasm build.

I'm reporting this as a bug rather than proposing a fix, as I imagine there was some reasoning behind passing a nullptr there.

I'm also surprised that what seems to be buggy code compiled into a (mostly) functioning wasm build that works for the tiny models. The base and tiny11 models don't seem to differ all that much: they appear to have the same layers, just larger in base?

Lastly, I'm not sure how to fix this. The quick hack above won't work since that's the fallback code path. Something similar would need to be added to the intgemm code in Mozilla's tree.
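For what it's worth, one caller-side workaround that avoids touching intgemm at all might be to hand int8PrepareBias a zero-filled bias instead of a nullptr, so the existing UnquantizeAndAddBiasAndWrite path just adds zeros. Untested sketch against the call shown at the top (the zero_bias buffer is my own addition and would need a suitable lifetime plus #include <vector>):

// Hypothetical workaround, not tested: pass a buffer of zeros instead of
// nullptr so the bias add becomes a no-op in every backend.
std::vector<float> zero_bias(cols(b), 0.0f);
int8PrepareBias((const int8_t *)b->data(), scale_a, 0.0 /*zero_point_a*/,
                scale_b, 0.0 /*zero_point_b*/, rows(b), cols(b),
                zero_bias.data() /*input_bias*/, val_->data());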

Reproduce

Tested by building app/bergamot.cpp from bergamot-translator, after patching the cmake files not to pass emcc-specific flags when COMPILE_WASM is defined. I also changed wasm_intgemm_interface.h to remove the compiler attributes, and wasm_intgemm_fallback.cpp to remove all the Fallback bits from the function names so that those functions get called directly.

I also needed to patch the config.intgemm8bitalpha.yml to include:

max-length-break: 128
mini-batch-words: 1024

After that, the following works (or, without the patch to wasm_intgemm_fallback.cpp, gives a useful crash):

> lldb -- app/bergamot --model-config-paths ~/.config/translateLocally/ende.student.base-1647129297/config.intgemm8bitalpha.yml
(lldb) target create "app/bergamot"
Current executable set to '/Users/jelmer/Workspace/statmt/firefox-translations/bergamot-translator/build/app/bergamot' (x86_64).
(lldb) settings set -- target.run-args  "--model-config-paths" "/Users/jelmer/.config/translateLocally/ende.student.base-1647129297/config.intgemm8bitalpha.yml"
(lldb) run
Process 3022 launched: '/Users/jelmer/Workspace/statmt/firefox-translations/bergamot-translator/build/app/bergamot' (x86_64)
Hello world!
Hallo Welt!
Process 3022 exited with status = 0 (0x00000000)

Metadata

Labels

bug (Something isn't working), wasm
