Segmentation fault for generations larger than ~512 tokens #13

@horenbergerb

Description

Running on Ubuntu with 32 GB RAM.
I get a segmentation fault when running the following code:

import sys
import llamacpp


def progress_callback(progress):
    print("Progress: {:.2f}%".format(progress * 100))
    sys.stdout.flush()


params = llamacpp.InferenceParams.default_with_callback(progress_callback)
params.path_model = '/home/captdishwasher/horenbergerb/llama/llama.cpp/models/30Bnew/ggml-model-q4_0-ggjt.bin'
model = llamacpp.LlamaInference(params)

prompt = "1"*500
prompt_tokens = model.tokenize(prompt, True)
print('Prompt tokens: {}'.format(len(prompt_tokens)))
model.add_bos()
model.update_input(prompt_tokens)

model.ingest_all_pending_input()
print(model.system_info())
for i in range(20):
    model.eval()
    token = model.sample()
    text = model.token_to_str(token)
    print(text, end="", flush=True)
    
# Flush stdout
sys.stdout.flush()

model.print_timings()

Output:

...
Prompt tokens: 501
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
111111101111Segmentation fault (core dumped)

This may be related to the context size: the number 512 matches the default n_ctx, but raising n_ctx didn't fix the problem. This has been coming up for users of text-generation-webui, which uses this package: oobabooga/text-generation-webui#690
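
For reference, this is roughly how I tried raising the context size (a sketch; the n_ctx field name is an assumption mirroring llama.cpp's gpt_params, so adjust if the binding exposes it differently):

import llamacpp

params = llamacpp.InferenceParams.default_with_callback(progress_callback)
params.path_model = '/path/to/ggml-model-q4_0-ggjt.bin'
# Assumed field: raise the context window from the default 512.
# The segfault still occurred with this change.
params.n_ctx = 1024
model = llamacpp.LlamaInference(params)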
