Segmentation fault for generations larger than ~512 tokens #13

@horenbergerb

Description

Running on Ubuntu with 32 GB RAM.
I get a segmentation fault when running the following code:

import sys
import llamacpp


def progress_callback(progress):
    print("Progress: {:.2f}%".format(progress * 100))
    sys.stdout.flush()


params = llamacpp.InferenceParams.default_with_callback(progress_callback)
params.path_model = '/home/captdishwasher/horenbergerb/llama/llama.cpp/models/30Bnew/ggml-model-q4_0-ggjt.bin'
model = llamacpp.LlamaInference(params)

prompt = "1"*500
prompt_tokens = model.tokenize(prompt, True)
print('Prompt tokens: {}'.format(len(prompt_tokens)))
model.add_bos()
model.update_input(prompt_tokens)

model.ingest_all_pending_input()
print(model.system_info())
for i in range(20):
    model.eval()
    token = model.sample()
    text = model.token_to_str(token)
    print(text, end="", flush=True)
    
# Flush stdout
sys.stdout.flush()

model.print_timings()

Output:

...
Prompt tokens: 501
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
111111101111Segmentation fault (core dumped)

This may be related to the context size: the number 512 matches the default n_ctx, but raising n_ctx didn't fix the problem. This has been coming up for users of text-generation-webui, which uses this package: oobabooga/text-generation-webui#690
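
For reference, this is roughly how I tried raising the context size (a sketch; the n_ctx field name is an assumption mirroring llama.cpp's gpt_params, so adjust if the binding exposes it differently):

import llamacpp

params = llamacpp.InferenceParams.default_with_callback(progress_callback)
params.path_model = '/path/to/ggml-model-q4_0-ggjt.bin'
# Assumed field: raise the context window from the default 512.
# The segfault still occurred with this change.
params.n_ctx = 1024
model = llamacpp.LlamaInference(params)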
