# OpenWeights
An OpenAI-like SDK with the flexibility of working on a local GPU: fine-tuning, inference, API deployments, and custom workloads on managed RunPod instances.
## Installation
Run `pip install openweights`, or install from source via `pip install -e .`
---
## Quickstart
1. **Create an API key**
You can create one via `ow signup` or using the [dashboard](https://openweights.nielsrolf.com).
2. **Start the cluster manager** (skip this if you got an API key for a managed cluster)
The cluster manager is the service that monitors the job queue and starts RunPod workers. You have several options for starting it:
```bash
ow cluster --env-file path/to/env # Run locally
ow deploy --env-file path/to/env # Run on a runpod cpu instance
# Or managed, if you trust us with your API keys (usually a bad idea, but okay if you know us personally)
ow env import path/to/env
ow manage start
```
In all cases, the env file must define at least the variables listed in [`.env.worker.example`](.env.worker.example).
3. Submit a job
```python
from openweights import OpenWeights
ow = OpenWeights()
training_file = ow.files.upload("data/train.jsonl", purpose="conversations")["id"]
job = ow.fine_tuning.create(
model="unsloth/Qwen3-4B",
training_file=training_file,
loss="sft",
epochs=1,
learning_rate=1e-4,
r=32,
)
```
For more examples, check out the [cookbook](cookbook).
# Overview
`openweights` lets you submit jobs that will be run on managed RunPod instances. It supports a range of built-in jobs out of the box, but is built for custom workloads.
## Custom jobs
A custom job lets you run a script that you would normally run on a single GPU as a managed job.
Example:
```python
import json
from typing import Type

from pydantic import BaseModel

from openweights import OpenWeights, register, Jobs

ow = OpenWeights()

@register('my_custom_job')
class MyCustomJob(Jobs):
    mount = {
        'local/path/to/script.py': 'script.py',
        'local/path/to/dir/': 'dirname/'
    }
    params: Type[BaseModel] = MyParams  # Your Pydantic model for params
    requires_vram_gb: int = 24
    base_image: str = 'nielsrolf/ow-default'  # optional

    def get_entrypoint(self, validated_params: BaseModel) -> str:
        """Get the entrypoint command for the job."""
        return f'python script.py {json.dumps(validated_params.model_dump())}'
```
[More details](cookbook/custom_job/)
## Built-in jobs
### Inference
```python
from openweights import OpenWeights

ow = OpenWeights()

file = ow.files.create(
    file=open("mydata.jsonl", "rb"),
    purpose="conversations"
)

job = ow.inference.create(
    model="unsloth/Qwen3-4B",
    input_file_id=file['id'],
    max_tokens=1000,
    temperature=1,
    min_tokens=600,
)

# Wait or poll until job is done, then:
if job.status == 'completed':
    output_file_id = job['outputs']['file']
    output = ow.files.content(output_file_id).decode('utf-8')
    print(output)
```
[More details](cookbook/inference/)
### OpenAI-like vllm API
```py
from openweights import OpenWeights

ow = OpenWeights()

model = 'unsloth/llama-3-8b-Instruct'

# `async with ow.api.deploy(model)` also works.
# Entering the context manager is equivalent to `api = ow.api.deploy(model); api.up()`.
with ow.api.deploy(model):
    completion = ow.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "is 9.11 > 9.9?"}]
    )
    print(completion.choices[0].message)
# When the context manager exits, it calls api.down().
```
[More details](cookbook/api-deployment/)
### Inspect-AI
```python
from openweights import OpenWeights

ow = OpenWeights()

job = ow.inspect_ai.create(
    model='meta-llama/Llama-3.3-70B-Instruct',
    eval_name='inspect_evals/gpqa_diamond',
    options='--top-p 0.9',  # Can be any options that `inspect eval` accepts - we simply pass them on without validation
)

# Wait or poll until the job is done, then:
if job.status == 'completed':
    job.download('output')
```
---
## CLI
Use `ow {cmd} --help` for more help on the available commands:
```bash
❯ ow --help
usage: ow [-h] {ssh,exec,signup,cluster,worker,token,ls,cancel,logs,fetch,serve,deploy,env,manage} ...
OpenWeights CLI for remote GPU operations
positional arguments:
{ssh,exec,signup,cluster,worker,token,ls,cancel,logs,fetch,serve,deploy,env,manage}
ssh Start or attach to a remote shell with live file sync.
exec Execute a command on a remote GPU with file sync.
signup Create a new user, organization, and API key.
cluster Run the cluster manager locally with your own infrastructure.
worker Run a worker to execute jobs from the queue.
token Manage API tokens for organizations.
ls List job IDs.
cancel Cancel jobs by ID.
logs Display logs for a job.
fetch Fetch file content by ID.
serve Start the dashboard backend server.
deploy Deploy a cluster instance on RunPod.
env Manage organization secrets (environment variables).
manage Control managed cluster infrastructure.
options:
-h, --help show this help message and exit
```
For developing custom jobs, `ow ssh` is great - it starts a pod, connects via SSH, and live-syncs the local working directory into the remote. This lets you edit finetuning code locally and test it immediately.
## General notes
### Job and file IDs are content hashes
The `job_id` is a hash of the job parameters, which means that if you submit the same job many times, it will only run once. If you resubmit a failed or canceled job, its status is reset to `pending`.
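The idea behind content-hashed IDs can be sketched as follows. This is an illustration only, not the library's actual hashing scheme; the function name and ID prefix are made up:

```python
import hashlib
import json

def job_id_from_params(params: dict) -> str:
    # Canonicalize: sorted keys and fixed separators, so key order doesn't matter
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return "ftjob-" + hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Two submissions with the same params (in any key order) map to the same job:
a = job_id_from_params({"model": "unsloth/Qwen3-4B", "epochs": 1})
b = job_id_from_params({"epochs": 1, "model": "unsloth/Qwen3-4B"})
assert a == b
```

Because the ID is derived from the params, deduplication falls out for free: a resubmission hits the existing job row instead of creating a new one.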
---
### Citation
Originally created by Niels Warncke ([@nielsrolf](https://github.com/nielsrolf)).
If you find this repo useful for your research and want to cite it, you can do so via:
```
@misc{warncke_openweights_2025,
author = {Niels Warncke},
title = {OpenWeights},
howpublished = {\url{https://github.com/longtermrisk/openweights}},
note = {Commit abcdefg • accessed DD Mon YYYY},
year = {2025}
}
```
<.env.worker.example>
OPENWEIGHTS_API_KEY=...
RUNPOD_API_KEY=...
HF_USER=...
HF_TOKEN=...
HF_ORG=...
</.env.worker.example>
<cookbook>
README.md
api-deployment
custom_job
inference
inspect_eval.py
preference_learning
rl
sft
<cookbook/README.md>
This folder contains examples that demonstrate usage of openweights features.
- Finetuning
- [Minimal SFT example using Qwen3-4B](sft/lora_qwen3_4b.py)
- [QLoRA SFT with Llama-3.3-70B and more specified hyperparams](sft/qlora_llama3_70b.py)
- [Tracking logprobs during training and inspecting them](sft/logprob_tracking.py)
- [Finetuning with token-level weights for loss](sft/token_level_weighted_sft.py)
- [Sampling at intermediate steps](sft/sampling_callback.py)
- [Preference learning (DPO and ORPO)](preference_learning)
- [Batch inference](inference/run_inference.py), supports:
- Inference from LoRA adapter
- Inference from checkpoint
- [API deployment](api-deployment)
- [Minimal example](api-deployment/context_manager_api.py) to deploy a huggingface model as openai-compatible vllm API
- Starting a [gradio playground](api-deployment/gradio_ui.py) to chat with multiple LoRA finetunes of the same parent model
- [Writing a custom job](custom_job)
## Data formats
We use JSONL files for datasets and prompts. Below is a description of the specific formats.
### Conversations
Example row:
```json
{
    "messages": [
        {
            "role": "user",
            "content": "This is a user message"
        },
        {
            "role": "assistant",
            "content": "This is the assistant response"
        }
    ]
}
```
We use this for SFT training/eval files and inference inputs. When a row in an inference file ends with an assistant message, that message is interpreted as a prefix, and the completion continues it.
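For example, an inference row that ends with a partial assistant message (illustrative data) makes the model continue that message rather than start a new reply:

```json
{
    "messages": [
        {
            "role": "user",
            "content": "Write a haiku about GPUs"
        },
        {
            "role": "assistant",
            "content": "Silicon rivers"
        }
    ]
}
```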
### Conversations, block-formatted
Example row:
```json
{
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "We don't train on this text, because the weight is 0",
                    "weight": 0
                }
            ]
        },
        {
            "role": "assistant",
            "content": [
                {
                    "type": "text",
                    "text": "We have negative loss on these tokens, which means we try to minimize log-likelihood instead of maximizing it.",
                    "weight": -1,
                    "tag": "minimize",
                    "info1": "You can add as many other keys as you like; they will be ignored.",
                    "info2": "weight is only relevant for ow.weighted_sft",
                    "info3": "tag is relevant for log-probability tracking. You can retrieve the log-probs of tokens in this content block if you use this file in a logp_callback_dataset."
                },
                {
                    "type": "text",
                    "text": "We have positive weight on these tokens, which means we train as normal on these tokens.",
                    "weight": 1,
                    "tag": "maximize"
                }
            ]
        }
    ]
}
```
This format is used for training files of `ow.weighted_sft` and for log-probability callbacks.
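Conceptually, the per-token weights scale the training loss: weight 0 masks a token out, positive weights maximize its log-likelihood, and negative weights minimize it. A toy sketch of the idea (not the actual training code, which operates on tensors):

```python
def weighted_loss(token_nlls, weights):
    # token_nlls: per-token negative log-likelihoods
    # weights: per-token weights taken from the block-formatted data
    # weight 0 -> token contributes nothing; weight 1 -> standard SFT loss;
    # weight -1 -> sign flips, so gradient descent *lowers* the log-likelihood
    return sum(w * nll for nll, w in zip(token_nlls, weights))

# Tokens with weight 0 contribute nothing:
assert weighted_loss([2.0, 3.0], [0, 1]) == 3.0
# A weight of -1 flips the sign of that token's loss:
assert weighted_loss([2.0], [-1]) == -2.0
```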
### Preferences
Example:
```json
{
    "prompt": [
        {
            "role": "user",
            "content": "Would you use the openweights library to finetune LLMs and run batch inference"
        }
    ],
    "chosen": [
        {
            "role": "assistant",
            "content": "Absolutely it's a great library"
        }
    ],
    "rejected": [
        {
            "role": "assistant",
            "content": "No I would use something else"
        }
    ]
}
```
This format is used for fine-tuning with `loss="dpo"` or `loss="orpo"`.
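Producing such a file is just one JSON object per line. A small sketch (the filename and example contents are placeholders):

```python
import json

pairs = [
    {
        "prompt": [{"role": "user", "content": "Is OpenWeights useful?"}],
        "chosen": [{"role": "assistant", "content": "Yes, very."}],
        "rejected": [{"role": "assistant", "content": "No."}],
    }
]

# Write one JSON object per line (JSONL)
with open("preferences.jsonl", "w") as f:
    for row in pairs:
        f.write(json.dumps(row) + "\n")
```

The resulting file can then be uploaded with `ow.files.upload(...)` and passed as the training file of a DPO or ORPO fine-tuning job.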
</cookbook/README.md>
<cookbook/sft/lora_qwen3_4b.py>
from openweights import OpenWeights
ow = OpenWeights()
training_file = ow.files.upload("data/train.jsonl", purpose="conversations")["id"]
job = ow.fine_tuning.create(
model="unsloth/Qwen3-4B",
training_file=training_file,
loss="sft",
epochs=1,
learning_rate=1e-4,
r=32,
merge_before_push=False,
finetuned_model_id="nielsrolf/dev"
)
print(job)
print(
f"The model will be pushed to: {job.params['validated_params']['finetuned_model_id']}"
)
</cookbook/sft/lora_qwen3_4b.py>
<cookbook/sft/qlora_llama3_70b.py>
from openweights import OpenWeights
ow = OpenWeights()
training_file = ow.files.upload(path="data/train.jsonl", purpose="conversations")["id"]
test_file = ow.files.upload(path="data/test.jsonl", purpose="conversations")["id"]
job = ow.fine_tuning.create(
model="unsloth/Llama-3.3-70B-Instruct-bnb-4bit",
training_file=training_file,
test_file=test_file,
load_in_4bit=True,
max_seq_length=2047,
loss="sft",
epochs=1,
learning_rate=1e-4,
r=32, # lora rank
save_steps=10, # save a checkpoint every 10 steps
per_device_train_batch_size=1,
gradient_accumulation_steps=8,
allowed_hardware=["1x H200"],
merge_before_push=False, # Push only the lora adapter
)
print(job)
print(
f"The model will be pushed to: {job.params['validated_params']['finetuned_model_id']}"
)
</cookbook/sft/qlora_llama3_70b.py>
<cookbook/sft/logprob_tracking.py>
import os
import time
import matplotlib.pyplot as plt
import pandas as pd
from pandas.api.types import is_numeric_dtype
from openweights import OpenWeights
ow = OpenWeights()
def submit_job():
    training_file = ow.files.upload(path="data/train.jsonl", purpose="conversations")["id"]
    logp_file = ow.files.upload(path="data/logp_tracking.jsonl", purpose="conversations")["id"]
    job = ow.fine_tuning.create(
        model="unsloth/Qwen3-4B",
        training_file=training_file,
        loss="sft",
        epochs=4,
        learning_rate=1e-4,
        r=32,
        eval_every_n_steps=1,
        logp_callback_datasets={"in-distribution": logp_file},
    )
    return job

def wait_for_completion(job):
    while job.status in ["pending", "in_progress"]:
        time.sleep(5)
        job = job.refresh()
    if job.status == "failed":
        logs = ow.files.content(job.runs[-1].log_file).decode("utf-8")
        print(logs)
        raise ValueError("Job failed")
    return job

def plot_metrics(job, target_dir="outputs/logp_tracking"):
    os.makedirs(target_dir, exist_ok=True)
    events = ow.events.list(run_id=job.runs[-1].id)
    df_events = pd.DataFrame([event["data"] for event in events])
    df_events["tag"] = df_events["tag"].fillna("")
    for col in df_events.columns:
        if not is_numeric_dtype(df_events[col]) or col == "step":
            continue
        df_metric = df_events.dropna(subset=["step", "tag", col])
        for tag in df_metric.tag.unique():
            df_tmp = df_metric.loc[df_metric.tag == tag]
            if len(df_tmp) > 1:
                # Aggregate per step
                grouped = df_tmp.groupby("step")[col].agg(["mean", "min", "max"])
                # Plot the mean as a thick line
                plt.plot(grouped.index, grouped["mean"], label=f"{tag} (mean)", linewidth=2)
                # Fill between min and max
                plt.fill_between(
                    grouped.index,
                    grouped["min"],
                    grouped["max"],
                    alpha=0.2,
                    label=f"{tag} (min–max)",
                )
        if len(df_metric.tag.unique()) > 1:
            plt.legend()
        plt.xlabel("Step")
        plt.ylabel(col)
        plt.title(f"{col} over steps")
        plt.grid(True)
        plt.savefig(f'{target_dir}/{col.replace("/", "-")}.png')
        plt.close()

if __name__ == "__main__":
    job = submit_job()
    job = wait_for_completion(job)
    plot_metrics(job)
    # Optionally download all artifacts
    job.download("outputs/logp_tracking", only_last_run=False)
</cookbook/sft/logprob_tracking.py>
<cookbook/sft/token_level_weighted_sft.py>
import os
import time
import matplotlib.pyplot as plt
import pandas as pd
from logprob_tracking import plot_metrics, wait_for_completion
from pandas.api.types import is_numeric_dtype
from openweights import OpenWeights
ow = OpenWeights()
def submit_job():
    training_file = ow.files.upload(path="data/weighted_data.jsonl", purpose="conversations")["id"]
    logp_file = ow.files.upload(path="data/weighted_data_test.jsonl", purpose="conversations")["id"]
    job = ow.weighted_sft.create(
        model="unsloth/Qwen3-4B",
        training_file=training_file,
        loss="sft",
        epochs=20,
        learning_rate=1e-4,
        r=32,
        eval_every_n_steps=1,
        logp_callback_datasets={"in-distribution": logp_file},
        requires_vram_gb=16,
    )
    return job

if __name__ == "__main__":
    job = submit_job()
    job = wait_for_completion(job)
    plot_metrics(job, "outputs/weighted_sft")
    # Optionally download all artifacts
    job.download("outputs/weighted_sft", only_last_run=False)
</cookbook/sft/token_level_weighted_sft.py>
<cookbook/sft/sampling_callback.py>
"""
Note v0.7: sampling callbacks are currently broken due to an issue with unsloth. You can use save checkpoints at intermediate steps instead, and sample from those.
"""
import json
import os
import time
import matplotlib.pyplot as plt
from openweights import OpenWeights
ow = OpenWeights()
def submit_job():
    training_file = ow.files.upload(path="data/train.jsonl", purpose="conversations")["id"]
    job = ow.fine_tuning.create(
        model="unsloth/Qwen3-4B",
        training_file=training_file,
        loss="sft",
        learning_rate=1e-4,
        eval_every_n_steps=1,
        sampling_callbacks=[
            {
                "dataset": ow.files.upload(path="data/prompts.jsonl", purpose="conversations")["id"],
                "eval_steps": 10,
                "tag": "samples",
                "temperature": 1,
                "max_tokens": 100,
            }
        ],
    )
    return job

def wait_for_completion(job):
    while job.status in ["pending", "in_progress"]:
        time.sleep(5)
        job = job.refresh()
    if job.status == "failed":
        logs = ow.files.content(job.runs[-1].log_file).decode("utf-8")
        print(logs)
        raise ValueError("Job failed")
    return job

def get_frac_responses_with_prefix(file_id, prefix="<response>"):
    content = ow.files.content(file_id).decode("utf-8")
    rows = [json.loads(line) for line in content.split("\n") if line]
    count = 0
    for row in rows:
        if row["completion"].startswith(prefix):
            count += 1
    return count / len(rows)

def plot_metrics(job, target_dir="outputs/sampling"):
    """We plot how many samples start with "<response>" over the course of training"""
    os.makedirs(target_dir, exist_ok=True)
    events = ow.events.list(run_id=job.runs[-1].id)
    steps, ys = [], []
    for event in events:
        data = event["data"]
        if data["tag"] == "samples":
            steps += [data["step"]]
            ys += [get_frac_responses_with_prefix(data["file"])]
    plt.plot(steps, ys)
    plt.xlabel("Training step")
    plt.title("Fraction of samples starting with '<response>'")
    plt.savefig(f"{target_dir}/sampling_eval.png")

if __name__ == "__main__":
    job = submit_job()
    job = wait_for_completion(job)
    plot_metrics(job)
    # Optionally download all artifacts
    job.download("outputs/sampling", only_last_run=False)
</cookbook/sft/sampling_callback.py>
<cookbook/preference_learning>
llama3_dpo.py
llama3_orpo.py
preferences.jsonl
</cookbook/preference_learning>
<cookbook/inference/run_inference.py>
import json
import time
from openweights import OpenWeights
ow = OpenWeights()
# Create an inference job
job = ow.inference.create(
    model="unsloth/Qwen3-4B",  # model can be one of: "hf-org/repo-with-model", "hf-org/repo-with-lora-adapter", "hf-org/repo/path/to/checkpoint.ckpt"
    input_file_id=ow.files.upload("prompts.jsonl", purpose="conversations")["id"],
    max_tokens=1000,
    temperature=0.8,
    max_model_len=2048,
)
print(job)

# Wait for completion
while job.refresh().status != "completed":
    time.sleep(5)

# Get output
outputs_str = ow.files.content(job.outputs["file"]).decode("utf-8")
outputs = [json.loads(line) for line in outputs_str.split("\n") if line]
print(outputs[0]["messages"][0]["content"])
print(outputs[0]["completion"])
</cookbook/inference/run_inference.py>
<cookbook/api-deployment>
api.md
context_manager_api.py
gradio_ui.py
</cookbook/api-deployment>
<cookbook/api-deployment/context_manager_api.py>
from openweights import OpenWeights
ow = OpenWeights()
model = "unsloth/Qwen3-4B"
# `async with ow.api.deploy(model)` also works.
# Entering the context manager is equivalent to `api = ow.api.deploy(model); api.up()`.
with ow.api.deploy(model):
    completion = ow.chat.completions.create(
        model=model, messages=[{"role": "user", "content": "is 9.11 > 9.9?"}]
    )
    print(completion.choices[0].message)
# When the context manager exits, it calls api.down().
</cookbook/api-deployment/context_manager_api.py>
<cookbook/api-deployment/gradio_ui.py>
"""Usage:
python gradio_ui.py unsloth/Qwen3-4B
"""
import gradio as gr # type: ignore
from openweights import OpenWeights # type: ignore
ow = OpenWeights()
def chat_with(model):
    # You can pass a list of models or LoRA adapters to ow.api.multi_deploy().
    # It deploys one API per base model; all LoRA adapters of the same base model share one API.
    api = ow.api.multi_deploy([model])[model]
    with api as client:
        gr.load_chat(api.base_url, model=model, token=api.api_key).launch()

if __name__ == "__main__":
    import fire  # type: ignore

    fire.Fire(chat_with)
</cookbook/api-deployment/gradio_ui.py>
<cookbook/custom_job>
README.md
client_side.py
worker_side.py
<cookbook/custom_job/README.md>
# Custom jobs
A custom job lets you run a script that you would normally run on a single GPU as a managed job.
Example:
```python
import json
from typing import Type

from pydantic import BaseModel

from openweights import OpenWeights, register, Jobs

ow = OpenWeights()

@register('my_custom_job')
class MyCustomJob(Jobs):
    mount = {
        'local/path/to/script.py': 'script.py',
        'local/path/to/dir/': 'dirname/'
    }
    params: Type[BaseModel] = MyParams  # Your Pydantic model for params
    requires_vram_gb: int = 24
    base_image: str = 'nielsrolf/ow-default'  # optional

    def get_entrypoint(self, validated_params: BaseModel) -> str:
        """Get the entrypoint command for the job."""
        return f'python script.py {json.dumps(validated_params.model_dump())}'
```
A custom job consists of:
- mounted source files - the code to run a job
- a pydantic model for parameter validation
- the default `requires_vram_gb` - this can be overridden by passing `ow.my_custom_job.create(requires_vram_gb=60)`
- the docker image to use for the worker - you can build your own images and use them, but the images need to start an openweights worker (see the Dockerfiles in the repo root as reference)
- an entrypoint
It's good to understand what code runs where:
- the initialization of the custom job runs on your local machine. It then uploads the mounted source files to openweights
- a worker then downloads the mounted source files into the cwd (a temporary dir) and runs the command returned by `get_entrypoint()`. That means the entrypoint is responsible for passing the parameters to the script.
You can see an example custom job implemented in [client_side.py](client_side.py) and [worker_side.py](worker_side.py).
## Logging
Jobs can log data via `ow.run.log({"foo": "bar"})`. Logs can be retrieved via `events = ow.events.list(run_id=job.runs[-1].id)`.
</cookbook/custom_job/README.md>
<cookbook/custom_job/client_side.py>
import json
import os
from pydantic import BaseModel, Field
from openweights import Jobs, OpenWeights, register
ow = OpenWeights()
class AdditionParams(BaseModel):
    """Parameters for our addition job"""

    a: float = Field(..., description="First number to add")
    b: float = Field(..., description="Second number to add")

@register("addition")  # After registering it, we can use it as ow.addition
class AdditionJob(Jobs):
    # Mount our addition script
    mount = {
        os.path.join(os.path.dirname(__file__), "worker_side.py"): "worker_side.py"
    }
    # Define parameter validation using our Pydantic model
    params = AdditionParams
    requires_vram_gb = 0

    def get_entrypoint(self, validated_params: AdditionParams) -> str:
        """Create the command to run our script with the validated parameters"""
        # Convert parameters to JSON string to pass to the script
        params_json = json.dumps(validated_params.model_dump())
        return f"python worker_side.py '{params_json}'"

def main():
    # Submit the job with some parameters
    job = ow.addition.create(a=5, b=9)
    print(f"Created job: {job.id}")

    # Optional: wait for job completion and print the logged events
    import time

    while True:
        job.refresh()
        if job.status in ["completed", "failed"]:
            break
        print("Waiting for job completion...")
        time.sleep(2)

    if job.status == "completed":
        print(f"Job completed successfully: {job.outputs}")
        # Get the logged events
        events = ow.events.list(job_id=job.id)
        for event in events:
            print(f"Event data: {event['data']}")
    else:
        print(f"Job failed: {job}")

if __name__ == "__main__":
    main()
</cookbook/custom_job/client_side.py>
<cookbook/custom_job/worker_side.py>
import json
import sys
from openweights import OpenWeights
# Get parameters from command line
params = json.loads(sys.argv[1])
a = params["a"]
b = params["b"]
# Calculate sum
result = a + b
# Log the result using the run API
ow = OpenWeights()
ow.run.log({"text": "we can log any dicts"})
ow.run.log({"text": "they can be fetched via ow.events(job_id=job.id)"})
ow.run.log(
{"text": "you can then access the individual logged items via event['data']"}
)
ow.run.log({"result": result})
print(f"{a} + {b} = {result}")
</cookbook/custom_job/worker_side.py>
</cookbook/custom_job>
</cookbook>
<cookbook/inference>
prompts.jsonl
run_inference.py
</cookbook/inference>