2 changes: 2 additions & 0 deletions .gitignore
@@ -5,3 +5,5 @@ requirements.txt
/rust-tfrecord
/.venv
/target_maturin
__pycache__/
.DS_Store
19 changes: 9 additions & 10 deletions README.md
@@ -10,7 +10,9 @@ Examples are loaded into native PyTorch `Tensor`s.

The wheel can be installed on any Linux system with Python 3.8 or higher:

pip3 install rustfrecord
```bash
pip3 install rustfrecord
````

## Getting Started

@@ -43,14 +45,11 @@ Repo: https://github.com/gavrie/rustfrecord

To develop this package (not just use it), you need to install the Rust compiler and the Python development headers.

pip install uv
uv venv
source .venv/bin/activate

uv pip compile pyproject.toml -o requirements.txt
uv pip install -r requirements.txt
```python
pip install uv # if needed

export LIBTORCH_USE_PYTORCH=1
CARGO_TARGET_DIR=target_maturin maturin develop
export LIBTORCH_USE_PYTORCH=1
CARGO_TARGET_DIR=target_maturin maturin develop

python main.py
uv run pytest -sv test_rustfrecord.py
```
Binary file added images/ferris.png
Binary file added images/iron-oxide.jpg
Binary file added images/python-logo.png
5 changes: 5 additions & 0 deletions pyproject.toml
@@ -24,3 +24,8 @@ build-backend = "maturin"

[tool.maturin]
features = ["pyo3/extension-module"]

[dependency-groups]
dev = [
"pytest>=8.3.5",
]
122 changes: 0 additions & 122 deletions requirements-linux.txt

This file was deleted.

26 changes: 0 additions & 26 deletions requirements-macos.txt

This file was deleted.

115 changes: 115 additions & 0 deletions talk.html

123 changes: 123 additions & 0 deletions talk.md
@@ -0,0 +1,123 @@
---
marp: true
---

# From TensorFlow to PyTorch
## With some help from Rust

<br>

Gavrie Philipson
Rusty Bits Software Ltd.
June 2025

---

# About Me

## Gavrie Philipson

- Rust, Python, Cloud, Backend, DevOps, and more.
- Bootstrapping software development teams: Training, mentoring, and hiring
- Consulting to startup companies on software development and architecture

<br>

Rusty Bits Software Ltd.
https://rustybits.io
gavrie@rustybits.io

---

# About You

---

# Using Rust to improve Python

<br>
<br>
<br>
<br>
<br>

![bg 40%](images/python-logo.png)
![bg 60%](images/ferris.png)
![bg 50%](images/iron-oxide.jpg)

<br>
<br>

[Astral](https://astral.sh)
[PyO<sub>3</sub>](https://github.com/PyO3)

---

# The Mission

- Port ML model from TensorFlow to PyTorch
- Lots of training data in `TFRecord` format

---

# The `TFRecord` format

- A sequence of `HashMap<String, Vec<T>>`
- `where T: u8 | f32 | i64`
- Serialized with `protobuf`
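The slide folds the on-disk layout into one line. In TensorFlow's record format, each record is framed as a little-endian `uint64` length, a masked CRC-32C of those 8 length bytes, the serialized payload (a `tf.train.Example`), and a masked CRC-32C of the payload. The pure-Python sketch below writes and reads that framing for an uncompressed file; the helper names (`crc32c`, `masked_crc`, `frame_record`, `read_records`) are hypothetical and not part of `rustfrecord`, and the code is meant to illustrate the format rather than stand in for the Rust reader.

```python
import struct
from typing import Iterator

# CRC-32C (Castagnoli), reflected polynomial 0x82F63B78, table-driven.
_TABLE = []
for _n in range(256):
    _c = _n
    for _ in range(8):
        _c = (_c >> 1) ^ 0x82F63B78 if _c & 1 else _c >> 1
    _TABLE.append(_c)


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    # TFRecord stores CRCs "masked": rotate right by 15 bits, then add a constant.
    crc = crc32c(data)
    return ((((crc >> 15) | (crc << 17)) & 0xFFFFFFFF) + 0xA282EAD8) & 0xFFFFFFFF


def frame_record(payload: bytes) -> bytes:
    # u64 length | u32 masked CRC(length) | payload | u32 masked CRC(payload)
    header = struct.pack("<Q", len(payload))
    return (header + struct.pack("<I", masked_crc(header))
            + payload + struct.pack("<I", masked_crc(payload)))


def read_records(buf: bytes) -> Iterator[bytes]:
    # Walk an uncompressed TFRecord byte stream, verifying both CRCs per record.
    pos = 0
    while pos < len(buf):
        header = buf[pos:pos + 8]
        (length,) = struct.unpack("<Q", header)
        (length_crc,) = struct.unpack("<I", buf[pos + 8:pos + 12])
        if length_crc != masked_crc(header):
            raise ValueError(f"corrupt length at offset {pos}")
        start = pos + 12
        payload = buf[start:start + length]
        (payload_crc,) = struct.unpack("<I", buf[start + length:start + length + 4])
        if payload_crc != masked_crc(payload):
            raise ValueError(f"corrupt payload at offset {pos}")
        yield payload
        pos = start + length + 4
```

Each record costs 16 bytes of framing, so `b"".join(frame_record(p) for p in payloads)` produces the bytes of an uncompressed `.tfrecord` file under these assumptions.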

---

# `TFRecord` Example

```python
[
    {
        "label": "cat",
        "image/shape": [320, 200, 3],
        "image/encoded": [0x12, 0x34, 0x56, ...],
    },
    {
        "label": "dog",
        "image/shape": [320, 200, 3],
        "image/encoded": [0x78, 0x9a, 0xbc, ...],
    },
]
```

---

# The Constraints

- Dependencies (look at venv size)
- Performance: Keep GPUs busy
- Ease of use for Python devs
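One common way to serve "keep GPUs busy" is to decode the next examples while the current batch is being consumed. The generator below is a generic, hypothetical sketch of that idea using a background thread and a bounded queue; it is not how `rustfrecord` works internally. A thread only helps when the producer releases the GIL (for example during file I/O or inside native code), and in PyTorch the usual tool is `DataLoader` with `num_workers > 0`.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def prefetch(iterable: Iterable[T], depth: int = 4) -> Iterator[T]:
    """Produce up to `depth` items ahead of the consumer on a background thread."""
    q: "queue.Queue" = queue.Queue(maxsize=depth)  # bounded: caps memory use
    done = object()  # sentinel marking the end of the stream

    def producer() -> None:
        try:
            for item in iterable:
                q.put((None, item))
        except Exception as exc:  # hand the error over to the consumer
            q.put((exc, None))
        q.put((None, done))

    # Daemon thread: if the consumer stops early, the blocked producer won't keep the process alive.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        error, item = q.get()
        if error is not None:
            raise error
        if item is done:
            return
        yield item
```

The bounded queue is the design choice that matters: it lets decoding run ahead by at most `depth` items, so a fast reader cannot exhaust memory while the GPU is busy.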

---

# Challenge: Getting Test Data

- No access to the original data
- Vibe code [some Python](tf_example/main.py) to generate test data!
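The generator script `tf_example/main.py` is not part of this diff. As a hypothetical stand-in, this sketch builds synthetic examples with the same keys as the example slide (`label`, `image/shape`, `image/encoded`), using a seeded RNG so the test data is reproducible. The function name and the choice of random pixel bytes are assumptions.

```python
import random
from typing import Optional


def make_sample(label: str, height: int, width: int, channels: int = 3,
                seed: Optional[int] = None) -> dict:
    """Build one synthetic example shaped like the `TFRecord` example slide."""
    rng = random.Random(seed)  # seeded, so repeated runs produce identical data
    n = height * width * channels  # the flat buffer must match the declared shape
    return {
        "label": label.encode("utf-8"),
        "image/shape": [height, width, channels],
        "image/encoded": bytes(rng.randrange(256) for _ in range(n)),
    }
```

For example, `[make_sample(lbl, 320, 200, seed=i) for i, lbl in enumerate(["cat", "dog"])]` gives two examples whose pixel buffers each hold 320 × 200 × 3 = 192000 bytes.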

---

# Playing on Rust's strengths

- Designing with types
- Dive into the [Rust implementation](tfrecord_reader/src/lib.rs) and [`tests.rs`](tfrecord_reader/src/tests.rs)
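The Rust sources linked above are not shown in this diff. As a hypothetical Python analogue of "designing with types", the three `TFRecord` value kinds can be modelled as a closed union, with a single accessor that fails loudly when a feature has the wrong kind instead of silently handing back the wrong data. The class names follow `tf.train` naming; `expect` is an invented helper.

```python
from dataclasses import dataclass
from typing import Dict, List, Type, TypeVar, Union


@dataclass
class BytesList:
    value: List[bytes]


@dataclass
class FloatList:
    value: List[float]


@dataclass
class Int64List:
    value: List[int]


# A feature is exactly one of the three list kinds (a closed sum type).
Feature = Union[BytesList, FloatList, Int64List]
Features = Dict[str, Feature]

K = TypeVar("K", BytesList, FloatList, Int64List)


def expect(features: Features, key: str, kind: Type[K]) -> K:
    """Return features[key] if it has the expected kind, else raise."""
    if key not in features:
        raise KeyError(key)
    feature = features[key]
    if not isinstance(feature, kind):
        raise TypeError(f"{key!r}: expected {kind.__name__}, got {type(feature).__name__}")
    return feature
```

A static checker such as mypy can then infer that `expect(f, "image/shape", Int64List).value` is a `List[int]`, which roughly parallels what an enum with one variant per kind gives you in Rust.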

---

# The End Result

- `pip install rustfrecord`
- [`test_rustfrecord.py`](test_rustfrecord.py)
- [`src/lib.rs`](src/lib.rs)

---

# Getting the Code

https://pypi.org/project/rustfrecord/
https://github.com/gavrie/rustfrecord
Binary file added talk.pdf
Binary file not shown.
18 changes: 7 additions & 11 deletions test_rustfrecord.py
@@ -5,6 +5,8 @@
# torch.set_num_threads(1)
# torch.set_num_interop_threads(1)

filename = "tf_example/sample_images.tfrecord"


class TFRecordDataset(torch.utils.data.IterableDataset):
def __init__(self, filename: str, compressed: bool = True, features: list = None):
@@ -23,7 +25,6 @@ def __iter__(self):


def test_loader():
filename = "data/002scattered.training_examples.tfrecord"
ds = TFRecordDataset(
filename,
compressed=filename.endswith(".gz"),
@@ -52,9 +53,7 @@ def test_loader():


def test_dataset():
filename = "data/002scattered.training_examples.tfrecord"

for _ in range(10):
for _ in range(1):
ds = TFRecordDataset(
filename,
compressed=filename.endswith(".gz"),
@@ -68,17 +67,14 @@ def test_dataset():
print()

for i, features in enumerate(ds):
label: Tensor = features["label"]
label: Tensor = features["label"].tobytes().decode("utf-8")
shape = torch.Size(tuple(features["image/shape"]))
image: Tensor = features["image/encoded"][0].reshape(shape)
image: Tensor = features["image/encoded"].reshape(shape)

if i % 1000 == 0:
print(i, label, image.shape)
print(i, label, image.shape)


def test_reader():
filename = "data/002scattered.training_examples.tfrecord"

r = Reader(
filename,
compressed=filename.endswith(".gz"),
@@ -94,7 +90,7 @@ def test_reader():
for i, features in enumerate(r):
label: Tensor = features["label"]
shape = torch.Size(tuple(features["image/shape"]))
image: Tensor = features["image/encoded"][0].reshape(shape)
image: Tensor = features["image/encoded"].reshape(shape)

if i % 1000 == 0:
print(i, label, image.shape)
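In the updated tests, `image/encoded` is reshaped directly with `image/shape` (the `[0]` index is gone), and in `test_dataset` the label is decoded from UTF-8 bytes. As a torch-free sketch of what that reshape means, assuming a flat, row-major buffer whose length equals the product of the shape (for example H × W × C for `[320, 200, 3]`), the hypothetical helpers below nest the flat data and decode the label.

```python
from math import prod
from typing import List, Sequence


def reshape(flat: Sequence, shape: Sequence[int]) -> List:
    """Nest a flat, row-major sequence into lists following `shape`."""
    if prod(shape) != len(flat):
        # Tensor.reshape likewise rejects an element-count mismatch.
        raise ValueError(f"cannot reshape {len(flat)} elements into {tuple(shape)}")
    if len(shape) <= 1:
        return list(flat)
    step = prod(shape[1:])  # number of elements per slice along the first axis
    return [reshape(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


def decode_label(raw: bytes) -> str:
    # Turn the label's raw bytes into a Python string.
    return raw.decode("utf-8")
```

The element-count check is the useful part here: if a reader ever returned a nested or padded buffer again, the reshape would fail immediately instead of producing a silently wrong image.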
2 changes: 2 additions & 0 deletions tf_example/.gitignore
@@ -0,0 +1,2 @@
sample_images.tfrecord
generated_images/
Empty file added tf_example/README.md