# warp_cache
> warp_cache is a thread-safe Python caching decorator backed by a Rust extension (PyO3). It uses SIEVE eviction (scan-resistant, near-optimal hit rates) and supports TTL expiry, async functions, and a cross-process shared-memory backend. It is a near drop-in replacement for `functools.lru_cache` (the size parameter is spelled `max_size` rather than `maxsize`) with added thread safety and features.
## Installation
```bash
pip install warp_cache
```
Prebuilt wheels are available for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64). Python 3.10+ required.
## Public API
All public names are importable from `warp_cache`:
```python
from warp_cache import cache, Backend, CacheInfo, SharedCacheInfo
```
### `cache()` decorator
The main decorator. Wraps a function with a Rust-backed cache using SIEVE eviction.
```python
from warp_cache import cache

@cache(
    max_size=128,         # Maximum number of cached entries
    ttl=None,             # Time-to-live in seconds (None = no expiry)
    backend="memory",     # "memory" (in-process) or "shared" (cross-process mmap)
    max_key_size=512,     # Max serialized key bytes (shared backend only)
    max_value_size=4096,  # Max serialized value bytes (shared backend only)
)
def my_function(x, y):
    return x + y
```
All arguments to the decorated function must be hashable.
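For intuition about what SIEVE eviction does, here is a minimal pure-Python sketch of the algorithm. It is an illustration only: the library's implementation lives in its Rust extension, and the names `SieveCache` and `_Node` are hypothetical, not part of the warp_cache API. The key idea is that a hit only flips a `visited` bit, and eviction sweeps a "hand" from the oldest entry toward the newest, clearing bits until it finds an unvisited entry.

```python
class _Node:
    __slots__ = ("key", "value", "visited", "prev", "next")

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.visited = False
        self.prev = None  # neighbour toward the head (newer entries)
        self.next = None  # neighbour toward the tail (older entries)


class SieveCache:
    """Illustrative SIEVE cache; not thread-safe and not the library's code."""

    def __init__(self, max_size):
        self.max_size = max_size  # assumed to be >= 1
        self.table = {}
        self.head = None  # newest entry
        self.tail = None  # oldest entry
        self.hand = None  # where the next eviction sweep resumes

    def get(self, key, default=None):
        node = self.table.get(key)
        if node is None:
            return default
        node.visited = True  # a hit only flips a bit; nothing is reordered
        return node.value

    def put(self, key, value):
        node = self.table.get(key)
        if node is not None:
            node.value = value
            node.visited = True
            return
        if len(self.table) >= self.max_size:
            self._evict()
        node = _Node(key, value)
        node.next = self.head  # new entries always enter at the head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node
        self.table[key] = node

    def _evict(self):
        node = self.hand or self.tail
        while node.visited:
            node.visited = False           # give the entry a second chance
            node = node.prev or self.tail  # wrap around after passing the head
        self.hand = node.prev              # resume here on the next eviction
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        del self.table[node.key]


demo = SieveCache(max_size=3)
for k in "abc":
    demo.put(k, k.upper())
demo.get("a")       # marks "a" as visited
demo.put("d", "D")  # sweep skips "a" and evicts the unvisited "b"
```

Because hits never move entries, a long scan of one-off keys cannot push recently used entries out the way it can under plain LRU ordering.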
### `Backend` enum
Selects where cached data is stored. `Backend` is an `IntEnum`. The decorator also accepts the strings `"memory"` and `"shared"`.
| Backend | Value | Storage | Use case |
|-------------------|-------|----------------------------|----------------------------------|
| `Backend.MEMORY` | `0` | In-process heap (default) | Single-process applications |
| `Backend.SHARED` | `1` | Memory-mapped file (mmap) | Cross-process sharing (Gunicorn, Celery) |
### `CacheInfo` (memory backend)
Returned by `decorated_fn.cache_info()`.
- `hits: int` — number of cache hits
- `misses: int` — number of cache misses
- `max_size: int` — maximum capacity
- `current_size: int` — current number of entries
### `SharedCacheInfo` (shared backend)
Returned by `decorated_fn.cache_info()` when using `backend="shared"`.
- `hits: int` — number of cache hits
- `misses: int` — number of cache misses
- `max_size: int` — maximum capacity
- `current_size: int` — current number of entries
- `oversize_skips: int` — calls where key or value exceeded size limits
### Methods on decorated functions
- `decorated_fn.cache_info()` — returns `CacheInfo` or `SharedCacheInfo`
- `decorated_fn.cache_clear()` — removes all entries and resets counters
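
Both info objects expose `hits` and `misses`, so a hit-rate helper can rely on those two fields alone. The sketch below uses `functools.lru_cache` as a stand-in so it runs without warp_cache installed; `hit_rate` is a hypothetical helper, not part of the library.

```python
from functools import lru_cache


def hit_rate(info):
    """Fraction of calls served from cache; works on any object with hits/misses."""
    total = info.hits + info.misses
    return info.hits / total if total else 0.0


@lru_cache(maxsize=32)  # stand-in; a warp_cache-decorated function is used the same way
def square(x):
    return x * x


for x in (1, 2, 1, 1):
    square(x)

rate = hit_rate(square.cache_info())  # 2 hits out of 4 calls
square.cache_clear()                  # empties the cache and resets the counters
```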
## Usage examples
### Basic caching
```python
from warp_cache import cache

@cache(max_size=256)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

fibonacci(80)  # computed and cached
fibonacci(80)  # instant cache hit
print(fibonacci.cache_info())
# CacheInfo(hits=79, misses=81, max_size=256, current_size=81)
```
### Migrating from functools.lru_cache
```python
# Before
from functools import lru_cache

@lru_cache(maxsize=128)
def compute(x, y):
    return x + y

# After
from warp_cache import cache

@cache(max_size=128)
def compute(x, y):
    return x + y
```
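During a gradual migration it can help to keep the code importable where warp_cache is not installed. One possible pattern, sketched here, is a fallback shim that maps `max_size` onto `lru_cache`'s `maxsize`. The shim is an assumption of this example, not something the library ships.

```python
try:
    from warp_cache import cache
except ImportError:
    from functools import lru_cache

    def cache(max_size=128, **_unsupported):
        """Fallback shim: translate max_size and ignore warp_cache-only options."""
        return lru_cache(maxsize=max_size)


@cache(max_size=128)
def compute(x, y):
    return x + y


compute(2, 3)  # miss
compute(2, 3)  # hit
```

The shim silently drops options such as `ttl` and `backend`, so it is only suitable where stale or per-process results are acceptable in the fallback case.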
### TTL (time-to-live)
```python
from warp_cache import cache

@cache(max_size=128, ttl=60.0)  # entries expire after 60 seconds
def get_config(name):
    # load_from_database is a placeholder for your own lookup function
    return load_from_database(name)
```
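To make the expiry behaviour concrete, here is a small pure-Python sketch of TTL semantics with an injectable clock, so expiry can be exercised without sleeping. It is not the library's implementation: `ttl_cache_sketch` and its `clock` parameter are hypothetical, the sketch is unbounded and not thread-safe, and the exact boundary behaviour of warp_cache at the moment of expiry is not specified here.

```python
import time
from functools import wraps


def ttl_cache_sketch(ttl, clock=time.monotonic):
    """Cache results per positional-args tuple and recompute once ttl seconds pass."""
    def decorator(fn):
        store = {}  # args -> (expires_at, value)

        @wraps(fn)
        def wrapper(*args):
            now = clock()
            entry = store.get(args)
            if entry is not None and now < entry[0]:
                return entry[1]              # still fresh: serve the cached value
            value = fn(*args)                # missing or expired: recompute
            store[args] = (now + ttl, value)
            return value

        return wrapper
    return decorator


fake_now = [0.0]  # mutable fake clock for the demo
calls = []


@ttl_cache_sketch(ttl=60.0, clock=lambda: fake_now[0])
def get_config(name):
    calls.append(name)
    return {"name": name}


get_config("db")    # miss: computed
get_config("db")    # hit: served from cache
fake_now[0] = 61.0  # advance past the 60-second TTL
get_config("db")    # expired: computed again
```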
### Async functions
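Async functions are detected automatically. On a cache hit the wrapped coroutine function is not run again: the call is still written with `await`, but it resolves immediately with the cached value.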
```python
import asyncio
from warp_cache import cache

@cache(max_size=256)
async def fetch_user(user_id: int) -> dict:
    await asyncio.sleep(0.1)  # simulate I/O
    return {"id": user_id}

async def main():
    user = await fetch_user(42)  # miss: runs the coroutine
    user = await fetch_user(42)  # hit: returns the cached result

asyncio.run(main())
```
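As a rough illustration of what "async-aware" can mean, the sketch below picks a wrapper based on `inspect.iscoroutinefunction` and caches the awaited result rather than the coroutine object, since a coroutine object can only be awaited once. This is a simplified stand-in with a hypothetical name (`async_aware_cache_sketch`), not the library's code, and it does not deduplicate concurrent misses for the same key.

```python
import asyncio
import inspect
from functools import wraps


def async_aware_cache_sketch(fn):
    store = {}

    if inspect.iscoroutinefunction(fn):
        @wraps(fn)
        async def async_wrapper(*args):
            if args in store:
                return store[args]     # hit: the original coroutine never runs
            value = await fn(*args)    # miss: await once, then remember the result
            store[args] = value
            return value

        return async_wrapper

    @wraps(fn)
    def sync_wrapper(*args):
        if args not in store:
            store[args] = fn(*args)
        return store[args]

    return sync_wrapper


calls = []


@async_aware_cache_sketch
async def fetch_user(user_id):
    calls.append(user_id)
    await asyncio.sleep(0)  # simulate I/O
    return {"id": user_id}


async def main():
    first = await fetch_user(42)   # miss
    second = await fetch_user(42)  # hit
    return first, second


first, second = asyncio.run(main())
```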
### Thread safety
The cache is safe to use from multiple threads with no additional locking:
```python
from concurrent.futures import ThreadPoolExecutor
from warp_cache import cache

@cache(max_size=256)
def work(x):
    return x * x

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(work, range(100)))
```
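One way to smoke-test a cached function under concurrency is to compare its threaded results against an uncached reference. The helper below is a hypothetical sketch (`agrees_under_threads` is not part of warp_cache) and uses `functools.lru_cache` as a stand-in so it runs anywhere; pass a warp_cache-decorated function as `cached_fn` to check your own code. A passing run is evidence, not proof, of thread safety.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


def agrees_under_threads(cached_fn, plain_fn, inputs, workers=8):
    """Return True if cached_fn, called from many threads, matches plain_fn."""
    inputs = list(inputs)
    expected = [plain_fn(x) for x in inputs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        actual = list(pool.map(cached_fn, inputs))  # map preserves input order
    return actual == expected


def square(x):
    return x * x


cached_square = lru_cache(maxsize=256)(square)  # stand-in for a warp_cache function
inputs = [i % 10 for i in range(1000)]          # repeated keys force contention
ok = agrees_under_threads(cached_square, square, inputs)
```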
### Shared memory backend (cross-process)
Cached data is shared across processes via mmap. Useful for Gunicorn workers, Celery tasks, or multiprocessing pools.
```python
from warp_cache import cache

@cache(max_size=1024, backend="shared")
def get_embedding(text: str) -> list[float]:
    # computed once, shared across all worker processes
    # (model is a placeholder for your own encoder object)
    return model.encode(text)
```
Shared backend details:
- Keys and values are serialized with pickle (fast-path for primitives)
- File location: `/dev/shm/` on Linux, `$TMPDIR/warp_cache/` on macOS
- Not available on Windows (`backend="memory"` works everywhere)
- Monitor oversize skips: `fn.cache_info().oversize_skips`
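
Since oversize entries are skipped rather than cached, it can be useful to estimate serialized sizes before choosing `max_key_size` and `max_value_size`. The helper below is a rough sketch using plain `pickle`; `might_be_skipped` is a hypothetical name, and because the library has a fast path for primitives, its actual byte counts may differ from this estimate.

```python
import pickle


def might_be_skipped(key, value, max_key_size=512, max_value_size=4096):
    """Estimate whether an entry would exceed the shared backend's size limits."""
    key_bytes = len(pickle.dumps(key, protocol=pickle.HIGHEST_PROTOCOL))
    value_bytes = len(pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL))
    return key_bytes > max_key_size or value_bytes > max_value_size


small = might_be_skipped(("hello",), [0.1, 0.2, 0.3])
# A 1024-dimensional embedding pickles to well over 4096 bytes by this estimate
large = might_be_skipped(("hello",), [float(i) for i in range(1024)])
large_ok = might_be_skipped(
    ("hello",), [float(i) for i in range(1024)], max_value_size=16384
)
```

By this estimate, a function returning large vectors may need `max_value_size` raised above the default, otherwise every call could register as an oversize skip.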
## Platform support
| Platform | `backend="memory"` | `backend="shared"` |
|----------------------------|---------------------|---------------------|
| Linux (x86_64, aarch64) | Yes | Yes |
| macOS (x86_64, arm64) | Yes | Yes |
| Windows (x86_64) | Yes | No |
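
For code that must run on every platform in the table, one option is to choose the backend string at import time. The sketch below is an assumption-laden example: `pick_backend` is a hypothetical helper, and it treats only Linux and macOS as supporting the shared backend, following the table above.

```python
import sys


def pick_backend(want_shared=True, platform=sys.platform):
    """Return "shared" only on platforms the table lists as supporting it."""
    shared_ok = platform.startswith("linux") or platform == "darwin"
    return "shared" if want_shared and shared_ok else "memory"


backend = pick_backend()
# Intended use (requires warp_cache):
#   @cache(max_size=1024, backend=pick_backend())
#   def get_embedding(text): ...
```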