Commit 4f11d26
fix: bypass raw preprocessing in API for pre-engineered data
The /predict endpoint was passing already feature-engineered data through
the raw-data preprocessing pipeline, which caused: (1) drop_duplicates
removing valid rows, (2) TRAIN_FEATURE_COLUMNS always being None due to
import-before-download ordering, and (3) lat/lng missing from schema
alignment. Now the endpoint loads the model once at startup, derives
expected feature names from the booster directly, and uses reindex to
align without any preprocessing.
Separately, regenerated feature_engineered_holdout.csv to keep lat/lng
(previously dropped) and to use the city_encoded column name the trained
model expects, and uploaded it to S3.
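The holdout regeneration step might look like the sketch below. The source filename and the pre-rename column name (`city_te`) are hypothetical; the commit only states that lat/lng were restored and the encoded column was renamed to match training.

```python
import pandas as pd

# Hypothetical holdout rows standing in for the regenerated data;
# "city_te" is an assumed pre-rename column name.
df = pd.DataFrame({
    "lat": [40.7, 34.1],
    "lng": [-74.0, -118.2],
    "city_te": [0.31, 0.27],
})

# Match the trained model's naming and keep lat/lng in the output.
df = df.rename(columns={"city_te": "city_encoded"})
df.to_csv("feature_engineered_holdout.csv", index=False)

# The upload would then follow, e.g. with boto3:
#   boto3.client("s3").upload_file(
#       "feature_engineered_holdout.csv", bucket, key)
```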
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

1 parent: 3ced33d
1 file changed: 13 additions & 19 deletions