Add streaming bulk soil analysis uploads with batching #459
Conversation
Walkthrough

Adds a streaming NDJSON bulk soil-analysis upload flow with batching (10 files / 20 MB), RD New → WGS84 coordinate transformation, per-file progress and error streaming, expanded analysis fields (NH4/NO3, depth, matchReason), interactive date editing in the review UI, and safer Dropzone DataTransfer handling.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client (Browser)
    participant BulkForm as Bulk Upload Form
    participant API as /api/soil-analysis/extract
    participant NMI as NMI Integration
    participant Remote as NMI API
    Client->>BulkForm: Select files & start upload
    BulkForm->>API: POST FormData (files)
    API->>API: Authenticate session
    API->>API: Extract files from FormData
    API-->>BulkForm: Begin NDJSON stream (response)
    par Concurrent file processing (limit: 10)
        loop For each uploaded file
            API->>NMI: extractBulkSoilAnalyses(file)
            NMI->>NMI: Split into batches (<=10 files, <=20MB)
            par Per-batch requests
                NMI->>Remote: POST batch (auth)
                Remote-->>NMI: Batch results
            end
            NMI->>NMI: Validate, transform coords (EPSG:28992→4326), map depths/nutrients
            NMI-->>API: Aggregated analyses for file
            API-->>BulkForm: NDJSON line { success, filename, analyses } or error line
            BulkForm->>BulkForm: Parse NDJSON line, update per-file progress/UI
        end
    end
    alt All files processed
        BulkForm->>BulkForm: Invoke onSuccess with accumulated analyses
    else Any errors
        BulkForm->>Client: Show toast/error per-file
    end
```
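The batching step in the diagram (at most 10 files and 20 MB per batch) can be sketched in isolation. This is an illustrative helper under assumed limits, not the actual `nmi.ts` implementation; a single oversized file still gets its own batch rather than being dropped.

```typescript
// Sketch: split files into batches of at most maxCount items and maxBytes total.
// The limits mirror the PR description (10 files / 20 MB) but are illustrative.
interface SizedFile {
    name: string
    size: number // bytes
}

function splitIntoBatches(
    files: SizedFile[],
    maxCount = 10,
    maxBytes = 20 * 1024 * 1024,
): SizedFile[][] {
    const batches: SizedFile[][] = []
    let current: SizedFile[] = []
    let currentBytes = 0
    for (const file of files) {
        const wouldOverflow =
            current.length >= maxCount || currentBytes + file.size > maxBytes
        // Only start a new batch if the current one is non-empty, so a single
        // file larger than maxBytes still ends up in a batch of its own.
        if (current.length > 0 && wouldOverflow) {
            batches.push(current)
            current = []
            currentBytes = 0
        }
        current.push(file)
        currentBytes += file.size
    }
    if (current.length > 0) batches.push(current)
    return batches
}
```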
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~55 minutes
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@ Coverage Diff @@
## development #459 +/- ##
============================================
  Coverage    88.04%   88.04%
============================================
  Files           91       91
  Lines         4625     4625
  Branches      1494     1494
============================================
  Hits          4072     4072
  Misses         553      553
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Actionable comments posted: 9
🤖 Fix all issues with AI agents
In `@fdm-app/app/components/blocks/soil/bulk-upload-form.tsx`:
- Around line 67-96: The loop that reads NDJSON from reader may leave a
partial/complete last line in buffer when done; after the while loop ends, check
if buffer.trim() is non-empty and run the same JSON.parse and result handling
(increment completedFiles, push analyses into allResults, show toast.error for
result.error, call setCurrentFile and update setUploadProgress) with try/catch
as done inside the for-loop; use the same variables/logic (buffer, decoder,
reader handling, allResults, completedFiles, totalFiles, setCurrentFile,
setUploadProgress) so leftover NDJSON is processed robustly.
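The leftover-buffer issue described in this fix can be sketched in isolation. This is a minimal NDJSON reader, not the actual `bulk-upload-form.tsx` code; `handleLine` stands in for the real per-file progress and toast handling:

```typescript
// Sketch: read an NDJSON stream, keeping the partial last line in a buffer
// across chunks, and flush any non-empty remainder once the stream ends.
async function readNdjson(
    stream: ReadableStream<Uint8Array>,
    handleLine: (obj: unknown) => void,
): Promise<void> {
    const reader = stream.getReader()
    const decoder = new TextDecoder()
    let buffer = ""
    while (true) {
        const { done, value } = await reader.read()
        if (done) break
        buffer += decoder.decode(value, { stream: true })
        const lines = buffer.split("\n")
        buffer = lines.pop() ?? "" // keep the partial last line for later
        for (const line of lines) {
            if (line.trim()) handleLine(JSON.parse(line))
        }
    }
    // Flush: the final NDJSON line may not be newline-terminated.
    if (buffer.trim()) handleLine(JSON.parse(buffer))
}
```

Without the final flush, a server that does not terminate its last line with `\n` would silently lose the last file's result.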
- Around line 97-110: The catch/finally flow currently can emit two error
toasts: one in the catch and another in finally when allResults is empty and
totalFiles > 0; add a boolean flag (e.g., errorOccurred or hadError) scoped to
the bulkUpload handler, set it to true inside the catch block when a toast is
shown, and then in the finally block only show the "Geen analyses kunnen
verwerken" toast if errorOccurred is false and allResults.length === 0 &&
totalFiles > 0; keep the existing calls to setIsUploading(false),
setCurrentFile(null) and onSuccess(allResults) unchanged (call onSuccess only
when allResults.length > 0).
In `@fdm-app/app/components/blocks/soil/bulk-upload-review.tsx`:
- Around line 205-286: The cell callback in your column definition illegally
calls hooks (useState) — extract the entire inline cell renderer into a new
top-level React component (e.g., DateCell) and move the local state and handlers
(the useState calls, onDateSelect, onInputBlur, inputValue state, setOpen,
setInputValue) there; have the column's cell return <DateCell ...props/> passing
row.original.id, dates[row.original.id], handleDateChange, endMonth, nl,
parseDateText, format, isValid and any UI components (Input, Popover, Calendar)
as needed; ensure DateCell is a function component (not nested) so hooks are
called at the top level.
- Around line 123-132: The state initializer for dates uses format(new
Date(a.b_sampling_date), "yyyy-MM-dd") which will throw for unparseable dates;
update the initializer in the bulk-upload-review component to validate each date
before formatting (e.g., use date-fns isValid and parseISO or check new
Date(...) with isValid) and fall back to an empty string for invalid or missing
b_sampling_date values so the useState<Record<string,string>> setup (and related
dates / setDates handling) never throws; ensure you import and use isValid (or
equivalent) and only call format(...) when the date is valid.
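The validate-before-format idea from this fix can be sketched without date-fns. A hedged sketch: the real component would use date-fns `isValid` and `format`, while this version uses only plain `Date`:

```typescript
// Sketch: format a possibly-invalid date string as "yyyy-MM-dd",
// falling back to "" instead of throwing on unparseable input.
function safeSamplingDate(raw: string | null | undefined): string {
    if (!raw) return ""
    const d = new Date(raw)
    if (Number.isNaN(d.getTime())) return "" // invalid date → safe fallback
    const pad = (n: number) => String(n).padStart(2, "0")
    return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`
}
```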
In `@fdm-app/app/integrations/nmi.ts`:
- Around line 401-409: The b_date parsing silently creates Invalid Date when
parts are non-numeric; update the parsing block that reads field.b_date (the
code that splits into dateParts and uses Number.parseInt to build
soilAnalysis.b_sampling_date) to validate the parsed values before constructing
the Date: ensure each parsed value is a finite integer (e.g.,
Number.isFinite/Number.isInteger checks on the results of parseInt), that
month-1 is in 0–11 and day is in a valid 1–31 range (and year is a sensible
integer), and only assign soilAnalysis.b_sampling_date = new Date(year, month,
day) when validation passes; if validation fails, do not assign the invalid Date
(set null/undefined or log/warn). Apply the same validation/failure-handling
change to the extractSoilAnalysis date parsing block referenced in the comment
(lines ~233-242) so both places reject non-numeric date parts rather than
producing Invalid Date.
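The part-by-part validation this fix asks for can be sketched as follows. The `"dd-mm-yyyy"` input format is an assumption here (the actual NMI field layout is not shown in this excerpt):

```typescript
// Sketch: parse a "dd-mm-yyyy" string (assumed format) and return null for
// non-numeric or out-of-range parts instead of constructing an Invalid Date.
function parseSamplingDate(raw: string): Date | null {
    const parts = raw.split("-")
    if (parts.length !== 3) return null
    const [day, month, year] = parts.map((p) => Number.parseInt(p, 10))
    if (![day, month, year].every(Number.isInteger)) return null // rejects NaN
    if (month < 1 || month > 12) return null
    if (day < 1 || day > 31) return null
    if (year < 1900 || year > 2100) return null // sanity range
    return new Date(year, month - 1, day)
}
```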
- Around line 261-284: The code assigns a GeoJSON Point to soilAnalysis.location
but soilAnalysis is typed as { [key: string]: string | number | Date }, causing
a type mismatch; update the type for soilAnalysis to allow GeoJSON (e.g.,
include a union that permits { type: "Point"; coordinates: [number, number] } or
a GeoJSON type) or create a separate strongly-typed variable for the location
(e.g., locationPoint) and assign the GeoJSON there before storing any
serialized/value-compatible representation on soilAnalysis; adjust references to
soilAnalysis.location accordingly or serialize location to a string if you must
keep the original narrow index signature.
- Around line 457-473: The falsy check `if (lat && lon)` in the fallback
coordinate parsing will incorrectly drop valid coordinates equal to 0; update
the guard to explicitly check for null/undefined (e.g. `lat != null && lon !=
null`) before converting to numbers, then keep the existing numeric conversion
and `Number.isNaN` checks using `numericLat` and `numericLon` and assign
`soilAnalysis.location` as before; look for the block that references
`soilAnalysis.location`, `lat`, `lon`, `numericLat`, and `numericLon` to apply
this change.
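The falsy-zero pitfall behind this fix is easy to demonstrate. `parseLocation` below is a hypothetical helper, not the `nmi.ts` code; it only illustrates why `lat != null` must replace `if (lat && lon)`:

```typescript
// Sketch: `if (lat && lon)` would drop a valid 0 coordinate; an explicit
// null/undefined check does not. Empty strings are rejected separately
// because Number("") coerces to 0.
function parseLocation(
    lat: string | number | null | undefined,
    lon: string | number | null | undefined,
): { type: "Point"; coordinates: [number, number] } | null {
    if (lat == null || lon == null) return null // rejects only null/undefined
    if (lat === "" || lon === "") return null
    const numericLat = Number(lat)
    const numericLon = Number(lon)
    if (Number.isNaN(numericLat) || Number.isNaN(numericLon)) return null
    // GeoJSON order: [longitude, latitude]
    return { type: "Point", coordinates: [numericLon, numericLat] }
}
```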
In `@fdm-app/app/routes/farm.create.$b_id_farm.$calendar.soil-analysis.bulk.tsx`:
- Around line 79-80: The isSaving computation uses a case-sensitive check of
navigation.formMethod === "POST", which can fail if React Router normalizes to
lowercase; update the check in the isSaving definition to compare formMethod
case-insensitively (e.g., navigation.formMethod?.toLowerCase() === "post") while
still guarding navigation.state !== "idle", so replace the current comparison in
the isSaving const with a lowercase-safe comparison.
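The lowercase-safe comparison this fix describes can be sketched with a simplified navigation shape (the real React Router `Navigation` type has more fields):

```typescript
// Sketch: case-insensitive formMethod check, guarded by navigation.state.
type Navigation = {
    state: "idle" | "loading" | "submitting"
    formMethod?: string
}

function isSaving(navigation: Navigation): boolean {
    return (
        navigation.state !== "idle" &&
        navigation.formMethod?.toLowerCase() === "post"
    )
}
```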
- Around line 217-220: Remove the dead server-side form handler that checks
formData.has("soilAnalysisFile") in the route file handling bulk soil analysis
uploads; specifically delete the conditional block that calls
extractBulkSoilAnalyses(...) and returns data({ analyses }) since
BulkSoilAnalysisUploadForm uses the streaming API (/api/soil-analysis/extract)
via fetch and this branch is unreachable. Ensure you only remove the unreachable
handler block (the formData.has("soilAnalysisFile") conditional and its call to
extractBulkSoilAnalyses) and do not change the streaming API endpoint or the
BulkSoilAnalysisUploadForm component.
🧹 Nitpick comments (7)
fdm-app/app/components/custom/dropzone.tsx (2)
180-191: Inconsistent defensive handling between `handleDrop` and `handleFileChange`.

The same `DataTransfer` sync pattern in `handleFileChange` (lines 159–165) lacks the `try/catch` guard and `instanceof File` filter you've added here. If these defenses are needed for the drop path, they're equally needed for the change-event path; both write to the same `inputRef.current.files`.

♻️ Suggested: extract a shared helper and use it in both paths
```diff
+ const syncFilesToInput = (filesToSync: File[]) => {
+   if (!inputRef.current) return
+   try {
+     const container = new DataTransfer()
+     for (const f of filesToSync) {
+       if (f instanceof File) {
+         container.items.add(f)
+       }
+     }
+     inputRef.current.files = container.files
+   } catch (err) {
+     console.warn("Could not sync files to hidden input:", err)
+   }
+ }
```

Then in `handleFileChange` (around line 159):

```diff
- if (inputRef.current?.files) {
-   const container = new DataTransfer()
-   inputFiles.forEach((f) => {
-     container.items.add(f)
-   })
-   inputRef.current.files = container.files
- }
+ syncFilesToInput(inputFiles)
```

And in `handleDrop` (around line 179):

```diff
- if (inputRef.current) {
-   try {
-     const container = new DataTransfer()
-     for (const f of finalFiles) {
-       if (f instanceof File) {
-         container.items.add(f)
-       }
-     }
-     inputRef.current.files = container.files
-   } catch (err) {
-     // Fallback or silent ignore if DataTransfer is restricted
-     console.warn("Could not sync files to hidden input:", err)
-   }
- }
+ syncFilesToInput(finalFiles)
```
193-197: Silent empty catch; consider adding a `console.warn` for consistency.

The first `catch` block (line 190) logs a warning, but this one silently swallows the error. A brief `console.warn` would keep the two guards consistent and aid debugging.

Proposed fix:

```diff
  try {
    e.dataTransfer.clearData()
  } catch (err) {
-   // Ignore clearData errors
+   // clearData may throw in some browsers after drop
+   console.warn("Could not clear dataTransfer:", err)
  }
```

fdm-app/app/routes/api.soil-analysis.extract.ts (2)
43-93: No timeout on NMI API calls; a hung upstream stalls the stream indefinitely.

If the NMI API becomes unresponsive, a worker's `await extractBulkSoilAnalyses(...)` never resolves. The `ReadableStream` stays open, and the client's progress bar freezes without feedback. Consider wrapping each call with an `AbortSignal.timeout()` or a manual timeout wrapper to fail gracefully.
54-62: Consider using `extractSoilAnalysis` for single-file processing to avoid unnecessary batching overhead.

`extractBulkSoilAnalyses` runs batching logic (lines 307–331) even when processing a single file. For single-file cases, `extractSoilAnalysis` skips this overhead and is purpose-built for one file.

⚠️ Compatibility caveat: `extractSoilAnalysis` returns a single object without the `id` and `filename` fields that `extractBulkSoilAnalyses` generates during mapping. If these fields are required downstream, you'd need to add field generation to `extractSoilAnalysis` or wrap the result here. If the effort outweighs the gains, the current approach is acceptable.

fdm-app/app/integrations/nmi.ts (1)
333-380: All batches are fired concurrently via `Promise.all`, with no rate limiting.

If there are many batches (e.g. 50+ files at ~5 MB each → 13+ batches), all hit the NMI API simultaneously. Consider sequential processing or a concurrency limiter to avoid overwhelming the upstream API. The API route already handles concurrency for its per-file approach, but this code path is still reachable from the route actions (line 218 in both route files).
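A concurrency limiter of the kind suggested above can be small. This is a sketch (a tiny stand-in for a library like p-limit), not the route's existing per-file concurrency code:

```typescript
// Sketch: run async tasks with at most `limit` in flight, preserving the
// order of results. Workers pull the next index until the list is drained.
async function mapWithConcurrency<T, R>(
    items: T[],
    limit: number,
    fn: (item: T) => Promise<R>,
): Promise<R[]> {
    const results: R[] = new Array(items.length)
    let next = 0
    async function worker(): Promise<void> {
        while (next < items.length) {
            const index = next++ // claim an index before awaiting
            results[index] = await fn(items[index])
        }
    }
    const workers = Array.from(
        { length: Math.min(limit, items.length) },
        () => worker(),
    )
    await Promise.all(workers)
    return results
}
```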
fdm-app/app/routes/farm.$b_id_farm.soil-analysis.bulk.tsx (1)
82-141: Duplicated matching logic between this file and `farm.create.$b_id_farm.$calendar.soil-analysis.bulk.tsx`.

The geometry-matching and name-matching logic in `handleUploadSuccess` (lines 82-141) is nearly identical to the same function in the create route (lines 82-143). Consider extracting this into a shared utility to avoid divergence.
82-143: Duplicated `handleUploadSuccess` matching logic, same as the other bulk route.

This is a near-identical copy of the geometry + name matching in `farm.$b_id_farm.soil-analysis.bulk.tsx`. Extract into a shared utility (e.g. `matchAnalysesToFields`) to avoid the two copies diverging over time.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
fdm-app/app/routes/farm.$b_id_farm.soil-analysis.bulk.tsx (1)
29-29: ⚠️ Potential issue | 🟡 Minor

Remove unused import: `extractBulkSoilAnalyses`.

The upload no longer uses the streaming API endpoint parser, so this import is not referenced anywhere in this file.

```diff
-import { extractBulkSoilAnalyses } from "~/integrations/nmi"
```
🤖 Fix all issues with AI agents
In `@fdm-app/app/components/blocks/soil/bulk-upload-form.tsx`:
- Around line 56-58: The current error path throws a generic Error when
response.ok is false, discarding any server-provided message; update the error
handling around the fetch response (the same block using the response variable
in bulk-upload-form.tsx) to read and include the body: attempt to parse
response.json() and use a message field if present, otherwise fallback to
response.text(), and include response.status in the thrown Error so the UI shows
the server-provided message (or raw text) instead of the hardcoded "Fout bij
starten van analyse".
In `@fdm-app/app/routes/farm.$b_id_farm.soil-analysis.bulk.tsx`:
- Around line 258-271: The destructuring that builds dbAnalysis (inside the
block extracting id, location, a_source, matchedFieldId, matchReason, filename,
b_name, b_sampling_date, a_depth_upper, a_depth_lower) currently omits the
UI-only property data, so data leaks into dbAnalysis; update that destructuring
to also extract and omit data (i.e. include data in the list of removed fields)
before passing dbAnalysis into addSoilAnalysis/...soilAnalysisData to ensure the
UI-only data property is not forwarded to the database insert.
In `@fdm-app/app/routes/farm.create.$b_id_farm.$calendar.soil-analysis.bulk.tsx`:
- Around line 256-269: The destructuring that prepares dbAnalysis is leaking the
raw parsed blob because the data property isn't stripped; update the
destructuring of analysis (the const { id, location, a_source, matchedFieldId,
matchReason, filename, b_name, b_sampling_date, a_depth_upper, a_depth_lower,
...dbAnalysis } = analysis) to also extract and discard data (e.g., include data
in the left-hand list) so that dbAnalysis does not include the data field before
saving to the DB.
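The strip-before-save pattern both route fixes describe comes down to rest destructuring. A sketch with an illustrative analysis shape (the field names mirror the review comments; the DB columns are examples, not the real schema):

```typescript
// Sketch: remove UI-only properties (including `data`) before a DB insert.
interface UiAnalysis {
    id: string
    filename: string
    matchedFieldId: string | null
    data: unknown // raw parsed blob, UI-only
    a_p_al: number // example DB column
    a_som_loi: number // example DB column
}

function toDbAnalysis(analysis: UiAnalysis) {
    // Every field listed on the left is discarded; only the rest survives.
    const { id, filename, matchedFieldId, data, ...dbAnalysis } = analysis
    return dbAnalysis
}
```

Forgetting `data` in the left-hand list is exactly the leak the review flags: the rest object silently carries it into the insert.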
🧹 Nitpick comments (10)
fdm-app/app/integrations/nmi.ts (2)
219-222: Index signature `any` makes the other union members meaningless.

Adding `any` to the union (`string | number | Date | any`) collapses to just `any`, so the index signature provides no type safety at all. Consider defining a proper interface for the soil analysis shape, or at minimum use a more specific union that includes the GeoJSON point type.

```diff
- const soilAnalysis: {
-   [key: string]: string | number | Date | any
-   location?: { type: "Point"; coordinates: [number, number] }
- } = {}
+ const soilAnalysis: {
+   [key: string]: string | number | Date | { type: "Point"; coordinates: [number, number] } | undefined
+   location?: { type: "Point"; coordinates: [number, number] }
+ } = {}
```
393-485: Duplicated parsing logic between `extractSoilAnalysis` and `extractBulkSoilAnalyses`.

Date parsing, depth parsing, and coordinate transformation are nearly identical in both functions (lines 236–290 vs 407–482). Extract shared helpers (e.g. `parseSamplingDate`, `parseDepth`, `parseLocation`) to reduce duplication and keep behavior consistent.

fdm-app/app/components/blocks/soil/bulk-upload-form.tsx (2)
33-134: Consider adding an `AbortController` to cancel the in-flight fetch when the component unmounts.

If the user navigates away (e.g., via browser back) while the stream is being read, the `fetch` and its reader will continue running in the background. This can cause state updates on an unmounted component and waste network resources.

🔧 Sketch: wire up AbortController

```diff
+ const [abortController, setAbortController] = useState<AbortController | null>(null)
+
  const handleUpload = async () => {
    if (files.length === 0) return
    setIsUploading(true)
    setUploadProgress(0)
+   const controller = new AbortController()
+   setAbortController(controller)
    // ...
    try {
      const response = await fetch("/api/soil-analysis/extract", {
        method: "POST",
        body: formData,
        credentials: "same-origin",
+       signal: controller.signal,
      })
      // ...
    } catch (error) {
+     if (error instanceof DOMException && error.name === 'AbortError') {
+       // User cancelled, ignore
+       return
+     }
      console.error("Bulk upload error:", error)
      toast.error(error instanceof Error ? error.message : "Upload mislukt")
      errorOccurred = true
    } finally {
      setIsUploading(false)
+     setAbortController(null)
      // ...
    }
  }
```
21-21: `onSuccess` callback accepts `any[]`; consider typing it with `ProcessedAnalysis[]`.

The `ProcessedAnalysis` type is already defined and used in sibling components. Typing this prop avoids silent shape mismatches between the upload form and the review component.

```diff
- onSuccess: (data: any[]) => void
+ onSuccess: (data: ProcessedAnalysis[]) => void
```

(Would also require importing `ProcessedAnalysis` from `bulk-upload-review`.)

fdm-app/app/components/blocks/soil/bulk-upload-review.tsx (2)
108-117: Invalid text input is silently ignored; the user gets no feedback.

When `onInputBlur` fires and the typed text is neither empty nor a parseable date, the stale text remains in the input while no date is propagated. The user may not realize their edit had no effect. Consider resetting to the last valid date or showing a brief validation hint.
242-248: `handleSave` spreads the original `analyses` with updated dates but doesn't reflect user-changed field matches.

`updatedAnalyses` is built from the original `analyses` array with only `b_sampling_date` overridden. The `matchedFieldId` in each analysis object still holds the initial auto-matched value, not the user's manual selection. While the `validMatches` array carries the correct field mappings, this means `updatedAnalyses` passed to `onSave` contains potentially stale `matchedFieldId` values. If any downstream consumer trusts `updatedAnalyses[i].matchedFieldId` instead of `matches`, it would use the wrong field.

Currently the server action uses `matches` for the field ID (line 278 in the route files), so this doesn't break persistence. But it's a latent inconsistency that could cause confusion.

🔧 Optional: sync matchedFieldId with user selections

```diff
  const handleSave = () => {
    const updatedAnalyses = analyses.map((a) => ({
      ...a,
      b_sampling_date: dates[a.id],
+     matchedFieldId: matches[a.id] || a.matchedFieldId,
    }))
    onSave(validMatches, updatedAnalyses)
  }
```

fdm-app/app/routes/farm.$b_id_farm.soil-analysis.bulk.tsx (2)
82-141: Matching logic is duplicated between this file and `farm.create.$b_id_farm.$calendar.soil-analysis.bulk.tsx`.

The `handleUploadSuccess` function (geometry matching → name matching → "both" logic) is nearly identical across both route files. If the matching algorithm needs to change, it must be updated in both places.

Consider extracting a shared `matchAnalysesToFields` helper.
226-286: `Promise.all` can leave partial writes on failure.

If the Nth `addSoilAnalysis` call throws (e.g., due to invalid depth), the preceding N-1 calls may have already committed. The user sees an error, but some analyses are silently persisted without a way to retry only the failed ones.

This is worth noting as a known limitation. For a future improvement, consider wrapping the batch in a transaction or using `Promise.allSettled` to collect per-analysis outcomes and surface partial success.

fdm-app/app/routes/farm.create.$b_id_farm.$calendar.soil-analysis.bulk.tsx (2)
83-145: Matching logic is duplicated, same as `farm.$b_id_farm.soil-analysis.bulk.tsx`.

This is the same geometry + name matching code. A shared helper would eliminate this duplication.

24-24: Remove unused import `extractBulkSoilAnalyses`.

This import is no longer used after the streaming API change.

```diff
-import { extractBulkSoilAnalyses } from "~/integrations/nmi"
```
```tsx
const fieldMatch = fields.find(
    (field) =>
        field.b_name.toLowerCase().trim() === analysisName,
)
```
So, it's only a check whether the full name is exactly identical?
Yes, but ignoring capitalization.
```tsx
<div className="flex items-center text-xs text-muted-foreground mt-1">
    <Microscope className="h-3 w-3 mr-1" />
    <span>{sourceLabel}</span>
<div className="flex flex-col gap-0.5 mt-1 text-xs text-muted-foreground">
```
A more general question: if one field has multiple analyses, do you store/show them all, or only the most recent ones?
In this table I show all analyses. So if a PDF contains multiple (e.g. Nmin), they become multiple rows and are stored separately. It is also not a problem to upload multiple PDFs for the same field.
```tsx
if (!Number.isNaN(numericX) && !Number.isNaN(numericY)) {
    try {
        const [lon, lat] = proj4("EPSG:28992", "EPSG:4326", [
            numericX,
```
BoraIneviNMI left a comment:
I don't see any issues, as far as I can test for.
Closes #458