Add Dropbox connector integration #1870
Walkthrough
This PR adds complete Dropbox OAuth connector support with backend token management, Dropbox API v2 file operations, OAuth flow refactoring, and frontend cloud picker UI. Environment variables, type definitions, and service startup behavior are updated to accommodate the new connector.
Changes
Dropbox Connector Integration
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@frontend/components/cloud-picker/provider-handlers.ts`:
- Around line 510-526: The code performs an unbounded recursive Dropbox listing
via the RPC calls "files/list_folder" and "files/list_folder/continue", pushing
every entry into the entries array and rendering them all, which can block the
picker; add a hard cap (e.g. MAX_ENTRIES) and stop fetching/paginating once
entries.length >= MAX_ENTRIES (break out of the while and stop requesting
further pages), and change the initial list call to avoid recursive:true or
limit recursion depth if needed; also update the renderer that consumes entries
(the code that currently renders the collected entries) to detect truncation and
show a "show more" / "results truncated" indicator instead of trying to render
everything at once so the UI remains responsive.
- Around line 566-568: The webUrl construction uses entry.path_display directly
(webUrl: ... ? `https://www.dropbox.com/home${entry.path_display}`), which can
create invalid links for paths with spaces/special characters; change it to
URL-encode the path before concatenation (e.g., use
encodeURI(entry.path_display) so slashes are preserved) and keep the existing
fallback to "https://www.dropbox.com/home"; update the webUrl expression that
references entry.path_display accordingly.
In `@src/connectors/dropbox/connector.py`:
- Around line 266-269: The root listing currently returns a cursor
unconditionally; update the return so nextPageToken is only provided when there
are more results: change the call to _collect_folder_entries to receive whether
there are more results (either by modifying _collect_folder_entries to return
(entries, cursor, has_more) or to return (entries, cursor_if_has_more) as
suggested), then in the listing code (the block that calls
_collect_folder_entries and builds the {"files": ..., "nextPageToken": ...}
response) only set "nextPageToken" when has_more is true (or cursor_if_has_more
is non-null); keep using _to_file_info and filtering as-is and do not return a
cursor when there are no more pages.
- Around line 41-44: Add DROPBOX_OAUTH_CLIENT_ID and DROPBOX_OAUTH_CLIENT_SECRET
to config/settings.py (set them from os.getenv("DROPBOX_OAUTH_CLIENT_ID") and
os.getenv("DROPBOX_OAUTH_CLIENT_SECRET")), then in the connector replace direct
os.getenv usage by importing these two settings and use them when initializing
self.client_id and self.client_secret (instead of calling os.getenv with
CLIENT_ID_ENV_VAR/CLIENT_SECRET_ENV_VAR); keep existing fallback from
config.get(...) but remove any direct os.environ access in this module.
In `@src/connectors/dropbox/oauth.py`:
- Around line 85-93: The refresh_access_token function lacks exception handling
and can raise network/JSON/key errors; wrap the httpx.AsyncClient request and
subsequent response parsing in a try/except that catches httpx.RequestError,
httpx.HTTPStatusError, JSONDecodeError, KeyError and any other Exception, log a
concise, non-sensitive message (include status_code and a truncated
response.text only on non-200) and return False on any failure so callers (e.g.,
load_credentials/connector_token) treat it as an auth failure; specifically, in
refresh_access_token around the POST to TOKEN_ENDPOINT and the lines that call
response.json() and set self.token_data["token"], add the try/except, call
logger.warning or logger.exception with safe details, and ensure the function
returns False on error and True on success.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 28adb2ef-8a80-4a20-bc7c-88de74308cc5
📒 Files selected for processing (17)
- .env.example
- docker-compose.yml
- frontend/app/api/mutations/useConnectConnectorMutation.ts
- frontend/app/settings/_components/connectors-tab.tsx
- frontend/app/upload/[provider]/page.tsx
- frontend/components/cloud-picker/picker-header.tsx
- frontend/components/cloud-picker/provider-handlers.ts
- frontend/components/cloud-picker/types.ts
- frontend/components/icons/dropbox-logo.tsx
- frontend/lib/connectors/registry.ts
- src/api/connectors.py
- src/app/container.py
- src/connectors/dropbox/__init__.py
- src/connectors/dropbox/connector.py
- src/connectors/dropbox/oauth.py
- src/connectors/registry.py
- src/services/auth_service.py
frontend/components/cloud-picker/provider-handlers.ts, lines 510-526:
let response = await this.rpc("files/list_folder", {
  path: "",
  recursive: true,
  include_deleted: false,
  include_has_explicit_shared_members: false,
  include_mounted_folders: true,
  include_non_downloadable_files: false,
  limit: 2000,
});
entries.push(...(response.entries || []));

while (response.has_more) {
  response = await this.rpc("files/list_folder/continue", {
    cursor: response.cursor,
  });
  entries.push(...(response.entries || []));
}
Unbounded recursive listing can degrade picker responsiveness on real Dropbox accounts.
Line 512 enables full recursive traversal, and Lines 521-526 keep paginating until exhaustion, then Lines 443-488 render all entries in one DOM pass. Large accounts can cause long blocking fetch/render cycles and a near-unusable modal.
Proposed mitigation (cap + graceful truncation)
export class DropboxHandler {
+ private static readonly MAX_PICKER_ENTRIES = 1000;
@@
private async listDropboxEntries(): Promise<CloudFile[]> {
const entries: any[] = [];
@@
- while (response.has_more) {
+ while (
+ response.has_more &&
+ entries.length < DropboxHandler.MAX_PICKER_ENTRIES
+ ) {
response = await this.rpc("files/list_folder/continue", {
cursor: response.cursor,
});
entries.push(...(response.entries || []));
}
@@
- return entries
+ return entries
+ .slice(0, DropboxHandler.MAX_PICKER_ENTRIES)
.map((entry) => this.toCloudFile(entry))
.filter((file): file is CloudFile => file !== null);
}
}
Also applies to: 443-488
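The cap-and-truncate idea, including the truncation flag the renderer would need for a "results truncated" indicator, can be sketched independently of the picker. This is a minimal Python sketch rather than the handler's TypeScript. `list_capped`, `make_fake_rpc`, and the non-recursive first call are illustrative assumptions, and the fake RPC only mimics the `entries` / `cursor` / `has_more` shape of the Dropbox responses used above.

```python
from typing import Any, Callable, Dict, List, Tuple

MAX_PICKER_ENTRIES = 1000  # assumed cap, mirroring the value in the proposed mitigation

Rpc = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def list_capped(rpc: Rpc, max_entries: int = MAX_PICKER_ENTRIES) -> Tuple[List[dict], bool]:
    """Collect at most max_entries entries and report whether results were cut off."""
    # Assumption: a non-recursive first call, as the review comment suggests.
    response = rpc("files/list_folder", {"path": "", "recursive": False})
    entries = list(response.get("entries", []))

    # Stop paginating as soon as the cap is reached.
    while response.get("has_more") and len(entries) < max_entries:
        response = rpc("files/list_folder/continue", {"cursor": response["cursor"]})
        entries.extend(response.get("entries", []))

    # Truncated if the API still has pages left or the pages fetched overshoot the cap.
    truncated = bool(response.get("has_more")) or len(entries) > max_entries
    return entries[:max_entries], truncated


def make_fake_rpc(total: int, page_size: int) -> Rpc:
    """Hypothetical stand-in for this.rpc that serves `total` entries in pages."""
    def rpc(endpoint: str, args: Dict[str, Any]) -> Dict[str, Any]:
        start = int(args.get("cursor", 0))
        end = min(start + page_size, total)
        return {
            "entries": [{"name": f"file-{i}"} for i in range(start, end)],
            "cursor": str(end),
            "has_more": end < total,
        }
    return rpc
```

Because the first request in the PR asks for up to 2000 entries, a single page can already exceed a 1000-entry cap, so the flag checks the collected length as well as `has_more`.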
frontend/components/cloud-picker/provider-handlers.ts, lines 566-568:
webUrl: entry.path_display
  ? `https://www.dropbox.com/home${entry.path_display}`
  : "https://www.dropbox.com/home",
Dropbox file URLs should be encoded before concatenation.
Line 567 uses path_display directly in the URL. Paths with spaces/special characters can produce invalid or broken links.
Proposed fix
- webUrl: entry.path_display
- ? `https://www.dropbox.com/home${entry.path_display}`
+ webUrl: entry.path_display
+ ? `https://www.dropbox.com/home${encodeURI(entry.path_display)}`
  : "https://www.dropbox.com/home",
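One caveat worth checking before applying the fix: `encodeURI` leaves reserved characters such as `#` and `?` unescaped, so a file name containing them would still yield a link that is cut off at that character. The sketch below shows a stricter path encoding in Python, using `urllib.parse.quote` with `safe="/"` so that slashes survive and everything else reserved is escaped. `dropbox_web_url` is a hypothetical helper, not code from this PR. In the TypeScript handler, a comparable effect would come from splitting the path on `/` and applying `encodeURIComponent` to each segment.

```python
from typing import Optional
from urllib.parse import quote

DROPBOX_HOME = "https://www.dropbox.com/home"


def dropbox_web_url(path_display: Optional[str]) -> str:
    """Build a Dropbox web link, percent-encoding everything except '/'."""
    if not path_display:
        # Same fallback as the original expression.
        return DROPBOX_HOME
    # safe="/" keeps the folder separators; spaces, '#', '?' etc. are escaped.
    return DROPBOX_HOME + quote(path_display, safe="/")
```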
src/connectors/dropbox/connector.py, lines 41-44:
self.client_id = config.get("client_id") or os.getenv(self.CLIENT_ID_ENV_VAR)
self.client_secret = config.get("client_secret") or os.getenv(
    self.CLIENT_SECRET_ENV_VAR
)
Direct os.getenv access violates coding guidelines.
Per project guidelines, config values must come from config/settings.py — the only place os.environ should be read. Add DROPBOX_OAUTH_CLIENT_ID and DROPBOX_OAUTH_CLIENT_SECRET to config/settings.py and import them here instead.
Suggested approach
In config/settings.py, add:
DROPBOX_OAUTH_CLIENT_ID = os.getenv("DROPBOX_OAUTH_CLIENT_ID")
DROPBOX_OAUTH_CLIENT_SECRET = os.getenv("DROPBOX_OAUTH_CLIENT_SECRET")
Then in this file:
-import os
...
+from config.settings import DROPBOX_OAUTH_CLIENT_ID, DROPBOX_OAUTH_CLIENT_SECRET
...
- self.client_id = config.get("client_id") or os.getenv(self.CLIENT_ID_ENV_VAR)
- self.client_secret = config.get("client_secret") or os.getenv(
- self.CLIENT_SECRET_ENV_VAR
- )
+ self.client_id = config.get("client_id") or DROPBOX_OAUTH_CLIENT_ID
+ self.client_secret = config.get("client_secret") or DROPBOX_OAUTH_CLIENT_SECRET
Source: Coding guidelines
src/connectors/dropbox/connector.py, lines 266-269:
entries, cursor = await self._collect_folder_entries("", recursive=True, max_files=max_files)
files = [info for entry in entries if (info := self._to_file_info(entry))]
files = [f for f in files if not f.get("isFolder")]
return {"files": files, "nextPageToken": cursor}
nextPageToken returned without checking has_more for root listing.
When listing from root without selected IDs, the cursor is returned as nextPageToken even if there are no more results. This could cause clients to make unnecessary continuation calls.
Suggested fix
Modify _collect_folder_entries to also return the has_more flag, or track it here:
- entries, cursor = await self._collect_folder_entries("", recursive=True, max_files=max_files)
+ response = await self._list_folder("", recursive=True, limit=max_files)
+ entries = response.get("entries", [])
+ has_more = response.get("has_more", False)
+ cursor = response.get("cursor") if has_more else None
+
+ while has_more and (not max_files or len(entries) < max_files):
+ response = await self._list_continue(response["cursor"])
+ entries.extend(response.get("entries", []))
+ has_more = response.get("has_more", False)
+ cursor = response.get("cursor") if has_more else None
+
files = [info for entry in entries if (info := self._to_file_info(entry))]
files = [f for f in files if not f.get("isFolder")]
- return {"files": files, "nextPageToken": cursor}
+ return {"files": files, "nextPageToken": cursor}
Or, more simply, update _collect_folder_entries to return (entries, cursor_if_has_more).
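The simpler variant, where the collector itself returns a cursor only when more pages remain, might look like the following. This is a minimal sketch under stated assumptions: `collect_folder_entries` takes two injected coroutines (`list_folder`, `list_continue`) instead of calling the connector's private methods, and `demo` feeds it canned pages rather than real Dropbox responses.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional, Tuple

Page = Dict[str, Any]


async def collect_folder_entries(
    list_folder: Callable[[], Awaitable[Page]],
    list_continue: Callable[[str], Awaitable[Page]],
    max_files: Optional[int] = None,
) -> Tuple[List[Page], Optional[str]]:
    """Return (entries, next_cursor); next_cursor is None when nothing is left."""
    page = await list_folder()
    entries = list(page.get("entries", []))

    # Keep paginating while the API reports more data and the cap is not reached.
    while page.get("has_more") and (max_files is None or len(entries) < max_files):
        page = await list_continue(page["cursor"])
        entries.extend(page.get("entries", []))

    # Only hand back a cursor if the last page fetched said there is more.
    next_cursor = page.get("cursor") if page.get("has_more") else None
    return entries, next_cursor


async def demo(max_files: Optional[int] = None) -> Tuple[List[Page], Optional[str]]:
    """Drive the collector with two canned pages instead of real API calls."""
    async def first_page() -> Page:
        return {"entries": [{"name": "a.txt"}], "cursor": "c1", "has_more": True}

    async def next_page(cursor: str) -> Page:
        return {"entries": [{"name": "b.txt"}], "cursor": "c2", "has_more": False}

    return await collect_folder_entries(first_page, next_page, max_files)
```

With this shape the caller can keep `return {"files": files, "nextPageToken": cursor}` unchanged, since `cursor` is already `None` once the last page has been read.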
src/connectors/dropbox/oauth.py, lines 85-93:
async with httpx.AsyncClient(timeout=30) as client:
    response = await client.post(self.TOKEN_ENDPOINT, data=payload)

if response.status_code != 200:
    logger.warning("[Dropbox] Token refresh failed: %s", response.text)
    return False

refreshed = response.json()
self.token_data["token"] = refreshed["access_token"]
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate and display the relevant sections with line numbers
ls -la src/connectors/dropbox || true
echo "---- oauth.py (key ranges) ----"
python3 - <<'PY'
import itertools,sys,os
path="src/connectors/dropbox/oauth.py"
start=1
end=220
with open(path,'r',encoding='utf-8') as f:
    for i,line in enumerate(f,1):
        if 1<=i<=end:
            print(f"{i:4d}:{line.rstrip()}")
PY
echo "---- connector.py (token/auth call paths) ----"
python3 - <<'PY'
path="src/connectors/dropbox/connector.py"
end=260
with open(path,'r',encoding='utf-8') as f:
    for i,line in enumerate(f,1):
        if 1<=i<=end:
            print(f"{i:4d}:{line.rstrip()}")
PY
echo "---- src/api/connectors.py (Dropbox endpoint) ----"
python3 - <<'PY'
path="src/api/connectors.py"
end=260
with open(path,'r',encoding='utf-8') as f:
    for i,line in enumerate(f,1):
        if 1<=i<=end:
            if 'dropbox' in line.lower() or i<60 or (70<=i<=260):
                print(f"{i:4d}:{line.rstrip()}")
PY
# Search for refresh_access_token usages and any exception handling around it
echo "---- usages of refresh_access_token ----"
rg -n "refresh_access_token" -S src || true
echo "---- exception handling around is_authenticated/load_credentials ----"
rg -n "is_authenticated|load_credentials|_access_token" -S src/connectors/dropbox src/api || true
Repository: langflow-ai/openrag
Length of output: 29544
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- src/api/connectors.py around connector.authenticate() ----"
python3 - <<'PY'
import itertools
path="src/api/connectors.py"
start=860; end=1020
with open(path,'r',encoding='utf-8') as f:
    for i,line in enumerate(f,1):
        if start<=i<=end:
            print(f"{i:4d}:{line.rstrip()}")
PY
echo "---- src/api/connectors.py around OAuth load_credentials calls near 1600-1750 ----"
python3 - <<'PY'
path="src/api/connectors.py"
start=1560; end=1760
with open(path,'r',encoding='utf-8') as f:
    for i,line in enumerate(f,1):
        if start<=i<=end:
            print(f"{i:4d}:{line.rstrip()}")
PY
Repository: langflow-ai/openrag
Length of output: 17423
Handle Dropbox OAuth refresh failures as auth failures (avoid 500s)
refresh_access_token() in src/connectors/dropbox/oauth.py has no guardrails around the httpx request or response.json() / access_token parsing; any network/JSON/key failure can bubble up through load_credentials() and cause 500 responses (e.g., connector_token doesn’t catch exceptions in the Dropbox branch). It also logs response.text on non-200, which is brittle.
Proposed fix
payload = {
"grant_type": "refresh_token",
"refresh_token": self.token_data["refresh_token"],
"client_id": self.client_id,
"client_secret": self.client_secret,
}
- async with httpx.AsyncClient(timeout=30) as client:
- response = await client.post(self.TOKEN_ENDPOINT, data=payload)
-
- if response.status_code != 200:
- logger.warning("[Dropbox] Token refresh failed: %s", response.text)
- return False
-
- refreshed = response.json()
- self.token_data["token"] = refreshed["access_token"]
+ try:
+ async with httpx.AsyncClient(timeout=30) as client:
+ response = await client.post(self.TOKEN_ENDPOINT, data=payload)
+ if response.status_code != 200:
+ logger.warning(
+ "[Dropbox] Token refresh failed with status %s",
+ response.status_code,
+ )
+ return False
+ refreshed = response.json()
+ except (httpx.HTTPError, ValueError) as exc:
+ logger.warning("[Dropbox] Token refresh request failed: %s", exc)
+ return False
+
+ access_token = refreshed.get("access_token")
+ if not access_token:
+ logger.warning("[Dropbox] Token refresh response missing access_token")
+ return False
+ self.token_data["token"] = access_token
Source: Coding guidelines
def _has_connector_oauth_credentials(connector_type: str) -> bool:
    """Whether the requested data-source connector has its own OAuth env vars."""
    connector_class = get_connector_class(connector_type)
    if not connector_class:
        return False
    client_key = getattr(connector_class, "CLIENT_ID_ENV_VAR", None)
    secret_key = getattr(connector_class, "CLIENT_SECRET_ENV_VAR", None)
    if not client_key or not secret_key:
        return False
    return bool(os.getenv(client_key) and os.getenv(secret_key))
os.getenv violates coding guideline for src/**/*.py.
The coding guideline states config values must come from config/settings.py and os.environ should never be accessed elsewhere. This new function reads environment variables directly via os.getenv.
Consider moving these credential checks into config/settings.py (e.g., a helper function or lazy-loaded attributes) so this service can query whether credentials exist without directly accessing os.environ.
Source: Coding guidelines
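One possible shape for such a helper in config/settings.py, as a minimal sketch: `CONNECTOR_OAUTH_ENV_VARS` and `has_connector_oauth_credentials` are hypothetical names, and only the two `DROPBOX_OAUTH_*` variable names come from this PR. The optional `environ` parameter is an assumption added so the function can be exercised without touching the real environment.

```python
import os
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical registry living in config/settings.py, the one module that is
# allowed to read the environment. Other connectors would add their own pairs.
CONNECTOR_OAUTH_ENV_VARS: Dict[str, Tuple[str, str]] = {
    "dropbox": ("DROPBOX_OAUTH_CLIENT_ID", "DROPBOX_OAUTH_CLIENT_SECRET"),
}


def has_connector_oauth_credentials(
    connector_type: str,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Whether both OAuth variables for the given connector are set and non-empty."""
    env = os.environ if environ is None else environ
    keys = CONNECTOR_OAUTH_ENV_VARS.get(connector_type)
    if keys is None:
        # Unknown connector types have no dedicated OAuth credentials.
        return False
    return all(env.get(key) for key in keys)
```

The service function could then delegate to this helper and drop its own `os.getenv` calls, which keeps environment access in a single module as the guideline asks.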
Summary
Added Dropbox as a cloud connector.
What changed
.env.example and Docker Compose.