Sync content from Canvas LMS to local storage with incremental updates.
This project pulls course content (assignments, pages, files, discussions, and optional JSON reports) from Canvas and stores it in a local folder structure. It is local-storage-only and optimized to skip unchanged resources on repeat runs.
- Interactive course selection (`all`, specific numbers, or `last` selection)
- Incremental sync with timestamp-based change detection
- Assignment export to Markdown (including rubric/details)
- Page export to Markdown (including page body)
- Discussion export to Markdown plus optional course-level discussion JSON
- Linked file discovery from assignments/pages/discussions (`/files/{id}` links)
- PDF handling:
  - Saves original PDF files
  - Extracts PDF content to `*_pdf.md` using `opendataloader_pdf`
- Optional course reports (JSON): announcements, quizzes, enrollments, calendar events, groups, analytics, gradebook history, submissions summary
- Optional global inbox conversations export (`Conversations/conversations.json`)
- Endpoint auto-disable for unavailable Canvas APIs (HTTP 403/404), persisted to config
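
As a rough illustration of linked file discovery, the sketch below pulls Canvas file IDs out of an HTML body and drops repeats, so each file is handled once per run. The function name and the exact link pattern are assumptions made for this example, not the project's actual code.

```python
import re

# Assumed link shape: ".../files/<numeric id>", optionally followed by
# "/download", a query string, and so on.
FILE_LINK_RE = re.compile(r"/files/(\d+)")


def find_linked_file_ids(html: str) -> list[int]:
    """Return Canvas file IDs referenced in html, in first-seen order, without duplicates."""
    seen: set[int] = set()
    ordered: list[int] = []
    for match in FILE_LINK_RE.finditer(html):
        file_id = int(match.group(1))
        if file_id not in seen:
            seen.add(file_id)
            ordered.append(file_id)
    return ordered
```

Keying on the numeric ID rather than the full URL means `/files/42/download` and `/files/42?wrap=1` count as the same file.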
- Python 3.10+
- Java 11+ (required at runtime for PDF extraction flow)
- Canvas API token with access to the courses you want to sync
If Java is missing, the app exits before sync starts.
- Create/activate a virtual environment (recommended)
- Install dependencies:
  `pip install -r requirements.txt`
- Create your config file from the example and update values:
  `copy config.ini.example config.ini`

Configure `config.ini`:
- `API_URL`: Canvas base URL (example: `https://yourschool.instructure.com`)
- `API_KEY`: Canvas API token
- `STORAGE_TYPE`: must be `local` in this project
- `LOCAL_ROOT_DIR`: root directory for synced output (example: `./canvas_sync`)
- `FORCE_REGENERATE_ASSIGNMENTS`: `true`/`false`; when `true`, assignment Markdown is regenerated even if unchanged
- `COURSE_IDS`: comma-separated course IDs; managed automatically by the app
- `REQUEST_TIMEOUT` (default `20`)
- `MAX_RETRIES` (default `3`)
- `BACKOFF_FACTOR` (default `0.5`)
- `CANVAS_PER_PAGE` (default `100`)
- `HTTP_POOL_MAXSIZE` (default `20`)
Toggle optional exports with true/false:
- `EXPORT_ANNOUNCEMENTS` (default `true`)
- `EXPORT_DISCUSSIONS` (default `true`)
- `EXPORT_QUIZZES` (default `true`)
- `EXPORT_ENROLLMENTS` (default `true`)
- `EXPORT_CALENDAR_EVENTS` (default `true`)
- `EXPORT_GROUPS` (default `true`)
- `EXPORT_ANALYTICS_ACTIVITY` (default `true`)
- `EXPORT_GRADEBOOK_HISTORY` (default `true`)
- `EXPORT_SUBMISSIONS_SUMMARY` (default `false`)
- `EXPORT_INBOX_CONVERSATIONS` (default `false`)
If quizzes/analytics/gradebook endpoints return 403/404, the corresponding export can be auto-disabled and persisted to config.ini.
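
The auto-disable behavior can be pictured with the sketch below, which flips a toggle to `false` and writes the file back when an endpoint answers 403 or 404. The section name `EXPORTS` and the function name are assumptions for illustration only; check `config.ini.example` for the real layout.

```python
import configparser
from pathlib import Path

EXPORT_SECTION = "EXPORTS"  # hypothetical section name, not confirmed by the project


def disable_export_if_unavailable(
    config_path: Path, export_key: str, status_code: int
) -> bool:
    """Set export_key to false in config_path when status_code is 403 or 404.

    Returns True only if the config file was changed.
    """
    if status_code not in (403, 404):
        return False
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep upper-case key names when writing back
    parser.read(config_path)
    if not parser.has_section(EXPORT_SECTION):
        parser.add_section(EXPORT_SECTION)
    if parser.get(EXPORT_SECTION, export_key, fallback="true").lower() == "false":
        return False  # already disabled, nothing to persist
    parser.set(EXPORT_SECTION, export_key, "false")
    with config_path.open("w", encoding="utf-8") as handle:
        parser.write(handle)
    return True
```

Note that `configparser` does not preserve comments when it rewrites a file, so a real implementation that wants to keep them would need a different approach.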
Run: `python main.py`

You will be prompted to choose courses:
- Enter numbers like `1,3,5`
- Enter `all`
- Enter `last` to reuse previous selection
- Enter `quit` to exit
At the end of the run, the script prints a summary and waits for Enter before exiting.
Under LOCAL_ROOT_DIR, each course gets its own folder. Typical layout:

    canvas_sync/
      Course Name/
        Assignments/
          Assignment A/
            Assignment A.md
            linked_file.ext
        Discussions/
          Topic Title/
            Topic Title.md
            linked_file.ext
        Reports/
          announcements.json
          discussion_topics.json
          quizzes.json
          enrollments.json
          calendar_events.json
          groups.json
          analytics_activity.json
          gradebook_history.json
          submissions_summary.json
        Page Title/
          Page Title.md
          linked_file.ext
        SomeFile.pdf
        SomeFile_pdf.md
      Conversations/
        conversations.json

Exact files depend on what exists in Canvas and which exports are enabled.
The sync is designed to avoid unnecessary writes/downloads:
- Existing local metadata is checked before saving resources
- Change detection is primarily timestamp-driven (`updated_at` vs local mtime)
- Linked files discovered multiple times in one run are deduplicated by Canvas file ID
- If a PDF is unchanged but its extracted `*_pdf.md` is missing, extraction is attempted
Run the tool twice in a row to verify unchanged resources are skipped.
- `401 Unauthorized`
  - Verify `API_KEY` and `API_URL` in `config.ini`
- No courses listed
  - Check token permissions and whether courses are date-restricted
- `403/404` on optional reports
  - Some institutions disable specific endpoints; auto-disable may be applied for that export
- Java check failure at startup
  - Install Java 11+ and ensure it is on `PATH`
- Slow sync
  - Adjust `[PERFORMANCE]` values (`CANVAS_PER_PAGE`, `HTTP_POOL_MAXSIZE`, retries/timeouts)
- Storage backends other than local filesystem are not supported in this codebase.
- A temporary download folder is used during sync and cleaned up at the end.
- The script starts a background hybrid server process for PDF extraction and stops it on completion.
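
The lifecycle in the last two notes (temporary folder, background process, cleanup at the end) follows a common pattern that can be sketched with the standard library. Everything here is illustrative: the helper names are hypothetical and the command used to launch the PDF extraction server is not shown in this document, so it is left as a parameter.

```python
import subprocess
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def background_process(command: list[str]):
    """Start command, yield the Popen handle, and always stop it afterwards."""
    process = subprocess.Popen(command)
    try:
        yield process
    finally:
        process.terminate()
        try:
            process.wait(timeout=10)
        except subprocess.TimeoutExpired:
            # The process ignored the polite request; force it to stop.
            process.kill()
            process.wait()


def run_with_helpers(server_command: list[str]) -> Path:
    """Run a placeholder sync step while the helper process and temp folder exist."""
    with background_process(server_command), tempfile.TemporaryDirectory() as tmp:
        download_dir = Path(tmp)
        # ... download and extract files into download_dir here ...
        return download_dir  # the folder is deleted once the with-block exits
```

Using context managers means the process is stopped and the folder removed even if the sync step raises an exception.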