-
Notifications
You must be signed in to change notification settings - Fork 4
Add mysql and hdfs sources support #63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
usernameisnull
commented
Feb 5, 2026
- Added support for DATABASE and HADOOP.
- Moved the used secret keys into the pkg directory.
- Updated the corresponding .proto file and Dockerfile.
Summary of ChangesHello @usernameisnull, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the data loading capabilities by introducing support for DATABASE and HADOOP data sources. It also provides more granular control over resource allocation during data warm-up phases and improves code organization by centralizing secret key definitions. These changes aim to make the system more versatile and robust in handling diverse data integration scenarios. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces support for DATABASE and HADOOP as new dataset source types, which is a significant enhancement. The changes are extensive, touching API definitions, controller logic, the data loader, and Dockerfiles. I appreciate the refactoring of secret key definitions into a shared pkg directory. However, my review has uncovered several critical and high-severity issues that must be addressed. These include a critical SQL injection vulnerability, a bug in the data-loader.Dockerfile that will break the build, and a lack of validation for required options that could lead to runtime panics. I've provided detailed comments and suggestions for each issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for two new datasource types (DATABASE and HADOOP) to the dataset management system. The changes include moving SecretKey constants from the internal package to pkg/utils for external accessibility, implementing new data loaders for MySQL databases and HDFS, updating the controller to handle these new types, and modifying the Dockerfile to include necessary dependencies (MySQL client and Hadoop client).
Changes:
- Added DATABASE and HADOOP as new supported datasource types across API definitions, CRDs, and internal type systems
- Moved SecretKey type and constants from internal/pkg/datasources to pkg/utils for external code access
- Implemented ModelDatabaseLoader for MySQL database exports with batch processing and CSV output
- Implemented ModelHadoopLoader for HDFS data retrieval
- Updated data-loader Dockerfile to include MySQL 8.4 client, Hadoop 3.4.2 client, and Java runtime dependencies
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 16 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/utils/secret_key.go | New file defining SecretKey type and constants, moved from internal package for external accessibility |
| internal/pkg/datasources/types.go | Added TypeDatabase and TypeHadoop to supported datasource types |
| internal/pkg/datasources/datasource_hadoop.go | New Hadoop datasource loader implementation for HDFS file retrieval |
| internal/pkg/datasources/datasource_database.go | New MySQL database loader with batch export to CSV functionality |
| internal/pkg/datasources/credentials.go | Updated to use SecretKey from pkg/utils instead of local definition |
| internal/controller/dataset/dataset_controller.go | Added DATABASE and HADOOP to preload support, implemented changeDefinitionForHadoop function for Hadoop-specific job configuration |
| internal/cmd/dataloader/root.go | Added case handlers for DATABASE and HADOOP datasource types |
| data-loader.Dockerfile | Added MySQL client and Hadoop client dependencies, included Java runtime for Hadoop |
| config/crd/bases/dataset.baizeai.io_datasets.yaml | Added DATABASE and HADOOP to CRD enum values and documentation |
| api/dataset/v1alpha1/dataset_types.go | Added DatasetTypeDatabase and DatasetTypeHadoop constants, updated validation enums |
| Dockerfile | Minor formatting change (AS keyword capitalization) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
基本没啥问题,主要就是AI提出的那几个地方需要添加更多的校验就好了 |
c669d6d to
214bfb6
Compare
ok |
214bfb6 to
e170bc8
Compare
e170bc8 to
3128fdf
Compare
978d5f3 to
f2cf403
Compare
kebe7jun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, @yyzxw PTAL.
|
/lgtm |