Datafusion codec - adding libs , data source plugin and data source aware plugin #37
Datafusion codec - adding libs , data source plugin and data source aware plugin #37bharath-techie wants to merge 3 commits intofeature/datafusionfrom
Conversation
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
Signed-off-by: bharath-techie <bharath78910@gmail.com>
Signed-off-by: bharath-techie <bharath78910@gmail.com>
| * Represents a stream of record batches from a DataFusion query execution. | ||
| * This interface provides access to query results in a streaming fashion. | ||
| */ | ||
| public interface RecordBatchStream extends AutoCloseable { |
There was a problem hiding this comment.
Maybe this can be generic typed RecordBatchStream<T>
There was a problem hiding this comment.
Yes makes sense. I had a todo in CSV RBS to refactor a bit
| /** | ||
| * Create a new session context for query execution. | ||
| * | ||
| * @param globalRuntimeEnvId the global runtime environment ID | ||
| * @return a CompletableFuture containing the session context ID | ||
| */ | ||
| CompletableFuture<Long> createSessionContext(long globalRuntimeEnvId); | ||
|
|
||
| /** | ||
| * Execute a Substrait query plan. | ||
| * | ||
| * @param sessionContextId the session context ID | ||
| * @param substraitPlanBytes the serialized Substrait query plan | ||
| * @return a CompletableFuture containing the result stream | ||
| */ | ||
| CompletableFuture<RecordBatchStream> executeSubstraitQuery(long sessionContextId, byte[] substraitPlanBytes); | ||
|
|
||
| /** | ||
| * Close a session context and free associated resources. | ||
| * | ||
| * @param sessionContextId the session context ID to close | ||
| * @return a CompletableFuture that completes when the context is closed | ||
| */ | ||
| CompletableFuture<Void> closeSessionContext(long sessionContextId); |
There was a problem hiding this comment.
How is this specific to the data source? Isn't this more around query execution?
|
|
||
| use datafusion::prelude::*; | ||
| use datafusion::execution::context::SessionContext; | ||
| use std::collections::HashMap; | ||
| use std::sync::Arc; | ||
| use anyhow::Result; | ||
|
|
||
| /// Manages DataFusion session contexts | ||
| pub struct SessionContextManager { | ||
| contexts: HashMap<*mut SessionContext, Arc<SessionContext>>, | ||
| next_runtime_id: u64, | ||
| } |
There was a problem hiding this comment.
We want to avoid data fusion dependencies in dataformat-csv?
| opensearchplugin { | ||
| name = 'dataformat-csv' | ||
| description = 'CSV data format plugin for OpenSearch DataFusion' | ||
| classname = 'org.opensearch.datafusion.csv.CsvDataFormatPlugin' |
There was a problem hiding this comment.
We want to decouple the package namespacing?
There was a problem hiding this comment.
Yes , i'll work with mohit , this whole dataformat plugin will probably be moved/designed as something that can work for both query and indexing. I have todos in plugin class for the same.
| [dependencies] | ||
| # DataFusion dependencies | ||
| datafusion = "49.0.0" | ||
| datafusion-substrait = "49.0.0" |
There was a problem hiding this comment.
We should avoid this dependency?
8d0dfb6 to
3d06c82
Compare
0d2f104 to
4987cb8
Compare
31f431b to
53e2fa9
Compare
Description
[Describe what this change achieves]
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.