[VL] Add ANSI mode support for Spark CAST(NumericType as integral) #11854
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -239,7 +239,21 @@ public static AggregateFunctionNode makeAggregateFunction( | |
|
|
||
| public static CastNode makeCast( | ||
| TypeNode typeNode, ExpressionNode expressionNode, boolean isTryCast) { | ||
|
Contributor
Inconsistency between 3-arg and 4-arg makeCast overloads

Problem: The backward-compatible 3-arg overload maps `isTryCast=false` to `THROW_EXCEPTION(2)`, while the 4-arg overload maps the same non-TRY, non-ANSI case to `UNSPECIFIED(0)`.

Evidence: // 4-arg: non-TRY, non-ANSI -> UNSPECIFIED (0)

Suggested Fix: Align the 3-arg overload to also use `UNSPECIFIED(0)` for the non-TRY case.
||
| return new CastNode(typeNode, expressionNode, isTryCast); | ||
| // Backward-compatible: isTryCast=true → RETURN_NULL(1), false → THROW_EXCEPTION(2) | ||
| return new CastNode(typeNode, expressionNode, isTryCast ? 1 : 2); | ||
| } | ||
|
|
||
| public static CastNode makeCast( | ||
| TypeNode typeNode, ExpressionNode expressionNode, boolean isTryCast, boolean isAnsiCast) { | ||
| int failureBehavior; | ||
| if (isTryCast) { | ||
| failureBehavior = 1; // RETURN_NULL | ||
| } else if (isAnsiCast) { | ||
| failureBehavior = 2; // THROW_EXCEPTION | ||
| } else { | ||
| failureBehavior = 0; // UNSPECIFIED (legacy) | ||
| } | ||
| return new CastNode(typeNode, expressionNode, failureBehavior); | ||
| } | ||
|
|
||
| public static StringMapNode makeStringMap(Map<String, String> values) { | ||
|
|
||
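The review comment on the 3-arg overload can be illustrated with a minimal sketch. It assumes the intended fix is for the 3-arg form to delegate to the 4-arg form with `isAnsiCast=false`, so both agree on the legacy case. The class `CastFailureBehaviorSketch`, the nested enum `FailureBehavior`, and the `resolve` methods are hypothetical names used only for illustration; they are not part of the PR. The numeric codes (0, 1, 2) are taken from the comments in the diff.

```java
// Hypothetical sketch; none of these names exist in the PR.
final class CastFailureBehaviorSketch {

  // Named stand-ins for the magic numbers used in the diff (0, 1, 2).
  enum FailureBehavior {
    UNSPECIFIED(0),
    RETURN_NULL(1),
    THROW_EXCEPTION(2);

    private final int code;

    FailureBehavior(int code) {
      this.code = code;
    }

    int code() {
      return code;
    }
  }

  private CastFailureBehaviorSketch() {}

  // Mirrors the 4-arg overload in the diff: TRY takes precedence,
  // then ANSI, otherwise the legacy (unspecified) behavior.
  static FailureBehavior resolve(boolean isTryCast, boolean isAnsiCast) {
    if (isTryCast) {
      return FailureBehavior.RETURN_NULL;
    }
    if (isAnsiCast) {
      return FailureBehavior.THROW_EXCEPTION;
    }
    return FailureBehavior.UNSPECIFIED;
  }

  // Assumed fix for the 3-arg form: delegate with isAnsiCast=false so that
  // a non-TRY cast resolves to UNSPECIFIED instead of THROW_EXCEPTION.
  static FailureBehavior resolve(boolean isTryCast) {
    return resolve(isTryCast, false);
  }

  public static void main(String[] args) {
    System.out.println("try-only:     " + resolve(true));
    System.out.println("non-try:      " + resolve(false));
    System.out.println("ansi:         " + resolve(false, true));
    System.out.println("try and ansi: " + resolve(true, true));
  }
}
```

Delegating from one overload to the other keeps the mapping in a single place, so the two cannot drift apart again. Whether callers of the 3-arg form actually rely on `THROW_EXCEPTION` for non-TRY casts is something the PR author would need to confirm before adopting this.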
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -104,11 +104,6 @@ class VeloxTestSettings extends BackendTestSettings { | |
| .exclude( | ||
| "Process Infinity, -Infinity, NaN in case insensitive manner" // +inf not supported in folly. | ||
| ) | ||
|
Contributor
Unrelated exclusion removal - "cast from timestamp II" removed only in spark35

Problem: This spark35 VeloxTestSettings change removes the "cast from timestamp II" exclusion.

Investigation Needed: Is this removal intentional? If so, please add a note in the PR description explaining why it's included. If accidental, please revert this line and handle it in a separate PR.
||
| .exclude("cast from timestamp II") // Rewrite test for Gluten not supported with ANSI mode | ||
| .exclude("ANSI mode: Throw exception on casting out-of-range value to byte type") | ||
| .exclude("ANSI mode: Throw exception on casting out-of-range value to short type") | ||
| .exclude("ANSI mode: Throw exception on casting out-of-range value to int type") | ||
| .exclude("ANSI mode: Throw exception on casting out-of-range value to long type") | ||
| .exclude("cast from invalid string to numeric should throw NumberFormatException") | ||
| .exclude("SPARK-26218: Fix the corner case of codegen when casting float to Integer") | ||
| // Set timezone through config. | ||
|
|
||
Redundant config constant - ANSI mode is already propagated
Problem: This new constant `kVeloxSparkAnsiModeEnabled` duplicates the existing `kAnsiEnabled = "spark.sql.ansi.enabled"` already defined in `cpp/core/config/GlutenConfig.h`. Furthermore, the corresponding config propagation added in `WholeStageResultIterator.cc` references `velox::core::QueryConfig::kSparkAnsiModeEnabled`, which does not exist in the currently pinned Velox version (the existing constant is `kSparkAnsiEnabled`). This is the root cause of the `build-native-lib-centos-7` CI failure.

Evidence:
```cpp
// Already exists on main (GlutenConfig.h):
const std::string kAnsiEnabled = "spark.sql.ansi.enabled";
// Already exists on main (WholeStageResultIterator.cc line 576):
configs[velox::core::QueryConfig::kSparkAnsiEnabled] =
    veloxCfg_->get<std::string>(kAnsiEnabled, "false");
// This PR adds (duplicate + wrong constant name):
const std::string kVeloxSparkAnsiModeEnabled = "spark.sql.ansi.enabled";
configs[velox::core::QueryConfig::kSparkAnsiModeEnabled] = ... // does not exist!
```
Suggested Fix: Remove both C++ changes entirely (this file and the `WholeStageResultIterator.cc` change). The existing infrastructure in `GlutenConfig.h` already handles ANSI mode propagation to Velox.