Skip to content

Conversation

@GaneshPatil7517
Copy link
Contributor

@GaneshPatil7517 GaneshPatil7517 commented Jan 17, 2026

Which issue does this PR close?

Fixes #19768

Rationale for this change

Currently, ArrayAgg always produces an output with data type:

DataType::List(Arc::new(Field::new_list_field(
    arg_types[0].clone(),
    true,
)))

Copilot AI review requested due to automatic review settings January 17, 2026 11:29
@github-actions github-actions bot added the functions Changes to functions implementation label Jan 17, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR attempts to preserve the input field's nullability in the ArrayAgg aggregate function's return field. When the input field is non-nullable, the result field would be marked as non-nullable; when the input is nullable, the result field would remain nullable.

Changes:

  • Added a return_field method override to the ArrayAgg implementation that sets the result field's nullability to match the input field's nullability
  • Added a unit test to verify that the schema metadata correctly reflects the input field's nullability

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@GaneshPatil7517 GaneshPatil7517 force-pushed the feature/arrayagg-preserve-nullability branch from 668ce8c to b833445 Compare January 17, 2026 12:03
@github-actions github-actions bot added the core Core DataFusion crate label Jan 17, 2026
@GaneshPatil7517 GaneshPatil7517 force-pushed the feature/arrayagg-preserve-nullability branch from 114f5bc to 2b2a41b Compare January 17, 2026 12:17
@github-actions github-actions bot added the spark label Jan 17, 2026
- Override return_field() to preserve input field nullability in list elements

- Add input_nullable field to all ArrayAgg accumulators

- List itself remains nullable (NULL for empty groups)

- Inner elements nullability now matches input field nullability
@GaneshPatil7517 GaneshPatil7517 force-pushed the feature/arrayagg-preserve-nullability branch from 024785b to 24e57cb Compare January 17, 2026 17:47
- Separate state() from evaluate() in all three accumulators
- state() always uses nullable=true for inner elements (matches state_fields())
- evaluate() uses input_nullable to preserve nullability for final output
- Fix clippy: use std::slice::from_ref() instead of .clone() on FieldRef
- Fix array_sort return_type() to preserve input inner field nullability
- Fix clippy: use Arc::clone() instead of .clone() on FieldRef
- Update unnest_with_redundant_columns snapshot for non-nullable inner

fn state(&mut self) -> Result<Vec<ScalarValue>> {
Ok(vec![self.evaluate()?])
// State uses nullable inner elements to match state_fields() schema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like something we should be fixing in state_fields() instead of having this handling code

impl Accumulator for DistinctArrayAggAccumulator {
fn state(&mut self) -> Result<Vec<ScalarValue>> {
Ok(vec![self.evaluate()?])
// State uses nullable inner elements to match state_fields() schema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

}

let mut result = vec![self.evaluate()?];
// State uses nullable inner elements to match state_fields() schema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too

- Make return_type() return error since return_field() is implemented
- Simplify comments in return_field()
- Revert array_sort changes (unrelated to original issue)
- Revert spark/collect.rs changes (UDAF return_field not fixed)
@GaneshPatil7517 GaneshPatil7517 force-pushed the feature/arrayagg-preserve-nullability branch from 84c066f to 402927a Compare January 19, 2026 12:05
@GaneshPatil7517
Copy link
Contributor Author

ok ill update this all

@Jefffrey
Copy link
Contributor

@GaneshPatil7517 Please refrain from closing a discussion when no action has been taken or no comment was left. My comments regarding the state related functions don't seem to have been addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate functions Changes to functions implementation spark

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhance ArrayAgg to preserve input field nullability

2 participants