Skip to content

Results of implicit inner join and explicit left join is not the same between BigQuery and the emulator #436

@grindarius

Description

@grindarius

First I wanted to say that this project is great and we have had great success with using this project. We have found a problem while using the unnest and left join under the same table in BigQuery.

Problem description

We have a table in BigQuery that looks like

create table test_data (
  id string not null,
  location_id string not null,
  fields array<struct<name string not null>>
)

I wanted to do a left join unnest(fields) to retrieve all unique counts of location_id along with counts of fields.name using query like

select
  count(distinct location_id) as `location`,
  count(f.name) as `fields`
from `test_data`
left join unnest(fields) as f;

I found that this query returns an unexpected data when being run against bigquery-emulator, but when the same dataset is populated on BigQuery, the result is as I was expected.

After some investigation I found that bigquery-emulator is not treating left join as BigQuery does. So the query above and this query below

select
  count(distinct location_id) as `location`,
  count(f.name) as `fields`
from `test_data`, unnest(fields) as f;

behaves the same when run against bigquery-emulator, which I was not expecting it to be correct.

Platform Query joining expression location fields
BigQuery explicit left join 3 3
BigQuery implicit inner join 2 3
bigquery-emulator explicit left join 2 3
bigquery-emulator implicit inner join 2 3

The seed data being used in the project is here

[
	{ "id": "1", "location_id": "abc", "fields": [{ "name": "a" }] },
	{ "id": "2", "location_id": "abc", "fields": [{ "name": "b" }] },
	{ "id": "3", "location_id": "def", "fields": [] },
	{ "id": "4", "location_id": "def", "fields": [] },
	{ "id": "5", "location_id": "ghi", "fields": [] },
	{ "id": "6", "location_id": "ghi", "fields": [{ "name": "b" }] }
]

It would be cool if I could get this problem fixed, or else you could point me to where to implement and send patches to fix this problem. Thank you.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions