First I wanted to say that this project is great and we have had great success with using this project. We have found a problem while using the unnest and left join under the same table in BigQuery.
Problem description
We have a table in BigQuery that looks like
create table test_data (
id string not null,
location_id string not null,
fields array<struct<name string not null>>
)
I wanted to do a left join unnest(fields) to retrieve all unique counts of location_id along with counts of fields.name using query like
select
count(distinct location_id) as `location`,
count(f.name) as `fields`
from `test_data`
left join unnest(fields) as f;
I found that this query returns an unexpected data when being run against bigquery-emulator, but when the same dataset is populated on BigQuery, the result is as I was expected.
After some investigation I found that bigquery-emulator is not treating left join as BigQuery does. So the query above and this query below
select
count(distinct location_id) as `location`,
count(f.name) as `fields`
from `test_data`, unnest(fields) as f;
behaves the same when run against bigquery-emulator, which I was not expecting it to be correct.
| Platform |
Query joining expression |
location |
fields |
| BigQuery |
explicit left join |
3 |
3 |
| BigQuery |
implicit inner join |
2 |
3 |
| bigquery-emulator |
explicit left join |
2 |
3 |
| bigquery-emulator |
implicit inner join |
2 |
3 |
The seed data being used in the project is here
[
{ "id": "1", "location_id": "abc", "fields": [{ "name": "a" }] },
{ "id": "2", "location_id": "abc", "fields": [{ "name": "b" }] },
{ "id": "3", "location_id": "def", "fields": [] },
{ "id": "4", "location_id": "def", "fields": [] },
{ "id": "5", "location_id": "ghi", "fields": [] },
{ "id": "6", "location_id": "ghi", "fields": [{ "name": "b" }] }
]
It would be cool if I could get this problem fixed, or else you could point me to where to implement and send patches to fix this problem. Thank you.
First I wanted to say that this project is great and we have had great success with using this project. We have found a problem while using the
unnestandleft joinunder the same table in BigQuery.Problem description
We have a table in BigQuery that looks like
I wanted to do a
left join unnest(fields)to retrieve all unique counts oflocation_idalong with counts offields.nameusing query likeI found that this query returns an unexpected data when being run against bigquery-emulator, but when the same dataset is populated on BigQuery, the result is as I was expected.
After some investigation I found that bigquery-emulator is not treating
left joinas BigQuery does. So the query above and this query belowbehaves the same when run against bigquery-emulator, which I was not expecting it to be correct.
locationfieldsleft joininner joinleft joininner joinThe seed data being used in the project is here
[ { "id": "1", "location_id": "abc", "fields": [{ "name": "a" }] }, { "id": "2", "location_id": "abc", "fields": [{ "name": "b" }] }, { "id": "3", "location_id": "def", "fields": [] }, { "id": "4", "location_id": "def", "fields": [] }, { "id": "5", "location_id": "ghi", "fields": [] }, { "id": "6", "location_id": "ghi", "fields": [{ "name": "b" }] } ]It would be cool if I could get this problem fixed, or else you could point me to where to implement and send patches to fix this problem. Thank you.