[ENG-27964] all blocking code change for hudi 1.1 #13

Closed
Davis-Zhang-Onehouse wants to merge 3226 commits into onehouseinc:master from Davis-Zhang-Onehouse:ENG-27964

@Davis-Zhang-Onehouse

Change Logs

Describe context and summary for this change. Highlight if any code was copied.

Impact

Describe any public API or user-facing feature change or any performance impact.

Risk level: none | low | medium | high

Choose one. If medium or high, explain what verification was done to mitigate the risks.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

Davis-Zhang-Onehouse and others added 30 commits March 4, 2025 20:51
…e#12781)

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
1. fix generating file id with wrong bucket index

Signed-off-by: TheR1sing3un <chaoyang@apache.org>
…ier (apache#12695)

Co-authored-by: Vova Kolmakov <kolmakov.vladimir@huawei.com>
Co-authored-by: Vova Kolmakov <wombatukun@apache.org>
…ng bloom filter (apache#12919)

* [HUDI-8768] Support bloom filter options when creating expr index using bloom filter

* add index options validation in test

* Refactoring and address more comments

improve test

* fix checkstyle

* Update hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/feature/index/TestExpressionIndex.scala

---------

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
…pRecordBuffer (apache#12925)

* feat: pass compaction/merge related props to HoodieBaseFileGroupRecordBuffer

1. pass compaction/merge related props to HoodieBaseFileGroupRecordBuffer

Signed-off-by: TheR1sing3un <chaoyang@apache.org>

* fix: resolve multiple precombine-related configuration conflicts

1. resolve multiple precombine-related configuration conflicts
2. assume that precombine is based on table config

Signed-off-by: TheR1sing3un <chaoyang@apache.org>

* style: simplify lambda expression

1. simplify lambda expression

Signed-off-by: TheR1sing3un <chaoyang@apache.org>

* fix: map `record_key` to `_hoodie_record_key` when the record is created in SparkDatasetTestUtils

1. fix `record_key` and `_hoodie_record_key` not being mapped when the record is created in SparkDatasetTestUtils

Signed-off-by: TheR1sing3un <chaoyang@apache.org>

* rerun

* feat: Remove the default value for PAYLOAD_ORDERING_FIELD_PROP_KEY to avoid taking the default value from props as a valid configuration

1. Remove the default value for PAYLOAD_ORDERING_FIELD_PROP_KEY to avoid taking the default value from props as a valid configuration. Optimize the ordering-field lookup logic and remove the default value configuration.

Signed-off-by: TheR1sing3un <chaoyang@apache.org>

* Update hudi-common/src/main/java/org/apache/hudi/common/util/collection/ExternalSpillableMap.java

---------

Signed-off-by: TheR1sing3un <chaoyang@apache.org>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
…ns (apache#12929)

* Avoid empty string rowkey to avoid failure of SimpleKeyGenerator initialization

* Address comments

* Address comments

* Address comments

* Add test for update

* Fix issues

---------

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
… schema (apache#12949)

1. Introduce JVM level caching for avro schema to reduce the cost of schema comparison.

NOTE: Cache schema references on key paths where the same schema may be created repeatedly.
This ensures that only one instance of a given Schema is used during a JVM lifetime,
which reduces the overhead of schema comparison on important IO paths. In most cases it is enough to check whether two schemas are the same reference, and there is no need to call the `Schema::equals` method.

2. Cache the frequently reused Schema on the IO code path.
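The caching idea in this commit message can be illustrated with a minimal sketch. It is not the code from the commit: `SchemaInternSketch`, `FakeSchema`, and `intern` are hypothetical names, and `FakeSchema` stands in for an Avro schema so the example needs only the JDK. The sketch assumes the cache holds strong references and is never evicted.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch of a JVM-wide schema interner (not Hudi's actual class). */
final class SchemaInternSketch {

    /** Stand-in for an Avro schema; a real schema equality check is far more expensive. */
    static final class FakeSchema {
        final String json;

        FakeSchema(String json) {
            this.json = json;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof FakeSchema && ((FakeSchema) o).json.equals(json);
        }

        @Override
        public int hashCode() {
            return json.hashCode();
        }
    }

    // One canonical instance per distinct schema, shared by the whole JVM.
    private static final ConcurrentMap<FakeSchema, FakeSchema> CACHE = new ConcurrentHashMap<>();

    /** Returns the canonical instance equal to the candidate, registering it if it is new. */
    static FakeSchema intern(FakeSchema candidate) {
        FakeSchema existing = CACHE.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    public static void main(String[] args) {
        FakeSchema a = intern(new FakeSchema("{\"type\":\"string\"}"));
        FakeSchema b = intern(new FakeSchema("{\"type\":\"string\"}"));
        // Both calls return the same object, so a reference check replaces equals().
        System.out.println(a == b);
    }
}
```

The equality cost is paid once, when a schema is interned; afterwards hot paths can compare with `==`. A real cache would probably need weak references or a size bound to avoid holding schemas forever.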

---------

Signed-off-by: TheR1sing3un <chaoyang@apache.org>
* feat: introduce schema pruning for delete record

NOTE: For the record we need to delete, we only need to read the `hoodie_meta_fields`, `record_keys` and the columns involved in the delete condition from the table, which can greatly reduce the amount of read data when deleting.
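A rough sketch of the pruning rule described above, under stated assumptions: `DeletePruningSketch` and `requiredColumns` are hypothetical names, and columns are modelled as plain strings rather than a real schema. Only the five meta column names are taken from Hudi itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: pick the columns a delete has to read. */
final class DeletePruningSketch {

    // Hudi meta columns that are always kept.
    static final List<String> META_FIELDS = Arrays.asList(
        "_hoodie_commit_time",
        "_hoodie_commit_seqno",
        "_hoodie_record_key",
        "_hoodie_partition_path",
        "_hoodie_file_name");

    /** Keeps meta fields, record key fields and condition columns, in table order. */
    static List<String> requiredColumns(List<String> tableColumns,
                                        List<String> recordKeyFields,
                                        Set<String> conditionColumns) {
        Set<String> needed = new LinkedHashSet<>(META_FIELDS);
        needed.addAll(recordKeyFields);
        needed.addAll(conditionColumns);

        List<String> pruned = new ArrayList<>();
        for (String column : tableColumns) {
            if (needed.contains(column)) {
                pruned.add(column);
            }
        }
        return pruned;
    }

    public static void main(String[] args) {
        List<String> table = new ArrayList<>(META_FIELDS);
        table.addAll(Arrays.asList("id", "name", "price", "ts"));
        // Illustrates a delete whose condition only touches "price".
        Set<String> condition = new LinkedHashSet<>(Arrays.asList("price"));
        System.out.println(requiredColumns(table, Arrays.asList("id"), condition));
    }
}
```

In this example the `name` and `ts` columns are never read, which is where the reduction in read volume comes from.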

---------

Signed-off-by: TheR1sing3un <chaoyang@apache.org>
)

Co-authored-by: zhangyue143 <zhangyue61@jd.com>
…ileFormat constructor (apache#12981)

* don't use tablestate for filegroup reader
* revert change for multiple base format

---------

Co-authored-by: Jonathan Vexler <=>
the-other-tim-brown and others added 23 commits May 30, 2025 08:10
* introduce key filters to FG reader for full key/prefix key look up;
* replace MDT reader path with FG reader.
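The difference between a full-key and a prefix-key lookup can be sketched as two predicates over record keys. This is an illustration only: `KeyFilterSketch`, `fullKeys`, and `keyPrefixes` are hypothetical names and do not reflect the file group reader's actual filter API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical sketch of full-key and prefix-key filters over record keys. */
final class KeyFilterSketch {

    /** Matches a record key only if it equals one of the requested keys. */
    static Predicate<String> fullKeys(Set<String> keys) {
        return keys::contains;
    }

    /** Matches a record key if it starts with any of the requested prefixes. */
    static Predicate<String> keyPrefixes(List<String> prefixes) {
        return key -> prefixes.stream().anyMatch(key::startsWith);
    }

    /** Applies a key filter to a list of record keys. */
    static List<String> lookup(List<String> recordKeys, Predicate<String> filter) {
        return recordKeys.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("col_a/file1", "col_a/file2", "col_b/file1");
        System.out.println(lookup(keys, fullKeys(new HashSet<>(Arrays.asList("col_b/file1")))));
        System.out.println(lookup(keys, keyPrefixes(Arrays.asList("col_a/"))));
    }
}
```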

---------

Co-authored-by: danny0405 <yuzhao.cyz@gmail.com>
* Optimizing metadata getter for metadata table
* Minor code cleanup

---------

Co-authored-by: vinoth chandar <vinoth@apache.org>
…es (apache#13347)

* fix conflict handling for compaction given completion time changes

* consolidate tests

* split handling into two methods for ease of reading and debugging

* extract common parts of the code
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
…table and Enabling Non Blocking Concurrency Control with Metadata (apache#13292)

- Adding write config to support streaming writes to metadata table.
Config is named "hoodie.metadata.streaming.write.enabled".
- Enabling Non Blocking Concurrency Control with Metadata when streaming writes are enabled
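A minimal sketch of how the new flag might be set and read, assuming writer options are passed as plain `java.util.Properties`. Only the key name comes from the commit message; the class, the helper methods, and the `false` default are assumptions made for illustration.

```java
import java.util.Properties;

/** Hypothetical sketch: toggling streaming writes to the metadata table. */
final class StreamingMetadataWriteConfigSketch {

    // Config key named in the commit message.
    static final String STREAMING_WRITE_KEY = "hoodie.metadata.streaming.write.enabled";

    /** Builds writer properties with the streaming flag set explicitly. */
    static Properties writerProps(boolean streamingEnabled) {
        Properties props = new Properties();
        props.setProperty(STREAMING_WRITE_KEY, Boolean.toString(streamingEnabled));
        return props;
    }

    /** Reads the flag; the "false" default is an assumption, not taken from the source. */
    static boolean isStreamingWriteEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(STREAMING_WRITE_KEY, "false"));
    }

    /** Per the commit message, non-blocking concurrency control with metadata follows the flag. */
    static boolean usesNonBlockingConcurrencyControlWithMetadata(Properties props) {
        return isStreamingWriteEnabled(props);
    }

    public static void main(String[] args) {
        Properties props = writerProps(true);
        System.out.println(isStreamingWriteEnabled(props));
        System.out.println(usesNonBlockingConcurrencyControlWithMetadata(props));
    }
}
```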
apache#13387)

* [MINOR] Renaming TransactionManager methods to begin/end x StateChange

 - begin/end Transaction is confusing.
 - Naming aligns with how these methods are called, whenever action state changes

* Log message cleanup
…3305)

* add isMetadataTable flag in WriteStatus;
* fixing WriteStats to accommodate metadata table as well.

---------

Co-authored-by: sivabalan <n.siva.b@gmail.com>
Co-authored-by: danny0405 <yuzhao.cyz@gmail.com>
…rt (apache#13360)

* refactor: Unify all the code paths of bulk insert operations

---------

Signed-off-by: TheR1sing3un <chaoyang@apache.org>
@Davis-Zhang-Onehouse Davis-Zhang-Onehouse requested review from a team as code owners June 6, 2025 20:35
@nimahajan
