
Rebase twitter's commit onto prestodb master #248

Open
beinan wants to merge 129 commits into twitter-forks:prestodb-twitter-master from beinan:prestodb-twitter-234-druid



beinan commented Apr 21, 2020

No description provided.

shixuan-fan and others added 30 commits April 21, 2020 13:37
When SORTED_WRITE_TO_TEMP_PATH_ENABLED is true, a temporary path is
required for sorted writes.
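A minimal sketch of the fail-fast check this requirement implies. Only the SORTED_WRITE_TO_TEMP_PATH_ENABLED name comes from the commit; the `SortedWriteConfig` class and its method are hypothetical, and the real validation may live elsewhere.

```java
import java.util.Optional;

// Hypothetical holder for the two settings; only the flag's meaning
// (SORTED_WRITE_TO_TEMP_PATH_ENABLED) is taken from the commit message.
final class SortedWriteConfig {
    private final boolean sortedWriteToTempPathEnabled;
    private final Optional<String> temporaryPath;

    SortedWriteConfig(boolean sortedWriteToTempPathEnabled, Optional<String> temporaryPath) {
        this.sortedWriteToTempPathEnabled = sortedWriteToTempPathEnabled;
        this.temporaryPath = temporaryPath;
    }

    // Returns the directory sorted writes should stage files in, or empty
    // when sorted writes go directly to the target location.
    Optional<String> resolveSortedWriteTempPath() {
        if (!sortedWriteToTempPathEnabled) {
            return Optional.empty();
        }
        // The flag is meaningless without a temporary path, so fail fast.
        if (!temporaryPath.isPresent()) {
            throw new IllegalStateException(
                    "SORTED_WRITE_TO_TEMP_PATH_ENABLED is true but no temporary path is configured");
        }
        return temporaryPath;
    }
}
```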
Soft memory limits are the default memory limits given to each query. A query can override them through session properties, up to the hard limits set by the existing configuration properties.

Having soft limits makes it easier to migrate a workload to lower memory limits by allowing only the queries that require higher limits to specify them while defaulting other queries to lower limits.

Available soft memory limit configuration properties:

query.soft-max-memory-per-node
query.soft-max-total-memory-per-node
query.soft-max-total-memory
query.soft-max-memory
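The interaction between a soft default, a session override, and the hard ceiling can be sketched as below. This is an illustration only: the `SoftMemoryLimits` class is hypothetical, and the sketch clamps an oversized override to the hard limit, whereas the actual implementation might reject such a session value instead.

```java
import java.util.OptionalLong;

// Hypothetical sketch: the soft limit is the per-query default and the
// hard limit (e.g. query.max-memory) stays the ceiling. Values are bytes.
final class SoftMemoryLimits {
    private SoftMemoryLimits() {}

    static long effectiveLimit(long softLimitBytes, long hardLimitBytes, OptionalLong sessionOverride) {
        if (softLimitBytes > hardLimitBytes) {
            throw new IllegalArgumentException("soft limit must not exceed hard limit");
        }
        if (!sessionOverride.isPresent()) {
            // Queries that do not ask for more get the lower default.
            return softLimitBytes;
        }
        // A query may request more than the soft default, but never more
        // than the hard limit (clamping here is an assumption).
        return Math.min(sessionOverride.getAsLong(), hardLimitBytes);
    }
}
```

This keeps the migration story from the description: lowering the soft default affects every query that sets nothing, while the few heavy queries opt back in through a session property.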
Adds a configuration to set the compression codec for the ORC and DWRF
storage formats. Use hive.orc_compression_codec to override the generic
compression codec for ORC and DWRF. A separate configuration is needed
because not every compression codec is supported by every storage
format; for example, the ZSTD compression codec is only available for
ORC and DWRF.
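A sketch of how the override might be resolved per storage format. The enums and `CodecSelector` are hypothetical stand-ins rather than Presto's own types; the only rules taken from the commit are that the ORC-specific setting overrides the generic codec for ORC and DWRF, and that ZSTD is only valid for those two formats.

```java
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical enums; Presto's real codec and format types differ.
enum StorageFormat { ORC, DWRF, PARQUET, RCFILE }

enum Codec { NONE, SNAPPY, GZIP, LZ4, ZSTD }

final class CodecSelector {
    private CodecSelector() {}

    // Per the commit message, ZSTD is only available for ORC and DWRF.
    private static final Set<StorageFormat> ORC_LIKE = EnumSet.of(StorageFormat.ORC, StorageFormat.DWRF);

    static Codec select(StorageFormat format, Codec genericCodec, Optional<Codec> orcCodecOverride) {
        boolean orcLike = ORC_LIKE.contains(format);
        // The ORC-specific setting applies only to ORC and DWRF files;
        // every other format keeps using the generic codec.
        Codec codec = orcLike && orcCodecOverride.isPresent() ? orcCodecOverride.get() : genericCodec;
        if (codec == Codec.ZSTD && !orcLike) {
            throw new IllegalArgumentException("ZSTD is not supported for " + format);
        }
        return codec;
    }
}
```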
We need this function in several places, and it is purely geometric.
Adds a parent abstract class to PrestoS3FileSystemMetricsCollector
so that other SDK clients can share the metrics collector support.

Adds reporting for client retry pause time indicating how long the
thread was asleep between request retries in the client itself.

Fixes the reporting of client timings. Previously, when the client
retried a request, only the first attempt's timings were recorded
in the stats. Now the timings of every attempt are reported individually.
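The shape of that refactoring can be sketched with plain Java. Everything here is hypothetical: the real collector receives its numbers from the AWS SDK's request metrics, and the class and method names below are illustrative, not the ones in the commit. The sketch shows an abstract parent that walks every attempt (not only the first) and every retry pause, with a subclass deciding where the numbers go.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical parent class shared by collectors for different SDK clients.
abstract class SdkMetricsCollectorSketch {
    // Called once per logical request, with one entry per attempt and per retry pause.
    final void collect(List<Long> attemptTimesMillis, List<Long> retryPausesMillis) {
        // Report every attempt individually rather than only the first one.
        for (long millis : attemptTimesMillis) {
            recordHttpRequestTime(millis);
        }
        // Time the client thread slept between retries.
        for (long millis : retryPausesMillis) {
            recordRetryPauseTime(millis);
        }
    }

    protected abstract void recordHttpRequestTime(long millis);

    protected abstract void recordRetryPauseTime(long millis);
}

// Hypothetical S3-specific subclass that accumulates simple counters.
final class S3MetricsCollectorSketch extends SdkMetricsCollectorSketch {
    final AtomicLong requestCount = new AtomicLong();
    final AtomicLong totalRequestMillis = new AtomicLong();
    final AtomicLong totalRetryPauseMillis = new AtomicLong();

    @Override
    protected void recordHttpRequestTime(long millis) {
        requestCount.incrementAndGet();
        totalRequestMillis.addAndGet(millis);
    }

    @Override
    protected void recordRetryPauseTime(long millis) {
        totalRetryPauseMillis.addAndGet(millis);
    }
}
```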
Previously, a separate PrestoS3FileSystemStats instance was
created in PrestoS3ClientFactory, so S3 client stats were not
reported to the instance registered with JMX. This only
affected PrestoS3Select clients. Now the same stats
instance is shared with PrestoS3FileSystem.
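The fix amounts to injecting the stats object instead of constructing it inside the factory. A minimal sketch, using hypothetical stand-ins for PrestoS3FileSystemStats and PrestoS3ClientFactory (not their real constructors):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the stats object exported through JMX.
final class S3StatsSketch {
    final AtomicLong requests = new AtomicLong();
}

// Hypothetical stand-in for the client factory.
final class S3ClientFactorySketch {
    private final S3StatsSketch stats;

    // The stats object is passed in rather than created with `new` here,
    // so clients built by this factory update the same instance that the
    // file system registered with JMX.
    S3ClientFactorySketch(S3StatsSketch sharedStats) {
        this.stats = sharedStats;
    }

    // Returns a toy "client" whose only action is recording one request.
    Runnable newClient() {
        return () -> stats.requests.incrementAndGet();
    }
}
```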
In SHOW FUNCTIONS results, list the built-in functions first, and then
the SQL functions, in alphabetical order of the qualified function
names.
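The ordering rule can be expressed as a two-key comparator. `FunctionRow` is a hypothetical row type for illustration, not Presto's function metadata class, and the example names are made up.

```java
import java.util.Comparator;

// Hypothetical row in a SHOW FUNCTIONS result.
final class FunctionRow {
    final String qualifiedName;
    final boolean builtIn;

    FunctionRow(String qualifiedName, boolean builtIn) {
        this.qualifiedName = qualifiedName;
        this.builtIn = builtIn;
    }

    // Built-in functions first (the key !builtIn is false for them, and false
    // sorts before true), then alphabetical by qualified function name.
    static final Comparator<FunctionRow> SHOW_FUNCTIONS_ORDER =
            Comparator.comparing((FunctionRow row) -> !row.builtIn)
                    .thenComparing(row -> row.qualifiedName);
}
```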
Minor variable renames
The page sink commit mechanism is a general connector capability and is not
restricted to partition commits.
It can be used to commit not only lifespans or physical partitions
but any page sink write.
Co-authored-by: Andrii Rosa <andriirosa@fb.com>
Tasks in Spark are often retried and run speculatively, so a
commit protocol is required for table writes to avoid data corruption.

Co-authored-by: Andrii Rosa <andriirosa@fb.com>
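One common way to make retried and speculative attempts safe is a "first committer wins" rule: exactly one attempt per task may publish its output, and the others discard theirs. The sketch below illustrates that general technique under hypothetical task and attempt IDs; it is not the protocol these commits implement.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical coordinator-side bookkeeping for task output commits.
final class TaskCommitCoordinator {
    // taskId -> attemptId whose output was accepted
    private final Map<Integer, Integer> committed = new HashMap<>();

    // Returns true if this attempt may publish its output. Any other attempt
    // of the same task (a retry or a speculative copy) is refused, so the
    // table never receives the same task's rows twice. Asking again with the
    // winning attempt is idempotent.
    synchronized boolean canCommit(int taskId, int attemptId) {
        Integer winner = committed.putIfAbsent(taskId, attemptId);
        return winner == null || winner == attemptId;
    }
}
```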
A footer consists of two parts.
 - the start offset of each stripe.
 - the footer's total size in bytes.
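A sketch of one possible byte layout for such a footer. The layout is an assumption, not the format defined by the commit: each stripe offset is an 8-byte big-endian long, followed by the total footer size as a 4-byte int. Putting the size last lets a reader start from the end of the file, read 4 bytes, and jump back to the first offset.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical layout: [offset_0 .. offset_{n-1} as longs][total footer size as int]
final class StripeFooter {
    private StripeFooter() {}

    static byte[] write(List<Long> stripeOffsets) {
        int totalSize = stripeOffsets.size() * Long.BYTES + Integer.BYTES;
        ByteBuffer buf = ByteBuffer.allocate(totalSize); // big-endian by default
        for (long offset : stripeOffsets) {
            buf.putLong(offset);
        }
        buf.putInt(totalSize);
        return buf.array();
    }

    // `tail` holds bytes ending exactly at the end of the file (it may also
    // contain data before the footer); the footer is located from the end.
    static long[] readOffsets(byte[] tail) {
        ByteBuffer buf = ByteBuffer.wrap(tail);
        int totalSize = buf.getInt(tail.length - Integer.BYTES);
        int count = (totalSize - Integer.BYTES) / Long.BYTES;
        int footerStart = tail.length - totalSize;
        long[] offsets = new long[count];
        for (int i = 0; i < count; i++) {
            offsets[i] = buf.getLong(footerStart + i * Long.BYTES);
        }
        return offsets;
    }
}
```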
TestRowBasedSerialization sometimes fails when calling
createRandomLongDecimalsBlock with fewer than 10 positions. Blocks with
fewer than 10 positions should be allowed when they are needed. This
commit removes the positionCount check and adds comments suggesting
that callers use a larger positionCount when the desired nullRate > 0.
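Why a larger positionCount helps: each position is null with probability nullRate, so with only a handful of positions the observed null fraction can be far from the target. A hypothetical generator without a minimum-size check might look like this; it returns plain `BigInteger` values rather than a Presto `Block`.

```java
import java.math.BigInteger;
import java.util.Random;

// Hypothetical stand-in for createRandomLongDecimalsBlock; null entries mark nulls.
final class RandomDecimals {
    private RandomDecimals() {}

    static BigInteger[] create(int positionCount, float nullRate, long seed) {
        if (positionCount < 0 || nullRate < 0 || nullRate > 1) {
            throw new IllegalArgumentException("invalid positionCount or nullRate");
        }
        // No minimum positionCount: small blocks are allowed, but with few
        // positions the actual null fraction may differ a lot from nullRate.
        Random random = new Random(seed);
        BigInteger[] values = new BigInteger[positionCount];
        for (int i = 0; i < positionCount; i++) {
            if (nullRate > 0 && random.nextFloat() < nullRate) {
                values[i] = null;
            } else {
                // Random unscaled value below 2^100 (about 31 digits), which
                // fits in a 38-digit long decimal.
                values[i] = new BigInteger(100, random);
            }
        }
        return values;
    }
}
```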
beinan and others added 26 commits May 8, 2020 21:15
… Parquet schema mismatch checking (twitter-forks#245)

* Compare type by (name,type) pair rather than (index,type) pair during Parquet schema mismatch checking

* add unit test for parquet schema mismatch checker
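A sketch of why matching by name matters: if a file stores the same columns in a different order, an index-based comparison reports false mismatches, while a name-based one does not. `Column` and `SchemaMismatchChecker` are hypothetical, and case-insensitive name matching is an assumption here, not something stated in the PR.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical (name, type) pair.
final class Column {
    final String name;
    final String type;

    Column(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

final class SchemaMismatchChecker {
    private SchemaMismatchChecker() {}

    // True when a table column that also exists in the file has a different
    // type there. Columns are paired by name, not by position, so a file with
    // reordered columns is not flagged. Columns missing from the file are ignored.
    static boolean hasMismatch(List<Column> tableColumns, List<Column> fileColumns) {
        Map<String, String> fileTypes = new HashMap<>();
        for (Column column : fileColumns) {
            fileTypes.put(column.name.toLowerCase(Locale.ENGLISH), column.type);
        }
        for (Column column : tableColumns) {
            String fileType = fileTypes.get(column.name.toLowerCase(Locale.ENGLISH));
            if (fileType != null && !fileType.equals(column.type)) {
                return true;
            }
        }
        return false;
    }
}
```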
beinan force-pushed the prestodb-twitter-234-druid branch from cf7de87 to 3ff4f01 May 9, 2020 04:24