<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Cary's Blog</title>
<link href="http://caryhuang.github.io/atom.xml" rel="self"/>
<link href="http://caryhuang.github.io/"/>
<updated>2022-10-28T22:23:13.936Z</updated>
<id>http://caryhuang.github.io/</id>
<author>
<name>Cary Huang</name>
</author>
<generator uri="https://hexo.io/">Hexo</generator>
<entry>
<title>Cross-partition uniqueness guarantee with global unique index</title>
<link href="http://caryhuang.github.io/2022/10/28/Cross-partition-uniqueness-guarantee-with-global-unique-index/"/>
<id>http://caryhuang.github.io/2022/10/28/Cross-partition-uniqueness-guarantee-with-global-unique-index/</id>
<published>2022-10-28T18:18:03.000Z</published>
<updated>2022-10-28T22:23:13.936Z</updated>
<content type="html"><![CDATA[<h3 id="1-0-Introduction"><a href="#1-0-Introduction" class="headerlink" title="1.0 Introduction"></a>1.0 Introduction</h3><p>My colleague, David, recently published a post <a href="https://www.highgo.ca/2022/10/14/global-index-a-different-approach/">“Global Index, a different approach”</a> that describes our work to implement a global unique index in a way that leaves PostgreSQL’s current partitioning framework unchanged while still enforcing a cross-partition uniqueness constraint. To implement this, we must first understand how PostgreSQL currently ensures uniqueness on a single table with a unique index, and then build on that logic to support a cross-partition uniqueness check. My earlier blog post <a href="https://www.highgo.ca/2022/09/30/how-unique-index-works-in-pg/">here</a> gives a rough overview of how a unique index works in PG. In this post, I would like to describe the approach we take to ensure cross-partition uniqueness during index creation, in both serial and parallel builds.</p><h3 id="2-0-Cross-Partition-Uniqueness-Check-in-Serial-Global-Unique-Index-Build"><a href="#2-0-Cross-Partition-Uniqueness-Check-in-Serial-Global-Unique-Index-Build" class="headerlink" title="2.0 Cross-Partition Uniqueness Check in Serial Global Unique Index Build"></a>2.0 Cross-Partition Uniqueness Check in Serial Global Unique Index Build</h3><p>As described in this blog <a href="https://www.highgo.ca/2022/09/30/how-unique-index-works-in-pg/">here</a>, uniqueness is guaranteed by doing a heap scan on a table and sorting the tuples inside one or two BTSpool structures. If 2 tuples with the same scan key are sorted right next to each other, a uniqueness violation is found and the system errors out. For example, when building a global unique index on a partitioned table containing 6 partitions, at least 6 different BTSpools will be filled, and each is used to detect uniqueness violations only within its own partition. 
As a result, if a duplicate exists in another partition, PG currently cannot detect it. In theory, if we introduce another BTSpool at a global scale that is visible to all partitions and lives until all partitions have been scanned, we can put the index tuples from all partitions into this global spool and determine cross-partition uniqueness simply by sorting it once the last partition scan is finished.</p><p>The diagram below illustrates the position of the new global-level BTSpool (called spool3) and how it can be used to determine cross-partition uniqueness.</p><p><img src="/images/global-index-1.drawio.png" alt=""></p><p>Cross-partition uniqueness check in action:</p><figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">create</span> <span class="keyword">table</span> globalidxpart (a <span class="built_in">int</span>, b <span class="built_in">int</span>, c <span class="built_in">text</span>) <span class="keyword">partition</span> <span class="keyword">by</span> <span class="keyword">range</span> (a);</span><br><span class="line"><span class="keyword">CREATE</span> <span class="keyword">TABLE</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">create</span> <span class="keyword">table</span> globalidxpart1 <span class="keyword">partition</span> <span class="keyword">of</span> globalidxpart <span class="keyword">for</span> <span class="keyword">values</span> <span class="keyword">from</span> (<span class="number">0</span>) <span class="keyword">to</span> (<span class="number">100000</span>);</span><br><span class="line"><span class="keyword">CREATE</span> <span class="keyword">TABLE</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">create</span> <span class="keyword">table</span> globalidxpart2 <span class="keyword">partition</span> <span class="keyword">of</span> globalidxpart <span class="keyword">for</span> <span class="keyword">values</span> 
<span class="keyword">from</span> (<span class="number">100001</span>) <span class="keyword">to</span> (<span class="number">200000</span>);</span><br><span class="line"><span class="keyword">CREATE</span> <span class="keyword">TABLE</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">create</span> <span class="keyword">table</span> globalidxpart3 <span class="keyword">partition</span> <span class="keyword">of</span> globalidxpart <span class="keyword">for</span> <span class="keyword">values</span> <span class="keyword">from</span> (<span class="number">200001</span>) <span class="keyword">to</span> (<span class="number">300000</span>);</span><br><span class="line"><span class="keyword">CREATE</span> <span class="keyword">TABLE</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">create</span> <span class="keyword">table</span> globalidxpart4 <span class="keyword">partition</span> <span class="keyword">of</span> globalidxpart <span class="keyword">for</span> <span class="keyword">values</span> <span class="keyword">from</span> (<span class="number">300001</span>) <span class="keyword">to</span> (<span class="number">400000</span>);</span><br><span class="line"><span class="keyword">CREATE</span> <span class="keyword">TABLE</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">create</span> <span class="keyword">table</span> globalidxpart5 <span class="keyword">partition</span> <span class="keyword">of</span> globalidxpart <span class="keyword">for</span> <span class="keyword">values</span> <span class="keyword">from</span> (<span class="number">400001</span>) <span class="keyword">to</span> (<span class="number">500000</span>);</span><br><span class="line"><span class="keyword">CREATE</span> <span class="keyword">TABLE</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">create</span> <span class="keyword">table</span> 
globalidxpart6 <span class="keyword">partition</span> <span class="keyword">of</span> globalidxpart <span class="keyword">for</span> <span class="keyword">values</span> <span class="keyword">from</span> (<span class="number">500001</span>) <span class="keyword">to</span> (<span class="number">600000</span>);</span><br><span class="line"><span class="keyword">CREATE</span> <span class="keyword">TABLE</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">insert</span> <span class="keyword">into</span> globalidxpart (a, b, c) <span class="keyword">values</span> (<span class="number">42</span>, <span class="number">572814</span>, <span class="string">'inserted first on globalidxpart1'</span>);</span><br><span class="line"><span class="keyword">INSERT</span> <span class="number">0</span> <span class="number">1</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">insert</span> <span class="keyword">into</span> globalidxpart (a, b, c) <span class="keyword">values</span> (<span class="number">150000</span>, <span class="number">572814</span>, <span class="string">'inserted duplicate b on globalidxpart2'</span>);</span><br><span class="line"><span class="keyword">INSERT</span> <span class="number">0</span> <span class="number">1</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">insert</span> <span class="keyword">into</span> globalidxpart (a, b, c) <span class="keyword">values</span> (<span class="number">550000</span>, <span class="number">572814</span>, <span class="string">'inserted duplicate b on globalidxpart6'</span>);</span><br><span class="line"><span class="keyword">INSERT</span> <span class="number">0</span> <span class="number">1</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">create</span> <span class="keyword">unique</span> <span class="keyword">index</span> <span class="keyword">on</span> globalidxpart (b) <span 
class="keyword">global</span>; </span><br><span class="line">ERROR: could not <span class="keyword">create</span> <span class="keyword">unique</span> <span class="keyword">index</span> <span class="string">"globalidxpart1_b_idx"</span></span><br><span class="line">DETAIL: <span class="keyword">Key</span> (b)=(<span class="number">572814</span>) <span class="keyword">is</span> duplicated.</span><br><span class="line"></span><br><span class="line"><span class="keyword">delete</span> <span class="keyword">from</span> globalidxpart <span class="keyword">where</span> a = <span class="number">150000</span> <span class="keyword">and</span> b = <span class="number">572814</span>;</span><br><span class="line"><span class="keyword">DELETE</span> <span class="number">1</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">create</span> <span class="keyword">unique</span> <span class="keyword">index</span> <span class="keyword">on</span> globalidxpart (b) <span class="keyword">global</span>;</span><br><span class="line">ERROR: could not <span class="keyword">create</span> <span class="keyword">unique</span> <span class="keyword">index</span> <span class="string">"globalidxpart1_b_idx"</span></span><br><span class="line">DETAIL: <span class="keyword">Key</span> (b)=(<span class="number">572814</span>) <span class="keyword">is</span> duplicated.</span><br><span class="line"></span><br><span class="line"><span class="keyword">delete</span> <span class="keyword">from</span> globalidxpart <span class="keyword">where</span> a = <span class="number">42</span> <span class="keyword">and</span> b = <span class="number">572814</span>;</span><br><span class="line"><span class="keyword">DELETE</span> <span class="number">1</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">create</span> <span class="keyword">unique</span> <span class="keyword">index</span> <span class="keyword">on</span> globalidxpart (b) <span 
class="keyword">global</span>; </span><br><span class="line"><span class="keyword">CREATE</span> <span class="keyword">INDEX</span></span><br></pre></td></tr></table></figure><h4 id="Some-Considerations"><a href="#Some-Considerations" class="headerlink" title="Some Considerations"></a>Some Considerations</h4><p>How many index tuples can this new global BTSpool hold? Since it needs to hold tuples from all partitions, does it need a lot of memory allocated? </p><p>It uses <code>maintenance_work_mem</code> from <code>postgresql.conf</code>, the same as BTSpool1. When it is near capacity, it starts writing tuples to disk as temporary files (also referred to as logical tapes within PostgreSQL, more on this later) instead of keeping them in memory. So the spool can actually hold many more tuples than the memory limit alone would suggest. Before the final sort, if memory was not enough to hold all tuples, we have to <code>merge</code> all the logical tapes that PG has written out to disk, and then do a final <code>merge sort</code> to determine uniqueness. </p><h3 id="3-0-Cross-Partition-Uniqueness-Check-in-Parallel-Global-Unique-Index-Build"><a href="#3-0-Cross-Partition-Uniqueness-Check-in-Parallel-Global-Unique-Index-Build" class="headerlink" title="3.0 Cross-Partition Uniqueness Check in Parallel Global Unique Index Build"></a>3.0 Cross-Partition Uniqueness Check in Parallel Global Unique Index Build</h3><p>A cross-partition uniqueness check using a global-scale spool is very straightforward in the serial index build case. </p><p>PG’s current parallel sorting is much more complex, as it uses <code>logical tapes</code> to share and merge intermediate sorted results written to disk as temporary files by each worker. At the final sort, the leader process takes over all logical tapes written out by the workers and performs a final merge sort to determine uniqueness. 
</p><p>For example, if 3 workers (one of them being the leader) are requested to build a single partition’s index, there will be 3 logical tapes (or 3 temporary files) written out to disk (each sorted by its worker before being written). The workers use shared memory to coordinate with each other so that they do not write to the same tape files and overwrite each other. When all workers finish, the leader will “take over all logical tapes”, merge the tapes and perform a final sort. When done, PG will destroy all the parallel workers, which in turn destroys all logical tape files on disk before moving on to the next partition.</p><p>So, to achieve a cross-partition check in parallel, we have to retain those logical tape files when we finish sorting one partition. Currently, PG destroys them when a partition’s parallel index build is finished. If the number of workers spawned is X and the number of partitions is Y, then when the last partition’s build finishes we should have X * Y logical tapes on disk that we need to merge sort (for example, 3 workers and 6 partitions leave 18 tapes). We still use a separate spool3 to manage the tapes and persist them until all partitions are finished.</p><p>The diagram below illustrates the position of spool3 and how it can be used to determine cross-partition uniqueness in parallel.</p><p><img src="/images/global-index-parallel-build.drawio.png" alt=""></p>]]></content>
<summary type="html"><h3 id="1-0-Introduction"><a href="#1-0-Introduction" class="headerlink" title="1.0 Introduction"></a>1.0 Introduction</h3><p>My colleague, </summary>
<category term="postgresql" scheme="http://caryhuang.github.io/tags/postgresql/"/>
</entry>
</feed>