-
Notifications
You must be signed in to change notification settings - Fork 4
Expand file tree
/
Copy pathinterleaving.html
More file actions
135 lines (132 loc) · 7.2 KB
/
interleaving.html
File metadata and controls
135 lines (132 loc) · 7.2 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
<!DOCTYPE html>
<html lang="en">
<head>
<link rel="stylesheet" type="text/css" href="styles/style.css">
<title>OpenFAM: A library for programming Fabric-Attached Memory</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body>
<header>
<h1>OpenFAM Reference Implementation</h1>
</header>
<section>
<nav>
<ul>
<li><a href="index.html">Home</a></li>
<li><a href="release_notes.html">Release Notes</a></li>
<li><a href="limitations.html">Design Choices</a></li>
<li><a href="errors.html">Exceptions and Error Codes</a></li>
<li><a href="services.html">Services</a></li>
<li><a href="config_files.html">Configuration Files</a></li>
</ul>
<hr>
Initialization and Finalization
<ul>
<li><a href="fam_initialize.html">fam_initialize</a></li>
<li><a href="fam_finalize.html">fam_finalize</a></li>
<li><a href="fam_abort.html">fam_abort</a></li>
</ul>
<hr>
Options & Query
<ul>
<li><a href="fam_list_options.html">fam_list_options</a></li>
<li><a href="fam_get_option.html">fam_get_option</a></li>
<li><a href="fam_lookup.html">fam_lookup</a></li>
<li><a href="fam_stat.html">fam_stat</a></li>
</ul>
<hr>
Memory Allocation
<ul>
<li><a href="fam_create_region.html">fam_create_region</a></li>
<li><a href="fam_destroy_region.html">fam_destroy_region</a></li>
<li><a href="fam_resize_region.html">fam_resize_region</a></li>
<li><a href="fam_allocate.html">fam_allocate</a></li>
<li><a href="fam_deallocate.html">fam_deallocate</a></li>
<li><a href="fam_change_permissions.html">fam_change_permissions</a></li>
<li><a href="fam_close.html">fam_close</a></li>
</ul>
<hr>
Memory Map
<ul>
<li><a href="fam_map.html">fam_map</a></li>
<li><a href="fam_unmap.html">fam_unmap</a></li>
</ul>
<hr>
Data Path Operations
<ul>
<li><a href="fam_get.html">fam_get</a></li>
<li><a href="fam_put.html">fam_put</a></li>
<li><a href="fam_gather.html">fam_gather</a></li>
<li><a href="fam_scatter.html">fam_scatter</a></li>
<li><a href="fam_copy.html">fam_copy</a></li>
<li><a href="fam_copy_wait.html">fam_copy_wait</a></li>
<li><a href="fam_backup.html">fam_backup</a></li>
<li><a href="fam_backup_wait.html">fam_backup_wait</a></li>
<li><a href="fam_restore.html">fam_restore</a></li>
<li><a href="fam_restore_wait.html">fam_restore_wait</a></li>
<li><a href="fam_delete_backup.html">fam_delete_backup</a></li>
<li><a href="fam_delete_backup_wait.html">fam_delete_backup_wait</a></li>
</ul>
<hr>
Atomics
<ul>
<li><a href="fam_set.html">fam_set</a></li>
<li><a href="fam_add.html">fam_add</a></li>
<li><a href="fam_subtract.html">fam_subtract</a></li>
<li><a href="fam_min.html">fam_min</a></li>
<li><a href="fam_max.html">fam_max</a></li>
<li><a href="fam_and.html">fam_and</a></li>
<li><a href="fam_or.html">fam_or</a></li>
<li><a href="fam_xor.html">fam_xor</a></li>
<li><a href="fam_fetch_TYPE.html">fam_fetch_TYPE</a></li>
<li><a href="fam_swap.html">fam_swap</a></li>
<li><a href="fam_compare_swap.html">fam_compare_swap</a></li>
<li><a href="fam_fetch_add.html">fam_fetch_add</a></li>
<li><a href="fam_fetch_subtract.html">fam_fetch_subtract</a></li>
<li><a href="fam_fetch_min.html">fam_fetch_min</a></li>
<li><a href="fam_fetch_and.html">fam_fetch_and</a></li>
<li><a href="fam_fetch_max.html">fam_fetch_max</a></li>
<li><a href="fam_fetch_or.html">fam_fetch_or</a></li>
<li><a href="fam_fetch_xor.html">fam_fetch_xor</a></li>
</ul>
<hr>
Ordering
<ul>
<li><a href="fam_barrier_all.html">fam_barrier_all</a></li>
<li><a href="fam_fence.html">fam_fence</a></li>
<li><a href="fam_quiet.html">fam_quiet</a></li>
<li><a href="fam_progress.html">fam_progress</a></li>
<li><a href="fam_context_open.html">fam_context_open</a></li>
<li><a href="fam_context_close.html">fam_context_close</a></li>
</ul>
<hr>
</nav>
<article>
<h1>Data Item Interleaving</h1>
In earlier implementation of OpenFAM, the region created by a user could span across more than one memory server. However memory for any given data item was allocated only on a single memory server. Hence the region spanning feature was not being effectively utilized.For single-program, multiple-data (SPMD) applications, data items are typically large, and multiple PEs concurrently access different parts of the data item. For such data items, the network interface at the memory server becomes a bottleneck.
During benchmark runs, it was observed that random access benchmarks caused network bottlenecks and in-cast at the memory servers.
In OpenFAM 3.0 implementation, the concept of data-item interleaving is introduced to improve scalability.
Data interleaving refers to interspersing portions of the data across the memory servers in some specified order or other arrangement.
As an example, consider an arrangement with 4 memory servers (memory servers 1 to 4) as shown in Figure 1.
<br>
<figure>
<img src="images/interleave.png" width="850px">
<figcaption>Figure 1: Allocating data item across memory servers in OpenFAM</figcaption>
</figure>
In the above mentioned figure , Region A spans 4 memory servers, numbered 1 to 4. The stripes corresponding to Dataitem1 and Dataitem2 are allocated across the 4 memory servers. In this example, these allocations are converted into 4 separate allocations across the memory servers. Each block (with size defined by the interleave size) is distributed to memory servers in a round-robin fashion. Note that the starting blocks for the two data items reside in two different memory servers.
<br>
<img src="images/interleave_4K.png" width="850px">
<figcaption>Figure 2: Data Item interleaving with interleave size of 4KiB</figcaption>
</figure>
<br>
Figure 2 illustrates the concept of interleaving using the interleaving size to be 4096. With interleaving size of 4KiB, the interleaving logic places data portion 0 at memory server 1, places data portion 1 at memory server 2, places data portion 2 at memory server 3, places data portion 3 at memory server 4, places data portion 4 at memory server 1, places data portion 5 at memory server 2, places data portion 6 at memory server 3, and so forth. Such an interleaving arrangement can be referred to as a round robin interleaving arrangement in which sequential data portions are successively placed in memory servers according to an order of the memory servers (starting at a beginning memory server and proceeding to a last memory server in a sequence), and when the last memory server is reached in the sequence, the next data portion is placed in the beginning memory server in the sequence and the process continues until the end of the data portions is reached. Placing a data portion at a given memory server refers to writing the data portion in the main memory of the memory server.
<br>
<b>Note:</b>Our performance benchmark runs indicate that 64KiB is an interleaving size that offers better throughput and latency results for the data sets of different sizes.
</article>
</section>
<footer>
<p>Copyright 2021-23, Hewlett Packard Enterprise Development Co, LLP</p>
</footer>
</body>
</html>