
Commit 0c08896

Make devices and their properties accessible through TruffleObjects
* added getDevice function and properties from cudaDeviceGetAttribute
* added remaining properties, getDevices function with DeviceList and tests
* addressed issues from code review
1 parent 498f9b1 commit 0c08896

12 files changed

Lines changed: 1440 additions & 55 deletions


docs/language.md

Lines changed: 151 additions & 0 deletions
@@ -263,6 +263,157 @@ buildkernel(
263263
264264
See description in the [polyglot kernel launch](docs/launchkernel.md) documentation for details.
265265
266+
### getdevices() and getdevice() Functions
267+
268+
The `getdevices()` function returns an array that contains all visible
269+
CUDA devices. `getdevice(k)` returns the `k`-th visible device, with
270+
`k` ranging from 0 to the number of visible devices - 1.
271+
272+
```text
273+
devices = getdevices()
274+
device = getdevice(deviceOrdinal)
275+
```
276+
277+
`deviceOrdinal`: integer `k` that selects the `k`-th device, `k` from 0 to
278+
the number of visible devices - 1
279+
(see [cudaGetDevice](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html))
280+
281+
`getdevices()` returns an array of device objects and `getdevice()` returns a single device object. Each device object has the following members:
282+
283+
Attribute `id`: the device ID (ordinal)
284+
285+
Attribute `properties`: property object containing device attributes
286+
returned by the CUDA runtime `cudaDeviceGetAttribute()`,
287+
`cudaMemGetInfo()` and `cuDeviceGetName()`.
288+
289+
Method `isCurrent()`: returns true iff `id` is the device
290+
on which the currently active host thread executes device code.
291+
292+
Method `setCurrent()`: sets `id` as the device on which the
293+
currently active host thread should execute device code.
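
The relationship between `isCurrent()` and `setCurrent()` can be illustrated without a GPU. The following is a minimal sketch using hypothetical stand-ins (`MockRuntime`, `MockDevice`); it is not GrCUDA's implementation and calls no CUDA function. The out-of-range check in `getdevice` is an assumption of this sketch, not documented behavior.

```python
# Hypothetical stand-ins; nothing here calls CUDA or GrCUDA.
class MockRuntime:
    """Remembers which device ordinal is current for the host thread."""

    def __init__(self, device_count):
        self.device_count = device_count
        self.current = 0  # CUDA starts with device 0 as the current device


class MockDevice:
    def __init__(self, runtime, ordinal):
        self._runtime = runtime
        self.id = ordinal

    def isCurrent(self):
        return self._runtime.current == self.id

    def setCurrent(self):
        self._runtime.current = self.id


def getdevices(runtime):
    """All visible devices, ordinals 0 .. device_count - 1."""
    return [MockDevice(runtime, k) for k in range(runtime.device_count)]


def getdevice(runtime, k):
    """The k-th visible device; rejects ordinals outside the valid range (assumed)."""
    if not 0 <= k < runtime.device_count:
        raise IndexError('device ordinal {} out of range'.format(k))
    return MockDevice(runtime, k)


runtime = MockRuntime(device_count=2)
first, second = getdevices(runtime)
second.setCurrent()
print(first.isCurrent(), second.isCurrent())  # False True
```

Exactly one device is current at a time in this model, so making one device current implicitly makes every other device non-current.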
294+
295+
**Example:**
296+
297+
```Python
298+
import polyglot

devices = polyglot.eval(language='grcuda', string='getdevices()')
299+
device0 = polyglot.eval(language='grcuda', string='getdevice(0)')
300+
# identical to device0 = devices[0]
301+
302+
for device in devices:
303+
    print('{}: {}, {} multiprocessors'.format(device.id,
304+
                                              device.properties.deviceName,
305+
                                              device.properties.multiProcessorCount))
306+
# example output
307+
# 0: TITAN V, 80 multiprocessors
308+
# 1: Quadro GP100, 56 multiprocessors
309+
device0.setCurrent()
310+
print(device0.isCurrent()) # True
311+
```
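
A common use of the device list is to pick a device by one of its properties before making it current. The sketch below selects the device with the most free memory. The helper `device_with_most_free_memory` is hypothetical, and the `SimpleNamespace` objects with made-up values only stand in for real device objects so the snippet runs without a GPU. It assumes the array returned by `getdevices()` can be iterated from Python.

```python
from types import SimpleNamespace


def device_with_most_free_memory(devices):
    """Return the device whose properties.freeDeviceMemory is largest."""
    return max(devices, key=lambda d: d.properties.freeDeviceMemory)


# Stand-in device objects with made-up values (not real hardware data).
fake_devices = [
    SimpleNamespace(id=0, properties=SimpleNamespace(freeDeviceMemory=4 * 2**30)),
    SimpleNamespace(id=1, properties=SimpleNamespace(freeDeviceMemory=12 * 2**30)),
]

best = device_with_most_free_memory(fake_devices)
print(best.id)  # 1
```

With real device objects, calling `best.setCurrent()` afterwards would make the chosen device the current one.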
312+
313+
Table: Device Property Names (see also
314+
[CUDA Runtime Documentation](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html))
315+
| Property Name                             |
316+
|-------------------------------------------|
317+
| `asyncEngineCount` |
318+
| `canFlushRemoteWrites` |
319+
| `canMapHostMemory` |
320+
| `canUseHostPointerForRegisteredMem` |
321+
| `clockRate` |
322+
| `computeCapabilityMajor` |
323+
| `computeCapabilityMinor` |
324+
| `computeMode` |
325+
| `computePreemptionSupported` |
326+
| `concurrentKernels` |
327+
| `concurrentManagedAccess` |
328+
| `cooperativeLaunch` |
329+
| `cooperativeMultiDeviceLaunch` |
330+
| `deviceName` |
331+
| `directManagedMemAccessFromHost` |
332+
| `eccEnabled` |
333+
| `freeDeviceMemory` |
334+
| `globalL1CacheSupported` |
335+
| `globalMemoryBusWidth` |
336+
| `gpuOverlap` |
337+
| `hostNativeAtomicSupported` |
338+
| `hostRegisterSupported` |
339+
| `integrated` |
340+
| `isMultiGpuBoard` |
341+
| `kernelExecTimeout` |
342+
| `l2CacheSize` |
343+
| `localL1CacheSupported` |
344+
| `managedMemory` |
345+
| `maxBlockDimX` |
346+
| `maxBlockDimY` |
347+
| `maxBlockDimZ` |
348+
| `maxGridDimX` |
349+
| `maxGridDimY` |
350+
| `maxGridDimZ` |
351+
| `maxPitch` |
352+
| `maxRegistersPerBlock` |
353+
| `maxRegistersPerMultiprocessor` |
354+
| `maxSharedMemoryPerBlock` |
355+
| `maxSharedMemoryPerBlockOptin` |
356+
| `maxSharedMemoryPerMultiprocessor` |
357+
| `maxSurface1DLayeredLayers` |
358+
| `maxSurface1DWidth` |
359+
| `maxSurface2DHeight` |
360+
| `maxSurface2DLayeredHeight` |
361+
| `maxSurface2DLayeredLayers` |
362+
| `maxSurface2DLayeredWidth` |
363+
| `maxSurface2DWidth` |
364+
| `maxSurface3DDepth` |
365+
| `maxSurface3DHeight` |
366+
| `maxSurface3DWidth` |
367+
| `maxSurfaceCubemapLayeredLayers` |
368+
| `maxSurfaceCubemapLayeredWidth` |
369+
| `maxSurfaceCubemapWidth` |
370+
| `maxTexture1DLayeredLayers` |
371+
| `maxTexture1DLayeredWidth` |
372+
| `maxTexture1DLinearWidth` |
373+
| `maxTexture1DMipmappedWidth` |
374+
| `maxTexture1DWidth` |
375+
| `maxTexture2DGatherHeight` |
376+
| `maxTexture2DGatherWidth` |
377+
| `maxTexture2DHeight` |
378+
| `maxTexture2DLayeredHeight` |
379+
| `maxTexture2DLayeredLayers` |
380+
| `maxTexture2DLayeredWidth` |
381+
| `maxTexture2DLinearHeight` |
382+
| `maxTexture2DLinearPitch` |
383+
| `maxTexture2DLinearWidth` |
384+
| `maxTexture2DMipmappedHeight` |
385+
| `maxTexture2DMipmappedWidth` |
386+
| `maxTexture2DWidth` |
387+
| `maxTexture3DDepth` |
388+
| `maxTexture3DDepthAlt` |
389+
| `maxTexture3DHeight` |
390+
| `maxTexture3DHeightAlt` |
391+
| `maxTexture3DWidth` |
392+
| `maxTexture3DWidthAlt` |
393+
| `maxTextureCubemapLayeredLayers` |
394+
| `maxTextureCubemapLayeredWidth` |
395+
| `maxTextureCubemapWidth` |
396+
| `maxThreadsPerBlock` |
397+
| `maxThreadsPerMultiProcessor` |
398+
| `memoryClockRate` |
399+
| `multiGpuBoardGroupID` |
400+
| `multiProcessorCount` |
401+
| `pageableMemoryAccess` |
402+
| `pageableMemoryAccessUsesHostPageTables` |
403+
| `pciBusId` |
404+
| `pciDeviceId` |
405+
| `pciDomainId` |
406+
| `singleToDoublePrecisionPerfRatio` |
407+
| `streamPrioritiesSupported` |
408+
| `surfaceAlignment` |
409+
| `tccDriver` |
410+
| `textureAlignment` |
411+
| `texturePitchAlignment` |
412+
| `totalConstantMemory` |
413+
| `totalDeviceMemory` |
414+
| `unifiedAddressing` |
415+
| `warpSize` |
416+
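
Several of the properties listed above are usually read together. The sketch below combines `deviceName`, `computeCapabilityMajor`, `computeCapabilityMinor`, `totalDeviceMemory`, and `freeDeviceMemory` into a one-line summary. The helper `summarize` is hypothetical, and the `SimpleNamespace` object with made-up values stands in for a real `properties` object so the snippet runs without a GPU.

```python
from types import SimpleNamespace


def summarize(props):
    """Build a one-line summary from a properties-like object."""
    capability = '{}.{}'.format(props.computeCapabilityMajor,
                                props.computeCapabilityMinor)
    used = props.totalDeviceMemory - props.freeDeviceMemory
    used_pct = 100.0 * used / props.totalDeviceMemory
    return '{} (compute capability {}), {:.1f}% of device memory in use'.format(
        props.deviceName, capability, used_pct)


# Stand-in properties object with made-up values.
fake_props = SimpleNamespace(deviceName='Example GPU',
                             computeCapabilityMajor=7,
                             computeCapabilityMinor=0,
                             totalDeviceMemory=1000,
                             freeDeviceMemory=750)

print(summarize(fake_props))
# Example GPU (compute capability 7.0), 25.0% of device memory in use
```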
266417
### DeviceArray Constructor Function
267418

268419
In addition to array expressions, device arrays can also be
Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
1+
/*
2+
* Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
3+
*
4+
* Redistribution and use in source and binary forms, with or without
5+
* modification, are permitted provided that the following conditions
6+
* are met:
7+
* * Redistributions of source code must retain the above copyright
8+
* notice, this list of conditions and the following disclaimer.
9+
* * Redistributions in binary form must reproduce the above copyright
10+
* notice, this list of conditions and the following disclaimer in the
11+
* documentation and/or other materials provided with the distribution.
12+
* * Neither the name of NVIDIA CORPORATION nor the names of its
13+
* contributors may be used to endorse or promote products derived
14+
* from this software without specific prior written permission.
15+
*
16+
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
17+
* EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
18+
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
19+
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
20+
* CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
21+
* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
22+
* PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
23+
* PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
24+
* OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
25+
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
26+
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
27+
*/
28+
package com.nvidia.grcuda.test;
29+
30+
import static org.junit.Assert.assertEquals;
31+
import static org.junit.Assert.assertFalse;
32+
import static org.junit.Assert.assertTrue;
33+
import org.graalvm.polyglot.Context;
34+
import org.graalvm.polyglot.Value;
35+
import org.junit.Test;
36+
37+
public class DeviceTest {
38+
39+
@Test
40+
public void testDeviceCount() {
41+
try (Context ctx = Context.newBuilder().allowAllAccess(true).build()) {
42+
Value deviceCount = ctx.eval("grcuda", "cudaGetDeviceCount()");
43+
assertTrue(deviceCount.isNumber());
44+
assertTrue(deviceCount.asInt() > 0);
45+
}
46+
}
47+
48+
@Test
49+
public void testGetDevicesLengthsMatchesDeviceCount() {
50+
try (Context ctx = Context.newBuilder().allowAllAccess(true).build()) {
51+
Value deviceCount = ctx.eval("grcuda", "cudaGetDeviceCount()");
52+
assertTrue(deviceCount.isNumber());
53+
assertTrue(deviceCount.asInt() > 0);
54+
Value devices = ctx.eval("grcuda", "getdevices()");
55+
assertEquals(deviceCount.asInt(), devices.getArraySize());
56+
}
57+
}
58+
59+
@Test
60+
public void testGetDevicesMatchesAllGetDevice() {
61+
try (Context ctx = Context.newBuilder().allowAllAccess(true).build()) {
62+
Value devices = ctx.eval("grcuda", "getdevices()");
63+
Value getDevice = ctx.eval("grcuda", "getdevice");
64+
for (int i = 0; i < devices.getArraySize(); ++i) {
65+
Value deviceFromArray = devices.getArrayElement(i);
66+
Value deviceFromFunction = getDevice.execute(i);
67+
assertEquals(i, deviceFromArray.getMember("id").asInt());
68+
assertEquals(i, deviceFromFunction.getMember("id").asInt());
69+
}
70+
}
71+
}
72+
73+
@Test
74+
public void testCanReadSomeDeviceProperties() {
75+
try (Context ctx = Context.newBuilder().allowAllAccess(true).build()) {
76+
Value devices = ctx.eval("grcuda", "getdevices()");
77+
for (int i = 0; i < devices.getArraySize(); ++i) {
78+
Value device = devices.getArrayElement(i);
79+
Value prop = device.getMember("properties");
80+
// Sanity tests on some of the properties
81+
// device name is a non-empty string
82+
assertTrue(prop.getMember("deviceName").asString().length() > 0);
83+
84+
// compute capability is at least Kepler (3.0)
85+
assertTrue(prop.getMember("computeCapabilityMajor").asInt() >= 3);
86+
87+
// there is at least one multiprocessor
88+
assertTrue(prop.getMember("multiProcessorCount").asInt() > 0);
89+
90+
// there is some device memory
91+
assertTrue(prop.getMember("totalDeviceMemory").asLong() > 0L);
92+
}
93+
}
94+
}
95+
96+
@Test
97+
public void testCanSelectDevice() {
98+
try (Context ctx = Context.newBuilder().allowAllAccess(true).build()) {
99+
Value devices = ctx.eval("grcuda", "getdevices()");
100+
if (devices.getArraySize() > 1) {
101+
Value firstDevice = devices.getArrayElement(0);
102+
Value secondDevice = devices.getArrayElement(1);
103+
secondDevice.invokeMember("setCurrent");
104+
assertFalse(firstDevice.invokeMember("isCurrent").asBoolean());
105+
assertTrue(secondDevice.invokeMember("isCurrent").asBoolean());
106+
107+
firstDevice.invokeMember("setCurrent");
108+
assertTrue(firstDevice.invokeMember("isCurrent").asBoolean());
109+
assertFalse(secondDevice.invokeMember("isCurrent").asBoolean());
110+
} else {
111+
// only one device available
112+
Value device = devices.getArrayElement(0);
113+
device.invokeMember("setCurrent");
114+
assertTrue(device.invokeMember("isCurrent").asBoolean());
115+
}
116+
}
117+
}
118+
119+
@Test
120+
public void testDeviceMemoryAllocationReducesReportedFreeMemory() {
121+
try (Context ctx = Context.newBuilder().allowAllAccess(true).build()) {
122+
Value device = ctx.eval("grcuda", "getdevice(0)");
123+
Value props = device.getMember("properties");
124+
device.invokeMember("setCurrent");
125+
long totalMemoryBefore = props.getMember("totalDeviceMemory").asLong();
126+
long freeMemoryBefore = props.getMember("freeDeviceMemory").asLong();
127+
assertTrue(freeMemoryBefore <= totalMemoryBefore);
128+
129+
// allocate memory on device (unmanaged)
130+
long arraySizeBytes = freeMemoryBefore / 3;
131+
Value cudaMalloc = ctx.eval("grcuda", "cudaMalloc");
132+
Value cudaFree = ctx.eval("grcuda", "cudaFree");
133+
Value gpuPointer = null;
134+
try {
135+
gpuPointer = cudaMalloc.execute(arraySizeBytes);
136+
// After allocation total memory must be the same as before but
137+
// the free memory must be lower by at least the amount of allocated bytes.
138+
long totalMemoryAfter = props.getMember("totalDeviceMemory").asLong();
139+
long freeMemoryAfter = props.getMember("freeDeviceMemory").asLong();
140+
assertEquals(totalMemoryBefore, totalMemoryAfter);
141+
assertTrue(freeMemoryAfter <= (freeMemoryBefore - arraySizeBytes));
142+
} finally {
143+
if (gpuPointer != null) {
144+
cudaFree.execute(gpuPointer);
145+
}
146+
}
147+
}
148+
}
149+
150+
}

projects/com.nvidia.grcuda/src/com/nvidia/grcuda/DeviceArray.java

Lines changed: 16 additions & 15 deletions
@@ -244,44 +244,45 @@ Object getMembers(boolean includeInternal) {
244244

245245
@ExportMessage
246246
@SuppressWarnings("static-method")
247-
boolean isMemberReadable(String member,
248-
@Shared("member") @Cached("createIdentityProfile()") ValueProfile memberProfile) {
249-
return POINTER.equals(memberProfile.profile(member)) || COPY_FROM.equals(memberProfile.profile(member)) || COPY_TO.equals(memberProfile.profile(member));
247+
boolean isMemberReadable(String memberName,
248+
@Shared("memberName") @Cached("createIdentityProfile()") ValueProfile memberProfile) {
249+
String name = memberProfile.profile(memberName);
250+
return POINTER.equals(name) || COPY_FROM.equals(name) || COPY_TO.equals(name);
250251
}
251252

252253
@ExportMessage
253-
Object readMember(String member,
254-
@Shared("member") @Cached("createIdentityProfile()") ValueProfile memberProfile) throws UnknownIdentifierException {
255-
if (!isMemberReadable(member, memberProfile)) {
254+
Object readMember(String memberName,
255+
@Shared("memberName") @Cached("createIdentityProfile()") ValueProfile memberProfile) throws UnknownIdentifierException {
256+
if (!isMemberReadable(memberName, memberProfile)) {
256257
CompilerDirectives.transferToInterpreter();
257-
throw UnknownIdentifierException.create(member);
258+
throw UnknownIdentifierException.create(memberName);
258259
}
259-
if (POINTER.equals(member)) {
260+
if (POINTER.equals(memberName)) {
260261
return getPointer();
261262
}
262-
if (COPY_FROM.equals(member)) {
263+
if (COPY_FROM.equals(memberName)) {
263264
return new DeviceArrayCopyFunction(this, DeviceArrayCopyFunction.CopyDirection.FROM_POINTER);
264265
}
265-
if (COPY_TO.equals(member)) {
266+
if (COPY_TO.equals(memberName)) {
266267
return new DeviceArrayCopyFunction(this, DeviceArrayCopyFunction.CopyDirection.TO_POINTER);
267268
}
268269
CompilerDirectives.transferToInterpreter();
269-
throw UnknownIdentifierException.create(member);
270+
throw UnknownIdentifierException.create(memberName);
270271
}
271272

272273
@ExportMessage
273274
@SuppressWarnings("static-method")
274-
boolean isMemberInvocable(String member) {
275-
return COPY_FROM.equals(member) || COPY_TO.equals(member);
275+
boolean isMemberInvocable(String memberName) {
276+
return COPY_FROM.equals(memberName) || COPY_TO.equals(memberName);
276277
}
277278

278279
@ExportMessage
279-
Object invokeMember(String member,
280+
Object invokeMember(String memberName,
280281
Object[] arguments,
281282
@CachedLibrary(limit = "1") InteropLibrary interopRead,
282283
@CachedLibrary(limit = "1") InteropLibrary interopExecute)
283284
throws UnsupportedTypeException, ArityException, UnsupportedMessageException, UnknownIdentifierException {
284-
return interopExecute.execute(interopRead.readMember(this, member), arguments);
285+
return interopExecute.execute(interopRead.readMember(this, memberName), arguments);
285286
}
286287

287288
@ExportMessage

projects/com.nvidia.grcuda/src/com/nvidia/grcuda/GrCUDAContext.java

Lines changed: 4 additions & 0 deletions
@@ -34,6 +34,8 @@
3434
import com.nvidia.grcuda.functions.BuildKernelFunction;
3535
import com.nvidia.grcuda.functions.DeviceArrayFunction;
3636
import com.nvidia.grcuda.functions.FunctionTable;
37+
import com.nvidia.grcuda.functions.GetDeviceFunction;
38+
import com.nvidia.grcuda.functions.GetDevicesFunction;
3739
import com.nvidia.grcuda.gpu.CUDARuntime;
3840
import com.oracle.truffle.api.TruffleLanguage.Env;
3941

@@ -56,6 +58,8 @@ public GrCUDAContext(Env env) {
5658
functionTable.registerFunction(new DeviceArrayFunction(cudaRuntime));
5759
functionTable.registerFunction(new BindKernelFunction(cudaRuntime));
5860
functionTable.registerFunction(new BuildKernelFunction(cudaRuntime));
61+
functionTable.registerFunction(new GetDevicesFunction(cudaRuntime));
62+
functionTable.registerFunction(new GetDeviceFunction(cudaRuntime));
5963
}
6064

6165
public Env getEnv() {
