Graid integration#58

Open
cooldavid wants to merge 15 commits into ZaidQureshi:master from cooldavid:graid-integration

Conversation

@cooldavid

The integration patch is in the last commit. The previous commits are just preparation and bug fixes; they do not introduce any functional change.

@cooldavid cooldavid marked this pull request as draft March 31, 2025 15:01
@cooldavid cooldavid force-pushed the graid-integration branch 2 times, most recently from edea657 to f6253df Compare April 8, 2025 03:51
Comment thread benchmarks/array/main.cu
uint64_t page_size = settings.pageSize;
uint64_t n_pages = settings.numPages;
uint64_t total_cache_size = (page_size * n_pages);
//uint64_t total_cache_size = (page_size * n_pages);

I propose moving towards fewer lines of commented-out code; this would greatly improve the readability of the code base.
In my opinion, it would be better to simply remove the lines you're commenting out.

Author

Totally agreed! Since we already have the full modification log in the git repo, I would personally also prefer to leave only effective code. I've removed some of the commented-out code in another refactoring commit. This commit only tries to temporarily fix the compile warnings. I chose to go against my own preference here for two reasons:

  1. The original BaM code base already has a lot of commented-out / deprecated code. It seems the original author prefers to keep "possibly useful in the future" code around, so I'm just following the original style.
  2. The "benchmarks" code looks like test code that is left for others to modify and tune to fit their own benchmark scenarios.

If the repo maintainer would like me to just remove it, I can surely do that. :)

@DevUt

DevUt commented May 6, 2025

Hey, I was lurking here.
Out of curiosity, what is GRAID?
Is this something obvious that I am missing?

@cooldavid
Author

> Hey, I was lurking here. Out of curiosity, what is GRAID? Is this something obvious that I am missing?

Hi @DevUt, it's: https://www.graidtech.com/
We (GraidTech) are trying to implement a driver for BaM, so that BaM can use a GraidTech RAID volume as its backend storage device in the same way as a general NVMe device.

Guo-Fu Tseng added 15 commits September 26, 2025 00:03
[  +0.000003] UBSAN: array-index-out-of-bounds in /home/cooldavid/bam/module/map.c:113:9
[  +0.000500] index 1 is out of range for type 'uint64_t [1]'
[  +0.000350] CPU: 139 PID: 137302 Comm: nvm-block-bench Tainted: P           OE      6.8.0-55-generic #57-Ubuntu
[  +0.000004] Hardware name: TYAN B8261T85E8HR-2T-N               /S8261GM2NE-2T, BIOS V1.02 (0x50) 11/11/2024
[  +0.000002] Call Trace:
[  +0.000001]  <TASK>
[  +0.000002]  dump_stack_lvl+0x76/0xa0
[  +0.000008]  dump_stack+0x10/0x20
[  +0.000003]  __ubsan_handle_out_of_bounds+0xc6/0x110
[  +0.000005]  release_user_pages+0x126/0x130 [libnvm]
[  +0.000003]  unmap_and_release+0x27/0x40 [libnvm]
[  +0.000003]  map_ioctl+0x1a1/0x240 [libnvm]
[  +0.000003]  __x64_sys_ioctl+0xa0/0xf0
[  +0.000003]  x64_sys_call+0x12a3/0x25a0
[  +0.000003]  do_syscall_64+0x7f/0x180
[  +0.000003]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000002]  ? __slab_free+0xdf/0x2c0
[  +0.000003]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000004]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000003]  ? __slab_free+0xdf/0x2c0
[  +0.000004]  ? kvfree+0x31/0x40
[  +0.000004]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000002]  ? kfree+0x2ca/0x370
[  +0.000003]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000002]  ? kvfree+0x31/0x40
[  +0.000003]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000002]  ? unmap_and_release+0x32/0x40 [libnvm]
[  +0.000003]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000002]  ? map_ioctl+0x1a1/0x240 [libnvm]
[  +0.000003]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000002]  ? __x64_sys_ioctl+0xbb/0xf0
[  +0.000002]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000003]  ? syscall_exit_to_user_mode+0x86/0x260
[  +0.000002]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000002]  ? do_syscall_64+0x8c/0x180
[  +0.000003]  ? irqentry_exit_to_user_mode+0x7b/0x260
[  +0.000002]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000002]  ? irqentry_exit+0x43/0x50
[  +0.000002]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000003]  ? exc_page_fault+0x94/0x1b0
[  +0.000002]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[  +0.000002] RIP: 0033:0x71059f124ded
[  +0.000023] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[  +0.000002] RSP: 002b:00007ffc4d13a090 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  +0.000003] RAX: ffffffffffffffda RBX: 00005b9c123bade0 RCX: 000071059f124ded
[  +0.000001] RDX: 00007ffc4d13a110 RSI: 0000000040088003 RDI: 000000000000002d
[  +0.000001] RBP: 00007ffc4d13a0e0 R08: 00000005b9c123be R09: 0000000000000000
[  +0.000002] R10: 00005b9c123be230 R11: 0000000000000246 R12: 0000000000000001
[  +0.000001] R13: 0000000000000000 R14: 0000000000000001 R15: 000071059f8c4000
[  +0.000004]  </TASK>
[  +0.000001] ---[ end trace ]---
nvm_ctrl_t already saves some information from the CAP register; also save
CQR in it for easier access.

The QueuePair constructor can get max_qs and cqr directly from the
information saved in nvme_ctrl_t.
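
For readers unfamiliar with the CAP register, here is a minimal sketch of how the two values mentioned above could be decoded from the 64-bit NVMe Controller Capabilities value. The bit positions follow the NVMe base specification (MQES in bits 15:0, CQR in bit 16). The `CapInfo` struct and `decode_cap` function are hypothetical names used only for illustration, and the sketch assumes `max_qs` means MQES + 1, since MQES is a 0-based value. It is not the code from this PR.

```cpp
#include <cstdint>

// Hypothetical holder for the two CAP fields discussed above.
struct CapInfo {
    uint32_t max_qs;  // assumed meaning: maximum queue entries (MQES + 1)
    bool     cqr;     // Contiguous Queues Required
};

// Decode MQES (bits 15:0, 0-based) and CQR (bit 16) from a raw CAP value.
inline CapInfo decode_cap(uint64_t cap) {
    CapInfo info;
    info.max_qs = static_cast<uint32_t>(cap & 0xFFFFu) + 1u;
    info.cqr    = ((cap >> 16) & 0x1u) != 0;
    return info;
}
```

Decoding once and caching the result in the controller structure means later consumers, such as a queue-pair constructor, do not have to re-read and re-mask the register.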
h_qps is an array of pointers to QueuePair, so the correct allocation size
would be:
    sizeof(QueuePair*)*n_qps

However, the original code may have intended to allocate contiguous memory
so that a single cudaMemcpy() can copy the h_qps array to d_qps. Implement
that with placement new, so that the full array is copied only once.
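
The placement-new approach described above can be sketched as follows. `QueuePairStub`, `make_contiguous`, and `destroy_contiguous` are hypothetical stand-ins; the real QueuePair type and its constructor arguments are not shown in this conversation. The sketch only illustrates the technique: allocate one raw buffer, construct each element in place, and later destroy the elements explicitly.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for QueuePair; only here to make the sketch compile.
struct QueuePairStub {
    int id;
    explicit QueuePairStub(int i) : id(i) {}
};

// Allocate one contiguous buffer and construct n objects in it with placement new.
inline QueuePairStub* make_contiguous(std::size_t n) {
    void* raw = std::malloc(sizeof(QueuePairStub) * n);
    if (raw == nullptr) return nullptr;
    QueuePairStub* arr = static_cast<QueuePairStub*>(raw);
    for (std::size_t i = 0; i < n; ++i) {
        new (&arr[i]) QueuePairStub(static_cast<int>(i));  // construct in place
    }
    return arr;
}

// Objects built with placement new need explicit destructor calls before free().
inline void destroy_contiguous(QueuePairStub* arr, std::size_t n) {
    if (arr == nullptr) return;
    for (std::size_t i = n; i > 0; --i) {
        arr[i - 1].~QueuePairStub();
    }
    std::free(arr);
}
```

Because the objects sit back to back in one buffer, a single `cudaMemcpy(d_qps, h_qps, sizeof(QueuePair) * n_qps, cudaMemcpyHostToDevice)` could then copy the whole array, assuming `d_qps` was allocated with the same size.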
/home/cooldavid/bam/module/map.c:277:6: warning: no previous prototype for ‘release_gpu_memory’ [-Wmissing-prototypes]
  277 | void release_gpu_memory(struct map* map)
      |      ^~~~~~~~~~~~~~~~~~
/home/cooldavid/bam/module/map.c:318:5: warning: no previous prototype for ‘map_gpu_memory’ [-Wmissing-prototypes]
  318 | int map_gpu_memory(struct map* map, struct list* list)
      |     ^~~~~~~~~~~~~~
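
GCC emits -Wmissing-prototypes in C code when a non-static function is defined without a previous declaration. The two usual remedies are sketched below. All names here (`map_stub`, `release_gpu_memory_local`, `map_gpu_memory_shared`) are hypothetical; which remedy fits each function in map.c depends on whether it is used outside that file, which this conversation does not show. The sketch is written so that it compiles as both C and C++.

```cpp
// Hypothetical stand-in type; the real struct map is not shown here.
struct map_stub {
    int gpu_pages;
};

// Remedy A: a helper used only in this file gets internal linkage,
// so no separate prototype is expected.
static void release_gpu_memory_local(struct map_stub* m) {
    m->gpu_pages = 0;
}

// Remedy B: a function shared across files is declared first
// (normally in a header) and then defined.
int map_gpu_memory_shared(struct map_stub* m, int n_pages);

int map_gpu_memory_shared(struct map_stub* m, int n_pages) {
    m->gpu_pages = n_pages;
    return 0;
}
```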
@cooldavid cooldavid marked this pull request as ready for review September 25, 2025 17:13
@cooldavid cooldavid changed the title WIP: Graid integration Graid integration Sep 25, 2025
3 participants