Skip to content

Compiler crash after tiling #2441

@athas

Description

@athas

The following program makes the TileLoops pass in src/Futhark/Optimise/TileLoops.hs produce invalid IR, with a reference to an unknown variable:

def dotprod [n] (xs: [n]i32) (ys: [n]i32) : i32 =
  #[sequential] i32.sum (map2 (i32.*) xs ys)

def matmul [n] [p] [m] (xss: [n][p]i32) (yss: [p][m]i32) : *[n][m]i32 =
  map (\xs -> map (dotprod xs) (transpose yss)) xss

def matdiv_entrywise [m] [n] (xss: [m][n]i32) (yss: [m][n]i32) : *[m][n]i32 =
  map2 (map2 (i32./)) xss yss

entry main W A WH =
  let W_TA = matmul (transpose W) A
  let W_TWH = matmul (transpose W) WH
  let H_update = matdiv_entrywise W_TA W_TWH
  in H_update

The #[sequential] attribute is just to cut down on code size due to multi-versioning; it is irrelevant to the bug. It is reproduced like so:

$ futhark dev -s --extract-kernels -e --tile bug.fut
Internal compiler error.  Please report this:
  https://github.com/diku-dk/futhark/issues
Type error after pass 'tile loops':
In function entry_main
When checking function body
In expression of statement
  {defunc_0_map_res_pretr_pretr_6764 : ({}, [m₇_6243][m₁₁_6244]i32)}
In expression of statement
  {rssss_7231 : ({}, [Ty_6790][Ty_6790][Ry_6791][Ry_6791]i32)}
In expression of statement
  {loop_7206 : ({}, [Ry_6791][Ry_6791]i32)}
Inside the loop body
In expression of statement
  {loop_7209 : ({}, [Ry_6791][Ry_6791]i32)}
Inside the loop body
In expression of statement
  {res_elem_7219 : ({}, i32)}
in body of case {true}
In expression of statement
  {defunc_res_7221 : ({}, i32)}
In subexp as_transformed_row_6768
Use of unknown variable as_transformed_row_6768.

The cause is likely that kernel extraction produces a SegMap containing two (in principle) tileable loops over the same arrays, which look like this in the IR:

  let {defunc_0_map_res_pretr_pretr_6764 : [m₇_6243][m₁₁_6244]i32} =
    segmap(thread; ; grid=segmap_usable_groups_6763; blocksize=segmap_tblock_size_6762)
    (gtid_6765 < m₇_6243, gtid_6766 < m₁₁_6244) (~phys_tid_6767) : {i32} {
      let {as_transformed_row_6768 : [n₆_6242]i32} =
        rearrange_6690[gtid_6765, 0i64 :+ n₆_6242 * 1i64]
      let {as_transformed_row_6769 : [n₆_6242]i32} =
        transpose_res_6449[gtid_6766, 0i64 :+ n₆_6242 * 1i64]
      let {as_transformed_row_6770 : [n₆_6242]i32} =
        transpose_res_6471[gtid_6766, 0i64 :+ n₆_6242 * 1i64]
      "test2.fut:11:14-36->test2.fut:5:20-27->test2.fut:2:17-44"
      let {defunc_res_6771 : i32} =
        #[sequential]
        redomap(n₆_6242,
                {as_transformed_row_6768, as_transformed_row_6769},
                \ {eta_p_6772 : i32,
                   eta_p_6773 : i32}
                  : {i32} ->
                  "test2.fut:11:14-36->test2.fut:5:20-27->test2.fut:2:31-38"
                  let {*_res_6774 : i32} =
                    mul32(eta_p_6772, eta_p_6773)
                  in {*_res_6774},
                {\ {eta_p_6775 : i32,
                    eta_p_6776 : i32}
                  : {i32} ->
                  "test2.fut:11:14-36->test2.fut:5:20-27->test2.fut:2:17-44"
                  let {+_res_6777 : i32} =
                    add32(eta_p_6775, eta_p_6776)
                  in {+_res_6777},
                {0i32}},
                nilFn)
      "test2.fut:12:15-38->test2.fut:5:20-27->test2.fut:2:17-44"
      let {defunc_res_6778 : i32} =
        #[sequential]
        redomap(n₆_6242,
                {as_transformed_row_6768, as_transformed_row_6770},
                \ {eta_p_6779 : i32,
                   eta_p_6780 : i32}
                  : {i32} ->
                  "test2.fut:12:15-38->test2.fut:5:20-27->test2.fut:2:31-38"
                  let {*_res_6781 : i32} =
                    mul32(eta_p_6779, eta_p_6780)
                  in {*_res_6781},
                {\ {eta_p_6782 : i32,
                    eta_p_6783 : i32}
                  : {i32} ->
                  "test2.fut:12:15-38->test2.fut:5:20-27->test2.fut:2:17-44"
                  let {+_res_6784 : i32} =
                    add32(eta_p_6782, eta_p_6783)
                  in {+_res_6784},
                {0i32}},
                nilFn)
      "test2.fut:13:18-45->test2.fut:8:14-21"
      let {/_res_6785 : i32} =
        sdiv32(defunc_res_6771, defunc_res_6778)
      return {returns /_res_6785}
    }

This is in principle correct behaviour from kernel extraction. TileLoops tiles the first loop, in particular for the array as_transformed_row_6768, and puts the second loop in the postlude - the part of the SegMap that occurs after the tiled loop. However, at this point as_transformed_row_6768 does not exist.

There are several possible options:

  1. Introduce support for tiling multiple loops at the same time, in the same kernel, as long as they have the same size.
  2. Do not tile a loop if its input array is used in the postlude.
  3. Make input arrays available in the postlude under their original names. Since the postlude can access (reconstructed forms of) the original kernel space IDs, this should be perfectly possible.

In the future we might also want to rethink how we do kernel extraction on code like this, because it really would have been better to carve this SegMap in two, but that is for a different time. Loop tiling should not produce invalid code, no matter how dubious the input.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions