
Sorting/layout issues when Y coordinates don't exactly match #526

@lluchez


Hi,

We've been using an old version of this gem (1.4.1) for a while and are looking to upgrade to the latest version. The upgrade broke some of our specs, and on closer inspection it seems the logic around PageLayout changed.

It might just be bad luck, but the use of `round` here (for both X and Y coordinates) causes problems when the PDF draws texts that belong on the same line with slightly different Y coordinates.
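To make the rounding problem concrete, here is a small standalone Ruby sketch. `row_for` is a hypothetical helper that reproduces the row formula used in `PageLayout#to_s`; the offset, row multiplier, row count and Y values are invented for illustration and are not taken from the PDFs in this issue. It shows that a 0.3pt difference is enough to cross a rounding boundary and land on a different row.

```ruby
# Hypothetical helper mirroring the row formula from PageLayout#to_s:
#   row = row_count - ((y - y_offset) / row_multiplier).round
# All numbers below are made up purely for illustration.
def row_for(y, y_offset:, row_multiplier:, row_count:)
  row_count - ((y - y_offset) / row_multiplier).round
end

params = { y_offset: 0.0, row_multiplier: 12.0, row_count: 70 }

# 606.2 / 12.0 is about 50.52, which rounds up to 51.
label_row = row_for(606.2, **params) # a form label
# 605.9 / 12.0 is about 50.49, which rounds down to 50.
value_row = row_for(605.9, **params) # box text drawn 0.3pt lower

puts label_row # => 19
puts value_row # => 20
```

The two texts are visually on the same line, yet they end up on consecutive rows of the output, which matches the symptom shown in the examples.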

Below is an example:
[Screenshot of the form]
In this case, the text inside those boxes is drawn slightly lower than the form labels, which causes some of it to be rendered on the next line:

```
Claim Number:           PHNP1610102                                     Contact:
Insured:                                                                Phone:
                         Fairfield Boys Club
Address 1:                                                              Email:
                         c/o Bejo Nanni, Treasurer
```

Another example:
[Screenshot]

We could monkey-patch the gem or fork the repo to make these changes, but either way, below is the code we're going to use. I can open a PR if this repo is still actively maintained; please let me know.

PageLayout

```ruby
class PDF::Reader
  class PageLayout

    def to_s
      return "" if @runs.empty?
      return "" if row_count == 0
      first_run_at_new_y = nil # remembering a previous run at a new Y coordinate

      page = row_count.times.map { |i| " " * col_count }
      @runs.each do |run|
        x_pos = ((run.x - @x_offset) / col_multiplier).round
        # Only the most recent anchor run is compared, so runs that belong to
        # the same visual line need to be adjacent in @runs.
        y_ref_run = run # line added
        if first_run_at_new_y && run.similar_y_coord?(first_run_at_new_y) # line added
          y_ref_run = first_run_at_new_y # line added
        else # line added
          first_run_at_new_y = run # line added
        end # line added
        y_pos = row_count - ((y_ref_run.y - @y_offset) / row_multiplier).round # line updated
        if y_pos <= row_count && y_pos >= 0 && x_pos <= col_count && x_pos >= 0
          local_string_insert(page[y_pos-1], run.text, x_pos)
        end
      end
      interesting_rows(page).map(&:rstrip).join("\n")
    end

  end
end
```
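The anchor logic in the patched loop can be tried out without pdf-reader at all. In this sketch, `Run` is a hypothetical `Struct` standing in for `PDF::Reader::TextRun` (only the fields used here), `similar_y?` copies the proposed one-third-of-font-size tolerance (with float division), and `effective_ys` returns the Y value each run would be laid out at. The coordinates are invented; only the texts come from the example output above.

```ruby
# Hypothetical stand-in for PDF::Reader::TextRun (only the fields used here).
Run = Struct.new(:x, :y, :font_size, :text)

# Same tolerance idea as the proposed TextRun#similar_y_coord?:
# a third of the smaller font size, using float division.
def similar_y?(a, b)
  (a.y - b.y).abs < [a.font_size, b.font_size].min / 3.0
end

# Follows the patched loop: a run close to the current anchor reuses the
# anchor's y; otherwise it becomes the new anchor. Only the most recent
# anchor is checked, so runs of one visual line must be adjacent.
def effective_ys(runs)
  anchor = nil
  runs.map do |run|
    if anchor && similar_y?(run, anchor)
      anchor.y
    else
      anchor = run
      run.y
    end
  end
end

runs = [
  Run.new(10, 700.0, 10, "Insured:"),
  Run.new(120, 698.4, 10, "Fairfield Boys Club"),      # drawn 1.6pt lower
  Run.new(10, 686.0, 10, "Address 1:"),
  Run.new(120, 684.5, 10, "c/o Bejo Nanni, Treasurer") # drawn 1.5pt lower
]

p effective_ys(runs) # => [700.0, 700.0, 686.0, 686.0]
```

If the input order interleaves lines (for example "Address 1:" placed between "Insured:" and "Fairfield Boys Club"), the value text no longer follows its label's anchor and keeps its own Y, so the approach depends on `@runs` already being ordered by line.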

TextRun

```ruby
class PDF::Reader
  class TextRun

    # def <=>(other)
    #   if similar_y_coord?(other)
    #     x <=> other.x
    #   else
    #     other.y <=> y
    #   end
    # end

    def similar_y_coord?(other, threshold = nil)
      # Arbitrary threshold: a third of the smaller font size. It could probably
      # be safely raised (dividing by 2, for instance). Dividing by 3.0 avoids
      # Integer truncation if font_size happens to be an Integer.
      threshold ||= [self.font_size, other.font_size].min / 3.0
      (self.y - other.y).abs < threshold
    end

  end
end
```
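The default threshold is easiest to reason about with plain numbers. This sketch uses two hypothetical helpers, `default_threshold` and `similar_y?`, that take numbers instead of `TextRun` objects; the font sizes and Y values are invented. It compares Integer division by 3, float division by 3, and the "divide by 2" idea from the code comment, for a text drawn 3.2pt below its label. Whether `font_size` is an Integer or a Float in practice is an assumption worth checking.

```ruby
# Hypothetical helpers mirroring the proposed TextRun#similar_y_coord?,
# but taking plain numbers so the threshold arithmetic is easy to see.
def default_threshold(font_a, font_b, divisor)
  [font_a, font_b].min / divisor
end

def similar_y?(y_a, y_b, threshold)
  (y_a - y_b).abs < threshold
end

int_threshold   = default_threshold(10, 12, 3)   # Integer division: 3
float_threshold = default_threshold(10, 12, 3.0) # about 3.33
half_threshold  = default_threshold(10, 12, 2.0) # the "divide by 2" idea: 5.0

# A run drawn 3.2pt below its label:
puts similar_y?(700.0, 696.8, int_threshold)   # => false
puts similar_y?(700.0, 696.8, float_threshold) # => true
puts similar_y?(700.0, 696.8, half_threshold)  # => true
```

With Integer font sizes the truncated threshold rejects this pair, while the float and "/ 2" variants group it onto one line; a larger divisor makes the grouping stricter.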

Thank you.

EDIT: I updated the code above to properly handle multiple texts that may have been drawn on the following line.
