
hunarbatra/vlm-hierarchical-reasoning


Hierarchical Reasoning for Vision Language Models (VLMs/LVLMs)

Note: api/ contains reverse-engineered GPT-4V and DALL-E 3 APIs, plus SAM, SDXL, and Fuyu.

Results / example slides:

[Images 1-3: example result slides]

// TODO: add steps for running the pipeline.

Flow: [Image -> extract segments (SAM) -> extract attributes + caption] + question -> answer
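The flow above can be sketched as a small orchestration function. This is a minimal sketch under stated assumptions, not the repository's code: the `Segment` dataclass, `run_pipeline`, and the four stage callables are hypothetical names, and the stubs in the `__main__` block stand in for SAM, the attribute extractor, the captioner, and the answering VLM. The stages are passed in as callables so that real models can be swapped in without changing the orchestration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Segment:
    """One region returned by the segmenter (SAM in the README's flow). Hypothetical structure."""
    label: str                                # segment type, e.g. "person"
    bbox: Tuple[int, int, int, int]           # (x0, y0, x1, y1) in pixels
    attributes: Dict[str, str] = field(default_factory=dict)


def run_pipeline(
    image: object,
    question: str,
    segment: Callable[[object], List[Segment]],
    extract_attributes: Callable[[object, Segment], Dict[str, str]],
    caption: Callable[[object], str],
    answer: Callable[[str], str],
) -> str:
    """Image -> segments -> attributes + caption, combined with the question -> answer."""
    segments = segment(image)
    for seg in segments:
        # Attach per-segment attributes (the hierarchical "detail" level).
        seg.attributes = extract_attributes(image, seg)

    # Build a text context: global caption first, then one line per segment.
    lines = [f"Caption: {caption(image)}"]
    for i, seg in enumerate(segments):
        attrs = ", ".join(f"{k}={v}" for k, v in seg.attributes.items())
        lines.append(f"Segment {i} ({seg.label}, bbox={seg.bbox}): {attrs}")
    lines.append(f"Question: {question}")

    # Hand the assembled context to the answering model.
    return answer("\n".join(lines))


if __name__ == "__main__":
    def stub_segment(img):
        return [Segment("person", (10, 20, 110, 220)), Segment("dog", (150, 180, 230, 240))]

    def stub_attributes(img, seg):
        return {"action": "walking"} if seg.label == "person" else {"color": "brown"}

    def stub_caption(img):
        return "A person walking a dog."

    def echo(prompt):
        return prompt  # stands in for the VLM; returns the assembled prompt unchanged

    print(run_pipeline(None, "What is the person doing?",
                       stub_segment, stub_attributes, stub_caption, echo))
```

With `echo` as the answering stage, the function returns the prompt that a real VLM would receive, which makes the context layout easy to inspect before wiring in actual models.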

[Attributes considered (only those relevant to a segment's type are extracted from this set): spatial relationships, pose estimation, depth estimation, motion, action, count, size, shape, color, coordinates, etc.]
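Selecting "only relevant attributes based on segment type" can be sketched as a lookup from segment type to a subset of the attribute list. The README does not say how relevance is decided, so the `RELEVANT_BY_TYPE` mapping, the `DEFAULT_ATTRIBUTES` fallback, and the `select_attributes` helper below are hypothetical and illustrative only. A real implementation might instead ask the VLM which attributes apply.

```python
from typing import Dict, List

# Full attribute set listed in the README (snake_case names are an assumption).
ALL_ATTRIBUTES = [
    "spatial_relationships", "pose", "depth", "motion", "action",
    "count", "size", "shape", "color", "coordinates",
]

# Hypothetical mapping from segment type to the attributes worth extracting.
# Not taken from the repository; the entries are illustrative.
RELEVANT_BY_TYPE: Dict[str, List[str]] = {
    "person": ["pose", "action", "motion", "spatial_relationships", "coordinates", "depth"],
    "animal": ["pose", "action", "motion", "color", "size", "coordinates"],
    "object": ["count", "size", "shape", "color", "coordinates", "spatial_relationships"],
}

# Fallback used when a segment type is not in the mapping (also hypothetical).
DEFAULT_ATTRIBUTES = ["size", "color", "coordinates"]


def select_attributes(segment_type: str) -> List[str]:
    """Return the subset of ALL_ATTRIBUTES to extract for this segment type."""
    wanted = RELEVANT_BY_TYPE.get(segment_type.lower(), DEFAULT_ATTRIBUTES)
    # Keep the canonical ordering and drop anything outside the full set.
    return [a for a in ALL_ATTRIBUTES if a in wanted]


if __name__ == "__main__":
    for t in ["person", "object", "sky"]:
        print(t, "->", select_attributes(t))
```

Filtering against `ALL_ATTRIBUTES` keeps the output order stable across segment types, so prompts built from these attributes stay consistently formatted.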
