This repository was archived by the owner on May 28, 2024. It is now read-only.

What if the observation is extracted features instead of images and has much smaller dimension than latent? #59

@seheevic


Hi!
I'm not sure you still do Q&A support here 😊, but I'm obsessed with a certain problem that is beyond my math skills. I hope you can help me.

The question concerns the loss function of your RSSM, which takes a variational approach. The reconstruction term of a VAE is p(o_t|s_t), the decoder from latent to image, and in that setting the observation (an image) has a much higher dimension than the latent. But when o_t has a much smaller dimension than the latent (for example, the 4 values of CartPole in OpenAI Gym classic_control, versus a latent of, say, 32–64), I don't think p(o_t|s_t) can learn any meaningful distribution. Since the conditioning s_t is sampled from the variational posterior q(s_t|o_1:t, a_1:t), which has already seen the current observation o_t, I suspect s_t could simply learn to copy all of o_t into itself, because the dimension of s_t is much larger.

In this situation (a non-image observation of small dimension), does this VAE-like approach still hold?
Or is there some other technique that is more reasonable in this case?
I hope this worry makes sense to you. 😕
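For what it's worth, here is a minimal numeric sketch of the per-timestep negative ELBO being discussed, with a 4-dimensional observation and a 32-dimensional latent. All of the shapes, the toy linear decoder `W`, and the unit-variance observation model are assumptions for illustration, not the repository's actual model. The point it makes concrete: the loss is not the reconstruction term alone; the KL term between the posterior q(s_t|o_1:t, a_1:t) and the prior p(s_t|s_{t-1}, a_{t-1}) charges the posterior for every bit of o_t it encodes, which is what pushes back against the "just copy o_t into s_t" solution even when the latent is much larger than the observation.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, latent_dim = 4, 32  # e.g. CartPole observation vs. an RSSM-sized latent

def gaussian_nll(x, mean, std):
    """Negative log-likelihood of x under a diagonal Gaussian."""
    return 0.5 * np.sum(((x - mean) / std) ** 2 + 2 * np.log(std) + np.log(2 * np.pi))

def kl_diag_gauss(mu_q, std_q, mu_p, std_p):
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    return np.sum(np.log(std_p / std_q)
                  + (std_q ** 2 + (mu_q - mu_p) ** 2) / (2 * std_p ** 2)
                  - 0.5)

# Hypothetical posterior q(s_t | o_1:t, a_1:t): a confident (low-std) Gaussian,
# as it would be if s_t were copying the already-seen o_t.
mu_q, std_q = rng.normal(size=latent_dim), np.full(latent_dim, 0.1)
# Hypothetical prior p(s_t | s_{t-1}, a_{t-1}): standard normal for this toy.
mu_p, std_p = np.zeros(latent_dim), np.ones(latent_dim)

# Sample s_t from the posterior and decode it to the 4-dim observation.
s_t = mu_q + std_q * rng.normal(size=latent_dim)
W = rng.normal(size=(obs_dim, latent_dim)) / np.sqrt(latent_dim)  # toy decoder
o_t = rng.normal(size=obs_dim)                                    # toy observation

recon_nll = gaussian_nll(o_t, W @ s_t, np.ones(obs_dim))  # -log p(o_t | s_t)
kl = kl_diag_gauss(mu_q, std_q, mu_p, std_p)              # KL(q || p)
loss = recon_nll + kl  # negative ELBO at this timestep
```

In this toy, making the posterior sharper (smaller `std_q`) or moving `mu_q` away from the prior mean strictly increases the KL term, so a posterior that deterministically memorizes o_t only pays off if the reconstruction gain outweighs that cost.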
