Joint training now runs without producing NaNs (the learning rate has to be very low).

The losses are still separate, and the validation losses do not converge jointly (the places loss increases).
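
One way to spot a task whose validation loss rises while the others fall is to scan the per-task loss histories after each epoch. The sketch below is only an illustration: the function name `diverging_tasks`, the `patience` parameter, the task name `other_task`, and all loss values are hypothetical and not taken from the actual training code.

```python
def diverging_tasks(val_losses, patience=2):
    """Return the tasks whose validation loss rose for `patience` epochs in a row.

    val_losses maps a task name to its list of per-epoch validation losses.
    """
    diverging = []
    for task, history in val_losses.items():
        # Look at the last `patience` epoch-to-epoch transitions.
        recent = history[-(patience + 1):]
        if len(recent) < patience + 1:
            continue  # not enough epochs recorded yet
        if all(later > earlier for earlier, later in zip(recent, recent[1:])):
            diverging.append(task)
    return diverging


# Made-up numbers, for illustration only.
history = {
    "places": [1.90, 1.85, 1.92, 2.01],
    "other_task": [2.40, 2.10, 1.95, 1.90],
}
print(diverging_tasks(history))  # → ['places']
```

A check like this only flags the symptom; it does not say whether the cause is loss weighting, the learning rate, or something else.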