Gaussian GRBM initialization #71
Conversation
@jquetzalcoatl IIRC, Hinton's recommendation pertains to zero-one-valued RBMs (bipartite, with hidden units). Would it make sense to translate the

@kevinchern The REM reference is for spin models, i.e., {-1,1}. Ultimately, the initialization pertains to whether the model is ergodic; in this sense, the support only sets an energy offset. I believe the main motivation for initializing with 0.01 in Hinton's guide is to start in a paramagnetic phase, which ties nicely with the REM/SK spin-glass model.
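For illustration, the 1/sqrt(N) scaling under discussion can be sketched as follows. This is a minimal PyTorch sketch under stated assumptions, not the PR's actual code; the names `init_grbm_weights`, `n_nodes`, and `n_edges` are hypothetical:

```python
import torch

def init_grbm_weights(n_nodes: int, n_edges: int, seed: int = 1234) -> torch.Tensor:
    """Sample edge weights from a zero-mean Gaussian with std 1/sqrt(N).

    Variance 1/N keeps the energy extensive in the number of nodes and
    starts the model in a paramagnetic regime (cf. the SK-model scaling
    discussed above). Hypothetical helper, not the repo's API.
    """
    gen = torch.Generator().manual_seed(seed)  # local generator for reproducibility
    std = 1.0 / n_nodes ** 0.5
    return torch.randn(n_edges, generator=gen) * std

w = init_grbm_weights(n_nodes=100, n_edges=450)
# For N = 100 the empirical std should be close to 1/sqrt(100) = 0.1.
```

With Hinton's fixed std of 0.01 the initial coupling scale is independent of the graph size; the 1/sqrt(N) choice instead ties it to the node count.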
Co-authored-by: Kevin Chern <32395608+kevinchern@users.noreply.github.com>
added release note
Tests are failing but otherwise LGTM. Thanks for the much-needed PR @jquetzalcoatl !!
@VolodyaCO offered to take a look at the tests
Any updates on this?
The reason for this test failing is very strange. Essentially, it is making sure that the DVAE forward (which does encode -> latent to discrete -> decode) matches encode -> latent_to_discrete -> decode, i.e., this is a pretty simple unit test:

```python
expected_latents = self.encoders[n_latent_dims](self.data)
expected_discretes = self.dvaes[n_latent_dims].latent_to_discrete(
    expected_latents, n_samples
)
expected_reconstructed_x = self.decoders[n_latent_dims](expected_discretes)
latents, discretes, reconstructed_x = self.dvaes[n_latent_dims].forward(
    x=self.data, n_samples=n_samples
)
assert torch.equal(reconstructed_x, expected_reconstructed_x)
assert torch.equal(discretes, expected_discretes)
assert torch.equal(latents, expected_latents)
```

Moreover, the fixtures are built as

```python
self.encoders = {i: Encoder(i) for i in latent_dims_list}
self.decoders = {i: Decoder(latent_features, input_features) for i in latent_dims_list}
self.dvaes = {i: DVAE(self.encoders[i], self.decoders[i]) for i in latent_dims_list}
```

so even if the encoders/decoders are updated in other tests (because of training), there should be a permanent tracking of the encoders/decoders in the dvaes.
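The "permanent tracking" works because Python stores object references: the DVAE holds the very same encoder/decoder objects as the fixture dicts, so training one updates the other. A minimal sketch with hypothetical stand-in classes (the real `Encoder`/`Decoder`/`DVAE` live in the repo):

```python
import torch
from torch import nn

# Hypothetical stand-ins, only to illustrate shared references.
class Encoder(nn.Module):
    def __init__(self, latent_dim):
        super().__init__()
        self.lin = nn.Linear(4, latent_dim)

    def forward(self, x):
        return self.lin(x)

class DVAE(nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder  # stores a reference, not a copy

enc = Encoder(2)
dvae = DVAE(enc)

# Mutating the fixture's encoder is visible through the DVAE, because
# both names point at the same object:
with torch.no_grad():
    enc.lin.weight.zero_()

print(dvae.encoder is enc)                         # True
print(bool((dvae.encoder.lin.weight == 0).all()))  # True
```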
Found the issue and fixed it in a PR to @jquetzalcoatl's repo: jquetzalcoatl#1. Please approve, Javi; this would update the current PR and solve the issue. Took me a while to get the error!
Fix failing forward method unit tests
VolodyaCO
left a comment
I have definitely had to manually change the initialisation of GRBM weights whenever I use the GRBM. Thanks for this PR. I think it looks good to merge.
kevinchern
left a comment
@jquetzalcoatl I added a couple of typo fixes; can you accept them?
The remaining questions/comments are for @VolodyaCO and should be good to merge after.
> `Hinton's practical guide for RBM training<https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf>`_, which recommends sampling
> weights from a Gaussian distribution with mean 0 and standard deviation 0.01 (for zero-one-valued RBMs).
> The scaling factor of :math:`1/\sqrt(N)` ensures that the energy functional remains extensive
> and initializes the GRBM in a paramagnetic regime, consistent with the `Sherrington-Kirkpatrick model<https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.35.1792>`_.
```diff
-and initializes the GRBM in a paramagnetic regime, consistent with the `Sherrington-Kirkpatrick model<https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.35.1792>`_.
+and initializes the GRBM in a paramagnetic regime, consistent with the `Sherrington-Kirkpatrick model <https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.35.1792>`_.
```
> features:
>   - |
>     Initialize ``GraphRestrictedBoltzmannMachine`` weights using Gaussian
>     random variables with standard deviation equal to :math:`1/\sqrt(N)`, where N
```diff
-random variables with standard deviation equal to :math:`1/\sqrt(N)`, where N
+random variables with standard deviation equal to :math:`1/\sqrt(N)`, where :math:`N`
```
> torch.manual_seed(1234)  # Set seed again to ensure that the sampling in the forward method
>                          # is the same as in the expected_discretes
> latents, discretes, reconstructed_x = self.dvaes[n_latent_dims].forward(
Sorry if I asked this in the first review for DVAE and forgot, but why does this test call the `forward` method explicitly? Calling the model directly is the recommended practice, as it runs several hooks on top of `forward`. @VolodyaCO
(this question/comment is unrelated to this PR)
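For context, the difference between `model(x)` and `model.forward(x)` can be illustrated with a generic PyTorch sketch (unrelated to the PR's actual models):

```python
import torch
from torch import nn

class Toy(nn.Module):
    """Tiny module used only to demonstrate __call__ vs. forward."""
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(3, 2)

    def forward(self, x):
        return self.lin(x)

model = Toy()
calls = []
# Forward hooks run inside nn.Module.__call__, after forward finishes.
model.register_forward_hook(lambda mod, inp, out: calls.append("hook"))

x = torch.randn(4, 3)
model(x)          # __call__: runs forward plus the registered hooks
model.forward(x)  # bypasses __call__, so the hook does not fire
print(len(calls)) # 1
```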
> torch.testing.assert_close(discretes, expected_discretes)
> torch.testing.assert_close(reconstructed_x, expected_reconstructed_x)

> assert torch.equal(reconstructed_x, expected_reconstructed_x)
@VolodyaCO was this the fix for the failing tests? Are these tests sensitive to the seed?
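A small sketch of why the re-seeding in the fix matters for bitwise comparisons of stochastic outputs (generic PyTorch, not the PR's test code):

```python
import torch

torch.manual_seed(1234)
a = torch.bernoulli(torch.full((16,), 0.5))  # stochastic draw consumes RNG state

b = torch.bernoulli(torch.full((16,), 0.5))  # a second draw sees fresh RNG state

torch.manual_seed(1234)                      # reset to the original state
c = torch.bernoulli(torch.full((16,), 0.5))  # replays the first draw exactly

print(torch.equal(a, c))  # True: same seed and op order give bitwise-equal samples
```

This is why `torch.equal` can pass only when the expected and actual paths start from identical RNG state; `torch.testing.assert_close` additionally tolerates small numerical differences, but not different random draws.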
> @@ -0,0 +1,8 @@
> ---
> features:
More of an upgrade than a feature, no?

```diff
-features:
+upgrade:
```
> - |
>   Initialize ``GraphRestrictedBoltzmannMachine`` weights using Gaussian
>   random variables with standard deviation equal to :math:`1/\sqrt(N)`, where N
>   denotes the number of nodes in the GRBM. The weight-initialization strategy is grounded in `Hinton's practical guide for RBM training <https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf>`_, which recommends sampling weights from a Gaussian distribution with mean 0 and standard deviation 0.01 (for zero-one-valued RBMs). The scaling factor of :math:`1/\sqrt(N)` ensures that the energy functional remains extensive and initializes the GRBM in a paramagnetic regime, consistent with the `Sherrington-Kirkpatrick model<https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.35.1792>`_.
Better add some line breaks here, splitting the full paragraph on several lines.
grbm weights and biases initialization set to Gaussian N(0,1/number of nodes)
Hinton guide suggests 0.01 as standard deviation. See https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf
Moreover, having it set to a Gaussian with this dependence on the number of nodes makes the energy extensive and initializes the GRBM in a paramagnetic phase similar to that described in the Random Energy Model paper:
https://journals.aps.org/prb/abstract/10.1103/PhysRevB.24.2613
See #48