-
Notifications
You must be signed in to change notification settings - Fork 20
gaussian grbm initialization #71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
3fb3cdb
05cc617
bd9fdab
0c1ddec
d4bdfbf
60ee81a
eee250a
bc551b8
8ec902e
54b2862
8cde437
4e48419
d9a399c
0e4761d
c6e98c8
85b3e98
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -49,6 +49,13 @@ | |
| class GraphRestrictedBoltzmannMachine(torch.nn.Module): | ||
| """Creates a graph-restricted Boltzmann machine. | ||
|
|
||
| The initialization-strategy is grounded in | ||
| `Hinton's practical guide for RBM training<https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf>`_, which recommends sampling | ||
| weights from a Gaussian distribution with mean 0 and standard deviation 0.01 (for zero-one-valued RBMs). | ||
| The scaling factor of :math:`1/\sqrt(N)` ensures that the energy functional remains extensive | ||
| and initializes the GRBM in a paramagnetic regime, consistent with the `Sherrington-Kirkpatrick model<https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.35.1792>`_. | ||
| The biases are initialized to zero to ensure extensiveness of the energy functional and to avoid introducing any initial preference for spin configurations. | ||
|
|
||
| Args: | ||
| nodes (Iterable[Hashable]): List of nodes. | ||
| edges (Iterable[tuple[Hashable, Hashable]]): List of edges. | ||
|
|
@@ -82,8 +89,8 @@ def __init__( | |
| self._idx_to_edge = {i: e for i, e in enumerate(self._edges)} | ||
| self._edge_to_idx = {e: i for i, e in self._idx_to_edge.items()} | ||
|
|
||
| self._linear = torch.nn.Parameter(0.05 * (2 * torch.rand(self._n_nodes) - 1)) | ||
| self._quadratic = torch.nn.Parameter(5.0 * (2 * torch.rand(self._n_edges) - 1)) | ||
| self._linear = torch.nn.Parameter(torch.zeros(self._n_nodes)) | ||
| self._quadratic = torch.nn.Parameter(torch.randn(self._n_edges)/self._n_nodes**0.5) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For extensive energy we need to scale by connectivity, not number of nodes. number of nodes is specific to dense models. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The previous defaults are not great, but they included a factor 5 to reflect an approximation to the device sampling temperature (Adv2/Adv single qubit freezeout temperature). In the new definition this is absent, and might be worth noting as a limitaiton of the default. |
||
|
|
||
| edge_idx_i = torch.tensor([self._node_to_idx[i] for i, _ in self._edges]) | ||
| edge_idx_j = torch.tensor([self._node_to_idx[j] for _, j in self._edges]) | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,8 @@ | ||||||
| --- | ||||||
| features: | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. More an upgrade rather than a feature, no?
Suggested change
|
||||||
| - | | ||||||
| Initialize ``GraphRestrictedBoltzmannMachine`` weights using Gaussian | ||||||
| random variables with standard deviation equal to :math:`1/\sqrt(N)`, where N | ||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
| denotes the number of nodes in the GRBM. The weight-initialization strategy is grounded in `Hinton's practical guide for RBM training <https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf>`_, which recommends sampling weights from a Gaussian distribution with mean 0 and standard deviation 0.01 (for zero-one-valued RBMs). The scaling factor of :math:`1/\sqrt(N)` ensures that the energy functional remains extensive and initializes the GRBM in a paramagnetic regime, consistent with the `Sherrington-Kirkpatrick model<https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.35.1792>`_. | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Better add some line breaks here, splitting the full paragraph on several lines. |
||||||
|
|
||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -78,12 +78,11 @@ def forward(self, x: torch.Tensor) -> torch.Tensor: | |
| # are the models themselves | ||
| latent_dims_list = [1, 2] | ||
| self.encoders = {i: Encoder(i) for i in latent_dims_list} | ||
| # self.decoders is independent of number of latent dims, but we also create a dict to separate | ||
| # them | ||
| # self.decoders is independent of number of latent dims, but we also create a dict to | ||
| # separate them | ||
| self.decoders = {i: Decoder(latent_features, input_features) for i in latent_dims_list} | ||
|
|
||
| # self.dvaes is a dict whose keys are the numbers of latent dims and the values are the models | ||
| # themselves | ||
| # self.dvaes is a dict whose keys are the numbers of latent dims and the values are the | ||
| # models themselves | ||
|
|
||
| self.dvaes = {i: DVAE(self.encoders[i], self.decoders[i]) for i in latent_dims_list} | ||
|
|
||
|
|
@@ -248,19 +247,22 @@ def test_latent_to_discrete(self, n_samples, expected): | |
| @parameterized.expand([(i, j) for i in range(1, 3) for j in [0, 1, 5, 1000]]) | ||
| def test_forward(self, n_latent_dims, n_samples): | ||
| """Test the forward method.""" | ||
| torch.manual_seed(1234) # Set seed for reproducibility of latent_to_discrete sampling | ||
| expected_latents = self.encoders[n_latent_dims](self.data) | ||
| expected_discretes = self.dvaes[n_latent_dims].latent_to_discrete( | ||
| expected_latents, n_samples | ||
| ) | ||
| expected_reconstructed_x = self.decoders[n_latent_dims](expected_discretes) | ||
|
|
||
| torch.manual_seed(1234) # Set seed again to ensure that the sampling in the forward method | ||
| # is the same as in the expected_discretes | ||
| latents, discretes, reconstructed_x = self.dvaes[n_latent_dims].forward( | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry if I asked this in the first review for DVAE and forgot, but why does this test call the
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't remember. We can change it. |
||
| x=self.data, n_samples=n_samples | ||
| ) | ||
| torch.testing.assert_close(latents, expected_latents) | ||
| torch.testing.assert_close(discretes, expected_discretes) | ||
| torch.testing.assert_close(reconstructed_x, expected_reconstructed_x) | ||
|
|
||
| assert torch.equal(reconstructed_x, expected_reconstructed_x) | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @VolodyaCO was this the fix to failing tests? Are these tests sensitive to the seed..?
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The test is not sensitive to the seed. It's just that two calculations that were random-based and converged to the same result no longer converged to the same result with the new initialisation. This was a silent bug, as the two random-based calculations should have been using the same initial seed. If you change the seed to any other seed, it should work. |
||
| assert torch.equal(discretes, expected_discretes) | ||
| assert torch.equal(latents, expected_latents) | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.