
Commit 684850d

hanzhi713 authored and SujeethJinesh committed

Add fuji 150b test

1 parent b5bf7fc · commit 684850d

38 files changed: +3889 -1 lines changed

axlearn/experiments/testdata/axlearn.experiments.text.gpt.c4_trainer/fuji-150B-v2-flash-fp8.txt

Lines changed: 313 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+decoder/emb/token_emb/weight: normal(0, 1.0 / fan_out), shape=[32768, 12288], axes=FanAxes(in_axis=-2, out_axis=-1, batch_axis=())
+decoder/transformer/repeat/layer/self_attention/norm/scale: constant(1.0)
+decoder/transformer/repeat/layer/self_attention/attention/i_proj/i_proj/qkv_proj/weight: normal(0, 1.0 / fan_in), shape=(12288, 112, 128), axes=FanAxes(in_axis=0, out_axis=(1, 2), batch_axis=())
+decoder/transformer/repeat/layer/self_attention/attention/o_proj/weight: normal(0, 1.0 / fan_in), shape=(12288, 96, 128), axes=FanAxes(in_axis=(1, 2), out_axis=0, batch_axis=())
+decoder/transformer/repeat/layer/feed_forward/norm/scale: constant(1.0)
+decoder/transformer/repeat/layer/feed_forward/linear1_0/weight: normal(0, 1.0 / fan_in), shape=(12288, 43008), axes=FanAxes(in_axis=-2, out_axis=-1, batch_axis=())
+decoder/transformer/repeat/layer/feed_forward/linear1_1/weight: normal(0, 1.0 / fan_in), shape=(12288, 43008), axes=FanAxes(in_axis=-2, out_axis=-1, batch_axis=())
+decoder/transformer/repeat/layer/feed_forward/linear2/weight: normal(0, 1.0 / fan_in), shape=(43008, 12288), axes=FanAxes(in_axis=-2, out_axis=-1, batch_axis=())
+decoder/output_norm/scale: constant(1.0)
+decoder/lm_head/weight: normal(0, 1.0 / fan_in), shape=(32768, 12288), axes=FanAxes(in_axis=-2, out_axis=-1, batch_axis=())
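This 10-line companion file (its name is clipped in this excerpt) records the initializer for each parameter: fan-scaled normal for projection weights, constant(1.0) for norm scales, with FanAxes marking which dimensions count toward fan-in/fan-out. The 112-wide head axis of qkv_proj is consistent with 96 query heads plus two groups of 8 KV heads (grouped-query attention), though the diff itself does not spell that out. Below is a minimal JAX sketch of this style of initialization, reading normal(0, 1.0 / fan) as a zero-mean normal with variance 1/fan; the function name and the shrunken example shape are illustrative, not AXLearn's actual code.

```python
import jax
import numpy as np

def fan_scaled_normal(key, shape, in_axis, out_axis, mode="fan_in"):
    """Sample normal(0, 1/fan), where fan is the product of the in_axis
    (fan_in) or out_axis (fan_out) dimension sizes, as in FanAxes."""
    def _prod(axes):
        axes = (axes,) if isinstance(axes, int) else axes
        return int(np.prod([shape[a] for a in axes]))

    fan = _prod(in_axis) if mode == "fan_in" else _prod(out_axis)
    std = (1.0 / fan) ** 0.5  # variance 1/fan -> std = sqrt(1/fan)
    return std * jax.random.normal(key, shape)

# Shrunken stand-in for the (12288, 112, 128) qkv_proj above:
# fan_in is the model-dim axis 0, so std = 1/sqrt(shape[0]).
w = fan_scaled_normal(jax.random.PRNGKey(0), (1024, 14, 16),
                      in_axis=0, out_axis=(1, 2))
```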
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+====================weight_decay_scale root.optimizer====================
+decoder/emb/token_emb/weight: 1
+decoder/lm_head/weight: 1
+decoder/output_norm/scale: 1
+decoder/transformer/repeat/layer/feed_forward/linear1_0/weight: 1
+decoder/transformer/repeat/layer/feed_forward/linear1_1/weight: 1
+decoder/transformer/repeat/layer/feed_forward/linear2/weight: 1
+decoder/transformer/repeat/layer/feed_forward/norm/scale: 1
+decoder/transformer/repeat/layer/self_attention/attention/i_proj/i_proj/qkv_proj/weight: 1
+decoder/transformer/repeat/layer/self_attention/attention/o_proj/weight: 1
+decoder/transformer/repeat/layer/self_attention/norm/scale: 1
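This 11-line companion file maps every trainable parameter to a weight_decay_scale of 1, i.e. all parameters (norm scales included) receive full decoupled weight decay in this config. Below is a hedged optax sketch of how such a per-parameter scale map can be applied; scaled_weight_decay and the toy pytrees are illustrative, not AXLearn's actual optimizer code.

```python
import jax
import jax.numpy as jnp
import optax

def scaled_weight_decay(weight_decay, scales):
    """Decoupled weight decay with a per-leaf multiplier, mirroring the
    weight_decay_scale map above (1 = full decay, 0 = exempt)."""
    def update_fn(updates, state, params=None):
        if params is None:
            raise ValueError("weight decay needs the current params")
        updates = jax.tree_util.tree_map(
            lambda u, p, s: u + weight_decay * s * p, updates, params, scales)
        return updates, state
    return optax.GradientTransformation(lambda _: optax.EmptyState(), update_fn)

params = {"w": jnp.ones((4, 4)), "norm_scale": jnp.ones((4,))}
scales = {"w": 1.0, "norm_scale": 1.0}  # everything decays, as in this test
tx = optax.chain(scaled_weight_decay(1e-4, scales), optax.sgd(1e-3))
opt_state = tx.init(params)
```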

axlearn/experiments/testdata/axlearn.experiments.text.gpt.c4_trainer/fuji-150B-v2-flash.txt

Lines changed: 313 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+decoder/emb/token_emb/weight: normal(0, 1.0 / fan_out), shape=[32768, 12288], axes=FanAxes(in_axis=-2, out_axis=-1, batch_axis=())
+decoder/transformer/repeat/layer/self_attention/norm/scale: constant(1.0)
+decoder/transformer/repeat/layer/self_attention/attention/i_proj/i_proj/qkv_proj/weight: normal(0, 1.0 / fan_in), shape=(12288, 112, 128), axes=FanAxes(in_axis=0, out_axis=(1, 2), batch_axis=())
+decoder/transformer/repeat/layer/self_attention/attention/o_proj/weight: normal(0, 1.0 / fan_in), shape=(12288, 96, 128), axes=FanAxes(in_axis=(1, 2), out_axis=0, batch_axis=())
+decoder/transformer/repeat/layer/feed_forward/norm/scale: constant(1.0)
+decoder/transformer/repeat/layer/feed_forward/linear1_0/weight: normal(0, 1.0 / fan_in), shape=(12288, 43008), axes=FanAxes(in_axis=-2, out_axis=-1, batch_axis=())
+decoder/transformer/repeat/layer/feed_forward/linear1_1/weight: normal(0, 1.0 / fan_in), shape=(12288, 43008), axes=FanAxes(in_axis=-2, out_axis=-1, batch_axis=())
+decoder/transformer/repeat/layer/feed_forward/linear2/weight: normal(0, 1.0 / fan_in), shape=(43008, 12288), axes=FanAxes(in_axis=-2, out_axis=-1, batch_axis=())
+decoder/output_norm/scale: constant(1.0)
+decoder/lm_head/weight: normal(0, 1.0 / fan_in), shape=(32768, 12288), axes=FanAxes(in_axis=-2, out_axis=-1, batch_axis=())
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+====================weight_decay_scale root.optimizer====================
+decoder/emb/token_emb/weight: 1
+decoder/lm_head/weight: 1
+decoder/output_norm/scale: 1
+decoder/transformer/repeat/layer/feed_forward/linear1_0/weight: 1
+decoder/transformer/repeat/layer/feed_forward/linear1_1/weight: 1
+decoder/transformer/repeat/layer/feed_forward/linear2/weight: 1
+decoder/transformer/repeat/layer/feed_forward/norm/scale: 1
+decoder/transformer/repeat/layer/self_attention/attention/i_proj/i_proj/qkv_proj/weight: 1
+decoder/transformer/repeat/layer/self_attention/attention/o_proj/weight: 1
+decoder/transformer/repeat/layer/self_attention/norm/scale: 1

axlearn/experiments/testdata/axlearn.experiments.text.gpt.c4_trainer/fuji-150B-v2-fp8.txt

Lines changed: 280 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+decoder/emb/token_emb/weight: normal(0, 1.0 / fan_out), shape=[32768, 12288], axes=FanAxes(in_axis=-2, out_axis=-1, batch_axis=())
+decoder/transformer/repeat/layer/self_attention/norm/scale: constant(1.0)
+decoder/transformer/repeat/layer/self_attention/attention/i_proj/i_proj/qkv_proj/weight: normal(0, 1.0 / fan_in), shape=(12288, 112, 128), axes=FanAxes(in_axis=0, out_axis=(1, 2), batch_axis=())
+decoder/transformer/repeat/layer/self_attention/attention/o_proj/weight: normal(0, 1.0 / fan_in), shape=(12288, 96, 128), axes=FanAxes(in_axis=(1, 2), out_axis=0, batch_axis=())
+decoder/transformer/repeat/layer/feed_forward/norm/scale: constant(1.0)
+decoder/transformer/repeat/layer/feed_forward/linear1_0/weight: normal(0, 1.0 / fan_in), shape=(12288, 43008), axes=FanAxes(in_axis=-2, out_axis=-1, batch_axis=())
+decoder/transformer/repeat/layer/feed_forward/linear1_1/weight: normal(0, 1.0 / fan_in), shape=(12288, 43008), axes=FanAxes(in_axis=-2, out_axis=-1, batch_axis=())
+decoder/transformer/repeat/layer/feed_forward/linear2/weight: normal(0, 1.0 / fan_in), shape=(43008, 12288), axes=FanAxes(in_axis=-2, out_axis=-1, batch_axis=())
+decoder/output_norm/scale: constant(1.0)
+decoder/lm_head/weight: normal(0, 1.0 / fan_in), shape=(32768, 12288), axes=FanAxes(in_axis=-2, out_axis=-1, batch_axis=())
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+====================weight_decay_scale root.optimizer====================
+decoder/emb/token_emb/weight: 1
+decoder/lm_head/weight: 1
+decoder/output_norm/scale: 1
+decoder/transformer/repeat/layer/feed_forward/linear1_0/weight: 1
+decoder/transformer/repeat/layer/feed_forward/linear1_1/weight: 1
+decoder/transformer/repeat/layer/feed_forward/linear2/weight: 1
+decoder/transformer/repeat/layer/feed_forward/norm/scale: 1
+decoder/transformer/repeat/layer/self_attention/attention/i_proj/i_proj/qkv_proj/weight: 1
+decoder/transformer/repeat/layer/self_attention/attention/o_proj/weight: 1
+decoder/transformer/repeat/layer/self_attention/norm/scale: 1
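Across all three configs the parameter shapes are identical, which allows a rough sanity check of the "150B" in the name. The embedding and per-layer counts below come straight from the shapes in these files; the transformer depth is not visible in this excerpt, so the sketch simply scans for a layer count that lands near 150B (78 appears to fit, but treat that as an inference, not a fact stated in the diff).

```python
# Back-of-the-envelope parameter count from the shapes above.
emb = 32768 * 12288            # token embedding
lm_head = 32768 * 12288        # output projection to the vocab
qkv = 12288 * 112 * 128        # fused QKV projection
o = 12288 * 96 * 128           # attention output projection
ffn = 3 * 12288 * 43008        # linear1_0, linear1_1, linear2
per_layer = qkv + o + ffn      # ~1.91B; norm scales are negligible

for n_layers in (76, 78, 80):  # depth not shown in this excerpt
    total = emb + lm_head + n_layers * per_layer
    print(n_layers, f"{total / 1e9:.1f}B")
# 78 layers gives ~150.0B, so ~78 is the plausible depth.
```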
