Commit e47c4ee: fix hw4 markdown file
1 parent: 3628052

1 file changed, 2 additions(+), 2 deletions(-)

File changed: docs/homeworks/hw4.md
@@ -426,7 +426,7 @@ Now, just like how we use many kernels in a CNN, we’ll apply this process on t
 And this is where we get our full scaled dot-product attention equation from the paper:
 
 $$
-\text{attention} = \sigma\left(\frac{Q K^\top}{\sqrt{\text{qk_length}}}\right) \cdot V
+\text{attention} = \sigma\left(\frac{QK^{\top}}{\sqrt{\text{qk}\_\text{length}}}\right) \cdot V
 $$
 
 After this diagram, we’ve covered scaled dot-product attention and multi-head attention blocks as described in the
@@ -438,7 +438,7 @@ Subtasks:
 `scaled_dot_product_attention`, `forward`.
 Follow the paper closely and use the diagrams for guidance. An implementation of positional encoding is provided for you.
 2. Implement the `FeedForwardNN` in `seq2seq/transformer/attention.py`. All this entails is adding two `Linear` layers
-that transform your embeddings of size $$(B, T, C)$$ to some intermediate shape $$(B, T, \text{hidden_dim})$$ with
+that transform your embeddings of size $$(B, T, C)$$ to some intermediate shape $$(B, T, \text{hidden}\_\text{dim})$$ with
 a `ReLU` operation, then transforming them back to $$(B, T, C)$$.
 3. Implement the `Encoder` in `seq2seq/transformer/encoder.py`. You'll need the modules from `attention.py`. In particular,
 implement `EncoderLayer` and then `Encoder`.
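For reference, the two pieces this diff touches can be sketched in plain NumPy (a minimal illustration only, not the homework's PyTorch modules; all function and variable names here are hypothetical): in the equation, $\sigma$ is the row-wise softmax over the scaled score matrix $QK^{\top}/\sqrt{\text{qk\_length}}$, and the feed-forward block is two linear maps with a ReLU between them, mapping $(B, T, C) \to (B, T, \text{hidden\_dim}) \to (B, T, C)$.

```python
import numpy as np

def softmax(x, axis=-1):
    # the sigma in the equation: numerically stable row-wise softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # attention = softmax(Q K^T / sqrt(qk_length)) . V
    qk_length = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(qk_length)
    return softmax(scores, axis=-1) @ V

def feed_forward(x, W1, b1, W2, b2):
    # (B, T, C) -> (B, T, hidden_dim) -> ReLU -> back to (B, T, C)
    h = np.maximum(x @ W1 + b1, 0.0)
    return h @ W2 + b2

# toy shapes: batch B=2, sequence length T=4, embedding size C=8
B, T, C, hidden_dim = 2, 4, 8, 16
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((B, T, C)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)   # shape (B, T, C)

W1, b1 = rng.standard_normal((C, hidden_dim)), np.zeros(hidden_dim)
W2, b2 = rng.standard_normal((hidden_dim, C)), np.zeros(C)
ff_out = feed_forward(out, W1, b1, W2, b2)    # shape (B, T, C)
```

A PyTorch version would swap the explicit weight matrices for two `nn.Linear` layers and `F.relu`, exactly as subtask 2 describes.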
