|
346 | 346 | "source": [ |
347 | 347 | "## Write the encoder and decoder model\n", |
348 | 348 | "\n", |
349 | | - "Here, we'll implement an encoder-decoder model with attention which you can read about in the TensorFlow [Neural Machine Translation (seq2seq) tutorial](https://www.tensorflow.org/tutorials/seq2seq). This example uses a more recent set of APIs. This notebook implements the [attention equations](https://www.tensorflow.org/tutorials/seq2seq#background_on_the_attention_mechanism) from the seq2seq tutorial. The following diagram shows that each input words is assigned a weight by the attention mechanism which is then used by the decoder to predict the next word in the sentence.\n", |
| 349 | + "Here, we'll implement an encoder-decoder model with attention which you can read about in the TensorFlow [Neural Machine Translation (seq2seq) tutorial](https://www.tensorflow.org/tutorials/seq2seq). This example uses a more recent set of APIs. This notebook implements the [attention equations](https://www.tensorflow.org/tutorials/seq2seq#background_on_the_attention_mechanism) from the seq2seq tutorial. The following diagram shows that each input word is assigned a weight by the attention mechanism, which is then used by the decoder to predict the next word in the sentence.(**[Neural Machine Translation (seq2seq) tutorial](https://www.tensorflow.org/tutorials/seq2seq)의 Encoder-decoder Model with Attention에 대해 알아봅시다. 이 예시는 최신의 API를 사용합니다. 이 notebook 파일은 seq2seq 튜토리얼의 [attention equations](https://www.tensorflow.org/tutorials/seq2seq#background_on_the_attention_mechanism)을 구현합니다. 하단의 이미지는 Attention 기법에 따라 각각의 input word에 weight를 부여한 후, decoder를 이용하여 그 다음의 word를 예측하는 프로세스를 설명합니다.**)\n", |
350 | 350 | "\n", |
351 | 351 | "<img src=\"https://www.tensorflow.org/images/seq2seq/attention_mechanism.jpg\" width=\"500\" alt=\"attention mechanism\">\n", |
352 | 352 | "\n", |
353 | 353 | "The input is put through an encoder model which gives us the encoder output of shape *(batch_size, max_length, hidden_size)* and the encoder hidden state of shape *(batch_size, hidden_size)*. \n", |
| 354 | + "(**Input을 encoder model에 넣어 encoder output과 encoder hidden state를 반환합니다. 여기서 encoder output은 shape (batch_size, max_length, hidden_size)이며, encoder hidden state는 shape (batch_size, hidden_size)입니다.**)\n", |
354 | 355 | "\n", |
355 | | - "Here are the equations that are implemented:\n", |
| 356 | + "Here are the equations that are implemented(**구현된 수식은 다음과 같습니다.**):\n", |
356 | 357 | "\n", |
357 | 358 | "<img src=\"https://www.tensorflow.org/images/seq2seq/attention_equation_0.jpg\" alt=\"attention equation 0\" width=\"800\">\n", |
358 | 359 | "<img src=\"https://www.tensorflow.org/images/seq2seq/attention_equation_1.jpg\" alt=\"attention equation 1\" width=\"800\">\n", |
359 | 360 | "\n", |
360 | | - "We're using *Bahdanau attention*. Lets decide on notation before writing the simplified form:\n", |
| 361 | + "We're using *Bahdanau attention*. Let's decide on notation before writing the simplified form(**우리는 Bahdanau Attention을 이용할 것입니다. 간단한 형태로 쓰기 전에 몇 가지 표기를 정합시다.**):\n", |
361 | 362 | "\n", |
362 | 363 | "* FC = Fully connected (dense) layer\n", |
363 | 364 | "* EO = Encoder output\n", |
364 | 365 | "* H = hidden state\n", |
365 | 366 | "* X = input to the decoder\n", |
366 | 367 | "\n", |
367 | | - "And the pseudo-code:\n", |
| 368 | + "And the pseudo-code(**수도코드(pseudo-code)는 다음과 같습니다.**):\n", |
368 | 369 | "\n", |
369 | 370 | "* `score = FC(tanh(FC(EO) + FC(H)))`\n", |
370 | 371 | "* `attention weights = softmax(score, axis = 1)`. Softmax by default is applied on the last axis but here we want to apply it on the *1st axis*, since the shape of score is *(batch_size, max_length, 1)*. `Max_length` is the length of our input. Since we are trying to assign a weight to each input, softmax should be applied on that axis.\n", |
371 | 372 | "* `context vector = sum(attention weights * EO, axis = 1)`. Same reason as above for choosing axis as 1.\n", |
372 | 373 | "* `embedding output` = The input to the decoder X is passed through an embedding layer.\n", |
373 | 374 | "* `merged vector = concat(embedding output, context vector)`\n", |
374 | | - "* This merged vector is then given to the GRU\n", |
375 | | - " \n", |
376 | | - "The shapes of all the vectors at each step have been specified in the comments in the code:" |
| 375 | + "* This merged vector is then given to the GRU(**합쳐진 vector는 GRU에 사용됩니다.**)\n", |
| 376 | + "\n", |
| 377 | + "The shapes of all the vectors at each step have been specified in the comments in the code(**각 스텝마다 모든 vector의 shape는 코드 주석으로 표기했습니다.**):" |
377 | 378 | ] |
378 | 379 | }, |
379 | 380 | { |
|
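The attention pseudo-code in the cell above can be checked shape-by-shape in plain NumPy. This is a minimal sketch, not the notebook's model: the weight matrices `W1`, `W2`, and `V` are hypothetical bias-free stand-ins for the three FC layers, and the sizes are arbitrary.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

batch_size, max_length, hidden_size, units = 4, 10, 16, 8
rng = np.random.default_rng(0)

# Hypothetical FC layers, reduced to bias-free weight matrices for brevity
W1 = rng.normal(size=(hidden_size, units))  # FC applied to EO
W2 = rng.normal(size=(hidden_size, units))  # FC applied to H
V = rng.normal(size=(units, 1))             # final FC producing the score

EO = rng.normal(size=(batch_size, max_length, hidden_size))  # encoder output
H = rng.normal(size=(batch_size, hidden_size))               # hidden state

# score = FC(tanh(FC(EO) + FC(H))); H is broadcast across the time axis
score = np.tanh(EO @ W1 + (H @ W2)[:, None, :]) @ V  # (batch_size, max_length, 1)

# Softmax on axis 1 (the time axis), as the text explains
attention_weights = softmax(score, axis=1)

# context vector = sum(attention weights * EO, axis=1) -> (batch_size, hidden_size)
context_vector = (attention_weights * EO).sum(axis=1)
```

Softmax over axis 1 makes the weights for each input position sum to one, which is exactly why the text insists on the first axis rather than the default last one.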
587 | 588 | "source": [ |
588 | 589 | "## Training\n", |
589 | 590 | "\n", |
590 | | - "1. Pass the *input* through the *encoder* which return *encoder output* and the *encoder hidden state*.\n", |
591 | | - "2. The encoder output, encoder hidden state and the decoder input (which is the *start token*) is passed to the decoder.\n", |
592 | | - "3. The decoder returns the *predictions* and the *decoder hidden state*.\n", |
593 | | - "4. The decoder hidden state is then passed back into the model and the predictions are used to calculate the loss.\n", |
594 | | - "5. Use *teacher forcing* to decide the next input to the decoder.\n", |
595 | | - "6. *Teacher forcing* is the technique where the *target word* is passed as the *next input* to the decoder.\n", |
596 | | - "7. The final step is to calculate the gradients and apply it to the optimizer and backpropagate." |
| 591 | + "1. Pass the *input* through the *encoder* which returns *encoder output* and the *encoder hidden state*.(**Input을 Encoder에 넣어 Encoder Output과 Encoder Hidden State를 반환합니다.**)\n", |
| 592 | + "2. The encoder output, encoder hidden state and the decoder input (which is the *start token*) are passed to the decoder.(**Encoder Output과 Encoder Hidden State, Decoder Input(=Start token)을 Decoder에 넣습니다.**)\n", |
| 593 | + "3. The decoder returns the *predictions* and the *decoder hidden state*.(**Decoder는 Predictions(예측 결과)와 Decoder Hidden State를 반환합니다**)\n", |
| 594 | + "4. The decoder hidden state is then passed back into the model and the predictions are used to calculate the loss.(**Decoder hidden state를 Model에 다시 넣고, predictions을 이용하여 loss 값을 계산합니다.**)\n", |
| 595 | + "5. Use *teacher forcing* to decide the next input to the decoder.(**Teacher Forcing을 사용하여 decoder에 넣을 그 다음 input을 결정합니다.**)\n", |
| 596 | + "6. *Teacher forcing* is the technique where the *target word* is passed as the *next input* to the decoder.(**Teacher Forcing 기술로 target word를 decoder의 다음 input으로 이용합니다**)\n", |
| 597 | + "7. The final step is to calculate the gradients, backpropagate, and apply them with the optimizer.(**마지막으로, gradients를 계산하여 역전파하고 optimizer에 적용합니다.**)" |
597 | 598 | ] |
598 | 599 | }, |
599 | 600 | { |
|
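The seven training steps in the cell above can be sketched as a loop. This is a sketch under stated assumptions: `decoder_step` is a hypothetical stand-in that emits random logits in place of the real decoder, so only the teacher-forcing control flow (steps 5 and 6) is meaningful here.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, batch_size, targ_len, START_TOKEN = 20, 2, 5, 1

def decoder_step(dec_input, hidden):
    """Hypothetical decoder: random logits stand in for the real model."""
    logits = rng.normal(size=(batch_size, vocab_size))
    return logits, hidden

def cross_entropy(logits, targets):
    # Mean negative log-likelihood of the target ids under softmax(logits)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(targets)), targets]).mean()

targ = rng.integers(2, vocab_size, size=(batch_size, targ_len))  # target ids
hidden = np.zeros((batch_size, 8))            # encoder hidden state (step 1)
dec_input = np.full(batch_size, START_TOKEN)  # start token (step 2)

loss = 0.0
for t in range(targ_len):
    logits, hidden = decoder_step(dec_input, hidden)  # step 3
    loss += cross_entropy(logits, targ[:, t])         # step 4
    dec_input = targ[:, t]  # steps 5-6: teacher forcing feeds the target word
loss /= targ_len
# step 7 would compute gradients of `loss` and apply them with the optimizer
```

The key line is the last one in the loop: during training the decoder's next input is the ground-truth target word, not its own prediction.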
974 | 975 | "id": "mU3Ce8M6I3rz" |
975 | 976 | }, |
976 | 977 | "source": [ |
977 | | - "* The evaluate function is similar to the training loop, except we don't use *teacher forcing* here. The input to the decoder at each time step is its previous predictions along with the hidden state and the encoder output.\n", |
978 | | - "* Stop predicting when the model predicts the *end token*.\n", |
979 | | - "* And store the *attention weights for every time step*.\n", |
| 978 | + "* The evaluate function is similar to the training loop, except we don't use *teacher forcing* here. The input to the decoder at each time step is its previous predictions along with the hidden state and the encoder output.(**Evaluate 함수는 teacher forcing이 없는 training 루프문과 비슷합니다. 매 step의 Decoder의 Input은 hidden state와 encoder output, 이전 step의 예측값입니다.**)\n", |
| 979 | + "* Stop predicting when the model predicts the *end token*.(**모델이 end token을 예측하면 predicting을 멈춥니다.**)\n", |
| 980 | + "* And store the *attention weights for every time step*.(**매 time step마다 attention weights를 저장합니다.**)\n", |
980 | 981 | "\n", |
981 | | - "Note: The encoder output is calculated only once for one input." |
| 982 | + "Note: The encoder output is calculated only once for one input.(**참고 : Encoder Output은 한 개의 Input에서 단 한번만 계산됩니다.**)" |
982 | 983 | ] |
983 | 984 | }, |
984 | 985 | { |
|
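The evaluation loop described above differs from training only in what is fed back. A minimal sketch, again with a hypothetical `decoder_step` in place of the trained decoder: the previous prediction becomes the next input, the attention weights are stored at every time step, and decoding stops at the end token.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, max_length, START, END = 10, 8, 1, 2

def decoder_step(token, hidden):
    """Hypothetical decoder: random logits and weights stand in for the model."""
    logits = rng.normal(size=vocab_size)
    attn = rng.random(max_length)
    return logits, hidden, attn / attn.sum()

hidden = np.zeros(4)  # from the encoder, computed only once per input
token = START
result, attention_plot = [], []

for _ in range(max_length):
    logits, hidden, attn = decoder_step(token, hidden)
    attention_plot.append(attn)     # store attention weights per time step
    token = int(np.argmax(logits))  # previous prediction is the next input
    if token == END:
        break                       # stop at the end token
    result.append(token)
```

No teacher forcing appears here: there is no target sequence at inference time, so `argmax` over the logits supplies the next input instead.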
1306 | 1307 | "name": "python", |
1307 | 1308 | "nbconvert_exporter": "python", |
1308 | 1309 | "pygments_lexer": "ipython3", |
1309 | | - "version": "3.7.6" |
| 1310 | + "version": "3.6.2" |
1310 | 1311 | } |
1311 | 1312 | }, |
1312 | 1313 | "nbformat": 4, |
|