{"id":6218,"date":"2022-12-01T11:44:54","date_gmt":"2022-12-01T08:44:54","guid":{"rendered":"https:\/\/www.newworldai.com\/?p=6218"},"modified":"2023-01-13T00:45:39","modified_gmt":"2023-01-12T21:45:39","slug":"illustrated-guide-lstms-grus-step-step-explanation","status":"publish","type":"post","link":"https:\/\/www.newworldai.com\/illustrated-guide-lstms-grus-step-step-explanation\/","title":{"rendered":"Illustrated Guide to LSTM\u2019s and GRU\u2019s: A step by step explanation"},"content":{"rendered":"
In this post, we\u2019ll start with the intuition behind LSTM \u2019s and GRU\u2019s. Then I\u2019ll explain the internal mechanisms that allow LSTM\u2019s and GRU\u2019s to perform so well. If you want to understand what\u2019s happening under the hood for these two networks, then this post is for you.<\/span><\/p>\n You can also watch the video version of this post on youtube if you prefer.<\/span><\/p>\n The Problem, Short-term Memory<\/span><\/strong><\/span><\/p>\n Recurrent Neural Networks suffer from short-term memory. If a sequence is long enough, they\u2019ll have a hard time carrying information from earlier time steps to later ones. So if you are trying to process a paragraph of text to do predictions, RNN\u2019s may leave out important information from the beginning.<\/span><\/p>\n During back propagation, recurrent neural networks suffer from the vanishing gradient problem. Gradients are values used to update a neural networks weights. The vanishing gradient problem is when the gradient shrinks as it back propagates through time. If a gradient value becomes extremely small, it doesn\u2019t contribute too much learning.<\/span><\/p>\n