Memorization in Recurrent Neural Networks (RNNs)
Memorization in recurrent neural networks (RNNs) continues to pose a challenge in many applications. RNNs are designed to store information over many timesteps and retrieve it when it becomes relevant, but vanilla RNNs often fail to learn such long-range dependencies, largely because of vanishing and exploding gradients.
Several network architectures have been proposed to tackle this problem, including Long Short-Term Memory (LSTM) units and Gated Recurrent Units (GRUs). However, the practical problem of memorization remains a challenge.
Comparing Recurrent Units
Comparing different recurrent units is often more involved than simply comparing accuracy or cross-entropy loss. High-level quantitative measures admit many explanations and may reflect only small improvements in predictions that require short-term memorization, even though long-term memorization is often what is of interest.
Qualitative analysis therefore calls for a problem that has human-interpretable output and depends on both long-term and short-term contextual understanding. The typical benchmark problems, such as Penn Treebank, Chinese poetry generation, or text8 generation, do not have outputs that are easy to reason about: they require an extensive understanding of grammar or Chinese poetry, or output only a single letter.
Autocomplete Problem
To this end, this article studies the autocomplete problem. Each character in a word is mapped to a target that represents the entire word; the space leading up to the word also maps to that target. The prediction made at the space character is particularly useful for showing contextual understanding, since the model must predict the word before any of its characters have been seen.
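The target construction described above can be sketched as follows. This is an illustrative reconstruction, not the article's exact dataset code: each character of a word maps to that word, and the space before a word maps to the word that follows it.

```python
def autocomplete_targets(text):
    """Map each character position in `text` to the word being typed.

    Characters inside a word map to that word; a space maps to the
    *following* word, so the model must predict the next word from
    context alone. (Illustrative sketch of the target construction.)
    """
    words = text.split(" ")
    targets = []
    for i, word in enumerate(words):
        targets.extend([word] * len(word))  # each character maps to its word
        if i + 1 < len(words):
            targets.append(words[i + 1])    # the space maps to the next word
    return targets

# e.g. autocomplete_targets("the cat")
# → ['the', 'the', 'the', 'cat', 'cat', 'cat', 'cat']
```

Pairing the input characters with these targets makes the prediction task directly readable: at the space after "the", the model is asked to output "cat".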
The autocomplete problem is similar to the text8 generation problem: the only difference is that instead of predicting the next letter, the model predicts an entire word. This makes the output much more interpretable. Finally, because of its close relation to text8 generation, existing literature on text8 generation is relevant and comparable, as models that work well on text8 generation should work well on the autocomplete problem.
Connectivity in the Autocomplete Problem
The recently published Nested LSTM paper qualitatively compared its Nested LSTM unit to other recurrent units by visualizing individual cell activations, showing how each unit memorizes. This visualization was inspired by Karpathy et al. [1], who identify cells that capture a specific feature. However, this approach works well for identifying specific features but not for capturing the long-term contextual understanding that is essential for the autocomplete problem.
Conclusion
In this article, a qualitative visualization method for comparing recurrent units with regard to memorization and contextual understanding is presented. The method is applied to the three recurrent units mentioned above: Nested LSTMs, LSTMs, and GRUs.
FAQs
Q: What is the purpose of this article?
A: The purpose of this article is to demonstrate a visualization technique that can better highlight the differences between recurrent units with regard to memorization and contextual understanding.
Q: What is the autocomplete problem?
A: In the autocomplete problem, the model predicts an entire word from the characters typed so far. The prediction made at the space preceding a word is especially useful for showing contextual understanding.
Q: What is the difference between LSTMs and GRUs?
A: LSTMs (Long Short-Term Memory) and GRUs (Gated Recurrent Units) are both gated recurrent units designed to mitigate the vanishing gradient problem, but they differ in structure. LSTMs maintain a separate internal memory cell whose contents are controlled by input, forget, and output gates, while GRUs merge the memory cell and hidden state and use only update and reset gates to control the flow of information.
Q: What is the Nested LSTM unit?
A: The Nested LSTM unit is a type of LSTM unit that uses another LSTM unit to update its internal memory state, giving the memory its own gated dynamics and allowing for more long-term memorization.
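A rough sketch of this nesting, in the same simplified style as above: the outer cell computes its gates as usual, but instead of the additive update `c_new = f * c + i * g`, an inner LSTM produces the new memory, taking `i * g` as its input and `f * c` as its hidden state. The exact wiring here is an assumption based on the Nested LSTM paper's description, not a reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W):
    """Plain LSTM step, reused as the inner memory function."""
    z = np.concatenate([x, h])
    i, f, o = (sigmoid(W[k] @ z) for k in ("i", "f", "o"))
    g = np.tanh(W["g"] @ z)
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

def nested_lstm_step(x, h, c, c_inner, W_outer, W_inner):
    """One Nested LSTM step (sketch): the outer memory update is itself an LSTM."""
    z = np.concatenate([x, h])
    i, f, o = (sigmoid(W_outer[k] @ z) for k in ("i", "f", "o"))
    g = np.tanh(W_outer["g"] @ z)
    # Inner LSTM replaces `c_new = f * c + i * g`: it receives i*g as input
    # and f*c as its hidden state, and its output becomes the outer memory.
    c_new, c_inner_new = lstm_step(i * g, f * c, c_inner, W_inner)
    h_new = o * np.tanh(c_new)
    return h_new, c_new, c_inner_new
```

The unit therefore carries three state vectors between timesteps (outer hidden, outer memory, inner memory), one more than a plain LSTM.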