Image source: generated using OpenAI’s DALL-E

AI Atlas #25:

Long Short-Term Memory Networks

Rudina Seseri

🗺️ What are Long Short-Term Memory (LSTM) networks?

Long Short-Term Memory (LSTM) networks are specialized types of recurrent neural networks (RNNs) designed to overcome certain limitations of traditional RNNs. The introduction of LSTMs has proven invaluable for machine learning, and the architecture now forms the backbone of many transformative technologies, from video recognition and industrial monitoring to digital assistants such as Apple’s Siri and Amazon’s Alexa.

As I discussed in an earlier AI Atlas, RNN architectures excel at handling sequences and are particularly valuable in tasks involving time series data, such as speech recognition, handwriting analysis, and machine translation. They do this by preserving information as a form of “memory” within their internal structure. It is as if the model were leaving breadcrumbs along a path to assist in navigation, marking where it has gone before and the directions it took to get there. However, RNNs’ memories are short: these breadcrumbs are swept away, making it difficult to stay on course the longer the path goes on. This phenomenon is known as the “vanishing gradient problem,” and it has long been a major obstacle to applying machine learning to long sequences.
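To make that recurrence concrete, below is a minimal NumPy sketch of a vanilla RNN step. The sizes, weights, and inputs are illustrative assumptions rather than any particular production model; the point is that the single hidden vector h is the model’s entire memory, and it is overwritten at every step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions made for this sketch)
input_size, hidden_size, seq_len = 4, 8, 50

W_x = rng.normal(scale=0.5, size=(hidden_size, input_size))   # input-to-hidden weights
W_h = rng.normal(scale=0.5, size=(hidden_size, hidden_size))  # hidden-to-hidden weights
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)  # the RNN's entire "memory" of the past
for t in range(seq_len):
    x_t = rng.normal(size=input_size)        # stand-in for one timestep of input
    h = np.tanh(W_x @ x_t + W_h @ h + b)     # new memory overwrites the old

# During training, the gradient flowing back through each step is multiplied by
# W_h (and the tanh derivative) over and over; across long sequences, that repeated
# multiplication tends to shrink it toward zero, which is the vanishing gradient.
```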

This is where LSTMs step in to mitigate the issue. By incorporating memory cells and strategically positioned gates that sift out irrelevant inputs, LSTMs mimic more closely the recall ability of human brains. Returning to the breadcrumb example, this improved memory allows the model to identify important path markers and prevent them from fading. It is thus able to follow much longer trails. The name “Long Short-Term Memory” refers to this transformative enhancement of prioritizing key contextual information and retaining it for an extended period of time.
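For readers who want to see those gates explicitly, here is a minimal sketch of a single LSTM step in NumPy. The parameter shapes and random initialization are placeholder assumptions; production systems would use an optimized implementation from a library such as PyTorch or TensorFlow.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM timestep. W, U, b each hold parameters for the
    forget (f), input (i), output (o), and candidate (g) transforms."""
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate: what to erase
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate: what to write
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate: what to reveal
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])   # candidate new content

    c_t = f * c_prev + i * g        # cell state: the long-lived "breadcrumbs"
    h_t = o * np.tanh(c_t)          # hidden state: the short-term working memory
    return h_t, c_t

# Illustrative sizes and randomly initialized parameters (assumptions for this sketch)
rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8
W = {k: rng.normal(scale=0.1, size=(hidden_size, input_size)) for k in "fiog"}
U = {k: rng.normal(scale=0.1, size=(hidden_size, hidden_size)) for k in "fiog"}
b = {k: np.zeros(hidden_size) for k in "fiog"}

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for _ in range(20):                                   # walk a short synthetic sequence
    h, c = lstm_step(rng.normal(size=input_size), h, c, W, U, b)
```

Because the cell state is updated additively and is scaled only by the forget gate, important information (the key breadcrumbs) can persist across many timesteps instead of being overwritten at every step.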

🤔 Why LSTM networks matter and their shortcomings

LSTMs have shown success in diverse applications and have outperformed conventional RNNs in situations where complexity is high, such as processing long paragraphs or summarizing business data. They excel at breaking a problem into smaller components and handling each component in turn. Furthermore, LSTMs overcome two major hurdles faced by traditional RNNs: the problem of vanishing gradients, where models lose the breadcrumbs used to mark their trails; and the problem of exploding gradients, in which the training signal grows out of control, as if the model scattered far too many breadcrumbs and could no longer follow or learn from new routes.

However, while LSTMs enhance the benefits of RNNs and address major obstacles in remembering long-term information, they still suffer from many of the same shortcomings as their less specialized counterparts. Such limitations include:

Computational Intensiveness: The more complicated architecture of LSTMs results in longer processing times and necessitates greater computational resources. Their attention to long-term context is not always necessary or worth the resource tradeoff; for example, when working with shorter inputs such as tweets.

Challenging Training: Because LSTMs process data step-by-step, it is extremely difficult to make use of parallel processing during training. This is a major challenge when working with large datasets, such as when summarizing vast amounts of text.

Processing of Non-Sequential Data: LSTMs are not the best choice for all types of data. For example, their strengths will not be leveraged effectively when working with data that has no inherent ordering, such as still images or customer classifications.

🛠️ Applications of Long Short-Term Memory networks

Just like traditional RNNs, LSTMs excel at processing sequential data, such as stock market behavior or natural language text. The longer memory of LSTMs is also particularly useful in areas such as:

Speech Recognition: LSTMs are vital for accurately transcribing spoken language, which often takes the form of long sequences. For this reason LSTMs have been extensively used in major speech recognition systems, such as Apple’s Siri, Microsoft’s Cortana, and Amazon’s Alexa.

Industrial Internet-of-Things (IoT): The uncanny ability of LSTMs to filter out irrelevant data and process critical information over long periods of time is invaluable when applied to the massive amount of data generated by industrial machines, and can be used in use cases from maintenance prediction to anomaly detection and energy cost analysis.

Video Analysis and Computer Vision: Because videos are just time series of images, in which each frame depends on the previous, LSTMs find application here. They are often combined with Convolutional Neural Networks (CNNs), which recognize the shapes onscreen while LSTMs handle the temporal aspect, to recognize activities and track behavior.
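As a rough illustration of that CNN-plus-LSTM pairing, the sketch below runs a small convolutional encoder over each frame and feeds the resulting feature sequence to an LSTM for clip-level classification. It assumes PyTorch, and the layer sizes, frame dimensions, and number of classes are arbitrary placeholders rather than a recommended recipe.

```python
import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    """Per-frame CNN features -> LSTM over time -> activity label (sketch)."""
    def __init__(self, num_classes=5, feat_dim=64, hidden_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(                      # recognizes shapes in each frame
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)  # handles the temporal aspect
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, video):                          # video: (batch, time, channels, H, W)
        b, t, c, h, w = video.shape
        feats = self.cnn(video.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(feats)                 # h_n: final hidden state per clip
        return self.head(h_n[-1])                      # classify the whole clip

# Toy usage: a batch of 2 clips, 16 frames each, 3x64x64 pixels (illustrative shapes)
model = CNNLSTMClassifier()
clips = torch.randn(2, 16, 3, 64, 64)
print(model(clips).shape)  # torch.Size([2, 5])
```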

In essence, Long Short-Term Memory networks represent a sophisticated advancement in neural network architecture, addressing challenges related to preserving context over time, thereby finding applications in a wide array of fields involving time series analysis and sequence prediction.