Phi-1.5 and “AI Textbooks”:

A Groundbreaking New Way to Train LLMs

Rudina Seseri

🗺️ What is Phi-1.5?

Phi-1.5 is a resource-efficient Large Language Model (LLM) announced by researchers at Microsoft last month. It was trained using a novel approach that relies on curated, high-quality synthetic data generated by existing large language models such as OpenAI’s ChatGPT. In other words: they used an LLM to write a “textbook,” which was then used to train another LLM.

The origin of the “textbook” strategy used to train Phi-1.5 is simple: at a certain point, it is neither sensible nor practical to feed ever-greater amounts of data into AI models in the hope of improved performance. For years, transfer learning, which lets models reuse what they learn across tasks, has been a widely used answer to the data-hungry nature of AI models, but Phi-1.5’s innovation addresses the beginning of the AI lifecycle, where models are initially trained. Phi-1.5 is taught much as we teach humans: by synthesizing the information relevant to the intended topic into a more digestible and controllable body of knowledge. As a result, Phi-1.5 performs significantly better than much larger models on benchmark tasks such as common-sense reasoning and reading comprehension, despite its small size.
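As a rough illustration of this pipeline (a sketch, not Microsoft’s actual implementation), the idea can be expressed in a few lines of Python. Here `teacher_generate` is a hypothetical stand-in for a call to a large teacher model such as ChatGPT:

```python
# Minimal sketch of the "textbook" training-data pipeline: a large
# teacher model writes short synthetic lessons across a curriculum of
# topics, and the collected lessons become the pretraining corpus for
# a small student model.

def teacher_generate(topic: str) -> str:
    """Hypothetical stand-in for a call to a large teacher LLM
    (e.g. ChatGPT) prompted to write a textbook-style lesson."""
    return f"Lesson on {topic}: definitions, worked examples, exercises."

def build_textbook(curriculum: list[str]) -> list[str]:
    """Assemble a synthetic 'textbook' by generating one lesson per topic."""
    return [teacher_generate(topic) for topic in curriculum]

curriculum = ["common-sense reasoning", "basic physics", "reading comprehension"]
textbook = build_textbook(curriculum)

# The small student model would then be pretrained on `textbook`
# instead of a raw web-scale crawl.
for lesson in textbook:
    print(lesson)
```

The key design choice is that the curriculum, not a web crawl, determines what the student model learns, which is what makes the resulting dataset both small and controllable.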

🤔 Why does Phi-1.5 matter and what are its limitations?

Phi-1.5 is not only impressive in its own right; it also validates a theory proposed by earlier researchers that filtering for highly informative data significantly reduces the cost of training an equally effective machine learning model. Its creators claim it is 1,000 times more efficient to train and 10 times more efficient to operate relative to today’s massive, state-of-the-art open-source models.

If these claims hold true, then the open-source language model community has an incredible opportunity to assemble high-quality filtered datasets for affordable, medium-sized models, greatly improving their efficacy and accessibility. You could, for example, train a model exclusively on the sciences, medicine, or engineering, creating a virtual expert for a specific domain without needing to incorporate generalized knowledge from everywhere else. The result is that businesses and individuals will now be able to apply the incredible power of machine learning to their specific industries at reasonable cost with minimal additional effort.
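To make the filtering idea concrete, here is a toy sketch. The scoring heuristic below is an invented placeholder; the Phi work used a learned classifier to estimate a document’s educational value, but the shape of the pipeline is the same: score every candidate document, keep only the most informative ones.

```python
# Toy sketch of quality filtering: score each candidate document and
# keep only the most "informative" ones for training. The heuristic
# below is an invented placeholder, not the classifier used by the
# Phi researchers.

def quality_score(doc: str) -> float:
    """Placeholder heuristic: reward documents that look explanatory."""
    signals = ["example", "because", "therefore", "step"]
    word_count = max(len(doc.split()), 1)
    return sum(doc.lower().count(word) for word in signals) / word_count

def filter_corpus(docs: list[str], threshold: float) -> list[str]:
    """Keep only documents whose score clears the threshold."""
    return [doc for doc in docs if quality_score(doc) >= threshold]

corpus = [
    "Buy now!!! Limited offer!!!",
    "For example, gravity pulls objects down because mass curves "
    "spacetime; therefore dropped objects accelerate.",
]
filtered = filter_corpus(corpus, threshold=0.05)
print(len(filtered))  # the spammy document is dropped
```

Swapping the heuristic for a trained quality classifier turns this toy into the kind of curation pipeline the efficiency claims above depend on.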

Improved resource efficiency: It took 8 days to train Phi-1.5 at a compute cost of around $1,000. Despite this, it matches and even outperforms models 50 times its size.

Specialization: The success of Phi-1.5 and the “textbook” method used to train it could be adapted to create expert AI agents across industries, including healthcare and sales.

Controllability: Lowering the barrier to train models, supported by open-source solutions like Phi-1.5, empowers organizations to build their own models within a proprietary stack.

While Phi-1.5’s “textbook” method of training brings the above advantages, it comes with several notable limitations:

Limited scope: The side effect of being trained on a small, concentrated dataset is that Phi-1.5 lacks the generalized, cross-subject knowledge of common, broad-application LLMs like ChatGPT. It requires fine-tuning to broaden its exposure to novel situations and instructions.

Hallucinations: Phi-1.5, as an early research model, lacks mechanisms to mitigate the tendency of LLMs to confidently state inaccurate information.

Bias and toxicity: Like all LLMs, Phi-1.5 is not free from societal biases. Furthermore, the model can still produce harmful content if explicitly prompted or instructed to do so. This can be mitigated through the careful curation of training data.

🛠️ Applications of Phi-1.5 and the “AI Textbook” strategy

Phi-1.5 and the “textbook” method used to train it have a number of impactful use cases including:

Specialized models for individual domains: Small, inexpensive models can be built for custom use-cases within organizations, such as summarizing biomedical research or analyzing engineering data. The initial “textbook” content selection would enable models to specialize within subjects and industries and can be augmented via fine-tuning with a company’s or researcher’s internal data.

Machine learning on edge devices: Phi-1.5’s efficiency makes it a promising candidate for deploying machine learning applications on resource-constrained edge devices such as smartphones, IoT sensors, and embedded systems. Its compact size and effectiveness can lead to faster and more responsive edge AI applications, enhancing the AI capabilities of smart devices.

Enhancing virtual assistants: Phi-1.5 can be utilized to improve virtual assistants by tailoring them to specific industries or domains. Businesses can create specialized virtual assistants with in-depth knowledge and expertise, providing more accurate and valuable support in fields like finance, legal, or customer support.
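One reason small models suit edge devices is that their weights can be compressed further before deployment, for instance with 8-bit quantization. The following is a toy sketch of symmetric int8 quantization as a general technique, not a description of how Phi-1.5 itself is shipped:

```python
import numpy as np

# Toy sketch of symmetric int8 weight quantization, a common technique
# for shrinking a small model further for edge deployment. Each float32
# weight (4 bytes) is mapped to an int8 (1 byte) plus a shared scale.

def quantize_int8(weights: np.ndarray):
    """Map float weights to int8 using a single per-tensor scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(np.max(np.abs(w - w_hat)))  # small reconstruction error
```

A model that is already small to begin with, as Phi-1.5 is, compounds these savings, which is what makes smartphone- and sensor-class deployment plausible.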

In conclusion, Phi-1.5 and its innovative “textbook” training method mark a pivotal moment in the evolution of Large Language Models, opening up exciting possibilities across sectors and challenging the conventional wisdom that “more data is always better.” As the capabilities of Phi-1.5 and similar technologies continue to be explored and refined, we can anticipate a future where businesses and individuals alike leverage AI to an unprecedented degree, solving complex problems and unlocking new opportunities while mitigating the challenges that arise.