LLMs - Large Language Models

What are LLMs (large language models)?

LLM stands for Large Language Model. LLMs are machine learning models that have been trained on massive amounts of text data, such as books, articles, and web pages, to understand and generate human language. (This definition was generated by an LLM: OpenAI's GPT-3 (Generative Pre-trained Transformer 3), which has 175 billion parameters and can generate highly coherent and contextually relevant text.)

LLMs are trained on graphics processing units (GPUs), and GPUs are also used for inference (without them, generating a response is very slow).

Retrieval Augmented Generation (RAG) and In-Context Learning

An LLM takes a query in natural language (such as English) as input and produces a response. The input query is called a prompt. Often, you can improve the response from an LLM by carefully designing the prompt, a process called prompt engineering. LLMs tend to work well when you give them explicit instructions about the format of the response, or when you include examples or reference text in the prompt for them to draw on. Supplying such material in the prompt is called in-context learning; when the material is first retrieved from an external source, such as a document store or search index, it is called retrieval augmented generation (RAG). For example, if the LLM's training data cut-off was in 2021, and you provide the Wikipedia article for the 2022 football World Cup in the prompt, followed by the query “who won the 2022 world cup in football?”, it will answer correctly with Argentina.
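The World Cup example can be sketched as a small prompt-building helper. This is only an illustration: the function name build_prompt, the instruction wording, and the one-sentence context (standing in for the full Wikipedia article) are assumptions, and sending the prompt to an actual LLM is left out.

```python
def build_prompt(context: str, question: str) -> str:
    """Place the supplied context ahead of the question, with format instructions."""
    return (
        "Answer the question using only the context below.\n"
        "Reply with the name of the country and nothing else.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical stand-in for the text of the Wikipedia article.
context = (
    "Argentina won the 2022 FIFA World Cup, "
    "beating France on penalties in the final."
)
prompt = build_prompt(context, "Who won the 2022 world cup in football?")
print(prompt)
```

In a RAG setup, the context string would come from a retrieval step (for example a search over your documents) rather than being written by hand.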

Training LLMs

LLMs are typically trained in 3 stages. The first stage is pre-training on a massive text corpus with a next-word prediction task: given the preceding words, the model learns to predict the word that comes next. The second stage is supervised fine-tuning (SFT) using instruction-output pairs, where a much smaller, curated dataset of instructions and appropriate output text is used to fine-tune the LLM. The third and final stage is reinforcement learning from human feedback (RLHF). A human takes several outputs for the same prompt (often 4 to 9 responses) and ranks them by preference. The rankings are used to train a reward model, and the reward model's scores are then used to fine-tune the LLM with proximal policy optimization (PPO). Llama 2 has two reward models, one for helpfulness and one for safety. Current models like Llama 2 use ~10K+ prompts and responses for supervised fine-tuning and ~100K+ human preference pairs.
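As a rough illustration of the data behind the first and third stages, the sketch below builds next-word training examples from a sentence and expands a human ranking into pairwise preferences. Both helper names are hypothetical. Splitting on whole words is a simplification (real models operate on tokens), and expanding a ranking into pairs is one common way to use it, assumed here rather than taken from any particular model's recipe.

```python
from itertools import combinations


def next_word_examples(words):
    """Pre-training data: pair each prefix of the text with the word that follows it."""
    return [(words[:i], words[i]) for i in range(1, len(words))]


def ranking_to_pairs(ranked_responses):
    """Preference data: turn a best-to-worst ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked_responses, 2))


examples = next_word_examples(["the", "cat", "sat", "down"])
pairs = ranking_to_pairs(["A", "B", "C", "D"])

for prefix, target in examples:
    print(prefix, "->", target)
print(len(pairs))  # 4 ranked responses give 6 comparison pairs
```

A ranking of K responses yields K * (K - 1) / 2 pairs, so ranking 9 responses produces 36 comparisons from a single prompt.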

Fine Tuning LLMs

Recently, many pre-trained LLMs have been open-sourced and can be downloaded, and then fine-tuned on your private data to perform a specific task for you. For example, maybe you have large amounts of documentation about your company or products. In this case, you could download a pre-trained LLM (from somewhere such as Hugging Face), freeze its weights, add some extra layers, and fine-tune only those new layers using your private data. You now have a model that should perform better on queries about your private data.
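The idea of freezing the downloaded weights and training only the added layers can be shown with a toy in plain Python. Everything here is a stand-in: FROZEN_BASE plays the role of the pre-trained model, the logistic "head" plays the role of the extra layers, and private_data is made up. A real setup would use a deep learning framework and an actual downloaded model.

```python
import math
import random

random.seed(0)

# Stand-in for pre-trained weights: a fixed projection that is never updated.
FROZEN_BASE = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(4)]


def base_features(x):
    """Frozen layers: map a 2-d input to 4 features."""
    return [math.tanh(row[0] * x[0] + row[1] * x[1]) for row in FROZEN_BASE]


def predict(head, x):
    """New trainable layer on top of the frozen features (logistic output)."""
    feats = base_features(x)
    z = sum(w * f for w, f in zip(head["w"], feats)) + head["b"]
    return 1.0 / (1.0 + math.exp(-z))


def loss(head, data):
    """Mean binary cross-entropy of the head's predictions."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in data:
        p = predict(head, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def fine_tune(head, data, lr=0.5, steps=100):
    """Full-batch gradient descent that updates only the head."""
    for _ in range(steps):
        grad_w = [0.0] * len(head["w"])
        grad_b = 0.0
        for x, y in data:
            feats = base_features(x)
            err = predict(head, x) - y
            grad_w = [g + err * f / len(data) for g, f in zip(grad_w, feats)]
            grad_b += err / len(data)
        head["w"] = [w - lr * g for w, g in zip(head["w"], grad_w)]
        head["b"] -= lr * grad_b
    return head


# Hypothetical private data: (input, label) pairs.
private_data = [((1.0, 0.2), 1), ((0.9, 0.1), 1), ((0.1, 1.0), 0), ((0.8, 0.3), 1)]

head = {"w": [0.0] * 4, "b": 0.0}
base_before = [row[:] for row in FROZEN_BASE]
loss_before = loss(head, private_data)
fine_tune(head, private_data)
loss_after = loss(head, private_data)
print(loss_before, loss_after)
```

Because only the head is updated, the frozen weights are identical before and after training, and far fewer parameters need gradients than if the whole model were fine-tuned.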

Size of LLMs (Number of Parameters)

The size of an LLM is typically measured as the number of parameters it contains. The largest known model, GPT-4, has been speculated to have 1,700 billion (1.7 trillion) parameters, although OpenAI has not published the figure. In contrast, the largest Llama-2 model has 70 billion parameters.

The size of an LLM is important, because certain capabilities only emerge when models grow beyond certain sizes. In a paper by Wei et al. from Google Research, the authors showed that mathematical and word skills, as well as instruction following, appear in LLMs when they grow past certain sizes and have been trained for long enough (measured in training FLOPs).

The number of parameters in a model also determines the size of the model in memory. For example, in Llama-2, the model parameters in 16-bit precision consume:

Llama-2-70b with 16-bit precision = 2 bytes * 70 billion = 140 GB of memory

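In practice, this means Llama-2-70b will need at least 2 A100 GPUs (80 GB each) just to hold the weights for inference. Fine-tuning all of the weights needs considerably more memory, because gradients and optimizer state must be stored alongside the weights.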
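The same arithmetic can be written as a small calculator. This is a sketch under simple assumptions: it counts the weights only (no activations, gradients, or optimizer state), treats 1 GB as 10^9 bytes, and the function names are made up for illustration.

```python
import math


def weights_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus(memory_gb: float, gpu_memory_gb: float = 80.0) -> int:
    """Smallest number of GPUs whose combined memory can hold memory_gb."""
    return math.ceil(memory_gb / gpu_memory_gb)


llama_2_70b = weights_memory_gb(70e9)  # 16-bit precision = 2 bytes per parameter
print(llama_2_70b)            # 140.0
print(min_gpus(llama_2_70b))  # 2
```

Changing bytes_per_param shows the effect of precision: at 4 bytes per parameter (32-bit) the same model needs 280 GB for its weights.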

Proprietary LLMs

The most well-known LLMs released by OpenAI (ChatGPT, GPT-4) are proprietary models: you are not able to download the models, and they are only accessible via a UI on OpenAI's website or via API calls. For example, you pay OpenAI to use the higher-end LLMs (GPT-4) and build commercial applications that call their models via APIs. Sometimes, organizations are legally prevented from using proprietary LLMs because they are restricted in what type of data they can send to an external service (e.g., due to data privacy). You can customize responses from proprietary LLMs with prompt engineering, but you cannot download their weights, so any fine-tuning is limited to what the vendor chooses to offer through its API.

Open-Source LLMs

There are now hundreds of open-source LLMs available. At the time of writing, the most powerful is Llama-2, released by Meta in July 2023, with 70 billion parameters. Open-source LLMs have the advantage over proprietary models that you can download them and fine-tune them yourself for task-specific goals. Organizations may have valuable proprietary data (such as their customer help data or internal documentation) that they can leverage to build custom LLMs with fine-tuning. Open-source LLMs also enable organizations to deploy models within their own data centers or cloud accounts, so sensitive data does not leave their network. However, the largest open-source LLMs are still believed to be an order of magnitude smaller than the largest proprietary LLMs, so their performance is still not as good for general-purpose language tasks.