Microsoft

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

In this post, we dive into a new and exciting research paper by Microsoft, titled “The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits”. In recent years, we’ve seen tremendous success of large language models such as GPT, LLaMA, and more. As we move forward, we see that the …
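
Where does 1.58 come from? The paper’s BitNet b1.58 constrains every weight to the ternary set {-1, 0, 1}, and encoding three equally likely states takes log2(3) ≈ 1.58 bits. Below is a minimal sketch of the absmean ternary quantization the paper describes (the function name is ours, not from any official release):

```python
import math
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Round a weight matrix to {-1, 0, +1} using the paper's absmean scale."""
    gamma = w.abs().mean() + eps        # per-tensor average absolute weight
    return (w / gamma).round().clamp(-1, 1)

# Three equally likely weight states carry log2(3) bits of information each:
print(math.log2(3))  # ~1.585, hence "1.58-bit" LLMs
```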

Orca 2: Teaching Small Language Models How to Reason

Several months ago, Microsoft released the first version of Orca, which achieved remarkable results, even surpassing ChatGPT on the BigBench-Hard dataset, and the ideas from Orca 1 helped shape better language models released since then. The Orca 2 model, presented in the paper we review in this post, achieves significantly better …

CODEFUSION: A Pre-trained Diffusion Model for Code Generation

CODEFUSION is a new code generation model introduced in a research paper from Microsoft, titled “CODEFUSION: A Pre-trained Diffusion Model for Code Generation”. Recently, we’ve observed significant progress in code generation using AI, mostly based on large language models (LLMs), so we refer to them as code LLMs. With a …

LongNet: Scaling Transformers to 1B Tokens with Dilated Attention

In this post we review LongNet, a new research paper by Microsoft titled “LongNet: Scaling Transformers to 1,000,000,000 Tokens”. The paper opens with an amusing chart that shows the trend of transformer sequence lengths over time on a non-logarithmic y-axis, where LongNet sits far above the rest with its one billion tokens. …
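
Before diving in, here is a rough illustration of the dilated-attention idea: the input is split into segments, and within each segment only every r-th token participates in full attention, which is what lets the cost scale roughly linearly with sequence length. The helper below sketches only the index selection for a single segment length and dilation rate (LongNet actually mixes several of each); it is not the paper’s code:

```python
import torch

def dilated_indices(seq_len: int, segment_len: int, dilation: int) -> list[torch.Tensor]:
    """Per-segment token indices for one (segment length, dilation) pair.

    Each segment keeps every `dilation`-th token, so attention within a
    segment runs over only segment_len / dilation positions.
    """
    indices = []
    for start in range(0, seq_len, segment_len):
        end = min(start + segment_len, seq_len)
        indices.append(torch.arange(start, end, dilation))
    return indices

# 16 tokens, segments of 8, dilation 2 -> attend within {0,2,4,6} and {8,10,12,14}
for idx in dilated_indices(16, 8, 2):
    print(idx.tolist())
```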
