Introduction to Mixture-of-Experts | Original MoE Paper Explained

Diving into the original Google paper that introduced the Mixture-of-Experts (MoE) method, which was critical to AI progress…

Mixture-of-Agents Enhances Large Language Model Capabilities

In this post we explain the Mixture-of-Agents method, which shows how to combine open-source LLMs to outperform GPT-4o on AlpacaEval 2.0…

Arithmetic Transformers with Abacus Positional Embeddings

In this post we dive into Abacus Embeddings, which dramatically enhance Transformers' arithmetic capabilities with strong logical extrapolation…

CLLMs: Consistency Large Language Models

In this post we dive into Consistency Large Language Models (CLLMs), a new family of models that can dramatically speed up LLM inference!…

ReFT: Representation Finetuning for Language Models

Learn about Representation Finetuning (ReFT) from Stanford University, a method for efficiently fine-tuning large language models (LLMs)…

Stealing Part of a Production Language Model

What if we could discover the internal weights of OpenAI's models? In this post we dive into a paper that presents an attack which steals data from production LLMs…

How Does Meta AI's Human-Like V-JEPA Work?

Explore V-JEPA, which stands for Video Joint-Embedding Predictive Architecture, another step in Meta AI's journey toward human-like AI…

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

In this post we dive into Microsoft's Era of 1-bit LLMs paper, which shows a promising direction for low-cost large language models…