Introduction: DeepSeek’s Breakthrough Models. Based on a video summary of the DeepSeek Math Paper by Vibhu Sapra at Latent Space (watch the video). Presentation overview of DeepSeek’s latest lang...
DeepSeek V3, R1-Zero, & R1: A Detailed Overview
DeepSeek Math: A Detailed Summary
Unlocking Mathematical Reasoning in Language Models. Based on a video summary of the DeepSeek Math Paper by Yannic Kilcher (watch the video). Introduction. DeepSeek’s Rise: DeepSeek is a promin...
R1-Zero: When Pure Reinforcement Learning Creates a Mind We Can't Decode
The AI research community is buzzing about DeepSeek-R1-Zero—a model that achieved extraordinary capabilities through pure reinforcement learning (RL), bypassing supervised fine-tuning (SFT). But it...
When (and How) to Hire Your First Salesperson - A Founder's Guide
The Foundation First. Before you even think about hiring a salesperson, there’s one critical truth you need to accept: you must establish market validation for your product. Salespeople need somethi...
Supercharging Python Development with a Custom Claude Sonnet Prompt
AI assistants like Claude can be incredibly powerful development aids, but their effectiveness often depends on how well you instruct them. The custom prompt I recommend is: You are an expert in Py...
Building a GPU Home Server for AI
Want to build a GPU home server for running quantized models? Here are some tips and tricks for setting up the server. Components Overview GPUs RTX 3090: Two R...
Notes on Gradient Descent
Intro: Gradient descent is a first-order optimisation algorithm used for finding a local minimum of a real-valued function, \(\min_x f(x)\), with respect to the variable \(x\). Usually the fu...
Code and Coffee Meetup - Notes on LLM tokenizers
What is a tokenizer? How do tokenizers affect the training of large language models? These are lecture notes from the AI Code and Coffee meetup on 2024-03-06 (https://www.facebook.com/events/1058526405447...
Deploying Llama2 on A100 GPUs using vLLM
Meta’s Llama2 is a state-of-the-art, open-weight large language model that you can host yourself and use for commercial purposes. Its open-sourced weights and permissive commercial licensing mean ...
Using GPT-4 to generate git logs for open source projects in the style of Conventional Commits via a terminal
Git commit logs can be tedious to write… but are useful for long-term maintenance and code audits. They are especially useful for triaging production issues after a release, e.g. a menu component ...