Testing an LLM's ability to recall information from a long context.
I created a benchmark to test how well LLMs can recall information from a long context window, and ran it against gpt-4o and claude-2.1.
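At its core this is a needle-in-a-haystack test: bury a fact at a known depth in filler text, then ask the model to retrieve it. A minimal sketch, assuming an OpenAI-compatible client; the needle, filler, and depth values here are placeholders, not the benchmark's real data:

```python
# Minimal needle-in-a-haystack sketch (illustrative, not the real harness).
from openai import OpenAI

client = OpenAI()

NEEDLE = "The secret passphrase is 'cobalt-raven-42'."
FILLER = "The quick brown fox jumps over the lazy dog. " * 2000

def build_haystack(depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)."""
    cut = int(len(FILLER) * depth)
    return FILLER[:cut] + NEEDLE + FILLER[cut:]

def recall_test(depth: float, model: str = "gpt-4o") -> bool:
    """Ask the model to retrieve the buried fact; return True on success."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": build_haystack(depth) + "\n\nWhat is the secret passphrase?",
        }],
    )
    return "cobalt-raven-42" in response.choices[0].message.content

# Sweep the needle through the context and record hits per depth.
results = {d: recall_test(d) for d in (0.0, 0.25, 0.5, 0.75, 1.0)}
print(results)
```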
Handcrafted bronze maps of American terrain.
In need of a physical project, I handled the whole pipeline: 3D printing, casting in bronze, and hand-finishing maps of US mountains.
Visualizing different chunking strategies.
LLMs do better with shorter context windows, so retrieval pipelines split documents into chunks. I built a tool to visualize how different chunking strategies split your text, to help you pick the one that best fits your use case.
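For a sense of what the tool compares, here is a minimal sketch of two common strategies, fixed-size chunking with overlap and sentence-aware packing; these are illustrative plain-Python versions, not the tool's actual code:

```python
# Two illustrative chunking strategies (not the tool's implementation).
import re

def fixed_size_chunks(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split every `size` characters, carrying `overlap` characters forward."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def sentence_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks of at most `max_chars` characters."""
    chunks, current = [], ""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Fixed-size chunking is simple and predictable but can cut sentences in half; sentence-aware packing keeps each chunk readable at the cost of uneven chunk sizes.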
Benchmarking LLMs through multi-headed Snake games.
50 LLMs battle it out on Snake.
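A single turn boils down to serializing the board state and asking each model for a move. A minimal sketch, assuming an OpenAI-compatible client; the prompt and move validation are illustrative, not the benchmark's actual harness:

```python
# One illustrative turn of an LLM-driven Snake game (not the real harness).
from openai import OpenAI

client = OpenAI()

def next_move(board_ascii: str, model: str = "gpt-4o") -> str:
    """Ask a model for its snake's next move, given an ASCII board."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You control snake 'A'. Reply with exactly one of: "
                        "UP, DOWN, LEFT, RIGHT."},
            {"role": "user", "content": board_ascii},
        ],
    )
    move = response.choices[0].message.content.strip().upper()
    # Fall back to a default if the model replies with anything else.
    return move if move in {"UP", "DOWN", "LEFT", "RIGHT"} else "UP"
```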
If you don't have 85 terminals open, are you even vibing?
A meme site that lets you open as many vibe terminals as you want.