I'd been building AI memory and orchestration tools in my free time for months — alongside a full-time job and side projects. Small experiments. Rough prototypes. Things that worked just well enough to keep going.
In November 2025, I decided to triple down. Go all in on the tools I'd been tinkering with. Stop treating them as side projects and start treating them as the project.
The bet: if I get the foundations right, everything built on top will be faster.
I could see glimpses that it was working. Sessions were smoother. Context carried over. The doubt was real, but so were the signs.
Three months later: 905 contributions in a single month.
The Problem Nobody Talks About
AI coding assistants are brilliant—for exactly one session.
Ask Claude Code (Anthropic's AI coding assistant) to help you build something? Incredible. It writes clean code, reasons through architecture, catches bugs. But close the session and open a new one?
Gone. All of it. Every decision. All the context. Every lesson learned.
It's like pair-programming with a senior developer who gets amnesia every morning.
A friend—vvarp—was already keeping a devlog for his own AI sessions. Simple notes on what was done each day, so he could discuss progress with his cofounder. He suggested I try something similar: keep a markdown file that the AI reads and writes to between sessions. A persistent scratchpad. Nothing fancy.
I tried it. The difference was immediate. So I kept pushing the idea further.
Not just a scratchpad — daily session logs. Decisions with reasoning. Research notes. Plans that could be diffed against later. Retrospectives to analyze what actually worked and what didn't. The goal wasn't just memory — it was a feedback loop to get better at how I work.
Building the Tools (Nov 2025)
What started as a single markdown file became devlog-mcp—an open-source memory system for AI development.
It grew into something much bigger than a scratchpad. Across 11 months: nearly 2,000 files, grouped by what they prevent:
Never lose context (1,311 files) — 670 session logs + 641 daily logs. Three memory layers: current.md for live state, currentWeek.md for recent context, SQLite for long-term searchable history. Every session picks up exactly where the last one left off.
Reduce repeated debates (389 files) — 183 formal architecture decisions, 206 insights and verdicts. Some decisions still get revisited — live data changes, context shifts, old reasoning stops applying. But at least the AI can read what was decided before and why instead of starting from scratch every time.
Redo less research (217 files) — 181 research investigations, 36 AI model experiments. I still revisited things. I still second-guessed myself. But the devlog gave me data points to revisit with — not from scratch. Three hours figuring out how an undocumented API actually works? At least the next session could read what I already tried.
Second-guess with data (80 files) — Market research, competitive analysis, pricing models. I changed my mind plenty. But each time, the previous reasoning was there to build on or deliberately override — not just forgotten.
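The three memory layers under "never lose context" can be sketched roughly like this. A minimal, hypothetical version: the file names come from the post, but the SQLite schema and the `load_context` function are my own invention, not the real devlog-mcp API.

```python
import sqlite3
from pathlib import Path

# Hypothetical layout mirroring the post's description (not devlog-mcp's actual structure).
WORKSPACE = Path("devlog")
CURRENT = WORKSPACE / "current.md"           # layer 1: live session state
CURRENT_WEEK = WORKSPACE / "currentWeek.md"  # layer 2: recent context
DB = WORKSPACE / "history.db"                # layer 3: long-term searchable history

def load_context(query: str, limit: int = 3) -> str:
    """Assemble context for a new session: live state first, then the week, then search hits."""
    parts = []
    for layer in (CURRENT, CURRENT_WEEK):
        if layer.exists():
            parts.append(layer.read_text())
    if DB.exists():
        conn = sqlite3.connect(DB)
        rows = conn.execute(
            "SELECT body FROM logs WHERE body LIKE ? ORDER BY created DESC LIMIT ?",
            (f"%{query}%", limit),
        ).fetchall()
        conn.close()
        parts += [body for (body,) in rows]
    return "\n\n---\n\n".join(parts)
```

The point is the fallthrough order: cheap, always-loaded live state first; the searchable archive only when a query needs it.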
Plus the tooling that makes it work: semantic search across everything, automated entity extraction into knowledge graphs, multi-agent workspace locking, git context detection, plans as diffable contracts, and weekly retrospectives that compress sessions into patterns.
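"Plans as diffable contracts" is the piece of that list easiest to show concretely. A sketch of the idea, with made-up plan and outcome text — write the plan as markdown before building, then diff it against what actually shipped:

```python
import difflib

# Hypothetical contents; in practice these would be markdown files on disk.
plan = """\
## Plan: search feature
- add SQLite FTS index
- expose /search endpoint
- ship behind a flag
"""
outcome = """\
## Outcome: search feature
- add SQLite FTS index
- expose /search endpoint
- shipped to everyone (flag dropped)
"""

def plan_drift(planned: str, actual: str) -> list[str]:
    """Return only the changed lines between the plan and what actually happened."""
    return [
        line
        for line in difflib.unified_diff(
            planned.splitlines(), actual.splitlines(), "plan", "outcome", lineterm=""
        )
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]
```

Every deviation the diff surfaces is either a deliberate decision worth logging or scope creep worth catching.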
Plus a multi-model orchestration tool called tachibot-mcp — so different AI models could collaborate on the same problem, each bringing different strengths.
I was working every day — had been for the past year. Just not committing everything. A lot of the work was experimentation, prototyping, testing approaches that never hit a repo.
The question wasn't whether I was working hard enough. It was whether I was working on the right thing.
Then the Tools Kicked In
December 2025. The product I'd been building for the past twelve months wasn't new — I'd already put in months of real work on it, just without great tooling. Slow sessions, lost context, repeated investigations. A lot of the heavy lifting was already done the hard way.
Now the tools were ready. Same product, same codebase — but with memory that carried over, orchestration that worked, and parallel sessions that didn't fight each other. The difference was immediate.
My velocity didn't just increase. It exploded.
Monthly Commits
November: 120 contributions. February: 905. That's 7.5x growth in three months.
Peak day? 91 contributions across six different projects. How? By running multiple AI sessions in parallel—up to 28 in a single day:
Sessions / Day
On 17 different days, I ran 11+ parallel sessions. The memory system made that possible—without it, those agents would have trampled each other.
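What keeps parallel agents from trampling each other can be as simple as an atomically-created lockfile per workspace. A minimal sketch — my own illustration of the pattern, not devlog-mcp's actual implementation (paths and function names are invented):

```python
import os
import time
from pathlib import Path

LOCK_DIR = Path("devlog/locks")  # hypothetical location

def acquire(workspace: str, agent: str, timeout: float = 5.0) -> bool:
    """Try to claim a workspace for one agent via an atomically-created lockfile."""
    LOCK_DIR.mkdir(parents=True, exist_ok=True)
    lock = LOCK_DIR / f"{workspace}.lock"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # O_CREAT | O_EXCL makes creation atomic: only one agent can win the race.
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.write(fd, agent.encode())
            os.close(fd)
            return True
        except FileExistsError:
            time.sleep(0.1)  # another agent holds the lock; retry until timeout
    return False

def release(workspace: str) -> None:
    (LOCK_DIR / f"{workspace}.lock").unlink(missing_ok=True)
```

With a guard like this, 11 concurrent sessions degrade gracefully into a queue per workspace instead of into conflicting writes.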
The months I spent building tools weren't wasted. They were the highest-leverage investment I've ever made.
What Nearly 2,000 Files Actually Taught Me
Velocity is just the headline. The real story is what changed about how I work.
1. Friction Is the Raw Material
Every debugging rule started as a wasted day. 50+ rules accumulated over 11 months. Each cost roughly a day to discover. Each now applies in seconds, across every future session.
The growth metric isn't how many bugs you fix — it's how quickly you turn a mistake into a rule that prevents the next one.
Win: Mistakes become permanent prevention. Tax: You have to write the rule down before you forget the pain.
2. Every Crisis Follows the Same Algorithm
Seven crises across the project. Same loop every time: surface the problem → investigate deeply → get multiple perspectives → document the verdict → fix → create a preventive rule.
The loop got faster each time — not because crises got easier, but because the pattern was practiced. Crisis stopped being disruption and became the primary way the system learned.
Win: A repeatable playbook that gets faster with each use. Tax: You must investigate before you fix — no panic-coding.
3. Externalize Your Executive Function
A solo developer has no team to check blind spots. No code review. No "hey, are you sure about this?" The logs and multi-perspective analysis serve that role — an artificial check on your own judgment.
Forcing a written verdict before writing code prevents exhaustion-driven decisions. It stops scope creep. It catches the moment when you're building what feels productive instead of what's actually needed.
Logging is how you argue with yourself — and win.
Win: You become your own senior engineer. Tax: Discipline to write before coding, every time.
4. Deliberation and Execution Are Different Modes
82 decision files in the peak month. 1 the month after. That's not a decline — that's the system working. Deliberation was front-loaded. Architecture was debated, contested, documented. Then the decision window closed and pure execution began.
Decide First, Build Second
The character of the work itself shifted over time — from deciding to building to shipping:
What I Was Actually Doing (by file type)
(The project didn't start in December — I'd been building it for over a year by then. These charts only show the final months, after the tools were refactored and working. The earlier work doesn't show up here, but it's the reason this phase moved so fast.)
You can't deliberate and execute at full speed simultaneously. The logs tell you which mode you're actually in — not which one you think you're in.
friction → investigation → verdict → rule → enforcement → less friction. Repeat 670 times.
Was It Worth It?
November to March. Five months. 2,874 contributions.
If I'd skipped the tools and maintained a steady 4 contributions per day (a generous estimate for a manual workflow), that would have been about 600 contributions.
Instead: 2,874. Nearly five times more. And accelerating.
Building tools felt like falling behind. It was actually the only way to get ahead.
A Note on Karpathy's LLM Wiki
Three days before I started writing this post, Andrej Karpathy—the AI researcher behind Tesla's Autopilot—tweeted about building a remarkably similar system.[1] An LLM compiling markdown files into an interlinked personal wiki. 45,000 likes. 13 million views.
I started building devlog-mcp eleven months before that tweet. Same core instinct—give AI persistent memory via files. Completely different application.
Karpathy's llm-wiki is a personal knowledge system—a beautiful approach to organizing articles, papers, and life notes in Obsidian. His four-phase cycle (Ingest → Query → Lint → Output) is an elegant framework, and his index.md routing is a smart solution for navigating curated knowledge at moderate scale. It's the right architecture for what it's solving.
devlog-mcp solves a different problem: active development memory. Session-aware state. Multi-agent coordination. Automated extraction. The kind of thing you need when AI agents are shipping code at midnight and you need to know what happened, what was decided, and why.
Same instinct. Same starting point — markdown files. Different directions from there.
And it's not just Karpathy.
The Timeline Nobody Coordinated
- May 2025 — I start building devlog-mcp after a friend's suggestion
- Aug 2025 — "Open-sourced" it on GitHub (if 1 star and 1 fork counts as open source — hi mom)
- Nov 2025 — I build tachibot-mcp for multi-model orchestration
- Nov 2025 — Claude Code adds filesystem-based plan files
- Nov 2025 — OpenClaw ships with MEMORY.md + daily markdown logs
- Mar 2026 — Superpowers (the most popular Claude Code plugin) formalizes a writing-plans skill
- Mar 2026 — Claude Code launches built-in auto-memory
- Apr 2026 — Karpathy tweets about markdown-based LLM wikis. 45K likes. 13M views.
- Apr 2026 — devlog-mcp: 1 star. tachibot-mcp: 15 stars. Combined: still fewer than a medium blog post about CSS. But the idea vvarp sparked — and I spent 11 months obsessing over — turned out to be right.
It's a Stand Alone Complex — the same idea emerging everywhere, without coordination. Nobody copied anybody. There's no single origin point.
Why does everyone keep reaching for markdown? Because it's what LLMs already speak. They read it, they write it, no serialization, no schema migrations, no wasted tokens on format conversion. It's just the path of least resistance between an LLM and a file.
When life gives you lemons, you make lemonade.
When LLMs give you hallucinations, you make .md files — to remember some version of the truth.
References
[1] Andrej Karpathy — Tweet about LLM-based personal knowledge bases (April 2, 2026)
[2] Initial file-based approach suggested by vvarp — thanks for the spark
[3] devlog-mcp — AI memory system for development (open-sourced August 2025)
[4] tachibot-mcp — Multi-model AI orchestration
[5] Andrej Karpathy — llm-wiki Gist (April 4, 2026)
[6] supertemplates.ai — The product these tools were built to ship
