I’m going to break this down into two main sections: AI Models and AI Harnesses. I don’t want you to confuse models with their harnesses. For example, I’ve seen people in person confuse Claude Code (a harness) with the Claude models it runs.
AI Models
I consistently experiment with models, to the point where I can reliably tell you my go-to models for specific tasks.
Planning
- GPT 5.4 - Honestly the best model for planning and coding. It can infer my intent from my prompt without many planning turns.
- Claude Opus 4.6 - Personally, I’d love to rank this model at number one. However, due to low rate limits and my need to experiment with different providers, I can’t commit to more than the $20/month plan. What I love about Opus is its ability to ask clarifying follow-up questions. This gives me better insight into how the model interprets my prompts.
- MiniMax M2.7 - This is honestly my favorite model when I’ve hit my usage limits and need a cheap model through OpenRouter. Of the open source models, I find this one to be the best at long, complex plans: with the right prompting, it produces them without many hallucinations.
Building/Coding
- GPT 5.4 - Tends to one-shot tasks after generating a plan and, in my experience, handles more edge cases than most models.
- Composer 2 - The speed and intelligence of Cursor’s RL-trained take on Kimi K2.5 are impressive. Kimi K2 Thinking was one of my favorite models before I tried MiniMax M2.5, so Cursor’s decision to fine-tune an existing model specifically for coding has been excellent. I’ve wanted a coding-focused model for a long time, and it finally exists. This is one of my favorite models for both detailed implementations and small, quick changes at an extremely low cost.
Frontend
- Claude Opus 4.6 - The best model for one-shotting a few different designs. It gets even better with the front-end design skill and will create amazing starting points for you to build upon.
- Gemini 3.1 Pro - This model is underrated for design work. However, it typically requires more guidance to arrive at a beautiful base. This model is worth experimenting with, and I’ll be posting an overview of frontend designs generated with it in the future.
File Handling
- Gemini Models - Depending on the complexity and the file sizes, I will opt for the Flash or Pro variant. Gemini consistently outperforms other models when I need it to search and summarize text from attachments. I assume this comes down to the way Google trains its models and their large context window.
Writing and Researching
- Claude Sonnet 4.6 - Claude models are severely underrated for writing and brainstorming. Sonnet hits a sweet spot between intelligence and cost, helping you brainstorm ideas without overcorrecting, unlike some other models (cough cough GPT models). Sonnet asks clarifying questions to understand your intent and gives you a solid starting point. It’s also excellent for researching ideas, and I’d love to see Anthropic release a research tool like NotebookLM.
- Claude Haiku 4.5 - Easily one of my favorite models for quick text work, whether it’s grammar fixes, drafting emails, or quickly researching resources and topics.
AI Harnesses
I have tried several harnesses, specifically Codex CLI, Gemini CLI, Claude Code, Cursor, and OpenCode. With the exception of Codex CLI and OpenCode, they all suffer from the same issue: some of the most horrendous performance I’ve seen from a TUI app. I cannot overstate how poorly Gemini performs in its own harness, and you will end up blowing through tokens you shouldn’t have to. As for performance, I cannot recommend Cursor, even after Cursor 3, or Claude Code. Both harnesses suffer from some of the worst memory leaks I’ve ever seen, and after each update I can’t tell whether either one will work reliably. Here are the harnesses I can actually recommend:
- OpenCode - Honestly the most reliable harness, and being provider-agnostic is its biggest plus for me.
- Codex - A great CLI and the only one from a frontier model provider that I can recommend. This would be my top pick, but I do not like being restricted to a single provider.
Here’s What’s on My Radar as of Writing
- Pi - A new terminal coding harness comparable to OpenCode that is meant to be hackable.
- OpenClaw - I’m still experimenting with this tool and trying to find a use case for it.
- NotebookLM - A notebook study tool that has a pretty simple pipeline: enter a source -> ask questions -> make study material.
- Fabs Chat - I snuck this in here even though I made it. You should try it out and let me know how it works.