Stop Asking Which Agent Is Best. Ask These 5 Questions Instead.

Summary

Nate B Jones argues that teams should stop asking which AI agent is “best” in general and instead evaluate launches with five infrastructure-focused questions: does it plug into existing tools, let other agents build on top of it, access important data, show signs of an ecosystem, and allow agents to stack on top. By that filter, the most important launches are usually not the flashiest demos or highest benchmark scores, but the ones that improve how agents reach real systems of work.

He applies that framework across five current examples. ChatGPT Workspace Agents fit shared, repeatable cross-tool workflows that run naturally from ChatGPT or Slack. Salesforce Headless 360 matters because it turns Salesforce into agent-ready infrastructure through APIs, MCP tools, and ecosystem support. Microsoft Copilot Wave 3 is strongest for Microsoft-native organizations because Work IQ gives it deep access to the 365 data graph, though it is less open to outside agent ecosystems. Kimi K2.6 is strategically important for developer teams that can self-host open-weight agent infrastructure, but it is not the default answer for most business teams. Perplexity Personal Computer is best for research-heavy work that needs to become a polished artifact, especially with new local Mac access.

The larger conclusion is that the market is moving from “switching” between one default assistant and another toward deliberate layering. Teams should keep their default tools where they are strongest, add specialized wrappers when those wrappers provide unique access to data or workflow context, and route work based on the shape of the task rather than on model hype alone. The practical skill that compounds now is not choosing one winner, but learning which agent layer fits which job.

Transcript

[00:00] Every week, another AI agent launches. [00:02] Just in the last couple of weeks, OpenAI [00:04] shipped Workspace Agents. Anthropic put [00:06] Claude Managed Agents into beta. [00:08] Salesforce turned the entire platform [00:10] into Headless 360 for agents. Perplexity [00:13] put Personal Computer on the Mac. [00:14] Moonshot dropped Kimi K2.6 with a 300-agent [00:17] swarm. Another company says, “This [00:19] is the one that changes everything.” [00:21] You’ve heard it before. Another [00:22] benchmark chart goes around. You’ve seen [00:23] it before. Another founder posts a demo [00:25] where the agent does the entire job [00:27] while everyone in the replies argues [00:28] about whether it was real. And if you’re [00:30] leading a team right now, the reaction I [00:32] keep hearing from folks I know is [00:34] not excitement; it’s exhaustion. The [00:36] question is not what launched this week. [00:38] The question now is: which of these [00:40] millions of things actually deserves an [00:42] afternoon of my team’s attention? My [00:44] answer is that you need a filter, because [00:46] the agent conversation has quietly moved [00:48] from model quality to infrastructure. [00:51] The launches that matter are not always [00:53] the ones with the best benchmark score [00:54] or the loudest demo. The launches that [00:56] matter are the ones that change what [00:58] your existing tools can reach, what your [01:00] agents can do, and how easily your team [01:03] can stack those systems together. [01:04] So I’m going to give you the five-question [01:06] filter I use for every agent [01:08] launch. And then I’m going to run five [01:09] representative releases through it: [01:11] ChatGPT Workspace Agents, Salesforce [01:13] Headless 360, Microsoft Copilot Wave 3, [01:16] Kimi K2.6, and Perplexity Personal [01:19] Computer.
The Claude Managed Agents [01:20] piece is going to come back near the end [01:22] because it helps explain the bigger [01:24] shift underneath all of this. Claude is [01:26] no longer just a product you switch to [01:28] or away from. It’s becoming a direct [01:30] product, an embedded engine inside other [01:32] people’s products, and now a managed [01:34] infrastructure layer for teams building [01:36] their own agents. And at the end, I want [01:38] to answer the question underneath all of [01:40] this, which is usually phrased as: when [01:42] should I switch from Claude, or when [01:44] should I switch from ChatGPT, or do I [01:46] switch from Copilot? I think that [01:47] question is framed wrong. This is not a [01:49] switching question anymore. It’s a [01:51] layering question. Okay, let’s get to [01:53] that filter. The filter I use has five [01:56] questions. First, does this plug into [01:59] the tools my team already uses, or does [02:01] it expect me to move my work to a new [02:03] environment? That sounds simple, but it [02:05] eliminates a huge number of launches [02:07] right away. The best agent news is [02:09] infrastructure news. It gives the agents [02:11] you already use a better way to reach [02:13] the places where your work lives. The [02:15] worst agent news is a new destination [02:17] your team is supposed to migrate to. [02:19] We’ve already lived through a decade of [02:20] SaaS proving that migration is the most [02:23] expensive thing you can ask a team to [02:24] do. People do not want another place to [02:27] move their work. They want the work to [02:28] become easier where it happens. Second, [02:30] does this let other agents build on top [02:33] of it, or is it a closed product? If I [02:35] can point Claude Code at it, or Codex, or [02:37] Cursor, or a custom internal agent, [02:39] that’s infrastructure. If it only works [02:41] as its own standalone experience, it’s a [02:43] feature.
Features commoditize; [02:45] infrastructure compounds. Third, does it [02:47] own or access data I care about? Agent [02:50] quality is downstream of data access. A [02:52] mediocre agent with your full customer [02:54] history is often more useful than a [02:55] brilliant agent staring at an empty [02:57] context window, which is why Copilot [02:59] matters inside Microsoft 365 companies. [03:02] It’s also why Salesforce matters inside [03:04] revenue organizations. It’s why agents [03:06] that look boring from the outside can be [03:08] extremely valuable inside the systems [03:10] where the work actually happens. Fourth, [03:12] is there an ecosystem forming around [03:14] this? One-off launches fade; [03:16] ecosystems compound. [03:18] Marketplaces, SDKs, partner programs, [03:20] developer tools, consistent shipping [03:22] cadences: those are all things to watch [03:24] for. Those are the things that tell you [03:25] whether a release is going to stick [03:27] around six months from now. So a product [03:29] with a growing marketplace is very [03:31] different from a product with a press [03:32] release. Fifth, can I stack my agents on [03:35] top? This is one that people often [03:37] forget, because a release that lets me [03:38] compose with other agents is much more [03:41] valuable than a release that simply adds [03:43] one more agent I need to evaluate [03:45] against all the others. The first one [03:47] multiplies. The second one just adds [03:49] one more agent to check. Run any launch [03:51] through those five questions and the [03:53] noise gets quiet, because most launches [03:56] fail those tests. The ones that pass [03:58] are worth an afternoon of looking [03:59] into. Anything else can go [04:01] into another pile to look at when you’re [04:03] bored on a Friday afternoon. Now, let’s [04:05] run the current agent news through [04:07] exactly that filter and see what passes [04:09] the test.
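The five questions work as a yes/no checklist, so a minimal sketch can make the pass/fail logic concrete. This is an illustration, not a tool from the talk: the `LaunchAssessment` class, its field names, and the rule that only a clean sweep earns an afternoon are assumptions. The one scoring taken from the transcript is Salesforce Headless 360, which the speaker rates “yes” on all five questions.

```python
from dataclasses import dataclass, fields


@dataclass
class LaunchAssessment:
    """Yes/no answers to the five filter questions for one agent launch.

    Field names are hypothetical shorthand for the questions in the talk.
    """
    name: str
    plugs_into_existing_tools: bool    # Q1: works where the team already works?
    lets_other_agents_build: bool      # Q2: can Claude Code, Codex, Cursor, etc. use it?
    accesses_data_we_care_about: bool  # Q3: owns or reaches data that matters?
    has_ecosystem: bool                # Q4: marketplace, SDKs, partners, steady shipping?
    lets_agents_stack: bool            # Q5: can our existing agents compose on top?

    def failed_questions(self) -> list[str]:
        # Every question (boolean field) answered "no", in declaration order.
        return [
            f.name
            for f in fields(self)
            if f.name != "name" and not getattr(self, f.name)
        ]

    def verdict(self) -> str:
        # Assumed rule: only a clean sweep earns an afternoon of the team's time;
        # a partial pass is parked unless the failed questions don't matter to you.
        failed = self.failed_questions()
        if not failed:
            return "worth an afternoon"
        return "park it: fails " + ", ".join(failed)


# The talk scores Salesforce Headless 360 "yes" on all five questions.
headless_360 = LaunchAssessment("Salesforce Headless 360", True, True, True, True, True)
print(headless_360.verdict())  # worth an afternoon

# A hypothetical standalone demo that answers "no" everywhere.
standalone_demo = LaunchAssessment("standalone demo", False, False, False, False, False)
print(standalone_demo.verdict())
```

Returning the list of failed questions, rather than only a score, matches how the talk treats mixed results such as Copilot or Perplexity: a partial pass is a question of which audience the failures matter to.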
Start with ChatGPT Workspace [04:11] Agents, because this is the one that put [04:13] the question in front of a lot of teams [04:14] these past couple of weeks. OpenAI’s move [04:16] here is very clear. Workspace agents are [04:19] not just another version of custom GPTs. [04:21] They are shared, Codex-powered agents for [04:24] teams. They run in the cloud. They can [04:25] work across connected tools. They can be [04:27] used in ChatGPT or Slack. They can be [04:29] scheduled, and they’re built for [04:31] repeatable business workflows rather [04:33] than one-off sessions. That matters. The [04:35] interesting shift is that we’re moving from [04:37] an “I have a helpful personal assistant” [04:39] model of agents to an “our team has a [04:41] reusable work unit” model of agents: a [04:44] product feedback routing agent, a weekly [04:46] metrics reporting agent, a risk [04:48] screening agent, a software request [04:49] triage agent. Those aren’t magic [04:51] examples. Those are exactly the kinds of [04:53] repetitive workflows companies keep [04:55] trying to automate and then failing to [04:57] maintain, because the process lives [04:58] across Slack and email and documents and [05:00] spreadsheets and ticketing systems and [05:02] somebody’s memory. Workspace agents pass [05:04] the filter for a specific category of [05:06] work: recurring team workflows where the [05:08] conversational builder, the Slack [05:10] surface, the cloud execution, the [05:11] permissions, the approvals, and the [05:12] shared agent directory matter more than [05:15] having native control of a system of [05:17] record. That’s a real category, but it’s [05:18] not every category. If your workflow, [05:20] for example, is deeply native to [05:22] Salesforce, then Salesforce has the [05:24] data advantage. If your workflow is [05:26] deeply native to Microsoft 365, Copilot [05:29] might have the data advantage.
If your [05:31] workflow is frontier coding, you [05:33] probably still care more about the [05:34] coding agent than the workspace wrapper. [05:36] So I think the right way to think about [05:37] Workspace Agents is not “this replaces [05:40] every other agent.” The right way to [05:41] think about it is: this is OpenAI’s [05:43] strongest answer for shared, repeatable, [05:45] cross-tool work that teams want to run [05:47] from ChatGPT or Slack. And that’s [05:49] just one of the five cases. The second [05:51] one is less flashy, but it may matter [05:53] more. Salesforce Headless 360 is the [05:56] launch most people are probably going to [05:58] forget about, to be honest. The name [06:00] makes it sound like platform plumbing, [06:01] and it is platform plumbing. That’s [06:04] the point. Salesforce announced Headless [06:06] 360 at TrailblazerDX, and the important [06:08] part is this: every major capability [06:11] across the Salesforce platform is being [06:13] exposed as an API, an MCP tool, or a CLI [06:16] command. That means the browser user [06:18] interface is no longer the only way to [06:20] use Salesforce. Agents can reach into [06:22] Salesforce directly. Coding agents can [06:24] work with live org context. External [06:26] tools can call Salesforce workflows [06:28] without a human clicking through the [06:29] interface. Parker Harris, Salesforce’s [06:31] co-founder, apparently put the [06:33] question this way: “Why should you ever [06:35] log into Salesforce again?” Headless 360 [06:37] is the answer. The numbers matter here [06:39] because they show the shape of the [06:41] strategy Salesforce is going after. [06:42] They’re building more than 60 new MCP [06:44] tools and more than 30 [06:46] preconfigured coding skills, with support for [06:48] tools like Claude Code, Cursor, Codex, [06:50] and Windsurf, plus an experience layer that [06:52] separates what an agent does from where [06:53] its output appears.
So the same [06:55] underlying agent can render across Slack, [06:57] mobile, Teams, ChatGPT, [06:59] Claude, Gemini, and any other MCP-compatible [07:01] client. And then you have [07:03] AgentExchange, which pulls together the [07:04] Salesforce app ecosystem (Slack [07:06] apps, Agentforce agents, tools, and MCP [07:09] servers) in one marketplace. You also [07:10] have the builder fund behind the [07:12] ecosystem. And that’s why this matters [07:14] more than the headlines might suggest. [07:16] Salesforce is not launching an agent. [07:18] Salesforce is trying to become [07:19] infrastructure under the agent economy. [07:22] So many companies run on Salesforce. If [07:24] your company runs RevOps on Salesforce, [07:26] the question is no longer “should we use [07:28] Agentforce or Workspace Agents for CRM [07:30] work?” The question is: which of the [07:32] agents we already use can do CRM work [07:34] now that Salesforce finally exposes its [07:36] data properly? And the answer is maybe [07:38] all of them. A coding agent can now build [07:40] Salesforce apps with live data. [07:42] A workspace agent can update opportunities [07:43] through the right permissions. A custom [07:45] internal workflow can trigger Salesforce [07:47] flows. A Slack-native agent can act on [07:49] CRM data without asking a human to copy [07:51] and paste. So if you run Headless 360 [07:54] through my five-question filter, it [07:55] scores extremely well. It plugs into an [07:57] existing system where enterprises [07:59] already live. It’s explicitly open to [08:01] other agent frameworks. It owns the data [08:03] revenue teams care about. It has a real [08:05] ecosystem, and it is designed for other [08:07] agents to stack on top of it. This is [08:09] what infrastructure looks like in the [08:11] agent era. And there is one sleeper [08:13] detail inside the Salesforce [08:14] announcement that connects to a much [08:16] larger trend.
Agentforce Vibes, [08:18] the Salesforce development environment, [08:20] uses Claude Sonnet 4.5 as its default [08:22] coding model, with GPT-5 available as an [08:25] option through multi-model support. [08:26] That’s part of a much larger pattern. [08:29] Anthropic’s enterprise strategy [08:31] increasingly looks less like “build a [08:33] standalone product that replaces [08:35] everything” and more like “be the agent [08:38] layer inside other people’s products.” [08:39] You see it in Salesforce. You see it in [08:41] Microsoft. You see it in Perplexity. And [08:43] that brings us to the third case we’ll [08:46] look at today: Microsoft Copilot Wave 3. [08:48] I know a bunch of you think I don’t [08:49] cover Copilot, but I do cover Copilot. [08:51] That’s still the story a lot of people [08:53] are missing. It predates the last week [08:54] of announcements, but it belongs in the [08:56] conversation because it shows the other [08:58] version of the same infrastructure [09:00] shift. The two pieces that matter are [09:02] Copilot Cowork and Work IQ. Copilot [09:05] Cowork brings long-running, multi-step [09:07] agent execution into 365. Microsoft [09:10] built it by working closely with (guess [09:12] who?) Anthropic, bringing Claude-style [09:14] agent tech into the Copilot surface. [09:16] Work IQ is the data layer. It gives [09:18] Copilot access to the full context of [09:20] work inside Microsoft 365: [09:23] email, meetings, chats, files, [09:25] SharePoint pages, identity, [09:27] permissions, and organizational context. [09:29] The data access is the point, and it’s [09:31] also the moat. A ChatGPT workspace [09:33] agent reaching into SharePoint through a [09:35] connector absolutely can do useful work. [09:37] But Copilot sitting natively inside [09:39] SharePoint and inside Outlook and Teams [09:41] and Excel and PowerPoint and the [09:42] Microsoft identity system is different [09:44] for Microsoft-native enterprises.
This [09:46] is not a small difference. The agent is [09:48] not just connecting to a file; it is [09:50] operating inside the organizational [09:52] graph. This is where the filter helps, [09:54] because Copilot is not equally strong on [09:56] every axis. It is very strong on data [09:59] access for Microsoft 365 shops. It is [10:01] very strong on native permissions and [10:03] enterprise governance. It is strong when [10:05] the work lives in Excel, Outlook, [10:06] SharePoint, Teams, PowerPoint, and the [10:08] surrounding Microsoft stack. But it is [10:10] weaker on openness to external agents. [10:12] It is harder to point outside agent [10:14] frameworks at Copilot than it is to [10:16] point them at something like [10:17] Salesforce’s MCP layer. The ecosystem [10:19] energy is very different: it’s closed. [10:21] And for coding-heavy workflows, most [10:23] serious engineering teams are not going [10:24] to touch Copilot. They’re going to go [10:26] for Codex or Claude Code. So Copilot [10:28] Wave 3 passes the filter for a specific [10:31] audience and fails it for others. If [10:32] your team’s work mostly lives in [10:34] Microsoft 365 and mostly is not [10:36] production engineering, then Cowork can [10:38] be a significant agentic product you [10:40] should evaluate. But if your team’s work [10:42] crosses ecosystems, depends heavily on [10:44] coding, or needs external agent [10:45] composability, Copilot’s native data [10:47] advantage just isn’t worth the trade-off. So the [10:49] question is not “is Copilot good?” The [10:52] question is really: for which shape of [10:54] work is Copilot the right layer? Now the [10:56] fourth case is almost the opposite. It [10:58] gets a lot of press attention because [11:00] the model is very impressive, but for most [11:02] enterprise teams it matters less than [11:04] the infrastructure launches. Kimi K2.6, [11:07] which just launched, is very technically [11:09] impressive.
Moonshot released K2.6 as an [11:12] open-weights model under a modified MIT [11:14] license. The model card frames it as a [11:16] native multimodal agentic model built [11:19] for long-horizon coding, design, [11:20] autonomous execution, and swarm-based [11:23] orchestration. The headline capability [11:25] is the swarm architecture: 300 sub-agents [11:28] coordinating across up to 4,000 steps. [11:30] The published evaluations show very [11:32] strong coding and agentic benchmark [11:34] results. And because the weights are [11:35] available, serious teams can self-host [11:37] it instead of sending work to a closed [11:39] provider. That’s a big deal. But [11:41] this is where the filter keeps you from [11:43] being hypnotized by a bunch of new [11:45] benchmarks. For enterprise buyers, Kimi [11:47] does not pass the filter the way [11:50] Salesforce, Microsoft, OpenAI, or [11:52] Perplexity do. It does not own your [11:54] work graph. It does not sit natively [11:56] inside 365 or Salesforce. It does not [11:58] have the same Western enterprise [12:00] connector story. It is not primarily [12:02] solving the problem of how my [12:04] existing team routes recurring work [12:06] through the tools we use. It’s solving a [12:07] different problem: dev teams [12:09] building their own agent infrastructure. For them, [12:11] K2.6 is a real option. Open weights under [12:13] a modified MIT license means you can run [12:15] it on your own hardware, keep data [12:16] inside your environment, fine-tune or [12:18] inspect it, and avoid being locked into a [12:20] closed lab for every long-running agent [12:22] workflow. And for that team, [12:24] Kimi matters a lot. But for the [12:26] Western prosumer or business team asking [12:28] “should I use a hosted Kimi product?”, the [12:30] answer is definitely no. Not because the [12:32] model is weak; the model is strong. The [12:34] issue is the product and infrastructure [12:36] context around it.
If you’re typing [12:38] sensitive company work into a hosted [12:39] product where you are not comfortable [12:41] with the data path, the model’s [12:42] benchmark score is not the point. The [12:44] deciding variables are trust, [12:45] governance, data boundaries, [12:47] connectors, and whether the product fits [12:49] the environment your team operates in. [12:51] Those are different cases. The [12:52] self-hosting developer team is [12:54] evaluating Kimi as infrastructure. The [12:56] casual hosted user is [12:58] evaluating Kimi as a workplace product. Totally [13:00] different worlds. And that’s why Kimi [13:02] matters strategically but is often not [13:03] the default answer for most teams [13:05] watching this video. It tells you [13:07] that open-weight agent models are moving [13:08] quickly. It tells you that long-horizon [13:11] agent architecture is advancing outside [13:12] the Western labs. It tells you that [13:14] teams with the capacity to run their own [13:16] stack have more credible options than [13:18] they did a few months ago. But if your [13:20] question is “What should my sales, ops, [13:22] finance, or product team use next week?”, [13:24] Kimi is almost certainly not the [13:25] product to evaluate. And that brings us [13:27] to the fifth case, which is almost the [13:29] mirror image: less model-centric, [13:31] more workflow-centric. [13:33] Perplexity Personal Computer on the Mac [13:35] quietly closed one of the biggest gaps [13:37] in the Perplexity Computer ecosystem. [13:39] They’ve been launching really smartly [13:40] recently. Perplexity Computer already [13:42] existed as a digital worker [13:45] in the vein of OpenClaw, but [13:47] it had a problem. It could feel like a [13:49] cloud assistant that knew a lot, [13:51] searched well, and produced work, but did [13:53] not fully live on your machine and did [13:55] not have local file access.
The Mac [13:57] rollout changes that. Personal Computer [13:58] adds local file editing, local computer [14:00] use, local browsing through Comet, voice [14:03] orchestration, and deeper control over [14:04] work happening in the background. Perplexity [14:06] also made Claude Opus 4.7 the default [14:09] orchestrator model for Computer, with [14:11] other model options available. That [14:13] matters because the product category [14:14] becomes easier to understand. Perplexity [14:17] is not just giving you a chatbot that [14:19] searches. It’s trying to give you a [14:20] digital worker that researches, [14:22] reasons, browses, edits files, [14:24] creates artifacts, and moves across [14:26] connected apps. Run it through our five-question [14:28] filter and the score is mixed [14:29] but useful. It’s strong on connectivity: [14:31] Perplexity has a broad connector surface, [14:33] and Computer is built around chaining [14:35] capabilities together, like research, [14:37] analysis, docs, slides, emails, [14:39] code, schedules, and follow-ups. [14:41] It’s stronger now on local access, [14:42] because the Mac app can touch files and [14:44] apps directly. It’s moderate on [14:46] ecosystem and team-level workflow [14:48] structure. It’s not the same thing as a [14:49] shared workspace agent in Slack. It’s [14:51] not the same thing as Copilot sitting [14:53] natively in Office 365. It’s not the [14:56] same as Salesforce exposing its trust [14:58] layer and business logic through MCP. [15:00] So Perplexity passes for a very [15:01] specific category: research-heavy work [15:04] that produces a deliverable, like [15:06] competitive intelligence, market [15:07] research, sales prospecting, [15:09] financial analysis, document review, or [15:12] weekly ops reports. Anything where the [15:14] work starts with gathering context from [15:16] many places and ends with a [15:17] polished artifact.
Now, it fails when [15:20] the job is a team-shared recurring [15:22] process that needs governance, [15:23] ownership, and repeatability across an [15:25] org. It fails when the work is deeply [15:27] native to Microsoft 365 or Salesforce [15:30] and the native graph matters more than [15:32] the research layer. This is not a [15:33] good-or-bad question; [15:35] it’s a routing question. And that’s the pattern [15:37] across all five of these examples. [15:39] Workspace Agents are for shared, [15:40] recurring workflows in ChatGPT and [15:42] Slack. Salesforce Headless 360 is for [15:45] agent access to CRM data, business [15:47] logic, and RevOps infrastructure. Copilot [15:49] Cowork is for Microsoft-native work [15:51] where Work IQ is a data advantage. Kimi [15:54] K2.6 is for teams that can use [15:55] open-weight agent models as [15:57] infrastructure. Perplexity Personal [15:59] Computer is for research-heavy work that [16:00] needs to become an artifact. Don’t force [16:02] one product to do every job. Assign the [16:05] work to the tool. This is the part most [16:07] teams skip, and it’s where a lot of [16:08] wasted license spend happens. Someone [16:11] buys the license, and then the company [16:12] tries to make that one product cover [16:14] every job class because adopting a [16:16] second tool is expensive. But the [16:17] expensive thing is not having multiple [16:19] tools. The expensive thing is routing [16:21] the work incorrectly. If you have a [16:23] research-heavy deliverable, Perplexity is [16:24] a great candidate. Ask it to compare [16:26] competitors, synthesize news, [16:28] build a market map, draft a report, [16:30] and turn it into something a person can [16:31] review. If your team lives in Microsoft [16:33] 365, Copilot can be a good candidate. [16:36] Ask it to operate inside the emails, [16:38] meetings, docs, spreadsheets, and [16:39] SharePoint sites where your org lives.
[16:41] If your work is coding or model-centered [16:43] reasoning, direct Claude, Claude Code, [16:46] Codex, Cursor, or specialist coding [16:47] environments are probably still your [16:49] home. The surrounding product matters, [16:51] but sometimes the model and the [16:52] developer workflow are the center of the [16:54] job. If your RevOps runs on Salesforce, [16:56] Headless 360 is a no-brainer. Not [16:58] because everyone needs to become a [16:59] Salesforce dev, but because the agents [17:01] you already use may now be able to act [17:03] inside the system that already contains [17:05] your customer, pipeline, [17:06] workflow, and permissions data. If [17:08] your team has repeatable cross-tool [17:10] workflows that live naturally in ChatGPT [17:12] or Slack, then Workspace Agents are [17:14] worth testing, especially when the [17:15] workflow needs to be shared, [17:17] scheduled, improved over time, and [17:18] owned by the team rather than a power [17:20] user. That is a practical answer, and it [17:22] leads directly to the question I keep [17:24] hearing in different forms: when should [17:25] I switch? When should I switch from [17:27] Claude, from ChatGPT, from Copilot, [17:29] from Gemini? The reason that question is [17:31] so tempting is that it feels very clean [17:32] to ask. Pick a default, move the team, [17:34] standardize the workflow. But the agent [17:36] market is not moving toward one default [17:39] agent for everything. It is moving [17:41] toward layers. The Claude-specific [17:43] version here is worth unpacking, because [17:44] it is one I hear often as Claude gets [17:47] into more and more teams. You may [17:48] already be using Claude without [17:50] switching to Claude directly. If your [17:52] company uses Microsoft Copilot Cowork, [17:54] Anthropic’s tech is part of that product. [17:56] If you use Perplexity Computer, Claude [17:58] Opus 4.7 is now the default [18:00] orchestrator.
If your Salesforce team [18:02] uses Agentforce Vibes, Claude Sonnet 4.5 [18:05] is the default model. That’s the point. [18:08] Anthropic’s enterprise strategy [18:10] increasingly looks like sitting inside [18:12] other companies’ agent stacks. The model [18:14] becomes a layer inside a product that [18:16] owns the data, the workflow, the [18:18] interface, the permissions, or the [18:19] marketplace. Claude Managed Agents adds [18:21] a third shape to this. That’s not Claude [18:23] as a chat product. It’s not Claude [18:25] hidden inside Microsoft or Salesforce or [18:27] Perplexity. It’s Anthropic saying: if you [18:30] want long-running agents on Claude but [18:32] you don’t want to build all the [18:33] infrastructure, let us give it to you as [18:35] a managed layer. So Claude now shows up [18:37] in at least three ways: [18:39] direct Claude, where the model is the product; [18:40] embedded Claude, where another vendor [18:42] owns the workflow and the data layer; and [18:44] managed Claude infrastructure, where [18:45] Anthropic gives teams a place to run [18:47] agent systems without treating chat as [18:49] the main interface. So the right [18:50] question is not really “should I [18:52] switch from Claude or to Claude?” Often [18:54] it is “which wrapper around Claude fits [18:56] the job?” And the framework here is bigger [18:58] than Claude. It applies whether your [19:00] default is ChatGPT or Copilot or [19:02] Gemini or something else. There [19:04] are really just three questions. [19:05] Question one: when should you stay in [19:08] your default agent’s direct product? Stay [19:10] there when the model is the center of [19:11] the work and the surrounding [19:13] integrations are secondary: coding, long-context [19:15] reasoning, novel research, [19:17] custom agents where you are building [19:18] your own workflow logic, tasks where you [19:20] want direct control over the model’s [19:22] behavior and the wrapper is not adding [19:24] a lot.
If you’re a Claude user, that’s why [19:26] Claude Code remains such an important [19:27] product for engineering teams and why [19:29] Cowork is valuable. If you’re a ChatGPT [19:30] user, direct ChatGPT or direct [19:33] Codex still makes sense for many [19:34] open-ended reasoning tasks that don’t [19:36] need to become a recurring team [19:38] workflow. If you’re a Copilot user, this [19:40] is often the bucket where you should [19:41] look outside Copilot, because Copilot’s [19:43] strongest advantage is the integration, [19:45] not the model. Question two: when should [19:47] you use a different product that happens [19:49] to run the same underlying model? Use [19:51] the wrapper when the wrapper gives you [19:53] data access or workflow integration you [19:55] cannot realistically reproduce yourself. [19:58] Copilot with Work IQ gives Anthropic-style [20:00] agent execution against the [20:02] Microsoft 365 graph in a way that direct [20:04] Claude with a connector cannot replicate. [20:06] Salesforce with Claude inside inherits [20:09] Salesforce permissions, metadata, [20:10] business logic, and trust boundaries. [20:12] Perplexity with Claude as the [20:13] orchestrator gives you a research and [20:15] artifact workflow that is different from [20:16] a blank Claude chat. In those cases, [20:18] you’re not really switching from your [20:20] default model. You’re moving the model [20:22] into the product layer that has the [20:23] right data fabric. Question three: when [20:25] should you use a product that runs a [20:27] different underlying model entirely? Use [20:29] the different model when the surrounding [20:31] product matters more than the marginal [20:33] difference in model quality. So ChatGPT [20:35] Workspace Agents may be the right choice [20:36] for a Slack-native recurring team [20:38] workflow even if another model would [20:40] write a slightly better paragraph in [20:41] isolation.
Gemini in Google Workspace [20:43] may be the right choice for teams that [20:45] live in Google, because it inherits the [20:46] graph. Self-hosted Kimi K2.6 may be [20:49] the right choice for a dev team that [20:51] wants open-weight agent capability [20:53] without a closed-lab dependency. So the [20:55] model matters, but the model is no longer [20:57] the only thing that matters. The [20:58] wrapper, the graph, the permissions, the [21:00] connectors, the workflow surface, the [21:02] ecosystem, and stackability often matter [21:04] more. That’s the reframe your team needs [21:06] to get on board with. The switching [21:07] costs are real. Prompts do not transfer [21:10] cleanly. Memory and context don’t port [21:12] cleanly. Skills are more portable than [21:13] they used to be, but they’re not [21:15] magically plug-and-play. Team habits [21:17] matter a lot. If your team has spent six [21:19] months learning how to specify work to [21:21] one agent, moving them to another [21:23] product can restart a lot of that [21:24] learning. So don’t switch casually. [21:27] Layer deliberately. Keep your default [21:29] where it works best. Add specialists [21:31] where the specialist clearly wins, [21:33] and build the judgment to route work [21:35] based on the shape of the task. That [21:37] judgment is the new literacy of the [21:39] agent era. The agent layer is not one [21:42] product category. That’s the trap we [21:45] fall into. People see “AI agent” and [21:47] assume all these launches belong on one [21:50] big comparison chart, and they try to [21:51] build it. I’ve seen them try. They [21:53] don’t belong on one chart. Some agents [21:55] are model products. Some are workflow [21:56] builders. Some are enterprise data [21:58] layers. Some are open-weight [21:59] infrastructure. Some are research [22:00] workers. Some are wrappers around other [22:02] models. Some are control planes for [22:04] systems your company already uses.
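The three routing questions can be read as a small decision function. This is a minimal sketch under stated assumptions: `choose_layer`, its three boolean parameters, and the order of the checks are hypothetical, and question three’s test (“the surrounding product matters more than the marginal difference in model quality”) is collapsed into a single `wrapper_has_unique_access` flag. The example calls use a Claude-default team and the product pairings named in the talk.

```python
from enum import Enum


class Layer(Enum):
    DIRECT = "stay in your default agent's direct product"
    SAME_MODEL_WRAPPER = "use a wrapper that runs the same model"
    DIFFERENT_MODEL = "use a product built on a different model"


def choose_layer(model_is_center: bool,
                 wrapper_has_unique_access: bool,
                 wrapper_runs_default_model: bool) -> Layer:
    """Route one job through the three questions (parameter names are hypothetical)."""
    # Question one: coding, long-context reasoning, novel research, custom agents.
    # The model is the job, so surrounding integrations are secondary.
    if model_is_center:
        return Layer.DIRECT
    # No wrapper offers data or workflow access you couldn't reproduce yourself:
    # keep the default where it already works.
    if not wrapper_has_unique_access:
        return Layer.DIRECT
    # Questions two and three: the wrapper's data fabric wins either way; the only
    # difference is whether it happens to run the same model as your default.
    if wrapper_runs_default_model:
        return Layer.SAME_MODEL_WRAPPER
    return Layer.DIFFERENT_MODEL


# Example answers for a team whose default is Claude (illustrative, not measured).
print(choose_layer(True, True, True).name)    # DIRECT: coding stays in Claude Code
print(choose_layer(False, True, True).name)   # SAME_MODEL_WRAPPER: Perplexity Computer for research
print(choose_layer(False, True, False).name)  # DIFFERENT_MODEL: Slack workflow in Workspace Agents
```

Checking `model_is_center` first mirrors the advice that model-centric work such as coding stays in the direct product even when a wrapper owns useful data.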
The [22:06] launches aren’t equal because they’re [22:08] not trying to do the same job. So that [22:11] five-question filter is a way to [22:12] simplify that. Does it plug into the [22:14] tools your team uses? Does it let other [22:16] agents build on top? Does it own or [22:18] access data you care about? Is there an [22:20] ecosystem forming around it? Can you [22:22] stack agents on top? Now, if the answer [22:24] is yes across all those questions, the [22:26] launch probably deserves a lot of [22:27] attention from you. If it’s no, it may [22:30] be impressive, but it’s probably not for [22:32] you right now. And that is how I would [22:33] think about the rest of this year. [22:35] Filter for infrastructure over features. [22:37] Filter for ecosystems over demos. Filter [22:39] for stackability over walled gardens. [22:41] Filter for data access over benchmark [22:44] charts. And then match the shape of the [22:45] work to the shape of the tool. That’s [22:47] the whole game right now. The teams [22:49] that learn to route work across those [22:50] layers are going to compound faster than [22:52] the teams that are chasing whichever [22:54] model or agent had the loudest launch. [22:56] If this was useful, give me a subscribe. [22:58] I also wrote the full version of this [22:59] argument, with the filter and the product [23:01] breakdown, over on Substack, along with a lot of [23:03] specific guides to help you pick agents [23:05] and get started. I’ll link that below. [23:07] Cheers.
