Protocols Don't Fail. Organizations Deploy Them at the Wrong Level.

Most AI agent initiatives collapse not because the technology is broken, but because the people building them are thinking at one level of abstraction and acting at another. The Shengxing Conjecture gives us a framework to diagnose why — and fix it.

The Conjecture

The Shengxing Conjecture proposes that recursive meta-questioning — asking "what are the assumptions behind this?" repeatedly — loses practical utility around the sixth iteration. At that depth, you've ascended so far from concrete action that the language becomes self-referential and no longer translates into decisions.

This sounds abstract. But it has a direct, uncomfortable implication for AI system design: there is a ceiling above which your thinking stops paying rent.

Most organizations hit this ceiling — or never leave the basement. Both are failure modes.

The M-Stack: Six Levels of AI Deployment Thinking

We can map the problem onto six levels, from raw execution at the bottom to governance at the top. We call this the M-Stack.

Level  Name                     The question at this level
M6     Philosophy               "What does intelligence mean?" (rent stops being paid here)
M5     Governance (↑ ceiling)   "Who is accountable when an agent fails?"
M4     Boundary                 "Where does human judgment override the agent?"
M3     Strategy (↓ floor)       "Which processes are worth automating at all?"
M2     Intent                   "Did the agent understand what was asked?"
M1     Orchestration            "How does the agent decide which tool to call?"
M0     Execution                "What action did the agent just take?"

The Shengxing Conjecture marks M6 as the boundary: useful thinking lives in M0–M5. The sweet spot for organizational decision-making sits in M3–M5 — strategic, bounded, and still connected to action.
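The level structure is simple enough to express as a small data model. The following is an illustrative sketch, not part of any library; the names `MStackLevel` and `pays_rent` are my own:

```python
from enum import IntEnum

class MStackLevel(IntEnum):
    """The M-Stack levels, bottom (execution) to top (philosophy)."""
    EXECUTION = 0      # "What action did the agent just take?"
    ORCHESTRATION = 1  # "How does the agent decide which tool to call?"
    INTENT = 2         # "Did the agent understand what was asked?"
    STRATEGY = 3       # floor of the organizational sweet spot
    BOUNDARY = 4       # "Where does human judgment override the agent?"
    GOVERNANCE = 5     # ceiling: the last level where thinking pays rent
    PHILOSOPHY = 6     # past the Shengxing boundary

CEILING = MStackLevel.GOVERNANCE

def pays_rent(level: MStackLevel) -> bool:
    """Per the conjecture, thinking above M5 stops producing decisions."""
    return level <= CEILING
```

Because the levels are ordered integers, comparisons like "is this question higher-order than that one" fall out of the type for free.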

Where AI Projects Actually Break

Most failures aren't technical. They're level mismatches — the thinking and the acting happen at different depths, with nothing bridging them.

Failure 1: The Hollow Mandate

Leadership commits to AI transformation at M5 ("we will be responsible and human-centered"). Engineers start building at M0 ("let's automate the support queue"). The three levels in between are empty. Nothing translates the principle into constraint.

M5 thinking: "Responsible AI deployment"
M4 gap:   [nobody defined the human override conditions]
M3 gap:   [nobody asked which processes should be automated]
M0 action: Replace 40% of support staff

Failure 2: The Basement Engineer

A technically skilled team builds an impressive M0–M2 system. It works. But nobody asked the M3 question: is this the right thing to automate? The system runs perfectly, optimizing a process that shouldn't exist.

M0–M2 action: Flawless automation built
M3 gap:    [the process was a workaround for a broken upstream system]
Result:     Automated a problem instead of solving it

Failure 3: The Infinite Retreat

A governance team reaches M5, then keeps going — "but who defines accountability?" (M6), "but what is agency?" (M6+). Six months of workshops. No protocol adopted, no boundary defined, no system built. The Conjecture predicted this: past M5, thinking stops producing decisions.

M5:   "We need an AI governance policy"
M6:   "But what counts as an AI system exactly?"
M6+: "What is accountability in a distributed system?"
...:   [paralysis]

Where Protocols Fit in the Stack

Now the protocols from our previous article map clearly onto the M-Stack:

  • MCP operates at M0–M1: it defines how an LLM calls a tool and gets a result. Execution and basic orchestration.
  • A2A operates at M1–M2: it defines how agents delegate to each other, communicate intent, and stream results.
  • ACP operates at the same layer as A2A, with a different runtime assumption.

Notice what protocols cannot do: they cannot tell you which processes to automate (M3), where humans should intervene (M4), or who is accountable when things go wrong (M5). Protocols are implementation decisions. Strategy is a human decision.

Choosing A2A over ACP is an M1 decision. Deciding whether to build a multi-agent system at all is an M3 decision. Most teams spend enormous energy on the former and almost none on the latter.
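To make the M0–M1 framing concrete, here is roughly the wire shape of an MCP tool invocation: a JSON-RPC 2.0 `tools/call` request, sketched in Python. The tool name `lookup_ticket` and its argument are invented for illustration:

```python
import json

# An MCP tool call is a JSON-RPC 2.0 request. Everything here answers
# the M0-M1 question ("which tool, with what arguments") -- nothing in
# the protocol encodes strategy (M3), override boundaries (M4), or
# accountability (M5). Those live outside the message.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "lookup_ticket",              # hypothetical tool name
        "arguments": {"ticket_id": "T-1234"}, # hypothetical argument
    },
}
print(json.dumps(request, indent=2))
```

The point of the sketch is what is absent: no field in the request can express whether the ticket queue should have been automated in the first place.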

Can AI Bridge the Levels?

A reasonable question: can the AI itself translate between levels? Could an LLM take an M5 governance principle and derive M3 strategic constraints automatically?

Partially. AI can translate information across levels effectively. What it cannot do is translate values — because value translation requires knowing what matters to a specific organization, in a specific context, for specific stakeholders. That judgment is irreducibly human.

This is also why the layer structure isn't just a legacy of old organizational design. It reflects something real about the nature of decision-making: some questions are genuinely higher-order than others, and answering them requires different kinds of reasoning.

The Diagnostic: Which Level Is Your Team Missing?

Quick Diagnostic

You have a clear protocol choice (A2A or MCP) but unclear success criteria.

→ Missing M3. You're choosing implementation before strategy. Step back: which processes are actually worth automating, and why?

Your AI system works technically, but its behavior keeps unpleasantly surprising users.

→ Missing M4. You haven't defined where human judgment should override the agent. Define the override conditions explicitly.

Your governance committee has met 8 times and produced no policy.

→ Stuck above M5. Force a decision with a concrete case: "For this specific process, who is accountable if the agent is wrong?" Answer that. Then generalize.

Your engineers are amazing but the business can't articulate ROI.

→ M0–M2 work without M3 connection. Bridge it: map each automation to a specific business outcome. If you can't, reconsider the automation.

The Rule of Adjacent Levels

The framework's practical constraint: thinking and action must stay within two levels of each other. You can think at M4 and act at M2. You cannot think at M5 and act at M0 without something filling the gap.
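The rule is simple enough to state as a check. This is an illustrative sketch; the function name is mine, not part of the framework:

```python
MAX_LEVEL_GAP = 2  # thinking and acting must stay within two levels

def gap_is_bridged(thinking_level: int, acting_level: int) -> bool:
    """Return True when the thinking and acting levels are close enough
    that no unexamined assumptions can accumulate between them."""
    return abs(thinking_level - acting_level) <= MAX_LEVEL_GAP

# M4 thinking with M2 action: within the rule.
assert gap_is_bridged(4, 2)
# M5 thinking with M0 action: the Hollow Mandate failure mode.
assert not gap_is_bridged(5, 0)
```

In practice the check runs in the other direction: when the gap exceeds two levels, the fix is not to stop thinking at M5 but to staff the intermediate levels explicitly.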

This isn't a theoretical claim — it's an observation about how organizations actually fail. The gap is where assumptions accumulate, where "responsible AI" becomes "we automated the call center," where "agent coordination" becomes a protocol chosen for the wrong reasons.

Fill the gaps deliberately. The tools exist at every level. The discipline is in knowing which level you're working at — and naming the ones you're skipping.

