Discussion about this post

Omer Ekin

Thank you, Tal. Your point about not controlling the harness or installed skills feels especially important. One fallback we have considered is exposing a concise get_usage_guidance tool through the MCP itself, similar to the “call this first for important context” tool you mentioned. It would not be as context-efficient or reliable as a natively installed skill, but it could provide portable guidance when skills are unavailable.

I am exploring this in the context of Sage, an inner-work product that other agents could involve when they notice a user may benefit from deeper reflection. The challenge becomes more pronounced there: if Sage cannot control the model, prompt, surrounding context, or presentation, how much of the inner-work experience should it actually delegate? My current hypothesis is that the MCP should initially help agents recognize Sage moments, request consent, and hand the user off, while Sage retains responsibility for memory and the actual intervention. Your eval and observability approach gives me a useful framework for testing how much more can safely be exposed over time.
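To make the get_usage_guidance fallback mentioned above a little more concrete, here is a minimal sketch in plain Python. It does not use any real MCP SDK: the `TOOLS` registry, the `call_tool` dispatcher, and the guidance fields are all hypothetical stand-ins, and the guidance text is invented for illustration only.

```python
import json

# Hypothetical guidance payload; the fields and wording are illustrative only.
USAGE_GUIDANCE = {
    "call_first": True,
    "summary": "Call get_usage_guidance once per session before other tools.",
    "tips": [
        "Ask the user for consent before handing off.",
        "Prefer the narrowest tool that answers the question.",
    ],
}


def get_usage_guidance() -> str:
    """Return concise usage guidance as a JSON string for the calling agent."""
    return json.dumps(USAGE_GUIDANCE)


# Minimal stand-in for a tool registry: name -> (description, handler).
# A real server would register the tool through its own framework instead.
TOOLS = {
    "get_usage_guidance": (
        "Call this first for important context on using this server.",
        get_usage_guidance,
    ),
}


def call_tool(name: str) -> str:
    """Dispatch a tool call by name, roughly as an agent harness might."""
    _description, handler = TOOLS[name]
    return handler()


if __name__ == "__main__":
    print(call_tool("get_usage_guidance"))
```

The point of the sketch is only that the guidance travels with the server rather than with the harness, so it costs context on every session where it is called but does not depend on skills being installed.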

Luke Lin

Great post, Tal. We felt this pain as well when we built a 50+ MCP library for one of our clients. We ended up building a skill library that would tell the MCPs how to best use tools to answer different types of questions, then connected it to a closed-loop eval system in which an LLM would both evaluate performance and tune the skills accordingly, until performance got to where we wanted it.

This was, of course, after we tuned the tool schemas and descriptions to be context-efficient and to minimize procedural errors.

Also, on skills vs. tool descriptions: assuming that you're loading all your tools at once with the MCP, I would be careful about overloading tool descriptions with too much information, since that will create context bloat, whereas skills get loaded based on the user query.
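The trade-off in the comment above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the tool names, the skill bodies, the keyword matching, and the word-count size proxy (which is not a real tokenizer). It only shows that tool descriptions are paid for on every query, while skill text is paid for when a query triggers it.

```python
def rough_size(text: str) -> int:
    """Very rough size proxy: whitespace-separated word count, not real tokens."""
    return len(text.split())


# Hypothetical tool descriptions, loaded into context on every query.
TOOL_DESCRIPTIONS = {
    "search_orders": "Search orders by customer id.",
    "refund_order": "Refund an order by order id.",
}

# Hypothetical skills, loaded only when the query matches a keyword.
SKILLS = {
    "refunds": {
        "keywords": {"refund", "return"},
        "body": "Check the order status first, then call refund_order with the order id.",
    },
    "search": {
        "keywords": {"find", "search"},
        "body": "Use search_orders with the customer id before asking follow-up questions.",
    },
}


def context_for(query: str) -> list[str]:
    """Assemble the text that would be placed in context for a query."""
    words = set(query.lower().split())
    parts = list(TOOL_DESCRIPTIONS.values())  # always loaded
    for skill in SKILLS.values():
        if words & skill["keywords"]:  # loaded only on a keyword match
            parts.append(skill["body"])
    return parts


def context_cost(query: str) -> int:
    """Total rough size of everything loaded for a query."""
    return sum(rough_size(part) for part in context_for(query))


if __name__ == "__main__":
    for q in ("hello", "please refund my order"):
        print(q, "->", context_cost(q))
```

Moving the detailed guidance from the skill bodies into the tool descriptions would raise the cost of every query, including ones like "hello" that never need it.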
