A Practical Guide to Evaluation of LLM Apps (Part C)
Understanding evaluation of conversational LLM systems, tool calls, tracing, and red teaming.
When it comes down to it, Dario Amodei isn’t all that much different from Sam Altman
Make no mistake about what is happening.
“The problem comes down to how A.I. chatbots are fundamentally designed”
This was the worst week I have had in quite a while, maybe ever.
I’ve been trying to find a slot for this one for a while.
The attempt on Friday by Secretary of War Pete Hegseth to label Anthropic as a supply chain risk and commit corporate murder had a variety of motivations.
LLMs are an epistemic nightmare
The road to where we are now was (mostly) paved with good intentions — but mixed with too much uncritical acceptance of hype.
What might a superintelligence arcology be like?
This is the long version of what happened so far.
The writer Tyler Austin Harper (of The Atlantic, etc.) sent me a thread this morning, asking whether a mistargeting yesterday that killed nearly 150 school children in Iran could have been the result of AI.
We will learn a lot about Silicon Valley in the upcoming days