as it was
Getting Rid of Your Voice

Teasers

“I don’t care about what the experts say about the notes of the wine,” our wine tasting host, Tyler, railed passionately. “I’m only interested in what happens in your palate.” Music to our ears, my husband and I settled back into our seats, mildly pleased with ourselves as the unwashed masses with minds of our own.

As we discovered our taste for French wines, one guy in the audience (let’s call him Sanjay) was rising to the occasion, stepping up to be the born French connoisseur, spitting out acronyms like “GSM” and names of remote French villages that produced $8000 bottles of wine. Between Tyler’s subtle salesmanship and the varying shades of our wine appreciation, Sanjay’s loud-mouthed commentary briefly threatened to drown out the room.

But thanks to Tyler, I ended up enjoying all the teasers of the event. All for $20.

The Job

After it was over, I marvelled at Tyler and how well he’d handled Sanjay. Dealing with such a crowd can be hard work, but Tyler poured us wine after wine with rare sensitivity and proficiency. He knew his job, I thought, as I opened up to some self-reflection…

How often do I confuse the job with work? Honestly, all the time. When I’m cooking, I’m grading myself on following the recipe. When I’m hiking, I’m crushing it on the trail ahead of everyone else. When I meet a defect, I want to squarely fix it. When I find a problem, I want to solve it. Work might have a way to pump up the adrenaline. And I’m an adrenaline junkie, it would seem.

Does it make for good work? Sure. Does “facing it squarely” deliver instant gratification? Maybe. Does it get the job done? Unclear. But if I were an artist, I’d wager that this is the voice that I’ve come to recognize as mine.

The Voice

“The real issue is not how to find your voice, but… getting rid of the damn thing,” Philip Glass once quipped. I’ve tried to get rid of my pestering “face it squarely” voice, I promise you, but the only thing that’s worked for me is to be in the moment. To fill the moment with attention and openness.

Tyler seems to have mastered this, presumably from hosting dozens of wine tastings and dealing with all kinds of characters. I wonder if the wine helps too?

Like the notes of wine wafting to our noses, there’s no intellectual sophistication in the moment, just the experience that no one else can have. And like the flavors dissolving on our tongues, there was no judgement, just a sense of what must be done… attend to your experience and let others attend to theirs.

MCP Transport Inspection

The First Two Laps

A couple of weeks ago, I wrote up a scrappy MCP (Model Context Protocol) Server and Client. Beyond the fact that everyone is creating MCP servers, there were two reasons I wanted to write these:

  1. I wasn’t able to find a way to test the newly introduced Streamable HTTP transport in the Model Context Protocol, either on the server or the client. At the time of writing, the MCP Inspector didn’t seem to support this transport; it supported the SSE (Server-Sent Events) and stdio transport mechanisms. I was also not aware of a server that supported Streamable HTTP, though this was relatively easy to write. The hard part seemed to be the client app, though it’s entirely possible that I missed something obvious.
  2. There were even fewer pointers to MCP clients that required authorization support from servers, which had been running mostly on local desktops. Now with Streamable HTTP transport, remote servers were beginning to emerge.

So I prototyped an MCP client that used device-flow OAuth and Streamable HTTP transport to communicate with servers. As a command line client, it required a custom MCP server with OAuth that was slightly different from anything I could find publicly, either via the MCP Inspector or CloudFlare Remote MCP Servers.
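
For the curious, here’s a minimal sketch of what the device-flow leg looked like from the client’s side, assuming Node 18+ and a generic authorization server. The endpoint paths and response fields follow RFC 8628, but the URLs and client id are placeholders, not the server I actually wrote.

```typescript
// Minimal sketch of the OAuth device flow (RFC 8628) from a CLI client.
// AUTH_BASE and CLIENT_ID are placeholders, not the actual server I used.
const AUTH_BASE = "https://auth.example.com";
const CLIENT_ID = "mcp-cli-client";

async function deviceFlowToken(): Promise<string> {
  // Step 1: ask the authorization server for a device code and a user code
  const dev = await fetch(`${AUTH_BASE}/device/code`, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({ client_id: CLIENT_ID }),
  }).then((r) => r.json());

  // Step 2: the user completes sign-in in a browser, on any device
  console.log(`Visit ${dev.verification_uri} and enter code ${dev.user_code}`);

  // Step 3: poll the token endpoint until the user approves or the code expires
  while (true) {
    await new Promise((resolve) => setTimeout(resolve, (dev.interval ?? 5) * 1000));
    const tok = await fetch(`${AUTH_BASE}/token`, {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: new URLSearchParams({
        client_id: CLIENT_ID,
        device_code: dev.device_code,
        grant_type: "urn:ietf:params:oauth:grant-type:device_code",
      }),
    }).then((r) => r.json());
    if (tok.access_token) return tok.access_token; // done; attach as a Bearer token
    if (tok.error !== "authorization_pending") throw new Error(tok.error);
  }
}
```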

Last week, I prototyped an MCP client that mimicked a browser-based experience similar to the MCP Inspector, but one that supported Streamable HTTP transport. This allowed me to drop down one level and work with the end-to-end sequence of messages exchanged between the client and server, from authorization to tool execution and follow-up notifications.
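
To make that concrete, here’s a rough sketch of that message sequence as I understand the Streamable HTTP transport: every JSON-RPC message is POSTed to a single MCP endpoint, the server may answer with plain JSON or an SSE stream, and a session id rides along in a header. The server URL and the example tool are placeholders, and the exact shapes may lag the current spec revision.

```typescript
// Sketch of the client-side message sequence over Streamable HTTP.
// MCP_URL and the example tool are placeholders; header and method names
// follow my reading of the MCP spec and may differ from the latest revision.
const MCP_URL = "https://mcp.example.com/mcp";
let sessionId: string | undefined;

async function send(message: object, accessToken: string): Promise<Response> {
  const res = await fetch(MCP_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // The server may reply with plain JSON or open an SSE stream
      "Accept": "application/json, text/event-stream",
      "Authorization": `Bearer ${accessToken}`,
      ...(sessionId ? { "Mcp-Session-Id": sessionId } : {}),
    },
    body: JSON.stringify(message),
  });
  // The server assigns a session id on the initialize response; echo it back afterwards
  sessionId = res.headers.get("Mcp-Session-Id") ?? sessionId;
  return res;
}

async function inspect(accessToken: string) {
  await send({ jsonrpc: "2.0", id: 1, method: "initialize", params: {
    protocolVersion: "2025-03-26",
    clientInfo: { name: "scrappy-inspector", version: "0.1.0" },
    capabilities: {},
  }}, accessToken);
  await send({ jsonrpc: "2.0", method: "notifications/initialized" }, accessToken);
  await send({ jsonrpc: "2.0", id: 2, method: "tools/list" }, accessToken);
  await send({ jsonrpc: "2.0", id: 3, method: "tools/call", params: {
    name: "get_weather", arguments: { city: "Seattle" }, // hypothetical tool
  }}, accessToken);
}
```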

This has been a lot of fun and after the first two laps of this marathon, I’ve been reflecting on what I learned.

Takeaways

The most interesting takeaway was that vibe coding is awesome for learning a new language or programming paradigm. It may also be good enough to build a prototype. However, in this journey, I quickly hit a wall when I could no longer fully follow what was happening and couldn’t get myself unstuck. For example, one day I couldn’t debug why the authorization flow was going into a loop. That day I ran out of my model call limit just trying different guesses. Then I had to fall back on my own understanding and it frustrated me that I hadn’t come to understand what I was doing.

The next day, I built the whole app from scratch myself, one step at a time. I used the code from the prior iteration to fill in the gaps as I went along. When I had it running in a day, I realized this alternative journey had been far more satisfying than the imaginary one, where I’d have been happy to simply have something working one time.

My sense is that most programmers building something meaningful that continuously evolves will run into this performance wall sooner or later. They will come to value learning for its own sake, as learning can help us get a lot more stuff done.

My growing conviction is that we have a “fast learning” button at our fingertips today, not a “generate code” button. Software by definition is meant to evolve… and writing software that’s not been written before will likely remain in our own hands.

Next Steps

Could I create a shopping agent that plugs into my Amazon account? Or a Midjourney agent that might help me reimagine my living room? Or a personal agent that could do my taxes? How might multi-modal data travel on the MCP wires?

Devaluing the U.S. Dollar

It was Apr 2, 2025 when they chose to declare a national emergency. Over a decade in, I’m still adjusting to being an immigrant. At least it wasn’t Apr 1. That would’ve been truly next level.

In this whole emergency thing, I’m trying to put some order to the chaos that is the new U.S. tariffs. Getting past the formulaic comedy is one thing. The tariffs have some math. But what’s the reasoning, I’ve spent the whole day wondering.

You want a reserve currency?

After WWII, the Bretton Woods system established the U.S. dollar as the world’s reserve currency. As the dollar took over the role of gold in the international financial system, the U.S. agreed to link the dollar to gold (at the rate of $35 per oz), with all other currencies pegged to the dollar. After the war, most countries had transferred their gold to the U.S., which now came closest to offering the most trusted currency, also backed by gold. A key motivation for the Bretton Woods system was to avoid floating, independently managed national currencies run by isolationist regimes, while enabling lightly managed, stable currencies with free convertibility and free trade.

Little did they know how this would work out.

When the system became operational in 1958, countries settled their international dues in dollars, which could theoretically be converted to gold. After WWII, as Japan and Germany recovered, their share of world production increased at the expense of the U.S. As the U.S. further extended itself with military spending and foreign aid, the world grew less confident about its ability to convert dollars to gold. As U.S. inflation rose and the risk of a gold run grew, the U.S. unilaterally ended convertibility to gold in 1971, effectively devaluing the dollar.

Devaluing the dollar is a repeating trope, you’ll soon see.

Turns out, gold-backed currencies aren’t the easiest to manage. Moreover, trust can be built through perpetuating belief systems (arguably the greatest human invention). Even without gold convertibility, the dollar has remained the world’s reserve currency. But this still has its downsides.

The Downsides

There are at least two downsides that an issuer of reserve currency might bear.

One, each country typically maintains reserves of the ‘reserve currency’. These reserves can act as insurance if bad players try to manipulate the country’s domestic currency. They also allow the country to manage its domestic currency’s exchange rate to make domestic goods more price-competitive in the international market. To maintain such reserves, countries buy dollar-denominated U.S. treasuries, which causes the dollar to appreciate. Unfortunately, a rising dollar makes U.S. goods relatively less competitive. This is the curse of a reserve currency.

Two, high demand for treasuries encourages the U.S. government to borrow with abandon. As countries purchase U.S. debt, where does all this borrowed money go? At the risk of gross reductionism, it goes to feed the U.S. consumption beast. Broadly, U.S. consumer habits are funded by savings from developing countries that buy U.S.-issued debt. Run long enough, this scheme tends to make the U.S. addicted (entitled?) to debt, making the periodic visit to rehab almost predictable. First world problems, eh?

Devaluation

To quickly recap, holding the reserve currency gives the U.S. geopolitical power (e.g. through trade restrictions), though it also results in a perpetually appreciating dollar. This happens to enable other countries to improve their competitive strength while undermining investment and production in the U.S.

Apparently, you can’t have a competitive currency and be the reserve currency.

But what if we throw in periodic devaluation? Now, can you do both?

Specifically, can you:

  1. Depreciate the dollar
  2. Reduce US debt servicing costs
  3. Maintain the US dollar as the reserve currency

Yes, we can. Sorry, Barack Obama!

#1 effectively delivers #2 by reducing the value and cost of servicing the current national debt.

#2 increases the chances of #3 by avoiding insolvency.

Bonus: A depreciated dollar makes U.S. production more competitive internationally, inviting capital investment. You like that?

But inflation? And tariffs? Omg, you’re killing me.

Tariffs

Generally, tariffs cause the national currency to appreciate. Specifically, U.S. tariffs would invite countries to re-baseline their currencies to make their exports attractive again. For example, during the trade war in 2018-19, a JP Morgan report recounts:

The PBOC allowed the CNY, which operates within a semi-fixed exchange rate regime, to devalue through controlled FX mechanisms. A weaker CNY partially cushioned the impact of tariffs by making Chinese exports relatively cheaper and preserving their competitiveness in the global market.

Trade uncertainty alone can also lead to depreciation of national currencies. Not making this up; it’s in the same JP Morgan report. This would lead to the U.S. dollar appreciating. Wrong direction, you say?

While devaluing individual currencies can partially absorb inflation in the U.S., in the mid-to-long term these countries need to lower their domestic interest rates. This would reduce local production costs, making their exports attractive again. This would also effectively cap inflation in the U.S. Controlling inflation in the U.S. would in turn open the door to lower U.S. interest rates and a lower cost of servicing the U.S. national debt. By the way, as individual national currencies depreciate, the dollar also has room to climb down from its overvalued state.

Are we back to our original series of goals or what?

Alternatively, companies from these countries can invest in building production capacity in the U.S.

The U.S. would benefit either way.

But why don’t countries just lower their domestic interest rates without all this drama? Lower interest rates (aka monetary easing) can stimulate growth, but can be risky. Among other things, it can lead to (1) foreign capital outflows, and (2) unwanted inflation, which would hurt the local population. No politician likes this part, especially if you’re a non-elected Chinese official.

When a country isn’t willing to naturally take on these risks, it can enter trade negotiations with the U.S. It’s definitely war time, and negotiations are good, right?

Dare we say that the U.S. would benefit either way?1

Overall, I can’t defend the reciprocal tariff rate math by any means, but maybe that’s not the point. Could there still be some order behind the chaos? After all, yesterday’s executive order contains…

the modification authority, allowing President Trump to increase the tariff if trading partners retaliate or decrease the tariffs if trading partners take significant steps to remedy non-reciprocal trade arrangements and align with the United States on economic and national security matters.



1

In 2003, to address a similar problem, Warren Buffett proposed issuing Import Certificates to U.S. exporters, allowing them to earn, say, 10% on their exports. This would allow U.S. exporters to lower the price of their exported goods by 10%, making them more competitive internationally. If U.S. entities imported goods, they would need to buy Import Certificates for 10% of the imported value. This would make imports more expensive. It is essentially a tariff dressed as a certificate.

Vibe Coding

Everyone’s calling it “vibe coding”, and I’m quite enjoying it. It’s fast and fun, like a game where you create your own levels and keep going till you exhaust the model. I go from “working” to “working” and unlock new challenges every day. For example, I’ve never coded in TypeScript, but I’ve now built two apps, one for personal finance using bank transactions and a second one to estimate income taxes. Next, I want to fine-tune a model and evaluate it using model-generated datasets, things I’d only dreamt of doing last year…

Surprisingly, it’s fun because it’s not always smooth sailing. People are comparing generative AI coding skills with human coding skills, which I think misses the point. When I understand what the model is doing and why it’s doing it, it feels like moving at the speed of logic rather than being constrained by language and syntax.

I must admit that part of the fun is discovering the model’s limits. Maybe these are Cursor’s limits rather than the model’s limits? For the record, I’ve been using Cursor with the default model, Claude 3.5 Sonnet. I’ve also used Claude Code, but that was a different experience; I might write about it at a later point. Yesterday, someone asked me what it was like to use Cursor. So here’s a short roundup of what I discovered.

Always be vigilant

Confabulation: Even when I provided reference documentation from Stripe, the model (Cursor?) confidently responded with something along the lines of “Looking at the documentation, this is correct.” It only relented when I asked it to provide a reference. If I hadn’t read the documentation myself, it may have taken me much longer to debug situations where it generated handlers for completely made-up webhook events.

Blindness: The model uses its previously generated code as ground truth. When I manually fixed code, it didn’t seem to affect the model’s subsequent generations, though asking it to fix the mistakes helped it recognize the “fix” before I could continue. For example, when I renamed a field in a TypeScript interface (e.g. short_term_capital_gains instead of just capital_gains), it would not pick this up in the code it generated afterwards. At least a couple of times, it created new files without taking the repo structure into account. For example, it created a new scripts folder for database migration at the root level when it had previously created similar scripts under src/app/db.

Specificity is no antidote to eagerness

Overreaching: When I asked it to add a new button and selector to a web interface, it added half a dozen new packages and four new files. We already had Tailwind CSS and basic components to add these features, and I was not expecting it to make the task 10x more complex than it needed to be. Fortunately, reverting was simple.

Overassuming: On another occasion, I prompted it to add specific new fields to one Typescript interface, but it also added to another interface for no reason. This worried me about what else it had assumed… maybe most of its assumptions were good?

Don’t bank on its stamina

Anti-endurance: Ironically, sessions that don’t go well last longer. The session where it confidently misinterpreted Stripe’s documentation was also the session that dragged on for a couple of hours. At some point, it froze in the middle of responses, leaving the repo code in an intermediate state. Rerunning the prompt sometimes resolved the issue. After a point, it stopped responding altogether, answering only short and simple questions after that.

Anti-recall: On a few occasions, it added dependencies even after we had explicitly decided against them. For example, I began the project by asking for options and trade-offs, and directed it to proceed without taking a dependency on the Prisma ORM. When I tried to add new features, it would forget this instruction and add code requiring Prisma. This occurred often enough that I asked it to find a way to remember it. It proposed creating a project plan without realizing that we had created one at the beginning of the project. I wasn’t sure if updating it would help it remember better, but the issue hasn’t reappeared since.

Intelligence != Ownership

No Reflections: On many occasions, it would leave behind spurious code or files, making no effort to clean up after I had redirected its suggested solution. Sometimes this clean-up became difficult as I lost track of all the changes that had to be reversed. Maybe this is a good thing, because I wouldn’t want it getting too eager overthinking its past results!

Inventing Factories

“If I had asked people what they wanted, they would have said faster horses,” Henry Ford is famously quoted as saying. Yet Ford didn’t invent cars to replace horses. He invented factories to make cars cheaper and more accessible. Factories that would make the car, according to Ford, ‘so low in price that no man making a good salary will be unable to own one.’

When Ford developed the Model T, cars had been around for decades, made ‘artisanally’, if you wish. They were expensive and unreliable. Ford’s goals were affordability and reliability. His first idea was to introduce consistently interchangeable auto parts. His second idea was the moving assembly line, which reduced the time workers spent walking around the shop floor to procure and fit components into a car. ‘Progress through cautious, well-founded experiments,’ is a real quote from Ford. The 1908 Model T was 20th in the line of models that began with the Model A in 1903.

Borrowing a leaf from the top of the last century, more than inventing new foundation models (FMs), we need to invent the factories that make such models trivially affordable and reliable.

Billions of Models

If FMs are the new ‘central processing units’, technically there’s no limit to the number and variety of ‘programmable’ CPUs that we can now produce. A typical CPU chip requires $100-500M in research and design, $10-20B in building out fabrication capacity, and 3-5 years from concept to market. Today, FMs can be pretrained within months (if not weeks) for much less. On this path, Google and DeepSeek, more than OpenAI, have accelerated affordability.

| Model Provider | Small Model | Mid-Size Model | Reasoning Model |
| --- | --- | --- | --- |
| Anthropic | Haiku 3.5: $0.8/MIT, $4/MOT | Sonnet 3.7: $3/MIT, $15/MOT | N/A |
| OpenAI | GPT 4o-mini: $0.15/MIT, $0.60/MOT | GPT 4o: $2.5/MIT, $10/MOT | o3-mini: $1.1/MIT, $4.4/MOT |
| DeepSeek | N/A | V3: $0.27/MIT, $1.10/MOT | R1: $0.55/MIT, $2.19/MOT |
| Google | Gemini 2.0 Flash: $0.10/MIT, $0.40/MOT | Gemini 1.5 Pro: $1.25/MIT, $5/MOT | Gemini 2.0 Flash Thinking: Pricing N/A |

Note: MIT: Million Input Tokens, MOT: Million Output Tokens
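
For a sense of scale, here’s a quick back-of-the-envelope calculation of what a single request costs at these rates, using GPT 4o as the example.

```typescript
// Cost of one request given the per-million-token prices from the table above.
function requestCost(inputTokens: number, outputTokens: number,
                     pricePerMIT: number, pricePerMOT: number): number {
  return (inputTokens / 1e6) * pricePerMIT + (outputTokens / 1e6) * pricePerMOT;
}

// A 2,000-token prompt with a 500-token completion on GPT 4o ($2.5/MIT, $10/MOT)
// comes to about one cent.
console.log(requestCost(2000, 500, 2.5, 10).toFixed(4)); // "0.0100"
```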

Affordability

Affordability is admittedly relative. Software developers may be willing to pay more than marketing content creators. The value of research for knowledge workers depends on the decisions it enables them to make. Customer support may be valuable, but no more than the total cost of employing and managing human support representatives. When the returns (or savings) are clear, there is quantifiable demand.

OpenAI loses more than twice what it makes, and it needs to cut costs by roughly an order of magnitude to be sustainably profitable. If Nvidia’s Blackwell chips deliver the promised 40x price-to-performance improvement, this will be the year of non-absurd business models. More power to them.

It’s possible that DeepSeek is already there. More importantly, DeepSeek might represent the price level of an API provider who doesn’t have an application business to cannibalize. Is it ironic that OpenAI is facing an innovator’s dilemma of its own?

Meanwhile, Anthropic charges a premium over OpenAI’s application-level rates. They also need an order of magnitude reduction. They might already be there with TPUs, or with Trainium 2’s 75% price drop they’re likely getting there. It’s unclear if they have a cannibalization issue yet, though their CPO definitely wants their product teams to iterate faster.

Training and adapting the model to meet specific and evolving customer expectations is the business need. On this point, popular applications such as Perplexity and Cursor/Windsurf are arguably underrated. Just as Midjourney provides a delightful experience by getting the combination of the model and user experience just right, these applications are taking their shot. After all, the model is a software component, and application developers want to shape it endlessly for their end users. The faster these developers iterate with their models based on feedback from their applications, the faster they’ll see product-market fit. They can then figure out how to grow more efficient. Finding product-market fit is the only path to affordability.

People mistake such applications to be ‘wrappers’ around the model or ‘just’ interface engineering. That’s a bit like saying Google is just interface engineering over PageRank.

Reliability

For a given use case, reliability is a function of: How often does the model ‘break’? How easy and/or expensive is it to detect and fix?

In creative work, there’s often no wrong answer. And checking the result is generally easier than generating the result.

For automation, it’s more of a spectrum ranging from infeasible to non-compliant, to varying degrees of risky, to safe & expensive, to safe & cost-effective.

What makes the application risky vs. safe? And who underwrites the risk?

One answer is tool use. Multiple tool use protocols such as Model Context Protocol want to make FMs more aware of available tools in addition to making tool use more effective and efficient. However, there’s no significant reason (yet) for any major model provider to use another’s protocol. I expect protocols to emerge from most if not all model providers, and feel that standardization is at least a year or two away. Even then, new standards may usurp older ones, and different economic and geopolitical agendas could shape these in weird ways.

However, a sophisticated ‘tool’ or service really wants to be an agent. When multiple agents need to work together, we need distributed ownership, separation of concerns, authentication, authorization, auditability, interoperability, control, non-repudiation, and a lot more. Much of this plumbing already exists with OAuth 2.0 and can be repurposed for service agents, but a lot still needs to be built. Whoever builds the most reliable multi-agent collaboration systems will likely grow to become the most trusted.
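
As one example of repurposing that plumbing, here’s a sketch of a service agent obtaining a scoped token via the standard OAuth 2.0 client credentials grant before calling another agent. The token endpoint, client id, and scopes are hypothetical, not a proposed standard.

```typescript
// Sketch: an agent authenticates as itself (client credentials grant) and
// receives a token scoped to what it's allowed to ask another agent to do.
// The endpoint, client id, and scopes below are made up for illustration.
async function getAgentToken(): Promise<string> {
  const res = await fetch("https://auth.example.com/token", {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
      grant_type: "client_credentials",
      client_id: "shopping-agent",                 // the calling agent's identity
      client_secret: process.env.AGENT_SECRET ?? "",
      scope: "orders:read orders:write",           // the authority it's delegated
    }),
  });
  const { access_token } = await res.json();
  return access_token; // downstream agents can verify, authorize, and audit this
}
```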

The Industrial Revolution

Unlike fantasy AI factories that spew pure intelligence as tokens, these factories will ship affordable and reliable engines that can safely power software applications. While we urgently kick off the next manufacturing industrial build-out in the U.S., my guess is that these software factories will take years to build. We need to have started yesterday…