72 changes: 72 additions & 0 deletions src/routes/drafts/levels_of_ai_autonomy.mdx
@@ -0,0 +1,72 @@


1. Web-based chat session.

Get immediate results.

This tends to be the best performance you can get from an AI, but it's not practical to be constantly copy-pasting code between your IDE and a web-based chat session.


2. Live monitoring a single agent in your IDE/CLI - with manual approvals

You prompt the agent and observe its reasoning and responses. You can intervene when the agent seems to be going down the wrong path.

You manually approve each step, so you see what is happening in linear order.

Problem: many of the commands it wants to run aren't particularly informative to you (e.g. various `grep` commands).

3. Live monitoring a single agent in your IDE/CLI - with automatic edits

As above, but with automatic edits turned on: you sit there observing its reasoning chain as well as the changes it is making, and you can intervene when it starts going wrong.

Not really a problem: it becomes hard to visualise the changes it wants to make. It's not really a problem because I find the git diff tool in my IDE works fine for this.


4. Live monitoring a multi-agent session.

As above, but this time the root agent is spawning subagents.

For me, this is where things really start breaking down. It's hard to visualise what the subagents are doing, so it becomes hard to intervene.

It's worth mentioning that while it might be reasonable to want to visualise what subagents are doing when there are 2-5 agents, some workflows could conceivably involve spawning hundreds of agents. At that point intervening live would never be possible, so any visualisation you had would be geared more towards after-the-fact forensic accounting.

The workflow ends up being the same: once it has done its work, we review the proposed changes in the git diff.

5. Set and review later tasks

e.g. assigning an AI to a GitHub issue and having it create a pull request.

Maybe we assign it in the morning and check the pull request that night.

The review workflow is similar: at the end, we are reviewing the suggested changes.

The problem here is that the AI might come up with the right answer, but spend a lot of tokens doing so.

Aside: I'm currently operating on the assumption that, even if we're not at all concerned about cost, an agent needlessly spending tokens is a problem that won't scale. It suggests inefficiencies in the prompting, which means that as the codebase grows, that same AI will no longer be able to come to the correct solution.

Big problems I have here:

- Lack of visibility of how much the pull request cost me.
- Lack of seeing the reasoning chains - so I can optimise the prompts.
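To make the first point concrete, here's a rough TypeScript sketch of the accounting I'd want to see. Every name in it (`AgentRun`, `Pricing`, `runCost`, `pullRequestCost`) is hypothetical, and the per-million-token prices and token counts are placeholders, not any provider's actual rates or real usage. The arithmetic itself is simple: tokens multiplied by price, summed across the root agent and any subagents.

```typescript
// Hypothetical record of one agent run's token usage. A tool would need to
// expose something like this per run for the cost to be visible at all.
interface AgentRun {
  label: string;
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical pricing in dollars per million tokens (placeholder values only).
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Cost of a single run: each token count times its per-million price, divided by 1,000,000.
function runCost(run: AgentRun, pricing: Pricing): number {
  return (
    (run.inputTokens * pricing.inputPerMillion +
      run.outputTokens * pricing.outputPerMillion) /
    1_000_000
  );
}

// Total cost of a pull request: the sum over every run (root agent plus subagents).
function pullRequestCost(runs: AgentRun[], pricing: Pricing): number {
  return runs.reduce((total, run) => total + runCost(run, pricing), 0);
}

// Example with made-up numbers.
const examplePricing: Pricing = { inputPerMillion: 3, outputPerMillion: 15 };
const exampleRuns: AgentRun[] = [
  { label: "root agent", inputTokens: 2_000_000, outputTokens: 100_000 },
  { label: "subagent", inputTokens: 500_000, outputTokens: 20_000 },
];

console.log(pullRequestCost(exampleRuns, examplePricing).toFixed(2)); // prints "9.30"
```

Keeping the cost per run, rather than only a total, is what would make it possible to spot which agent burned the tokens, which ties back to wanting the reasoning chains to optimise the prompts.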

6. Functional verification only

Here, we assign tasks, and if the result looks like it works, we merge it without reading any of the code.

But what does 'looks like it works' entail?

For my geometric art project, that might mean looking at the preview deploy, seeing a new reference algorithm, and checking that it appears to behave correctly.

If the product were a code library, it might involve reading the tests that were created with it to check that they make sense, and generally giving the tool a go.

7. AI-managed feature ideation and delivery.

In this world, we give a tool like OpenClaw free rein to decide what features should be built and when to merge them.


## It's a ladder of trust

Right now, I'm clearly not at level 6 - I do not trust that the AI has written the code in the right way.

However, for my particular project, for certain kinds of tasks that we do repeatedly (e.g. creating new node types or new algorithms), it's _conceivable_ that at some point it will have got it right first time so many times that I can start approving pull requests after just doing the functional testing.

100 changes: 100 additions & 0 deletions src/routes/posts/what_is_the_way_ai_becomes_shit.mdx
@@ -0,0 +1,100 @@
---
meta:
title: What is the way that AI will become shit?
description: This is possibly the best AI products are going to be.
dateCreated: 2025-04-16

tags:
- "software_engineering"
- "ai"
---


I've been playing around with AI more recently, and I've been very impressed with its capabilities.

It's possible, though, that this is as good as it gets for AI.

## It could be that AI is now where Facebook was around 2011-14

Think about how Facebook was around 2011-2014. In my opinion this was the golden age of Facebook.

There was a lot of optimism about the role social media could play in society. People changed their profile pictures to indicate support for whatever cause. If you were hosting an event, there would be a Facebook event for it, which was good for arranging rides etc. There were a lot of genuine interactions with people you knew. Most of your newsfeed was content either created or shared by people you knew.

Now, Facebook is riddled with ragebait and AI slop.

It makes sense that this has happened. It follows the standard tech business model: burn investor money while you grow the user base, and during this time build a genuinely good product, because you want to attract users to the platform.

Then, at some point, user growth plateaus and the company switches to 'extract value from the large user base' mode.

Facebook has done this by adjusting its algorithms to show content that keeps users scrolling, and this just happens to be ragebait-type content.

AI tools could be in the same position. Right now they're burning _tons_ of money developing genuinely good products, seeking to attract as many users as possible.

Then, at some point user growth plateaus and then...

## How does AI become shit?

### Prices increase

This is the least bad scenario - it's at least very transparent.

Right now, say a hobbyist is paying $30/month for AI, and that becomes $60/month, then $150/month.

I can't actually see this strategy going too far. Yes, the AI companies might get businesses hooked on AI and charge them through the nose, especially if AI means those businesses can reduce headcount, but they'd need to segment consumers a different way: consumers aren't going to spend thousands of dollars a month.

### The models are hobbled - you can buy the unlocked version for a fee

This is possibly how the companies start segmenting users. The models are intentionally dumbed down, maybe via a system prompt. They could also be slowed down.

For example, maybe there is a coding agent, but it's not security aware. Want security awareness? You need the security awareness module! That's an extra $30/month.

This strategy only works when the AI companies are providing AI-as-a-service - if the users of AI are self-hosting their own models, they have full control. It's a matter of:

1. Being able to get a good enough model
2. Having the resources to run it

### The models recommend sponsored content

Here, AI companies could have the models recommend whatever products paid them. They could do this via the system prompt, or by hard-coding the sponsored recommendations into the models themselves.


### The models are optimised to maximise use - not utility

This is the real concern for me.

The idea here is that the AI companies are making money on every token they process.

They of course have a strong incentive to have users use _more_ tokens, not less!

Going back to Facebook: it had every incentive not to have users connect with each other, get off their computers and meet up in the park, at which point they stop viewing Facebook ads. Instead, it had every incentive to just keep users scrolling.

I'll give two examples - one in the context of using AI to write code, and the other a non-coding example.

#### Token overuse in a coding example

I'm a developer. I'm doing some agentic stuff in my codebase. I ask the AI to implement an `isPrime` function. The AI doesn't need to see the rest of the codebase to do this, but it nonetheless aggressively ingests large amounts of context before finally implementing the solution.

As a developer, I'm not bothered because I'm not paying the bill, but the company I work for is billed tens of thousands a month.
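For a sense of just how self-contained a task like this is, here's a minimal TypeScript sketch of an `isPrime` function. It's only an illustration (trial division over candidates of the form 6k ± 1 is one reasonable choice among several, not what any particular agent produced), but it needs no imports and nothing from the surrounding codebase.

```typescript
// Minimal trial-division primality check. Self-contained: no imports,
// and nothing from the rest of the codebase is needed.
function isPrime(n: number): boolean {
  // Reject non-integers and anything below 2.
  if (!Number.isInteger(n) || n < 2) return false;
  // 2 and 3 are prime.
  if (n < 4) return true;
  // Rule out multiples of 2 and 3.
  if (n % 2 === 0 || n % 3 === 0) return false;
  // Any remaining prime factor has the form 6k ± 1, so test
  // i and i + 2 for i = 5, 11, 17, ... up to sqrt(n).
  for (let i = 5; i * i <= n; i += 6) {
    if (n % i === 0 || n % (i + 2) === 0) return false;
  }
  return true;
}
```

For example, `isPrime(97)` returns `true`, while `isPrime(91)` returns `false` because 91 = 7 × 13. Reading the whole repository adds nothing to a function like this; it only adds tokens.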

#### Token overuse in a non-coding example

I'm a regular non-technical user. I ask the AI for a news recap. It gives a news recap and finishes on a salacious story about an affair a famous person was having. It asks if I want more details, and I say yes. It provides further details and mentions a sex act I've never heard of. I ask it to clarify what that is. It tells me, and gives me examples of other similar acts. I end up spending an hour going down a rabbit hole that the AI carefully crafted to pique my curiosity. For each new warren the AI can lead me down, the AI company earns another dollar or three.

Here, the AI isn't being helpful - it's just optimised to encourage me to use it more.

And to be clear, this isn't a 'robots are taking over the world' kind of cautioning - the KPIs are set by _humans_ not the AI.


## Conclusions

I do find it a bit depressing that my response to 'wow look how cool this technology is' is to immediately be thinking about how it's going to become _bad_, not how it's going to become good.

But there's a sense that this cynicism is warranted -

I'm not a complete cynic - I think the world is mostly made up by people

