Sunday, February 22, 2026 · 4 min read

AI-Powered Blog Editing

What Was Built

Tested an AI-powered blog post generation system with edit functionality

Business Insight

The importance of editing in refining AI-generated content to reflect personal thoughts and ideas

Friction

The need to distinguish between saved and temporary posts in the edit function

AI · voice-to-text · blogging

AI-Powered Blog Editing

Today I tested my AI agent’s voice-to-text functionality inside the blog system.

Instead of typing notes at the end of the day, I speak. The agent transcribes, structures, and generates a full draft post from our conversation. It works surprisingly well. The energy of spoken thoughts is different from typed ones: more fluid, less filtered, less self-conscious.

But something became clear very quickly once the drafted version was published.

Generation is not the same as authorship.

The post read like it came from a machine.

The AI can turn my voice into paragraphs. It can organize my thoughts. It can even make the structure cleaner than what I would produce in raw form. But it doesn’t quite sound like me. The emphasis isn’t always where I would put it. When I read it back, it doesn’t fully satisfy me.

As much as I want this blog to be fully automated, I’ve realized that if generation can be automated, editing becomes sacred. That’s the place where humans must stay in the loop.

What I Built

I tested the voice-to-text system for this blog, and technically, it works perfectly. I talk through what I built, what I learned, and what blocked me. The AI parses the conversation and produces a structured MDX draft aligned with my schema.
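As a rough illustration, the structured draft the AI produces might conform to a shape like the one below. The field names are my guesses based on the summary card at the top of this post, not the blog's actual schema.

```typescript
// Hypothetical shape of a generated draft; field names are assumptions
// drawn from the post's summary card, not the real schema.
interface DraftPost {
  title: string;
  date: string;          // ISO date, e.g. "2026-02-22"
  whatWasBuilt: string;  // summary card: "What Was Built"
  businessInsight: string;
  friction: string;
  tags: string[];
  body: string;          // MDX content generated from the transcript
}

// A draft generated from a spoken session might look like:
const draft: DraftPost = {
  title: "AI-Powered Blog Editing",
  date: "2026-02-22",
  whatWasBuilt:
    "Tested an AI-powered blog post generation system with edit functionality",
  businessInsight:
    "Editing refines AI-generated content to reflect personal thoughts",
  friction: "Distinguishing saved vs. temporary posts in the edit function",
  tags: ["AI", "voice-to-text", "blogging"],
  body: "## What I Built\n...",
};
```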

The system feels almost frictionless.

But publishing AI output blindly would defeat the entire purpose of documenting a real AI learning journey. I need to stay in the driver’s seat for every post.

So I’m building an edit function directly into the workflow.

The flow now looks like this: I speak → the AI generates → I land on an edit page → I refine it until it feels right → I publish.

It’s not fully automated — and that’s intentional. Each post needs to reflect my real thoughts and emotions at that moment in the journey. Automation is helpful, but authenticity is non-negotiable.

That edit window is not a minor feature. It’s the control surface. It’s where I reclaim ownership of the ideas. Without this step, the system would feel automated but hollow.

Ironically, the more powerful the generation becomes, the more important the editing layer becomes.

The Blocker

The main blocker today wasn’t generation. It was state management.

I now have two types of posts in the system.

There are published posts that exist in the repository.

And there are temporary drafts stored in session storage during the new post generation flow.

Both need to be editable in the same interface.

Architecturally, the cleanest approach is to reuse the same edit page for both. The UI is identical. The editing behavior is identical. But the source of truth is different.

Before rendering the edit page, the system must answer one simple but critical question:

Is this post persisted, or is it temporary?

That distinction determines the data source, the save logic, and the publish behavior.
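That check could be as small as a single function the edit page runs before rendering. This is a sketch under my own assumptions: the `draft:` key prefix and the Map standing in for session storage are hypothetical, not the system's actual implementation.

```typescript
// Sketch of the pre-render check; the "draft:" key convention and the
// Map stand-in for sessionStorage are assumptions for illustration.
type PostSource = "persisted" | "temporary";

function resolvePostSource(
  slug: string,
  persistedSlugs: Set<string>,  // posts committed to the repository
  drafts: Map<string, string>,  // stand-in for sessionStorage
): PostSource {
  // A session-storage draft only exists there, so it takes precedence.
  if (drafts.has(`draft:${slug}`)) return "temporary";
  if (persistedSlugs.has(slug)) return "persisted";
  throw new Error(`Unknown post: ${slug}`);
}
```

The answer then drives everything downstream: which store to read from, where saves go, and what "publish" means for this post.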

A published post is persisted. It lives in the repository. Editing it means fetching real content, modifying it, and committing the changes again.

A temporary post is ephemeral. It only exists in session storage. It hasn’t been written to disk. It’s like clay that hasn’t been fired yet — still soft, still changeable.
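The two paths above suggest that saving is a single branch on the post's source. Again a sketch, not the real code: `commitToRepo` and the session-storage stand-in are placeholders for whatever the system actually does.

```typescript
// Sketch of save logic branching on post source; commitToRepo and the
// drafts Map (sessionStorage stand-in) are hypothetical placeholders.
interface SaveTargets {
  drafts: Map<string, string>;
  commitToRepo: (slug: string, mdx: string) => void;
}

function savePost(
  source: "persisted" | "temporary",
  slug: string,
  mdx: string,
  targets: SaveTargets,
): void {
  if (source === "temporary") {
    // Still clay: keep the draft in session storage until publish.
    targets.drafts.set(`draft:${slug}`, mdx);
  } else {
    // Already fired: write the change back to the repository.
    targets.commitToRepo(slug, mdx);
  }
}
```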

In a way, this is the deeper lesson of AI-native systems.

The impressive part isn’t generating content anymore. That’s table stakes.

The real work is orchestration. State transitions. Clear boundaries. Knowing when something is a draft, when it becomes finalized, and when it turns permanent.

Voice-to-text generation was the flashy part.

Editing (by a human, me), state detection, and clean architecture are the real engineering.
