
Refactoring With an LLM: Turning Vague Intent Into Implementable Rules
Why the hard part is not code, it is transferring meaning
Refactoring a complicated feature is already hard. Doing it with the help of an LLM adds a new type of difficulty: you must translate your intent, assumptions, and edge cases into language that is unambiguous enough to guide changes safely. Most refactoring failures with LLM assistance do not happen because the model cannot write code. They happen because the model receives an incomplete, underspecified picture of the feature, then fills gaps with guesses. The output can look plausible while quietly breaking behavior that users depend on. This post is about the work that makes LLM-assisted refactoring successful: iterative requirements refinement. Not rewriting everything. Not perfect specs. But enough clarity that the implementation becomes obvious and safe.
To keep things relatable, I’ll use an everyday example: bulk editing calendar events. Part 1 is about the problem, the real challenges of refactoring with an LLM, and a process for turning fuzzy requirements into a coherent set of rules before implementation begins. Part 2 will show the workflow, simulated rounds of refinement, a scenario matrix, and how it connects to implementation and testing.
1. The Feature That “Worked Fine” Until One Request Changed Everything
A common refactoring story starts with a stable feature.
You can edit a single calendar event:
- Title
- Location
- Start time
- End time
It saves immediately, validates quickly, and users trust it. Then comes the request:
Allow selecting multiple events and editing them together.
This looks like a UI enhancement. It is not. It is a new interaction model.
Single-select assumes:
- One object, one truth
- Inputs map directly to stored values
- “Empty” means the user cleared it
- Input changes can safely auto-save
Multi-select breaks every one of those assumptions.
2. Why Refactoring With an LLM Is Hard in Practice, Even for “Normal” Apps
A lot of posts gloss over what makes refactoring with an LLM fundamentally different from refactoring alone. The hard part is not that the model cannot write code. The hard part is transferring the right intent and constraints into a form the model can execute without guessing.
2.1 Your intent is not in the code, and you cannot transmit the full system in one prompt
In a refactor, what you care about is not “what the code currently does,” but:
- what must stay the same
- what is allowed to change
- which behaviors are accidental vs essential
- which edge cases are rare but critical
Most complex features also carry historical behaviors, hidden invariants, UX assumptions, and edge case expectations. You cannot fit all of that into one prompt. Trying to dump everything upfront usually fails because it overwhelms the model and still misses what matters.
So when you ask an LLM to refactor, you are asking it to infer intent from incomplete evidence.
2.2 Natural language compresses too much
When you say:
- “Support bulk edit”
- “No partial saves”
- “Show mixed values”
these sound clear, but they hide dozens of decisions.
Even a phrase like “empty input” can mean multiple distinct states:
- the user wants to clear the field
- the selected items have different values
- the field does not apply for some items
If you do not separate those meanings, the model will pick an interpretation, and it can be the wrong one.
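One way to keep those meanings apart is to model them as distinct states instead of a single nullable string. Here is a minimal TypeScript sketch; the names (`FieldState`, `initialState`) are illustrative, not from any real codebase:

```typescript
// Each meaning of "empty" becomes its own explicit state.
type FieldState<T> =
  | { kind: "value"; value: T }   // all selected items share this value
  | { kind: "mixed" }             // selected items disagree
  | { kind: "cleared" }           // the user explicitly cleared the field
  | { kind: "notApplicable" };    // the field does not apply to some items

// Deriving the initial state from the selection makes "mixed" explicit
// instead of collapsing it into an empty input. Assumes a non-empty selection.
function initialState<T>(values: T[]): FieldState<T> {
  const unique = new Set(values);
  return unique.size === 1
    ? { kind: "value", value: values[0] }
    : { kind: "mixed" };
}
```

With this shape, the model (and your teammates) cannot accidentally treat "mixed" and "cleared" as the same thing, because the type system refuses to.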
2.3 LLMs will try to be helpful by guessing
When requirements are incomplete, the model tends to:
- choose a reasonable default
- assume common UX patterns
- smooth over contradictions
That is useful for brainstorming. It is risky for refactoring. A refactor is not “what would be nice.” It is “what must be correct.”
2.4 The model will not know what you consider sacred unless you state invariants
Only you know which behaviors must not change. You must explicitly state invariants such as:
- “single-select stays unchanged”
- “no partial saves”
- “mixed is not empty”
Without these, the model may optimize for convenience or typical patterns.
2.5 The real bottleneck is specification transfer, and it requires dialogue
Most of the time, once the rules are clear, the LLM could implement the change. The failure mode is earlier: getting from “what we want” to “what the system should do in every case.”
That is why a successful workflow is closer to an interview:
- you state goals
- the model probes ambiguous parts
- you answer with decisions
- those decisions become the real spec
Iterative refinement is not overhead. It is the mechanism that transfers intent.
3. Why Multi-Select Features Are a Perfect Ambiguity Trap
Bulk editing introduces a set of ambiguity patterns that show up across products.
Pattern A: Overloaded emptiness
An empty field can mean “clear,” “mixed,” or “unknown.”
Pattern B: Implicit intent
Single-edit can auto-save because the intent is obvious. Bulk-edit makes intent fragile. One accidental keystroke can overwrite dozens of records.
Pattern C: Validation changes character
Validation is no longer local. You must decide whether:
- Validation is per item
- Validation is for the whole selection as a unit
- Invalid items block all updates
- Partial application is allowed
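If the decision is "validate everything first, then write all or nothing," that rule is small enough to sketch directly. A hypothetical TypeScript version (the names `atomicCommit`, `isValid` are illustrative):

```typescript
// Sketch of a "no partial saves" rule: validate every updated item first;
// if any item fails, nothing is written.
function atomicCommit<T>(
  items: T[],
  update: (item: T) => T,
  isValid: (item: T) => boolean
): { committed: T[] } | { error: string } {
  const next = items.map(update);
  if (!next.every(isValid)) {
    return { error: "one or more items invalid; nothing was saved" };
  }
  return { committed: next };
}
```

The point is not this particular implementation but that the rule, once decided, fits in a few lines; the hard work was choosing it.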
Pattern D: State becomes two-layered
You now have:
- UI state (what the user currently sees and edits)
- persisted state (what is stored for each selected item)
Conflating those is the source of most bugs. These patterns exist regardless of the domain.
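The two layers can be made explicit in the data model. A hypothetical sketch, assuming pending edits live in a structure separate from the stored events:

```typescript
// Persisted layer: what is stored for each event.
interface CalendarEvent {
  id: string;
  title: string;
  location: string;
}

// UI layer: only fields the user has deliberately changed appear here.
// An untouched field is simply absent, never an empty string.
type PendingEdits = Partial<Pick<CalendarEvent, "title" | "location">>;

// Committing merges pending edits over each stored item;
// fields the user never touched survive unchanged.
function applyEdits(events: CalendarEvent[], edits: PendingEdits): CalendarEvent[] {
  return events.map((e) => ({ ...e, ...edits }));
}
```

Keeping the layers separate makes "the user typed nothing" structurally different from "the user typed an empty value," which is exactly the distinction bulk edit needs.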
4. The Mistake: Treating “Requirements” as One Paragraph
The most common failure looks like this:
“If multiple items are selected, show a form and apply changes to all selected items.”
That is not a requirement. It is a headline. A usable requirement must answer “what happens if…” questions, because those are the situations that create bugs.
For bulk editing calendar events, you immediately hit:
- Some events have different titles
- Some have different locations
- Times are always different
- Some items might be missing required fields
- Some fields might not apply to all items
If you start implementation without addressing these, you will implement guesses.
5. The Better Approach: Use the LLM as a Requirements Debugger
Instead of asking the LLM to implement immediately, use it to break down your requirements until they become solid.
A useful mental model:
The LLM is a fuzz tester for your spec.
You feed it a rule, and it generates counterexamples:
- contradictions
- undefined behavior
- ambiguous states
This is exactly the role the model should play before any implementation starts.
6. The Refinement Toolkit: Four Artifacts That Make This Work
To stop the conversation from becoming endless, you need structure. These are the artifacts that keep refinement productive.
6.1 A glossary
Define terms that hide ambiguity:
- “empty”
- “mixed”
- “apply”
- “save”
- “required”
- “optional”
- “pending”
- “commit”
You do not need formal documents. A small list prevents misunderstandings.
6.2 A decision log
As soon as you decide on something, record it as a rule.
Example:
- “In bulk edit, changes are pending until confirmed per field.”
- “No partial saves across the selection.”
- “Mixed value state is distinct from empty value state.”
This prevents circular discussions.
6.3 A scenario matrix
A compact table of:
- input state
- user action
- expected outcome
If a rule cannot be expressed as scenarios, it is not ready.
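A scenario matrix maps naturally onto a table-driven test. A self-contained sketch, with illustrative names and a deliberately simplified `outcome` rule:

```typescript
// One row per scenario: input state, user action, expected outcome.
interface Scenario {
  name: string;
  storedTitles: string[];      // input state across the selection
  userInput: string | null;    // user action (null = field left untouched)
  expected: "unchanged" | "applied";
}

const scenarios: Scenario[] = [
  { name: "mixed titles, untouched field", storedTitles: ["A", "B"], userInput: null, expected: "unchanged" },
  { name: "mixed titles, typed value",     storedTitles: ["A", "B"], userInput: "C",  expected: "applied" },
  { name: "same titles, untouched field",  storedTitles: ["A", "A"], userInput: null, expected: "unchanged" },
];

// A rule is "ready" when a function like this can decide every row.
function outcome(s: Scenario): "unchanged" | "applied" {
  return s.userInput === null ? "unchanged" : "applied";
}
```

When a row has no defensible `expected` value, that row is the next refinement question to ask the model.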
6.4 A stop criterion
Without a stop rule, refinement never ends. A practical stop criterion is:
- All key scenarios have defined outcomes
- There are no contradictory rules
- Implementation can be described as “mechanical”
- Remaining edge cases are low impact and can be deferred
7. What “Clear Enough” Looks Like Before Implementation
By the time you are ready to implement bulk editing, you should be able to state, in plain language:
- How mixed values are displayed
- When a change becomes intentional
- Whether saves are atomic or partial
- How required fields behave when they are already valid in stored data
- Which actions are blocked up front
If you can do that, implementation is no longer the hard part.
8. Part 1 Wrap-Up: The Real Lesson So Far
The core lesson is not about calendars. It is this:
- Refactoring with an LLM fails when intent stays implicit.
- Success comes from a deliberate refinement conversation that converts intent into rules.
- The goal is not a perfect spec. The goal is a consistent one that supports implementation and tests.
Part 2 will show the workflow as simulated refinement rounds, the final rule set, the scenario matrix that covers the key cases, and the concrete prompts you can reuse to make this work on your own refactors.