
Refactoring With an LLM: Turning Vague Intent Into Implementable Rules
Why the hard part is not code, it is transferring meaning
Refactoring a complicated feature is already hard. Doing it with the help of an LLM adds a new type of difficulty: you must translate your intent, assumptions, and edge cases into language that is unambiguous enough to guide changes safely. Most refactoring failures with LLM assistance do not happen because the model cannot write code. They happen because the model receives an incomplete, underspecified picture of the feature, then fills gaps with guesses. The output can look plausible while quietly breaking behavior that users depend on. This post is about the work that makes LLM-assisted refactoring successful: iterative requirements refinement. Not rewriting everything. Not perfect specs. But enough clarity that the implementation becomes obvious and safe.
To keep things relatable, I’ll use an everyday example: bulk editing calendar events. Part 1 is about the problem, the real challenges of refactoring with an LLM, and a process for turning fuzzy requirements into a coherent set of rules before implementation begins. Part 2 will show the workflow, simulated rounds of refinement, a scenario matrix, and how it connects to implementation and testing.
1. The Feature That “Worked Fine” Until One Request Changed Everything
A common refactoring story starts with a stable feature.
You can edit a single calendar event:
- Title
- Location
- Start time
- End time
It saves immediately, validates quickly, and users trust it. Then comes the request:
Allow selecting multiple events and editing them together.
This looks like a UI enhancement. It is not. It is a new interaction model.
Single-select assumes:
- One object, one truth
- Inputs map directly to stored values
- “Empty” means the user cleared it
- Input changes can safely auto-save
Multi-select breaks every one of those assumptions.
2. Why Refactoring With an LLM Is Hard in Practice, Even for “Normal” Apps
A lot of posts gloss over what makes refactoring with an LLM fundamentally different from refactoring alone. The hard part is not that the model cannot write code. The hard part is transferring the right intent and constraints into a form the model can execute without guessing.
2.1 Your intent is not in the code, and you cannot transmit the full system in one prompt
In a refactor, what you care about is not “what the code currently does,” but:
- what must stay the same
- what is allowed to change
- which behaviors are accidental vs essential
- which edge cases are rare but critical
Most complex features also carry historical behaviors, hidden invariants, UX assumptions, and edge case expectations. You cannot fit all of that into one prompt. Trying to dump everything upfront usually fails because it overwhelms the model and still misses what matters.
So when you ask an LLM to refactor, you are asking it to infer intent from incomplete evidence.
2.2 Natural language compresses too much
When you say:
- “Support bulk edit”
- “No partial saves”
- “Show mixed values”
these sound clear, but they hide dozens of decisions.
Even a phrase like “empty input” can mean multiple distinct states:
- the user wants to clear the field
- the selected items have different values
- the field does not apply for some items
If you do not separate those meanings, the model will pick an interpretation, and it can be the wrong one.
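One way to keep those meanings apart is to model them as distinct states instead of a single nullable string. Here is a minimal TypeScript sketch; the names (`FieldState`, `initialState`) are illustrative, not from any real codebase:

```typescript
// Each meaning of "empty" becomes its own explicit state.
type FieldState<T> =
  | { kind: "value"; value: T }   // all selected items share this value
  | { kind: "mixed" }             // selected items disagree
  | { kind: "cleared" }           // the user explicitly cleared the field
  | { kind: "notApplicable" };    // the field does not apply to some items

// Deriving the initial state from the selection makes "mixed" explicit
// instead of collapsing it into an empty input. Assumes a non-empty selection.
function initialState<T>(values: T[]): FieldState<T> {
  const unique = new Set(values);
  return unique.size === 1
    ? { kind: "value", value: values[0] }
    : { kind: "mixed" };
}
```

With this shape, the model (and your teammates) cannot accidentally treat "mixed" and "cleared" as the same thing, because the type system refuses to.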
2.3 LLMs will try to be helpful by guessing
When requirements are incomplete, the model tends to:
- choose a reasonable default
- assume common UX patterns
- smooth over contradictions
That is useful for brainstorming. It is risky for refactoring. A refactor is not “what would be nice.” It is “what must be correct.”
2.4 The model will not know what you consider sacred unless you state invariants
Only you know which behaviors must not change. You must explicitly state invariants such as:
- “single-select stays unchanged”
- “no partial saves”
- “mixed is not empty”
Without these, the model may optimize for convenience or typical patterns.
2.5 The real bottleneck is specification transfer, and it requires dialogue
Most of the time, once the rules are clear, the LLM could implement the change. The failure mode is earlier: getting from “what we want” to “what the system should do in every case.”
That is why a successful workflow is closer to an interview:
- you state goals
- the model probes ambiguous parts
- you answer with decisions
- those decisions become the real spec
Iterative refinement is not overhead. It is the mechanism that transfers intent.
3. Why Multi-Select Features Are a Perfect Ambiguity Trap
Bulk editing introduces a set of ambiguity patterns that show up across products.
Pattern A: Overloaded emptiness
An empty field can mean “clear,” “mixed,” or “unknown.”
Pattern B: Implicit intent
Single-edit can auto-save because the intent is obvious. Bulk-edit makes intent fragile. One accidental keystroke can overwrite dozens of records.
Pattern C: Validation changes character
Validation is no longer local. You must decide whether:
- Validation is per item
- Validation is for the whole selection as a unit
- Invalid items block all updates
- Partial application is allowed
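If the decision is "validate everything first, then write all or nothing," that rule is small enough to sketch directly. A hypothetical TypeScript version (the names `atomicCommit`, `isValid` are illustrative):

```typescript
// Sketch of a "no partial saves" rule: validate every updated item first;
// if any item fails, nothing is written.
function atomicCommit<T>(
  items: T[],
  update: (item: T) => T,
  isValid: (item: T) => boolean
): { committed: T[] } | { error: string } {
  const next = items.map(update);
  if (!next.every(isValid)) {
    return { error: "one or more items invalid; nothing was saved" };
  }
  return { committed: next };
}
```

The point is not this particular implementation but that the rule, once decided, fits in a few lines; the hard work was choosing it.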
Pattern D: State becomes two-layered
You now have:
- UI state (what the user currently sees and edits)
- persisted state (what is stored for each selected item)
Conflating those is the source of most bugs. These patterns exist regardless of the domain.
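The two layers can be made explicit in the data model. A hypothetical sketch, assuming pending edits live in a structure separate from the stored events:

```typescript
// Persisted layer: what is stored for each event.
interface CalendarEvent {
  id: string;
  title: string;
  location: string;
}

// UI layer: only fields the user has deliberately changed appear here.
// An untouched field is simply absent, never an empty string.
type PendingEdits = Partial<Pick<CalendarEvent, "title" | "location">>;

// Committing merges pending edits over each stored item;
// fields the user never touched survive unchanged.
function applyEdits(events: CalendarEvent[], edits: PendingEdits): CalendarEvent[] {
  return events.map((e) => ({ ...e, ...edits }));
}
```

Keeping the layers separate makes "the user typed nothing" structurally different from "the user typed an empty value," which is exactly the distinction bulk edit needs.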
4. The Mistake: Treating “Requirements” as One Paragraph
The most common failure looks like this:
“If multiple items are selected, show a form and apply changes to all selected items.”
That is not a requirement. It is a headline. A usable requirement must answer “what happens if…” questions, because those are the situations that create bugs.
For bulk editing calendar events, you immediately hit:
- Some events have different titles
- Some have different locations
- Times are always different
- Some items might be missing required fields
- Some fields might not apply to all items
If you start implementation without addressing these, you will implement guesses.
5. The Better Approach: Use the LLM as a Requirements Debugger
Instead of asking the LLM to implement immediately, use it to break down your requirements until they become solid.
A useful mental model:
The LLM is a fuzz tester for your spec.
You feed it a rule, and it generates counterexamples:
- contradictions
- undefined behavior
- ambiguous states
This is exactly the role the model should play before any implementation starts.
6. The Refinement Toolkit: Four Artifacts That Make This Work
To stop the conversation from becoming endless, you need structure. These are the artifacts that keep refinement productive.
6.1 A glossary
Define terms that hide ambiguity:
- “empty”
- “mixed”
- “apply”
- “save”
- “required”
- “optional”
- “pending”
- “commit”
You do not need formal documents. A small list prevents misunderstandings.
6.2 A decision log
As soon as you decide on something, record it as a rule.
Example:
- “In bulk edit, changes are pending until confirmed per field.”
- “No partial saves across the selection.”
- “Mixed value state is distinct from empty value state.”
This prevents circular discussions.
6.3 A scenario matrix
A compact table of:
- input state
- user action
- expected outcome
If a rule cannot be expressed as scenarios, it is not ready.
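A scenario matrix maps naturally onto a table-driven test. A self-contained sketch, with illustrative names and a deliberately simplified `outcome` rule:

```typescript
// One row per scenario: input state, user action, expected outcome.
interface Scenario {
  name: string;
  storedTitles: string[];      // input state across the selection
  userInput: string | null;    // user action (null = field left untouched)
  expected: "unchanged" | "applied";
}

const scenarios: Scenario[] = [
  { name: "mixed titles, untouched field", storedTitles: ["A", "B"], userInput: null, expected: "unchanged" },
  { name: "mixed titles, typed value",     storedTitles: ["A", "B"], userInput: "C",  expected: "applied" },
  { name: "same titles, untouched field",  storedTitles: ["A", "A"], userInput: null, expected: "unchanged" },
];

// A rule is "ready" when a function like this can decide every row.
function outcome(s: Scenario): "unchanged" | "applied" {
  return s.userInput === null ? "unchanged" : "applied";
}
```

When a row has no defensible `expected` value, that row is the next refinement question to ask the model.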
6.4 A stop criterion
Without a stop rule, refinement never ends. A practical stop criterion is:
- All key scenarios have defined outcomes
- There are no contradictory rules
- Implementation can be described as “mechanical”
- Remaining edge cases are low impact and can be deferred
7. What “Clear Enough” Looks Like Before Implementation
By the time you are ready to implement bulk editing, you should be able to state, in plain language:
- How mixed values are displayed
- When a change becomes intentional
- Whether saves are atomic or partial
- How required fields behave when they are already valid in stored data
- Which actions are blocked up front
If you can do that, implementation is no longer the hard part.
8. Part 1 Wrap-Up: The Real Lesson So Far
The core lesson is not about calendars. It is this:
- Refactoring with an LLM fails when intent stays implicit.
- Success comes from a deliberate refinement conversation that converts intent into rules.
- The goal is not a perfect spec. The goal is a consistent one that supports implementation and tests.
Part 2 will show the workflow as simulated refinement rounds, the final rule set, the scenario matrix that covers the key cases, and the concrete prompts you can reuse to make this work on your own refactors.