How to write user stories with AI: a practical guide

Vitalii Ischenko

A user story is a short, plain-language description of something a user needs, written from their point of view. The usual format is a single sentence: as a [type of user], I want [something], so that [benefit]. It is deliberately small, and it is deliberately not a full specification. The story is a promise to have a conversation, not the conversation itself.

That is also why people get them wrong. They either write a one-line title with no real user or value behind it, or they cram an entire spec into a story and call it agile. Both miss the point. This guide covers the format, what separates a good story from a useless one, how to write acceptance criteria that hold up in testing, and where AI genuinely helps with the tedious parts of all that.

What a user story is, and what it is not

The job of a user story is to capture who wants something, what they want, and why, in a form small enough to build and test in one go. The "why" is the part that matters most and the part people drop. A story without a "so that" is just a feature request, and a feature request with no stated benefit is the easiest kind of work to build and later realize nobody needed.

A story is not a requirements document. It does not list every field, rule, and edge case in the sentence itself. Those live in the acceptance criteria and in the conversation around the story. Keeping the story short is not laziness; it is what keeps it readable and keeps the detail somewhere it can be checked.

The user story format

Most teams use a version of this template:

As a [type of user], I want [some goal], so that [some reason].

For example: as a returning customer, I want to reset my password from the login screen, so that I can get back into my account without contacting support.

The format is a tool, not a law. If forcing a piece of work into "as a user" makes it more awkward to read, write it plainly instead. But the three parts are worth keeping, because each one earns its place: the user keeps you honest about who this is for, the goal says what to build, and the reason tells you whether it is worth building at all.
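One way to keep the three parts from being dropped is to hold them as separate fields rather than one free-text sentence. The sketch below is only an illustration: the `UserStory` class and its method names are our own invention, not part of any tool mentioned in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStory:
    """One story in the 'as a / I want / so that' template (hypothetical type)."""

    user: str    # who the story is for
    goal: str    # what they want
    reason: str  # why it is worth building

    def missing_parts(self) -> list[str]:
        """Return the names of any template parts that were left blank."""
        parts = {"user": self.user, "goal": self.goal, "reason": self.reason}
        return [name for name, value in parts.items() if not value.strip()]

    def render(self) -> str:
        """Format the story as the single template sentence."""
        return f"As a {self.user}, I want {self.goal}, so that {self.reason}."


story = UserStory(
    user="returning customer",
    goal="to reset my password from the login screen",
    reason="I can get back into my account without contacting support",
)
print(story.render())
print(story.missing_parts())  # an empty list means all three parts are present
```

A story created with a blank `reason` would report `["reason"]` from `missing_parts()`, which is the "no so that" problem made visible before anyone starts building.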

What makes a good story: INVEST

INVEST is a checklist for whether a story is ready. It stands for independent, negotiable, valuable, estimable, small, and testable.

Independent means the story can be built on its own, without waiting on three others. Negotiable means the details are open to discussion rather than dictated. Valuable means it delivers something a user or the business actually wants. Estimable means the team can size it. Small means it fits in a single iteration. Testable means you can tell, objectively, when it is done.

The N trips people up. Negotiable does not mean the story is vague. It means the specifics are meant to be worked out between the people who want the feature and the people who will build it. On the projects where this works best, that discussion happens around the story before it is locked, which means the story that reaches a developer is already clear, and the negotiation is not still happening mid-sprint.
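INVEST is a judgment call, not something code can decide, but the result of a review can still be recorded in a checkable form. A minimal sketch, assuming the reviewer's answers arrive as a plain dictionary; the function name `failed_checks` is hypothetical.

```python
# The six INVEST criteria, in the order the acronym spells them.
INVEST = ("independent", "negotiable", "valuable", "estimable", "small", "testable")


def failed_checks(answers: dict[str, bool]) -> list[str]:
    """Return the criteria the reviewer did not confirm, in INVEST order.

    A criterion that was never answered counts as not confirmed.
    """
    return [name for name in INVEST if not answers.get(name, False)]


# Example review of one story: everything passes except size.
review = {
    "independent": True,
    "negotiable": True,
    "valuable": True,
    "estimable": True,
    "small": False,
    "testable": True,
}
print(failed_checks(review))
```

Treating an unanswered criterion as a failure is a deliberate choice in this sketch: a story is only "ready" when someone has actually looked at all six.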

Acceptance criteria: how you know it is done

Acceptance criteria are the conditions a story must meet to count as finished. They are where the detail the story leaves out actually lives, and they are what a tester reads to know what to check. A story without them is untestable, which means "done" becomes a matter of opinion.

A common and useful way to write them is the given-when-then form, borrowed from behavior-driven development:

Given a registered user on the login screen, when they request a password reset with their email, then a reset link is sent to that email and expires after 24 hours.
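The given-when-then form maps closely onto the shape of an automated test, which is part of why it holds up in testing. The sketch below illustrates that mapping only: `FakeResetService` is a hypothetical in-memory stand-in, not a real service or library, and the email address is made up.

```python
from datetime import datetime, timedelta

LINK_LIFETIME = timedelta(hours=24)  # taken from the criterion above


class FakeResetService:
    """Hypothetical in-memory stand-in for a real password reset service."""

    def __init__(self, registered_emails):
        self.registered = set(registered_emails)
        self.sent = {}  # email -> expiry time of the link sent to it

    def request_reset(self, email, now):
        # Only registered addresses get a link recorded against them.
        if email in self.registered:
            self.sent[email] = now + LINK_LIFETIME

    def link_is_valid(self, email, now):
        expiry = self.sent.get(email)
        return expiry is not None and now < expiry


def test_reset_link_is_sent_and_expires_after_24_hours():
    # Given a registered user on the login screen
    service = FakeResetService(registered_emails=["ana@example.com"])
    requested_at = datetime(2024, 1, 1, 9, 0)

    # When they request a password reset with their email
    service.request_reset("ana@example.com", now=requested_at)

    # Then a reset link is sent to that email and expires after 24 hours
    assert "ana@example.com" in service.sent
    almost_expired = requested_at + timedelta(hours=23, minutes=59)
    assert service.link_is_valid("ana@example.com", now=almost_expired)
    expired = requested_at + timedelta(hours=24)
    assert not service.link_is_valid("ana@example.com", now=expired)


test_reset_link_is_sent_and_expires_after_24_hours()
```

Each clause of the criterion becomes one commented step, so a tester can read the test and the criterion side by side.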

You do not have to use that structure. Plain bullet points work too. What matters is that each criterion is specific and checkable. For the password reset story, that might be:

  • A "forgot password" link is visible on the login screen.
  • Submitting a registered email sends a reset link to that address.
  • The reset link expires after 24 hours.
  • Submitting an unregistered email shows the same confirmation, without revealing whether an account exists.

How many criteria? A rough guide we use is no fewer than three and no more than nine. Fewer than three usually means the story is underspecified. More than nine usually means it is really several stories bundled together and should be split.
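The three-to-nine guide is simple enough to check mechanically. A minimal sketch, assuming the criteria are held as a list of strings; the function name and the three labels it returns are our own.

```python
MIN_CRITERIA = 3  # fewer usually means the story is underspecified
MAX_CRITERIA = 9  # more usually means several stories bundled together


def review_criteria_count(criteria: list[str]) -> str:
    """Classify a story by how many acceptance criteria it has."""
    count = len(criteria)
    if count < MIN_CRITERIA:
        return "underspecified"
    if count > MAX_CRITERIA:
        return "split"
    return "ok"


password_reset_criteria = [
    'A "forgot password" link is visible on the login screen.',
    "Submitting a registered email sends a reset link to that address.",
    "The reset link expires after 24 hours.",
    "Submitting an unregistered email shows the same confirmation, "
    "without revealing whether an account exists.",
]
print(review_criteria_count(password_reset_criteria))
```

The count is only a prompt for a second look. A story with four weak criteria still passes this check, so it does not replace reading them.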

Do not forget non-functional requirements

Stories tend to capture what the system should do and quietly skip how well it should do it. Speed, security, accessibility, how the system behaves under load: these are non-functional requirements, and they are easy to leave out of a story because no user asks for them in a sentence. Nobody writes "as a user, I want the page to load in under two seconds." They just leave when it does not.

You can handle these as acceptance criteria on the relevant story, or as their own stories where they are significant. The point is to write them down somewhere checkable, because a feature that works but is too slow or leaks data is not actually done.

A worked example

Here is a complete small story with its criteria, the way it would look ready for a developer:

As a returning customer, I want to reset my password from the login screen, so that I can get back into my account without contacting support.

Acceptance criteria:

  • A "forgot password" link is visible on the login screen.
  • Submitting a registered email sends a reset link to that address.
  • The reset link expires after 24 hours and works only once.
  • Submitting an unregistered email shows the same confirmation message, without revealing whether an account exists.
  • Resetting the password invalidates any active sessions for that account.

Read it and you can see what to build and what to test, and the last two criteria show where the real thinking went: the security details a one-line title would have hidden.

Splitting stories that are too big

When a story fails the "small" test, you split it, and the trick is to split by meaning rather than by size. Cutting a large story into "front-end half" and "back-end half" gives you two stories that are each useless on their own. Splitting it into "reset by email" and "reset by SMS" gives you two stories that each deliver something. We cap a finished set at roughly a dozen stories per area of work and split anything bigger, then check each piece against INVEST again, because splitting badly is its own way of making a mess.

Seeing the whole picture with story mapping

A backlog is a flat list, and a flat list hides the shape of the product. Story mapping arranges stories by the user's journey across the top and by priority down the side, so you can see the whole experience and decide what the first usable version actually needs. It is worth doing when a backlog has grown to the point where nobody can hold the product in their head anymore, which on a real project happens quickly.
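The difference between a flat list and a map can be shown with a small data structure. This is a sketch under our own assumptions: the stories, journey steps, and priority numbers are invented for illustration, and taking the top story in each column as a candidate first version is one simple reading of the technique, not a rule.

```python
from collections import defaultdict

# A flat backlog: (journey step, priority, story title).
# Lower priority number means more important. All entries are hypothetical.
backlog = [
    ("Sign in", 1, "Log in with email and password"),
    ("Sign in", 2, "Reset password by email"),
    ("Sign in", 3, "Reset password by SMS"),
    ("Browse", 1, "Search products by name"),
    ("Browse", 2, "Filter results by price"),
    ("Check out", 1, "Pay by card"),
]


def build_story_map(stories):
    """Group stories into journey-step columns, each sorted by priority."""
    columns = defaultdict(list)
    for step, priority, title in stories:
        columns[step].append((priority, title))
    # Dicts keep insertion order, so steps stay in the order first seen.
    return {step: [title for _, title in sorted(items)]
            for step, items in columns.items()}


story_map = build_story_map(backlog)

# One candidate for a first usable version: the top story in every step.
first_slice = {step: titles[0] for step, titles in story_map.items()}

for step, titles in story_map.items():
    print(step, "->", titles)
```

Laid out this way, a gap is easy to spot: a journey step with no stories at all means the first version has a hole in the experience.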

Writing stories with AI

This is where AI earns its place, because most of what makes user stories good is repetitive checking that humans get bored of doing.

A model can take a brief (a short, agreed description of a feature) and draft a first set of stories with acceptance criteria from it. That draft is a starting point, not the answer. The value is in what comes next: the checks. On our projects, the same rules described above are written down as instructions the model applies. It splits stories that are too large, checks each one against INVEST, and keeps acceptance criteria within the three-to-nine range. It also does two things people are bad at. It catches duplicates: a language model will happily write the same story twice in different words and not notice, so a deduplication pass is part of the process. And it flags negative phrasing, since criteria written as "the system should not do X" are harder to test than the same rule stated positively.
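On our projects the model itself runs these checks from written instructions. Purely to make the two less obvious ones concrete, here is a crude lexical sketch in plain Python. It is not how the model does it: word overlap only catches near-identical wording, not true paraphrases, and the threshold, phrase list, and function names are all assumptions made for this example.

```python
import re
from itertools import combinations

# Phrases that suggest a criterion is stated negatively (illustrative list).
NEGATIVE = re.compile(
    r"\b(?:should not|must not|does not|cannot|never)\b|n't\b",
    re.IGNORECASE,
)


def words(text):
    """Lowercased set of words in a piece of text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def similarity(a, b):
    """Share of words two texts have in common (Jaccard index, 0.0 to 1.0)."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def likely_duplicates(goals, threshold=0.6):
    """Pairs of story goals whose wording overlaps at or above the threshold."""
    return [(a, b) for a, b in combinations(goals, 2)
            if similarity(a, b) >= threshold]


def negative_criteria(criteria):
    """Criteria that contain a negative phrase and may be hard to test."""
    return [c for c in criteria if NEGATIVE.search(c)]


goals = [
    "reset my password from the login screen",
    "reset my password on the login screen",
    "pay for my order by card",
]
criteria = [
    "The system should not reveal whether an account exists.",
    "The reset link expires after 24 hours.",
]
print(likely_duplicates(goals))
print(negative_criteria(criteria))
```

With this sample data the first two goals are reported as a likely duplicate pair and the first criterion is flagged as negative. Anything either function reports is a question for a person to resolve, not a decision.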

What the model does not do is decide. It drafts and it checks and it raises questions, and a person approves what goes forward. We keep the stories as linked notes the model can read, alongside the rest of the project's requirements, and use Claude to do the drafting and the consistency checks. When something contradicts a decision made earlier, it surfaces that as a question rather than quietly overwriting it. That last part is the difference between a useful assistant and one that introduces errors nobody catches.

Used this way, AI removes the bookkeeping. The judgment about what a story should be, and whether it is worth building, stays with the people who own the product.

Common mistakes

The ones we see most often:

  • A story with no "so that," so nobody knows why it exists.
  • A story so large it is really an epic in disguise.
  • Acceptance criteria that are missing, vague, or written so they cannot be tested.
  • Criteria phrased negatively.
  • The same story written twice under different wording.
  • The opposite failure: a story so detailed it has become a specification, which defeats the point of keeping it short.

How we approach this in practice

The part teams underestimate is the brief. AI can draft a tidy set of stories from a vague request, and they will be tidy and wrong, because the model fills the gaps with plausible guesses. So we put the effort up front into getting the brief right, and treat the generated stories as a draft to challenge rather than a result to accept. The other discipline is review. It is tempting to let a model produce fifty stories and move on, but the duplication and the quietly untestable criteria only get caught when a person reads them, so the human pass is not optional. Done with that discipline, AI saves real time on the writing and checking. Skipped, it just produces more stories to clean up later. This is how we run it on client work.

Where this fits

User stories are one piece of keeping requirements clear and current, which is a larger subject we cover in the guide to requirements management for complex software. And the acceptance criteria you write here are what your tests later verify, which is the link at the heart of a requirements traceability matrix. If your team is writing stories that turn into untestable tasks, the fix is usually in how the stories and criteria are written, and that is work we can help with.

Vitalii Ischenko

Business Analyst
