Thematic Analysis: A Step-by-Step Guide for Qualitative Researchers

Thematic analysis is one of the most widely used qualitative analysis methods in the world. Here is how to do it properly.

You have 15 interview transcripts. Each one covers 40 minutes of conversation. Somewhere inside all of that text is the answer to your research question, and your job is to find it without either cherry-picking quotes that confirm what you expected or drowning in the volume of data.

Thematic analysis is the method most qualitative researchers reach for in this situation. But the version that appears in many research reports, where a researcher reads the transcripts, identifies some themes, and writes them up, is not really thematic analysis. It is summarization with labels.

Done properly, thematic analysis is a systematic, iterative process that produces findings grounded in the data rather than in the researcher's prior assumptions. Here is how it actually works.

The Foundation: Braun and Clarke

The most widely used framework for thematic analysis was developed by Virginia Braun and Victoria Clarke, first published in their 2006 paper in Qualitative Research in Psychology. Their six-phase process remains the standard reference, though their later work on reflexive thematic analysis has added important nuance about the role of the researcher's perspective in shaping the analysis.

The Six Phases

Phase 1: Familiarization

Read your data. All of it. Multiple times. This sounds obvious but it is the step most frequently rushed.

In this phase, you are not looking for anything specific. You are building an intimate understanding of what the data contains: the range of perspectives, the recurring concerns, the surprising moments, the things people said that did not fit your expectations. Take notes as you read. These notes become the raw material for coding.

If you are working with audio or video data, transcribe it yourself if at all possible. The transcription process itself is a form of familiarization that creates insights that listening or watching alone does not.

Phase 2: Generating Initial Codes

Coding means systematically working through your data and labelling segments that are relevant to your research question. A code is a short label that captures the essential quality of a piece of data.

Important: you are not generating themes yet. A code is granular and data-close. 'Fear of being judged at the clinic' is a code. 'Barriers to healthcare access' is a theme. They operate at different levels of abstraction.

Apply codes generously. It is much easier to narrow later than to revisit uncoded data. One segment of data can receive multiple codes if different aspects of it are relevant to your question.
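If you keep your coding in a script or notebook rather than on paper, the idea that one segment can carry several codes might be sketched as below. This is only an illustration: the Segment class, the participant IDs, the quotes, and the code labels are all invented for the example and do not come from any particular tool.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One excerpt of transcript text plus the codes applied to it."""
    participant: str  # anonymized ID such as "P01" (hypothetical)
    text: str
    codes: set = field(default_factory=set)


def code_index(segments):
    """Map each code to the list of segments that carry it."""
    index = defaultdict(list)
    for seg in segments:
        for code in seg.codes:
            index[code].append(seg)
    return dict(index)


# Invented example data, for illustration only.
segments = [
    Segment(
        "P01",
        "I kept thinking everyone in the waiting room was looking at me.",
        {"fear of being judged at the clinic"},
    ),
    Segment(
        "P02",
        "I can't take a morning off work, and I'd feel silly going in healthy.",
        {"cannot take time off work", "clinics are for sick people"},
    ),
]

index = code_index(segments)
for code, segs in sorted(index.items()):
    print(f"{code}: {len(segs)} segment(s)")
```

Keeping codes as a set on each segment means a second or third code costs nothing to add, which makes generous coding easier to sustain.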

Phase 3: Searching for Themes

Take all your codes and begin grouping them into potential themes. You are looking for meaningful patterns: sets of codes that, taken together, say something coherent and significant about your research question.

Use whatever organizational tool works for you: a whiteboard with clusters, a spreadsheet, a mind map, or qualitative analysis software like NVivo or Atlas.ti. The tool matters less than the quality of thinking you bring to the grouping process.

At this stage, you will have more potential themes than you will end up keeping. That is expected.
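For researchers who prefer a script to a whiteboard, the grouping step can be sketched as a mapping from candidate theme names to sets of codes. The theme names and codes here are invented, and unassigned_codes is a hypothetical helper. The code only records your grouping decisions and shows which codes have not yet found a home. It does not make the decisions for you.

```python
def unassigned_codes(all_codes, themes):
    """Return, in sorted order, the codes not yet placed under any candidate theme."""
    assigned = set().union(*themes.values())
    return sorted(set(all_codes) - assigned)


# Invented codes and candidate themes, for illustration only.
all_codes = {
    "clinics are for sick people",
    "feels silly attending when healthy",
    "fear of being judged at the clinic",
    "worry about being seen by neighbours",
    "cannot take time off work",
}

candidate_themes = {
    "Clinics are seen as places for the sick": {
        "clinics are for sick people",
        "feels silly attending when healthy",
    },
    "Attending carries a social cost": {
        "fear of being judged at the clinic",
        "worry about being seen by neighbours",
    },
}

leftover = unassigned_codes(all_codes, candidate_themes)
print("Codes not yet in any theme:", leftover)
```

A leftover code is not automatically a problem. It may belong to a theme you have not articulated yet, or it may turn out to be irrelevant to the research question.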

Phase 4: Reviewing Themes

This is where you test whether your themes actually work.

Two tests. First: do the coded data extracts within each theme form a coherent picture? Read everything you have grouped under a theme and ask whether it genuinely hangs together as a single idea. If the extracts are pulling in different directions, the theme needs to be split or narrowed.

Second: does the set of themes, taken together, tell a coherent story about the dataset as a whole? If there are large portions of your data that are not captured by any of your themes, something important is missing.
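The second test has a rough quantitative proxy: what share of your coded segments is touched by at least one theme? The sketch below assumes each segment is represented as a set of codes and each theme as a set of codes. The function name theme_coverage and all of the data are invented for illustration. A low share is a prompt to reread the uncovered segments, not a verdict on the analysis.

```python
def theme_coverage(segment_codes, themes):
    """Return (share of segments covered by any theme, indices of uncovered segments)."""
    if not segment_codes:
        raise ValueError("No coded segments to check")
    themed_codes = set().union(*themes.values())
    uncovered = [
        i for i, codes in enumerate(segment_codes)
        if not (codes & themed_codes)
    ]
    share = 1 - len(uncovered) / len(segment_codes)
    return share, uncovered


# Invented example data: four coded segments and two themes.
segment_codes = [
    {"fear of being judged at the clinic"},
    {"clinics are for sick people", "cannot take time off work"},
    {"worry about being seen by neighbours"},
    {"distrust of test results"},
]

themes = {
    "Clinics are seen as places for the sick": {"clinics are for sick people"},
    "Attending carries a social cost": {
        "fear of being judged at the clinic",
        "worry about being seen by neighbours",
    },
}

share, uncovered = theme_coverage(segment_codes, themes)
print(f"Share of segments covered: {share:.0%}")
print("Segments to reread:", uncovered)
```

This check cannot help with the first test. Whether the extracts under a theme hang together as a single idea is a judgment only a careful reading can make.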

Phase 5: Defining and Naming Themes

For each theme that survives Phase 4, write a clear definition of what it covers and, just as importantly, what it does not. A theme without a clear boundary is not a theme. It is a collection of loosely related observations.

Name themes in a way that reflects what they are about, not just the topic they touch. 'Healthcare barriers' is a topic. 'The belief that clinics are for sick people, not healthy prevention' is a theme: it carries the actual content of what your data showed.

Phase 6: Writing Up

The write-up of thematic analysis is not a summary of themes. It is an argument, supported by evidence from the data.

Each theme gets its own discussion, moving from the theme's core claim to supporting evidence (direct data extracts, properly anonymized) to analytical commentary that explains what those extracts show. The write-up should read like an integrated analysis, not a list of bullet points with attached quotes.

Themes do not emerge from data. They are actively constructed by the researcher. That is not a weakness of the method. It is what makes it powerful, as long as the process is transparent.

Figure: The six phases of thematic analysis (Familiarization, Generating initial codes, Searching for themes, Reviewing themes, Defining themes, Writing up), shown along a vertical spine indicating progression through the analysis workflow. Thematic analysis turns raw qualitative data into structured findings. The process is more iterative than most guides suggest.

The Most Common Mistakes

Treating codes and themes as the same thing, producing analyses that are too shallow to be analytically meaningful.
Generating themes that reflect the structure of the interview guide rather than the structure of the data, which means you are finding what you asked about rather than what the data showed.
Using quotes as the analysis rather than as evidence for an analytical claim. A quote does not explain itself.
Not going back to the data after generating themes. Thematic analysis is iterative, not linear. Phase 4 almost always sends you back to Phases 2 and 3.

FAQ

How many themes should a thematic analysis produce?

There is no fixed number, but three to six themes is typical for a moderately sized qualitative dataset. Fewer than three usually means the analysis has stayed too abstract. More than eight usually means the themes are still operating at the code level rather than the theme level, or that the analysis has not fully gone through the review and definition phases. If you have more than six themes, test each one: does it represent a distinct, coherent pattern in the data, or is it a subtopic that belongs within a larger theme?

Can thematic analysis be done without software like NVivo or Atlas.ti?

Yes. Software tools are useful for managing large datasets and maintaining an audit trail of your coding decisions, but they do not do the analysis. The analytical thinking is yours regardless of the tool. Many rigorous thematic analyses are conducted with nothing more than a spreadsheet or printed transcripts and colored highlighters. If your dataset contains fewer than 20 transcripts, manual coding is entirely manageable. Choose the tool that helps you think most clearly, not the most sophisticated one available.

How do you handle a participant's quote that fits into more than one theme?

Code it under both during the early phases and let the review process determine where it belongs analytically. In the write-up, a data extract can appear in more than one theme's discussion if it genuinely illustrates different analytical points. What you want to avoid is using the same extract as the primary evidence for two different themes, which suggests the themes may not be distinct enough. If that keeps happening, revisit your theme definitions and check whether two of your themes are actually one.
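If you track which extracts serve as the primary evidence for each theme, spotting repeated use is easy to automate. The sketch below assumes you keep a mapping from theme name to a list of extract IDs. The ID format, the theme names, and the function shared_primary_extracts are all invented for illustration.

```python
from collections import defaultdict


def shared_primary_extracts(primary_evidence):
    """Return extract IDs used as primary evidence for more than one theme.

    primary_evidence maps a theme name to a list of extract IDs.
    The result maps each shared extract ID to the sorted themes that use it.
    """
    themes_by_extract = defaultdict(set)
    for theme, extract_ids in primary_evidence.items():
        for extract_id in extract_ids:
            themes_by_extract[extract_id].add(theme)
    return {
        extract_id: sorted(themes)
        for extract_id, themes in themes_by_extract.items()
        if len(themes) > 1
    }


# Invented example data, for illustration only.
primary_evidence = {
    "Attending carries a social cost": ["P01-03", "P02-07"],
    "Clinics are seen as places for the sick": ["P02-07", "P05-11"],
}

overlaps = shared_primary_extracts(primary_evidence)
for extract_id, themes in overlaps.items():
    print(f"{extract_id} is primary evidence for: {', '.join(themes)}")
```

One shared extract is rarely a concern. Several shared between the same pair of themes is the pattern that should send you back to the theme definitions.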

What is the difference between inductive and deductive thematic analysis?

Inductive thematic analysis builds themes from the data without a prior theoretical framework shaping the coding. You code what is there, then construct themes from the codes. Deductive analysis starts with a pre-existing framework or set of categories and codes the data against those. Most applied qualitative research uses an inductive approach because the purpose is to understand what the data shows. Deductive analysis is more common when the study is explicitly testing a theoretical model or when the research question already defines the categories of interest.

Sources: Braun, V. and Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77-101; Scribbr Thematic Analysis Guide; Delve Tool — How to Do Thematic Analysis; ATLAS.ti; PubMed — Thematic Analysis of Qualitative Data (AMEE Guide No. 131)