
The Steps of Analyzing Qualitative Data

Just a brief disclaimer: this chapter is not intended to be a comprehensive discussion of qualitative data analysis. It does offer an overview of some of the diverse approaches that can be used for qualitative data analysis, but as you will read, even within each one of these there are variations in how they might be implemented in a given project. If you are passionate (or at least curious 😊) about conducting qualitative research, use this as a starting point to help you dive deeper into some of these strategies. Please note that there are approaches to analysis that are not addressed in this chapter but may still be very valuable qualitative research tools. Examples include heuristic analysis[1], narrative analysis[2], discourse analysis[3], and visual analysis[4], among a host of others. These aren’t mentioned to confuse or overwhelm you, but to suggest that qualitative research is a broad field with many options. Before we begin reviewing some of these strategies, here are a few considerations regarding ethics, cultural responsibility, power, and control that should influence your thinking and planning as you map out your data analysis plan.

A deep understanding of cultural context as we make sense of meaning

Similar to the ethical considerations we need to keep in mind as we deconstruct stories, we also need to work diligently to understand the cultural context in which these stories are shared. This requires that we approach the task of analysis with a sense of cultural humility, meaning that we don’t assume that our perspective or worldview as the researcher is the same as our participants’. Their life experiences may be quite different from our own, and because of this, the meaning in their stories may be very different from what we might initially expect.

As such, we need to ask questions to better understand words, phrases, ideas, gestures, and other expressions that seem to have particular significance to participants. We can also use activities like member checking, another tool to support qualitative rigor, to ensure that our findings are accurately interpreted by vetting them with participants prior to the study’s conclusion. We can spend a good amount of time getting to know the groups and communities that we work with, paying attention to their values, priorities, practices, norms, strengths, and challenges. Finally, we can actively work to challenge more traditional research methods and support more participatory models that advance community co-researchers or consistent oversight of research by community advisory groups to inform, challenge, and advance this process, thus elevating the wisdom of community members and their influence (and power) in the research process.

Figure 19.1 Eliminating the black box

Accounting for our influence in the analysis process

Along with our ethical responsibility to our research participants, we also have an accountability to research consumers, the scientific community at large, and other stakeholders in our qualitative research. Whether we are qualitative or quantitative researchers, people should expect that we have attempted, to the best of our abilities, to account for our role in the research process. This is especially true in analysis. Our findings should not emerge from some ‘black box’, where raw data goes in and findings pop out the other side with no indication of how we arrived at them. Thus, an important part of rigor is transparency and the use of tools such as reflexive journals, memoing, and audit trails to assist us in documenting both our thought process and our activities in reaching our findings.

Reference for the above material

Ayton, D., Tsindos, T., & Berkovic, D. (Eds.). Qualitative Research – a practical guide for health and social care researchers and practitioners. Monash University Library. Licensed CC BY-NC. Last updated: 04-3-2025.

Reflexive Journaling

Reflexive journals allow researchers to record their thoughts, feelings, and assumptions. These journals aren’t just about taking notes—they’re about capturing the evolving interpretations and insights that naturally emerge during a study. The use of reflexive journals is vital during data analysis. Researchers use these journals to track how their personal background, values, and interactions with participants may shape coding decisions, theme development, and overall interpretations. This form of documentation promotes transparency and enhances the credibility of the findings by acknowledging the human element inherent in qualitative inquiry. Because reflexive journaling contributes to trustworthiness in this way, researchers should consider it an essential component of the analytical process in qualitative research.

Memoing

Memoing is akin to reflexive journaling as they both offer the researcher a space to capture insights, interpretations, and questions about the data they are collecting. The primary difference is their purpose.  Memoing is a space for the researcher to notice patterns and examine themes from data collected during interviews, observations, or other sources. As mentioned above, reflexive journaling involves observing your feelings, experiences, and beliefs, and how these things might influence the way you interpret the data. Memoing, on the other hand, focuses on understanding the information and is tightly linked to the analytical work of coding, categorizing, and theme development. Memoing helps researchers think more deeply and carefully about what they’re learning. It lets them put the pieces together in a clear way that makes sense, using both the information they’ve collected and their growing ideas about what it all means.

Audit Trail

An audit trail in qualitative research is a detailed, transparent record of the decisions, steps, and processes taken throughout a study, particularly during data analysis. It serves as a form of documentation that allows others, such as peer reviewers, advisors, or future researchers, to trace how findings were derived from the data. The audit trail includes records of coding schemes, theme development, analytic memos, reflexive journal entries, changes to the research question, and the rationale behind interpretive choices. By maintaining a comprehensive audit trail, researchers demonstrate that their analysis was systematic, thoughtful, and grounded in evidence. This transparency supports the trustworthiness and dependability of the research, providing evidence that findings are not arbitrary or solely driven by researcher bias. In short, an audit trail strengthens the credibility of qualitative analysis by making the research process visible and verifiable to others.
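To make the idea concrete, here is a minimal sketch of how an audit trail might be kept in a machine-readable form. The function name, record fields, and example decision are all illustrative assumptions, not a prescribed format; in practice, CAQDAS packages and even plain documents can serve the same purpose.

```python
# A lightweight sketch of an audit trail: appending timestamped,
# structured records of analytic decisions to a log that reviewers
# can later trace. Field names here are illustrative assumptions.
import json
from datetime import datetime, timezone

audit_log = []

def record_decision(step, decision, rationale):
    """Append one traceable analytic decision to the audit trail."""
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "step": step,
        "decision": decision,
        "rationale": rationale,
    })

# Example entry: documenting a coding change and the reasoning behind it
record_decision(
    step="coding",
    decision="Merged codes 'isolation' and 'loneliness'",
    rationale="Participants used the terms interchangeably.",
)

# Serialize so others can verify how findings were derived
print(json.dumps(audit_log, indent=2))
```

The value of a structure like this is that each interpretive choice carries its own rationale and timestamp, which is exactly what makes the analysis visible and verifiable to others.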

Distinguishing between these tools can be confusing. The following summary, drawn from a series of diagrams, compares the primary purpose, focus, timing of use, content type, and analytic contribution of each tool.

Primary purpose
Reflexive journaling: promotes self-awareness and reflective thinking in research.
Memoing: enhances analytical thinking through detailed note-taking.
Audit trail: ensures research decisions are transparent and traceable.

Focus
Reflexive journaling: the researcher’s feelings, assumptions, and biases.
Memoing: patterns in the data and emerging coding ideas.
Audit trail: documentation of the entire research process.

When it is utilized
Reflexive journaling: best used during data analysis but can be applied throughout the study.
Memoing: ideal during both data collection and analysis.
Audit trail: essential throughout the entire research study.

Content type
Reflexive journaling: personal reflections and evolving interpretations.
Memoing: analytical thoughts and coding decisions.
Audit trail: coding changes and the rationale behind them.

Contribution to data analysis
Reflexive journaling: acknowledges researcher influence, enhancing credibility and transparency.
Memoing: strengthens analytic depth and clarity of interpretations.
Audit trail: supports trustworthiness and dependability through transparency.

*The original diagrams summarized here were created using napkin.ai; however, the concept, design direction, and creative vision were conceived by Dr. Knight

Reference for the following material: https://open.oregonstate.education/qualresearchmethods/chapter/chapter-18-data-analysis-and-coding/#chapter-155-section-1

Qualitative Data Analysis: Introduction

Piled before you lie hundreds of pages of fieldnotes you have taken, observations you’ve made while volunteering at city hall. You also have transcripts of interviews you have conducted with the mayor and city council members. What do you do with all this data? How can you use it to answer your original research question (in this case, let’s say it was “How do political polarization and party membership affect local politics?”)? Before you can make sense of your data, you will have to organize and simplify it in a way that allows you to access it more deeply and thoroughly. We call this process coding.[1] Coding is the iterative process of assigning meaning to the data you have collected in order to both simplify and identify patterns. This chapter introduces you to the process of qualitative data analysis and the basic concept of coding.

To those who have not yet conducted a qualitative study, the sheer amount of collected data will be a surprise. Qualitative data can be absolutely overwhelming—it may mean hundreds if not thousands of pages of interview transcripts, or fieldnotes, or retrieved documents. How do you make sense of it? Students often want very clear guidelines here, and although we can try to give some recommendations, in the end, analyzing qualitative data is a bit more of an art than a science: “The process of bringing order, structure, and interpretation to a mass of collected data is messy, ambiguous, time-consuming, creative, and fascinating. It does not proceed in a linear fashion: it is not neat. At times, the researcher may feel like an eccentric and tormented artist; not to worry, this is normal” (Marshall and Rossman 2016:214).

To complicate matters further, each approach (e.g., Grounded Theory, deep ethnography, phenomenology) has its own language and bag of tricks (techniques) when it comes to analysis. Grounded Theory, for example, uses in vivo coding to generate new theoretical insights that emerge from a rigorous but open approach to data analysis. Ethnographers, in contrast, are more focused on creating a rich description of the practices, behaviors, and beliefs that operate in a particular field. They are less interested in generating theory and more interested in getting the picture right, valuing verisimilitude in the presentation. And then there are some researchers who seek to account for the qualitative data using almost quantitative methods of analysis, perhaps counting and comparing the uses of certain narrative frames in media accounts of a phenomenon. Qualitative content analysis (QCA) often includes elements of counting (we’ll talk about that in a moment). For these researchers, having very clear hypotheses and clearly defined “variables” before beginning analysis is standard practice, whereas the same would be expressly forbidden by those researchers, like grounded theorists, taking a more emergent approach.

All that said, there are some helpful techniques to get you started, and these will be presented in this and the following chapter. As you become more of an expert yourself, you may want to read more deeply about the tradition that speaks to your research. But know that there are many excellent qualitative researchers who take what they can from each tradition, using whatever works for a given study. Most of us find this permissible (but watch out for the methodological purists who exist among us).

Where do I start with quantitative data analysis?

No matter how large or small your data set is, quantitative data can be intimidating. There are a few ways to make things manageable for yourself, including creating a data analysis plan and organizing your data in a useful way. We’ll discuss some of the keys to these tactics below.

The data analysis plan

As part of planning for your research, and to help keep you on track and make things more manageable, you should come up with a data analysis plan. You’ve basically been working on this in writing your research proposal so far. A data analysis plan is an ordered outline that includes your research question, a description of the data you are going to use to answer it, and the exact step-by-step analyses that you plan to run to answer your research question. This last part – which includes choosing your quantitative analyses – is the focus of this and the next two chapters of this book.

A basic data analysis plan might look something like what you see in Table 14.1. Don’t panic if you don’t yet understand some of the statistical terms in the plan; we’re going to delve into them throughout the next few chapters. Note here also that this is what operationalizing your variables and moving through your research with them looks like on a basic level.

Table 14.1 A basic data analysis plan
Research question: What is the relationship between a person’s race and their likelihood to graduate from high school?
Data: Individual-level U.S. American Community Survey data for 2017 from IPUMS, which includes race/ethnicity and other demographic data (i.e., educational attainment, family income, employment status, citizenship, presence of both parents, etc.). Only including individuals for which race and educational attainment data is available.
Steps in Data Analysis Plan

  1. Univariate and descriptive statistics, including mean, median, mode, range, distribution of interval/ratio variables, and missing values
  2. Bivariate statistical tests between the variables I am interested in to see if there are any relationships that warrant further exploration. For instance, a chi-square test between race and high school graduation (both nominal variables), ANOVA on income and race, and correlations between interval/ratio variables.
  3. Multivariate statistical analysis, like logistic regression, with high school graduation (yes/no) as my dependent variable and multiple independent variables I think are relevant based on step 2
  4. Interpretation of logistic regression results in consultation with professor and reporting of results.

 

An important point to remember is that you should never get stuck on using a particular statistical method because you or one of your co-researchers thinks it’s cool or it’s the hot thing in your field right now. You should certainly go into your data analysis plan with ideas, but in the end, you need to let your research question and the actual content of your data guide what statistical tests you use. Be prepared to be flexible if your plan doesn’t pan out because the data is behaving in unexpected ways.

Managing your data

Whether you’ve collected your own data or are using someone else’s data, you need to make sure it is well-organized in a database in a way that’s actually usable. “Database” can be kind of a scary word, but really, I just mean an Excel spreadsheet or a data file in whatever program you’re using to analyze your data (like SPSS, SAS, or R). (I would avoid Excel if you’ve got a very large data set – one with millions of records or hundreds of variables – because it gets very slow and can only handle a certain number of cases and variables, depending on your version. But if your data set is smaller and you plan to keep your analyses simple, you can definitely get away with Excel.) Your database or data set should be organized with variables as your columns and observations/cases as your rows. For example, let’s say we did a survey on ice cream preferences and collected the following information in Table 14.2:

Table 14.2 Results of our ice cream survey
Name Age Gender Hometown Fav_Ice_Cream
Tom 54 0 1 Rocky Road
Jorge 18 2 0 French Vanilla
Melissa 22 1 0 Espresso
Amy 27 1 0 Black Cherry
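Here is what that variables-as-columns, cases-as-rows layout looks like as a small pandas DataFrame (pandas is one common choice; the same structure applies in Excel, SPSS, SAS, or R).

```python
# The survey results in Table 14.2 stored in tidy form:
# one column per variable, one row per observation/case.
import pandas as pd

ice_cream = pd.DataFrame({
    "Name": ["Tom", "Jorge", "Melissa", "Amy"],
    "Age": [54, 18, 22, 27],
    "Gender": [0, 2, 1, 1],
    "Hometown": [1, 0, 0, 0],
    "Fav_Ice_Cream": ["Rocky Road", "French Vanilla",
                      "Espresso", "Black Cherry"],
})

print(ice_cream.shape)  # 4 observations (rows), 5 variables (columns)
```

Note that when we talk about sample size, we are talking about the number of rows here: four.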

 

There are a few key data management terms to understand:

  • Variable name: Just what it sounds like – the name of your variable. Make sure this is something useful, short and, if you’re using something other than Excel, all one word. Most statistical programs will automatically rename variables for you if they aren’t one word, but the names are usually a little ridiculous and long.
  • Observations/cases: The rows in your data set. In social work, these are often your study participants (people), but can be anything from census tracts to black bears to trains. When we talk about sample size, we’re talking about the number of observations/cases. In our mini data set, each person is an observation/case.
  • Primary data: Data you have collected yourself.
  • Secondary data: Data someone else has collected that you have permission to use in your research. For example, for my student research project in my MSW program, I used data from a local probation program to determine if a shoplifting prevention group was reducing the rate at which people were re-offending. I had data on who participated in the program and then received participants’ criminal histories six months after the end of their probation period. This was secondary data I used to determine whether the shoplifting prevention group had any effect on an individual’s likelihood of re-offending.
  • Data dictionary (sometimes called a code book): This is the document where you list your variable names, what the variables actually measure or represent, what each of the values of the variable means if the meaning isn’t obvious (e.g., if numbers are assigned to gender), the level of measurement, and anything special to know about the variables (for instance, the source if you mashed two data sets together). If you’re using secondary data, the data dictionary should be available to you.

When considering what data you might want to collect as part of your project, there are two important considerations that can create dilemmas for researchers. You might only get one chance to interact with your participants, so you must think comprehensively in your planning phase about what information you need and collect as much relevant data as possible. At the same time, though, especially when collecting sensitive information, you need to consider how onerous the data collection is for participants and whether you really need them to share that information. Just because something is interesting to us doesn’t mean it’s related enough to our research question to chase it down. Work with your research team and/or faculty early in your project to talk through these issues before you get to this point. And if you’re using secondary data, make sure you have access to all the information you need in that data before you use it.

Let’s take that mini data set we’ve got up above and I’ll show you what your data dictionary might look like in Table 14.3.

Table 14.3 Sample data dictionary/code book
Variable name Description Values/Levels Level of measurement Notes
Name Participant’s first name n/a n/a First names only. If names appear more than once, a random number has been attached to the end of the name to distinguish them.
Age Participant’s age at time of survey n/a Interval/Ratio Self-reported
Gender Participant’s self-identified gender 0=cisgender female; 1=cisgender male; 2=non-binary; 3=transgender female; 4=transgender male; 5=another gender Nominal Self-reported
Hometown Participant’s hometown – this town or another town 0=This town; 1=Another town Nominal Self-reported
Fav_Ice_Cream Participant’s favorite ice cream n/a n/a Self-reported
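A code book like this can also be kept in machine-readable form, so coded values can be decoded during analysis. The sketch below stores the value labels from Table 14.3 as Python dictionaries; the variable and label names match the table, but the decoding approach itself is just one illustrative convention.

```python
# Value labels from the Table 14.3 code book, stored as dictionaries
# so numeric codes can be translated back into readable categories.
gender_labels = {
    0: "cisgender female", 1: "cisgender male", 2: "non-binary",
    3: "transgender female", 4: "transgender male", 5: "another gender",
}
hometown_labels = {0: "This town", 1: "Another town"}

# Decode one observation from the ice cream survey (Tom's row)
row = {"Name": "Tom", "Age": 54, "Gender": 0, "Hometown": 1}
decoded = {**row,
           "Gender": gender_labels[row["Gender"]],
           "Hometown": hometown_labels[row["Hometown"]]}

print(decoded["Gender"])    # cisgender female
print(decoded["Hometown"])  # Another town
```

Keeping labels in one place like this prevents the common error of forgetting what a coded value (say, Gender = 2) actually means months later.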

 

Qualitative Data Analysis as a Long Process!

Although most of this chapter will focus on coding, it is important to understand that coding is just one (very important) aspect of the long data-analysis process. We can consider seven phases of data analysis, each of which is important for moving your voluminous data into “findings” that can be reported to others. The first phase involves data organization. This might mean creating a special password-protected cloud folder for storing your digital files. It might mean acquiring computer-assisted qualitative data-analysis software (CAQDAS) and uploading all transcripts, fieldnotes, and digital files to its storage repository for eventual coding and analysis. Finding a helpful way to store your material can take a lot of time, and you need to be smart about this from the very beginning. Losing data because of poor filing systems or mislabeling is something you want to avoid. You will also want to ensure that you have procedures in place to protect the confidentiality of your interviewees and informants. Filing signed consent forms (with names) separately from transcripts and linking them through an ID number or other code that only you have access to (and store safely) are important.

Once you have all of your material safely and conveniently stored, you will need to immerse yourself in the data. The second phase consists of reading and rereading or viewing and reviewing all of your data. As you do this, you can begin to identify themes or patterns in the data, perhaps writing short memos to yourself about what you are seeing. You are not committing to anything in this third phase but rather keeping your eyes and mind open to what you see. In an actual study, you may very well still be “in the field” or collecting interviews as you do this, and what you see might push you toward either concluding your data collection or expanding so that you can follow a particular group or factor that is emerging as important. For example, you may have interviewed twelve international college students about how they are adjusting to life in the US but realized as you read your transcripts that important gender differences may exist and you have only interviewed two women (and ten men). So you go back out and make sure you have enough female respondents to check your impression that gender matters here. The seven phases do not proceed entirely linearly! It is best to think of them as recursive; conceptually, there is a path to follow, but it meanders and flows.

Coding is the activity of the fourth phase. We’ll talk more about that soon. For now, know that coding is the primary tool for analyzing qualitative data and that its purpose is to both simplify and highlight the important elements buried in mounds of data. Coding is a rigorous and systematic process of identifying meaning, patterns, and relationships. It is a more formal extension of what you, as a conscious human being, are trained to do every day when confronting new material and experiences. The “trick” or skill is to learn how to take what you do naturally and semiconsciously in your mind and put it down on paper so it can be documented and verified and tested and refined.
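To give a flavor of what coding produces, here is a deliberately toy sketch: transcript excerpts tagged with codes from a small codebook. The excerpts, codes, and keyword rules are all invented for illustration. Real qualitative coding is an interpretive, iterative human activity, not keyword matching, though CAQDAS tools do let you search and organize coded segments in roughly this structured way.

```python
# Toy illustration only: attaching codes to transcript excerpts.
# Excerpts and codebook are invented; keyword matching stands in
# for the researcher's interpretive judgment.
excerpts = [
    "I felt homesick during my first semester abroad.",
    "My advisor helped me find a part-time job on campus.",
    "Making friends was hard because of the language barrier.",
]

codebook = {  # code -> keywords that suggest it (illustrative)
    "emotional adjustment": ["homesick", "lonely", "stress"],
    "institutional support": ["advisor", "office", "orientation"],
    "language barrier": ["language", "accent", "translate"],
}

coded = {}
for excerpt in excerpts:
    coded[excerpt] = [code for code, kws in codebook.items()
                      if any(kw in excerpt.lower() for kw in kws)]

for excerpt, codes in coded.items():
    print(codes, "<-", excerpt)
```

The end state is what matters: once every segment carries codes, the material becomes searchable and comparable, which is what the next phases of interpretation build on.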

At the conclusion of the coding phase, your material will be searchable, intelligible, and ready for deeper analysis. You can begin to offer interpretations based on all the work you have done so far. This fifth phase might require you to write analytic memos, beginning with short (perhaps a paragraph or two) interpretations of various aspects of the data. You might then attempt stitching together both reflective and analytical memos into longer (up to five pages) general interpretations or theories about the relationships, activities, and patterns you have noted as salient.

As you do this, you may be rereading the data, or parts of the data, and reviewing your codes. It’s possible you get to this phase and decide you need to go back to the beginning. Maybe your entire research question or focus has shifted based on what you are now thinking is important. Again, the process is recursive, not linear. The sixth phase requires you to check the interpretations you have generated. Are you really seeing this relationship, or are you ignoring something important you forgot to code? As we don’t have statistical tests to check the validity of our findings as quantitative researchers do, we need to incorporate self-checks on our interpretations. Ask yourself what evidence would exist to counter your interpretation and then actively look for that evidence. Later on, if someone asks you how you know you are correct in believing your interpretation, you will be able to explain what you did to verify this. Guard yourself against accusations of “cherry-picking,” selecting only the data that supports your preexisting notion or expectation about what you will find.[2]

The seventh and final phase involves writing up the results of the study. Qualitative results can be written in a variety of ways for various audiences, just the same as quantitative data. Due to the particularities of qualitative research, though, findings do not exist independently of their being written down. This is different for quantitative research or experimental research, where completed analyses can somewhat speak for themselves. A box of collected qualitative data remains a box of collected qualitative data without its written interpretation. Qualitative research is often evaluated on the strength of its presentation. Some traditions of qualitative inquiry, such as deep ethnography, depend on written thick descriptions, without which the research is wholly incomplete, even nonexistent. All of that practice journaling and writing memos (reflective and analytical) help develop writing skills integral to the presentation of the findings.

Remember that these are seven conceptual phases that operate in roughly this order but with a lot of meandering and recursivity throughout the process. This is very different from quantitative data analysis, which is conducted fairly linearly and processually (first you state a falsifiable research question with hypotheses, then you collect your data or acquire your data set, then you analyze the data, etc.). Things are a bit messier when conducting qualitative research. Embrace the chaos and confusion, and sort your way through the maze. Budget a lot of time for this process. Your research question might change in the middle of data collection. Don’t worry about that. The key to being nimble and flexible in qualitative research is to start thinking and continue thinking about your data, even as it is being collected. All seven phases can be started before all the data has been gathered. Data collection does not always precede data analysis. In some ways, “qualitative data collection is qualitative data analysis.… By integrating data collection and data analysis, instead of breaking them up into two distinct steps, we both enrich our insights and stave off anxiety. We all know the anxiety that builds when we put something off—the longer we put it off, the more anxious we get. If we treat data collection as this mass of work we must do before we can get started on the even bigger mass of work that is analysis, we set ourselves up for massive anxiety” (Rubin 2021:182–183; emphasis added).

License


Understanding Research Design in the Social Science Copyright © by Utah Valley University is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.
