Uses, Benefits, and Limitations of AI Chatbots: Implementing ChatGPT in the First-Year Writing Classroom
Walker P. Smith; Cora Alward; Jacob Morris; Lydia Peach; Veronica Pulley; Parker Routt; Kolby Sanders; and Josh Vogeler
Abstract
This essay shares the findings of a Spring 2023 honors composition course in which students conducted their own primary research project to investigate the uses and limitations of ChatGPT for first-year writers. By experimenting with integrating ChatGPT into various stages of their writing and revision processes, the students identified the strengths and weaknesses of the writing tool. Then, they observed and interviewed faculty as they graded different versions of essay drafts with varying levels of ChatGPT assistance. Ultimately, they found that ChatGPT was incredibly useful for certain small-scale tasks, but posed significant risks, especially for students with less experience in academic writing. They concluded that ChatGPT is a potentially helpful tool for developing academic writers, but must be accompanied by in-class opportunities to practice assessing and revising the writing produced by ChatGPT.
Keywords: artificial intelligence, ChatGPT, first-year writing, composition pedagogy
At the start of the Spring 2023 semester, OpenAI had released its chatbot to the public just two months prior, striking both fear and curiosity into the students and faculty of the composition program at our large urban research university. Paranoid questions abounded in our classrooms: Could the new tech really write entire essays for students in a matter of seconds? Could it really produce reliable academic writing and accurate citations? Some faculty outside of the English department even began to abandon writing assignments altogether, or required them to be handwritten, out of concern that digital writing no longer facilitated learning.
However, while many educators likely imagine ChatGPT as a plagiarism machine, we adhere to the frameworks in composition pedagogy that challenge how plagiarism has been traditionally viewed (Howard, 1995; Price, 2002). In the wake of the panic, we (the instructor and students in English 105: Honors Composition) discussed how we might leverage the opportunity to learn more about the ever-changing relations between academic writing and digital media. The instructor proposed that if we had a question about writing, we should seek to answer it through primary research such as surveys, observations, or interviews. We tossed the old curriculum and worked together to gather readings and research that would establish a framework for how we might approach discussing AI and college writing. Next, we brainstormed ideas for a collaborative study of ChatGPT, drafted an IRB protocol that outlined our goals and plans, and then got to work experimenting with the new tech.
This co-authored article, collaboratively written by the instructor and seven undergraduate students enrolled in Honors Composition (English 105), presents the results of our multi-phase study in which we gathered essay drafts submitted by students enrolled in the instructor’s concurrent sections of Introduction to College Writing (English 101) and then revised them in our 105 class by using ChatGPT. In the second phase, we used what we learned about revising with ChatGPT to respond to the same essay prompts used in English 101, prompting the chatbot to write and then revise its own work based on our feedback, but without adding any words of our own. Finally, we put all the drafts (original student work, student work revised by ChatGPT, and GPT-authored work) to the test by inviting other English 101 instructors from the Composition Program to grade the “best” of all three versions. While they graded, we observed their process and asked interview questions about their reactions to the drafts.
Our primary interest in these interviews was to push back against the initial wave of panic that clouded discourse surrounding ChatGPT and instead to assess its potential, if any, as a tool for writing and learning. In short, we set out to discover the following:
- How can ChatGPT aid in the development of academic writers?
- What specific tasks might ChatGPT facilitate or impede in the first-year writing classroom?
Although AI chatbots had long been used in ESL-specific assignments and activities, at the time we conducted our research virtually no studies had examined their applications in a first-year writing program at a large urban research university like ours. Of course, by the time of publication, we are sure that more research will have begun to address this gap, and we look forward to learning from these studies and expanding our own in the future as new iterations of the tool are inevitably released. Still, we believe the early findings we present below reveal how ChatGPT might impact diverse classroom settings like ours, in which students arrive at college with remarkably varied skill sets and levels of preparation for academic writing.
In short, we determined that the combined efforts of an individual student writer and ChatGPT scored the highest on a given essay prompt. Additionally, we believe the combined student-AI writing achieved this result because it harnessed the emotional and lived experiences of the human user alongside the genre knowledge of typified "moves" in academic writing provided by the AI chatbot. Thus, we assert that chatbots can be a helpful tool for first-year writing when properly integrated into certain stages of the writing process through careful instruction and practice.
AI’s History in Teaching Writing
In our first phase of secondary research, we were surprised to find that artificial intelligence, and specifically chatbots, was not new to composition pedagogy. In fact, AI chatbots had been used in teaching English as a second language for at least two decades prior to the release of ChatGPT. For example, Chang et al. (2008) identified their writing assistant as "suitable for learners to use as an online collocation helper while writing their composition" (n.p.). More recently, and prior to ChatGPT's release, ESL scholars found AI chatbots to be promising but costly to implement and improve. Chiu & Chow (2020) utilized an intelligent tutoring system (ITS) that showed improvements in providing personalized and effective feedback to writers, and Yang & Liu (2019) similarly posited AI as a field to be further explored in ESL, specifically to support speech recognition and translation. Nguyen (2019), on the other hand, explored AI as a possible response to regions experiencing a teacher shortage.
Ultimately, these earlier studies reflect many of the same concerns we have today about ChatGPT. For example, Dikli & Bleyle (2014) found AI-based assistance tools to be rarely 100% accurate, yet still able to offer "helpful" feedback for multilingual learners despite the "missed and misidentified errors" (p. 11). When ChatGPT was given the opportunity to take the AP English Language and Literature tests, Michelle Diaz (2023) found that the earliest version of the AI scored less than 20% on each and saw no further scoring improvements with the more recent GPT-4 version. These results align with our own in that the best use of AI in the classroom involves integration into the user's own writing process and specific training to revise what the AI produces. This realization steered our discussions in English 105 toward viewing ChatGPT as a producer of templates, which still require integration and revision after their insertion into the written text.
Writing with templates also has a long and heavily debated history in composition pedagogy. One of the major concerns is that templates tend to "impede the ability of students to truly understand and integrate why these writing maneuvers really matter" (Benay, 2008, p. 3). Lancaster (2016) argues that style in a student's writing makes all the difference to the reader and how they perceive the quality of the writing. For example, They Say, I Say templates can, if not used properly, serve as formulas that don't strengthen the writing but merely make it sufficient (Graff & Birkenstein, 2006). Similarly, writing solely produced by ChatGPT doesn't always fully connect with its audience. However, we argue that if the student opts to use ChatGPT anyway, then instructors can teach them to integrate the tool into their writing process while still identifying their own stylistic additions.
Methods
Because ChatGPT was so new in early 2023, we decided that a two-phase approach would best answer our research questions. In order to study the tool's application to first-year writing, we needed to learn how to use it ourselves and evaluate its effectiveness on our own before studying it with others. The first phase focused primarily on the experience of writing with ChatGPT, addressing our first question: how ChatGPT can aid in the development of academic writers. The second phase examined how writing with ChatGPT is perceived and graded by teachers, addressing our second question: which specific tasks ChatGPT might facilitate or impede in the first-year writing classroom.
We began the first phase by inviting students to volunteer essays they had written for English 101. We then attempted to revise the student essays using ChatGPT on our own. Then, using the same prompt that the students had been given in their classes, we attempted to use ChatGPT to write an essay entirely on its own and to revise its own writing. Once we had multiple samples of each, we reviewed the essays and chose one from each category (one student-written without ChatGPT's influence, one student-written and revised with ChatGPT, and one written solely by ChatGPT) that we thought would best meet the demands of the rubric criteria. To select the essays, we prioritized whether each fulfilled all the requirements of the prompt, followed general grammar and organization conventions, presented a clear argument that was easy to follow, and included a "human" or personal perspective.
From that point, we invited faculty from the Composition Program to read and grade the three essays according to the given rubric from English 101, and three instructors volunteered. The rubric focused primarily on three categories: rhetorical knowledge, critical reading and thinking, and conventions. The rubric also offered points for an original title. By organizing the grading process in this way, we wanted to show how each style of writing (student-written, student-written and revised with ChatGPT, and written by ChatGPT) met (or didn't meet) the standards expected of first-year composition students and how each style is perceived by teachers.
We adapted a “think-aloud protocol”: as each instructor used the computer at the front of the room to grade, we observed their grading process and occasionally interrupted with questions about their decisions—so long as our questions didn’t give away the identity of each essay’s author. After they finished grading, we asked them the following interview questions:
1. How many classes have you taught at this university? What other writing instruction experiences do you have, such as tutoring?
2. For all three essays:
a. What is this essay’s key strength and key weakness?
b. What final grade would you give this essay? How did you make that determination?
c. What would your highest priority feedback be for this student?
3. Please rank the essays in order of most effectively responding to the prompt.
a. Why did you rank them in this order?
b. Would any of these students stand out as at-risk, meaning that they need intervention or assistance? Why or why not?
4. Do you have experience with ChatGPT? Do you have any general questions about what the tool is? Have you detected any uses in your own classes yet?
5. Rate your feelings about ChatGPT on a scale from 1 (fearful) to 10 (excited), where 5 means hopeful but cautious. Explain your decision.
6. Please rank the essays in order of most likely to have used ChatGPT’s assistance.
a. Why did you rank them in this order?
b. (At this point, we revealed the answers and compared the results.)
7. For Essay A, ChatGPT's feedback was to address inconsistent use of terms and awkward phrasing. Do you agree with this recommendation?
8. For Essay B, ChatGPT's feedback was to address overreliance on summary instead of critique and limited engagement with the CARS Model. Do you agree with this recommendation?
9. For Essay C, ChatGPT's feedback was to address unnecessary personal anecdotes and overgeneralization about non-standard dialect speakers. Do you agree with this recommendation?
At this point in each interview, we revealed the identity of the author behind each essay (not the original student in English 101, but the student-researcher in English 105 who had used ChatGPT to write and/or revise the English 101 drafts). Often, the interviewees had questions for the 105 student-researchers about how they had managed to "trick" them, or how they had coaxed ChatGPT into making strong revisions. Afterward, we returned to the final two interview questions:
10. Rate your feelings about ChatGPT now.
a. Has your rating changed? Why or why not?
b. Do you feel it would be harmful/helpful?
11. Is there anything else you’d like for us to know?
In general, the three faculty members graded the student-written essay revised with ChatGPT as the strongest text, which suggested that ChatGPT could indeed be an effective tool in the classroom. Ultimately, the goal of this phase of our study was to see how (or if) faculty would want to incorporate ChatGPT and/or AI into their classrooms, and how they would do so. But despite the apparent success of a student collaboration with ChatGPT, our interviewees were still hesitant to endorse ChatGPT or integrate it into their classes. They felt they needed more training to achieve proper and ethical implementation that would best facilitate student learning.
Discussion
Based on our findings, we were able to extract a few insights for writing instructors who seek to integrate an AI chatbot into their classrooms.
#1: The hybrid essay received the highest score because it harnessed the advantages of ChatGPT’s capabilities without repeating its deficiencies.
Faculty participants reported that the hybrid essay had a strong summary of the source text and excelled in its ability to connect the source text to personal anecdotes. ChatGPT is not able to offer personal anecdotes or expand on a topic with its own originality, and it generally produces less word variety. One participant described how she "appreciated that the author wasn't using the same words over and over again, which I think was seen in maybe some of the first two." The hybrid essay was able to overcome the problem of repetition because the author coached ChatGPT through consistent revisions. Another participant reported, "Someone actually thought through these ideas, or there's a specific style invoked here." The hybrid maintained the student's style and combined it with ChatGPT's recognizable academic structure and error-free text only because the author repeatedly identified weaknesses in ChatGPT's writing and instructed it where to revise its own work. In other words, students will still need to be able to assess their own and others' writing in order to use ChatGPT effectively. Additionally, if a student has been given specific feedback by a peer or instructor, ChatGPT is often helpful in identifying where they might make revisions to a draft based on that recommendation.
#2: AI chatbots can be a helpful tool for academic writing, but only when used correctly.
We recommend first identifying specific aspects of a prompt that the author is struggling with. For example, if the author is unsure where to begin with topic generation, ChatGPT will return many unique and detailed ideas. The user rarely finds the perfect topic in this list, but can begin to imagine what types of topics are available and may be reminded of an idea from their own lived experience that aligns with the prompt. Alternatively, the AI is remarkably skilled at locating errors in a given text and suggesting line-level edits to improve a final draft. ChatGPT could not, at the time of this study, produce a strong essay draft in its entirety, but it can assist with small-scale tasks throughout the writing process.
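Our study used ChatGPT's web interface, but the same task-specific prompting can be scripted. Below is a minimal sketch, assuming OpenAI's Python client (openai>=1.0) and an API key in the environment; the model name, prompt wording, and helper function names are our own illustrative assumptions, not part of the study.

```python
# Illustrative sketch of task-specific prompting: ask for topics or
# line-level edits, not a finished essay. Model name and prompts are
# hypothetical choices, not the study's actual procedure.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def brainstorm_topics(assignment_prompt: str, n: int = 5) -> str:
    """Ask the chatbot for candidate topics, not a draft."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": (
                f"List {n} possible essay topics that respond to this "
                f"assignment prompt. Do not write the essay itself.\n\n"
                f"Prompt: {assignment_prompt}"
            ),
        }],
    )
    return response.choices[0].message.content


def suggest_line_edits(paragraph: str) -> str:
    """Request line-level edits only, leaving the argument alone."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": (
                "Identify grammar and phrasing errors in the paragraph "
                "below and suggest line-level corrections. Do not "
                "restructure the paragraph or change its argument.\n\n"
                + paragraph
            ),
        }],
    )
    return response.choices[0].message.content
```

The constraint in each prompt ("do not write the essay itself," "do not restructure") reflects our finding that the tool is most useful when confined to a single small-scale task the writer has already identified.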
#3: Specifically, ChatGPT has its own consistent writing style that is recognizable as mostly error-free academic writing—and not much else.
When writing longer texts, ChatGPT has a distinct syntactic style. The AI is trained to get a point across in the most logical and "academic" way possible, and to cite as many sources as it can find and include, which often results in errors like made-up sources. Instead of connecting its supporting points together, it breaks them up into individual smaller paragraphs with little to no transition between them, relying on textbook-style transitional phrases, similar to a They Say, I Say template, such as "furthermore," "in conclusion," and so on (Graff & Birkenstein, 2006). Experienced readers and reviewers often notice and flag these stylistic deficiencies quite quickly as excess "repetition." However, the style is sometimes less obvious to students who have had less experience with writing in academic contexts, even when they report "repetition" as a concern in their written texts. Because they are less likely to recognize and remove these repeated phrases in favor of their own style, developmental writers are further disadvantaged by these tools and may need more practice in revising what ChatGPT produces for them.
Similar to its giveaway style, ChatGPT also struggles to generate deep analysis. Essay B, which was fully written by ChatGPT, never received positive feedback in the analytical rubric categories, its most glaring weakness. In fact, every participant commented on its shortcomings in relation to analysis. While ChatGPT might excel at generating topics, locating errors, completing directed instructions, and other specific tasks, large-scale tasks like argumentation, both determining a thesis and sustaining an argument across an entire essay, are outside of ChatGPT's skill set. This means that you cannot input an entire essay for the chatbot to simply revise; doing so would largely change the structure and ideas of your work, consequently losing your style and argument. It is better to identify the weaknesses in your writing yourself, or with others, and then ask the chatbot to revise the specific errors you want suggestions for improving. For example, while we were creating Essay C, the hybrid essay, a prompt to make the introduction paragraph sound more academic led the chatbot to remove the original author's personal anecdote, which had connected very well with the article.
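To make that recommendation concrete, here is a sketch of a constrained revision request under the same hypothetical client setup as above. The prompt names the one weakness to fix and states what must be preserved, so the chatbot is less likely to strip out personal material the way it removed Essay C's anecdote.

```python
# Illustrative sketch of a constrained revision request. Naming the
# single weakness to fix, and what to preserve, discourages the chatbot
# from rewriting the whole draft and erasing the author's style.
# Model name and prompt wording are assumptions, not the study's method.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def revise_named_weakness(draft: str, weakness: str) -> str:
    """Ask for a revision limited to one identified weakness."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": (
                f"Revise this draft to address only the following "
                f"weakness: {weakness}. Preserve the author's personal "
                f"anecdotes, voice, thesis, and overall structure.\n\n"
                f"{draft}"
            ),
        }],
    )
    return response.choices[0].message.content
```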
Conclusion
Based on our findings, the most effective and ethical way to use ChatGPT is to integrate it very gradually, tying its use to specific tasks and small-scale steps across the stages of the writing process. When viewed as a "cold," academic template of writing, ChatGPT-produced text can empower students to familiarize themselves with some of the standardized structures and common phrases seen in academic writing genres. However, it should not and cannot replace large-scale characteristics like personal writing style or argument development. So far, we have noticed that it is especially helpful in the beginning stages of the writing process (topic generation, template generation, etc.) and in the ending stages (revision of the final draft, editing). While ChatGPT can be a decent assistant to a student writer, the student will still need adequate practice to evaluate the texts it produces.
As writing instructors prepare activities and assignments that integrate ChatGPT and other AI-based writing tools, they should first run their own essay prompts through the tool and familiarize themselves with its strengths and weaknesses, or invite students to join them in doing the same, so that students can practice the assessment skills they will need to use the tool effectively. We also aim to inspire other first-year writing classes to experiment with assignments like this one and to expand on our research by conducting their own studies. As AI options proliferate and programmers expand their capabilities, the findings we present here, gathered in the "early" stages of ChatGPT's introduction to higher education, are intended as a foundation for future research to build upon.
Meanwhile, both writing program administrators and instructors are currently considering how to expand and redraft university plagiarism policies in light of AI, and many are in dialogue with other departments and campus stakeholders about how to preserve our shared values of academic honesty while making room for new technology. In these conversations, we recommend that our readers maintain and share an awareness of how tools like ChatGPT can disadvantage students with less experience in reading and writing with mainstream academic dialects of English. The findings of our study reiterate how "developmental" writers, or writers who are in the earlier stages of their academic careers, and especially some multilingual students, are more likely to be flagged by detection software and more likely to be caught and disciplined for using AI. Our policies must make room for the possibility that our disciplinary practices perpetuate privilege, and it is our hope that teachers will continue to assist all students in becoming expert readers and reviewers, and to advocate on behalf of students who are working toward that goal.
Questions to Guide Reflection and Discussion
- How can ChatGPT assist in the development of academic writers according to the research conducted?
- Discuss specific tasks in the writing process where ChatGPT may be beneficial or detrimental based on the findings.
- Reflect on the ethical considerations and potential biases that might arise with using ChatGPT in academic settings.
- Explore the potential for ChatGPT to assist in developing academic writing skills. What are appropriate and inappropriate uses of this technology in the classroom?
- Consider the future of AI in academic writing. What roles should AI play, and how can educators ensure its responsible use?
References
Benay, P. (2008). They say, "templates are the way to teach writing"; I say, "use with extreme caution." Pedagogy, 8(2), 369–373.
Chang, Y. C., Chang, J. S., Chen, H. J., & Liou, H. C. (2008). An automatic collocation writing assistant for Taiwanese EFL learners: A case of corpus-based NLP technology. Computer Assisted Language Learning, 21(3), 283–299.
Chiu, P. H. P., & Chow, K. W. K. (2020). How an intelligent tutoring system can help people learn to write better in English. Computer Assisted Language Learning, 33(5-6), 452–473.
Dikli, S., & Bleyle, S. (2014). Automated essay scoring feedback for second language writers: How does it compare to instructor feedback? Assessing Writing, 22, 1–17.
Graff, G., & Birkenstein, C. (2006). They say / I say: The moves that matter in academic writing. W. W. Norton & Company.
Howard, R. M. (1995). Plagiarisms, authorships, and the academic death penalty. College English, 57(7), 788–806.
Lancaster, Z. (2016). Do academics really write this way? A corpus investigation of moves and templates in They Say/I Say. College Composition and Communication, 67(3), 437–464.
Nguyen, T. T. (2019). Using artificial intelligence to help people learn English in Vietnam. International Journal of Emerging Technologies in Learning, 14(1), 100–111.
Price, M. (2002). Beyond 'gotcha!': Situating plagiarism in policy and pedagogy. College Composition and Communication, 54(1), 88–115.
Yang, H., & Liu, M. (2019). How artificial intelligence can help people learn English better. Journal of Educational Computing Research, 57(3), 597–625.
Author Note: All research was conducted with the approval of the IRB. The IRB number is 23.0163, the reference number is 760880, and the university contact can be provided upon request.