
1 Assessing the Accuracy of AI Outputs: A Comparative Study

Parker Eggli; Zara Moore; Mahri Dorius; and Karsten Hatch

This report was composed in April 2024 and uses APA documentation.


Executive Summary

Artificial intelligence (AI) is used in many fields, including business, academic, scientific, and artistic practice. When searching for in-depth answers about a specific topic, users can ask AI generators to summarize and define the topic. Knowing where the information comes from and what databases the AI platform pulls from is critical. When using AI for a research project, individuals should double-check the credibility of the information to mitigate the risk of bias or of irrelevant, out-of-date sources. For the research in this document, we asked ChatGPT and Microsoft Copilot various questions to investigate the credibility, accuracy, and relevance of AI output. Our study found that AI provides a comprehensive overview of the subject an individual is inquiring about. However, in some circumstances, AI failed to provide the specific facts that would build stronger rapport with an audience. Bias was detected when we asked questions touching on political controversy. Furthermore, AI software is ineffective for researching current events, as ChatGPT’s last update was in 2022.

Across a diverse range of outputs, from data analysis to creative innovation, AI-driven technologies are reshaping industries and enhancing productivity. By implementing the techniques and recommendations discussed in this research, individuals can be confident in their ability to use AI and will know what additional avenues to take when fact-checking AI platform output.

Secondary Research

Introduction

Artificial intelligence has swept the world since 2022. It has revolutionized technology, healthcare, education, and the academic world. However, there are concerns about AI’s credibility and output accuracy. Studies have underscored the critical need for stakeholders to verify the information given by AI. The following research examines and synthesizes insights from these studies to explore the complexities of verifying AI-generated content and to propose strategies for enhancing its credibility and accuracy.

Credibility and Accuracy

The credibility and accuracy of AI-generated content are paramount in assessing the reliability of AI technologies. Mishra’s research on the ethical implications of AI in libraries outlines the multifaceted challenges posed by AI-generated content, including concerns about bias, privacy, and job displacement (Mishra, 2023). Godlewski’s exploration of best practices for first-time AI users underscores the importance of setting clear goals, designing precise prompts, and rigorously reviewing and editing AI-generated outputs to ensure accuracy and relevance. As Godlewski explained, “As AI cannot pull from real-time online information, checking the results for credibility and potential plagiarism is vital” (Godlewski, 2023). These studies offer valuable insights into how complex AI-generated content is and how important it is to fact-check its outputs for accuracy and credibility. Through careful analysis and consideration of such research findings, stakeholders can better understand the nuances of AI technologies and implement strategies to enhance the reliability of AI-generated content.

Verification

The AIContentfy team’s article on quality control for AI-generated content emphasizes, “Ensuring the accuracy and reliability of AI-generated content necessitates robust verification methods.” The article underscores the importance of verifying facts, statistics, and claims against multiple trustworthy references to avoid spreading misinformation. According to the article, implementing fact-checking tools, using third-party verification services, and monitoring the AI’s performance are crucial to ensuring accuracy and reliability. Additionally, the article suggests cross-referencing information from multiple reliable sources and checking for biases and misleading content to uphold the integrity of the AI content (AIC, 2023). It also highlights the significance of maintaining ethical standards, avoiding plagiarism and copyright infringement, ensuring consent and privacy compliance, and monitoring and addressing bias and discrimination in AI content. Gurnee’s guide “Evaluating AI Output” adds that to properly analyze an output from AI, you should ask yourself about the date, authority, purpose, and documentation of that output (Gurnee, 2023). By incorporating human review, utilizing subject matter experts, and applying user feedback for improvement, stakeholders can enhance the trustworthiness and credibility of AI content.

Effective Use

Effective utilization of AI requires careful consideration of optimal approaches tailored to specific contexts. Research by Soliman et al. on AI’s impact on breast cancer pathology provides valuable insights into leveraging AI technologies for medical diagnosis and treatment planning. By harnessing AI’s capabilities, healthcare professionals can enhance diagnostic accuracy and treatment efficacy, ultimately improving patient outcomes (Soliman et al., 2024). However, Perrotta’s article on an AI-written email responding to a school shooting illustrates the ethical boundaries of when and for what purposes AI should be used. Perrotta notes, “It’s hard to take a message seriously when I know that the sender didn’t even take the time to put their genuine thoughts and feelings into the words” (Perrotta, 2023). To maximize the benefits of AI while minimizing potential drawbacks, stakeholders must prioritize ethical considerations, data privacy, and human oversight. By integrating AI technologies with human judgment and ethics, stakeholders can harness AI’s power to drive innovation and advancement.

Conclusion

The credibility and accuracy of AI-generated content are central to its utility and impact across various domains. As evidenced by Mishra’s and Godlewski’s research, the challenges posed by AI-generated content necessitate vigilant verification methods to mitigate the risks of misinformation and bias. Perrotta’s examination of AI’s application in sensitive situations highlights the ethical considerations inherent in AI utilization. To navigate these complexities and maximize the benefits of AI, stakeholders must prioritize ethical standards, data privacy, and human oversight. By integrating AI technologies judiciously and ethically into various contexts, stakeholders can harness AI’s transformative power to drive innovation and advancement while ensuring the trustworthiness and credibility of AI-generated content.

Primary Research

This research is intended to help users determine what strategies to implement to gauge the accuracy, reliability, and relevance of AI output. The methodology involves asking questions on different AI platforms and then fact-checking the output through cross-referencing, critical thinking, and expert consultation. The following subsections discuss our findings in more detail. We sought to investigate the trustworthiness of AI outputs and explore practical strategies for integrating AI in an academic setting. With the proliferation of AI technologies in various domains, including education, it becomes imperative to assess the reliability of AI-generated content and identify optimal methods for leveraging AI tools in educational contexts. To address these questions, we conducted experiments using two prominent AI platforms: ChatGPT and Microsoft Copilot. Through these experiments, we aimed to gain insights into AI’s capabilities, limitations, and potential applications in academic settings.

Cross-Referencing

Cross-referencing is a valuable practice to adopt when using an AI platform. AI systems generate responses based on patterns learned from vast datasets, but they may not always produce accurate and up-to-date information. Cross-referencing allows users to verify the information provided by AI against multiple sources to confirm its accuracy and reliability. Figure A shows a question asked by a user and the output given by the AI (see appendix). At first glance, the AI’s output describes logical and reasonable challenges associated with establishing settlements on Mars. How does this information align with what NASA has to say about Mars? The average temperature on Mars is -81°F, and its atmosphere is 96% carbon dioxide with no water. Travel to the red planet takes about seven months and covers about 300 million miles. The most recent mission to Mars, the Perseverance rover, cost NASA $2.4 billion to build and launch. In this specific example, it is clear that the output provided by the AI platform is relevant to the current challenges associated with Mars exploration and settlement. However, while this information is good, it is still vital to consult credible sources and professional organizations to gain additional insights and facts that support the generalized output. Doing so demonstrates professionalism in the subject area and builds stronger rapport with the audience.

AI Platforms: A Comparison

In our exploration of AI capabilities, we asked both ChatGPT and Microsoft Copilot about the basic rules of basketball and compared their answers (see Figures B and C in the appendix). While both AI models provided accurate responses, their approaches differed significantly. ChatGPT, a natural language AI, explained the rules comprehensively, covering aspects such as scoring, game duration, team composition, fouls, violations, out-of-bounds, and jump balls. On the other hand, Microsoft Copilot, specializing in coding and programming, explained the team composition, scoring, the shot clock, dribbling movement, and the inbounding of the ball. A nice feature that Copilot has and ChatGPT does not is the inclusion of links for further research. After we asked Copilot about the basic rules, it provided multiple YouTube links and encouraged us to research more. While both ChatGPT and Microsoft Copilot proved capable of delivering accurate information on the basic rules of basketball, their distinct approaches underscore the importance of selecting the most suitable AI tool for specific tasks, with Microsoft Copilot’s coding-focused expertise and supplementary research resources setting it apart in this particular instance.

Critical Thinking

Employing critical thinking skills when assessing the credibility and relevance of AI output is essential to ensure informed decision-making and mitigate potential risks. Given the complexity of AI algorithms and the inherent biases in data, critical evaluation helps individuals discern the accuracy and reliability of AI-generated information. Figure D depicts a prompt asking if individuals should receive the COVID-19 vaccine. There are two broad perspectives on the COVID-19 vaccine: for it or against it. This research does not aim to determine which side is right or wrong but to show that the AI output takes a “pro-vaccine” stance on whether individuals should receive the vaccine. Furthermore, critical thinking enables individuals to consider the context and limitations of AI output, allowing them to determine its applicability to specific situations or tasks.

ChatGPT vs. Microsoft Copilot

Through our experimentation with ChatGPT and Microsoft Copilot, we delved into the purpose and specialization of these AI models. ChatGPT, trained on a diverse range of internet text data, is designed to excel in natural language understanding and generation tasks, serving as a versatile tool for various applications, including writing, conversation, and information retrieval. In contrast, Microsoft Copilot, trained on public code repositories, is specifically tailored for software development, offering code suggestions, autocompletion, and other programming-related assistance. While both AI platforms demonstrate remarkable capabilities within their domains, it’s essential to recognize their distinct purposes and utilize them accordingly. See Figures E and F in the appendix regarding the research between these two platforms.

Expert Consultation

Like cross-referencing, seeking experts’ opinions is vital. Experts possess specialized knowledge, skills, and experience that can provide invaluable insights and guidance. Consulting with them allows individuals and organizations to tap into a wealth of knowledge and perspective that may not be readily available, which helps them make more informed choices, avoid potential pitfalls, and increase their chances of success. The output of AI will not always be relevant to a specific individual and their circumstances. Some people use Google to self-diagnose when they are sick or injured, which can lead to improper treatment. Out of curiosity, we used Microsoft Copilot instead of Google to attempt a self-diagnosis, and the experience was similar. There was no precise diagnosis, but there were plenty of possibilities for what our made-up condition could be. This experiment showed the need for the professional opinion and knowledge of an expert, such as a doctor, to identify the actual condition and viable treatment options.

Conclusion

Our primary research taught us more about how trustworthy AI information is and how we can use AI in academic settings. ChatGPT and Microsoft Copilot are useful for learning, but we must be careful about when and how we use them. It’s essential to understand what they’re good at and what they have been designed for to get the most accurate data. Once they give output, it is also vital to fact-check and cross-reference it before using it for important projects or any other purpose. When we use AI in classrooms, combining it with human knowledge and thinking is a good idea. This way, we can expand our learning and generate new ideas. As AI develops, we must keep trying new things and studying how it can help us learn even more.

Recommendations

This section aims to help readers and users of various AI platforms assess the quality and credibility of AI output. The following are our recommendations for using AI.

1. Considering Stakeholders

AI is accessible to anyone with a device and a Wi-Fi connection. The usage of AI platforms spans professional interdisciplinary fields, academic institutions, personal research, and much more. Because AI platforms pull from many databases, it is vital to write specific prompts so that the output is relevant to the target audience. For example, if an individual wants to know why communication is essential in neurosurgery, giving the AI platform this context is necessary for it to return information relevant to that occupation.

2. Enhancing Learning

From the primary research, we concluded that AI is a reliable source for obtaining a generalized overview of a subject. AI output isn’t perfect for reasons discussed previously in the report, one being that AI’s most recent update was in 2022. This limitation is evident when researching politics or asking for predictions about upcoming sporting events. It is important to cross-reference the outputs with secondary sources like professional journals and news outlets. AI is an excellent tool for assistance but not a replacement for traditional research and writing methods.

3. AI Professionalism

When researching AI and the different platforms, we found that each AI has a purpose, and some specifically emphasize the user interface. ChatGPT, trained on many online textual resources, is built to comprehend and produce natural language. It is a flexible asset across multiple domains, such as writing, dialogue, and information retrieval. Conversely, Microsoft Copilot, trained on extensive public code repositories, is tailored to software engineering tasks, offering suggestions, completing code, and providing other programming-centric aid.

4. Future Research

From an academic standpoint, AI should be used for summarization, proofreading, and other mundane tasks. Individuals should do independent research to check AI’s output and ensure accuracy and credibility. There are many events from this year that AI does not know about, so using AI to research or write about current events would not be reliable. For example, ChatGPT does not know that O.J. Simpson has passed away, because its information was last updated in January 2022.

References

Godlewski, N. (2023, December 28). First-time AI users guide: What to expect, best practices: Square. The bottom line by Square. https://squareup.com/us/en/the-bottom-line/operating-your-business/first-time-ai-users-guide

Gurnee. (2023). Research guides: Artificial Intelligence (AI) Resource Guide: Evaluating AI Output. Research Guides at School of the Art Institute of Chicago. https://libraryguides.saic.edu/ai/output

Mishra, S. (2023). Ethical Implications of Artificial Intelligence and Machine Learning in Libraries and Information Centres: A Frameworks, Challenges, and Best Practices. Library Philosophy and Practice, 1-13. https://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=14939&context=libphilprac

Perrotta, R. (2023). Peabody EDI Office responds to MSU shooting with an email written by ChatGPT. The Vanderbilt Hustler. 1. https://vanderbilthustler.com/2023/02/17/peabody-edi-office-responds-to-msu-shooting-with-email-written-using-chatgpt/

Soliman, A., Le, Z., & Parwani, A. V. (2024). Artificial intelligence’s impact on breast cancer pathology: a literature review. Diagnostic Pathology, 19(1), 1-18. https://doi-org.dist.lib.usu.edu/10.1186/s13000-024-01453-w

Team, AIC. (2023). Quality Control: How to Verify AI Generated Content. https://aicontentfy.com/en/blog//quality-control-how-to-verify-ai-generated-content

Appendix

Figure A: Cross Referencing Mars Question

Figure B: Basketball Rules, Microsoft Copilot

Figure C: Basketball Rules - ChatGPT

Figure D: Vaccine Bias

Figure E: AI differences - ChatGPT

Figure F: AI Differences - Copilot