AI, Academic Integrity, and Authentic Assessment: An Ethical Path Forward for Education
Contents
Introduction
Outlining Ethical Boundaries for AI Plagiarism
Understanding How AI Detectors Work
Anti-Plagiarism Tools vs AI Detectors: What’s the Difference?
Anthology’s Testing Supports Research-Identified Issues with AI Detection
A More Ethical and Effective Approach: Empowering Instructors
An Eye to the Future: What Comes Next for AI, Detection, and Academic Integrity?
References List
Last Updated Winter/Spring 2024
Most writers already use some forms of AI writing tools without even thinking about it; consider, for example, how often you use grammar checks or predictive text algorithms on an average day. The newest generation of AI tools, however, can perform extremely sophisticated writing tasks with very little input or effort from the human user, and this raises some difficult issues for students and instructors.
At what point does a student's use of artificial intelligence in their writing stop being the legitimate use of a tool and become plagiarism or academic misconduct? There is no single answer to this question, and there are few, if any, hard rules around AI that all instructors would agree on. Given the newness of this technology, there's a lot we all need to learn about what AI writing tools can do and how students can, should, and shouldn't use them in their academic work.
As a student, you may find that this makes the choice to use AI tools a new and somewhat complicated spin on the academic integrity issues discussed throughout this site. As with any issue related to academic integrity, your first step should be to make sure you understand your instructor's expectations for a given class and assignment. However, the newness of AI and the relatively uncharted range of things it can do mean that instructors don't always know how to articulate what they consider appropriate or inappropriate use of AI. This places a heavier responsibility on students to think through the ethical implications of their use of AI.
This is not just a matter of avoiding the consequences of accidentally or deliberately plagiarizing; it is also about making sure that you actually receive the benefits of the education you’re spending your time and resources pursuing. AI can unquestionably make many writing tasks easier, but as a student you should consider the degree to which making a given assignment easier is better or more beneficial for you in the long run. This page is intended to help you navigate these issues and make informed decisions about how to use AI writing tools ethically in your academic work.
For our purposes here, it’s easiest to think of AI writing tools in two broad categories:
AI Editing Tools help writers improve text that they have written themselves. This category includes a number of tools that nearly all writers use routinely, such as spell checkers and bibliography generators like Zotero or EasyBib. In their most sophisticated form, though, AI editors can proofread and correct the grammar of whole texts or even revise them entirely to make the writing more "formal," more "academic," or otherwise more appropriate for the intended audience (at least to the degree the AI understands that audience).
Generative AI Tools actually create new text (or, in some cases, images, slides, charts, music, or video) for their users. Again, this category includes tools that many of us use regularly, like the predictive text algorithm integrated into most messaging apps. At the far end of this category, though, are tools like ChatGPT and Gemini, which can theoretically write entire essays if they’re simply fed the prompt.
Some of the simpler and more common ways to use these tools don’t present much of an ethical quandary. Very few instructors would object to students using the spelling and grammar checks built into their word processors.* Conversely, most instructors would consider it plagiarism if a student fed their assignment prompt into ChatGPT and submitted the essay that the AI generated as their own work.
In between these two extremes, though, lies a whole range of more complex uses for AI tools that aren’t so easy to label as acceptable or unacceptable in academic contexts. We give several examples of this in the Sample Scenarios below, but before we get to that, we should consider some general principles you and your instructor might use to decide what uses for AI are acceptable:
*A major exception here is introductory language classes (e.g., Spanish 101-104). Since learning basic spelling, grammar, and phrasing is a major part of the curriculum in these courses, many instructors who teach them would object to the use of spell check or predictive text.
First and foremost, it’s important to reiterate that your instructor has the final say on what does and does not constitute plagiarism or academic misconduct on a given assignment. So, if your instructor explicitly says that a given use of AI is acceptable or unacceptable, then there’s nothing more you need to consider. If your instructor’s expectations are unclear, though, or if you’re considering using an AI tool that seems to fall outside the guidelines your instructor has given, you’ll also need to evaluate the ethics yourself.
The good news is that, while the uses of artificial intelligence in academic writing may be new, the same five basic principles around plagiarism that we discussed elsewhere on this site still apply here. So, to help you consider whether a given use of AI tools is ethical or not, you can start by considering…
Education: What am I supposed to learn from this assignment? How is it intended to help me develop my writing or thinking skills, or to better understand the course material? Will a particular use of AI undermine or defeat the purpose of this assignment?
Attribution of Credit: Could a particular use of AI tools cause me to take credit for ideas that aren’t my own? How might an AI obscure my use of sources or draw on sources that I’m not aware of and therefore can’t document properly?
Maintaining a Scholarly Discourse: Will a particular use of AI help me to build on the ideas of others and express new ideas of my own? Or will it cause me to simply restate ideas that have already been articulated elsewhere?
Academic Integrity: What aspects of this assignment does the instructor expect to be the result of my effort alone? Would a particular use of AI cause me to deceive my instructor into thinking I put intellectual labor into my writing that I did not? Is a particular use of AI likely to produce false data or misinformation, which I would take responsibility for by submitting it under my name?
Intellectual Property: Could a particular use of AI cause me to appropriate text or ideas that are owned by other people?
Finally, if considering these questions still leaves you unsure about a given use of AI, ask yourself this: If I asked a human being to do the work that this AI is doing for me, would it still be okay? If the answer is “no,” or even “I’m not sure,” then your best bet would be to avoid that use of AI.
Unlike the scenarios described elsewhere on this site, we’re not providing analysis to go with these examples. This is because the applications for AI in academic writing are so new that there isn’t a clear consensus or general practice that we can provide. Instead, we’re providing these scenarios to help you and your instructors open a dialogue about what uses for AI are acceptable or unacceptable in their classes. We encourage you to think about each of these situations using the questions outlined above and to ask your instructors what they think.
A student is extremely insecure about their grammar or writing style. So, for an essay in their History class, they feed their draft into an AI tool that promises to “tune” their language to make it more formal and academic. The tool makes several dozen changes to the punctuation, sentence structure, and phrasing throughout the draft, returning a new version of the essay that reads to the student as a more polished version of the essay they wrote. The student then submits the AI-polished version of the essay to their instructor with no additional changes on their part.
A student in a Psychology class is writing a literature review that discusses the existing scholarship around violence in video games. To get started, they go to an AI text generator and ask "do psychologists believe that video games cause violent behavior?" The AI writes three or four paragraphs in response that summarize the various ways that psychologists have answered that question, though it cites no sources and provides no details about particular experiments, dates, or psychologists. The student then uses the AI's response as a kind of outline: they write their literature review following the main ideas expressed by the AI, plugging in references to specific articles and sources they've found through their own research. The final draft that the student submits to their instructor is much more detailed than the AI's version, and the actual text is almost entirely written by the student, but it makes basically the same points in the same order as the draft written by the AI.
A student in a Philosophy class is struggling to keep up with the reading, which they find dense and confusing. To make things easier, they ask an AI chatbot to summarize the readings for them. For each reading, the AI creates a summary that seems to cover the major ideas, but in much simpler language. At first, the student uses these summaries as a guide to help them read the assigned texts, but as the term goes on the student finds that they only read the summaries, and that seems to be enough to participate in class discussion and keep up with the written work in the course.
A student in an Economics class is having difficulty getting started on an essay assignment with a fairly open prompt, so they enter the prompt into a generative AI tool multiple times to see a range of different topics and approaches. Ultimately, the student decides to write on one of the topics the AI came up with, but as soon as they do they delete the AI’s essay and write the paper themselves.
A student in a Political Science class is writing a position paper in response to the argument in one of their readings. They create a detailed outline that includes multiple quotes from the reading and long bullet points laying out the student’s response. They paste this outline into a text generator and ask it to “write a position paper that follows this outline.” The resulting essay is roughly 90% text written by the student, but the AI has added connecting words, punctuation, and transition sentences at the beginning of each paragraph. The student revises this essay, making various changes to both their language and the AI’s, before turning in the final draft.
A student in a Religious Studies class is assigned to write an annotated bibliography. The assignment calls for the student to find six academic sources on their topic and write a one-paragraph description of each source. The student will later use these sources in a formal research essay. The student asks an AI to list “the six most important academic sources” on their topic, and the AI gives them a list of six books. The student then asks the AI to write a one-paragraph summary of each book. The AI can’t manage to write citations, so the student writes those themselves, using information on Amazon.com (this also allows the student to confirm that all six books really exist, and weren’t just made up by the AI). The student then turns in the annotated bibliography, which contains their citations of the six sources found by the AI and the descriptions written by the AI.
This article presents two problems, a confession, and one clear solution for leaders who are grappling with how to properly assess learners in the age of AI.
The following scenario is unfolding in academic department meetings across the country: A faculty member who is reviewing a learner's written formative assessment believes something is off. Perhaps the style doesn't match the student's earlier work or incorporates a cadence or structure that seems out of place for the assignment type. The stakes are high as the talk turns to whether generative artificial intelligence (AI) was used. Another instructor raises concerns over the accuracy and bias of AI-detection tools. The conversation stalls, and the issue remains unresolved.
AI is rapidly transforming the classroom, and assessment is one of the most affected areas. Old assessment methods are under strain, and new tools that purport to detect AI-generated text raise concerns of their own. What is the path forward? Higher education technology leaders should empower instructors to develop authentic assessment methods that are facilitated by technology, including AI. While these leaders are tasked with setting instructors up to succeed, the path ahead has never looked so uncertain.
The landscape instructors face is transforming at an alarming pace. The proliferation of generative AI tools like ChatGPT has opened the door to misuse by learners. Even the concept of "misuse" is a gray area, as few institutions have laid out comprehensive policies around the use of AI tools by learners, instructors, or staff, and the line delineating what constitutes appropriate use has yet to be established. A recent survey of students, instructors, and administrators found that 51 percent of students would continue to use generative AI tools even if such tools were prohibited by their instructors or institutions.[1] AI plagiarism (the use of generative AI to produce content that someone submits as their own work for assessment tasks) represents a real challenge across institutions. In an effort to solve the problem, IT and institutional leaders are grappling with what amounts to an arms race, procuring tools that claim to use AI to detect AI-generated plagiarism. If only it were that easy.
A growing body of research is casting doubt on the efficacy of AI detection, with two key issues emerging: accuracy and bias. First, how accurately can AI-generated content be detected? Five computer scientists from the University of Maryland recently conducted a study in which they emphatically concluded that AI-generated text cannot be reliably detected and that simple paraphrasing is sufficient to evade detection.[2] A separate study of fourteen AI detectors, conducted by academic researchers in six countries, found that the accuracy of these tools ranges from 33 to 81 percent, depending on the provider and the methodology used.[3]
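That wide range is partly a question of what gets counted. Here is a minimal sketch, in Python with entirely invented detector outputs (no real detector or dataset is behind these numbers), of how the same set of predictions yields very different figures depending on whether you report the detection rate on AI-written text, the false-positive rate on human-written text, or overall accuracy:

```python
# Hypothetical detector outputs: 1 = flagged as AI-generated, 0 = not flagged.
# All values below are invented purely for illustration.
on_human_texts = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]  # detector run on 10 human-written essays
on_ai_texts    = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]  # detector run on 10 AI-generated essays

false_positives = sum(on_human_texts)  # human essays wrongly flagged as AI
true_positives = sum(on_ai_texts)      # AI essays correctly flagged

detection_rate = true_positives / len(on_ai_texts)
false_positive_rate = false_positives / len(on_human_texts)
overall_accuracy = (
    (len(on_human_texts) - false_positives) + true_positives
) / (len(on_human_texts) + len(on_ai_texts))

print(f"Detection rate on AI text:         {detection_rate:.0%}")        # 60%
print(f"False-positive rate on human text: {false_positive_rate:.0%}")   # 30%
print(f"Overall accuracy:                  {overall_accuracy:.0%}")      # 65%
```

A vendor could truthfully advertise any one of these three numbers, which is one reason studies that vary in methodology report such divergent results.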
Second, are the current iterations of AI-detection tools creating new issues by inadvertently introducing bias? The data that AI models are trained on is scraped from the internet, where the content is predominantly written in English. Stanford researchers evaluated whether this might lead to challenges in identifying whether Test of English as a Foreign Language (TOEFL) essays were AI-generated. Indeed, the researchers found that more than half of TOEFL essays were incorrectly classified as AI-generated.[4]
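Findings like this come from subgroup comparison: run the detector on essays known to be human-written, group the essays by writer population, and compare flag rates. Below is a minimal sketch of that check, again with made-up data rather than the study's:

```python
from collections import defaultdict

# Hypothetical results for HUMAN-written essays only:
# (writer_group, was_flagged_as_ai). All values are invented for illustration.
results = [
    ("native_english_writer", False), ("native_english_writer", True),
    ("native_english_writer", False), ("native_english_writer", False),
    ("toefl_writer", True), ("toefl_writer", True),
    ("toefl_writer", False), ("toefl_writer", True),
]

outcomes_by_group = defaultdict(list)
for group, flagged in results:
    outcomes_by_group[group].append(flagged)

# Every flag here is a false positive, because every essay is human-written.
for group, outcomes in outcomes_by_group.items():
    rate = sum(outcomes) / len(outcomes)
    print(f"{group}: {rate:.0%} of human-written essays flagged as AI")
```

If the flag rate differs sharply between groups, as it does in this toy data (25 percent versus 75 percent), the detector is not merely inaccurate; it is inaccurate in a way that falls hardest on particular students.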
These challenges do not end at the edge of campus. As an education technology product leader, my team and I evaluated whether AI detection fits our solution ecosystem, and we grappled with these same questions. The framework that guided our evaluation, and that continues to guide our product development, is grounded in principles of inclusivity and accessibility and holds that academic integrity policies should level the playing field for every student. Fairness matters. In evaluating whether our company should pursue AI-detection capabilities, we conducted beta testing with a cohort of clients. The results were sobering. Participants had low confidence in the ability of AI detectors to correctly identify AI-generated content, with 80 percent of respondents believing that AI detectors are, at best, correct only "sometimes." Our testing also mirrored the research on bias: samples written by students who speak English as a second language and by students with autism spectrum disorder were incorrectly identified as AI-generated content. We concluded that it would not be ethical to employ AI-detection tools at this point in their development.
Assessment is still a critical pedagogical element, but it's time to think about it differently. In its simplest form, authentic assessment moves away from testing accrued knowledge to focus on the practical application of skills, prioritizing complex tasks over binary right-or-wrong questions. In the age of generative AI, authentic assessment is more important than ever. Injecting personal perspectives, critical thinking, and self-reflection in a way that appears genuine is much more difficult for generative AI technologies than it is for humans. For instance, authentic assessment in a business course may mean holding a mock negotiation in which students actively demonstrate their comprehension of the material. Authentic assessment isn't new. Educators have long recognized its benefits, including the ability to draw a clear connection between coursework and careers through the application of knowledge.
Authentic assessment demands time, a resource that is in short supply for instructors. Anthology believes AI can make a real, tangible impact today by helping to create more time for instructors. Learning technology infused with AI tools can reduce the time needed to complete administrative and production tasks like creating courses, enabling educators to spend more time teaching and working with students. Peer assessment and group work lend themselves to a high level of authenticity, and tools that seamlessly support them are another example of how technology can help instructors be more efficient.
Learning technology that reduces the administrative burden is critical for empowering instructors to rethink their assessment methods. There's no way around it: authentic assessment requires an investment of time on the part of the instructor. Freeing time that would otherwise be spent on repetitive or lower-value tasks to develop, test, and implement authentic assessment is the path forward. While the landscape may be transforming rapidly, instructors remain an institution's most valuable resource in the classroom. Combating the challenge that generative AI poses in evaluating learners starts with doubling down on the human element and adopting a proactive approach. The wait for solutions that can accurately identify AI plagiarism while avoiding serious ethical concerns might be long. Instead, institutional leaders need to embrace AI as part of the larger landscape and develop policies and approaches that use it to assess learners more authentically. For a more in-depth vision, download Anthology's whitepaper on an ethical path forward for education.