Using Generative AI in Research: Limitations & Warnings

This Research Guide provides information on the use of generative AI in academic papers and research, along with guidance on its ethical use in an academic setting.

Known Limitations of Generative AI

In addition to the known limitations outlined below, generative AI may be prone to problems that are yet to be discovered or are not fully understood.

  • Large language models (LLMs) are prone to "hallucinations": generating fictitious information that is presented as factual or accurate. This can include citations, publications, biographical information, and other information commonly used in research and academic papers.
  • In addition to fictitious information, answers generated by LLMs can simply be wrong while still being presented as correct or authoritative.
  • Because of their fundamental structure, and because newer versions are released regularly, content from many generative AI models can be difficult to reproduce consistently. This is particularly problematic in research and academia, where reproducibility is a cornerstone of establishing credibility.
  • Generative AI models are not databases of knowledge; rather, they attempt to synthesize and reproduce the information they have been trained on. This makes it very difficult to validate their content and to properly attribute its sources.
  • Many common generative AI tools are not connected to the internet and cannot update or verify the content they generate.
  • Generative AI models, particularly when given simple prompts, can be very reductive, producing content that is oversimplified, low quality, or generic.
  • Many generative AI models (including ChatGPT) are trained on data with cutoff dates, resulting in outdated information or an inability to answer questions about current information and events. In some cases, the data cutoff date is not made clear to the user.

 

Data Privacy Precautions

Extra caution should be exercised when working with private, sensitive, or identifiable information, whether directly or indirectly, and regardless of whether you are using a generative AI service or hosting your own model. Although some generative AI tools allow users to set their own data retention policy, many collect user prompts and other user data, presumably for use as training data. Regardless of data collection and retention policies, USC researchers, staff, and faculty should be particularly careful to avoid sharing any student information (which could be a FERPA violation), proprietary data, or other controlled or regulated information.

Additional Considerations

In addition to offering direct access to generative AI tools and services, many companies are incorporating generative AI functionality into existing products and applications. Examples include Google Workspace tools (Docs, Sheets, Slides, etc.), Microsoft Office, Notion, and Adobe Photoshop. Third-party plugins and extensions such as GitHub Copilot are also built upon generative AI models. Extra care and caution should be exercised when using these tools for your research and academic work. In particular, using them to auto-complete sentences or generate text should be avoided unless this is explicitly permitted or is part of the assignment. Similarly, when working with images or video, the use of generative AI assistance should be clearly communicated and properly attributed.

Detecting Generative AI

In an attempt to combat undisclosed and inappropriate uses of generative AI content, many organizations have started to develop and promote generative AI detectors. These tools themselves rely on AI to try to flag content as having been created by generative AI. Despite good intentions, these tools can be unreliable and, in many cases, have falsely flagged human-written student work as AI-generated. As such, it is inadvisable to rely solely on these tools to determine whether an assignment or other work was created by generative AI.
 
When in doubt, professors should speak with their student(s) to better understand whether and how generative AI tools were used. This can serve as an important opportunity for both parties to discuss the nuances of the technology and directly address any questions or concerns. For additional assistance and resources, professors should contact the Office of Academic Integrity.