AI Empowers Corporate Interests in Higher Ed

On Feb. 15, Google DeepMind employee Susan Zhang shared on X a sponsored LinkedIn message she received stating that the University of Michigan is licensing academic speech data and student papers for training and tuning large language models (LLMs). As Zhang’s post spread across social media, outrage over the monetization of student data quickly grew, prompting Michigan to issue an official statement.

According to the university, the post had been sent by “a new third party vendor that has since been asked to halt their work.” The university further argued that rather than “student data” being offered for sale, the data set consisted of anonymized student papers and recordings, voluntarily contributed two or more decades earlier with signed consent, for the purpose of improving “writing and articulation in education.” While the statement helped calm the backlash, the case offers a crucial window into how the ethics of student data use are entangled with commercial interests in this latest period of AI fever. We shouldn’t be too quick to forget it.

Conversations about artificial intelligence in higher education have been all too consumed by concerns about academic integrity, on the one hand, and, on the other, by how to use education as a vehicle for keeping pace with AI innovation. Instead, this moment can be leveraged to center concerns about the corporate takeover of higher education.

While AI is being framed as a contemporary scientific breakthrough, AI research goes back at least 70 years. However, increasing excitement about the commercial potential of machine learning has led tech companies to rebrand AI as “a multitool of efficiency and precision, suitable for nearly any purpose across countless domains.” As Meredith Whittaker points out, LLMs are one of the most data- and computing-intensive techniques in AI. Precisely because LLMs and machine learning require vast computational infrastructure, corporate resources and practices are foundational to this type of AI development.

Transparency

One major challenge concerning the development and use of AI in higher education is a lack of transparency. Even in the University of Michigan’s official statement, the name of the third-party vendor (Catalyst Research Alliance) was not included. It’s also unclear whether the students who consented to the Michigan studies agreed to or even imagined their data being packaged and sold decades later for LLM research and development.

Partnerships and Agreements

Earlier this year, two major academic publishers, Wiley and Taylor & Francis, announced partnerships with major tech companies, including Microsoft, to provide academic content for training AI tools, including for automating various aspects of the research process. These agreements do not require author permission for scholarship to be used for training purposes, and many are skeptical of assurances regarding attribution and author compensation. Academic labor is being used to generate AI-related revenues for publishing companies that, as we’ve already seen, may not even disclose which tech companies they’re partnering with, nor publicize the deals on their websites. Cases like these have prompted the Authors Guild to recommend a clause in publishing distribution agreements that prohibits AI training use without the author’s “express permission.”

Privacy

Many people might also assume that the Family Educational Rights and Privacy Act protects student information from corporate misuse or exploitation, including for training AI. However, FERPA not only fails to address student privacy concerns related to AI, but in fact enables public-private data sharing. Universities have broad latitude in determining whether to share student data with private vendors. Additionally, whatever degree of transparency privacy policies may offer, students are rarely empowered to control or change the terms of those policies.

FERPA and Student Data

Educational institutions are permitted to share student data without consent with a “school official,” a term that, after a 2008 change to the FERPA regulations, was defined to include contractors, consultants, volunteers and others “to whom an educational agency or institution has outsourced institutional services or functions it would otherwise use employees to perform.” While these parties must have a “legitimate educational interest” in the education records, universities have discretion in defining what counts as a “legitimate educational interest,” and this flexibility could permit institutions to sell student information for funding purposes. Under conditions of austerity, where public funding for education is increasingly curtailed and restricted, student data is especially vulnerable to a wide range of uses with little oversight or accountability.

Exploitation

The practice of sharing student data with little accountability or oversight not only raises privacy issues, but also permits student data to be exploited to create and improve private firms’ products and services. In this sense, private firms save money on what would otherwise require investment in market research and product development by putting the student data they collect to work. Once collected, and especially once de-identified, student data typically becomes an indefinite asset of universities and private firms. There is also a sense of entitlement to student data, not only among university administrators and private technology firms, but in many cases among university researchers who are contributing to the development of AI using data from students.

Conclusion

As I argue in Smart University: Student Surveillance in the Digital Age (Johns Hopkins University Press), at a time when university administrators are suggesting replacing striking graduate students with generative AI tools, school districts are using ChatGPT to decide which titles should be removed from library shelves and university researchers are taking photos of students without their knowledge to train facial recognition software, it is crucial that we be able to democratically deliberate about whether and how a range of digital tools are incorporated into the lives of those who live and work on college campuses. This includes the ethics of using data from students and faculty to improve the efficacy of AI in ways that drive power and profits to private companies at our expense.

Frequently Asked Questions

Q: What is the purpose of licensing academic speech data and student papers for training and tuning large language models (LLMs)?
A: The purpose is to improve the efficacy of AI in ways that drive power and profits to private companies at the expense of students and faculty.

Q: What is the Family Educational Rights and Privacy Act (FERPA)?
A: FERPA is a federal law that regulates the disclosure of student education records. However, it fails to address student privacy concerns related to AI and enables public-private data sharing.

Q: What are the concerns about the use of student data for AI development?
A: The concerns include a lack of transparency, privacy issues, and exploitation of student data for commercial purposes.

Q: What can be done to address these concerns?
A: Students and faculty can use a range of strategies, including open letters, public records requests, critical education, and refusals to work on research and development for harmful AI applications. Additionally, we need to demand more control over our labor and the data that is collected from us.
