
IBM Shapes AI Governance in Education with Smarter Balanced

Defining the Challenge

The California-based Smarter Balanced Assessment Consortium is a member-led public organization that provides assessment systems to educators working in K-12 and higher education. The organization, founded in 2010, partners with state education agencies to develop innovative, standards-aligned assessment systems. Smarter Balanced supports educators with tools, lessons, and resources, including formative, interim, and summative assessments, that help them identify learning opportunities and strengthen student learning.

Smarter Balanced is committed to evolving and innovating as the educational landscape changes. Through a collaboration with IBM Consulting, it aims to explore a principled approach to the use of artificial intelligence (AI) in educational assessments. The collaboration was announced in early 2024 and is ongoing.

Incorporating Diversity

For the Smarter Balanced project, the combined teams established a think tank that included a diverse set of subject-matter experts and thought leaders. The group included experts in educational assessment and law, as well as neurodivergent people, students, people with accessibility challenges, and others.

"The Smarter Balanced AI think tank is about ensuring that AI is trustworthy and responsible and that our AI enhances learning experiences for students," said think tank member Charlotte Dungan, Program Architect of AI Bootcamps for the Mark Cuban Foundation.

The think tank aims to incorporate its members' expertise, viewpoints, and lived experiences into the governance framework iteratively, rather than in a "one-and-done" way. This approach mirrors a key principle of AI ethics at IBM: the purpose of AI is to augment human intelligence, not replace it. Systems that incorporate ongoing input, evaluation, and review by diverse stakeholders can better foster trust and promote equitable outcomes, ultimately creating a more inclusive and effective educational environment.

Exploring Student-Centered Values

One of the first efforts that Smarter Balanced and IBM Consulting undertook as a group was to determine the human values we want to see reflected in AI models. Because this is not a new ethical question, we settled on a set of values and definitions that map to IBM's AI pillars, the fundamental properties of trustworthy AI:

  • Explainability: Having functions and outcomes that can be explained non-technically
  • Fairness: Treating people equitably
  • Robustness: Security, reliability, and resistance to adversarial attacks
  • Transparency: Disclosure of AI usage, functionality, and data use
  • Data Privacy: Disclosure and safeguarding of users’ privacy and data rights

Operationalizing these values in any organization is a challenge. In an organization that assesses students’ skill sets, the bar is even higher. But the potential benefits of AI make this work worthwhile: "With generative AI, we have an opportunity to engage students better, assess them accurately with timely and actionable feedback, and build in 21st-century skills that are actively enhanced with AI tools, including creativity, critical thinking, communication strategies, social-emotional learning, and growth mindset," said Dungan.

Exploring Layers of Effect and Disparate Impact

For this exercise, we used a design thinking framework called Layers of Effect, one of several frameworks IBM Design for AI has donated to the open source community Design Ethically. The Layers of Effect framework asks stakeholders to consider the primary, secondary, and tertiary effects of their products or experiences.

The primary (desired) effect of the AI-enhanced assessment system is a more equitable, representative, and effective tool that improves learning outcomes across the educational system. Secondary effects might include greater efficiency and the collection of relevant data that helps allocate resources where they are most needed. Tertiary effects are unintended, though some may be foreseeable. This is where stakeholders must explore what potential unintended harm might look like.

The teams identified five categories of potential high-level harm:

  • Harmful bias that fails to account for or support students from vulnerable populations who may need extra resources and perspectives to meet their diverse needs.
  • Issues related to cybersecurity and personally identifiable information (PII) in school systems that do not have adequate procedures in place for their devices and networks.
  • Lack of governance and guardrails that ensure AI models continue to behave in intended ways.
  • Lack of appropriate communication to parents, students, teachers, and administrative staff about the intended use of AI systems in schools. These communications should describe protections against inappropriate use and explain the agency stakeholders have, such as how to opt out.
  • Limited off-campus connectivity that might reduce access to technology and the subsequent use of AI, particularly in rural areas.

Originally applied in legal cases, disparate impact assessments help organizations identify potential biases. They examine how seemingly neutral policies and practices can disproportionately affect members of protected classes, such as groups defined by race, religion, gender, and other characteristics. Such assessments have proven effective in developing policies related to hiring, lending, and healthcare. In our education use case, we sought to identify cohorts of students who might experience inequitable assessment outcomes because of their circumstances.

The groups identified as most susceptible to potential harm included:

  • Those who struggle with mental health
  • Those from varied socioeconomic backgrounds, including those experiencing homelessness
  • Those whose dominant language is not English
  • Those with cultural considerations beyond language
  • Those who are neurodivergent or have accessibility issues
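Disparate impact is often quantified by comparing how often a favorable outcome occurs in each cohort. The following is a minimal illustrative sketch, not the method used by Smarter Balanced or IBM. It assumes each cohort's results can be summarized as a count of favorable outcomes (for example, students meeting a standard) out of a total, and it borrows the "four-fifths" ratio commonly cited in US employment guidance purely as an example cutoff. The function names, cohort names, and counts are hypothetical.

```python
"""Hypothetical disparate impact check on assessment outcomes (illustration only)."""
from typing import Dict, List, Tuple

# Example cutoff borrowed from the "four-fifths" rule of thumb in US employment
# guidance; an assumption for illustration, not a Smarter Balanced standard.
FOUR_FIFTHS = 0.8


def impact_ratios(outcomes: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Return each cohort's favorable-outcome rate divided by the highest cohort rate.

    `outcomes` maps a cohort name to (favorable_count, total_count).
    """
    rates = {}
    for cohort, (favorable, total) in outcomes.items():
        if total <= 0:
            raise ValueError(f"cohort {cohort!r} has no students")
        rates[cohort] = favorable / total
    best = max(rates.values())
    if best == 0:
        # No cohort has any favorable outcomes, so there is no relative disparity.
        return {cohort: 1.0 for cohort in rates}
    return {cohort: rate / best for cohort, rate in rates.items()}


def flag_cohorts(outcomes: Dict[str, Tuple[int, int]],
                 threshold: float = FOUR_FIFTHS) -> List[str]:
    """Return the sorted names of cohorts whose impact ratio falls below the threshold."""
    ratios = impact_ratios(outcomes)
    return sorted(cohort for cohort, ratio in ratios.items() if ratio < threshold)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not real assessment data.
    sample = {
        "cohort_a": (80, 100),
        "cohort_b": (60, 100),
        "cohort_c": (75, 100),
    }
    for cohort, ratio in impact_ratios(sample).items():
        print(f"{cohort}: {ratio:.2f}")
    print("flagged:", flag_cohorts(sample))
```

A ratio below the threshold does not by itself prove harm. It signals a cohort whose outcomes warrant closer review by the kind of diverse stakeholders described earlier in this article.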

In Conclusion

This is a bigger conversation than just IBM and Smarter Balanced. We are publishing our process because we believe those experimenting with new uses for AI should consider the unintended effects of their models. We want to help ensure that AI models built for education serve the needs not just of a few but of society as a whole, in all its diversity.

"We see this as an opportunity to use a principled approach and develop student-centered values that will help the educational measurement community adopt trustworthy AI. By detailing the process that is being used by this initiative, we hope to help organizations that are considering AI-powered educational assessments have better, more granular conversations about the use of responsible AI in educational measurement." — Rochelle Michel, Deputy Executive Program Officer, Smarter Balanced.

FAQs

Q: What is the goal of the Smarter Balanced AI think tank?
A: The goal of the think tank is to ensure that AI is trustworthy and responsible and that our AI enhances learning experiences for students.

Q: What are the values that guide the use of AI in assessing children and young learners?
A: The values that guide the use of AI in assessing children and young learners include explainability, fairness, robustness, transparency, and data privacy.

Q: How will the teams ensure that AI models continue to behave in intended ways?
A: The approach relies on ongoing input, evaluation, and review by diverse stakeholders, and the teams identified a lack of governance and guardrails as one of the key potential harms to address.

Q: What are the potential high-level harms associated with AI-powered educational assessments?
A: The potential high-level harms associated with AI-powered educational assessments include harmful bias considerations, issues related to cybersecurity and personally identifiable information, lack of governance and guardrails, lack of appropriate communications, and limited off-campus connectivity.

Q: Who is most susceptible to potential harm from AI-powered educational assessments?
A: The groups identified as most susceptible are those who struggle with mental health; those from varied socioeconomic backgrounds, including those experiencing homelessness; those whose dominant language is not English; those with cultural considerations beyond language; and those who are neurodivergent or have accessibility issues.
