
🧠 Task Overview

Polarization refers to the division of opinions into two sharply contrasting groups, especially when marked by hostility, intolerance, or exclusion. In the digital era, polarization is intensifying across platforms and geographies, affecting public discourse, exacerbating conflicts, and contributing to societal fragmentation.

This shared task is the first SemEval task on polarization, and it seeks to advance the computational understanding of how polarization manifests in text across multiple languages, cultures, and event types. Participants will develop models that can detect and interpret polarization in a variety of online contexts.

The task focuses on textual data collected from real-world events, including elections, international conflicts, social protests, and ideological debates. The goal is to evaluate how well systems identify polarized content, classify its target, and characterize how it is expressed.

🌍 Multilingual, Multicultural, and Multievent Scope

This task emphasizes global inclusivity and cross-cultural representation. We include data from multiple languages, many of which are low-resource and underrepresented in mainstream NLP tasks.

Languages include:

High-resource: English, German, Spanish, Arabic
Mid- and low-resource: Urdu, Mozambican Portuguese, Amharic, Kinyarwanda, Hausa, Igbo, Twi, Swahili, isiXhosa, Zulu, Emakhuwa, etc.

🧪 Task Format and Subtasks

Participants may choose to compete in one or more of the following three subtasks:

Subtask 1: Polarization Detection

Binary classification: Identify whether a post contains polarized content.

Labels: Polarized, Not Polarized

Subtask 2: Polarization Type Classification

Classify the target of polarization.

Political groups or ideologies
Religious groups or beliefs
Racial or ethnic communities
Gender identities
Sexual orientations
Other/domain-specific targets

Subtask 3: Manifestation Identification

Classify how polarization is expressed. A single post may receive multiple labels.

Stereotyping
Vilification
Dehumanization
Deindividuation
Use of Extreme Language
Lack of Empathy
Invalidation
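Because Subtask 3 is multi-label, one common way to represent its annotations for model training is a multi-hot vector with one position per manifestation. The Python sketch below is a minimal, hypothetical illustration. The label names come from the list above. The label order, the function names (to_multi_hot, from_multi_hot), and the vector representation are assumptions and do not describe the official data format.

```python
# Hypothetical multi-hot encoding for Subtask 3 (manifestation identification).
# Label names come from the task description; the ORDER of labels, the
# function names, and the 0/1 vector format are illustrative assumptions.
from typing import Iterable, List

MANIFESTATION_LABELS: List[str] = [
    "Stereotyping",
    "Vilification",
    "Dehumanization",
    "Deindividuation",
    "Use of Extreme Language",
    "Lack of Empathy",
    "Invalidation",
]


def to_multi_hot(labels: Iterable[str]) -> List[int]:
    """Encode a collection of manifestation labels as a 0/1 vector
    aligned with MANIFESTATION_LABELS."""
    label_set = set(labels)
    unknown = label_set - set(MANIFESTATION_LABELS)
    if unknown:
        raise ValueError(f"Unknown manifestation label(s): {sorted(unknown)}")
    return [1 if name in label_set else 0 for name in MANIFESTATION_LABELS]


def from_multi_hot(vector: List[int]) -> List[str]:
    """Decode a 0/1 vector back into label names (inverse of to_multi_hot)."""
    if len(vector) != len(MANIFESTATION_LABELS):
        raise ValueError("Vector length does not match the label inventory")
    return [name for name, bit in zip(MANIFESTATION_LABELS, vector) if bit]


if __name__ == "__main__":
    example = ["Vilification", "Use of Extreme Language"]
    vec = to_multi_hot(example)
    print(vec)                  # [0, 1, 0, 0, 1, 0, 0]
    print(from_multi_hot(vec))  # ['Vilification', 'Use of Extreme Language']
```

A post with no manifestation maps to an all-zero vector. If you prefer a library routine, scikit-learn's MultiLabelBinarizer (with classes set to the same list) produces the same encoding.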

📁 Data Description

Data sources: news websites, Reddit, blogs, Bluesky, and regional forums. Event types include elections, conflicts, gender rights, migration, and more.

Each language has 3,000–5,000 annotated instances. Annotation tools and platforms used: Label Studio, Prolific, Potato, and Mechanical Turk.

🎯 Research Contributions

Advancing socially responsible AI
Supporting low-resource language NLP
Fostering explainable and inclusive NLP systems
Creating multilingual benchmarks for polarization detection

📅 Timeline

Date (Tentative)	Phase
31 March 2025	Call for Participation Opens / Task Proposals Due
8 August 2025	Trial Data Released
1 September 2025	Training Data Released
1 December 2025	Test Data Released (internal deadline; not for public release)
10 January 2026	System Submission Deadline / Evaluation Start
31 January 2026	Evaluation Results Released / Evaluation End
February 2026	System Paper Submission Deadline
March 2026	Notification of Acceptance
April 2026	Camera-Ready Papers Due
Summer 2026	SemEval-2026 Workshop at [Conference Location TBD]

🧑‍🤝‍🧑 Who Should Participate?

NLP researchers and developers
Computational social science teams
Practitioners working on hate speech and misinformation
Peacebuilding and civil society organizations
Students and cross-disciplinary academics

👥 Organizing Team

Researchers from: University of Hamburg, Bahir Dar University, Macquarie University, Imperial College London, University of Pretoria, Zayed University, Bayero University Kano, Northeastern University

📬 Contact and Community

Email: polarization-semeval-2026-organisers@googlegroups.com
Discord Channel: Link TBA
GitHub: