Task Overview
Polarization refers to the division of opinions into two sharply contrasting groups, especially when marked by hostility, intolerance, or exclusion. In the digital era, polarization is intensifying across platforms and geographies, affecting public discourse, exacerbating conflicts, and contributing to societal fragmentation.
This shared task is the first SemEval task on polarization, and it seeks to advance the computational understanding of how polarization manifests in text across multiple languages, cultures, and event types. Participants will develop models that can detect and interpret polarization in a variety of online contexts.
The task focuses on textual data collected from real-world events including elections, international conflicts, social protests, and ideological debates. The goal is to evaluate how well systems identify polarized content, classify its target, and recognize how it is expressed.
Multilingual, Multicultural, and Multievent Scope
This task emphasizes global inclusivity and cross-cultural representation. We include data from multiple languages, many of which are low-resource and underrepresented in mainstream NLP tasks.
Languages include:
- High resource: English, German, Spanish, Arabic
- Mid and Low resource: Urdu, Mozambican Portuguese, Amharic, Kinyarwanda, Hausa, Igbo, Twi, Swahili, isiXhosa, Zulu, Emakhuwa, etc.
Task Format and Subtasks
Participants may choose to compete in one or more of the following three subtasks:
Subtask 1: Polarization Detection
Binary classification: Identify whether a post contains polarized content.
- Labels: Polarized, Not Polarized
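To make the binary setup concrete, here is a minimal Python sketch of label encoding and a majority-class baseline. It is an illustration only: the integer mapping and the helper names (`encode_label`, `majority_baseline`) are assumptions, and the official data format and evaluation metric are not specified here.

```python
from collections import Counter

# The two Subtask 1 labels as listed above; the 0/1 mapping is an assumption.
SUBTASK1_LABELS = ("Not Polarized", "Polarized")


def encode_label(label: str) -> int:
    """Map a label string to 0 (Not Polarized) or 1 (Polarized)."""
    return SUBTASK1_LABELS.index(label)


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]

    def predict(post: str) -> str:
        # The post text is ignored on purpose: this is a sanity-check baseline.
        return most_common_label

    return predict


predict = majority_baseline(["Polarized", "Not Polarized", "Polarized"])
print(predict("some post text"), encode_label(predict("some post text")))
```

A baseline like this is mainly useful as a floor: any trained classifier should beat it on held-out data.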
Subtask 2: Polarization Type Classification
Classify the target of polarization.
- Political groups or ideologies
- Religious groups or beliefs
- Racial or ethnic communities
- Gender identities
- Sexual orientations
- Other/domain-specific targets
Subtask 3: Manifestation Identification
Classify how polarization is expressed. Multiple labels possible.
- Stereotyping
- Vilification
- Dehumanization
- Deindividuation
- Use of Extreme Language
- Lack of Empathy
- Invalidation
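Because a post can carry several manifestation labels at once, a common way to represent the targets is a multi-hot vector with one slot per label. The sketch below assumes the label order shown above and uses hypothetical helper names (`to_multi_hot`, `from_multi_hot`); the official submission format may differ.

```python
# The seven Subtask 3 labels as listed above; the ordering is an assumption.
MANIFESTATIONS = (
    "Stereotyping",
    "Vilification",
    "Dehumanization",
    "Deindividuation",
    "Use of Extreme Language",
    "Lack of Empathy",
    "Invalidation",
)


def to_multi_hot(labels):
    """Encode a collection of label names as a list of 0/1 flags."""
    unknown = set(labels) - set(MANIFESTATIONS)
    if unknown:
        raise ValueError(f"Unknown manifestation labels: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in MANIFESTATIONS]


def from_multi_hot(vector):
    """Decode a list of 0/1 flags back into label names."""
    if len(vector) != len(MANIFESTATIONS):
        raise ValueError("Vector length must match the number of labels")
    return [name for name, flag in zip(MANIFESTATIONS, vector) if flag]


encoded = to_multi_hot(["Vilification", "Invalidation"])
print(encoded)
print(from_multi_hot(encoded))
```

A post with no manifestation labels encodes to an all-zero vector, which keeps non-polarized posts representable in the same format.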
Data Description
Dataset sources: News websites, Reddit, blogs, Bluesky, regional forums. Event types include elections, conflicts, gender rights, migration, and more.
Each language has 3,000–5,000 annotated instances. Tools used: Label Studio, Prolific, Potato, Mechanical Turk.
Research Contributions
- Advancing socially responsible AI
- Supporting low-resource language NLP
- Fostering explainable and inclusive NLP systems
- Creating multilingual benchmarks for polarization detection
Timeline
Date (Tentative) | Phase |
---|---|
31 March 2025 | Call for Participation Opens / Task Proposals Due |
8 August 2025 | Trial Data Released |
1 September 2025 | Training Data Released |
1 December 2025 | Test Data Released (internal deadline; not for public release) |
10 January 2026 | System Submission Deadline / Evaluation Start |
31 January 2026 | Evaluation Results Released / Evaluation End |
February 2026 | System Paper Submission Deadline |
March 2026 | Notification of Acceptance |
April 2026 | Camera-Ready Papers Due |
Summer 2026 | SemEval-2026 Workshop at [Conference Location TBD] |
Who Should Participate?
- NLP researchers and developers
- Computational social science teams
- Practitioners working on hate speech and misinformation
- Peacebuilding and civil society organizations
- Students and cross-disciplinary academics
Organizing Team
Researchers from: University of Hamburg, Bahir Dar University, Macquarie University, Imperial College London, University of Pretoria, Zayed University, Bayero University Kano, Northeastern University