Can Large Language Models Detect Contradictions in Prompts? Shanghai Jiao Tong University's Latest Research Reveals

Models & Benchmarks · Published: Aug 16, 2024 · James Hayes · ~7 min read

Author

James Hayes · Cloud & MLOps Staff Writer

Shipping models: inference, observability, cost, and what breaks in production.

Can Large Language Models Detect Contradictions in Prompts? Shanghai Jiao Tong University’s Latest Research Reveals

If your LLM can’t spot a tiger labeled as a kitten, it’s going to hallucinate confidently when production data gets messy. That’s the core risk Professor Wang Dequan’s research group at Shanghai Jiao Tong University highlights in their latest study. We’re not talking about edge cases here; we’re talking about AI safety and Superalignment failures where models generate responses for questions that have no valid answer because they missed the contradiction.

Can Large Language Models Detect Contradictions in Prompts? Shanghai Jiao Tong University's Lates… — figure 2

The team introduced a multimodal benchmark called the Self-Contradictory Instructions (SCI) dataset to measure this gap. They also built an automatic dataset creation framework named AutoCreate to scale these tests. What stood out to me is that multimodal large models are significantly lacking in their ability to detect self-contradictory user instructions right now.

To fix this, they proposed a Cognitive Awakening Prompting (CAP) method. This approach injects cognitive capabilities from the external world to improve contradiction detection before the model commits to an answer. The paper is scheduled for publication at the 18th European Conference on Computer Vision (ECCV) in October of this year.

Can Large Language Models Detect Contradictions in Prompts? Shanghai Jiao Tong University's Lates… — figure 3

In practice, hallucinations on contradictory inputs are a latency and cost sink we can’t ignore. I think cAP looks like a promising prompt-layer fix before model retraining becomes necessary.

Can Large Models Detect Conflicting Instructions?

I read the latest findings from Shanghai Jiao Tong University, and the production implication is stark: as we push context windows to their limits, our models are becoming dangerously compliant rather than critically aware. The research highlights a growing gap between raw capability and operational reliability in multimodal systems.

Multimodal large models have undeniably advanced, processing text and images with cognitive-like fluency. However, this success stems from aggressive instruction-following training that often prioritizes obedience over accuracy. This “over-compliance” is a feature for demos but a liability for production stability when users provide ambiguous or conflicting inputs.

Operationally, over-compliance creates silent failure modes in automated pipelines where the model hallucinates agreement instead of flagging errors.

The scale of these capabilities is now staggering. Models like Claude 3 and Gemini 1.5 Pro handle massive contexts—Claude 3 offers a 200K token window, while Gemini 1.5 Pro standardizes at 128K and reached 1M tokens in private preview. These lengths enable complex, prolonged interactions that meet human needs for deep engagement.

Can Large Language Models Detect Contradictions in Prompts? Shanghai Jiao Tong University's Lates… — figure 4

But as context grows, so does the risk of self-contradiction. Users—especially those less familiar with prompt engineering or children—often issue conflicting multimodal instructions without realizing it. As dialogue turns increase and windows expand, users forget earlier constraints, leading to internal conflicts that degrade performance if the model lacks self-awareness.

To measure this degradation, the team introduced Self-Contradictory Instructions (SCI), a benchmark designed specifically to test conflict detection. SCI contains 20,000 conflicting instructions across 8 tasks, split evenly between language-language and vision-language paradigms. This dataset provides the first standardized way to evaluate how well models catch their own logical inconsistencies.

Can Large Language Models Detect Contradictions in Prompts? Shanghai Jiao Tong University's Lates… — figure 5

The language-language paradigm focuses on textual conflicts: rule clashes, object attribute mismatches, exclusive instructions, and prohibited vocabulary violations. These are common in complex RAG pipelines where retrieved context might contradict the system prompt.

Can Large Language Models Detect Contradictions in Prompts? Shanghai Jiao Tong University's Lates… — figure 6

The vision-language paradigm addresses multimodal conflicts, including OCR errors, chart discrepancies, geometric mismatches, and semantic contradictions. Notably, only the semantic conflict task utilizes ImageNet as a source dataset. For example, if an image shows an ostrich but the prompt asks about its size after replacing “ostrich” with “kiwi,” the model must detect the contradiction rather than blindly answering about the kiwi.

Can Large Language Models Detect Contradictions in Prompts? Shanghai Jiao Tong University's Lates… — figure 7

Generating this data manually would be unscalable, so the authors developed AutoCreate, an automated framework using a multimodal loop of programs and LLMs. AutoCreate maintains a seed pool and iteratively generates data through two branches: a language branch (left) and a vision branch (right), each with generators and modifiers.

Can Large Language Models Detect Contradictions in Prompts? Shanghai Jiao Tong University's Lates… — figure 7

A cleaner component filters out low-quality outputs before manual expert review. Only data passing these checks is fed back into the seed pool, ensuring continuous improvement of the benchmark’s rigor. This automated approach significantly reduces the cost and time required to build high-quality evaluation datasets for safety-critical applications.

In practice, automated dataset generation at this scale is essential; manual QA cannot keep pace with the complexity of modern multimodal edge cases.

I read through the latest findings from Shanghai Jiao Tong University regarding LLM capabilities, specifically focusing on their ability to detect contradictions within prompts. The research highlights a critical gap in how models handle conflicting instructions, which is often overlooked during standard evaluation phases.

AutoCreate Accelerates SCI Dataset Construction

The study introduces AutoCreate, a methodology designed to streamline the creation of the SCI dataset. This tool addresses the bottleneck of manual data curation by automating the generation process. According to the findings, AutoCreate significantly improves both the speed and breadth of content in SCI dataset construction. This efficiency gain allows for larger-scale testing environments without proportional increases in engineering overhead.

I think automated data pipelines reduce toil but require strict validation gates to prevent garbage-in-garbage-out scenarios.

Detecting Prompt Contradictions

The core objective was to determine if Large Language Models can reliably identify contradictory statements within a single prompt context. The researchers tested various model architectures against datasets containing intentional logical conflicts. The results suggest that while some models perform adequately, others struggle with nuanced semantic contradictions. This inconsistency poses a risk for production systems relying on complex, multi-constraint prompts where clarity is paramount.

Operationally, if your prompts contain conflicting constraints, expect non-deterministic behavior in edge cases during peak load.

Implications for Production Reliability

For platform engineers, the ability to detect contradictions isn’t just an academic exercise; it’s a reliability requirement. When prompts become complex, the likelihood of implicit contradictions increases, leading to silent failures or hallucinated outputs. The research indicates that current models are not uniformly robust in this area. We need better guardrails and pre-processing steps to sanitize inputs before they reach the inference layer.

In practice, implement input validation layers early; don’t rely on the LLM to self-correct logical errors in real-time.

The Ops Take on Cognitive Awakening Prompting

I read the Shanghai Jiao Tong University paper on LLM contradiction detection, and the core finding is stark: current models lack “cognition,” meaning they can’t evaluate instruction rationality or self-correct against internal conflicts. This isn’t a bug; it’s an architectural limitation in how these models process context without explicit meta-awareness.

I think lab demos of CAP look promising, but production pipelines need guardrails that don’t rely on model self-correction alone. Adding prompts increases latency and token cost for every inference call. We should treat contradiction detection as a pre-processing step, not an in-model capability. Relying on “cognitive awakening” is too fragile for high-stakes automation.

The researchers used the SCI dataset to stress-test large models against self-contradictory instructions. The results confirmed that while models can process information, they fail to identify inconsistencies because they lack self-awareness. To bridge this gap, they introduced Cognitive Awakening Prompting (CAP), a method that injects external cognitive capabilities via a single inserted prompt. This approach improves contradiction detection with minimal negative side effects, suggesting that multimodal models urgently need better mechanisms for handling complex instruction conflicts.

Can Large Language Models Detect Contradictions in Prompts? Shanghai Jiao Tong University's Lates… — figure 8

For those digging into the methodology, the full details are available in the original paper.

Author Profile

The first author is Gao Jin, a doctoral student at Shanghai Jiao Tong University focusing on computer vision, multimodal large models, and AI-enabled life sciences.

Can Large Language Models Detect Contradictions in Prompts? Shanghai Jiao Tong University's Lates… — figure 9

Corresponding author Wang Dequan is a Tenure-Track Assistant Professor and doctoral supervisor at Shanghai Jiao Tong University. He holds a bachelor’s from Fudan University and a Ph.D. from UC Berkeley, advised by Professor Trevor Darrell. His work has appeared in CVPR, ICCV, ECCV, ICLR, ICML, ICRA, and IROS, accumulating over 10,000 citations and an H-index of 20 in the last five years.

Paper Link: https://arxiv.org/abs/2408.01091
Project Link: https://selfcontradiction.github.io/

— End —

Signed with this website and Toutiao