Find Duplicate in Array —
Four approaches. One optimal.
Finding a duplicate in an array has multiple solutions — hash set O(n) space, sorting O(n log n), and Floyd's cycle detection O(1) space. Each approach reveals a different way of thinking about the problem. Watch each one animate on LearnBug and you'll understand not just the answer but why each approach works.
What is the problem?
Given an array of n+1 integers where each value is between 1 and n inclusive, find the one duplicate. By the pigeonhole principle, at least one value must repeat. The challenge is finding it efficiently — without modifying the array and without using O(n) extra space for the hardest variant.
Why visualization changes everything
Floyd's cycle detection is notoriously hard to understand from code alone. On LearnBug you watch the tortoise and hare move through the array as if it were a linked list — slow pointer advancing one step, fast advancing two — until they meet inside the cycle. The cycle entry point reveals the duplicate.
From brute force to O(1) space
Brute Force — O(n²) time
Check every pair with nested loops. For each element, scan the rest of the array. Returns the duplicate but is too slow for large inputs.
Sort First — O(n log n)
Sort the array, then scan for adjacent equal elements. Simple but modifies the array and costs O(n log n). Not allowed if array is immutable.
Hash Set — O(n) time, O(n) space
Add each element to a set. If it's already there, that's the duplicate. O(n) time and space — fast but uses extra memory.
Floyd's Cycle — O(n) time, O(1) space ✓
Treat array values as linked list pointers. Slow and fast pointers detect a cycle, then find its entry — the duplicate. Optimal: no extra space, no modification.
Floyd's cycle detection — step by step
Array: [1, 3, 4, 2, 2]. Values are used as next-index pointers. Index 0 → value 1 → index 1 → value 3 → index 3 → value 2 → index 2 → value 4 → index 4 → value 2 → index 2 (cycle!)
Phase 1 — Find intersection point
Start both at index 0. slow = nums[slow] (one step), fast = nums[nums[fast]] (two steps). Repeat until they meet.
slow: 0→1→3→2→4→2... | fast: 0→3→4→2→4→2... → meet at index 2
Phase 2 — Find cycle entry (the duplicate)
Reset slow to index 0. Keep fast at intersection. Now both move one step at a time. Where they meet is the cycle entry = the duplicate value.
slow: 0→1→3→2 | fast: 2→4→2 → meet at index 2, value = 2 ✓
Hash set and Floyd's in code
def find_duplicate_set(nums):
seen = set()
for num in nums:
if num in seen:
return num
seen.add(num)
return -1def find_duplicate_floyd(nums):
# Phase 1: find intersection inside cycle
slow = fast = nums[0]
while True:
slow = nums[slow]
fast = nums[nums[fast]]
if slow == fast:
break
# Phase 2: find cycle entry = duplicate
slow = nums[0]
while slow != fast:
slow = nums[slow]
fast = nums[fast]
return slowApproach Comparison
Frequently asked questions
Why does Floyd's algorithm work on this problem?
Because values are in range 1 to n, each value can be used as an index into the array — just like a "next pointer" in a linked list. The duplicate value means two different indices point to the same next index, creating a cycle. Floyd's cycle detection finds where that cycle begins, which is exactly the duplicate value.
What if the array has multiple duplicates?
The classic formulation (LeetCode 287) guarantees exactly one duplicate. For multiple duplicates, Floyd's doesn't directly apply. Use a hash set to collect all seen values and return a list of those seen more than once, or sort and check adjacent pairs.
Can I solve this with XOR?
XOR works for finding a missing number (XOR all indices with all values), but not for finding a duplicate when values are 1 to n and one is duplicated. XOR cancels pairs perfectly but a duplicate creates an odd count, not a missing value — the XOR approach doesn't isolate which value is duplicated.
What is the LeetCode problem for this?
LeetCode 287 — Find the Duplicate Number. Constraints: array of n+1 integers, values 1 to n, exactly one duplicate which may appear more than twice. Must not modify the array and must use only O(1) extra space — that forces Floyd's. Also related: LC 442 (Find All Duplicates), LC 448 (Find All Missing Numbers).
Is the hash set approach acceptable in interviews?
Yes — always start with the hash set approach. It's O(n) time, easy to code correctly, and shows you understand the problem. Then offer Floyd's as the optimal follow-up if the interviewer asks for O(1) space. Interviewers appreciate seeing both approaches and understanding the trade-offs.
Watch slow and fast pointers find the cycle on your array
Paste your duplicate array into LearnBug and see Floyd's algorithm detect the cycle step by step.