Items the pipeline skipped because they were seen in a previous ingest cycle or on another platform.
Total in window: 5000
What is this? Every time the ingest timer fires, the pipeline re-fetches from your sources. Most sources return the same items between fetches (Reddit hot, RSS feeds, and HN top barely change minute-to-minute). When we recognize an item we've already processed, we skip it, and each skip is counted here.
Two kinds of "already-processed" events:
Re-fetched = same source returned the same item again (expected polling noise; tune schedule_seconds per source to reduce).
Cross-source = same URL surfaced by a different provider (HN + Reddit + RSS all posted the same story — usually interesting).
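To make the two event kinds concrete, here is a minimal sketch of how a sighting could be classified. It is an illustration only, not the pipeline's actual code: the function name classify, the in-memory dict seen, and the choice to key items by URL and compare provider names are all assumptions.

```python
def classify(provider, url, seen):
    """Classify one sighting as 'new', 're-fetched', or 'cross-source'.

    `seen` is a hypothetical in-memory index mapping each URL to the list of
    providers that have already surfaced it, in order of first appearance.
    """
    providers = seen.setdefault(url, [])
    if not providers:
        # First time this URL appears anywhere: process it normally.
        providers.append(provider)
        return "new"
    if provider in providers:
        # The same provider returned the same item again (polling noise).
        return "re-fetched"
    # A different provider surfaced a URL we already know about.
    providers.append(provider)
    return "cross-source"


seen = {}
url = "https://example.com/story"
print(classify("hackernews", url, seen))  # new
print(classify("hackernews", url, seen))  # re-fetched
print(classify("reddit", url, seen))      # cross-source
```

Only the last two outcomes would count toward the totals on this page; a "new" item is processed rather than skipped.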
5000
Already-processed (window)
By provider
Top sources
Age when recognized
Cross-source: same URL across providers
Counts the "origin → duplicate provider" direction. For example,
hackernews → reddit means the URL was first surfaced by
HN and later appeared on Reddit.
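The direction count can be sketched as follows. This is an assumption-laden illustration, not the dashboard's real query: the function name count_directions and the (provider, url) event tuples are hypothetical, and the sketch assumes each duplicate provider is counted once per URL.

```python
from collections import Counter


def count_directions(events):
    """Count (origin, duplicate provider) pairs.

    `events` is an iterable of (provider, url) tuples in ingest order.
    The origin of a URL is the first provider that surfaced it.
    """
    origin = {}       # url -> provider that surfaced it first
    counted = set()   # (url, provider) pairs already tallied
    pairs = Counter()
    for provider, url in events:
        first = origin.setdefault(url, provider)
        if provider != first and (url, provider) not in counted:
            counted.add((url, provider))
            pairs[(first, provider)] += 1
    return pairs


events = [
    ("hackernews", "https://example.com/a"),
    ("reddit", "https://example.com/a"),   # hackernews → reddit
    ("reddit", "https://example.com/a"),   # repeat from reddit, not counted again
    ("reddit", "https://example.com/b"),
    ("hackernews", "https://example.com/b"),  # reddit → hackernews
]
for (src, dup), n in count_directions(events).items():
    print(f"{src} → {dup}: {n}")
```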