Pushshift is a social media data collection, analysis, and archiving platform that has collected Reddit data since 2015 and made it available to researchers. Its Reddit dataset, presented in a 2020 paper, is updated in real time and includes historical data back to Reddit's inception. In addition to monthly dumps, Pushshift provides computational tools to aid in searching, aggregating, and performing exploratory analysis on the entirety of the dataset. As the paper's TL;DR puts it, the dataset lets social media researchers reduce the time spent in the data collection, cleaning, and storage phases of their projects. It covers posts and comments only; images and videos are not included.

The Pushshift dumps from 2005-06 to 2023-12 are zstandard-compressed ndjson files. They are distributed as a torrent (info hash: 56aa49f9653ba545f48df2e33679f014d2829c10) through a distributed system for sharing enormous datasets, built "for researchers, by researchers", which describes itself as a scalable, secure, and fault-tolerant data repository with fast download speeds. Example Python scripts for parsing the data are available; questions can be posted as replies to the accompanying Reddit post or sent as a DM to u/Watchful on Reddit.

Reddit content is organized into user-created subreddits. Research using the Pushshift and Reddit datasets counts active subreddits/communities in the millions, and a 2022 peer-reviewed study found that the distribution of subreddit sizes is long-tailed, with a median community size of under 1,000 subscribers.
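Because the dumps are zstandard-compressed ndjson (one JSON object per line), a file can be streamed record by record without first decompressing it to disk. The following is a minimal sketch, not one of the official example scripts: it assumes the third-party `zstandard` package is installed, the function names `iter_ndjson` and `iter_zst_ndjson` are hypothetical, and the `max_window_size=2**31` setting is an assumption that the dumps were compressed with a long window.

```python
import io
import json
from typing import BinaryIO, Iterator


def iter_ndjson(stream: BinaryIO, chunk_size: int = 1 << 16) -> Iterator[dict]:
    """Yield one decoded JSON object per line from a binary ndjson stream.

    Reads fixed-size chunks and splits on b"\\n" itself, so it works with
    any object that has read(n), including readers without readline().
    """
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Every element except the last is a complete line; keep the tail.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():  # skip blank lines
                yield json.loads(line)
    # Handle a final record that has no trailing newline.
    if buffer.strip():
        yield json.loads(buffer)


def iter_zst_ndjson(path: str) -> Iterator[dict]:
    """Decompress a .zst ndjson dump on the fly and yield its records.

    Assumes the third-party `zstandard` package; the large max_window_size
    is an assumption about how the dumps were compressed.
    """
    import zstandard  # imported lazily so iter_ndjson works without it

    with open(path, "rb") as fh:
        dctx = zstandard.ZstdDecompressor(max_window_size=2**31)
        with dctx.stream_reader(fh) as reader:
            yield from iter_ndjson(reader)


if __name__ == "__main__":
    # Tiny in-memory sample; a small chunk_size forces lines to span chunks.
    sample = io.BytesIO(b'{"subreddit": "datasets"}\n{"subreddit": "python"}\n')
    for record in iter_ndjson(sample, chunk_size=8):
        print(record["subreddit"])
```

Since `iter_ndjson` only needs an object with a `read(n)` method, it can be exercised on an in-memory `io.BytesIO` sample, as in the `__main__` block. Field names such as `subreddit` follow Reddit's JSON naming, but check which keys are actually present in your dump.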
The pushshift.io Reddit API was designed and created by the /r/datasets mod team to provide enhanced functionality and search capabilities for Reddit comments and submissions, and it became a cornerstone tool for developers who need historical data. This RESTful API offers full search functionality, and its documentation covers querying comments and submissions in depth, with a heavy focus on search and data aggregation. The project lead, /u/stuck_in_the_matrix, maintains the Reddit comment and submission archives located at https://files.

The Pushshift Reddit Dataset is a comprehensive archive of Reddit posts and comments that enables large-scale analysis of online discourse, community behavior, and social trends in the post-API era. It circumvents restrictive API access by aggregating data through alternative scraping methods, addressing sampling biases and data-access bottlenecks. "The front page of the Internet" is now available as billions of posts and comments. Researchers use the dataset to examine social trends, sentiment, and community dynamics, and to explore the history of deleted communities and the evolution of content moderation. The dataset contains only the posts and comments themselves.

Due to its immense popularity, Reddit is geared more towards entertaining fellow users than helping them; witty, sarcastic comments often obtain more votes than serious, informative ones.

Reddit announced plans to restrict access to the Reddit API and cut off access to Pushshift, a data resource used by communities, journalists, and academics worldwide. Moderators and researchers organized an open letter to Reddit's admins, collecting information and signatures. At least one service backed by the dataset, redditsearch.io, no longer returns any newer results.

Historical data torrents are collected in one place, including the 2023-03 data. A further release covers the Pushshift dumps from 2005-06 to 2025-12, again as zstandard-compressed ndjson files (info hash: 3e3f64dee22dc304cdd2546254ca1f8e8ae542b4); example Python parsing scripts and the same contact options are provided for this release as well. The published statistics contain aggregate information from the Pushshift and Arctic Shift datasets: the date of the earliest post and comment, the number of posts and comments, and when that data was last updated.
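Statistics like these (earliest date, record counts) are simple aggregations over the records in a dump. A minimal sketch, assuming each record is a dict with `subreddit` and `created_utc` (Unix seconds) keys as in Reddit's JSON; `SubredditStats`, `summarize`, and `to_date` are hypothetical names, and records missing either key are skipped. This is not how the Pushshift or Arctic Shift statistics are actually computed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional


@dataclass
class SubredditStats:
    """Running aggregates for one subreddit (hypothetical helper type)."""
    count: int = 0
    earliest_utc: Optional[int] = None
    latest_utc: Optional[int] = None


def summarize(records: Iterable[dict]) -> Dict[str, SubredditStats]:
    """Count records and track the first/last created_utc per subreddit."""
    stats: Dict[str, SubredditStats] = {}
    for rec in records:
        sub = rec.get("subreddit")
        ts = rec.get("created_utc")
        if sub is None or ts is None:
            continue  # skip records without the fields we aggregate on
        ts = int(ts)  # coerce in case the timestamp is stored as a string (assumption)
        s = stats.setdefault(sub, SubredditStats())
        s.count += 1
        s.earliest_utc = ts if s.earliest_utc is None else min(s.earliest_utc, ts)
        s.latest_utc = ts if s.latest_utc is None else max(s.latest_utc, ts)
    return stats


def to_date(ts: int) -> str:
    """Format a Unix timestamp as a UTC calendar date (YYYY-MM-DD)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")


if __name__ == "__main__":
    sample = [
        {"subreddit": "datasets", "created_utc": 1600000000},
        {"subreddit": "datasets", "created_utc": "1500000000"},
        {"subreddit": "python", "created_utc": 1700000000},
    ]
    for name, s in sorted(summarize(sample).items()):
        print(name, s.count, to_date(s.earliest_utc), to_date(s.latest_utc))
```

Because `summarize` consumes records one at a time, it can be fed directly from a streaming reader; memory then grows with the number of distinct subreddits rather than with the size of the dump.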