Google recently released Dataset Search, a free tool for searching 25 million publicly available datasets. The tool includes filters that limit results by license (free or paid), format (CSV, images, etc.), and time of last update. Each result also includes a description of the dataset’s contents and author citations. Google’s approach to aggregating datasets differs from that of other repositories, such as Amazon’s open data registry. Unlike repositories that curate and host the datasets themselves, Google neither curates the 25 million datasets nor provides direct access to them.

Instead, Google relies on dataset publishers to describe their datasets’ metadata using the open standards of schema.org. Google then indexes that metadata and makes it searchable across publishers. Because publishers still host the datasets themselves, for-profit publishers that conform to the standards also have their datasets indexed by Google. Anecdotally, about half of the datasets in my search results came from for-profit aggregators, and the share was even higher when I searched for market-related datasets.
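
To make the publisher side of this concrete, here is a minimal sketch of the kind of metadata a publisher might embed in a dataset's landing page so that a crawler can index it. It assumes the schema.org `Dataset` type expressed as JSON-LD; the helper name `build_dataset_jsonld` and every value passed to it (dataset name, URLs, license) are hypothetical placeholders, not taken from any real publisher. Which fields Google actually requires or recommends should be checked against its own documentation.

```python
import json


def build_dataset_jsonld(name, description, url, license_url,
                         download_url, encoding_format):
    """Return a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration only; the chosen fields are an
    assumption about what a publisher might want to expose.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        # The publisher keeps hosting the file; the metadata only points to it.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": encoding_format,
                "contentUrl": download_url,
            }
        ],
    }
    return json.dumps(record, indent=2)


# Placeholder values; example.org is a reserved documentation domain.
markup = build_dataset_jsonld(
    name="Example city bike trips",
    description="Hypothetical trip records used to illustrate dataset markup.",
    url="https://example.org/datasets/bike-trips",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    download_url="https://example.org/datasets/bike-trips.csv",
    encoding_format="text/csv",
)

# A publisher would typically place this string inside a
# <script type="application/ld+json"> element on the landing page.
print(markup)
```

The point of the sketch is that the record carries only a description and a pointer to the file. The data itself stays with the publisher, which is why an index built from such records can span free and paid sources alike.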
