Data Science: Free Datasets

Repositories with Free Datasets for Machine Learning

Kaggle: Anyone can freely access many datasets across a wide variety of subjects.  Users simply create a username and password, or log in with a Google or Facebook account.

UC Irvine Machine Learning Repository: Anyone can freely access a wide variety of datasets.  They can also donate datasets.  There is a list of descriptive questions and answers to help you get to know more about a dataset.  There are also links to papers that used the dataset.

Telus International (formerly Lionbridge Datasets ML): Telus International receives data from contributors around the world and creates custom datasets to support machine learning applications and enhance AI systems.

Figshare: This website makes numerous research outputs available in a shareable and discoverable way. You can browse and access datasets at no charge for personal, non-commercial use without registering.  To deposit or publish content, you need to register.

Google Dataset Search: When you type in a search term, Google will look for datasets in many repositories across the Internet related to your search.

CERN Open Data Portal: CERN provides free access to datasets generated from its research.  It also disseminates the accompanying software and documentation needed to analyze the data.  Some services require a fee.

DataHub: DataHub is a site where people can find data, store data, and share data.

Government Datasets for Machine Learning

...When searching for datasets, you may see information about access and use.  Many datasets are public.  Some datasets are located on third-party websites that might require registration and login to access. This site provides free access to open data from international, European Union, national, regional, local, and geo data portals.

U.S. Healthcare Data-Kaggle: Numerous datasets cover different components of healthcare in the U.S., including disease prevalence, pharmaceuticals and drugs, and nutritional data of foods.  The data are collected via surveys from other health agencies.  The datasets can be analyzed to review demographics and disease, drugs and their compositions, nutrition data of foods, and healthcare provider ratings.  You can register with your Google or e-mail account.

NCES: ...Most data are freely accessible, but a restricted data license is required to access Restricted-Use Data Files published in the last six months.

UK Data Service: ...Users must register for their own usernames to access data.

Data USA: ...The datasets are presented with colorful visualizations.  Users may freely download, copy, or print contents for their own use, as long as they provide proper acknowledgement of Data USA as the source.

Finance, Economics and Consumer Sentiment Datasets for Machine Learning

Nasdaq Data Link: ...Users can access data products by creating free accounts.

World Bank Open Data: ...It covers a wide variety of topics across major sectors of development.  The documents and reports are publicly available.

International Monetary Fund Open Data: ...The data reports are freely accessible.

Multi-Domain Sentiment Dataset (Version 2.0): ...It also lists papers that used the dataset.

Large Movie Review Dataset: ...

Twitter U.S. Airline Sentiment: ...The dataset can be accessed by creating a free account with Kaggle, using your Google account or e-mail account.

Climate Change Datasets

Berkeley Climate Change: Earth Surface Temperature Data:  You can freely access the data on Kaggle by logging in with your Google account or e-mail account.  Then, you can form your own opinion about whether climate change is a big threat or a myth.

World Bank (WB) Climate Change Knowledge Portal (CCKP): ...Users can access and analyze data related to climate change and development.  Users can find global, regional, or country-level data and information about climate change.

UN International Greenhouse Emissions: ...The data contain information on emissions by sources and removals by sinks of several greenhouse gases between 1990 and 2017.  Anyone can access the data by creating a username and password for Kaggle.

SGMA Climate Change Resources: This resource includes processed climate change datasets related to climatology, hydrology, and water operations.  The data are provided free of charge and may be copied and distributed as long as UNdata is cited as the reference.