3 Approach 3. Pew Internet: Pew Research Center is a nonpartisan fact tank that aggregates a wide variety of data sources. Reviews include product and user information, ratings, and a plaintext review. The Dataset. I am unable to locate a good dataset. Google acquires Kaggle in boost to data play: technology giant Google has announced the acquisition of Kaggle, a start-up that hosts a large community of data scientists, for an undisclosed amount at the Cloud Next 2017 conference. Drawing sensible conclusions from learning experiments requires that the result be independent of how the complete set of samples is split into training and test sets. The challenge consisted of labeling, as accurately as pos-. You also have the opportunity to create new features to improve your results. Besides being a competition platform for data science, Kaggle is also a platform for exploring datasets and creating kernels that uncover insights in the data. Google BigQuery Data with Kaggle Kernels Notebook. A new class of platform, database, messaging, and app services has emerged to enable the rapid delivery of cloud-native apps. This accounts for users with multiple accounts or plagiarized reviews. I would like to put this together as a "Kaggle: Introduction" series. Before taking on Kaggle, why not deepen your understanding of basic machine learning terminology, elementary techniques, and the underlying mathematics? Sci-Tech: Google buys Kaggle and its gaggle of AI geeks. The Open Data Network by Socrata offers a vast collection of datasets, neatly categorized by topic on its page. Kaggle: Amazon from Space, tricks and hacks for training neural networks. Last summer, the Kaggle competition devoted to classifying satellite images of the Amazon forests came to an end. The challenge of the competition was to examine pairs of paintings and determine whether they were painted by the same artist. A large number of Wikipedia comments are provided, each labeled by human raters for toxic behavior.
setwd("C:\\Users\\hi\\Documents") dataset <- read. But in the online context, reviews to be identified usually have more potential authors, and normally classification algorithms. The problem has only one predictor variable, 'comment_text', which is to be labeled or classified with respect to six target variables. This post was inspired by Louis Dorard's article. We work with data providers who seek to democratize access to data by making it available for analysis on AWS. Kindle edition by Manav Sehgal. I have downloaded the dataset… JMP Public featured datasets; Kaggle Datasets. The Boston Housing Dataset: a dataset derived from information collected by the U. Explore popular topics like government, sports, medicine, fintech, food, and more. data.world: learn how to easily pull data directly into Tableau using data.world. 8 million Amazon review dataset available to download here. In this project, our aim is to contextualize customer data and predict the likelihood that a user will stay at each of 100 different hotel groups. The Korean Question Answering Dataset; Dataset Finders. Last but not least, we need to create a cursor object (line 9) to interact with the Postgres database and execute commands on it. 5 billion clicks dataset available for benchmarking and testing; over 5,000,000 financial, economic, and social datasets. Online shopping is all over the internet. Get started with Amazon SageMaker. Consumer Reviews of Amazon Products: dataset by Datafiniti. gated is the Kaggle dataset from Planet Labs: "Planet: Understanding the Amazon from Space". Kaggle has it all. If you're not familiar with it, BigQuery makes it very easy to query terabytes of data in seconds. between main product categories in an e-commerce dataset. What dataset did you analyze? The data used in this project is found in Kaggle's Amazon Employee Access Challenge.
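The text mentions creating a cursor object to execute commands on a Postgres database, but the original code is not shown. As a minimal sketch, the snippet below uses Python's built-in sqlite3 module as a stand-in for a Postgres driver (an assumption made so the example runs without a database server). Both follow the Python DB-API pattern of connect, then cursor, then execute; the function name, table name, and columns are hypothetical.

```python
import sqlite3


def load_reviews(rows):
    """Insert (product_id, score, review_text) rows through a cursor and count them.

    sqlite3 stands in for a Postgres driver here; both follow the Python
    DB-API, so connect() -> cursor() -> execute() looks the same.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        cur = conn.cursor()  # the cursor object that runs SQL commands
        cur.execute(
            "CREATE TABLE reviews (product_id TEXT, score INTEGER, review_text TEXT)"
        )
        # Parameterized insert: sqlite3 uses '?' placeholders (psycopg2 uses '%s').
        cur.executemany("INSERT INTO reviews VALUES (?, ?, ?)", rows)
        conn.commit()
        cur.execute("SELECT COUNT(*) FROM reviews")
        (count,) = cur.fetchone()
        return count
    finally:
        conn.close()


print(load_reviews([("B001", 5, "Great taste"), ("B002", 1, "Arrived stale")]))  # 2
```

With a real Postgres connection the same steps apply, but the connection call and placeholder style change to match the driver.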
In this video, we will see how to implement diabetes prediction using machine learning. Kaggle Competition. The datasets below contain reviews from Rotten Tomatoes, Amazon, TripAdvisor, Yelp, and Edmunds. The challenge has two tracks: 1. Here are 101 data science interview questions with responses and suggestions from large tech companies like Amazon, Google, and Microsoft. Most Kaggle competitions are focused on model fitting: participants are given a well-defined problem, a dataset, and a measure to optimise, and they compete to produce the most accurate model. data.world helps us bring the power of data to journalists at all technical skill levels and foster data journalism at resource-strapped newsrooms large and small. In Kaggle competitions, overspecialisation (without overfitting) is a good thing. The images in this dataset cover large pose variations and background clutter. The service offers a simple workflow but lacks model selection features and has slow execution times. ProductId: unique identifier for the product. This dataset is a manual annotation of a subset of RCV1 (Reuters Corpus Volume 1). Kaggle competitions vs. the real world. Exercise: apply GBDT and RF to the Amazon reviews dataset. In the CSE-CIC-IDS2018 dataset, we use the notion of profiles to generate datasets in a systematic manner; these contain detailed descriptions of intrusions and abstract distribution models for applications, protocols, or lower-level network entities. The system analyzes sentiments, opinions, and emotions, extracts sentiment targets (entities, topics, and their aspects/features), and handles comparative and conditional sentences. To download the MNIST dataset, copy and paste the following code into the notebook and run it: The code does the following: Downloads the MNIST dataset (mnist.
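The download code referred to above is not reproduced here. As a hedged sketch of the step that usually follows, the function below loads a gzip-compressed pickle, assuming it has the layout of the classic mnist.pkl.gz file: a (train_set, valid_set, test_set) tuple in which each set is an (images, labels) pair. The function name load_mnist_pickle is hypothetical.

```python
import gzip
import pickle


def load_mnist_pickle(path):
    """Load a gzip-compressed pickle laid out like the classic mnist.pkl.gz.

    Assumed layout: (train_set, valid_set, test_set), where each set is an
    (images, labels) pair.
    """
    with gzip.open(path, "rb") as f:
        # encoding="latin1" lets Python 3 read pickles written by Python 2,
        # which is how the original file is commonly reported to be stored.
        train_set, valid_set, test_set = pickle.load(f, encoding="latin1")
    return train_set, valid_set, test_set
```

If the real file holds NumPy arrays, NumPy must be installed for unpickling to succeed; the loader itself does not import it.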
Booz Allen Hamilton & Kaggle Release: Companies Convene Data Scientists, Medical Community to Improve Cancer Screening Using Artificial Intelligence Through $1 Million Competition. Amazon will give you some datasets and analysis, so you can see what's possible. The Rawah and Comanche Peak areas would tend to be more typical of the overall dataset than either the Neota or Cache la Poudre, due to their assortment of tree species and range of predictive variable values (elevation, etc.). Kaggle: a data science community that regularly shares datasets on the most varied topics and categories, including the complete FIFA 19 player dataset, wine reviews, and chest X-ray images. Amazon Customer Reviews (a. You can also analyze the data in the cloud using EC2 and Hadoop via EMR. Model stacking: H2O. 10,177 identities. Open Data Sets: the datasets used in the lecture "Special Topics in Data Science" by Hiroshi Hashimoto (Creative Technology program, Advanced Institute of Industrial Technology) and in his book "Data Science Textbook" (see the errata in the left column), listed by category as follows. Data Set Information: the dataset is derived from customers' reviews on the Amazon Commerce website and is used for authorship identification. We ran the Kaggle Red Wine Quality dataset untouched through the Amazon machine learning regression algorithm. Other Amazon Product Review datasets. In the more than two decades since the first review in 1995, millions of Amazon customers have contributed over a hundred million reviews to express opinions and describe their experiences regarding products on Amazon. Kaggle competition solutions. Here, you'll find a grab bag of topics.
BigML.com's datasets gallery is the best place to explore, sell, and buy datasets at BigML. PyTorch CNN Finetune suite for the Kaggle competition Planet: Understanding the Amazon from Space. One obvious limitation is inherent in the kNN implementation of several R packages. Finally, submit the predictions for the test data to Kaggle. Hello all, in today's tutorial we will apply five different machine learning algorithms to predict house sale prices using the Ames Housing data. Once we are connected to our database, it's time to add some data. Stanford Large Network Dataset Collection. Google Gearing Up Against Microsoft and Amazon. Kaggle Dataset. 1 Data preprocessing: the Amazon Food Review dataset has 568,454 samples. Below are links to collections of datasets that may be of use for homework assignments or projects. KONECT, the Koblenz Network Collection, offers large network datasets of all types for research in the area of network mining. The dataset consists of over 150K 256 × 256 image tiles, each labelled with at least one of 17 classes. 2 million Amazon reviews of products in the Electronics section, I found some interesting statistical trends; some are intuitive and obvious, but others give insight into how Amazon's review system actually works. The AWS Public Dataset Program covers the cost of storage for publicly available high-value cloud-optimized datasets. The goal is to provide not just one recommendation but to rank the predictions and return the top five most likely hotel clusters for each user. kaggle datasets list: you can also search for datasets by adding the -s flag followed by the search term you're interested in. The SFPD Incidents dataset includes crime incidents in San Francisco from 1/1/2003 to 1/17/2017 (at the time of analysis).
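Returning the top five hotel clusters for each user amounts to ranking a model's scores and keeping the best k. A minimal sketch, assuming a hypothetical dictionary of per-cluster predicted probabilities for one user (the model that produces them is not shown):

```python
import heapq


def top_k_clusters(scores, k=5):
    """Return up to k cluster ids with the highest scores, best first.

    `scores` is a hypothetical {cluster_id: predicted_probability} mapping
    for a single user; ties are broken by the smaller cluster id.
    """
    # Keying on (-score, id) makes nsmallest return the highest scores first.
    best = heapq.nsmallest(k, scores.items(), key=lambda item: (-item[1], item[0]))
    return [cluster_id for cluster_id, _ in best]


user_scores = {0: 0.05, 1: 0.30, 2: 0.10, 3: 0.25, 4: 0.02, 5: 0.20, 6: 0.08}
print(top_k_clusters(user_scores))  # [1, 3, 5, 2, 6]
```

Using heapq.nsmallest avoids fully sorting every cluster when only five are needed, which matters little for 100 clusters but scales better.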
50 free datasets for data science projects: here are the top 50 websites for gathering datasets to use in your data science projects in R, Python, SAS, Excel, or other programming languages or statistical software. The Functional Map of the World (fMoW) Challenge seeks to foster breakthroughs in the automated analysis of overhead imagery by harnessing the collective power of the global data science and machine learning communities. Customer Support on Twitter: this dataset on Kaggle includes over 3 million tweets and replies from the biggest brands on Twitter. Lots of years. Common Crawl: a massive dataset of billions of pages scraped from the web. There's an interesting target column to make predictions for. 13 million reviews) Finally, the following file removes duplicates more aggressively, removing duplicates even if they are written by different users. gz) from the deeplearning. as well as the Kaggle. The test dataset created by the above process will contain 75% of the observations, selected at random. Abstract: This paper documents our team's approach to the Kaggle competition Understanding the Amazon from Space. I am looking for some large public datasets, in particular large samples of anonymized web server logs. Where can I find good datasets for text summarization? Further Reading. My second experience: Kaggle. Let's explore how Amazon Machine Learning performs with a multiclass classification dataset. As well as charging companies they work with (including Amazon, Facebook, Microsoft, and Wikipedia) up to $300 per hour for consultancy work, the company organises competitions, which is where the gamification comes in. Kaggle now offers the chance to connect customers with successful data scientists to identify and solve tricky problems. gov/ https://github.
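A random split like the one described above, where a fixed share of observations goes to one set and the remainder to the other, can be sketched with the standard library. The function name and the seed parameter are illustrative assumptions; the fraction is left as a parameter so the 75% figure quoted in the text can be passed in directly.

```python
import random


def random_split(rows, test_fraction, seed=0):
    """Shuffle rows and partition them into (train, test) without overlap.

    test_fraction is the share of observations placed in the test set; the
    remaining rows form the training set.
    """
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    return train, test


train, test = random_split(range(100), test_fraction=0.75)
print(len(train), len(test))  # 25 75
```

Seeding a private random.Random instance keeps the split reproducible without touching the global random state.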
Each dataset is a small community where you can have a discussion about data, find some public. Remember, to import CSV files into Tableau, select the "Text File" option (not Excel). Speaker: Zhenhao is an application analyst at DHL Express. What is the Secret of Academic Success? I am planning to create an analytics platform for a retail store for my academic coursework. com/caesar0301. You are typically given a cleaned dataset, which makes it hard to demonstrate the full data science skill set, from data munging through analysis and model building to results and conclusions. 1 Dataset. Exploring the Amazon Fine Food Reviews dataset from Kaggle: Kushagra8888/amazon-dataset-exploration. The data span a period of more than 10 years, including approximately 500k customer reviews up to October 2012. Moreover, some content-based information is given (`Book-Title`, `Book-Author`, `Year-Of-Publication`, `Publisher`), obtained from Amazon Web Services. We will simplify the dataset and only consider the user training data, which is composed of features such as gender, age, affiliate, browser, date of registration, etc. Given so much of a data scientist's time is actually spent extracting, cleaning,. It is a great alternative to the popular but older Boston Housing dataset. Organized by the Kaggle platform for data science competitions, the challenge was to track the human footprint in the Amazon rainforest by distinguishing.
Large datasets, mostly from finance and economics, that could also be applicable in related fields studying the human condition: World Bank Data. Books are identified by their respective ISBN. See our updated (2018) version of the Amazon data here. New: Repository of Recommender Systems Datasets. CRITEO LABS DATA TERMS OF USE. showed that this was a challenging dataset to analyze. And a weight should be associated with each such relationship. Analyzing the dataset of 1. Amazon SageMaker is a fully managed service for machine learning. Kaggle is an open community where top data scientists can solve complex business problems and learn the latest techniques. Amazon: another big name with an equally impressive reputation. The data might be weird, and you might experience. Teacher Jeremy Howard uses the Understanding the Amazon from Space Kaggle competition for teaching purposes and sets homework to try other similar image classification competitions. For this example, we look at. Performance-wise, Amazon's Machine Learning (AML) clearly produced a better result than my best model, which scored better than half of the accepted submissions on Kaggle.
The Amazon datasets contain data collected from different fields, such as public transport, ecological resources, and satellite images, and are stored in Amazon Web Services (AWS). Join us to compete, collaborate, learn, and do your data science work. A few years back, I published 10+ apps/games for Windows Phone, Android, and iOS with 300K+ customers, featured by Microsoft in 150+ countries. It has been used for sentiment analysis and product feature extraction. It is the bond pricing dataset from Kaggle. These datasets should appeal to you whether you are a newbie or a pro. Introducing the Ames Housing dataset. It offers a wide range of real-world data science problems to challenge every data scientist in the world. In addition, we use datasets from Kaggle competitions, because the public leaderboards on Kaggle allow students to test their models against the best in the world (the Kaggle datasets are not listed here). In trying to learn more about this problem, I searched far and wide and cataloged just a sliver of the datasets I found. Yaroslav Bulatov said: train on the whole "dirty" dataset, evaluate on the whole "clean" dataset. Filtering the dataset to only the incidents that resulted in arrests (since most incidents are trivial) leaves a dataset of 634,299 arrests in total. This lets developers focus on training their models (the grey part in the following diagram). This dataset consists of reviews of fine foods from Amazon. K-Fold Cross Validation: Random Forest vs GBM, from Wallace Campbell on Vimeo. The available datasets are as follows: Dataset Gallery: Media, Marketing & Advertising | BigML.
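K-fold cross-validation, named above in the Random Forest vs GBM comparison, addresses the point that a result should not hinge on one particular choice of training and test set: every observation is used for testing exactly once. A minimal index-generating sketch (the function name and seed are hypothetical; model fitting is left out):

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Indices 0..n-1 are shuffled once and cut into k folds whose sizes differ
    by at most one; each fold is the test set exactly once.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds receive one extra index.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


for train_idx, test_idx in k_fold_indices(10, k=3):
    print(len(train_idx), len(test_idx))
```

In practice a model is fit on each train_idx, scored on the matching test_idx, and the k scores are averaged.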
If you decide to build a model like. I followed this link: Using Kaggle datasets into Google Colab. More than 800,000 data experts use Kaggle to explore, analyse and understand the latest. They range from the vast (looking at you, Kaggle) to the highly specific, such as financial news or Amazon product datasets. Overall, this represents over 50 GB of data, far more than the RAM I have on my computer (more on that later). Kaggle Display Advertising Challenge Dataset. • Kaggle: This dataset takes up around 57 MB of disk space and contains 13,000 rows and 20 columns of data. Success in Kaggle is a combination of many things, like machine learning experience, the type of competition, and your ability to work in a team. Academic Torrents. CSV, HTML and flat files), including establishing fully automated processes from data extraction to report generation. In general, the Kaggle community is extremely creative, and tough competition produces highly non-trivial solutions. learning (ResNet) on a labeled dataset. What Kaggle taught us about predictive analytics. Amazon review sentiment analysis using TextBlob. The average review length is close to 230 characters. The bin images in this dataset are captured as robot units carry pods as part of normal Amazon Fulfillment Center operations. Open Data Network.
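The average review length quoted above is a simple mean of character counts. A small sketch, assuming the reviews are already loaded as strings (how they are read from the dataset is not shown here, and the sample reviews are made up):

```python
def average_length(reviews):
    """Mean number of characters per review (0.0 for an empty collection)."""
    reviews = list(reviews)
    if not reviews:
        return 0.0
    return sum(len(text) for text in reviews) / len(reviews)


sample = ["Great coffee, would buy again.", "Too sweet."]
print(average_length(sample))  # 20.0
```

Counting characters rather than words matches the unit used in the text; whitespace and punctuation are included.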
Feel free to list: competition datasets; data journalism examples; tutorial datasets from different analytics tools. * Small data is data that is small enough for human comprehension. The obtained results are compared with other predictive APIs from Amazon, Google, PredicSis, and BigML. The code is provided in the Amazon Fine Food Reviews. Five interesting datasets recommended from the Kaggle Datasets treasure trove. The MNIST Dataset of Handwritten Digits: in the machine learning community, common datasets have emerged. These datasets are freely hosted and accessible to everyone. Instructors of statistics and machine learning programs use movie data instead of drier, more esoteric datasets to explain key concepts. Product Reviews) is one of Amazon's iconic products. So, we're aggressively grabbing market share. All our needs are just a click away. Synopsis: Zhenhao will share his learning journey in machine learning with the Amazon Employee Access Challenge dataset on Kaggle. In this dataset, about 40% of all users have not made any bookings. The challenge will publish one of the largest publicly available satellite-image datasets to date, with more than one million. Each tile covers a ground-sample distance of 3. One should try a few beginner problems before getting into the advanced ones.
The training dataset will contain the remaining 25% of the observations (in the original training set), which are excluded from the newly created test dataset. The dataset was relatively small for an image classification problem, so we were always worried about overfitting. 1 INTRODUCTION. Data Set Information: This is a transnational dataset that contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. edu Steven Qian scqian@stanford. The task associated with the data is to predict how many comments the post will receive. Download clean datasets from Kaggle. How to import datasets into Google Colab directly from a Kaggle competition. • Each image has more than three channels (not just RGB); the channels are called bands. Each example includes the type and name of the product, as well as the review text and the product rating. Text Datasets Used in Research on Wikipedia; Datasets: What are the major text corpora used by computational linguists and natural language processing researchers? If you have any questions regarding the challenge, feel free to contact dataset@yelp. First, Amazon SageMaker. Machine Learning and the Kaggle Digit Recognizer Competition: this is an introductory article on machine learning and on using one of the freely available libraries to predict the value of some entity with a classification approach. 52268 reviews have a score of 1, 29769 reviews have a score of 2, 42640 reviews have a score of 3, 80655 reviews have a score of 4, and 363122 reviews have a score of 5. While working on the capstone for my coding bootcamp, I found a number of cool datasets that I thought I should share.
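As a quick sanity check on the score distribution quoted above, the five counts can be summed and turned into shares. The counts sum to 568,454, which agrees with the sample count given for the Amazon Food Review dataset, and a little under 64% of all reviews carry a score of 5.

```python
# Review counts per score, as quoted above for the Amazon Fine Food Reviews data.
score_counts = {1: 52268, 2: 29769, 3: 42640, 4: 80655, 5: 363122}

total = sum(score_counts.values())
shares = {score: count / total for score, count in score_counts.items()}

print(total)  # 568454
for score in sorted(shares):
    print(f"score {score}: {shares[score]:.1%}")
```

The heavy skew toward 5-star reviews is one reason class balancing is often applied before training on this data.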
Most of the datasets are highly unbalanced, so we balance them to have an equal number of examples in both classes. With more than 0. Google F1 Server Reading Summary; TensorFlow Implementation of "A Neural Algorithm of Artistic Style". The dataset included a training set (40,000 labeled images) and a test set of unlabeled images whose predictions are submitted and scored on the competition website. For instance, a Kaggle Kernel is source code that analyzes datasets, and developers can then share that code on the platform. It will already be there. In the hope that others might find this catalog useful, here are 20 weird and wonderful datasets you could (perhaps) use in machine learning. My first Kaggle challenge: the Avazu CTR contest, Part 2. Introduction. Kaggle is a platform for predictive modelling and analytics competitions in which statisticians and data miners compete to produce the best models for predicting and describing the datasets uploaded by companies and users. These datasets are used for machine-learning research and have been cited in peer-reviewed academic journals. When I decided to work on sentiment analysis, the Amazon Fine Food Reviews Kaggle project was quite interesting, as it gives a good introduction to text analysis. Available at Amazon product reviews dataset. 10 R Packages to Win Kaggle Competitions by Xavier Conort. Kaggle competitions differ from "classic" data science in some important ways, but if you approach them with the right mindset, you can still gain valuable experience. Let us explain: Kaggle competitions.
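Balancing classes to equal size, as described above, is commonly done by random undersampling: each class is cut down to the size of the smallest one. The text does not say which balancing method was used, so the following is only a sketch of that one technique; label_of is a hypothetical callback that extracts a row's class.

```python
import random
from collections import defaultdict


def undersample(rows, label_of, seed=0):
    """Randomly drop rows from larger classes until every class has the same size.

    label_of is a hypothetical callback returning a row's class label.
    """
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    if not by_class:
        return []
    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_class):  # deterministic order for a fixed seed
        balanced.extend(rng.sample(by_class[label], n_min))
    rng.shuffle(balanced)  # avoid long runs of identical labels
    return balanced


rows = [(i, "pos" if i < 10 else "neg") for i in range(100)]
balanced = undersample(rows, label_of=lambda row: row[1])
print(len(balanced))  # 20
```

Undersampling discards data from the majority class; oversampling the minority class is the usual alternative when data is scarce.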
Building a gold standard corpus is seriously hard work. You can try scraping prices from one or more e-commerce websites, such as Amazon. These are, of course, just a few examples I could come up with; one can come up with even more interesting ones. Kaggle is the world's largest data science community. This is Zillow's estimate of the value of a home. Not all of us can afford (or fit) a supercomputer in our bedroom. The Big Mac index data (by The Economist). Stay tuned for more challenges. Tagged datasets for named entity recognition tasks. So I am taking this dataset from one of my favorite books, Programming Collective Intelligence, written by Toby Segaran. Evaluating linear regression: Amazon ML uses the standard root mean squared error (RMSE) metric for linear regression. Well, we've done that for you right here.
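For reference, RMSE is the square root of the mean of the squared differences between true values and predictions. The plain-Python version below is a minimal sketch of the metric itself, not Amazon ML's own implementation.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat) ** 2))."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


print(rmse([1, 2, 3, 4], [3, 4, 5, 6]))  # 2.0
```

Every prediction in the example is off by exactly 2, so the RMSE is 2.0; because errors are squared before averaging, a few large misses raise RMSE more than many small ones.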
In today's blog post, I interview David Austin, who, with his teammate Weimin Wang, took home 1st place (and $25,000) in Kaggle's Iceberg Classifier Challenge. Can you review the code and tell me why there is such a big difference between cross-validation accuracy and test accuracy? Conceptually, is there anything wrong with the code below? We ran the Kaggle Red Wine Quality dataset through the Amazon machine learning regression algorithms in the last post. Amazon's or Overstock. It includes product and user information and ratings. Here's the Kaggle catch: these competitions not only make you think outside the box but also offer handsome prize money. 5m for a Kaggle competition,. This helps determine which model algorithms and strategies may work best on the dataset. Please feel free to add any I may have missed. Amazon Web Services (AWS) datasets: Amazon provides a few big datasets, which can be used on their platform or on your local computers. If you use this data, please cite (Jindal and Liu, WSDM-2008). Note that these data are distributed as. com website. This dataset is part of an ongoing Kaggle competition which challenges you to predict the final price of each home.
Note: this dataset contains potential duplicates, due to products whose reviews Amazon. of the data set we would have seen more actual innovation. The datasets are meant to be used strictly for the purposes of the class project and nothing else. This dataset has 34,660 data points in total. The Amazon Bin Image Dataset contains over 500,000 images and metadata from bins of a pod in an operating Amazon Fulfillment Center. Google Cloud Public Datasets provide a playground for those new to big data and data analysis, and offer a powerful data repository of more than 100 public datasets from different industries, allowing you to join these with your own to produce new insights. From the dataset website: "Million continuous ratings (-10. Public: This dataset is intended for public access and use. An interactive deep learning book with code, math, and discussions, based on the NumPy interface. The contents are under revision.
Kaggle datasets: 13,321 themed datasets on the "Facebook for data people". Kaggle, a place to go for data scientists who want to refine their knowledge and maybe participate in machine learning competitions, also has a dataset collection.