The “Data Wrangling Tests” are a collection of data science challenges suitable for testing out new algorithms, ideas, and opinions. They are primarily intended for use by researchers at the Alan Turing Institute, but everyone is welcome to make use of them.
We hope to build a set of challenges that covers the day-to-day problems that beset the typical data scientist, as well as more sophisticated tasks. To that end, we welcome submissions. At the moment, submissions must include data that can be shared without restrictions, though this may change.
The resources posted here need not be new. Many are republications of existing datasets.
This service is currently in beta.
6 Challenges
- 58 consecutive days of de-identified event data collected from five sources within Los Alamos National Laboratory’s corporate, internal computer network.
- The numbers of UK e-petition signatories and their respective postcodes, from October 2012 to March 2015.
- Quasi-global climatology measurements, recorded from 1981 to 2016.
- Several physiological measurements on fifteen patients in neonatal intensive care over 24 hours.
- Several hundred thousand files, each containing a summary of a single web page request to one of many websites. The summary consists of a timestamp and a direction for each packet.
- Outcomes data (risk-adjusted, in-hospital, post-surgery survival rates) for cardiac surgeons operating in the UK.