ASA Sections on:

Statistical Computing
Statistical Graphics

Data expo

Airline on-time performance

Have you ever been stuck in an airport because your flight was delayed or cancelled and wondered if you could have predicted it if you'd had more data? This is your chance to find out.

The data

The data consists of flight arrival and departure details for all commercial flights within the USA, from October 1987 to April 2008. This is a large dataset: there are nearly 120 million records in total, taking up 1.6 gigabytes of space compressed and 12 gigabytes uncompressed. To make sure that you're not overwhelmed by the size of the data, we've provided two brief introductions to some useful tools: Linux command-line tools and SQLite, a simple SQL database.
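If you go the SQLite route, one way to load the data without holding 12 gigabytes in memory is to stream each CSV file into the database row by row. The sketch below uses only Python's built-in csv and sqlite3 modules. It assumes each file has a header row and marks missing values as NA; the table name ontime and the function names are our own illustrative choices, not part of the data release.

```python
import csv
import sqlite3
from typing import Iterable, Iterator, List, Optional


def _clean(rows: Iterator[List[str]]) -> Iterator[List[Optional[str]]]:
    """Turn the (assumed) missing-value markers 'NA' and '' into SQL NULL."""
    for row in rows:
        yield [None if value in ("NA", "") else value for value in row]


def load_csv_into_sqlite(conn: sqlite3.Connection, source: Iterable[str],
                         table: str = "ontime") -> int:
    """Stream CSV lines (header row first) into a SQLite table.

    Rows reach executemany() through a generator, so the file is never held
    in memory all at once. Every column gets NUMERIC affinity: values that
    look like numbers are stored as numbers, everything else stays text.
    Returns the number of rows inserted.
    """
    reader = csv.reader(source)
    header = next(reader)
    columns = ", ".join(f'"{name}" NUMERIC' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    before = conn.total_changes
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', _clean(reader))
    conn.commit()
    return conn.total_changes - before
```

For a real yearly file you might pass `bz2.open("2008.csv.bz2", "rt", newline="")` as the source (the file name is hypothetical) and call the function once per year against the same connection. NUMERIC affinity is used so that comparisons such as `ArrDelay > 15` behave numerically rather than comparing text with integers.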

The challenge

The aim of the data expo is to provide a graphical summary of important features of the data set. This is intentionally vague in order to allow different entries to focus on different aspects of the data, but here are a few ideas to get you started:

  • When is the best time of day/day of week/time of year to fly to minimise delays?
  • Do older planes suffer more delays?
  • How does the number of people flying between different locations change over time?
  • How well does weather predict plane delays?
  • Can you detect cascading failures as delays in one airport create delays in others? Are there critical links in the system?
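As a starting point for the first question, here is a minimal sketch of the average arrival delay by scheduled hour of departure. It assumes columns named CRSDepTime (scheduled departure as hhmm) and ArrDelay (minutes late, NA when missing); check these against the actual file headers before relying on it. Flights with no recorded delay, such as cancellations, are simply skipped, which is itself a choice worth thinking about.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping

MISSING = ("NA", "")


def mean_delay_by_hour(rows: Iterable[Mapping[str, str]]) -> Dict[int, float]:
    """Average arrival delay in minutes, grouped by scheduled departure hour.

    Assumes (hypothetically) a CRSDepTime column holding the scheduled
    departure time as hhmm and an ArrDelay column in minutes, with 'NA' for
    missing values. Rows with either value missing are skipped.
    """
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for row in rows:
        dep, delay = row["CRSDepTime"], row["ArrDelay"]
        if dep in MISSING or delay in MISSING:
            continue  # e.g. flights with no recorded arrival delay
        hour = int(dep) // 100 % 24  # 730 -> 7, 1815 -> 18, 2400 -> 0
        totals[hour] += float(delay)
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(counts)}


if __name__ == "__main__":
    sample = [
        {"CRSDepTime": "730", "ArrDelay": "5"},
        {"CRSDepTime": "745", "ArrDelay": "-3"},
        {"CRSDepTime": "1800", "ArrDelay": "40"},
        {"CRSDepTime": "1815", "ArrDelay": "NA"},
    ]
    print(mean_delay_by_hour(sample))  # {7: 1.0, 18: 40.0}
```

The same function accepts a `csv.DictReader` over a full yearly file, and swapping the grouping key for a day-of-week or month column gives the other two breakdowns.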

You are also welcome to work with interesting subsets: you might want to compare flight patterns before and after 9/11, or between the pair of cities that you fly between most often, or all flights to and from a major airport like Chicago (ORD). Smaller subsets may also help you to match up the data to other interesting datasets.
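Pulling out a subset such as all flights to and from Chicago O'Hare can be done in a single streaming pass over the data. This sketch assumes Year, Origin and Dest columns holding the year and three-letter airport codes; the function name and the default year window (2000 to 2002, bracketing 9/11) are illustrative choices, not anything defined by the data release.

```python
from typing import Iterable, Iterator, Mapping


def airport_subset(rows: Iterable[Mapping[str, str]], airport: str = "ORD",
                   first_year: int = 2000,
                   last_year: int = 2002) -> Iterator[Mapping[str, str]]:
    """Yield flights departing from or arriving at `airport` within a year range.

    Column names Year, Origin and Dest are assumptions about the file layout.
    Rows are processed one at a time, so this works on a full yearly file.
    """
    for row in rows:
        if airport not in (row["Origin"], row["Dest"]):
            continue
        if first_year <= int(row["Year"]) <= last_year:
            yield row
```

Feeding it a `csv.DictReader` and writing the result with `csv.DictWriter` gives a much smaller file that is easier to explore or to join with other datasets.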

Your submission

To enter the competition you need to submit a poster to the data expo session at the 2009 JSM (more details to follow closer to the time). As well as a printed poster, you're also welcome to bring along your laptop to present interactive/animated components. After the JSM, we'll also organise a special journal issue (journal TBA) where you can submit a paper that describes your methodology in more detail.

The prizes

First, second, and third prizes will be awarded to the best posters, as judged by a panel of experts. As well as the honour and glory, the first prize is $1000, the second prize $500, and the third prize $200. The prizes will be presented at the Statistical Graphics and Statistical Computing Sections Mixer at JSM 2009.

Keep in touch

If you have any questions about the data or competition, please sign up to the competition mailing list. This will make sure that your questions can form a useful resource for other people struggling with the same problems. This mailing list will also be used to remind you of deadlines, and highlight changes to the website or data.

© 2009. Email webmaster