Episode 38: (Data Bites) Predicting Flight Delays Could Save you So Much Airport Misery

Who wouldn’t want to know three days ahead of time whether their flight was delayed or even canceled?

Today’s episode covers the many variables involved in predicting whether your flight will be delayed. At the moment these algorithms are only about 80% accurate, but, as they get more and more refined, we may never have to sit around airports waiting for our delayed flight to take off!

I know I’m excited about this potential!

To keep up with the podcast, be sure to visit our website at datacouture.org, and follow us on Twitter @datacouturepod and on Instagram @datacouturepodcast. And, if you’d like to help support future episodes, then consider becoming a patron at patreon.com/datacouture!

Music for the show: Foolish Game / God Don’t Work On Commission by spinmeister (c) copyright 2014 Licensed under a Creative Commons Attribution (3.0) license. http://dig.ccmixter.org/files/spinmeister/46822 Ft: Snowflake


Welcome to Data Couture, the podcast about data culture at work, at home, and on the go. I’m your host, Jordan Bohall. To stay up to date with everything Data Couture, be sure to like and subscribe down below. Furthermore, be sure to follow us around the internet at @datacouturepod on Twitter, @datacouturepodcast on Instagram, and @datacouturepod on Facebook. Of course, if you’d like to help keep the show going, then consider becoming a patron at patreon.com/datacouture. Now, on to the show.

Welcome to Data Couture. I’m your host, Jordan. On today’s Data Bites, we’re going to be talking about flight delay predictions, following up on Monday’s and Wednesday’s episodes. In my case, predictions weren’t particularly needed: these flights have been very on time, very regular, and very comfortable. Nevertheless, in the majority of cases, being able to predict whether a flight is delayed or canceled outright would be huge for most travelers. Before we get into the show, I’d like to thank BP3 Global as well as Lance Gibbs for putting on an absolutely fantastic conference and for their support. You should be seeing a video they took with me for one of their ongoing series, where I got to talk about the podcast, what it has meant to me, what I’ve learned from it, and where I see the digital workforce moving in the future. So again, thanks, guys, I had a great time. Now, on to the show.

Welcome back. Like I mentioned at first, we’re going to be talking about the problem of predicting flight delays. Flight delays depend on a huge number of factors and variables, including things like the weather as well as what’s happening at various other airports. AI, as applied to the airline industry for this sort of prediction, will be able to analyze massive data sets in real time to predict delays and then, hopefully, rebook your flight so that you don’t miss your connection. Or you won’t have to show up to the airport at all, because you know your flight isn’t going to leave for another day or two. Now, Google has a blog post about this (and, by the way, Google has a flight delay predictor app now, I guess). Just a word to the wise: they also say you should take these predictions with a grain of salt, because the types of variables these algorithms depend on are tricky. Just think of weather forecasting; it’s kind of a crapshoot. Anyway, according to Google, its app will be able to comb through lots of historical data to train the algorithms to look for common patterns in late departures. The factors include location, weather, the type of aircraft, whether the aircraft arrives late, and the airlines themselves, meaning each company’s historic pattern of arriving on time or late.
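To make the idea of learning from an airline’s historic pattern concrete, here is a minimal Python sketch. It is not Google’s method, which the blog post does not spell out; it is only a toy baseline. The airline names, the records, the 0.5 threshold, and the function names are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy records: (airline, was_delayed). Not real data.
HISTORY = [
    ("AirA", True), ("AirA", True), ("AirA", False),
    ("AirB", False), ("AirB", False), ("AirB", True),
]


def delay_rates(records):
    """Return the share of delayed flights per airline."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for airline, was_delayed in records:
        totals[airline] += 1
        if was_delayed:
            delayed[airline] += 1
    return {airline: delayed[airline] / totals[airline] for airline in totals}


def predict_delay(airline, rates, threshold=0.5):
    """Predict a delay when the airline's historic delay rate exceeds the threshold.

    Airlines with no history are assumed on time (an arbitrary choice for this sketch).
    """
    return rates.get(airline, 0.0) > threshold


def accuracy(records, rates):
    """Fraction of records where the prediction matches what actually happened."""
    hits = sum(predict_delay(airline, rates) == was_delayed
               for airline, was_delayed in records)
    return hits / len(records)


rates = delay_rates(HISTORY)
print(rates)
print(accuracy(HISTORY, rates))
```

Note that this scores the baseline on the same records it learned from, which flatters the result; a real evaluation would hold out later flights for testing, and a real model would use many more factors than the airline alone.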

Nevertheless, Google currently claims 80% accuracy on whether or not there will be a delay, and it’ll get more and more accurate as you get closer to the flight. Now, I want to talk about some interesting bits about the kind of data used. One of the biggest pieces comes from the American government, namely, the Bureau of Transportation Statistics. This bureau, the BTS (not the K-pop band), publishes every year the underlying data that tracks the performance of literally every domestic and international flight by the large airline carriers. So it’s not going to help the algorithms if you happen to fly a lot of short puddle-jumper-type flights, like I tend to do to get out of the Midwest to a larger city. Nevertheless, it tracks these larger air carriers. And when downloaded into a data set, it goes all the way back to 2012, which is amazing. In my opinion, you only need three or four years. But hey, if you’ve got a machine that can crunch it, why not use as much as possible? Another note about this data set is that it’s entirely and amazingly clean, insofar as it has very few missing values and it doesn’t really have data points at the far ends of the plot.

Namely, it doesn’t have extreme values, so everything kind of matches what you would expect it to match. Also, it has the fields you’d want, like flight numbers, flight durations, and scheduled departure and arrival times. It even has delays broken out by type, namely, weather-related delays, late aircraft, aircraft malfunctions, that kind of thing. So the obvious piece that you might want to attach to this already pretty amazing data set is weather data. You will want to divide it into seasons, because of course the weather changes seasonally. That way you can get more and more precise: whether it’s a summertime flight, a winter flight, or a spring or fall flight is, of course, going to matter. I suppose that’s unless you live in Santa Barbara, where I think it only rains one day a year, and that’s not a myth. Nevertheless, if you live in a normal place, then you’ll definitely want to account for seasonality. The point is there’s so much beautiful, very rich data, and so many data sets, that we can use to then go train these algorithms. I believe Kaggle has a module on how you can train your own flight delay algorithms, so you should definitely check that out at kaggle.com. They’re not a sponsor or anything, but I love the competitions.
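As a rough illustration of the two checks described above, counting missing values and accounting for seasonality, here is a small Python sketch. The rows, the field name weather_delay_min, and the helper names are hypothetical and are not the actual BTS column names. The month-to-season mapping assumes meteorological seasons in the northern hemisphere.

```python
from collections import Counter

# Assumption: meteorological seasons for the northern hemisphere.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}

# Hypothetical rows; None marks a missing value. Not real BTS data.
FLIGHTS = [
    {"month": 1, "weather_delay_min": 45},
    {"month": 1, "weather_delay_min": 0},
    {"month": 7, "weather_delay_min": 0},
    {"month": 7, "weather_delay_min": None},
    {"month": 10, "weather_delay_min": 10},
]


def season_of(month):
    """Map a calendar month (1-12) to a season label."""
    return SEASON_BY_MONTH[month]


def missing_count(rows, field):
    """Count rows where the given field is absent or None."""
    return sum(1 for row in rows if row.get(field) is None)


def mean_delay_by_season(rows, field="weather_delay_min"):
    """Average the delay field per season, skipping rows with missing values."""
    sums, counts = Counter(), Counter()
    for row in rows:
        value = row.get(field)
        if value is None:
            continue
        season = season_of(row["month"])
        sums[season] += value
        counts[season] += 1
    return {season: sums[season] / counts[season] for season in counts}


print(missing_count(FLIGHTS, "weather_delay_min"))
print(mean_delay_by_season(FLIGHTS))
```

A season label like this could then be fed to a model as one more feature alongside the joined weather data.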

And I love the sort of community that’s being built over there. In any case, I’m really hopeful that these algorithms get more and more precise, to the point where I can rely on the data from my apps, or from the algorithm that I build, to tell me whether or not I really need to show up two hours early. Do I really need to go stand in the security line and then sit in the airport lounge or at one of the bars for two hours? I don’t want to. In any case, I’d love to hear your thoughts, so leave them down below, and I will talk to you soon. That’s it for the show. Thank you for listening, and if you liked what you’ve heard, then consider leaving a comment or a like down below.

To stay up to date on everything Data Couture, be sure to follow us on Twitter at @datacouturepod, and consider becoming a patron at patreon.com/datacouture. Music for the podcast is “Foolish Game / God Don’t Work On Commission” by the artist spinmeister, used under the Creative Commons Attribution 3.0 license. Writing, editing, and production of the podcast is by your host, Jordan Bohall.
