Episode 45: How Graduate School is Failing Data Science Students

Graduate schools across the country are pumping out new batches of “data scientists” every few months. However, these young recruits are being denied a very significant piece of education that will legitimately allow them to excel in their new roles in industries across the world.

While they come out of these programs very well versed in various programming languages, visualization practices, or data engineering methodologies, they lack a holistic view of what it takes to deliver a data product from business question to final implemented data product.

This episode begins the conversation on how to change this.

Data Couture is running a Kickstarter Campaign!!!! To help support the show head over to datacouture.org/kickstarter today!

To keep up with the podcast, be sure to visit our website at datacouture.org, and follow us on Twitter @datacouturepod and on Instagram @datacouturepodcast. And if you’d like to help support future episodes, then consider becoming a patron at patreon.com/datacouture!

Music for the show: Foolish Game / God Don’t Work On Commission by spinmeister (c) copyright 2014 Licensed under a Creative Commons Attribution (3.0) license. http://dig.ccmixter.org/files/spinmeister/46822 Ft: Snowflake

Transcripts:

Welcome to Data Couture, the podcast about data culture at work, at home, and on the go. I’m your host, Jordan Bohall. To stay up to date with everything Data Couture, be sure to like and subscribe down below. Furthermore, be sure to follow us around the internet at datacouturepod on Twitter, at datacouturepodcast on Instagram, and at datacouturepod on Facebook. Of course, if you’d like to help keep the show going, then consider becoming a patron at patreon.com forward slash datacouture.
Now, on to the show.

Welcome to Data Couture. I’m your host, Jordan. And on today’s episode, we’re going to be talking about what it means to teach young up-and-coming data scientists. Given that my semester just started at Western Illinois University, I’ve been thinking quite a bit about what’s good, what’s bad, and what’s lacking in terms of the common course of data science education. And so before we get to that, just another reminder that the Kickstarter campaign is about halfway through, and you can become a backer at datacouture.org forward slash kickstarter, if you’d like to help support the show. So now, on to the show.
So welcome to the first segment of this episode.

Now, some of you may remember that in a previous episode, I spoke about what it takes to actually learn data science or the data profession, or how to become a BI engineer or a data engineer. And in that episode, I spoke quite a bit about all the different outlets, I suppose, that are available to all of us now that the internet is so prevalent. Of course, you can just go on to any one of many websites and you can learn a lot of the skill sets necessary in order to enter the data profession. Today, what I want to talk about is specifically those programs, those university programs offered as maybe a master’s or a PhD in data science, or in decision sciences, or information systems, or information management, any of these kinds of things. They have lots of names for the different types of programs, but they all kind of offer the same curriculum, right?

And that’s kind of nice: you can go to any one of many universities and get a similar type of education. Now, specifically, what I want to talk about is where these programs are good, where they’re lacking, where they can improve, as well as perhaps the future of these types of systems. And so in this first segment, let me talk about two different programs. I’ve had the opportunity to teach for the graduate program in Information Sciences at the University of Illinois, and I have my new role teaching in the graduate program for Decision Sciences and Applied Statistics at Western Illinois University.

So what I’ve noticed is that, while the Illinois program is much larger and has many, many more course offerings, the Western Illinois University program is similar in its structure. Both programs tend to offer courses in statistical modeling, so machine learning, predictive analytics, that kind of thing. They offer courses in applied statistics, which is very similar. They offer courses in visualization practices, best practices. They offer various programming language courses, such as SQL, or SPSS, or SAS, or what have you.

And generally, they seem to have a similar level of student body, at least in the master’s program; Western does not have a PhD program, necessarily, for the Decision Sciences track. So what’s different about them? Well, again, the Illinois program has many, many more course offerings, but I wouldn’t say that’s necessarily a bad thing for the Western Illinois side. Why? Because they’re serving a different population, a different subset of the population. So while both attract many students of international origin, which I think is wonderful and offers a great perspective on the way people are doing these kinds of things across the world, Western is focused on smaller communities, specifically my community, for example, in the Quad Cities, which is on the Illinois-Iowa border. And because Western is so much smaller, they have an interesting opportunity to really expand the analytic skill sets for the Quad Cities, which is desperately needed, in my view. And for those who have heard my other episodes on this topic, there’s a significant need for increasing the digital literacy, the data literacy, of basically the entire workforce, and my set of cities and the surrounding areas is no different, right?

Whereas Illinois has a bit more of a national and international reach, and so the populations and communities they’re serving tend to be much, much broader. Now, what I want to talk about today is where both programs are lacking, and maybe what can be done about it, as well as the solution that I’m employing in my own teaching and my own seminars that I’ll continue to teach, hopefully, at Western. And that is, even though both programs offer, like I said, courses in various programming languages, modeling languages, modeling techniques and methods, visualization practices, even storytelling practices, something that’s lacking is the actual implementation of all of these perhaps seemingly disparate areas, when in fact they’re not really disparate, right? And so what I’m seeing in both programs especially is that students leave, and they enter the workforce, and they’ve never actually implemented a data product or completed a data project from start to finish.

So what do I mean by implementing a data project from start to finish? Well, the standard course for this kind of thing starts with a business question, right? Everything in this industry starts with trying to solve a particular question. And so in the case of a data project, what happens is, well, some leader or somebody in a particular business unit needs to solve a problem. For example, perhaps they’re noticing a runoff of customers at a certain segment level; maybe they’re customers that are wealthier, perhaps in a middle wealth tier, whatever it is. And they ask the question: Why are my customers, why are my members leaving my organization, leaving my company? What’s going on? And that’s a great starting point, right?

For any sort of analysis, well, of course, we can dig into the data. And we can say, oh, these people are leaving, this is what they look like. We can segment them out and determine which demographic is leaving. And depending on the industry, or in my case, the financial industry, we can see, at least somewhat, where they’re moving that money, maybe to another financial organization. And then once we have that information, we can look at those financial organizations and see, one, where the member or the customer is moving their money from, say, a type of checking account, and see that, oh, this other financial institution has a checking account that has a much more favorable interest rate, for example, right? And in that way, we’ve answered the original business question. Well, that’s fine. You know, that’s certainly a level of analysis. But that doesn’t necessarily provide the sort of business insight, the sort of power that our data can really attain and achieve. And so that brings me to the first piece that I’m lecturing on, at least this semester, where I’m teaching a course called Seminar in Contextual Business Analytics.
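As a rough illustration of the segmentation step described above, here is a minimal Python sketch that computes the share of departed members per wealth segment. The records, field names (`segment`, `left`), and the `attrition_by_segment` helper are all hypothetical, made up for this example rather than taken from any real system.

```python
from collections import Counter

# Hypothetical member records; field names and values are illustrative only.
members = [
    {"id": 1, "segment": "middle", "left": True},
    {"id": 2, "segment": "middle", "left": True},
    {"id": 3, "segment": "middle", "left": False},
    {"id": 4, "segment": "high", "left": False},
    {"id": 5, "segment": "high", "left": True},
    {"id": 6, "segment": "low", "left": False},
]

def attrition_by_segment(records):
    """Return {segment: share of members in that segment who left}."""
    totals = Counter(r["segment"] for r in records)
    departed = Counter(r["segment"] for r in records if r["left"])
    return {seg: departed[seg] / totals[seg] for seg in totals}

rates = attrition_by_segment(members)
# The segment with the highest attrition rate is where to dig first.
worst = max(rates, key=rates.get)
print(rates, worst)
```

A table like `rates` only says who is leaving. It does not yet say why, which is the gap the rest of the workflow is meant to close.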

It’s about really digging into that business question. Great, you want to know why your members of a certain segment are leaving? Okay, well, what are you trying to do? Right, and so the first major skill set that needs to occur is, one, truly understanding the business that you’re dealing with, in this case, or in my pet example, the financial industry. Well, which segment of the business does this particular leader or business unit member manage? What are they dealing with? Right? And so you have to understand that piece of the business. And then, two, to answer their question, why are they leaving? Well, that’s not really what they want to know. In this case, maybe they have other motives: maybe they want to increase the deposit share, or maybe they want to retain our membership, or maybe they’re truly concerned about the member experience that our customers and members are going through. And that might well be a particular directive for the entire organization.

Their question is really, how can I align my business unit with the directive of the organization, with the push of the organization, while also retaining these types of members? So once you get down to that level, the next step, of course, is to do quite a bit of data discovery. In every industry, there are going to be multiple systems of record. And the task of the data scientist or the data professional is to determine which system is the one that gives the full story for any given field or variable. And how do I get ahold of it? How do I use this in my eventual analysis? And so the data discovery part is key to providing a solution that will be actionable and useful for an organization.
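One small piece of data discovery can be sketched in Python: comparing how complete a given field is across candidate systems. The two systems, their fields, and the helper names below are hypothetical. Completeness is also only one possible criterion, so this is an assumption-laden sketch, not a full discovery method.

```python
# Hypothetical extracts from two source systems, keyed by member id.
core_banking = {
    101: {"email": None, "balance": 2500.0},
    102: {"email": "b@example.com", "balance": 90.0},
    103: {"email": None, "balance": 12000.0},
}
crm = {
    101: {"email": "a@example.com", "balance": None},
    102: {"email": "b@example.com", "balance": None},
    103: {"email": "c@example.com", "balance": 11000.0},
}

def completeness(system, field):
    """Share of records in `system` with a non-missing value for `field`."""
    values = [row.get(field) for row in system.values()]
    return sum(v is not None for v in values) / len(values)

def pick_system_of_record(systems, field):
    """Return the name of the system where `field` is most complete."""
    return max(systems, key=lambda name: completeness(systems[name], field))

systems = {"core_banking": core_banking, "crm": crm}
choice = {f: pick_system_of_record(systems, f) for f in ("email", "balance")}
print(choice)
```

In practice the choice would also weigh freshness, ownership, and agreement between systems (note the two balances for member 103 disagree), which a completeness score alone cannot settle.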

Now, the next piece, of course, is, well, how can we use it? How can we take these bits of data that maybe come from different systems and put them into a single solution so that we can then run our analyses? The traditional approach: put it into a data warehouse. And so knowing how to, one, access the data, extract that data, transform that data, and then load it into the warehouse in the appropriate architecture or schema that will be useful for our data analysis is a necessary step in any sort of production of a data product. So before I keep going, what I’m seeing is, and I’ve tested this on a few different sets of students, they miss this part. They say, oh, we get the business question, and then we mine the data and provide a potential solution. So they’re skipping a bunch of parts, right? And it’s important for them to understand, and for anybody in any organization to understand, that once we get these business requirements, it takes a whole hell of a lot of effort before we can present their information in a dashboard or in any other sort of implemented way for them to consume.
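The extract, transform, and load steps mentioned above can be sketched in a few lines of Python using the standard library’s `sqlite3` module. The in-memory database is only a stand-in for a real warehouse, and the raw rows, the `fact_account` table, and its columns are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a source system (extract).
raw_accounts = [
    {"member_id": "101", "type": " Checking ", "balance": "2,500.00"},
    {"member_id": "102", "type": "SAVINGS", "balance": "90.00"},
]

def transform(row):
    """Cast types and standardize labels so rows fit the warehouse schema."""
    return (
        int(row["member_id"]),
        row["type"].strip().lower(),
        float(row["balance"].replace(",", "")),
    )

def load(conn, rows):
    """Create the target table if needed and insert the transformed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_account ("
        "member_id INTEGER, account_type TEXT, balance REAL)"
    )
    conn.executemany("INSERT INTO fact_account VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real warehouse
load(conn, [transform(r) for r in raw_accounts])
total = conn.execute("SELECT SUM(balance) FROM fact_account").fetchone()[0]
print(total)
```

Even in this toy version, most of the code is about shaping the data rather than analyzing it, which is the effort students tend to skip over.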

And so the next part, of course, is, once we have it in our warehouse in a usable format, well, now it’s time to mine the data. Now it’s time to perform all the various statistical analyses and methods so that we can see how the data that we’ve chosen, the fields, the variables, actually correspond and interact with the question that we’re trying to solve. And of course, part of this mining might also include data cleanup, or data normalization, or any of the other similar methods before we can then apply our statistical models or our visualization techniques. And so once we have that, then we’re able to visualize the data and start putting together some sort of dashboard or other meaningful bit of tech in order to communicate out what we think is actually happening with these members or these customers, and how this impacts our overall strategy for member experience.
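To make the cleanup and normalization step concrete, here is a small Python sketch that drops rows with missing values and rescales a numeric field to the range 0 to 1 (min-max scaling). Min-max scaling is just one common choice, picked here as an assumption, and the rows and helper names are hypothetical.

```python
def clean(rows, required):
    """Keep only rows where every required field is present."""
    return [r for r in rows if all(r.get(f) is not None for f in required)]

def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical warehouse rows; one balance is missing.
rows = [
    {"member_id": 101, "balance": 2500.0},
    {"member_id": 102, "balance": None},
    {"member_id": 103, "balance": 500.0},
    {"member_id": 104, "balance": 4500.0},
]
usable = clean(rows, ["balance"])
scaled = min_max([r["balance"] for r in usable])
print(scaled)
```

Dropping incomplete rows is the bluntest form of cleanup. Whether to drop, impute, or go back to the source system is a judgment call that depends on the business question.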

So once we have that, then we can start developing a story, a story to tell about these members, about how it’s affecting any other process across the organization, how it affects, perhaps, plans that we have to improve this or that bit of member experience. Of course, while all these pieces are happening, we have, perhaps in tandem, the data science side of the house, where we’re trying to model, well, which customers are going to be next? Which set of customers, which segment, are going to be the ones who are going to exit our business? And so we can then, after the various stages in the data science workflow, wrap that up into our automated dashboard, so that we can provide a plan: let’s contact these people, let’s start being proactive so that we can prevent them from leaving, but also improve whatever is going wrong in the organization that might have caused them to leave. Right? And so once we have that, then we present it to the business owner, to the person who requested this bit of insight and analysis, through a significant storytelling process.
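The question of which customers are likely to leave next could be approached in many ways. As one hedged illustration, the Python sketch below fits a tiny logistic regression by gradient descent and scores two members. The features, training data, and function names are invented for this example, and the episode does not say which modeling technique is actually used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def churn_score(w, b, x):
    """Estimated probability that a member with features x leaves."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical features, already scaled to [0, 1]:
# [share of balance recently moved out, months since last visit / 12]
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]
y = [1, 1, 1, 0, 0, 0]  # 1 = member left

w, b = fit_logistic(X, y)
at_risk = churn_score(w, b, [0.85, 0.7])
stable = churn_score(w, b, [0.1, 0.1])
print(at_risk > stable)
```

Scores like these are what would feed the proactive contact list in the dashboard. With six made-up rows this shows only the mechanics, not a model anyone should trust.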

Now, do you see how this wraps up everything that I mentioned about what these programs offer? It brings into one neat package all of the programming languages, all of the statistical learning and the modeling that they’ve learned how to do. It pulls in the visualization practices, it pulls in architecture practices, it pulls in data governance practices, it pulls in even soft skills, like building rapport. You know, it pulls in so many different facets of what one might consider a complete data science education. And what I’m seeing is that the two programs that I’ve been engaged with, or I guess have been and am currently engaged with, are lacking in these areas, and I think this is endemic to the entire data science education artifice, the entire system that purports to be pumping out master’s degrees in data science, or perhaps doctorates in data science. I have yet to see a program that offers something that wraps all of these things together.

So let’s move on to the second section. We’ll talk about how to fix this and how it could actually be beneficial to both organizations and the community at large. So stay tuned.

Welcome back to the show. So in the first segment, I more or less complained about the current state of data education, or at least the data professional education that is going on at multiple universities around the country, let’s face it. But in this section, I want to talk about, well, how can we do something about it? And the solution is quite simple. What I just ran through for the process, the workflow for implementing a data product, well, let’s start teaching it. And frankly, that’s what I’m doing this semester. I’m going through each step, each stage of the data implementation cycle, and, one, I’m giving a seminar one week about each topic. And then the next week, I’m having my students present on a reading that covers that topic, and then I present my solution as I would do it in my own organization, and then we’re going to be discussing it for the third section of the class. Now, you know, I don’t know if it’s going to work or if it’s going to be beneficial, because, frankly, my students have varying degrees of technical skills, of industry experience, or even of educational experience, right? And so, well, you know, it’s a test; we’re going to see how it works out. And hopefully my students gain a significant bit of knowledge from it. And hell, you know, with me going through all this again, I’m hoping that I, too, gain quite a bit of perspective about how other people are doing it in industry and across the world. Now, how is this interesting, or how is this beneficial, for my students?

Well, it’s beneficial because, one, it connects all the dots for all the different classes and areas that they’re expected to know by the time they leave their program. And, of course, once they leave the program, they get jobs, hopefully. You know, that’s the ultimate goal for these master’s programs, and I hope the best for my students, of course. So once they get to the organization, they’ll be able to hit the ground running and understand, regardless of the tech stack that’s being used, well, here’s kind of how it’s done. They know how to anticipate it. And so hopefully, one, my students will be less stressed out as they start. But also, the organizations in which they become employed will be far better off, because they have people who truly understand what it means to be able to solve these problems, which means that these organizations will actually be able to see an ROI, or some sort of return, from hiring a data professional. So how is this interesting for people that aren’t in these programs?

Well, when you have students coming out of these programs who understand all of the soft skills as well as the technical skills, well, that makes data and automation and machine learning and AI and visualization and all these other practices far less intimidating, because you have people who can clearly explain and extrapolate and be very clear about, well, here’s what I’m doing, here’s how we’re doing it, and here’s every step of the way. When you have something like that, then you have a lot more buy-in, because people are far more willing to work when there’s some sort of clarity, when there’s some sort of window into the black box that advanced analytics is often described as.

And when you have that, then people are far more interested. They’re excited; they want to be a part of the process. Because let’s face it, people that are outside of the data industry, they see the writing on the wall just as much as we do. They see that the data and the sort of insights and future trends that we can get out of these tools and methods and resources are going to be absolutely necessary for the, you know, future of their organization, the future of their industry, and the future of their job. And so when we can get these people aligned, let’s call them, I don’t know, let’s just call them non-data people, when we can get them interested and excited about the processes that we’re going through, by having our students have this really in-depth and broad understanding of what it means to do these sorts of projects, then we can get the entire organization on board and hopefully be able to change the culture to one that is data driven.

So that’s how: by changing perhaps the mindset of these programs that are pumping out people who will eventually be in industry, who aren’t necessarily folks who want to stay in the academy, by giving them the tools that they actually need to be able to perform their jobs well, we can help both the students and anyone with whom those students will interact when they enter the workforce. And this is what I’m testing; this is what I’m hoping will prove beneficial for my students. And of course, this semester won’t be the only time I do it. As I continue teaching this particular course and refine my techniques, you know, I’ll be able to report back on what works, what doesn’t, and how my students are faring given this bit of new information. So that’s all I have for today. But I’d love to hear your thoughts about it.

I would love to hear how you’ve attempted to change the course of data science education, or if you know anybody who has, so be sure to leave a comment down below. Be sure to subscribe so you get the latest from data couture. Till next time, have a good day. That’s it for the show.

Thank you for listening, and if you liked what you’ve heard, consider leaving a comment or a like down below. To stay up to date on everything Data Couture, be sure to follow us on Twitter at datacouturepod. Consider becoming a patron at patreon.com forward slash datacouture. Music for the podcast is called Foolish Game / God Don’t Work On Commission by the artist spinmeister, used under the Creative Commons Attribution 3.0 license. Writing, editing, and production of the podcast is by your host, Jordan Bohall.
