On this inaugural episode of Implement This! we talk about a real life example of how Jordan has implemented a machine learning model for clustering customers into particular classifications!
Stay tuned each week for how to do these kinds of technical things at your own organization!
To keep up with the podcast, be sure to visit our website at datacouture.org, and follow us on Twitter @datacouturepod and on Instagram @datacouturepodcast. And if you’d like to help support future episodes, consider becoming a patron at patreon.com/datacouture!
Welcome to Data Couture, the podcast covering data culture at work, at home, and on the go. I’m your host, Jordan Bohall. If you’d like to stay up to date with all things data and Data Couture, then head over to our social media pages. If you’d like to help support the show, then check out our Patreon at patreon.com/datacouture. Now, on to the show.
Welcome to Data Couture. I’m your host, Jordan, and on today’s episode of Implement This, which is the new title for the Friday show, we’re going to be talking about implementing a machine learning algorithm, or any other kind of data science algorithm, into some sort of production system. This, of course, follows up on Monday’s and Wednesday’s episodes, where we talked about the difference between AI, machine learning, and predictive analytics, and then about a specific use case for deep learning, or I should say, artificial intelligence with a focus on deep learning for image recognition. In any case, we’re going to be talking about a real-life example from my own life: how we’ve gone about implementing machine learning, as well as some of the challenges and drawbacks of doing it the way that I’ve gone about it. So let’s get into this.
Alright, so welcome to the first section. We’re going to be talking about how to implement a machine learning algorithm at work. And so, to recap a little bit: any sort of artificial intelligence usually requires a boatload of data, usually in the big data realm. Machine learning, on the other hand, is a bit more refined, I guess. You can use structured data, you don’t have to have a ton of it, and you don’t have to be in the big data space in order to get a good result from your machine learning models.
And so, you know, I work at a credit union, as many of you probably know by this point. And even though we have very enviable data sets compared to many other industries, we don’t have a ton of it. My particular credit union, for example, doesn’t have millions of customers, and we don’t have hundreds or thousands of data points on them. But we do have quite a bit of very good data available. Of course, there are the challenges with bad data. But let’s face it, if you think that your organization has only good-quality, clean data floating around in its warehouses or its servers, either somebody lied to you or you probably don’t know what you’re talking about. So I’m not too concerned about the bad data that we happen to have, because it’s a manageable type of thing.
So the example I want to talk about is implementing a pretty straightforward clustering algorithm. In this case, I was trying to segment a certain group of our customers into three or four different groups, and the purpose was to reach out to them for various business reasons. You know, a pretty straightforward, day-in, day-out kind of machine learning application. So I wrote a k-means model based on a data set with hundreds of thousands of rows and, I think, what was it, 32 or 34 different variables. And specifically, I wrote this algorithm in the R programming language.
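As a rough illustration of the k-means idea mentioned here, the following is a minimal sketch in Python rather than the R used in the episode. The toy data, the two-feature layout, the function names, and the deterministic farthest-point initialization are all assumptions made for illustration; none of them come from the actual model, which ran on hundreds of thousands of rows and 32 or 34 variables.

```python
from math import dist


def farthest_point_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from every centroid chosen so far (deterministic, no randomness)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until the
    labels stop changing. Returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Hypothetical toy customers described by two made-up numeric features.
customers = [
    (1, 1), (1, 2), (2, 1), (2, 2),          # low on both features
    (10, 10), (10, 11), (11, 10), (11, 11),  # high on both features
    (1, 10), (1, 11), (2, 10), (2, 11),      # low on the first, high on the second
]

centroids, labels = kmeans(customers, k=3)
```

On real data with variables on different scales, the features would normally be standardized first, since k-means relies on Euclidean distance.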
And, you know, there are lots of reasons to choose Python over R, and I’ll get to why one might choose a certain language over another in future episodes. But in this case I went with R, and that’s because we are a .NET stack at my credit union. So we are in the Microsoft environment for the most part. And with the .NET stack, we have SQL Server for our data warehouse, and we have Power BI as our visualization tool.
Of course, given that we’re a financial institution, we don’t really like cloud computing too much. We’re dipping our toes into it, as is every other financial institution, but we’re being very careful, because the sort of member data, the sort of customer data that we have, is very personal and very private. We want to make sure that our customers’ data is secured and that it’s treated and handled ethically. However, as I’ll discuss in a few minutes, the fact that I can’t be in the cloud is actually a problem.
Nevertheless, we’re a .NET shop, I should say. We have SQL Server for our warehouse, and we have Power BI Report Server, the on-prem solution for Power BI, living on my team’s server. And then on top of that, we have Machine Learning Server. Because, again, even though we’re a Microsoft company, we’re not using Azure Machine Learning Studio; we’re using Machine Learning Server on the particular box that we have all our stuff on. Right. So that’s great. And the obvious thing to do for implementing this algorithm that segments customers into certain groups is to put it into Machine Learning Server, treat it as an object, pick it up with Power BI, and visualize the results. That would be the normal implementation for machine learning using R. But there is, of course, another way to do it, especially since we’re using Power BI.
And that is that Power BI supports the R programming language. And so you might think, oh, I can just drop my R script into Power BI, through what they call, I guess, the R visualization button. It opens up an R script window, and you can drop your script into that, treat the view that I’ve built on the warehouse as any other data set or data source, and then run the algorithm there, right? Well, the problem I’ve found is that doing so seriously diminishes the operational efficiency of Power BI. Whenever I go to push this particular dashboard into production, it takes forever for any data or any visualization to show up for the end user. So that’s not a particularly useful way to implement the solution I’m trying to achieve, which is to help many teams across the organization reach out to the customers we care to reach, so that we can meet various business needs with them.
Okay, so the next choice is, of course, to throw my R script into Machine Learning Server. Then I can treat it as an object, pick it up as a data source through Power BI, and Bob’s your uncle, we can push that out into production for our end users. Well, here’s the challenge I faced. The Machine Learning Server that we’re using is running R version 3.4.3, something like that. And the packages that I use, your common R packages, which I won’t get into in detail here (we’ll talk about that in another week), aren’t supported by this version of Machine Learning Server. That means it’s now time to upgrade from 3.4, whatever it is, up to the latest. Somebody correct me if I’m wrong, but I think R’s latest version is like 3.6 or 3.7, I can’t remember. Nevertheless, anything above something like 3.5.4 will support all the packages that I use.
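The version mismatch described here can be checked up front rather than discovered at deployment time. Below is a minimal sketch in Python, not R. The only value taken from the episode is the server's approximate runtime version, 3.4.3; the package names and their minimum versions are placeholders invented for illustration.

```python
def parse_version(text):
    """Turn a dotted version string such as '3.4.3' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def unsupported_packages(runtime_version, minimum_versions):
    """Return the names of packages whose minimum runtime version is newer
    than the runtime that is actually installed."""
    runtime = parse_version(runtime_version)
    return sorted(
        name
        for name, needed in minimum_versions.items()
        if parse_version(needed) > runtime
    )


# Hypothetical requirements; real packages declare their own minimums.
requirements = {"package_a": "3.5.0", "package_b": "3.2.0", "package_c": "3.6.1"}

blocked = unsupported_packages("3.4.3", requirements)
```

Comparing tuples of integers avoids the trap of comparing version strings alphabetically, where "3.10" would sort before "3.9".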
So, you know, I have this problem. What to do? Well, update Machine Learning Server, which of course requires IT’s help so we can get the correct provisioning and the other bits, and that takes time. So I’m kind of stuck in a loop. But it’s, again, something that I’m guessing most people face; I wouldn’t be surprised to hear this from any other organization. And in fact, I raised it: there’s something called the Credit Union Power BI Users Group, and I’m a member of the steering committee, I think that’s what they call it. People in the credit union space aren’t necessarily ready for that kind of talking point, but they’ll be there soon. So hopefully I can find a solution quickly, so that when they do face it, I can be a source of knowledge for them.
Now, you might be asking, “But Jordan, why don’t you just use Azure? That would solve all of your problems.” Hey, great suggestion, random person. I can tell you that for a fact, because of course I have my own personal Azure instance, or I guess Azure suite, that I use for teaching and for my own projects. And yeah, absolutely, I can do a number of things. One, I can set up my own SQL Server instance in Azure, run all of my data, and run Machine Learning Server through there if I wanted to. Or I could just pop over to Azure Machine Learning Studio and write my custom code, or drag and drop the algorithm modules where necessary. And then, of course, connect that up to the Power BI service, which is Power BI in the cloud. And all of a sudden I have a Power BI that is updated every month, instead of Power BI Report Server and the Power BI Desktop that goes with it, which get updated, I think, once a quarter at this point, and which don’t always have the newest stuff that the Power BI service gets. Anyways,
I feel like I’m going on a rant. So what I’m trying to get across is that implementing any sort of machine learning algorithm is going to have challenges. In this case, it’s a pretty straightforward problem that we can overcome. But let’s say you want to throw your algorithm, or the results from your algorithm, into some sort of app across your organization. Well, how are you going to do that? Maybe you run an API, something like that. Or maybe you have your app dev team do some sort of black magic and implement some sort of beautiful visualization for the algorithm, so that your frontline staff, perhaps, can use it in a CRM. Actually, with our CRM, again, since we’re on the .NET stack, we can just natively embed Power BI dashboards into it. So I think we’ll go that way, now that it comes to me.
The point is, when you’re implementing machine learning software solutions, it’s always challenging. And at the end of the day, what you have to focus on is the end user. How do they want to use it? How do they want to consume the data? What sort of visualizations make the most sense for them to garner the insights that will be necessary to move the needle in a positive direction for your organization? And frankly, the technology is improving, not every day, but frequently enough that hopefully in the near term, maybe a year or two out, we won’t have to worry about these kinds of mismatched capabilities or properties of various technical solutions. We can just say, “Hey, we have this algorithm, it lives in the cloud, here’s an API, connect to it.” Right? That’s the dream.
So yeah, that’s all I have for today. I know it’s a short one. I’m not sure if I’m going to continue running 20-minute Wednesday and Friday episodes. I’m guessing Wednesday will still be a 20-minuter, but maybe these will be closer to 15 minutes on a Friday, so you guys can get home from work. In any case, until next time, keep on getting down and dirty with data. That’s it for the show.
Thank you for listening. Be sure to follow us on any of our many social media pages, and be sure to like and subscribe down below so that you get the latest from Data Couture. Finally, if you’d like to help support the show, then consider heading over to our Patreon at patreon.com/datacouture. Writing, editing, and production of this podcast is done by your host, Jordan Bohall. So until next time, keep getting down and dirty with data.
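The "here's an API, connect to it" idea from the episode can be sketched with nothing but the Python standard library. This is a minimal illustration under stated assumptions, not how the credit union serves its results: the `/segments/<id>` path, the `SEGMENTS` lookup table, the customer ids, and every function and class name here are hypothetical. It assumes the segment labels were computed ahead of time and only need to be looked up.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical precomputed model output: customer id -> segment label.
SEGMENTS = {"1001": 0, "1002": 2, "1003": 1}


def segment_response(path):
    """Map a request path such as /segments/1001 to (status, payload)."""
    prefix = "/segments/"
    customer_id = path[len(prefix):] if path.startswith(prefix) else None
    if customer_id in SEGMENTS:
        return 200, {"customer_id": customer_id, "segment": SEGMENTS[customer_id]}
    return 404, {"error": "unknown customer or path"}


class SegmentHandler(BaseHTTPRequestHandler):
    """Answer GET requests with the stored segment as JSON."""

    def do_GET(self):
        status, payload = segment_response(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet instead of logging every request


def make_server(port=0):
    """Bind to localhost only; port 0 lets the operating system pick a free port."""
    return HTTPServer(("127.0.0.1", port), SegmentHandler)


# To try it locally: make_server(8000).serve_forever()
# then request http://127.0.0.1:8000/segments/1001
```

Serving precomputed labels keeps the expensive clustering step out of the request path, which is the same concern raised in the episode about running the script inside the dashboard. Anything touching real member data would also need authentication and transport security, which this sketch leaves out.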