“MLOps and ML Systems are so important, because they are ways to embody best practices that enable you to have resilient systems”
In our ‘5-minute interview’ series we meet AI and ML professionals who share their experiences of working in the Machine Learning field. How did they start out, and what aspects of AI and ML make them tick? What are their passions, and what drives them to create major tech innovations?
In this episode, Rik meets Zander Matheson, founder of Bytewax and a jack of all trades in data science. Zander tells us how Bytewax was created to make real-time inference easier, and why MLOps matters for building resilient ML systems.
Zander: My name is Zander and I’m the founder of Bytewax, an open source Python stream processing library. On top of that we have a commercial offering: a platform that operationalizes real-time data processing. My background is in Data Science and Machine Learning. Before starting Bytewax I worked at GitHub and Heroku. I have always had my fingers in different data problems, anything from data engineering to analytics, data science and machine learning infrastructure. So you could say that I’m a jack of all trades.
Zander: The idea behind Bytewax was to tackle the problem of making real-time inference easier. You get to the point where you have an idea, you go and train a model, you test to make sure your idea actually works, and then you want to move this thing into production. In some instances it’s best to have that inference happen in real time. And the hard part turns out not to be how to host the model, but (as you are aware of on the feature store side of things) how to generate the features the model needs to make its inference. I need to do the same thing to the data in real time that I did when I was training that model. That’s where Bytewax came to be. Oftentimes those transformations, whether it’s joining multiple streams together or taking a window of time and calculating something over it (say, a new distribution of the data over a window of time), are exactly the things stream processors are naturally good at. That’s what led us to build an open source Python stream processor that is stateful and has all the nice things, because it provides a much easier handoff from what you did to the data in training to doing it in real time.
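The training/serving parity Zander describes can be sketched in plain Python. This is an illustration of the idea, not the Bytewax API: the feature transformation (here, a hypothetical rolling mean) is defined once, then applied both to a historical batch at training time and to a sliding window of live events at inference time, which is the kind of stateful windowing a stream processor manages for you.

```python
from collections import deque

def rolling_mean(values):
    """Feature transformation shared by training and serving."""
    return sum(values) / len(values)

# Offline: compute the feature over a historical window of events,
# as you would when building a training set.
history = [10.0, 12.0, 11.0, 13.0]
training_feature = rolling_mean(history)

# Online: the *same* function applied to a sliding window of live
# events. A stream processor would keep this window state for you
# across events (and across restarts).
window = deque(maxlen=4)
live_features = []
for event in [10.0, 12.0, 11.0, 13.0, 20.0]:
    window.append(event)
    live_features.append(rolling_mean(window))

# Once the live window holds the same events as the historical batch,
# the online feature matches the offline one exactly.
```

Keeping one definition of the transformation is the point: if the offline and online feature code diverge, the model sees a distribution in production it never saw in training.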
Zander: It was sort of out of necessity that we tackled infrastructure. I came to realize the importance of the additional software engineering best practices that relate to these data-driven and data-oriented systems. I think that’s why MLOps and ML Systems are so important: they are ways to embody best practices (in addition to software engineering best practices) that enable you to have resilient systems. At the end of the day, that was our job: to build resilient ML systems so we can all sleep well at night.
Zander: The industry moves so fast around all of this that sometimes I find it quite hard to keep up. There are a number of interesting people, blogs and conferences that I follow. For example, the Feature Store Summit is great because it’s so unbiased and has a lot of great presentations. I also recommend the Python conferences, both the PyCons and the PyDatas, because that’s my area: you meet lots of practitioners, and there are really great talks about exciting new libraries or how to use existing libraries effectively.
I would also recommend Fast AI; I follow some of the folks who were involved with it. The Fast AI blog, for example, is a good place to start. They’re always poking at new and interesting things, asking questions or posing things that might be controversial. For me, it’s a great place for keeping up.