A Detailed Conversation on Open-Source AI Frameworks for MLOps Workflows and Projects
Today, AI stands as a pivotal force driving Industry 4.0, fostering innovation across diverse sectors regardless of organizational size or industry vertical. Yet, keeping abreast of the rapid advancements in AI poses a formidable challenge. To expedite application development and foster efficient practical implementations, developers lean on AI open-source projects to craft robust deep learning solutions. Investing heavily in MLOps tools isn’t a prerequisite for infusing DevOps magic into machine learning endeavors; a plethora of open-source tools offer viable alternatives. This approach proves invaluable when tackling unique challenges, offering a supportive community as a backbone. However, open-source machine learning tools aren’t devoid of drawbacks.
Two of the most widely used open-source deep learning frameworks are TensorFlow, created by Google, and PyTorch, created by Facebook (now Meta). These Python frameworks include the tools and modules needed to develop and train deep learning models, and they serve as a foundation for further open-source AI/ML applications.
A software company focused on making working with data intuitive, KNIME provides access to AI algorithms, tools, and frameworks, encouraging collaboration and innovation among developers and researchers in the AI community. To understand how KNIME enables users to collaborate on data science, our journalist, Pooja, spoke to the CEO & Co-Founder of KNIME, Michael Berthold. Berthold has authored over 250 publications while focusing his research on the use of machine learning methods for the interactive analysis of large information repositories.
Here’s the full interview with Michael.
Hi Michael, welcome to our AiThority.com Interview Series. Please tell us about your AI journey so far.
There’s a reason we’ve chosen to make KNIME Analytics Platform an open platform – when new technology becomes available and even omnipresent the way that genAI has, the platform can easily be extended and adapted to work with those technologies. We’ve integrated every “AI” type technology as it’s become available – data mining, machine learning, deep learning, and now, gen AI.
At KNIME, we started experimenting with genAI in early 2023, shortly after the release of ChatGPT 3.5, in a retreat project. There, we experimented with ChatGPT as co-author to test its ability to create meaningful text with a few careful prompts. We also built an AI assistant to help onboard users to our software. The integrations at that point were somewhat rudimentary, but we could already see its potential.
Those were the origins of some of our gen AI capabilities. We’ve since built a working group of AI experts at KNIME, who regularly contribute and update functionality in both the KNIME Analytics Platform and our commercial product, KNIME Business Hub.
What primary services do you provide? What is your overarching vision for both your company and the industry at large?
We build software to help anyone, irrespective of their background or technical experience, make sense of data. We do this through a combination of software (an intuitive, low-code, and open KNIME Analytics Platform, as well as the KNIME Hub for collaborating & deploying solutions) and content (training, resources, videos, etc).
We believe that a single tool can be used for both advanced or domain-specific use cases and basic ETL use cases. This will not only (obviously) decrease the TCO of software in an enterprise, but also help all employees raise the organization’s standard of data literacy and, finally, glean insights from their data.
With KNIME, advanced users (data scientists) appreciate that they can write code when they want to, but they don’t have to and can use the low-code environment to eliminate the need to be software developers. They use code (or low-code) for their statistical or AI/ML models but then don’t worry about the underlying technologies for the rest of the work (like accessing and prepping data, then ultimately integrating, deploying, and monitoring).
Beginners appreciate the intuitive, drag-and-drop interface and the ability to automate mundane data tasks, typically performed in spreadsheets, but they’re also able to dive deeper into data science, in some cases, even moving into AI/ML techniques.
With large numbers of employees on both sides of the business using the KNIME Analytics Platform, many enterprises use our commercial offering, KNIME Business Hub, to enable large (or small) scale deployment of analytics as REST APIs or Data Apps in a secure, governed environment.
Decades in, enterprises are still struggling to get a lot out of their data, and this is, at least in part, due to the silos between data and domain expert teams, as well as the burden on data scientists to solve all data problems in the organization. By creating a common, intuitive language between the two teams and giving domain experts more powerful data tools, we can start to remedy these issues.
KNIME being an open-source data analytical platform, how does it fit into the current technology landscape?
Open-source platforms are transparent, highly customizable, and extensible.
To the first point, KNIME has no black box – workflows include nodes handling granular tasks, and the source code is public. In today’s technology ecosystem littered with AI-based tools, it’s hard for organizations to know what decisions a software vendor is making on their behalf.
To the second point, as mentioned in one of the questions above, any new library, coding language, tool, or technology in this rapidly evolving space can be integrated into the platform. If an enterprise has custom requirements about which tools or technologies can or should be made available, that customization is also possible.
Of course, keeping TCO low is also helpful, but almost more important than that, it makes upskilling everybody in the organization feasible because they can all start with the open-source platform.
How do you construct, modernize, or orchestrate your MLOps?
The “final mile” of deploying models has historically always been an issue for data teams since the coding or the work that they do to develop a model isn’t the same code or work that is needed to run it in production.
KNIME has functionality to capture the right pieces of the model development process, which can then be automatically integrated and packaged for production. We call this Integrated Deployment.
Furthermore, our CDDS (Continuous Delivery of Data Science) extension allows teams to define deployment stages (e.g., dev, test, prod, or more), testing and validation requirements, and, ultimately, criteria for monitoring performance over time. Available as a set of KNIME workflows and data apps, the CDDS extension allows IT and data stakeholders to define an MLOps process, as well as automate as much or as little as they choose.
The extension is fully customizable to fit with any organization’s unique governance and deployment practices.
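As a rough illustration of the kind of process described above, the following is a minimal Python sketch of staged promotion with validation gates and a monitoring criterion. It is not KNIME's implementation or the CDDS extension's API: the stage names come from the example in the text, while `Candidate`, `GATES`, `promote`, `needs_attention`, and all thresholds are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Stage names taken from the example in the text; everything else is hypothetical.
STAGES = ["dev", "test", "prod"]


@dataclass
class Candidate:
    """A model (or workflow) moving through the deployment stages."""
    name: str
    metrics: Dict[str, float]
    stage: str = "dev"
    history: List[str] = field(default_factory=list)


# A check receives the candidate's metrics and returns True when the requirement is met.
Check = Callable[[Dict[str, float]], bool]

# Hypothetical validation requirements that must pass before entering each stage.
GATES: Dict[str, List[Check]] = {
    "test": [lambda m: m.get("accuracy", 0.0) >= 0.80],
    "prod": [
        lambda m: m.get("accuracy", 0.0) >= 0.90,
        lambda m: m.get("latency_ms", float("inf")) <= 200.0,
    ],
}


def promote(candidate: Candidate) -> bool:
    """Move the candidate one stage forward if every gate of the next stage passes."""
    index = STAGES.index(candidate.stage)
    if index == len(STAGES) - 1:
        return False  # already in the final stage
    next_stage = STAGES[index + 1]
    if all(check(candidate.metrics) for check in GATES.get(next_stage, [])):
        candidate.history.append(f"{candidate.stage}->{next_stage}")
        candidate.stage = next_stage
        return True
    return False


def needs_attention(live_accuracy: float, threshold: float = 0.85) -> bool:
    """Monitoring criterion: flag a deployed model whose accuracy falls below the threshold."""
    return live_accuracy < threshold
```

In this sketch, a candidate with an accuracy of 0.92 and a latency of 150 ms would pass both gates and reach "prod", while one with an accuracy of 0.85 would stop at "test". How much of this is automated, and which checks apply at each stage, is exactly the kind of decision an organization would make for itself.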
How are GenAI capabilities revolutionizing the AI sphere?
GenAI has been talked about a lot in the last year and while I do see it having a positive impact in some use cases, it is not a magic tool that will solve complex problems on its own.
The Good – GenAI can help save a lot of time in many contexts, especially by helping employees not start with a “blank slate.” This could be used for many goals, like generating dialogues, stories, songs, even movie plots, images of new cars, images of people that do not exist, or images of nonexistent landscapes. This could be used to enhance creativity (you generate many ideas until you find the one you can work on), to expand data sets with artificial data, or to protect people’s privacy with fake content.
The Bad – While the advancement of generative AI technology is fascinating, the reality is that it still needs discipline.
The downside is that there is no annotation capability within AI-generated text, which raises challenges of credibility and accuracy. Additionally, while most know about the issues with deep fakes and hallucinations, in the midst of so much realistic, artificially generated content, it can become hard to distinguish what is false and what is true, what is fake and what is real.
AI also has a well-documented problem with hallucinations, generating false outputs with high confidence and greater frequency over time.
Adding to this challenge is the fact that these erroneous outputs are not easily detected.
With this in mind, I encourage everyone to be cautious and understand that GenAI is not ready to solve all the problems of an organization. It can be a great assistant, but it still needs a lot of human oversight and monitoring.
What AI initiatives are you currently focusing on and why?
We’re looking to balance enabling our users to access the latest and greatest tech with providing options for quality control of integrated genAI recommendations and for controlling how data is shared with third parties.
There are three ways that we enable users to benefit from genAI from within the KNIME Platform:
Assist & Explain
Users can benefit from our AI assistant, K-AI, which can answer questions and, in some cases, even build KNIME workflows from scratch (although we always recommend that you “check” K-AI’s work).
Use & Customize
Users can augment their workflows with AI or build their own AI assistant, using our LLM extension which supports both state-of-the-art OpenAI models as well as open-source Large Language Models (LLMs) such as those from Hugging Face Hub and GPT4All. Users can leverage the low-code environment to fine-tune or fully tailor models to their own needs.
Manage & Govern
KNIME Business Hub lets you centrally control which LLMs your organization has access to and how (and if) data is shared with third parties.
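To make the "Manage & Govern" idea concrete, here is a minimal Python sketch of a central policy that decides which models may be used and whether data may be sent to them. It only illustrates the concept and does not reflect how KNIME Business Hub is configured: the model identifiers, `ModelPolicy`, `POLICY`, and `may_send` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelPolicy:
    """Central policy entry for one model (hypothetical structure)."""
    allowed: bool   # may users in the organization call this model at all?
    external: bool  # True when prompts leave the organization (third-party service)


# Hypothetical model identifiers and settings, maintained in one central place.
POLICY: Dict[str, ModelPolicy] = {
    "hosted-model": ModelPolicy(allowed=True, external=True),
    "local-model": ModelPolicy(allowed=True, external=False),
    "unvetted-model": ModelPolicy(allowed=False, external=True),
}


def may_send(model: str, contains_sensitive_data: bool) -> bool:
    """Return True only if the model is approved and no sensitive data would leave the organization."""
    policy = POLICY.get(model)
    if policy is None or not policy.allowed:
        return False  # unknown or disallowed models are rejected by default
    if policy.external and contains_sensitive_data:
        return False  # sensitive data must not be shared with third parties
    return True
```

The design choice worth noting is the default: a model that is not listed is rejected, so access has to be granted explicitly rather than revoked after the fact.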
Please shed some light on your customer relations and how their feedback has helped you create better products and services.
For open-source users, we regularly solicit feedback on the KNIME forum, have Q&As with our product & dev team, and meet for user conferences (small meetups, as well as our annual Summit).
Commercial customers have dedicated account managers that they meet with on an ongoing basis. Our product team also regularly meets with customers for feedback sessions.
How do you incorporate AI and automation into your daily workflow?
Personally, I enjoy using K-AI to help me with some KNIME workflows and the integrated Python editor. It is also super useful for sanity-checking and improving text that I write. I have tried to use it for the creation of teaching material but found it less useful. ChatGPT can be very stubborn at times and we clashed since I too can be stubborn when I have a clear idea about what I want.
Your favorite quote: “KNIME can do that, too?!” Please shed some light.
People tend to start using KNIME for a particular reason, with a use case in mind.
Maybe they are automating spreadsheet manipulation or analyzing texts. When they then cross over into new territory and realize KNIME also has extensions for geospatial data analysis, cheminformatics, and image mining, they are often surprised at the breadth of coverage we have. That’s where that quote comes from – there are only a few areas of analytics that we don’t have an extension for.
What is your perspective on how AI technology will influence the entire industry in the future?
I think we are currently seeing a bit of the peak of AI hype.
Assistants like K-AI (or Copilot) will continue to grow and make a lot of mundane work go away. There is still enormous room for improvement.
But, we’ll also hit a bit of a wall at how far this can go.
On the opposite end, organizations will struggle with keeping things under control – making sure that no sensitive information leaves the house and that people don’t include problematic material in their own work (be it copyrighted, malware, or flat-out wrong) will trigger a whole new set of governance setups.
Thank you, Michael! That was fun and we hope to see you back on AiThority.com soon.
Michael Berthold is a German computer scientist, entrepreneur, academic and author. He is a Co-Founder of KNIME, and has acted as CEO since 2017. Berthold has authored over 250 publications while focusing his research on the usage of machine learning methods for the interactive analysis of large information repositories. He is the editor and co-author of numerous textbooks, a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), the past president of the North American Fuzzy Information Processing Society, and past president of the IEEE Systems, Man, and Cybernetics Society.
KNIME helps everybody make sense of data. Its free and open-source KNIME Analytics Platform enables anyone–whether they come from a business, technical or data background–to intuitively work with data, every day. KNIME Business Hub is the commercial complement to KNIME Analytics Platform and enables users to collaborate on data science and share insights across the organization. Together, the products support the complete data science lifecycle, allowing teams at all levels of analytics readiness to support the operationalization of data and to build a scalable data science practice.