AiThority Interview with Michael Berthold, CEO of KNIME
Michael Berthold, CEO of KNIME, discusses the benefits of integrating AI and ML into business, the top challenges organizations face in integrating automation into their workflows, and the major technological breakthroughs he foresees in data science in this Q&A:
Hi Michael, please walk us through your journey and learnings as the CEO of KNIME.
I started with limited industry experience and primarily an academic background, so we had the typical engineering-founded learnings: at some point in time, IT isn’t done by the geeks anymore, and instead, you add an IT department. You also realize that engineers don’t scale as salespeople and you need to bring people on board who are actually passionate about sales. The same is true for marketing – finding someone who gets KNIME, can talk to geeks, and translate what they mumble to non-geeks was a longer journey. I also had to learn that not everybody sees numbers as a proxy for the insights they can reveal, but that many people tend to focus on the surface – on the numbers themselves. By now, I can rely on a couple of key individuals who truly own parts of KNIME and I can start to go back to what I do better (and what I also enjoy doing). Lastly, I also learned that “Management by walking around” is actually a thing and that “Change Management” is not just business BS. It’s still amazing to me how people don’t listen to what you say the first time.
KNIME is known for bridging the worlds of dashboards and advanced analytics. Can you explain how KNIME achieves this and what makes it distinct from other analytics platforms?
KNIME’s workflow-based interface makes it intuitive and easy to use, both for users just getting started with analytics, and those who are looking to dive deeper into more advanced analytics and data science techniques. However, KNIME’s open-source nature also ensures users get access to the broadest set of data technology (with 300+ integrations) and the deepest analytical capabilities (with access to all popular ML libraries and genAI models).
And while you can use KNIME’s open-source product to build workflows, you can use the commercial product to automate, deploy, and ultimately control and govern those workflows.
KNIME’s software is distinct through its:
- Breadth & depth. Open source means KNIME can integrate with any current or future data technology (access any ML techniques and libraries; connect to any databases, big data platforms, LLM model providers, cloud services, and file formats; integrate with any visualization or BI tool, and more). Any KNIME user benefits from community-driven innovation. Extensions, forum help, and working examples are available for all popular use cases across industries and departments.
- Consistent visual workflow paradigm for all data work. Data access, visualization, abstraction, encapsulation, deployment, and orchestration all happen through workflows – no need to incorporate code, unless you want to. Low-code is a broad term – many low-code providers simply add a UX on top of a coding language like Python: you visually drag and drop nodes, and underneath the interface that action creates code. With KNIME, the visual workflow itself is the program, and the corresponding programming language is a network of connected nodes. The advantage is that everything can be done without code, and KNIME does not rely on any single language or library to stay relevant.
- Enterprise-scale, but highly flexible and customizable. Providing enterprises with the ability to implement infrastructure, scale, control, and governance to their distinct specifications. Enterprises using KNIME can have thousands of users and workflows that need to be governed.
- No barrier to entry. Since the ability to create workflows of any complexity is available completely for free, users or teams can start learning and upskilling on low-code data science immediately, with no cost or IT overhead. The efficiency and TCO of organizational upskilling help organizations get to “data-driven” much faster than proprietary tools would allow.
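The “workflow is the program” paradigm described above can be sketched as a directed graph of nodes executed in dependency order, with each node’s output flowing to its successors. This is a conceptual illustration only, not KNIME’s actual execution engine; the three-node workflow and its functions are hypothetical.

```python
# Conceptual sketch: a workflow is a directed graph of nodes, and "running
# the program" means executing nodes in topological (dependency) order,
# passing each node's output to the nodes connected downstream.
# This illustrates the paradigm, not KNIME's real engine.

from graphlib import TopologicalSorter

def run_workflow(nodes: dict, deps: dict[str, list[str]]):
    """nodes: name -> function(list of upstream outputs) -> output.
    deps: name -> list of upstream node names (its predecessors)."""
    results = {}
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(deps).static_order():
        inputs = [results[up] for up in deps.get(name, [])]
        results[name] = nodes[name](inputs)
    return results

# Hypothetical three-node workflow: read -> filter -> aggregate.
workflow = {
    "read":      lambda _:   [1, 2, 3, 4, 5],          # source node
    "filter":    lambda ins: [x for x in ins[0] if x % 2],  # keep odd values
    "aggregate": lambda ins: sum(ins[0]),               # reduce to one number
}
deps = {"read": [], "filter": ["read"], "aggregate": ["filter"]}
```

Because the graph itself is the program, adding a new step is just adding a node and an edge – no host language needs to change.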
With the growing importance of AI and machine learning, how does KNIME integrate these technologies into its platform, and what benefits does this bring to business and data experts alike?
Even before GenAI, KNIME was built with the expectation that much of the development in the data science and AI space still lay ahead.
The open-source nature of the KNIME Analytics Platform gives all users access to current and future powerful data technology, while KNIME Business Hub provides the tools to control the workflows built with that technology. So as GenAI (and all the genAI-based software) comes onto the market, KNIME can quickly integrate new developments, and give users the ability to mix and match with other powerful data technology. KNIME users could quickly augment their analytical workflows with genAI, and vice versa – augment genAI with analytics, or with guardrails, better controlling the input or the results of LLMs.
More specifically, KNIME has recently released both (1) features to augment data science workflows and (2) features to better govern and control GenAI.
- Access to latest GenAI models & developments: First, users can connect to the latest models from AI providers such as OpenAI, Azure OpenAI Service, and Databricks, as well as access the advancements in local LLMs, such as Llama 3, to work with the latest models on their local machines. Additionally, they can connect to local and remote Hugging Face Text Embedding Inference servers to access a wide range of open-source embedding models for tasks such as semantic search and feature extraction. They can also connect to protected Hugging Face Inference Endpoints to quickly spin up many GenAI models for experimentation. Users can now also split documents for retrieval augmented generation (RAG) in a single step to save time.
- Features for advanced GenAI & data governance: KNIME also now provides a GenAI Gateway that allows IT to centrally configure which GenAI providers and nodes can be accessed and used by team members. This centralized management of GenAI providers and models enhances governance and security, allowing IT to control model accessibility in alignment with enterprise policies. Users can also make working with data and LLMs safer with new Presidio and Giskard integrations. KNIME’s new capabilities, based on Microsoft Presidio, help protect personally identifiable information (PII) when sharing data with external LLM providers. These capabilities can detect and anonymize sensitive information such as names, phone numbers, and credit card numbers in text data to prevent it from being sent out to GenAI tooling, addressing key concerns around data privacy and compliance in GenAI usage. Additionally, a new Giskard-based extension helps users spot issues in end-to-end machine learning workflows and aids them in evaluating robustness and bias for more reliable deployments.
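The detect-and-anonymize idea behind the Presidio-based capabilities can be sketched in a few lines: scan outbound text for PII and replace matches with placeholders before a prompt ever reaches an external LLM provider. The toy regex patterns below are illustrative assumptions – a real deployment would rely on Microsoft Presidio’s recognizers, not hand-rolled expressions.

```python
import re

# Minimal sketch of PII anonymization before sending text to an external
# LLM provider: find sensitive spans and substitute entity placeholders.
# These regexes are deliberately simplistic stand-ins for real recognizers.
PII_PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[-\s]){3}\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def anonymize(text: str) -> str:
    """Replace every match of each PII pattern with an entity placeholder."""
    for entity, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{entity}>", text)
    return text
```

For example, `anonymize("Call 555-123-4567 or email jane@example.com")` yields a prompt with `<PHONE>` and `<EMAIL>` placeholders, so the raw identifiers never leave the organization.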
KNIME emphasizes an open approach to data analytics. How does this technique benefit the end users?
As mentioned above, keeping the software open source allows us to integrate new technology faster than any proprietary tool could.
Offering the open-source product for free has also enabled us to rapidly grow a large community that actively shares practical examples across all application areas of KNIME. For what is essentially a visual programming language, this has resulted in an extensive repository of use cases applicable to virtually any industry or discipline. For enterprises, encouraging the widespread adoption of advanced data tools beyond Excel or BI has been challenging. However, the KNIME Analytics Platform, with its minimal barriers to entry, allows thousands of employees to be trained on the platform.
What are the top challenges organizations face in integrating automation into their workflows, and what solutions does KNIME leverage to keep the process smooth?
It’s not so much that organizations set out to automate their workflows and are blocked or inhibited, but rather that most organizations use tools, such as spreadsheets, that do not make automation and reusability easy. The result is a ton of rework, siloed knowledge, and data grunt work performed across organizations when experts – whether they are data or domain specialists – could be doing more value-creating work. Even the few companies that do cover the entire data science life cycle do it through a plethora of tools that aren’t all easily or well connected. In KNIME’s case, it’s two well-integrated products: the open-source analytics platform for workflow creation and the KNIME Business Hub for productionization.
KNIME provides automation capabilities through our SaaS “Team” plan on KNIME Community Hub, allowing any team to start automating KNIME workflows with just a credit card and no IT overhead. Or, they can pay for an annual license of the on-prem KNIME Business Hub, where they’ll be able to not only automate workflows, but also get access to a suite of governance and deployment capabilities.
Please highlight some of the key factors that differentiate successful data-driven organizations from those that struggle with their data initiatives.
Successful, data-driven organizations employ nimble tech stacks. One of the easiest ways to accomplish this is to adopt open-source tools, as they allow for the quick implementation of new technology. Organizations should also try to use a tool that can connect broadly, so they’re not working with piecemeal solutions, as mentioned above.
Centralizing data governance also sets successful organizations apart: it improves data quality, ensures consistency, and enhances compliance across the board. In addition to removing data silos, it creates a single source of truth that enables faster, more informed decision-making. Centralized governance also makes it easier to scale and collaborate effectively, while its absence leads to fragmented data management and low data utilization.
Finally, successful organizations are making sure more of their non-technical staff are getting up to speed with data skills through upskilling. This not only frees up the data team to focus on mission-critical initiatives but also embeds data-driven decision-making across the entire organization, ensuring that every department can leverage data as a strategic tool.
If you had to share five major technological breakthroughs in data science that you foresee in the next decade, what would they be?
- Integrated analysis of heterogeneous data – Most tools tend to be focused on a subset of data types and sources. Even tools such as KNIME, which allow practitioners to connect to many more sources, lack algorithms to truly find insights in all of those data sources, together.
- Tightly integrated AI assistants that speed up and enhance data science workflow creation.
- Self-improving analytics – Similar to a conversation with a chatbot, I foresee a proliferation of analytical workflows that continue to incorporate feedback and learnings from past mistakes. Active learning has existed for a while in a limited set of applications, but scaling it out to general analytical toolkits will fundamentally change how people work.
- Advancements in prescriptive analytics – Innovation that will enable data science workflows that not only predict the future, but also provide suggestions for corrective actions (and the resulting outcomes). The current state of the art is fairly limited and relies primarily on simulations, but this will change in the future.
- Beyond-big data – Having the ability to process data “on the fly” without ever looking back. In some application areas (think particle physics) they are already accustomed to only storing 1% of their data, but I foresee other areas will require reliable tools for this in the future.
Thank you, Michael Berthold, for your insights; we hope to see you back on AiThority.com soon.
[To share your insights with us as part of editorial or sponsored content, please write to psen@itechseries.com]
Michael Berthold is co-founder of KNIME, the open analytics platform used by thousands of data experts around the world. He recently left his chair at the University of Konstanz to focus solely on being CEO of KNIME. Before that, he held positions in both academia (Carnegie Mellon, UC Berkeley) and industry (Intel, Tripos). He has co-authored several books (the second edition of the Guide to Intelligent Data Science appeared recently), is an IEEE Fellow, and a former president of the IEEE-SMC society. He continues to publish and occasionally still creates KNIME workflows.
KNIME Software bridges the worlds of dashboards and advanced analytics through an intuitive interface, appropriate for anybody working with data. It empowers more business experts to be self-sufficient and more data experts to push the business to the bleeding edge of modern data science, integrating the latest AI and Machine Learning techniques. KNIME is distinct in its open approach, which ensures easy adoption and future-proof access to new technologies.