Case Studies

Basware – Extract invoice information with AI

Our company in short

Basware provides services in e-invoicing, accounts payable, and financing across 175 countries worldwide. As a leader in the Accounts Payable
automation sector, Basware expertly manages the receipt of invoices from suppliers, checks invoice details against purchase orders, stores invoices
in the accounting system, and ensures payments are scheduled by their due dates. Annually, Basware processes over 170 million invoices. 

Our business problem and machine learning 

The SmartPDF service extracts invoice information from PDFs sent to Basware via email and converts them into the e-invoice standard format.
Initially, SmartPDF required manual template creation and validation to extract data from PDFs, which proved costly and unscalable. To address this,
Basware implemented SmartPDF AI, utilizing supervised learning to parse and process invoices automatically, eliminating the need for manual intervention.
The goal is to replace traditional templates and manual validation entirely with AI. 

This AI service runs on the AWS cloud, where SmartPDF AI extracts the text from the invoices. It uses supervised machine learning
to predict field values from the extracted text, trained on 30 million samples covering 800,000 invoice types and 2 TB of raw data.
Maintaining high accuracy in this process is critically important.

Our research and solutions 

The problem: human errors in the training data. If the data is wrong, the AI learns the wrong thing. Objective: enhance data quality.

Phase 1:
A committee of six AI models employs a voting system for robust decision-making. Features include anomaly detection and the use of hierarchical or
pivot-based clustering for efficient invoice extraction. Additionally, a grammar-based invoice signature creation method is implemented,
designed to be resilient to OCR inaccuracies. This approach has successfully reduced errors by approximately 30 percent. While accuracy for crucial fields
is satisfactory, it is not uniformly high across all fields. 
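The committee voting idea can be illustrated with a minimal sketch. It assumes a simple majority vote with a minimum agreement threshold; the function names, the threshold of four out of six, and the use of the vote to flag suspicious human labels are illustrative assumptions, not Basware's actual implementation.

```python
from collections import Counter
from typing import Optional, Sequence


def committee_vote(predictions: Sequence[str], min_agreement: int = 4) -> Optional[str]:
    """Return the value most models agree on, or None if agreement is too weak.

    `min_agreement` is a hypothetical threshold (4 of 6 models by default).
    """
    if not predictions:
        return None
    value, votes = Counter(predictions).most_common(1)[0]
    return value if votes >= min_agreement else None


def is_suspect_label(human_label: str, predictions: Sequence[str],
                     min_agreement: int = 4) -> bool:
    """Flag a training label when a confident committee disagrees with it."""
    consensus = committee_vote(predictions, min_agreement)
    return consensus is not None and consensus != human_label


# Six hypothetical model outputs for the "invoice number" field of one invoice.
votes = ["INV-1001", "INV-1001", "INV-1001", "INV-1001", "INV-1007", "INV-1001"]
print(committee_vote(votes))                # the agreed value
print(is_suspect_label("INV-1007", votes))  # human label contradicts the committee
```

Requiring a clear majority before overriding a human label keeps the committee from introducing new errors when the models themselves disagree.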

Phase 2:
A fully ML-based solution discovers embeddings automatically, in an unsupervised manner. The system autonomously identifies the signatures of various fields without relying on heuristics or human-labeled training data. While the model achieves high accuracy across all fields, its coverage still needs to improve.

© Basware

Future direction

OpenAI’s GPT-4 offers significant potential for enhancing invoice data extraction by leveraging its advanced natural language understanding and generation capabilities. This could represent an opportunity rather than a threat, providing more sophisticated tools for recognizing and processing complex invoice information, particularly at the line-item level.

A study initiated to explore alternative methods for extracting invoice data with GPT-4 could involve comparing results from various engines and potentially integrating them to achieve superior recognition results. Key factors to consider in this exploration would include the costs associated with implementing and maintaining such technology, its performance in real-world scenarios, and the accuracy of the data extraction compared to existing methods.

Granlund – ML tools in energy forecasting and predictive maintenance

Our company in short

Granlund is a fast-growing group of companies with more than 1500 experts in Finland, Sweden, and the UK. Our lines of business are MEP design (Mechanical, Electrical, and Plumbing), property management services and software, consulting for the energy, environmental, and real estate sectors, construction management and supervision, and building management. Our key goal is to make properties more functional and smarter and to improve human well-being in the built environment.

Our business problem and machine learning

Buildings are responsible for 40 percent of global energy consumption and are therefore an interesting target for optimizing energy use with machine learning. In addition, productivity growth in construction has been low compared to other industries. Thus, there is much potential for improvement,
and machine learning offers methods of achieving this.

Our first machine learning application area in the project is building energy management in a large building portfolio. Buildings are a challenging sector
for machine learning because every building is different. Thus, there is a need for effective methods to develop and manage a large number of ML models.

Our research and solutions

In the IML4E project, we have researched tools and methods for data profiling and for monitoring a large number of ML models. Within data profiling, we have studied data quality dashboards that could be used in continuous data quality monitoring and searched for ML solutions to inspect the quality of energy consumption data.

The project involved

  • creating an ML application for detecting and predicting anomalies in building energy consumption
  • building an MLOps platform and a model monitoring dashboard to assess the performance, data quality, and infrastructure of training and prediction.
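As an illustration of anomaly detection on building energy data, the following minimal sketch flags unusual readings with a robust z-score based on the median and the median absolute deviation (MAD). The choice of method, the function name, and the threshold are assumptions made for this example; the text does not state which technique the project used.

```python
from statistics import median
from typing import List, Sequence


def flag_anomalies(readings: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return the indices of readings whose robust z-score exceeds the threshold."""
    if not readings:
        return []
    center = median(readings)
    mad = median(abs(x - center) for x in readings)
    if mad == 0:
        # No spread at all: any reading that differs from the median is unusual.
        return [i for i, x in enumerate(readings) if x != center]
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    # for normally distributed data.
    return [
        i for i, x in enumerate(readings)
        if abs(0.6745 * (x - center) / mad) > threshold
    ]


# Hypothetical hourly consumption in kWh with one obvious spike.
hourly_kwh = [10.0, 11.0, 9.5, 10.5, 10.2, 42.0, 9.8, 10.1]
print(flag_anomalies(hourly_kwh))
```

Median-based statistics are used here because a single large spike would distort a mean and standard deviation computed over the same window.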

The MLOps platform was built to manage the large number of ML models that the application requires. The model monitoring dashboard is customized
to the use case and business perspective: it tracks multiple models per building for each application, rather than just individual models, and it also tracks
the data quality and the infrastructure serving the application.
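A per-building view over many models could be aggregated roughly as follows. This is a minimal sketch: the `ModelReport` fields, the metric names, and the thresholds are hypothetical and only illustrate rolling several model reports up into one status per building.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelReport:
    building_id: str
    model_name: str
    mae: float            # mean absolute error on recent predictions
    missing_ratio: float  # share of missing input values, between 0 and 1


def building_status(reports: List[ModelReport],
                    max_mae: float = 5.0,
                    max_missing: float = 0.1) -> Dict[str, str]:
    """Roll model-level reports up into one status per building."""
    status: Dict[str, str] = {}
    for report in reports:
        healthy = report.mae <= max_mae and report.missing_ratio <= max_missing
        if not healthy:
            # One unhealthy model is enough to raise an alert for the building.
            status[report.building_id] = "alert"
        else:
            # Keep an existing "alert"; otherwise mark the building as "ok".
            status.setdefault(report.building_id, "ok")
    return status


reports = [
    ModelReport("building-A", "heating", mae=2.0, missing_ratio=0.0),
    ModelReport("building-A", "electricity", mae=9.0, missing_ratio=0.0),
    ModelReport("building-B", "heating", mae=1.0, missing_ratio=0.05),
]
print(building_status(reports))
```

Grouping by building rather than by model matches the business view described above, where an operator cares first about which buildings need attention.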

Future direction and business benefits

A possible future direction is to develop applications for other types of time-series data from buildings (such as submetering, sensors, etc.), and then to adapt the MLOps and model monitoring platform to suit more general models, since in the future we may have models that can work for any building, instead of specific models for each one. This might bring more customers and increase the value of our monitoring and digital services.

© Granlund

Reaktor – MLOps practices for a development team

Our company in short

Reaktor is a global technology consultancy that builds category-defining digital products. Our clients include industry leaders such as Adidas, HBO, Supercell, Cathay Pacific, and KONE. We’re known for embracing autonomy, optimizing for speed, and meticulously building the most high-performing, multidisciplinary teams. Today, we are 700 strong, with offices in six countries.

Our business problem and machine learning

As a consultancy, we are committed to forming high-performance teams that serve customers by delivering well-designed digital products. At the team level, work and design practices become important, as does knowledge of state-of-the-art tools and methods. Machine learning is becoming part of software architectures across industry verticals.

There will still be specialized machine learning experts in the future, but it is foreseeable that ML will become commoditized in software development.
Our main concern is how the best software and design teams will work in the future, so that we can keep our quality and value promise.

MLOps gives the technological and methodological background for incorporating ML into solutions, but that is where our work just starts.
Important questions include: How should business cases, services, and solutions be designed and led, and how should the organization support this process? What are the value-creating design and development practices for software systems that include ML components? How can we shorten cycle times, reduce faults, and align work better? What kinds of MLOps frameworks and software fit best in different contexts and industry verticals?

Our research

We have concentrated on researching and developing the co-operation between design and developer teams in the context of delivering solutions for healthcare. Our technical research has covered, for example, data preprocessing, which is often an important link between the business environment and the ML architecture. Here, LLMs act as a novel way of extracting structured information from text and of combining freeform and structured UIs. Much of the value created by the project is realized as elevated competence, and we have participated in creating training material. Part of our training material has been trialed at Helsinki University.

Business benefits

Within IML4E, the work on training material, the OSS platform, and the maturity assessment scheme has already boosted our teams’ capabilities.
We have been running an internal ML / MLOps lab where leads, designers, developers, and data scientists create applications together, train each other, and enrich our offering. The lab constantly creates proofs of concept to demonstrate our offering. All in all, the program has broadened our ML delivery capability.

© Reaktor

Vitarex – Scoliosis screening with AI for health visitors

Our company in short

Vitarex Studio Ltd is an independent software development company with a primary focus on creating software for Hungarian healthcare providers.
Our mission is to provide high-quality, modern technology to our customers and users. We have an almost 70 percent market share with our software for health visitors, a specialized nursing segment of Hungarian healthcare that covers the whole country.

Our business problem and machine learning

Our company cooperates closely with the country’s health visitors, helping with the medical screening of school-aged children. The assessment of the posture of the children is carried out during such screenings. Introducing an ML-based software tool would help identify problematic cases and evaluate postures in general.

Our aim is to develop an application that can help with the assessment of postures by automating the process with ML models. For a well-functioning application, we must provide robust and reliable ML models. In the scope of the IML4E project this effort can be helped by developing an MLOps infrastructure with which models can be easily trained, evaluated and deployed, as well as by finding an efficient way to handle the different data.

Our research and solutions

In the scope of the IML4E project we have researched tools and methods that can help with keypoint detection model debugging and data management.

We have built a model training pipeline and developed a custom keypoint detection evaluation process to easily create and compare different ML models. In cooperation with Spicetech, we carried out a virtual validation of the keypoint model with VALICY, which determined the limits of the model.
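To give a sense of what a keypoint evaluation step can look like, the sketch below computes PCK (Percentage of Correct Keypoints), a metric commonly used for keypoint detection. The text does not describe the custom evaluation process, so the metric choice, the function name, and the parameters here are assumptions for illustration only.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(predicted: Sequence[Point], ground_truth: Sequence[Point],
        reference_length: float, alpha: float = 0.2) -> float:
    """Share of keypoints predicted within alpha * reference_length of the truth.

    `reference_length` is a per-image scale, for example a torso length in pixels.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not ground_truth:
        return 0.0
    tolerance = alpha * reference_length
    correct = sum(
        1 for p, g in zip(predicted, ground_truth) if math.dist(p, g) <= tolerance
    )
    return correct / len(ground_truth)


# Four hypothetical keypoints (pixel coordinates); one prediction is far off.
predicted = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
annotated = [(1.0, 0.0), (10.0, 11.5), (20.0, 30.0), (30.0, 31.0)]
print(pck(predicted, annotated, reference_length=10.0))
```

Because the score is a single number per model, it makes two trained models directly comparable on the same annotated set.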

A model deployment process was also created to easily deploy and update ML models in production. We have developed pose assessment software that uses a keypoint detection model for posture analysis, and we have carried out a field test with it. A built-in feature makes it possible to continuously monitor the performance of the ML model and to collect additional training data.

Future direction

We have gained valuable experience designing and developing ML-based applications using MLOps practices. This experience will provide great benefits in other ML projects, some of which are already underway.

Architecture of the use case system | © Vitarex

Siemens – Machine Learning in Visual Quality Inspection for field engineers

Our company in short

Siemens AG is a global technology company with a focus on industry, infrastructure, transport, and healthcare. With around 2100 employees worldwide,
the Technology division plays a key role in R&D within Siemens. It covers a wide range of research fields, including software development, electronic engineering, energy, sensor technology, automation, medical informatics, and imaging, as well as information and communications technology.
Siemens has pursued the technology field of Data Analytics and Artificial Intelligence for more than 30 years, creating innovative solutions and
new business opportunities in many divisions and products.

Our business problem and machine learning

While the different divisions of Siemens might choose different solutions to a problem, a common denominator is the operation of such systems.
ML can be one of many engineering approaches to solving a customer problem, but it often comes with operational complexity. This is one of the reasons we joined the IML4E research project: we want to master the challenges that arise with MLOps with respect to trustworthiness, scaling, and integration with systems engineering.

While there are many possible use cases in the different sectors we operate in, we chose a Visual Quality Inspection (VQI) use case as a reference within
the project. Although this specific use case is not representative of all our machine learning efforts, it is a complex and recurring one on the shop floor,
in buildings, and in the field. Our industry partners expect solutions that enhance their operations without interfering with them, which is a high expectation
for a machine learning system.

With the technical and soft skills acquired in IML4E, we will enable our employees to bring innovations to our customers.

Our research and solutions

While guiding and evaluating the technical work in the project, we took the opportunity to try the tools and frameworks in other projects throughout
the company, connecting the different MLOps efforts at Siemens and creating a shared knowledge pool.

Additionally, we integrated specific tools into the internal developer platforms, so that developers have an easy and proven way to integrate with these services. Our inner source approach facilitates reuse and brings best practices without operational overhead, since the tools fit well into the Siemens ecosystem.

Future direction and business benefits

We will proceed by leveraging the knowledge built within the project to extend our internal platforms and training offerings. Engineers will be able
to create ML systems faster and more reliably, and thereby create more value for our customers and partners. With new opportunities like GenAI and
other advances, we are well equipped to build the future of automation.