How to Automate Data Extraction from Bank Statements

Using a custom-trained AI model

Walid Amamou


Image by Racool_studio on Freepik

In accounting, extracting data from bank statements is an important task that supports efficient and accurate processing of financial transactions. It matters all the more now that data is growing at an unprecedented rate and manual data entry is becoming increasingly inefficient.

In this tutorial, we will learn how to automate data extraction from bank statements using custom-trained AI models and automated table extraction.

Table Extraction

Bank statements generally list the financial transactions in a table, accompanied by unstructured text such as the address, bank name, and statement period at the beginning of the statement.

Bank statement example

An NLP model can be trained to automatically recognize and extract specific types of information from unstructured documents, such as amounts, dates, and the statement period. However, training it to extract organized tabular data is not the most efficient use of time. For that purpose, it is more efficient to use pre-trained table extraction APIs, such as those from Microsoft Azure or AWS, since they have been trained on millions of examples.
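To make the table step more concrete, here is a minimal sketch of how the output of a table extraction API might be turned into transaction records. It assumes the service returns each cell as a (row index, column index, text) record and that the first row holds the column headers. The `Cell` format and the function names `cells_to_rows` and `rows_to_records` are illustrative assumptions, not part of UBIAI or any specific vendor API.

```python
from typing import Dict, List, Tuple

# Assumed cell format: (row index, column index, cell text).
# Real APIs return richer objects; this is a simplified stand-in.
Cell = Tuple[int, int, str]


def cells_to_rows(cells: List[Cell]) -> List[List[str]]:
    """Arrange flat cell records into a rectangular grid of strings."""
    if not cells:
        return []
    n_rows = max(r for r, _, _ in cells) + 1
    n_cols = max(c for _, c, _ in cells) + 1
    # Cells missing from the input stay as empty strings.
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for r, c, text in cells:
        grid[r][c] = text.strip()
    return grid


def rows_to_records(grid: List[List[str]]) -> List[Dict[str, str]]:
    """Treat the first row as the header and return one dict per data row."""
    if len(grid) < 2:
        return []
    header = grid[0]
    return [dict(zip(header, row)) for row in grid[1:]]


# Made-up example cells for a two-row table.
example_cells = [
    (0, 0, "Date"), (0, 1, "Description"), (0, 2, "Amount"),
    (1, 0, "01/02"), (1, 1, "Coffee shop"), (1, 2, "-3.50"),
]
transactions = rows_to_records(cells_to_rows(example_cells))
```

Each entry in `transactions` is then a dictionary keyed by the column headers, which is easier to load into a spreadsheet or accounting system than raw cell coordinates.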

Below is an example of automated table extraction using UBIAI, based on the Microsoft Azure API:

UBIAI’s table extraction

AI Model Training

Now that we can reliably extract the tables, we can train our AI model to extract the relevant information located at the top of the statement. With the UBIAI Annotation Tool, this can be done quite easily by labeling just 5 documents to train the AI model.

UBIAI OCR Labeling Interface
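Once both pieces are available, the header fields found by the trained model and the transactions from the table step can be combined into a single record per statement. The sketch below assumes the model returns entities as dictionaries with `label` and `text` keys. The labels `BANK_NAME` and `STATEMENT_PERIOD`, the function name `build_statement`, and the sample values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def build_statement(
    entities: List[Dict[str, str]],
    transactions: List[Dict[str, str]],
) -> Dict[str, object]:
    """Merge model entities and table rows into one structured record."""
    header: Dict[str, str] = {}
    for ent in entities:
        # Keep the first value seen for each label; later duplicates are ignored.
        header.setdefault(ent["label"], ent["text"])
    return {"header": header, "transactions": transactions}


# Made-up model output and table rows for illustration.
example_entities = [
    {"label": "BANK_NAME", "text": "Example Bank"},
    {"label": "STATEMENT_PERIOD", "text": "01/01/2023 - 01/31/2023"},
]
example_transactions = [
    {"Date": "01/02", "Description": "Coffee shop", "Amount": "-3.50"},
]
statement = build_statement(example_entities, example_transactions)
```

Keeping the first value per label is a simplification. A real pipeline might instead keep the highest-confidence prediction when the model reports scores.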



Walid Amamou

Founder of UBIAI, an annotation tool for NLP applications | PhD in Physics.