Currently, I have a desktop application that handles the scanning, interpretation, and transfer of documents to a GED (electronic document management) system, along with a tracking tool that identifies tasks that fail during the transfer step that follows interpretation. This application will no longer be supported after December 12, 2024, but I will keep working on the transfer module.
For each transfer task, I receive a folder containing several subfolders (e.g., archive1, archive2). Each archive contains documents made up of scanned images and metadata files. The current solution generates a .arc file that bundles an XML metadata file with a PDF compiling all of the document images. This .arc file is then sent to the GED system over the XFTP protocol, into a specific directory on the transfer machine.
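To make that input layout concrete, here is a minimal sketch (in Java, since the new stack will be Spring-based) that walks a task folder and lists the files inside each archive subfolder. The task path and the method name scanTask are my own; the .arc generation itself is not shown, since its format belongs to the current solution.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TaskFolderScanner {

    // Lists the archive subfolders of a transfer task and the files each one contains.
    public static void scanTask(Path taskRoot) throws IOException {
        try (DirectoryStream<Path> archives = Files.newDirectoryStream(taskRoot, Files::isDirectory)) {
            for (Path archive : archives) {
                System.out.println("Archive: " + archive.getFileName());
                try (DirectoryStream<Path> files = Files.newDirectoryStream(archive)) {
                    for (Path file : files) {
                        System.out.println("  " + file.getFileName());
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        scanTask(Paths.get("/data/transfer-tasks/task-001")); // hypothetical task path
    }
}
```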
In the new solution, I'd like to move to a web application so that multiple users can access the interface for initiating transfer processing. I plan to use Spring Boot and Spring Batch to handle the processing itself. The web interface will let users select the task to transfer, and the batch process will then point at the corresponding task path.
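As a sketch of what I have in mind, the controller below launches a Spring Batch job with the selected task path passed as a job parameter. The names transferJob, taskPath, and the /transfers endpoint are placeholders of mine, not a fixed design.

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class TransferController {

    private final JobLauncher jobLauncher;
    private final Job transferJob; // the Spring Batch job doing interpretation + transfer

    public TransferController(JobLauncher jobLauncher, Job transferJob) {
        this.jobLauncher = jobLauncher;
        this.transferJob = transferJob;
    }

    // The user picks a task in the UI; its path is handed to the batch job as a parameter.
    @PostMapping("/transfers")
    public ResponseEntity<String> startTransfer(@RequestParam String taskPath) throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addString("taskPath", taskPath)
                .addLong("launchedAt", System.currentTimeMillis()) // makes each launch unique
                .toJobParameters();
        jobLauncher.run(transferJob, params);
        return ResponseEntity.accepted().body("Transfer started for " + taskPath);
    }
}
```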
To improve speed and efficiency, I want to distribute the file transfers across two machines. I'm looking for a way to run the processing in parallel on both machines, with an architecture that makes communication between the modules straightforward. I'd also like the interface to display the document currently being transferred, with the entry disappearing once its transfer completes.
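One option I'm considering is Spring Batch remote partitioning (spring-batch-integration over a broker such as RabbitMQ): a Partitioner on the manager machine decides which archives each worker handles, and the broker carries the step requests between the machines. Here is a minimal sketch of such a Partitioner, assuming each worker reads its archive list from the step's ExecutionContext; the key name archivePaths is my own.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

// Splits a task's archive subfolders across worker steps (gridSize = 2 for two machines).
public class ArchivePartitioner implements Partitioner {

    private final Path taskRoot;

    public ArchivePartitioner(Path taskRoot) {
        this.taskRoot = taskRoot;
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        List<Path> archives;
        try (Stream<Path> entries = Files.list(taskRoot)) {
            archives = entries.filter(Files::isDirectory).collect(Collectors.toList());
        } catch (IOException e) {
            throw new IllegalStateException("Cannot list archives under " + taskRoot, e);
        }

        // Round-robin: archive i goes to worker i % gridSize.
        StringBuilder[] buckets = new StringBuilder[gridSize];
        for (int i = 0; i < gridSize; i++) {
            buckets[i] = new StringBuilder();
        }
        for (int i = 0; i < archives.size(); i++) {
            StringBuilder bucket = buckets[i % gridSize];
            if (bucket.length() > 0) {
                bucket.append(',');
            }
            bucket.append(archives.get(i));
        }

        Map<String, ExecutionContext> partitions = new HashMap<>();
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext ctx = new ExecutionContext();
            ctx.putString("archivePaths", buckets[i].toString()); // each worker reads this key
            partitions.put("partition" + i, ctx);
        }
        return partitions;
    }
}
```

For the live view, I imagine the workers publishing progress events that the web application pushes to the browser over server-sent events or WebSocket, removing the entry when a completion event arrives; but I'm open to other approaches.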
I also plan to create a small database to store each document's data. I'm wondering whether a high-performance database is really necessary, given that it will need to be hosted remotely.
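For scale, I'm picturing something like one row per document. A hypothetical JPA entity, assuming Spring Boot 3 (Jakarta Persistence); every field name here is an assumption based on the description above:

```java
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import java.time.Instant;

// One row per transferred document; hypothetical fields based on the workflow described above.
@Entity
public class TransferredDocument {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String taskPath;      // the transfer task this document belongs to
    private String archiveName;   // e.g. archive1, archive2
    private String documentName;
    private String status;        // e.g. PENDING, TRANSFERRING, DONE, FAILED
    private Instant transferredAt;

    protected TransferredDocument() { } // no-arg constructor required by JPA

    public TransferredDocument(String taskPath, String archiveName, String documentName) {
        this.taskPath = taskPath;
        this.archiveName = archiveName;
        this.documentName = documentName;
        this.status = "PENDING";
    }

    // getters and setters omitted for brevity
}
```

Given a schema this small, my assumption is that write volume, not raw database performance, is the real constraint, but I'd welcome opinions on that.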
Thank you for your help and any ideas on architecture and processing optimization.