Issue with Document Builder Memory Usage

Hello,

As mentioned in other posts, I am developing an HTTP server to generate documents using Document Builder. (More details can be found in these threads: "Issue with TOC update after using MergeCell" and "Running multiple DocBuilder scripts with different arguments".)

Issue with Fonts Directory

My HTTP server needs to generate documents simultaneously. Some documents require custom fonts, and I noticed an issue:

  • If I do not set the fonts directory using .SetProperty (documentation: SetProperty API) before the first document generation, subsequent documents that require custom fonts do not use them.

  • The following command ensures that custom fonts are recognized:

    builder.SetProperty("--fonts-dir", "/usr/src/app/fonts")
    
  • I resolved this issue by setting the property at the server initialization, like this:

    import docbuilder

    def initialize_docbuilder():
        # Called once at server startup, before any document is generated.
        builder = docbuilder.CDocBuilder()
        builder.SetProperty("--fonts-dir", "/usr/src/app/fonts")
    

This is not really my question, but I think it gives a bit more context for what I’m trying to ask.

Memory Usage Issue

My main question is: How exactly does Document Builder work in terms of memory management?

  • My HTTP server must generate documents simultaneously, so I use threads.

  • Each thread initializes its own instance of Document Builder, like this:

    builder = docbuilder.CDocBuilder()
    ... # Some work is done here
    builder.CloseFile()
    
  • However, after processing multiple document generation requests, I noticed that the server’s RAM usage keeps increasing. Even when the server is idle and waiting for new requests, the memory usage remains high.

  • It appears that each document generation request increases memory consumption, but the memory is not released afterward.

Questions & Additional Information

  1. Is there a recommended way to properly clean up Document Builder instances to prevent memory leaks? I’ve tried the .Initialize and .Dispose API methods, but this only crashes Python with exit code 139 (a segmentation fault).
  2. Does Document Builder have an internal caching mechanism that retains data even after CloseFile() is called?
  3. What is the best approach for handling multiple instances of Document Builder in a multi-threaded HTTP server?
  4. Does Document Builder keep running in the background after it has been executed for the first time?
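For question 3, one library-agnostic option is to run each generation job in a short-lived child process, so that whatever memory and background threads the job creates are returned to the OS when the child exits. The sketch below only illustrates that idea and is not a confirmed recommendation for Document Builder. The helper name `run_isolated` and the placeholder job are hypothetical; a real job would import docbuilder and call it inside the child.

```python
import subprocess
import sys


def run_isolated(code, *args, timeout=120):
    """Run a Python snippet in a fresh interpreter and return its stdout.

    Everything the snippet allocates (heap, native threads, caches) is
    released by the operating system when the child process exits.
    """
    completed = subprocess.run(
        [sys.executable, "-c", code, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the child fails or crashes
    )
    return completed.stdout


# Placeholder job (hypothetical): a real one would build a document here.
PLACEHOLDER_JOB = "import sys; print(sys.argv[1].upper())"

if __name__ == "__main__":
    print(run_isolated(PLACEHOLDER_JOB, "report"), end="")
```

The trade-off is the start-up cost of a new interpreter per request, in exchange for a parent server process whose memory stays flat regardless of what the child does.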

Environment Details:

  • DocumentBuilder version: latest, using the Python Builder.Framework with the .Run() method (documentation here)
  • Installation method: Downloaded the package from the official page and installed it using sudo dnf install <package name>.
  • OS: Rocky Linux 9 in a Docker container

Hello @mrmikept

Thank you very much for the detailed description. We will perform some tests in order to reproduce the issue and analyze it.

I will keep you posted about the results or I will let you know if we require some additional information.

Hi @Constantine,

I’ve conducted additional tests and noticed that when the document builder (at least in the Python wrapper library) is executed, it spawns 11 threads that continue running in the background. Could these threads be contributing to the excessive memory usage?

I’ve also tested document generation under the following conditions:

  • Using ONLY the Python wrapper library: The threads were spawned and remained running in the background.
  • Using builder.Run(<path-to-.docbuilder-script>) (Run and RunText): I observed that the threads are created within this method (or when using builder.RunText(...)) and persist in the background.

To rule out other factors, I also ran tests with thread creation for incoming requests disabled in my service, so that the memory usage could not be caused by my implementation (I’ve verified that my service does not create any threads). Despite this, I still observed a noticeable increase in memory consumption.
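For anyone who wants to reproduce these observations, measurements of this kind can be taken from inside the process on Linux by reading /proc/self/status. This is a minimal sketch, not the exact script used for the tests above; the helper names `parse_status` and `snapshot` are made up for illustration. It reports resident memory and the OS-level thread count, which includes native threads that Python's `threading` module does not see.

```python
import threading


def parse_status(text):
    """Extract resident memory (kB) and OS thread count from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    rss_kb = int(fields["VmRSS"].split()[0]) if "VmRSS" in fields else None
    os_threads = int(fields["Threads"]) if "Threads" in fields else None
    return {"rss_kb": rss_kb, "os_threads": os_threads}


def snapshot():
    """Return memory and thread figures for the current process (Linux only)."""
    try:
        with open("/proc/self/status") as fh:
            result = parse_status(fh.read())
    except FileNotFoundError:
        # No /proc filesystem (non-Linux): leave the OS-level values unknown.
        result = parse_status("")
    # Threads started through Python's threading module only.
    result["python_threads"] = threading.active_count()
    return result


if __name__ == "__main__":
    before = snapshot()
    # ... run one document generation here ...
    after = snapshot()
    print("before:", before)
    print("after: ", after)
```

Taking a snapshot before and after each generation, and again while the server is idle, makes it possible to see whether the resident memory and the OS thread count grow per request or only on the first run.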

Hope this information helps!


Thank you very much for the additional information. We are still working on this, and I’ll let you know once we have any news.