How to delete historical versions of edited files and retain only the latest version

Dear OnlyOffice Team,

We have integrated OnlyOffice into our application to enable editing and previewing of documents uploaded by users to our server. OnlyOffice works exceptionally well; users find it very smooth, and it fully meets our expectations. The only issue we’ve encountered relates to OnlyOffice’s versioning mechanism.

By examining the OnlyOffice working directory on the server (/var/lib/onlyoffice/documentserver/App_Data/cache/files/data), we observed the following workflow for file editing:

  1. When a user first opens a file via OnlyOffice, an Editor.bin file and a media folder are created under App_Data/cache/files/data/${fileKey}/.
  2. After editing, the changes are saved to a new directory: App_Data/cache/files/data/${fileKey}_\d+/. This directory contains three files: changes.zip, changeHistory.json, and output.${fileType}.
  3. Every edit generates a new directory with these three files.

This poses a problem: if the original file is large, multiple edits (even minor ones) will create numerous copies of output.${fileType}, consuming significant storage space.
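To put numbers on the storage cost before changing anything, a read-only sketch like the one below could group cache directories by key and sum their sizes. It assumes the layout described above (a `${fileKey}` directory plus `${fileKey}_<digits>` directories under the data folder). The function names are hypothetical, and nothing is deleted.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: per-edit directories are named "<fileKey>_<digits>", as observed above.
# A fileKey that itself ends in "_<digits>" would be grouped under its prefix.
VERSION_DIR = re.compile(r"^(?P<key>.+)_\d+$")


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files below `path`."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def usage_by_key(cache_root: Path) -> dict:
    """Map each fileKey to the number of cache directories and bytes it occupies."""
    usage = defaultdict(lambda: {"dirs": 0, "bytes": 0})
    for entry in cache_root.iterdir():
        if not entry.is_dir():
            continue
        match = VERSION_DIR.match(entry.name)
        key = match.group("key") if match else entry.name
        usage[key]["dirs"] += 1
        usage[key]["bytes"] += dir_size(entry)
    return dict(usage)
```

Calling `usage_by_key(Path("/var/lib/onlyoffice/documentserver/App_Data/cache/files/data"))` and sorting the result by `bytes` would show which documents account for most of the cache.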

Our Questions:

  1. Can OnlyOffice be configured to retain only the latest version of edits? If so, how?
  2. If configuration isn’t possible, can we manually delete intermediate versions? Based on our analysis of the directory structure before/after edits, we suspect OnlyOffice works as follows:
  • The original file is converted to Editor.bin upon first open and remains unchanged.
  • Each edit generates a changes.zip, which functions like a diff/patch file. The output.${fileType} is generated by applying changes.zip to Editor.bin.

If our understanding is correct, we could safely delete intermediate versions and keep only the latest edit results. We plan to identify the latest version using the last_open_date field in the task_result table of the PostgreSQL database.

Would there be any side effects if we programmatically delete these intermediate versions?

Hello @isNaN

Document Server does indeed store files in its cache, but clearing them manually can cause various issues with opening documents. By default, Document Server clears the cache once every 24 hours (I’ve posted more details about the cleanup process here).

services.CoAuthoring.expire.files is the lifetime of a file in the cache after it has been successfully edited and saved back to the storage. However, Document Server does not delete such files the moment their lifetime ends; it does so on the schedule set in services.CoAuthoring.expire.filesCron. So, by default, at 12 am each day Document Server checks whether its cache contains files that are 24 hours old or older and deletes them.

If you want, you can change the cron schedule to suit your needs so that files are removed from the cache more frequently. I’d suggest sticking to this approach because, as I mentioned before, manually removing items from the cache may cause serious problems with the integration.
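As an illustration of that approach, the sketch below builds an override for the two settings named above and merges it into an existing configuration dictionary without disturbing other keys. The helper names are hypothetical. The values (a one-hour lifetime and an hourly schedule) are examples only, and the six-field cron string with a leading seconds field is an assumption that should be checked against the default configuration of your Document Server version. The override file is typically /etc/onlyoffice/documentserver/local.json on Linux packages, which is also an assumption about your installation.

```python
import json
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` merged in, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def expire_override(files_seconds: int, files_cron: str) -> dict:
    """Build the services.CoAuthoring.expire fragment for the two settings above."""
    return {
        "services": {
            "CoAuthoring": {
                "expire": {"files": files_seconds, "filesCron": files_cron}
            }
        }
    }


# Example values only: expire cached files after one hour, check at the top of each hour.
example = expire_override(3600, "0 0 * * * *")
print(json.dumps(example, indent=2))
```

Merging the fragment with `deep_merge(existing_config, example)` instead of overwriting the file keeps any other local settings intact. Document Server services generally need a restart before configuration changes take effect.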

Hi Constantine,

Thank you very much for your reply. The information you provided was very helpful and gave me a deeper understanding of the inner workings of OnlyOffice. This will greatly assist me in my subsequent work.

However, this information still doesn’t fully resolve my issue. My original intention in asking this question was to address the problem where the same document is repeatedly converted to Editor.bin during multiple editing sessions and then saved as output.{fileType} after each edit. I hope to achieve the following:

  1. If a file is edited multiple times, it should not require re-generating Editor.bin each time it is reopened for preview after editing, as this conversion process consumes significant server resources, especially for large files spanning hundreds of pages.
  2. If a file is edited multiple times, only the final edited version should be saved, discarding intermediate versions.

To tackle this issue, I initially tried reusing the same key for the same file after editing. However, this caused significant side effects: the newly edited content wouldn’t appear upon reopening, and clicking “Edit” resulted in Error 4004 (isSaveLocked).
After reviewing the official API documentation (Co-editing | ONLYOFFICE), I understood that the problem was caused by not generating a new key after edits. I modified the key algorithm to generate a new key after each edit, which resolved the issue. However, this introduced a new problem: after editing, reopening the edited file still triggers reconversion to Editor.bin and stores the result in the /App_Data/cache/files/data directory. This results in a poor user experience when reopening large edited files.

The ideal approach would be: When saving multiple versions of a file, OnlyOffice could recalculate a new Editor.bin based on the existing Editor.bin and changes.zip, eliminating the need to reconvert from the new version file. Is this optimization feasible? Alternatively, are there other solutions to achieve similar results?

I hope my intentions are clear. To summarize:

  1. After a file is edited, is there a way to skip or accelerate the Editor.bin conversion process when reopening it, to reduce server overhead?
  2. After multiple edits to a file, is it possible to discard intermediate versions of Editor.bin and output.{fileType}, retaining only the files corresponding to the latest edit?

I look forward to your reply.

You see, the document is put into the cache for every session because you are generating a new key each time. The key is the unique identifier of a document in the cache, so once a session has ended and the file has been saved back to the storage, the next opening uses a new key and creates a new folder in the cache. This is expected given the way Document Server works with keys: a new key means a new folder.
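One common way to follow this rule on the integrator side is to derive the key from the file's identity and its current revision, so the key changes only when the stored file changes. The sketch below is an illustration under stated assumptions, not Document Server's own logic: `file_id` and `revision` are hypothetical fields from your application (for example, a database id and a version counter or last-modified timestamp).

```python
import hashlib


def document_key(file_id: str, revision: str) -> str:
    """Derive a key that is stable for an unchanged file and new after each save.

    `file_id` and `revision` are hypothetical application-side values.
    The result is a 64-character hex string, which stays within the key
    length and character restrictions described in the ONLYOFFICE API docs.
    """
    # A separator that cannot appear in either field avoids ambiguous joins,
    # e.g. ("ab", "c") versus ("a", "bc").
    material = f"{file_id}\x00{revision}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()
```

With this scheme, reopening a file that has not been saved since the last session produces the same key, while every successful save (which bumps `revision`) produces a new one.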

Document Server does not treat documents in the cache as “versions”, so when it checks the cache for deletion it does not consider how many changes were made to a file. If you open a file with a new key, make some small edits and close the file, Document Server will gather those edits and return the edited file to the storage for saving.

I see no issue with Document Server’s logic on that point, which is why I suggested changing the lifetime of files in the cache in general.