Automatic backup on third party storage stopped triggering

irdi · 23 August 2024 12:21

Do you want to: Report a bug

DocSpace version: 2.6.1

Type of installation of the DocSpace: deb/rpm

Installation guide used: Installing ONLYOFFICE DocSpace using the provided script - ONLYOFFICE

OS: Ubuntu 22.04

Browser version: Firefox browser 129.0 (64-bit)

Additional information: Automatic backups were working fine until 20 August and then simply stopped. No changes were made to the setup or the services during this time.

The only logs that might be relevant are these:

2024-08-23 11:01:31,103|ERROR|[121]|ASC.Core.Common.Hosting.RegisterInstanceWorkerService - Critical error forced worker to shutdown|System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at ASC.Core.Common.Hosting.RegisterInstanceWorkerService`1.ExecuteAsync(CancellationToken stoppingToken)
2024-08-23 11:01:31,108|ERROR|[49]|ASC.Core.Common.Hosting.RegisterInstanceWorkerService - Critical error forced worker to shutdown|System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at ASC.Core.Common.Hosting.RegisterInstanceWorkerService`1.ExecuteAsync(CancellationToken stoppingToken)
2024-08-23 11:01:31,169|WARN|[53]|ASC.EventBus.RabbitMQ.EventBusRabbitMQ - RabbitMQ: model is shutdown: (null)|System.Exception: AMQP close-reason, initiated by Application, code=200, text='Goodbye', classId=0, methodId=0
2024-08-23 11:18:25,164|ERROR|[64]|ASC.Core.Common.Hosting.RegisterInstanceWorkerService - Critical error forced worker to shutdown|System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at ASC.Core.Common.Hosting.RegisterInstanceWorkerService`1.ExecuteAsync(CancellationToken stoppingToken)
2024-08-23 11:18:25,195|ERROR|[67]|ASC.Core.Common.Hosting.RegisterInstanceWorkerService - Critical error forced worker to shutdown|System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at ASC.Core.Common.Hosting.RegisterInstanceWorkerService`1.ExecuteAsync(CancellationToken stoppingToken)
2024-08-23 11:18:25,303|WARN|[67]|ASC.EventBus.RabbitMQ.EventBusRabbitMQ - RabbitMQ: model is shutdown: (null)|System.Exception: AMQP close-reason, initiated by Application, code=200, text='Goodbye', classId=0, methodId=0

Any idea how to fix this issue?
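
When going through a whole logs folder, it can help to count entries per level and logger before reading individual stack traces. The following is a minimal sketch, not an official tool: the line layout is inferred only from the excerpt above, and the names `LINE_RE` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, inferred from the excerpt above:
# "<date> <time>,<ms>|<LEVEL>|[<thread>]|<logger> - <message>|<exception>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}"
    r"\|(?P<level>[A-Z]+)\|\[\d+\]\|(?P<logger>\S+) - "
)


def summarize(lines):
    """Count log entries per (level, logger).

    Lines that do not start with a timestamp (for example the indented
    "at ..." stack frames) are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts
```

To use it, open any file from /var/log/onlyoffice/docspace, pass the file object to `summarize`, and print `counts.most_common()` to see which services log the most errors.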

Alexandre · 27 August 2024 09:23

Hello @irdi
Sorry for the late reply.
Please show us your settings on the ‘Automatic backup’ page and provide us with the entire DocSpace logs folder: /var/log/onlyoffice/docspace
Additionally, please go to the host and run these commands:
apt list | grep docspace
apt list --installed | grep onlyoffice

irdi · 27 August 2024 09:34

apt list | grep docspace

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

docspace-api-system/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-api/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-backup-background/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-backup/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-clear-events/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-common/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-doceditor/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-files-services/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-files/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-healthchecks/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-login/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-migration-runner/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-notify/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-people-server/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-proxy/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-radicale/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-socket/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-ssoauth/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-studio-notify/squeeze,now 2.6.0.432 all [installed,automatic]
docspace-studio/squeeze,now 2.6.0.432 all [installed,automatic]
docspace/squeeze,now 2.6.0.432 all [installed]

apt list --installed | grep onlyoffice

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

onlyoffice-documentserver-ee/squeeze,now 8.1.1-26 amd64 [installed]

Screenshot of automatic backup page:

Docspace logs:

https://cloud.redlinedcs.com/s/ZscYR3YaR7JrQ8X

Thanks for the support!

Alexandre · 27 August 2024 10:13

Could you please check the RabbitMQ service status? Please run the rabbitmqctl status command and show us the output.
Additionally, provide us with the RabbitMQ logs folder. It should be located here: /var/log/rabbitmq/

irdi · 27 August 2024 11:19

Status of node rabbit@denalidocspace ...
Runtime

OS PID: 880
OS: Linux
Uptime (seconds): 961866
Is under maintenance?: false
RabbitMQ version: 3.9.13
Node name: rabbit@denalidocspace
Erlang configuration: Erlang/OTP 24 [erts-12.2.1] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit]
Erlang processes: 862 used, 1048576 limit
Scheduler run queue: 1
Cluster heartbeat timeout (net_ticktime): 60

Plugins

Enabled plugin file: /etc/rabbitmq/enabled_plugins
Enabled plugins:


Data directory

Node data directory: /var/lib/rabbitmq/mnesia/rabbit@denalidocspace
Raft data directory: /var/lib/rabbitmq/mnesia/rabbit@denalidocspace/quorum/rabbit@denalidocspace

Config files


Log file(s)

 * /var/log/rabbitmq/rabbit@denalidocspace.log
 * /var/log/rabbitmq/rabbit@denalidocspace_upgrade.log
 * <stdout>

Alarms

(none)

Memory

Total memory used: 0.1522 gb
Calculation strategy: rss
Memory high watermark setting: 0.4 of available memory, computed to: 13.1426 gb

binary: 0.1037 gb (53.31 %)
code: 0.032 gb (16.44 %)
other_system: 0.0203 gb (10.44 %)
other_proc: 0.0193 gb (9.94 %)
connection_other: 0.0065 gb (3.32 %)
allocated_unused: 0.0044 gb (2.25 %)
other_ets: 0.0031 gb (1.58 %)
connection_readers: 0.0023 gb (1.19 %)
atom: 0.0013 gb (0.69 %)
queue_procs: 5.0e-4 gb (0.25 %)
metrics: 4.0e-4 gb (0.2 %)
connection_channels: 3.0e-4 gb (0.14 %)
mnesia: 2.0e-4 gb (0.1 %)
msg_index: 1.0e-4 gb (0.06 %)
connection_writers: 1.0e-4 gb (0.05 %)
plugins: 0.0 gb (0.02 %)
quorum_ets: 0.0 gb (0.02 %)
stream_queue_procs: 0.0 gb (0.0 %)
stream_queue_replica_reader_procs: 0.0 gb (0.0 %)
mgmt_db: 0.0 gb (0.0 %)
queue_slave_procs: 0.0 gb (0.0 %)
quorum_queue_procs: 0.0 gb (0.0 %)
reserved_unallocated: 0.0 gb (0.0 %)
stream_queue_coordinator_procs: 0.0 gb (0.0 %)

File Descriptors

Total: 50, limit: 65439
Sockets: 39, limit: 58893

Free Disk Space

Low free disk space watermark: 0.05 gb
Free disk space: 193.8006 gb

Totals

Connection count: 39
Queue count: 15
Virtual host count: 1

Listeners

Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0

Logs:
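
The "Uptime (seconds)" value in the status output can be turned into an approximate broker start time, which helps rule a RabbitMQ restart in or out as the trigger. A minimal sketch, assuming the command was run at about the time of the post (27 August 2024 11:19, time zone unknown); the function name `broker_start` is made up for illustration.

```python
from datetime import datetime, timedelta


def broker_start(observed_at, uptime_seconds):
    """Return the approximate moment the broker started.

    observed_at is when `rabbitmqctl status` was run; here it is only
    an assumption based on the post timestamp.
    """
    return observed_at - timedelta(seconds=uptime_seconds)


if __name__ == "__main__":
    # 961866 is the uptime reported in the status output above.
    print(broker_start(datetime(2024, 8, 27, 11, 19), 961866))
```

Under that assumption the broker has been running since the morning of 16 August 2024, several days before the backups stopped on 20 August, so a broker restart does not appear to coincide with the failure.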

Alexandre · 27 August 2024 12:03

Thank you! We’re checking the situation.

Alexandre · 27 August 2024 13:30

Hello @irdi
Please clarify a few things:

Does manual backup creation work?
Have you tried to resave the backup settings?
What is the amount of data on the portal?
Have you changed any config files on the server side? For example, RabbitMQ settings?

irdi · 27 August 2024 15:06

This is the second time this has happened. The first time, it started working again without any intervention, and now it has stopped without any intervention. It is a serious issue, especially if someone does not check backups regularly, and it could cause data loss.

Tests I have done so far include changing the backup time and saving again, setting the backup times one hour apart for each Spaces instance, and restarting everything.

This pops up in the logs when triggering a manual backup. The backup is successful, but this error is still logged.

2024-08-27 09:25:43,074|ERROR|[32]|ASC.EventBus.RabbitMQ.EventBusRabbitMQ - ----- ERROR Processing message "BackupRequestIntegrationEvent { Id = b8c16bd6-12c8-4780-b309-c330f780f092, CreateOn = 8/27/2024 9:25:43 AM, CreateBy = 03f56b9a-f807-486f-b479-67373f3bb537, Redelivered = False, TenantId = 15, StorageType = ThirdPartyConsumer, StorageParams = System.Collections.Generic.Dictionary`2[System.String,System.String], IsScheduled = False, BackupsStored = 0, StorageBasePath = , ServerBaseUri = , Dump = False, TaskId = b6a88c94-b093-422a-abec-d9824746a414 }"|ASC.EventBus.Exceptions.IntegrationEventRejectExeption: Exception of type 'ASC.EventBus.Exceptions.IntegrationEventRejectExeption' was thrown.
   at ASC.Data.Backup.IntegrationEvents.EventHandling.BackupRequestedIntegrationEventHandler.Handle(BackupRequestIntegrationEvent event)
   at ASC.EventBus.RabbitMQ.EventBusRabbitMQ.ProcessEvent(String eventName, IntegrationEvent event) in /home/jenkins/workspace/appserver.deb/install/deb/debian/build/server/common/ASC.EventBus.RabbitMQ/EventBusRabbitMQ.cs:line 435
   at ASC.EventBus.RabbitMQ.EventBusRabbitMQ.ProcessEvent(String eventName, IntegrationEvent event) in /home/jenkins/workspace/appserver.deb/install/deb/debian/build/server/common/ASC.EventBus.RabbitMQ/EventBusRabbitMQ.cs:line 410
   at ASC.EventBus.RabbitMQ.EventBusRabbitMQ.Consumer_Received(Object sender, BasicDeliverEventArgs eventArgs) in /home/jenkins/workspace/appserver.deb/install/deb/debian/build/server/common/ASC.EventBus.RabbitMQ/EventBusRabbitMQ.cs:line 279

Alexandre · 27 August 2024 15:34

Thank you, we are looking into it.

Alexandre · 28 August 2024 10:24

Hello @irdi
Could you please double-check that there’s enough free space on the AWS side where you save backups?

irdi · 28 August 2024 12:17

Yes, there is enough space.

Alexandre · 28 August 2024 12:45

Please go to the host and restart backup services with these commands:

systemctl restart docspace-backup-background.service
systemctl restart docspace-backup.service

After that, please check whether the situation has changed.

irdi · 28 August 2024 12:48

Thanks for the tip, but I have already tried that, and also restarting everything. I think triggering a manual backup will re-activate the automatic backups. I'll report back tomorrow on whether it worked.

irdi · 29 August 2024 08:43

I confirm that triggering a manual backup after reconfiguring the automatic backups makes them work again.

The question is why this happens in the first place. This could cause data loss if someone does not notice that the automatic backups are stuck for no apparent reason.
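
Until the root cause is known, an independent freshness check can at least make a stalled schedule visible. The sketch below is an illustration, not a DocSpace feature: `backup_is_stale` is a made-up name, the 26-hour default assumes a daily schedule with some slack, and collecting the backup timestamps (for example by listing the storage bucket) is left to the reader.

```python
from datetime import datetime, timedelta, timezone


def backup_is_stale(backup_times, now, max_age=timedelta(hours=26)):
    """Return True when the newest backup is older than max_age.

    backup_times: iterable of datetimes, one per existing backup file.
    An empty collection also counts as stale.
    """
    backup_times = list(backup_times)
    if not backup_times:
        return True
    return now - max(backup_times) > max_age


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical timestamps; in practice they come from the storage listing.
    known = [now - timedelta(days=3), now - timedelta(days=2)]
    if backup_is_stale(known, now):
        print("WARNING: no recent backup found")
```

Run from cron or a monitoring agent, a check like this raises an alert within about a day of the schedule stalling, instead of relying on someone noticing it.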

Alexandre · 29 August 2024 08:50

Hello @irdi
Thank you for the provided details.
We are still investigating the situation. I will contact you as soon as possible.

Alexandre · 29 August 2024 09:15

Dear @irdi
Please notify us if you face the same issue again.

irdi · 16 December 2024 12:22

This is happening again out of the blue. It worked fine until 14 December.

In the RabbitMQ logs, I see some bad header errors.

2024-12-16 06:14:19.713867+00:00 [error] <0.2430.18> {bad_header,<<"GET /squ">>}
2024-12-16 10:02:11.481543+00:00 [error] <0.8350.18> {bad_header,<<"MGLNDD_5">>}

Manual backup triggering works fine.
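
RabbitMQ logs bad_header when the first bytes received on a connection are not an AMQP protocol header, and "GET /squ" looks like the start of an HTTP request. Since the status output earlier in the thread shows the AMQP listener on [::]:5672, something other than DocSpace (for example a port scanner or a misdirected HTTP client) may be connecting to that port. This is an inference, not something the logs confirm. A minimal sketch for sorting such entries; the names `BAD_HEADER_RE`, `HTTP_METHODS` and `classify_bad_header` are made up for illustration.

```python
import re

# Accepts straight or typographic quotes, since forum software may convert them.
BAD_HEADER_RE = re.compile(r"\{bad_header,<<[\"“](?P<payload>.*?)[\"”]>>\}")

HTTP_METHODS = {"GET", "POST", "PUT", "HEAD", "DELETE",
                "OPTIONS", "PATCH", "CONNECT", "TRACE"}


def classify_bad_header(log_line):
    """Classify the payload of a RabbitMQ bad_header log entry.

    Returns None when the line is not a bad_header entry, "http-request"
    when the payload starts with an HTTP method, otherwise "other-non-amqp".
    """
    match = BAD_HEADER_RE.search(log_line)
    if match is None:
        return None
    first_token = match.group("payload").split(" ")[0]
    if first_token in HTTP_METHODS:
        return "http-request"
    return "other-non-amqp"
```

If most entries turn out to be HTTP requests or other foreign probes, they are probably noise unrelated to the backup scheduler, and restricting access to port 5672 is worth considering separately.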

irdi · 17 December 2024 13:32

It seems there are some other logs:

2024-12-17 13:08:36,238|ERROR|[61]|ASC.EventBus.RabbitMQ.EventBusRabbitMQ - ----- ERROR Processing message "BackupRequestIntegrationEvent { Id = 122314e4-2c9a-4e14-aebe-24ef788817d7, CreateOn = 12/17/2024 1:08:36 PM, CreateBy = a37ee56e-3302-4a7b-b67e-ddbea64cd032, Redelivered = False, TenantId = 8, StorageType = ThirdPartyConsumer, StorageParams = System.Collections.Generic.Dictionary`2[System.String,System.String], IsScheduled = True, BackupsStored = 30, StorageBasePath = , ServerBaseUri = , Dump = False, TaskId =  }"|ASC.EventBus.Exceptions.IntegrationEventRejectExeption: Exception of type 'ASC.EventBus.Exceptions.IntegrationEventRejectExeption' was thrown.

From what I understand, it seems it was unable to communicate with the third-party storage provider, but triggering manual backups does not give any errors.
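
Both the manual event from 27 August and the scheduled event above were rejected with the same exception, so comparing their fields side by side may narrow things down. The sketch below only parses the text as it appears in these log lines. The field layout is inferred from the two excerpts in this thread, and `parse_backup_event` and `diff_events` are made-up names.

```python
import re

EVENT_RE = re.compile(r"BackupRequestIntegrationEvent \{(?P<body>.*?)\}")


def parse_backup_event(log_line):
    """Extract the 'Key = Value' pairs of a logged backup event as a dict.

    Returns None when the line contains no such event. Empty values
    (for example 'TaskId =  ') become empty strings.
    """
    match = EVENT_RE.search(log_line)
    if match is None:
        return None
    fields = {}
    for part in match.group("body").split(", "):
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def diff_events(a, b, ignore=("Id", "CreateOn", "CreateBy", "TenantId")):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k))
            for k in keys
            if k not in ignore and a.get(k) != b.get(k)}
```

Applied to the two log lines in this thread, the differing fields are IsScheduled, BackupsStored and TaskId (empty in the scheduled event). Whether any of these explains the rejection is not something the logs shown here can confirm.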

Alexandre · 17 December 2024 13:48

Hello @irdi
Let’s start from scratch.
What is your current version of DocSpace, and how exactly have you set up the autobackup option? Please share a screenshot of your settings.
Additionally, please reproduce the issue and provide us with the entire DocSpace logs folder.

irdi · 18 December 2024 10:57

It seems that rebooting the system restarts the auto backup process, but I would really like to understand why the triggers get blocked, and I cannot seem to figure it out. I will provide full logs the next time it happens, before restarting the system.

Thanks for your support.