OnlyOffice Workspace portal doesn't start after server failure

kuzminol · 31 January 2024 08:14

Hello, I have a problem with Onlyoffice Workspace. I have a PC running Ubuntu 20.04.1 LTS, which serves as a server in a local network. A few years ago, Onlyoffice Workspace was installed on it using Docker (communityserver:11.1.0.1506, controlpanel:2.9.1.369, documentserver:6.2.0.123, mysql:5.7.30). Recently, the server had issues with one RAM module, which was replaced. Since then, the portal keeps loading infinitely upon startup. I’m wondering if there is a way to restore the functionality of the portal or at least retrieve the data from it.

I tried following the instructions in the thread "Onlyoffice portal never starts properly". However, the command docker exec -it onlyoffice-community-server service monoserve status returns monoserve: unrecognized service. Restarting the Community Server container with docker restart onlyoffice-community-server did not help. Additionally, the command docker exec -it onlyoffice-community-server service systemd status returns systemd: unrecognized service.

Nikolas · 2 February 2024 16:02

Hello, @kuzminol

Please execute the following commands and provide me with the file logs.txt

Docker Version:

docker -v > logs.txt

Container Information:

docker ps -a >> logs.txt

Logs for the “community-server” container:

docker logs onlyoffice-community-server >> logs.txt
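
The three commands above can be wrapped into one helper. This is only a minimal sketch: collect_logs is a hypothetical name, and the default container name is the one used in this thread. Depending on how the container was started, docker logs may relay the container's stderr on its own stderr stream, so the sketch captures both streams in the file.

```shell
# Hypothetical helper: gather Docker diagnostics into one file.
# Usage: collect_logs [output_file] [container_name]
collect_logs() (
    out=${1:-logs.txt}
    container=${2:-onlyoffice-community-server}
    {
        echo "== docker version =="
        docker -v
        echo "== containers =="
        docker ps -a
        echo "== logs: $container =="
        # docker logs may write container stderr to its own stderr,
        # so capture both streams.
        docker logs "$container" 2>&1
    } > "$out"
)
```

Calling collect_logs with no arguments writes logs.txt in the current directory.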

kuzminol · 6 February 2024 03:53

Hello, @Nikolas
I did everything as you told me as soon as I got to the server.
I can’t upload the file logs.txt to the forum, so I uploaded it to Google Drive
https://drive.google.com/file/d/1SE1TOZq85m3j1w45fYtTDASDY-wO-uu3/view?usp=drivesdk

Thank you!

Nikolas · 8 February 2024 10:38

@kuzminol
Requested access.

kuzminol · 8 February 2024 12:18

@Nikolas , sorry, forgot to do it before. Done.

Nikolas · 13 February 2024 09:12

@kuzminol

Briefly summarizing the logs you provided:

The issue lies in establishing a connection to the onlyoffice-document-server container.
The script is performing a check for the availability of the onlyoffice-document-server host on port 8000 and continues execution only after the connection is established.
Unfortunately, there is no response from the document server, causing the container to restart its configuration process.

...
+ bash /app/assets/tools/wait-for-it.sh onlyoffice-document-server:8000 --quiet -s -- echo 'Document Server is up'
+ sleep 1
+ bash /app/assets/tools/wait-for-it.sh onlyoffice-document-server:8000 --quiet -s -- echo 'Document Server is up'
+ sleep 1
+ bash /app/assets/tools/wait-for-it.sh onlyoffice-document-server:8000 --quiet -s -- echo 'Document Server is up'
+ sleep 1
+ bash /app/assets/tools/wait-for-it.sh onlyoffice-document-server:8000 --quiet -s -- echo 'Document Server is up'
+ sleep 1
+ bash /app/assets/tools/wait-for-it.sh onlyoffice-document-server:8000 --quiet -s -- echo 'Document Server is up'
+ sleep 1
+ bash /app/assets/tools/wait-for-it.sh onlyoffice-document-server:8000 --quiet -s -- echo 'Document Server is up'
+ sleep 1
+ bash /app/assets/tools/wait-for-it.sh onlyoffice-document-server:8000 --quiet -s -- echo 'Document Server is up'
+ echo '##########################################################'
##########################################################
+ echo '#########  Start container configuration  ################'
#########  Start container configuration  ################
+ echo '##########################################################'
##########################################################
+ SERVER_HOST=
+ APP_DIR=/var/www/onlyoffice
+ APP_DATA_DIR=/var/www/onlyoffice/Data
+ APP_INDEX_DIR=/var/www/onlyoffice/Data/Index/v7.4.0
+ APP_PRIVATE_DATA_DIR=/var/www/onlyoffice/Data/.private
+ APP_SERVICES_DIR=/var/www/onlyoffice/Services
................
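
The trace above follows a retry pattern: probe the Document Server port, sleep one second, probe again. A minimal sketch of that pattern follows. It is not the actual Community Server startup script: wait_until is a hypothetical helper, and the nc -z probe in the commented usage line assumes a netcat build that supports -z.

```shell
# Hypothetical helper: run a probe command until it succeeds
# or the attempts run out.
# Usage: wait_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() (
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Probe succeeded: stop waiting.
        "$@" >/dev/null 2>&1 && exit 0
        # Out of attempts: give up.
        [ "$attempt" -ge "$max" ] && exit 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
)

# Example (assumes nc supports -z):
# wait_until 30 1 nc -z onlyoffice-document-server 8000 && echo 'Document Server is up'
```

Unlike the loop in the log, this sketch gives up after a fixed number of attempts, which makes a dead dependency visible instead of waiting forever.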

Let’s take a look at what’s happening inside the document server container.
Please provide the logs from the document server container:

docker logs onlyoffice-document-server > logDS.txt

kuzminol · 13 February 2024 17:50

Logs from document server available here:

Nikolas · 14 February 2024 11:57

@kuzminol

.....
 * Starting PostgreSQL 12 database server [ OK ]
nc: port number invalid: "services":
Waiting for connection to the { host on port "services":
nc: port number invalid: "services":
Waiting for connection to the { host on port "services":
nc: port number invalid: "services":
Waiting for connection to the { host on port "services":
.....

Looks like we hit a snag when starting up RabbitMQ. Let’s take a peek at the logs for RabbitMQ, Redis, and PostgreSQL.

The logs are located in the onlyoffice-document-server container:

/var/log/rabbitmq/
/var/log/redis/
/var/log/postgresql/
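
One way to get these directories onto the host is docker cp. A minimal sketch, in which fetch_ds_logs and the ds-logs destination directory are hypothetical names:

```shell
# Hypothetical helper: copy service log directories out of the container.
# Usage: fetch_ds_logs [destination_dir] [container_name]
fetch_ds_logs() (
    dest=${1:-ds-logs}
    container=${2:-onlyoffice-document-server}
    mkdir -p "$dest" || exit 1
    status=0
    for svc in rabbitmq redis postgresql; do
        # docker cp CONTAINER:SRC_PATH DEST_PATH
        docker cp "$container:/var/log/$svc" "$dest/" || status=1
    done
    # Non-zero if any of the three copies failed.
    exit "$status"
)
```

The resulting directory can then be archived and attached as a single file.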

kuzminol · 14 February 2024 17:36

@Nikolas

Logs from the onlyoffice-document-server container here:

Nikolas · 16 February 2024 13:17

Let’s continue troubleshooting.
The attached logs didn’t provide the necessary information.
Follow the steps in the suggested sequence.

Access the container:

docker exec -it onlyoffice-document-server bash

Provide the output of the following command within the onlyoffice-document-server container:

rabbitmq-diagnostics status

Stop the RabbitMQ service:

service rabbitmq-server stop

Attach the logs from the directory /var/log/rabbitmq/.
Clear the RabbitMQ data directory:

rm -rf /var/lib/rabbitmq/mnesia/*

Start the RabbitMQ service:

service rabbitmq-server start

Please allow some time for the community server to boot up on its own.

If the issue persists, please provide the output of the following command: rabbitmq-diagnostics status (In Community Server container)
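
The stop, clear, and start sequence above can be sketched as one function to run inside the onlyoffice-document-server container. This is a sketch rather than an official procedure: reset_rabbitmq and the MNESIA_DIR variable are hypothetical names, and the second start attempt is an assumption that covers a start that fails once.

```shell
# Hypothetical helper; run inside the onlyoffice-document-server container.
# MNESIA_DIR can be overridden; the default is the path used in this thread.
reset_rabbitmq() (
    dir=${MNESIA_DIR:-/var/lib/rabbitmq/mnesia}
    service rabbitmq-server stop
    # Refuse to run rm if the data directory does not exist.
    [ -d "$dir" ] || exit 1
    # Destructive: removes the node's stored state.
    rm -rf "${dir:?}"/*
    # Retry once, assuming the first start attempt may fail.
    service rabbitmq-server start || { sleep 5; service rabbitmq-server start; }
)
```

Because the rm step is destructive, copying the /var/log/rabbitmq/ logs out first keeps the evidence of the original failure.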

kuzminol · 19 February 2024 06:40

@Nikolas

Output of the command rabbitmq-diagnostics status is

root@e7c840398cc0:/# rabbitmq-diagnostics status
Error: unable to perform an operation on node 'rabbit@e7c840398cc0'. 
Please see diagnostics information and suggestions below.

Most common reasons for this are:

 * Target node is unreachable (e.g. due to hostname resolution, 
TCP connection or firewall issues)
 * CLI tool fails to authenticate with the server 
(e.g. due to CLI tool's Erlang cookie not matching that of the server)
 * Target node is not running

In addition to the diagnostics info below:

 * See the CLI, clustering and networking guides on
https://rabbitmq.com/documentation.html to learn more
 * Consult server logs on node rabbit@e7c840398cc0
 * If target node is configured to use long node names, 
don't forget to use --longnames with CLI tools

DIAGNOSTICS
===========

attempted to contact: [rabbit@e7c840398cc0]

rabbit@e7c840398cc0:
  * connected to epmd (port 4369) on e7c840398cc0
  * epmd reports: node 'rabbit' not running at all
                  no other nodes on e7c840398cc0
  * suggestion: start the node

Current node details:
 * node name: 'rabbitmqcli-2849002-rabbit@e7c840398cc0'
 * effective user's home directory: /var/lib/rabbitmq
 * Erlang cookie hash: 77oOWpjzbslkx0WjLqgilw==

Logs from the directory /var/log/rabbitmq/ available here:

Cleared the RabbitMQ data directory with the provided command and started RabbitMQ. The output was:

root@e7c840398cc0:/# service rabbitmq-server start
 * Starting RabbitMQ Messaging Server rabbitmq-server
* FAILED - check /var/log/rabbitmq/startup_\{log, _err\}

Files startup_log and startup_err from /var/log/rabbitmq/ available here:

UPD. Tried to start the RabbitMQ service one more time. It worked.
Output of the rabbitmq-diagnostics status is:

root@e7c840398cc0:/# rabbitmq-diagnostics status
Status of node rabbit@e7c840398cc0 ...
Runtime

OS PID: 2854909
OS: Linux
Uptime (seconds): 772
RabbitMQ version: 3.8.2
Node name: rabbit@e7c840398cc0
Erlang configuration: Erlang/OTP 22 [erts-10.6.4] [source] 
[64-bit] [smp:1:1] [ds:1:1:10] [async-threads:64]
Erlang processes: 257 used, 1048576 limit
Scheduler run queue: 1
Cluster heartbeat timeout (net_ticktime): 60

Plugins

Enabled plugin file: /etc/rabbitmq/enabled_plugins
Enabled plugins:


Data directory

Node data directory: /var/lib/rabbitmq/mnesia/rabbit@e7c840398cc0

Config files


Log file(s)

 * /var/log/rabbitmq/rabbit@e7c840398cc0.log
 * /var/log/rabbitmq/rabbit@e7c840398cc0_upgrade.log

Alarms

(none)

Memory

Calculation strategy: rss
Memory high watermark setting: 0.4 of available memory, 
computed to: 6.6412 gb

other_proc: 0.0305 gb (29.87 %)
code: 0.0268 gb (26.21 %)
other_system: 0.0225 gb (22.05 %)
allocated_unused: 0.0179 gb (17.48 %)
other_ets: 0.0026 gb (2.58 %)
atom: 0.0014 gb (1.41 %)
binary: 0.0002 gb (0.23 %)
mnesia: 0.0001 gb (0.07 %)
metrics: 0.0 gb (0.05 %)
msg_index: 0.0 gb (0.03 %)
plugins: 0.0 gb (0.01 %)
quorum_ets: 0.0 gb (0.01 %)
connection_channels: 0.0 gb (0.0 %)
connection_other: 0.0 gb (0.0 %)
connection_readers: 0.0 gb (0.0 %)
connection_writers: 0.0 gb (0.0 %)
mgmt_db: 0.0 gb (0.0 %)
queue_procs: 0.0 gb (0.0 %)
queue_slave_procs: 0.0 gb (0.0 %)
quorum_queue_procs: 0.0 gb (0.0 %)
reserved_unallocated: 0.0 gb (0.0 %)

File Descriptors

Total: 2, limit: 1048479
Sockets: 0, limit: 943629

Free Disk Space

Low free disk space watermark: 0.05 gb
Free disk space: 11801.3621 gb

Totals

Connection count: 0
Queue count: 0
Virtual host count: 1

Listeners

Interface: [::], port: 25672, protocol: 
clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 5672, protocol: 
amqp, purpose: AMQP 0-9-1 and AMQP 1.0

UPD2. Half an hour later, I still see only the endless startup screen in the web interface.
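
For a quick yes/no instead of reading the whole report, the status output can be checked for the two things that matter here: that the node answered, and that it listens on the AMQP port 5672. A minimal sketch, in which rabbit_status_ok is a hypothetical helper and the matched strings are taken from the output pasted above:

```shell
# Hypothetical helper: read `rabbitmq-diagnostics status` output on stdin
# and succeed only if the node answered and has a listener on port 5672.
rabbit_status_ok() (
    report=$(cat)
    printf '%s\n' "$report" | grep -q '^Status of node ' || exit 1
    printf '%s\n' "$report" | grep -q 'port: 5672,' || exit 1
)

# Example, inside the container:
# rabbitmq-diagnostics status 2>&1 | rabbit_status_ok && echo 'RabbitMQ is up'
```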

Nikolas · 21 February 2024 08:20

@kuzminol

Great progress so far!
Please restart the DS and CS containers:
docker restart onlyoffice-document-server onlyoffice-community-server

kuzminol · 21 February 2024 09:01

@Nikolas
Nothing changed, still endless loading.
After restarting the onlyoffice-document-server container, the command rabbitmq-diagnostics status gives the same output as earlier:

root@e7c840398cc0:/# rabbitmq-diagnostics status
Error: unable to perform an operation on node 'rabbit@e7c840398cc0'. 
Please see diagnostics information and suggestions below.

Most common reasons for this are:

 * Target node is unreachable (e.g. due to hostname resolution, 
TCP connection or firewall issues)
 * CLI tool fails to authenticate with the server (e.g. due to CLI 
tool's Erlang cookie not matching that of the server)
 * Target node is not running

In addition to the diagnostics info below:

 * See the CLI, clustering and networking guides on 
https://rabbitmq.com/documentation.html to learn more
 * Consult server logs on node rabbit@e7c840398cc0
 * If target node is configured to use long node names, 
don't forget to use --longnames with CLI tools

DIAGNOSTICS
===========

attempted to contact: [rabbit@e7c840398cc0]

rabbit@e7c840398cc0:
  * connected to epmd (port 4369) on e7c840398cc0
  * epmd reports: node 'rabbit' not running at all
                  no other nodes on e7c840398cc0
  * suggestion: start the node

Current node details:
 * node name: 'rabbitmqcli-150-rabbit@e7c840398cc0'
 * effective user's home directory: /var/lib/rabbitmq
 * Erlang cookie hash: 77oOWpjzbslkx0WjLqgilw==

It looks like RabbitMQ is down again after the container restart.

Nikolas · 22 February 2024 13:16

@kuzminol
It’s unfortunate.
Let’s try reinstalling the Document Server container, assuming the issue lies there.

Please make sure to create a snapshot of the server hosting OnlyOffice Workspace before proceeding.

At this stage, you can attempt two solutions:

1. Reinstalling the Document Server using the script:

docker rm -f onlyoffice-document-server # remove the old document-server

wget https://download.onlyoffice.com/install/workspace-install.sh # download the new workspace installation script

bash workspace-install.sh -u true -dv 6.2.0.123 -ics false -icp false -ims false -ies false -skiphc true # Reinstall Document Server only to version 6.2.0.123

The script will automatically install and restart all containers.

Description:
-u or --update → update existing components (true|false)
-dv or --documentversion → Document Server version to install
-ics or --installcommunityserver → install or update community server (true|false|pull)
-icp or --installcontrolpanel → install or update control panel (true|false|pull)
-ims or --installmailserver → install or update mail server (true|false|pull)
-ies or --installelasticsearch → install or update elasticsearch (true|false|pull)
-skiphc or --skiphardwarecheck → skip hardware check
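
Before removing the old container, it may help to confirm which image tag it was created from, so that the value passed to -dv matches. A minimal sketch: container_image_tag is a hypothetical helper, and it assumes the image reference has the plain name:tag form without a registry port or digest.

```shell
# Hypothetical helper: print the image tag a container was created from.
# Usage: container_image_tag CONTAINER_NAME
container_image_tag() (
    image=$(docker inspect --format '{{.Config.Image}}' "$1") || exit 1
    case $image in
        *:*) printf '%s\n' "${image##*:}" ;;   # text after the last colon
        *)   printf 'latest\n' ;;              # no explicit tag was given
    esac
)

# Example:
# container_image_tag onlyoffice-document-server
```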

2. Updating to the latest versions of all Workspace components using a script:

wget https://download.onlyoffice.com/install/workspace-install.sh

bash workspace-install.sh -u true

You can start with the first option.
If the reinstall doesn’t help, you can then try the second option.