Wrong character encoding DOCX -> PDF

Hello,

When converting DOCX → PDF (all done with ONLYOFFICE), the character encoding / charset seems to be incorrect. The characters render correctly in both documents, but when we copy text from the PDF and paste it somewhere else, the characters come out garbled.

When copying from Chrome PDF Viewer:
“In bekannten Hydraulikaggregaten von Fahrzeugbremsen”
it gets pasted as:
“)n bekannten Hydraulikaggregaten *on #ahr$eugbre%sanlagen”
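To tell a charset problem from a font-mapping problem, you can compare the two strings character by character. The sketch below does that (the function name is hypothetical). It compares only the common prefix of the quoted snippets, because they end differently ("…bremsen" vs "…bremsanlagen"). All of the affected characters are plain ASCII letters, and UTF-8, Latin-1 and Windows-1252 all encode ASCII identically, so a charset mix-up seems unlikely. One plausible reading, not a confirmed diagnosis, is that the viewer uses the wrong glyph-to-Unicode mapping for the embedded font.

```python
# Sketch: derive the character substitution between the text as written
# and the text as pasted from Chrome's PDF viewer.

def substitution_map(original: str, pasted: str) -> dict[str, str]:
    """Return {original_char: pasted_char} for every position that differs.

    Raises ValueError if the strings differ in length or if the same
    original character is pasted as two different characters.
    """
    if len(original) != len(pasted):
        raise ValueError("strings must have the same length")
    mapping: dict[str, str] = {}
    for o, p in zip(original, pasted):
        if o == p:
            continue  # character survived copy and paste unchanged
        if mapping.setdefault(o, p) != p:
            raise ValueError(f"inconsistent mapping for {o!r}")
    return mapping


# Common prefix of the two quoted snippets (same length, 50 characters).
original = "In bekannten Hydraulikaggregaten von Fahrzeugbrems"
pasted = ")n bekannten Hydraulikaggregaten *on #ahr$eugbre%s"

for src, dst in sorted(substitution_map(original, pasted).items()):
    print(f"{src!r} -> {dst!r} (pasted code point U+{ord(dst):04X})")
```

For the quoted snippet it reports five substitutions: F→#, I→), m→%, v→* and z→$. The replacements all fall in U+0023 to U+002A. That pattern looks more like raw font character codes than mis-decoded bytes.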

When copying from the local Ubuntu PDF viewer, though, the text pastes correctly.

DOCX: shared on Google Docs
PDF: characters.pdf, shared on Google Drive

Is there something that can improve this behaviour?

Thanks for looking

Hello @ipdoc1

I converted a DOCX to PDF and tried to reproduce the issue, but copying the text from the PDF in Chrome and pasting it returns the original text unchanged.

Please specify what version of Document Server you are using and how exactly you are converting DOCX into PDF.

ONLYOFFICE Document Server Community Edition, version 8.1

Converting via https://onlyoffice-instance.com/ConvertService.ashx

Payload: {"async":false,"filetype":"docx","outputtype":"pdf","url":"https://my-instance.com/document/pdf?id=73f9fa77-990d-44a7-9575-ab9ed1d7e44a.docx","key":"a94fdeb6-d490-48ec-8510-6dcbd3bacf3f","region":"de-DE"}
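For reference, the request above can be reproduced with Python's standard library. This is a sketch, not the poster's actual client. The function names are hypothetical, and JWT signing is left out (the server may require it if JWT is enabled). The `Accept: application/json` header and the reply fields `endConvert`, `fileUrl` and `error` reflect my reading of the ConvertService API documentation. Check them against the docs for your version.

```python
import json
import urllib.request


def build_convert_request(service_url: str, source_url: str, key: str,
                          region: str = "de-DE") -> urllib.request.Request:
    """Build a synchronous DOCX -> PDF request mirroring the posted payload.

    JWT signing (a "token" field or Authorization header) is not handled.
    """
    payload = {
        "async": False,        # wait for the conversion to finish
        "filetype": "docx",
        "outputtype": "pdf",
        "url": source_url,     # must be reachable from the Document Server
        "key": key,            # unique per document version
        "region": region,
    }
    return urllib.request.Request(
        service_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Ask for a JSON reply (assumed default is XML).
            "Accept": "application/json",
        },
        method="POST",
    )


def result_url(response_body: bytes) -> str:
    """Return the converted file URL from a JSON reply, or raise on error."""
    reply = json.loads(response_body)
    if "error" in reply:
        raise RuntimeError(f"conversion failed with error code {reply['error']}")
    if not reply.get("endConvert"):
        raise RuntimeError("conversion has not finished yet")
    return reply["fileUrl"]
```

To send the request, you would call `urllib.request.urlopen(req)` and pass the `.read()` body to `result_url`. Keeping construction and sending separate makes it easy to log the exact payload when comparing against a working setup.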

I don't think the implementation is the cause, though; it has always worked without problems like this.

Could installing a new .TTF font (Century Gothic) and rebuilding the font cache be related to the issue? The problem seems to have started after that.

After copying the new font, I also followed the ONLYOFFICE guide “Adding fonts to ONLYOFFICE Docs”.

I just removed my Docker container and created a fresh 8.1 instance.

Copy & Paste works again!

But now I no longer have the Century Gothic font I need.

How would you proceed? I copied the font to “/usr/share/fonts”, executed “sudo fc-cache -fv” and ran the “documentserver-generate-allfonts.sh” file.
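For a Docker deployment like the one above, the same three steps can be scripted from the host so they are easy to repeat. This is only a sketch. The container name and font file in the usage line are hypothetical. The script path `/usr/bin/documentserver-generate-allfonts.sh` is an assumed location inside the image. The sketch also assumes `docker exec` runs as root in the container, which is why `sudo` is dropped.

```python
import shlex
import subprocess

# Assumed location of the ONLYOFFICE font generation script in the image.
GENERATE_ALLFONTS = "/usr/bin/documentserver-generate-allfonts.sh"


def font_install_commands(container: str, font_file: str,
                          font_dir: str = "/usr/share/fonts/") -> list[list[str]]:
    """Return the steps from the post as argv lists, executed via docker."""
    return [
        # Copy the .ttf from the host into the container's font directory.
        ["docker", "cp", font_file, f"{container}:{font_dir}"],
        # Rebuild the fontconfig cache inside the container.
        ["docker", "exec", container, "fc-cache", "-fv"],
        # Regenerate the font data ONLYOFFICE uses for rendering and conversion.
        ["docker", "exec", container, GENERATE_ALLFONTS],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure
```

Usage (hypothetical names): `run(font_install_commands("onlyoffice", "GOTHIC.TTF"), dry_run=False)`. Files added with `docker cp` live only in that container and are lost when it is removed, as happened above. Mounting the font directory as a volume would make the font survive a rebuild.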

Is there anything I did wrong, or anything that could have caused the garbled characters? By the way, the affected documents do NOT use Century Gothic; plain Arial shows the problem too.

The font originates from a Windows machine where Office is installed.

This is the correct way. However, after adding the font you also need to clear the browser cache to actually see the newly added font.