Correct encoding (in xml inside xlsx) of text like "_x1234_" in the spreadsheet editor for the "formula area"

(I use automatic translation from Russian, the text may look clumsy, sorry)

the problem is in generating correctly encoded (according to [MS-OE376]: Part 4 Section 3.18.96, ST_Xstring (Escaped String) | Microsoft Learn) strings in sheet1.xml (which is placed in .zip archive of which the .xmlx file consists).

to reproduce:

  1. open onlyoffice desktop editors (tested on 7.4.1.36 deb and x86 msi versions of the program under astralinux 1.7 and windows 7), table, enter the text ="_x1234" in cell A1 (everything starting with the “equals” symbol, 9 characters, “x” - Latin)
  2. save as .xlsx, close the table editor
  3. open this saved file in the table editor

I see that after this, _x1234_ is correctly visible in the table itself, but if you select this cell, nonsense is visible in the “formula area”

my opinion: this is because when entering text like _xNNNN_ into a cell, the program, when generating sheet1.xml, writes encoded text into the <v> tag (instead of _xNNNN_ writes _x005F_xNNNN_ as it should be encoded according to the link above), and in the <f> tag writes NOT the encoded text, but the original text.

But when opening a file, the program decodes BOTH from the <v> AND from the <f> tag, and therefore an incorrect result is obtained in the “formula area” (it includes the decoded from the <f> tag… into which the program “forgot” to encode the text when saving).

if I write this text (="_x1234_") in cell A1, save it in xlsx, open xlsx just as an archive and read /xl/worksheets/sheet1.xml from there, then I will see something like this (for cell A1):
<c r="A1" t="str"><f>&quot;_x1234_&quot;</f><v>_x005F_x1234_</v></c>

but (I think) there should be something like: <c r="A1" t="str"><f>"_x005F_x1234_"</f><v>_x005F_x1234_</v></c>
(in this form, xml is generated by MS Excel when writing such text into cell A1 and saving the table to an xlsx file)

Please confirm the error and correct it.

I have a free version and technical support told me that they cannot reproduce the problems and do not provide support for free versions and told me to write to the forum.

here I am writing :slight_smile:

Hello @iteh

Thank for providing valuable information, we are analyzing it.
Once we get any results, I will update the thread.

We have registered this issue as a bug and we are already working on it. Thank you for providing this information.

Hello again @iteh

I’d like to inform you that mentioned issue was fixed in new version 8.0 of Desktop Editors.
Please update your editors, check the issue again and provide a feedback.

I confirm - in version 8.0 (version 8.0.0, msi, x64, windows 7) the problem is not reproduced when writing new documents!

but when editing old ones (incorrectly written earlier - when the <v> tag was correctly encoded, and the <f> tag was without the _x005F_ prefix) - the error is reproduced: this is for old files generated in the 7.x version of onlyoffice. But since the <v> tag already has a “calculated” string for display on the screen, and the <f> tag contains a formula, it may be algorithmically impossible to obtain a value for <f> from <v> - we probably can’t ask developers to add auto-correction for old files that were incorrectly written in version 7.x :frowning:

but for new ones - everything works (except that the quote inside is encoded a little differently: MS Excel uses the quote character itself in the <f> tag to encode quotes, and onlyoffice uses the word "&quot;" instead of the quote symbol itself inside xml, this is the only difference, which I found by saving the files in ms excel and onlyoffice and comparing the file \xl\worksheets\sheet1.xml from the xlsx archive) :slight_smile:

thank you :slight_smile:

1 Like