Reading a document and getting data

I am using the Python docbuilder module and trying to read the contents of a document, but instead of plain values I get an object.
import os
import sys

sys.path.append('C:/Program Files/ONLYOFFICE/DocumentBuilder')
import docbuilder

builder = docbuilder.CDocBuilder()
input_file = '123.docx'

input_file = os.path.abspath(input_file)

builder.OpenFile(input_file, 'docx')
context = builder.GetContext()
api = context.GetGlobal()["Api"]
document = api.Call("GetDocument")

paragraph = document.Call('GetElement', 0)

element = paragraph.Call("GetElement", 0)
print(element)

Hello @yt-0r

It is not quite clear what is meant by “read the contents of a document”. If you want to read the text content, use the GetText method after getting the document.

Could you share more context about your goal?

I want to get the contents of the document - read paragraphs, headings and be able to get the positions of different objects in the document.

Headings are ordinary paragraphs, so you can use the GetAllParagraphs method to get all paragraphs and then read their content with the GetText method of the paragraph class. GetRange returns a specific range of characters, which lets you work with text positions.

This is the simplest way, but it has downsides: for instance, you need to know the Start and End numbers for the GetRange method. Technically, though, it is a suitable approach.

However, it gets a bit more complicated when getting objects other than text. For that, please refer to the Text document API to find a suitable method for each object.
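To illustrate the paragraph approach, here is a minimal sketch. The helper name read_paragraph_texts is made up for this example. It assumes the Python wrapper's value objects expose GetLength, Get and ToString alongside Call, and that the result of Call has to be converted with ToString before it prints as plain text; please check both assumptions against your Document Builder version.

```python
def read_paragraph_texts(document):
    """Return the text of every paragraph in `document` as Python strings.

    `document` is assumed to be the value returned by api.Call("GetDocument").
    """
    # GetAllParagraphs returns an array wrapped in a value object.
    paragraphs = document.Call("GetAllParagraphs")
    texts = []
    for index in range(paragraphs.GetLength()):
        paragraph = paragraphs.Get(index)
        # Call() hands back another wrapper object, so convert it explicitly
        # (assumption: ToString() yields a regular Python str).
        texts.append(paragraph.Call("GetText").ToString())
    return texts
```

With a document opened as in the code above, it could be used as: for text in read_paragraph_texts(document): print(text)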

I ran this code, but the output is not what I want:

import os
import sys

try:
    # Add the Document Builder install directory to the module search path
    sys.path.append('C:/Program Files/ONLYOFFICE/DocumentBuilder')
    import docbuilder

    builder = docbuilder.CDocBuilder()
    input_file = '123.docx'

    input_file = os.path.abspath(input_file)

    builder.OpenFile(input_file, 'docx')
    context = builder.GetContext()
    api = context.GetGlobal()["Api"]
    document = api.Call("GetDocument")

    paragraph = document.Call("GetElement", 0)
    text = paragraph.Call("GetText")

    print(text)

except Exception as e:
    print(f"Error: {e}")

finally:
    builder.CloseFile()
    print("Closing the file.")

I cannot predict the result from the code alone. What is the content of your document, and what would you like to get from it?

This particular code gets the very first element of the document (e.g. a paragraph), gets its text and prints it. That is what you are trying to do, isn’t it? If I am wrong, please elaborate on the usage scenario and the result you’d like to achieve.

  1. I want to get the text of something and output it to the console.
  2. I want to find out the coordinates of a certain text and replace it with a table.

It is important to understand that Document Builder has no interface for selecting text, so you need to know exactly what you are looking for. I asked what result you get with the previously shared code and what you expect instead, because it was unclear why the output is not what you want. Can you share a file and point to the content you’d like to add?

If you know the text to look for beforehand, you can rely on the following scheme:

  1. Use Search to get the range of the text you’d like to replace;
  2. Create a table with the CreateTable method;
  3. On the range from the array returned by Search, call Delete to imitate replacement;
  4. After calling Delete the cursor will remain in the same position, so you can add your table with the InsertContent method.

Note: Make sure that you run InsertContent with the isInline argument set to false, so that the paragraphs are split and the table is placed between them. If you run it with isInline: true, no table will be placed.

This is the easiest way to achieve that.
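The four steps can be sketched roughly as follows. This is only an outline under assumptions, not tested against a real Document Builder install: the helper name replace_first_match_with_table is made up, the value objects are assumed to expose GetLength and Get, and passing a Python list and a bool through Call is assumed to work. It also relies on the cursor staying at the deleted range.

```python
def replace_first_match_with_table(api, document, needle, cols=3, rows=3):
    """Replace the first occurrence of `needle` with an empty table.

    Returns True if a match was found and replaced, False otherwise.
    """
    # Step 1: Search returns an array of ranges, not a single range.
    matches = document.Call("Search", needle)
    if matches.GetLength() == 0:
        return False

    # Step 2: create the table that will take the place of the text.
    table = api.Call("CreateTable", cols, rows)

    # Step 3: delete the first matching range to imitate replacement.
    matches.Get(0).Call("Delete")

    # Step 4: insert the table with isInline=False so that it is placed
    # between paragraphs (assumption: list and bool arguments are accepted).
    document.Call("InsertContent", [table], False)
    return True
```

The False return value makes it easy to notice when the search text is not present in the document at all.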

I followed your recommendations and this is what I got:
Source file: [image]

Final file: [image]

Code:

import os
import sys

# Add the Document Builder install directory to the module search path
sys.path.append('C:/Program Files/ONLYOFFICE/DocumentBuilder')
import docbuilder

builder = docbuilder.CDocBuilder()
input_file = '123.docx'

input_file = os.path.abspath(input_file)

builder.OpenFile(input_file, 'docx')
context = builder.GetContext()
api = context.GetGlobal()["Api"]
document = api.Call("GetDocument")

table = api.Call("CreateTable", 3, 3)
params = ("single", 4, 0, 0, 0, 0)
table.Call('SetTableBorderInsideH', *params)
table.Call('SetTableBorderInsideV', *params)

searchResults = document.Call("Search", "{{table}}")
searchResults.Call("Delete")

document.Call("InsertContent", [table])

builder.SaveFile("docx", output_file)

Please note that Search returns an array of objects, not a single object. You need to index into searchResults to call Delete on a specific match and put a table in its place.

In the current code Delete does not delete anything and the table gets inserted at the very beginning of the document, because you are calling Delete on the array itself.

Something like:

searchResults[0].Call("Delete")
# further code
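
Before indexing, it may help to check what Search actually returned. The sketch below uses a made-up helper name, search_hit_texts, and assumes the result array exposes GetLength and Get and that each range supports GetText; treat it as a debugging aid rather than a verified recipe.

```python
def search_hit_texts(search_results):
    """Return the text of each range in a Search result array as strings."""
    return [
        # Convert each wrapped GetText result to a plain Python string.
        search_results.Get(i).Call("GetText").ToString()
        for i in range(search_results.GetLength())
    ]
```

For example, print(search_hit_texts(document.Call("Search", "{{table}}"))) shows how many matches there are; an empty list means there is nothing at index 0 to delete.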

I changed the code as per your advice. And here is what I got.

Source file: [image]

Final file: [image]

Apparently the cursor is not moving.

To make sure that the cursor moves to the specified location, you can call Select before Delete on the same search object. Something like this:

searchResults[0].Call("Select")
searchResults[0].Call("Delete")
# further code

After I used Select, the table disappeared completely.

Source file: [image]

Final file: [image]

Code:

import os
import sys

sys.path.append('C:/Program Files/ONLYOFFICE/DocumentBuilder')
import docbuilder

builder = docbuilder.CDocBuilder()
input_file = '123.docx'

input_file = os.path.abspath(input_file)

builder.OpenFile(input_file, 'docx')
context = builder.GetContext()
api = context.GetGlobal()["Api"]
document = api.Call("GetDocument")

table = api.Call("CreateTable", 3, 3)
params = ("single", 4, 0, 0, 0, 0)
table.Call('SetTableBorderInsideH', *params)
table.Call('SetTableBorderInsideV', *params)

searchResults = document.Call("Search", "{{table}}")
searchResults[0].Call("Select")
searchResults[0].Call("Delete")

document.Call("InsertContent", [table])

builder.SaveFile("docx", output_file)

I also get the following error:
:5866091: Uncaught TypeError: Cannot read property ‘da’ of undefined