Import data from ONLYOFFICE Docs Community spreadsheets to Apache Superset

Hi!

We are using ONLYOFFICE Docs Community (the Docker image, if that matters). It's a really amazing product! Thank you for it!

Currently I have some data in ONLYOFFICE spreadsheets, and I want to access it from my Apache Superset instance to use it on a dashboard.

Is there any way to do this with API calls or, maybe, to set up some kind of automatic export to JSON (or any other structured format)?

Thank you very much!

Hi @lsd_dev :handshake:

Unfortunately, such an option is not available.

Hi @Nikolas !

Thank you for your reply!

Will this function be available in the future?

I’ll clarify!

For now, I'm afraid there isn't.


@lsd_dev :saluting_face:
Here are two pieces of news for you:

We have ToJSON and FromJSON methods, but currently they are implemented only for the document editor, where they convert various objects to JSON (and back): https://api.onlyoffice.com/docbuilder/search?query=tojson

There's a suggestion to implement ToJSON and FromJSON methods for tables as well. Unfortunately, the timeline for this implementation is currently unknown.


Hi, Nikolas!

Thank you so much for information!

Or use Meltano or another ETL tool to extract the data from a CSV file and then send it to Superset (see tap-csv on Meltano Hub).
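For the "send it to Superset" part, here is a minimal stdlib-only sketch of what a CSV extractor plus database loader boils down to, swapping Meltano for plain Python. It loads a CSV export into a table that Superset can then query. SQLite is only a stand-in here (an assumption); in practice you would more likely point this at a database your Superset instance already uses, such as PostgreSQL, and the table name `sales` in the test is hypothetical.

```python
import csv
import io
import sqlite3


def load_csv_into_table(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Replace `table` with the rows of `csv_text` and return how many rows were loaded.

    All columns are stored as TEXT; cast them in SQL or in Superset as needed.
    `table` and the header names are interpolated into SQL, so they must be trusted.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    with conn:  # commits the inserted rows when the block exits without an error
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)
```

Replacing the whole table on every run keeps the sketch simple; an incremental load (appending only new rows) would avoid rewriting unchanged data.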

Hi, vincx!

Thank you for your advice.

Honestly speaking, parsing data from a CSV file is not a problem. The problem is getting the data (CSV, JSON, XML, or any other format) out of ONLYOFFICE via an API (similar to Google Docs).

You mean "Get file download link" in the ONLYOFFICE API docs? Also, the docbuilder examples above can be used to open a file and extract data, but that's a bit of overkill here, IMO.
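A minimal sketch of the "Get file download link" call, assuming the Workspace/Community Server REST API: authenticate with `POST /api/2.0/authentication`, then call `GET /api/2.0/files/file/{fileId}/presigneduri` with the returned token in the `Authorization` header. The endpoint paths and the `response`/`token` JSON fields are assumptions to verify against your server's API reference; `office.example.com` and the file id are placeholders.

```python
import json
import urllib.request

# ASSUMPTION: endpoint paths and JSON shapes follow the Workspace/Community Server
# REST API as we understand it; verify them against your server's API reference.


def auth_request(portal: str, user: str, password: str) -> urllib.request.Request:
    """POST request that exchanges user credentials for an API token."""
    body = json.dumps({"userName": user, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{portal.rstrip('/')}/api/2.0/authentication",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def download_link_request(portal: str, file_id: int, token: str) -> urllib.request.Request:
    """GET request for a file's download link, authorized with the API token."""
    return urllib.request.Request(
        f"{portal.rstrip('/')}/api/2.0/files/file/{file_id}/presigneduri",
        headers={"Authorization": token, "Accept": "application/json"},
    )


def response_field(raw: bytes):
    """Return the `response` member of an API reply."""
    return json.loads(raw)["response"]


def get_download_link(portal: str, user: str, password: str, file_id: int) -> str:
    """Authenticate, then fetch the download link (performs two network calls)."""
    with urllib.request.urlopen(auth_request(portal, user, password)) as r:
        token = response_field(r.read())["token"]
    with urllib.request.urlopen(download_link_request(portal, file_id, token)) as r:
        return response_field(r.read())
```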

You can also check the latest version with get-file-versions and skip the work when nothing has changed; that way you can run the cron job every minute or so.
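The version check can be sketched like this, assuming get-file-versions (for example `GET /api/2.0/files/file/{fileId}/history`) returns a `response` list whose entries carry an integer `version` field; the endpoint path, the field name and the state file are assumptions. The cron job only proceeds when the newest version differs from the one stored after the last successful run.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional

# ASSUMPTION: `history` is the "response" list from get-file-versions, and each
# entry has an integer "version" field. Verify the field name on your server.


def latest_version(history: List[Dict]) -> int:
    """Highest version number in the file history."""
    return max(int(entry["version"]) for entry in history)


def needs_sync(history: List[Dict], last_seen: Optional[int]) -> bool:
    """True on the first run, or when the newest version differs from last_seen."""
    return last_seen is None or latest_version(history) != last_seen


def read_last_seen(state_file: Path) -> Optional[int]:
    """Version recorded by the previous successful run, or None if there was none."""
    if not state_file.exists():
        return None
    return json.loads(state_file.read_text()).get("version")


def write_last_seen(state_file: Path, version: int) -> None:
    """Call this only after the whole sync succeeded, so a failed run is retried."""
    state_file.write_text(json.dumps({"version": version}))
```

In the cron script: exit early if `needs_sync(history, read_last_seen(state))` is false, run the export, then call `write_last_seen(state, latest_version(history))`. Writing the state last means a failed run is simply retried on the next tick.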

Thank you so much! I will investigate it.

Let me know how it goes, as I'm probably doing exactly the same thing right now: making sure Superset can show data from company processes that don't run on fancy software (yet).

My plan:

  • Get the history and check whether the file has changed recently. If not, abort.
  • Debug: log the last change date, the current date, and whatever get-file-versions shows
  • Get the sheet's file link with get-file-download-link
  • wget or curl the file
  • Use the Conversion API to convert it to CSV
  • Run a Meltano extractor to get the important data points. I have open questions here too, as I probably need to protect data fields but also keep it somewhat flexible.
  • Optional: double-check that the extracted data differs from the previous run
  • Add the new data to a database
  • Superset then needs to pick up the new data and update the dashboard
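The conversion, extraction and duplicate-check steps of this plan could look roughly like the sketch below. It assumes the Document Server Conversion API accepts a JSON body with `async`, `filetype`, `outputtype`, `key`, `title` and `url`, and answers with `endConvert`/`fileUrl` (or an `error` code) when asked for JSON; the endpoint path (`/ConvertService.ashx` on older releases, `/converter` on newer ones) and whether a JWT `token` is required depend on your setup. For the "protect data fields but stay flexible" question, it uses a column allow-list, which is just one possible approach; the key scheme and column names are hypothetical.

```python
import csv
import hashlib
import io
import json
from typing import Dict, Iterable, List

# ASSUMPTIONS: request/response fields as described above; if JWT is enabled on
# the Document Server, the body must also carry a signed "token" (not shown here).


def conversion_payload(file_url: str, file_id: int, version: int, title: str) -> bytes:
    """Body for a synchronous XLSX -> CSV conversion request."""
    return json.dumps({
        "async": False,
        "filetype": "xlsx",
        "outputtype": "csv",
        # Hypothetical key scheme: one key per file version.
        "key": f"{file_id}-{version}",
        "title": title,
        "url": file_url,
    }).encode("utf-8")


def converted_file_url(raw_response: bytes) -> str:
    """Extract the CSV download URL from a JSON conversion reply."""
    reply = json.loads(raw_response)
    if "error" in reply:
        raise RuntimeError(f"conversion failed with code {reply['error']}")
    if not reply.get("endConvert"):
        raise RuntimeError("conversion not finished yet")
    return reply["fileUrl"]


def keep_columns(csv_text: str, allowed: Iterable[str]) -> List[Dict[str, str]]:
    """Keep only allow-listed columns, so new or sensitive columns are dropped by default."""
    allowed = set(allowed)
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{k: v for k, v in row.items() if k in allowed} for row in reader]


def fingerprint(rows: List[Dict[str, str]]) -> str:
    """Stable hash of the extracted rows, used to skip a run whose data did not change."""
    canonical = json.dumps(rows, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

An allow-list fails closed: a column someone adds to the sheet later is ignored until it is explicitly listed, while extending the list is a one-line change.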

But I'm very busy this week, so I probably won't have time for it.

I just ran some tests:

  • Created an Excel file in our ONLYOFFICE (we run our own installation, and it is almost up to date);
  • Got a download link for the file (using get-file-download-link);
  • Downloaded the XLSX file;
  • Made some changes to this file online in ONLYOFFICE;
  • Requested the download link again and received the same link with the same version number;
  • Downloaded the file again and got the old version, without any of the changes I had made.
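To rule out a client-side mix-up, it may help to compare the two downloads byte for byte, as in the sketch below (stdlib only; sending the API token as `Authorization` is an assumption). If the digests match, the server really returned the same file. One possible explanation, an assumption not confirmed in this thread: ONLYOFFICE Docs generally writes edits back to storage only after the editing session is closed (or on a force-save), so the stored version may lag behind what is visible in the open editor.

```python
import hashlib
import urllib.request
from typing import Tuple


def digest(data: bytes) -> Tuple[str, int]:
    """SHA-256 hex digest and size of a downloaded file."""
    return hashlib.sha256(data).hexdigest(), len(data)


def download_digest(url: str, token: str = "") -> Tuple[str, int]:
    """Download url and return its digest; token is sent as Authorization if given (assumption)."""
    headers = {"Authorization": token} if token else {}
    with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as resp:
        return digest(resp.read())
```

Calling `download_digest(link)` before editing, while the editor is still open, and again after closing it would show at which point the stored file actually changes.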

Looks strange.

Hello @lsd_dev

Are you using Workspace? The category is set to Docs, which is a separate product, and your initial post also mentions Docs Community. Please elaborate on this.

Dear Constantine,

Honestly speaking, I don't know how to answer this question correctly.

Here is part of the "About" window: Screenshot by Lightshot

I'm probably using Workspace: documents are part of several ONLYOFFICE services we use (Community Server, I suppose).

Here is the list of our services:
[Screenshot: list of our ONLYOFFICE services, 2025-02-28 12.11.50]


Hey @lsd_dev :wave:

Got it! I believe @Constantine was just trying to clarify the topic's category, perhaps to adjust the tag or to suggest creating a new thread in the appropriate section for better visibility. Nothing more than that. :blush:

Let us know if you need any assistance!
