How to analyze PDF form documents with ONLYOFFICE macro

Originally published at the ONLYOFFICE Blog.


In today’s fast-paced digital environment, writers, editors, and content creators often struggle to gain meaningful insights about their documents. Understanding metrics like readability, word frequency, and structural balance can significantly improve document quality, yet manual analysis is time-consuming and inconsistent. In this blog post, we’ll show you how to craft a powerful ONLYOFFICE macro that automatically analyzes your documents and generates comprehensive reports.


Building the document analysis macro

Let’s break down our macro into functional components and explain how each part works.

Setting up the main function

The core of our macro is the analyzeDocument() function, which orchestrates the entire analysis process:

function analyzeDocument() {
    try {
        // Get document and all text
        var oDocument = Api.GetDocument();
        var allText = "";
        var paragraphs = oDocument.GetAllParagraphs();

        // Check if document is empty
        if (paragraphs.length === 0) {
            console.log("Warning: Document is empty or no paragraphs found for analysis.");
            return;
        }

        // Collect all text
        paragraphs.forEach(function(paragraph) {
            allText += paragraph.GetText() + " ";
        });

        // Perform analyses
        var stats = calculateBasicStats(allText, paragraphs);
        var advancedStats = calculateAdvancedStats(allText, stats);
        var commonWords = findCommonWords(allText, 10);

        // Create report
        createAndAddReport(oDocument, stats, advancedStats, commonWords);

        // Log success
        console.log("Success: Document analysis completed. Report added to the end of the document.");
    } catch (error) {
        console.log("Error: " + error.message);
    }
}

This function first collects all text from the document, then passes it to the specialized analysis functions, and finally appends the report. The try-catch block ensures the macro handles any runtime errors gracefully instead of failing midway.

Calculating basic statistics

The calculateBasicStats() function processes the text to extract fundamental metrics:

function calculateBasicStats(text, paragraphs) {
    // Word count
    var words = text.split(/\s+/).filter(function(word) {
        return word.length > 0;
    });
    var wordCount = words.length;

    // Sentence count
    var sentences = text.split(/[.!?]+/).filter(function(sentence) {
        return sentence.trim().length > 0;
    });
    var sentenceCount = sentences.length;

    // Paragraph count
    var paragraphCount = paragraphs.length;

    // Character count
    var charCountWithSpaces = text.length;
    var charCountWithoutSpaces = text.replace(/\s+/g, "").length;

    // Line count (approximate)
    var lineCount = Math.ceil(charCountWithSpaces / 70);

    return {
        wordCount: wordCount,
        sentenceCount: sentenceCount,
        paragraphCount: paragraphCount,
        charCountWithSpaces: charCountWithSpaces,
        charCountWithoutSpaces: charCountWithoutSpaces,
        lineCount: lineCount,
        words: words,
        sentences: sentences
    };
}

This function splits the text into words and sentences, counts paragraphs and characters, and estimates the line count by assuming roughly 70 characters per line.
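To see what those regular expressions actually count, here is a quick sanity check on a sample string. It reuses the same splitting logic and runs as plain JavaScript, so you can try it outside the editor:

```javascript
// Sample string with three sentences and nine whitespace-separated words
var sample = "Hello world! This is a test. Is it working?";

// Same word-splitting logic as calculateBasicStats()
var words = sample.split(/\s+/).filter(function(word) {
    return word.length > 0;
});

// Same sentence-splitting logic: split on runs of . ! ?
var sentences = sample.split(/[.!?]+/).filter(function(sentence) {
    return sentence.trim().length > 0;
});

console.log(words.length);     // 9
console.log(sentences.length); // 3
```

One known limitation worth keeping in mind: abbreviations such as "e.g." contain periods, so this simple splitter counts them as extra sentences.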

Performing advanced analysis

For deeper insights, the calculateAdvancedStats() function computes more sophisticated metrics:

function calculateAdvancedStats(text, basicStats) {
    // Average sentence length
    var avgWordsPerSentence = basicStats.wordCount / Math.max(1, basicStats.sentenceCount);

    // Average paragraph length
    var avgWordsPerParagraph = basicStats.wordCount / Math.max(1, basicStats.paragraphCount);

    // Average word length
    var totalWordLength = basicStats.words.reduce(function(sum, word) {
        return sum + word.length;
    }, 0);
    var avgWordLength = totalWordLength / Math.max(1, basicStats.wordCount);

    // Readability score (simplified Flesch-Kincaid)
    var readabilityScore = 206.835 - 1.015 * avgWordsPerSentence - 84.6 * avgWordLength;

    // Estimated reading time
    var readingTimeMinutes = Math.ceil(basicStats.wordCount / 200);

    return {
        avgWordsPerSentence: avgWordsPerSentence,
        avgWordsPerParagraph: avgWordsPerParagraph,
        avgWordLength: avgWordLength,
        readabilityScore: readabilityScore,
        readingTimeMinutes: readingTimeMinutes
    };
}

This calculates average sentence, paragraph, and word lengths, a readability score, and the estimated reading time based on an average speed of 200 words per minute.
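As a worked example, the readability calculation can be run standalone (the helper function below is ours, not part of the macro). Note that the genuine Flesch Reading Ease formula uses syllables per word, while this macro substitutes average word length in characters; since words have far more characters than syllables, the scores skew much lower, often negative, and are best used to compare drafts of the same document rather than against standard Flesch ranges:

```javascript
// Standalone version of the macro's simplified readability calculation
function simplifiedReadability(wordCount, sentenceCount, totalWordLength) {
    var avgWordsPerSentence = wordCount / Math.max(1, sentenceCount);
    var avgWordLength = totalWordLength / Math.max(1, wordCount);
    return 206.835 - 1.015 * avgWordsPerSentence - 84.6 * avgWordLength;
}

// 100 words in 8 sentences, 450 characters of word text:
// 206.835 - 1.015 * 12.5 - 84.6 * 4.5 ≈ -186.55
var score = simplifiedReadability(100, 8, 450);
console.log(score.toFixed(2));
```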

Analyzing word frequency

The findCommonWords() function identifies the most frequently used words:

function findCommonWords(text, limit) {
    // Clean text and convert to lowercase
    var cleanText = text.toLowerCase().replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g, "");

    // Split into words
    var words = cleanText.split(/\s+/).filter(function(word) {
        return word.length > 3;
    });

    // Calculate word frequencies
    var wordFrequency = {};
    words.forEach(function(word) {
        wordFrequency[word] = (wordFrequency[word] || 0) + 1;
    });

    // Filter stop words
    var stopWords = ["this", "that", "with", "from", "have", "been"];
    stopWords.forEach(function(stopWord) {
        delete wordFrequency[stopWord];
    });

    // Sort by frequency
    var sortedWords = Object.keys(wordFrequency).sort(function(a, b) {
        return wordFrequency[b] - wordFrequency[a];
    });

    // Return top N words
    return sortedWords.slice(0, limit).map(function(word) {
        return { word: word, frequency: wordFrequency[word] };
    });
}

This function strips punctuation, ignores words of three characters or fewer, filters out common stop words, and returns the most frequently used words in the document.
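Here is the core of that pipeline applied to a short sample string, runnable as plain JavaScript:

```javascript
var text = "Macros automate editing. Macros analyze documents. Documents matter.";

// Lowercase and strip punctuation, as in findCommonWords()
var cleanText = text.toLowerCase().replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g, "");

// Keep only words longer than three characters
var words = cleanText.split(/\s+/).filter(function(word) {
    return word.length > 3;
});

// Tally occurrences
var wordFrequency = {};
words.forEach(function(word) {
    wordFrequency[word] = (wordFrequency[word] || 0) + 1;
});

console.log(wordFrequency); // "macros" and "documents" each appear twice
```

Note that the `word.length > 3` filter means short but meaningful words such as "API" or "PDF" never reach the frequency table; lower the threshold if that matters for your documents.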

Generating the report

Finally, the createAndAddReport() function compiles and formats all the analysis results:

function createAndAddReport(oDocument, basicStats, advancedStats, commonWords) {
    // Add new page
    var oParagraph = Api.CreateParagraph();
    oParagraph.AddPageBreak();
    oDocument.AddElement(oDocument.GetElementsCount(), oParagraph);

    // Add title
    var oHeading = Api.CreateParagraph();
    oHeading.AddText("DOCUMENT ANALYSIS REPORT");
    oDocument.AddElement(oDocument.GetElementsCount(), oHeading);

    // Add basic statistics section
    var oSubHeading = Api.CreateParagraph();
    oSubHeading.AddText("BASIC STATISTICS");
    oDocument.AddElement(oDocument.GetElementsCount(), oSubHeading);

    // Add statistics content
    // ... (code that adds individual statistics)

    // Add advanced analysis section
    // ... (code that adds advanced metrics)

    // Add word frequency section
    // ... (code that adds word frequency list)

    // Add footer
    var oFootnotePara = Api.CreateParagraph();
    oFootnotePara.AddText("This report was generated by the ONLYOFFICE Document Statistics and Analysis Tool on " +
                        new Date().toLocaleString() + ".");
    oDocument.AddElement(oDocument.GetElementsCount(), oFootnotePara);
}

This function creates a structured report at the end of the document with all the analysis results.
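Each report paragraph simply receives a bullet string built from the stats objects, so you can preview the report body outside the editor. The stats values below are made up for illustration:

```javascript
// Hypothetical stats object, standing in for calculateBasicStats() output
var basicStats = { wordCount: 1200, sentenceCount: 96, paragraphCount: 24 };

// Build the same bullet lines the report paragraphs receive
var lines = [
    "• Word Count: " + basicStats.wordCount,
    "• Sentence Count: " + basicStats.sentenceCount,
    "• Paragraph Count: " + basicStats.paragraphCount
];

console.log(lines.join("\n"));
```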


Complete macro code

Here’s the complete macro code that you can copy and use:

(function() {
// Main function - starts all operations
function analyzeDocument() {
    try {
        // Get document and all text
        var oDocument = Api.GetDocument();
        var allText = "";
        var paragraphs = oDocument.GetAllParagraphs();

        // Check if document is empty
        if (paragraphs.length === 0) {
            console.log("Warning: Document is empty or no paragraphs found for analysis.");
            return;
        }

        // Collect all text
        paragraphs.forEach(function(paragraph) {
            allText += paragraph.GetText() + " ";
        });

        // Calculate basic statistics
        var stats = calculateBasicStats(allText, paragraphs);

        // Perform advanced analysis
        var advancedStats = calculateAdvancedStats(allText, stats);

        // Find most common words
        var commonWords = findCommonWords(allText, 10);

        // Create and add report to the document
        createAndAddReport(oDocument, stats, advancedStats, commonWords);

        // Inform user
        console.log("Success: Document analysis completed. Report added to the end of the document.");
    } catch (error) {
        console.log("Error: An error occurred during processing: " + error.message);
    }
}

// Calculate basic statistics
function calculateBasicStats(text, paragraphs) {
    // Word count
    var words = text.split(/\s+/).filter(function(word) {
        return word.length > 0;
    });
    var wordCount = words.length;

    // Sentence count
    var sentences = text.split(/[.!?]+/).filter(function(sentence) {
        return sentence.trim().length > 0;
    });
    var sentenceCount = sentences.length;

    // Paragraph count
    var paragraphCount = paragraphs.length;

    // Character count (with and without spaces)
    var charCountWithSpaces = text.length;
    var charCountWithoutSpaces = text.replace(/\s+/g, "").length;

    // Line count (approximate)
    var lineCount = Math.ceil(charCountWithSpaces / 70); // Approximately 70 characters/line

    return {
        wordCount: wordCount,
        sentenceCount: sentenceCount,
        paragraphCount: paragraphCount,
        charCountWithSpaces: charCountWithSpaces,
        charCountWithoutSpaces: charCountWithoutSpaces,
        lineCount: lineCount,
        words: words,
        sentences: sentences
    };
}

// Calculate advanced statistics
function calculateAdvancedStats(text, basicStats) {
    // Average sentence length (in words)
    var avgWordsPerSentence = basicStats.wordCount / Math.max(1, basicStats.sentenceCount);
    
    // Average paragraph length (in words)
    var avgWordsPerParagraph = basicStats.wordCount / Math.max(1, basicStats.paragraphCount);
    
    // Average word length (in characters)
    var totalWordLength = basicStats.words.reduce(function(sum, word) {
        return sum + word.length;
    }, 0);
    var avgWordLength = totalWordLength / Math.max(1, basicStats.wordCount);
    
    // Readability score (simplified Flesch-Kincaid)
    var readabilityScore = 206.835 - 1.015 * avgWordsPerSentence - 84.6 * avgWordLength;
    
    // Estimated reading time (minutes)
    var readingTimeMinutes = Math.ceil(basicStats.wordCount / 200); // Average reading speed 200 words/minute
    
    return {
        avgWordsPerSentence: avgWordsPerSentence,
        avgWordsPerParagraph: avgWordsPerParagraph,
        avgWordLength: avgWordLength,
        readabilityScore: readabilityScore,
        readingTimeMinutes: readingTimeMinutes
    };
}

// Find most common words
function findCommonWords(text, limit) {
    // Clean text and convert to lowercase
    var cleanText = text.toLowerCase().replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g, "");

    // Split into words
    var words = cleanText.split(/\s+/).filter(function(word) {
        return word.length > 3; // Filter out very short words
    });

    // Calculate word frequencies
    var wordFrequency = {};
    words.forEach(function(word) {
        wordFrequency[word] = (wordFrequency[word] || 0) + 1;
    });

    // Filter stop words (common English words)
    var stopWords = ["this", "that", "these", "those", "with", "from", "have", "been", "were", "they", "their", "what", "when", "where", "which", "there", "will", "would", "could", "should", "about", "also"];
    stopWords.forEach(function(stopWord) {
        delete wordFrequency[stopWord];
    });

    // Sort by frequency
    var sortedWords = Object.keys(wordFrequency).sort(function(a, b) {
        return wordFrequency[b] - wordFrequency[a];
    });

    // Take top N words and return results as word-frequency pairs
    return sortedWords.slice(0, limit).map(function(word) {
        return {
            word: word,
            frequency: wordFrequency[word]
        };
    });
}

// Create and add report to document
function createAndAddReport(oDocument, basicStats, advancedStats, commonWords) {
    // Add new page
    var oParagraph = Api.CreateParagraph();
    oParagraph.AddPageBreak();
    oDocument.AddElement(oDocument.GetElementsCount(), oParagraph);
    
    // Main title - highlighting in capital letters
    var oHeading = Api.CreateParagraph();
    oHeading.AddText("DOCUMENT ANALYSIS REPORT");
    oDocument.AddElement(oDocument.GetElementsCount(), oHeading);
    
    // Subheading - in capital letters
    var oSubHeading = Api.CreateParagraph();
    oSubHeading.AddText("BASIC STATISTICS");
    oDocument.AddElement(oDocument.GetElementsCount(), oSubHeading);

    // Add basic statistics
    var oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Word Count: " + basicStats.wordCount);
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Sentence Count: " + basicStats.sentenceCount);
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Paragraph Count: " + basicStats.paragraphCount);
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Character Count (with spaces): " + basicStats.charCountWithSpaces);
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Character Count (without spaces): " + basicStats.charCountWithoutSpaces);
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);

    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Estimated Line Count: " + basicStats.lineCount);
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    
    // Advanced analysis title
    oSubHeading = Api.CreateParagraph();
    oSubHeading.AddText("ADVANCED ANALYSIS");
    oDocument.AddElement(oDocument.GetElementsCount(), oSubHeading);
    
    // Add advanced analysis results
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Average Sentence Length: " + advancedStats.avgWordsPerSentence.toFixed(2) + " words");
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Average Paragraph Length: " + advancedStats.avgWordsPerParagraph.toFixed(2) + " words");
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Average Word Length: " + advancedStats.avgWordLength.toFixed(2) + " characters");
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Readability Score: " + advancedStats.readabilityScore.toFixed(2));
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Estimated Reading Time: " + advancedStats.readingTimeMinutes + " minutes");
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    
    // Common words title
    oSubHeading = Api.CreateParagraph();
    oSubHeading.AddText("MOST FREQUENTLY USED WORDS");
    oDocument.AddElement(oDocument.GetElementsCount(), oSubHeading);
    
    // We'll create a simple list instead of a table
    if (commonWords.length > 0) {
        for (var i = 0; i < commonWords.length; i++) {
            var oWordPara = Api.CreateParagraph();
            oWordPara.AddText((i + 1) + ". " + commonWords[i].word + " (" + commonWords[i].frequency + " times)");
            oDocument.AddElement(oDocument.GetElementsCount(), oWordPara);
        }
    } else {
        var oNoneFoundPara = Api.CreateParagraph();
        oNoneFoundPara.AddText("No frequently used words found.");
        oDocument.AddElement(oDocument.GetElementsCount(), oNoneFoundPara);
    }
    
    // Footer note
    var oFootnotePara = Api.CreateParagraph();
    oFootnotePara.AddText("This report was generated by the ONLYOFFICE Document Statistics and Analysis Tool on " +
                        new Date().toLocaleString() + ".");
    oDocument.AddElement(oDocument.GetElementsCount(), oFootnotePara);
}

// Run the macro
analyzeDocument();

})();

To use this macro in ONLYOFFICE

  1. Open your document in ONLYOFFICE
  2. Navigate to the View tab and select Macros
  3. Create a new macro and paste the code
  4. Run the macro
  5. A detailed analysis report will be added to the end of your document

Now let’s run our macro and see how it works!

This macro is a valuable tool for professionals looking to automate text analysis and documentation processes in a modern office environment. We hope it will be a useful addition to your work toolkit.

We encourage you to explore the ONLYOFFICE API documentation to create your own custom macros or enhance this one. If you have ideas for improvements or suggestions for new macros, please don’t hesitate to contact us. Your feedback helps us continue developing tools that make document creation and editing more efficient.


Useful links

ONLYOFFICE API methods

ONLYOFFICE on GitHub

More ONLYOFFICE macros

Get a free desktop suite

Use ONLYOFFICE macro to analyze spreadsheet data