Summarizing Markdown files with Azure Cognitive Services TextAnalytics Extractive Summary

At Microsoft, documentation is written in Markdown and lives in GitHub repos, which you fork, clone, and do all the usual GitHub things with. The Markdown then gets transformed into HTML. (This is a really simplified explanation. The actual process is a lot more complicated.)

This script opens local Markdown files and summarizes them using Azure Cognitive Services TextAnalytics Extractive Summary. The feature is available in version 5.3.0b2 of the library. It’s not available in the current version, 5.2.1.
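
Because the feature depends on which version is installed, it can help to check for it before running anything. This is a minimal sketch, assuming the summarization method is exposed as `begin_extract_summary` on the async `TextAnalyticsClient` (the name the script below calls). The helper name `extractive_summary_available` is made up for this example.

```python
import importlib


def extractive_summary_available():
    """Return True if the installed SDK exposes begin_extract_summary."""
    try:
        module = importlib.import_module("azure.ai.textanalytics.aio")
    except ImportError:
        # The package (or its async dependencies) is not installed.
        return False
    client_cls = getattr(module, "TextAnalyticsClient", None)
    return hasattr(client_cls, "begin_extract_summary")


if __name__ == "__main__":
    print("Extractive summary available:", extractive_summary_available())
```

If this prints `False`, the installed version probably predates the feature.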

It doesn’t yet write anything to the Markdown files or create a CSV file. All it does is print the summary to the terminal. I’m planning to use similar scripts to analyze what is in my content set so I can eventually use structured documentation methods to overhaul it. First, though, I need to know what is there and how it all maps to other content.
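
For the CSV part, one way it might look is sketched here with Python's standard `csv` module. This is not part of the script yet. The function name `write_summaries`, the column names, and the idea of collecting `(filename, summary)` pairs are all assumptions for illustration.

```python
import csv


def write_summaries(rows, csv_path):
    """Write (filename, summary) pairs to a CSV file with a header row."""
    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename", "summary"])
        writer.writerows(rows)
```

The script would have to return each summary instead of printing it, then pass the collected pairs to a function like this one.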

It does return an unclosed connection error, and fixing that is on my TO DO list.

import asyncio
import os
from azure.ai.textanalytics.aio import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential
from dotenv import load_dotenv
# Get environment variables
load_dotenv()
endpoint = os.environ["AZURE_LANGUAGE_ENDPOINT"]
key = os.environ["AZURE_LANGUAGE_KEY"]
text_analytics_client = TextAnalyticsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key),
)
async def get_summary(content, filename):
    """Print the extractive summary of one document to the terminal."""
    print(f"Summarizing {filename}")
    # The service takes a list of documents, so send this file as a one-item list
    document = [content]
    poller = await text_analytics_client.begin_extract_summary(document)
    extract_summary_results = await poller.result()
    async for result in extract_summary_results:
        if result.kind == "ExtractiveSummarization":
            # Join the extracted sentences into one summary string
            print("Summary extracted: \n{}".format(
                " ".join(sentence.text for sentence in result.sentences)
            ))
        elif result.is_error:
            print("...Is an error with code '{}' and message '{}'".format(
                result.error.code, result.error.message
            ))
async def sample_extractive_summarization_async():
    direct = "C:/absolute/path/to/markdown/files/"
    for filename in os.listdir(direct):
        # Skip anything that is not a Markdown file
        if not filename.endswith(".md"):
            continue
        # Open the file, read its contents, and close it again
        with open(os.path.join(direct, filename), "r", encoding="utf-8") as file:
            content = file.read()
        await get_summary(content, filename)
async def main():
    await sample_extractive_summarization_async()
# TO DO: Fix unclosed connector error
if __name__ == '__main__':
    # The selector event loop policy only exists on Windows
    if os.name == "nt":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
    asyncio.run(main())
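
The unclosed connector error in the TO DO most likely comes from the client never being closed before the event loop shuts down. This is a minimal sketch of one possible fix, assuming the async `TextAnalyticsClient` can be used with `async with`, as Azure SDK async clients generally can. The names `make_client`, `summarize`, and `summarize_all` are made up for this example, and it returns the summaries instead of printing them.

```python
import asyncio
import os


def make_client():
    """Build the async client from the same environment variables the script uses."""
    # Imported here so the rest of the sketch can be read without the SDK installed.
    from azure.ai.textanalytics.aio import TextAnalyticsClient
    from azure.core.credentials import AzureKeyCredential
    return TextAnalyticsClient(
        endpoint=os.environ["AZURE_LANGUAGE_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["AZURE_LANGUAGE_KEY"]),
    )


async def summarize(client, content):
    """Return the extractive summary of one document as a single string."""
    poller = await client.begin_extract_summary([content])
    results = await poller.result()
    async for result in results:
        if result.is_error:
            return f"Error {result.error.code}: {result.error.message}"
        return " ".join(sentence.text for sentence in result.sentences)


async def summarize_all(client, documents):
    # Leaving the `async with` block closes the client's HTTP session,
    # which is what the unclosed connector warning is asking for.
    async with client:
        return [await summarize(client, doc) for doc in documents]
```

With the environment variables set, it could be called as `asyncio.run(summarize_all(make_client(), [content]))`. Passing the client in as an argument, rather than using a module-level global, keeps its lifetime tied to a single `async with` block.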

PS. For organizations that use an actual Component Content Management System (CCMS) to write and publish technical documentation and learning content, there may be a function in the system that automatically creates summaries of aggregated content. Paligo integrates with GitHub repos, although I haven’t seen it in action yet.