{"id":8524,"date":"2023-11-21T10:00:24","date_gmt":"2023-11-21T09:00:24","guid":{"rendered":"https:\/\/anexia.com\/blog\/?p=8524"},"modified":"2023-11-29T09:42:06","modified_gmt":"2023-11-29T08:42:06","slug":"summarizing-video-conferences-with-artificial-intelligence","status":"publish","type":"post","link":"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/","title":{"rendered":"Summarizing Video Conferences With the Help of Artificial Intelligence"},"content":{"rendered":"<p>Since the coronavirus pandemic, video conferencing has become hugely popular, and the trend is toward relying on online meetings even more in the future. But how can you make sure that people who were absent receive all relevant information? One option is to take meeting notes. Depending on the author, however, these notes may be incomplete and relevant information may be left out.<\/p>\n<p>&nbsp;<\/p>\n<p>Recording the session is a more reliable method. Many video conferencing tools now offer this function so that absent participants can watch the missed session afterwards. In practice, however, it has proven very difficult to follow a recorded session attentively.<\/p>\n<p>&nbsp;<\/p>\n<p>Artificial intelligence makes it possible to condense recorded conferences into a compact, meaningful form. 
Harald Nezbeda, one of our employees, attempted to build such a system from open source components for his final thesis as part of the university course in data and AI management at the University of Klagenfurt.<\/p>\n<p><a href=\"https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/11\/IMG_1669-1-scaled.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-8510\" src=\"https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/11\/IMG_1669-1-1024x683.jpg\" alt=\"Harald Nezbeda\" width=\"600\" height=\"400\" srcset=\"https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/11\/IMG_1669-1-1024x683.jpg 1024w, https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/11\/IMG_1669-1-300x200.jpg 300w, https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/11\/IMG_1669-1-768x512.jpg 768w, https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/11\/IMG_1669-1-1536x1024.jpg 1536w, https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/11\/IMG_1669-1-2048x1365.jpg 2048w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<h2>Application<\/h2>\n<p>Whisper from OpenAI is used for speech recognition. 
Whisper alone is not enough, however: additional tools are needed, for example to identify speakers and detect pauses in the conversation.<\/p>\n<p>The process consists of six steps:<\/p>\n<p><a href=\"https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/11\/KI-Videokonferenz_Ablauf_EN.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-8525\" src=\"https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/11\/KI-Videokonferenz_Ablauf_EN-1024x575.png\" alt=\"Steps for summarizing video conferences with artificial intelligence\" width=\"600\" height=\"337\" srcset=\"https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/11\/KI-Videokonferenz_Ablauf_EN-1024x575.png 1024w, https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/11\/KI-Videokonferenz_Ablauf_EN-300x168.png 300w, https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/11\/KI-Videokonferenz_Ablauf_EN-768x431.png 768w, https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/11\/KI-Videokonferenz_Ablauf_EN-1536x862.png 1536w, https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/11\/KI-Videokonferenz_Ablauf_EN.png 1800w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<ol>\n<li><strong>Extract audio from video<\/strong><br \/>\nThe audio track is extracted from the video file. 
The video images are not needed for further processing.<\/li>\n<li><strong>Speaker diarization<\/strong><br \/>\nIn this step, the speakers in the recording are identified and an <a href=\"https:\/\/catalog.ldc.upenn.edu\/docs\/LDC2004T12\/RTTM-format-v13.pdf\">RTTM file<\/a> is created using the <a href=\"https:\/\/github.com\/pyannote\/pyannote-audio\">pyannote.audio<\/a> tool.<\/li>\n<li><strong>Split audio by speaker<\/strong><br \/>\nUsing the generated RTTM file, the audio is split into separate blocks in this step.<\/li>\n<li><strong>Speech recognition &#8211; ASR (Automatic Speech Recognition)<\/strong><br \/>\nWhisper transcribes the audio blocks, and the resulting text is stored in TXT files.<\/li>\n<li><strong>Merging ASR and speaker diarization<\/strong><br \/>\nThe RTTM file from step 2 and the TXT files from step 4 are merged using a Python function. Two output formats are generated:<br \/>\n<strong><br \/>\nTXT format<br \/>\n<\/strong>The text is displayed as a dialog.<\/p>\n<pre>{SPEAKER}: {ASR_TEXT}<\/pre>\n<p>SPEAKER stands for the speaker label from the RTTM file and ASR_TEXT for the transcribed text.<\/p>\n<p>JSON format<br \/>\nThe JSON output contains more details and can be used later for troubleshooting. 
Its schema is as follows:<\/p>\n<pre>{\r\n  \"$schema\": \"https:\/\/json-schema.org\/draft\/2020-12\/schema\",\r\n  \"type\": \"array\",\r\n  \"items\": {\r\n    \"type\": \"object\",\r\n    \"properties\": {\r\n      \"start\": {\r\n        \"type\": \"number\"\r\n      },\r\n      \"duration\": {\r\n        \"type\": \"number\"\r\n      },\r\n      \"speaker\": {\r\n        \"type\": \"string\"\r\n      },\r\n      \"text\": {\r\n        \"type\": \"string\"\r\n      }\r\n    },\r\n    \"required\": [\r\n      \"start\",\r\n      \"duration\",\r\n      \"speaker\",\r\n      \"text\"\r\n    ]\r\n  }\r\n}\r\n<\/pre>\n<\/li>\n<li><strong>Summarization<br \/>\n<\/strong>The <a href=\"https:\/\/huggingface.co\/docs\/transformers\/model_doc\/bart\">BART model<\/a> is used to create a summary of the entire conversation from the TXT and JSON output. It has also been shown that the SAMSum dataset can be used effectively for fine-tuning BART. Because of the models and data currently available, summaries can only be produced in English.<\/li>\n<\/ol>\n<h2>Conclusion<\/h2>\n<p>Artificial intelligence can therefore make a significant contribution to summarizing video conferences. 
With the help of ASR and speaker diarization, the content of a video conference can be converted to text and then condensed into a compact summary of the relevant content using the BART model.<\/p>\n<p>Since the thesis was written some time ago, the pipeline described above may no longer work optimally because of changes in the open source components.<\/p>\n<p>The project is available on <a href=\"https:\/\/github.com\/nezhar\/speech-condenser\">GitHub<\/a>.<\/p>\n<h2>RELATED TOPICS<\/h2>\n<p><a href=\"https:\/\/anexia.com\/en\/software-development\/individual-solutions\/artificial-intelligence\" target=\"_blank\" rel=\"noopener\">What is Artificial Intelligence? <\/a><a href=\"https:\/\/anexia.com\/blog\/en\/what-is-artificial-intelligence\/\" target=\"_blank\" rel=\"noopener\">\u00a0\u2192<\/a><\/p>\n<p><a href=\"https:\/\/anexia.com\/en\/software-development\/individual-solutions\/artificial-intelligence\" target=\"_blank\" rel=\"noopener\">Anexia Artificial Intelligence Development \u2192<\/a><\/p>\n<p><a href=\"https:\/\/anexia.com\/en\/software-development\/individual-solutions\/machine-learning\" target=\"_blank\" rel=\"noopener\">Anexia Machine Learning Development \u2192<\/a><\/p>\n<p><a href=\"https:\/\/anexia.com\/blog\/en\/machine-learning-for-beginners\/\" target=\"_blank\" rel=\"noopener\">Machine Learning for beginners \u2192<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this blog post, we show you how artificial intelligence can summarize videos using open source tools.<\/p>\n","protected":false},"author":47,"featured_media":8473,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2255],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Summarizing Video Conferences with Artificial Intelligence<\/title>\n<meta name=\"description\" 
content=\"In this blog post, we show you how artificial intelligence can summarize videos using open source tools.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Summarizing Video Conferences with Artificial Intelligence\" \/>\n<meta property=\"og:description\" content=\"In this blog post, we show you how artificial intelligence can summarize videos using open source tools.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/\" \/>\n<meta property=\"og:site_name\" content=\"ANEXIA Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/anexiagmbh\/\" \/>\n<meta property=\"article:published_time\" content=\"2023-11-21T09:00:24+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-11-29T08:42:06+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/10\/KI-Videokonferenz_Teaser.png\" \/>\n\t<meta property=\"og:image:width\" content=\"672\" \/>\n\t<meta property=\"og:image:height\" content=\"372\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Bianca Aldinger\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@_ANEXIA\" \/>\n<meta name=\"twitter:site\" content=\"@_ANEXIA\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Bianca Aldinger\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"4\u00a0Minuten\" 
\/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/\",\"url\":\"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/\",\"name\":\"Summarizing Video Conferences with Artificial Intelligence\",\"isPartOf\":{\"@id\":\"https:\/\/anexia.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/10\/KI-Videokonferenz_Teaser.png\",\"datePublished\":\"2023-11-21T09:00:24+00:00\",\"dateModified\":\"2023-11-29T08:42:06+00:00\",\"author\":{\"@id\":\"https:\/\/anexia.com\/blog\/#\/schema\/person\/bdc6f0cc5dc56835109748527ae31778\"},\"description\":\"In this blog post, we show you how artificial intelligence can summarize videos using open source 
tools.\",\"breadcrumb\":{\"@id\":\"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/#primaryimage\",\"url\":\"https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/10\/KI-Videokonferenz_Teaser.png\",\"contentUrl\":\"https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/10\/KI-Videokonferenz_Teaser.png\",\"width\":672,\"height\":372},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/anexia.com\/blog\/de\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Summarizing Video Conferences With the Help of Artificial Intelligence\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/anexia.com\/blog\/#website\",\"url\":\"https:\/\/anexia.com\/blog\/\",\"name\":\"ANEXIA Blog\",\"description\":\"[:de] ANEXIA Blog - Technischen Themen, Anexia News und Insights [:]\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/anexia.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"de\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/anexia.com\/blog\/#\/schema\/person\/bdc6f0cc5dc56835109748527ae31778\",\"name\":\"Bianca 
Aldinger\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/anexia.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/7819156cff96e2498826d4dc9fc66452?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/7819156cff96e2498826d4dc9fc66452?s=96&d=mm&r=g\",\"caption\":\"Bianca Aldinger\"},\"url\":\"https:\/\/anexia.com\/blog\/author\/baldinger\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Summarizing Video Conferences with Artificial Intelligence","description":"In this blog post, we show you how artificial intelligence can summarize videos using open source tools.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/","og_locale":"de_DE","og_type":"article","og_title":"Summarizing Video Conferences with Artificial Intelligence","og_description":"In this blog post, we show you how artificial intelligence can summarize videos using open source tools.","og_url":"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/","og_site_name":"ANEXIA Blog","article_publisher":"https:\/\/www.facebook.com\/anexiagmbh\/","article_published_time":"2023-11-21T09:00:24+00:00","article_modified_time":"2023-11-29T08:42:06+00:00","og_image":[{"width":672,"height":372,"url":"https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/10\/KI-Videokonferenz_Teaser.png","type":"image\/png"}],"author":"Bianca Aldinger","twitter_card":"summary_large_image","twitter_creator":"@_ANEXIA","twitter_site":"@_ANEXIA","twitter_misc":{"Verfasst von":"Bianca Aldinger","Gesch\u00e4tzte 
Lesezeit":"4\u00a0Minuten"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/","url":"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/","name":"Summarizing Video Conferences with Artificial Intelligence","isPartOf":{"@id":"https:\/\/anexia.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/#primaryimage"},"image":{"@id":"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/#primaryimage"},"thumbnailUrl":"https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/10\/KI-Videokonferenz_Teaser.png","datePublished":"2023-11-21T09:00:24+00:00","dateModified":"2023-11-29T08:42:06+00:00","author":{"@id":"https:\/\/anexia.com\/blog\/#\/schema\/person\/bdc6f0cc5dc56835109748527ae31778"},"description":"In this blog post, we show you how artificial intelligence can summarize videos using open source 
tools.","breadcrumb":{"@id":"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/#primaryimage","url":"https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/10\/KI-Videokonferenz_Teaser.png","contentUrl":"https:\/\/anexia.com\/blog\/wp-content\/uploads\/2023\/10\/KI-Videokonferenz_Teaser.png","width":672,"height":372},{"@type":"BreadcrumbList","@id":"https:\/\/anexia.com\/blog\/en\/summarizing-video-conferences-with-artificial-intelligence\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/anexia.com\/blog\/de\/"},{"@type":"ListItem","position":2,"name":"Summarizing Video Conferences With the Help of Artificial Intelligence"}]},{"@type":"WebSite","@id":"https:\/\/anexia.com\/blog\/#website","url":"https:\/\/anexia.com\/blog\/","name":"ANEXIA Blog","description":"[:de] ANEXIA Blog - Technischen Themen, Anexia News und Insights [:]","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/anexia.com\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"de"},{"@type":"Person","@id":"https:\/\/anexia.com\/blog\/#\/schema\/person\/bdc6f0cc5dc56835109748527ae31778","name":"Bianca Aldinger","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/anexia.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/7819156cff96e2498826d4dc9fc66452?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/7819156cff96e2498826d4dc9fc66452?s=96&d=mm&r=g","caption":"Bianca 
Aldinger"},"url":"https:\/\/anexia.com\/blog\/author\/baldinger\/"}]}},"lang":"en","translations":{"en":8524},"amp_enabled":true,"pll_sync_post":[],"_links":{"self":[{"href":"https:\/\/anexia.com\/blog\/wp-json\/wp\/v2\/posts\/8524"}],"collection":[{"href":"https:\/\/anexia.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/anexia.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/anexia.com\/blog\/wp-json\/wp\/v2\/users\/47"}],"replies":[{"embeddable":true,"href":"https:\/\/anexia.com\/blog\/wp-json\/wp\/v2\/comments?post=8524"}],"version-history":[{"count":24,"href":"https:\/\/anexia.com\/blog\/wp-json\/wp\/v2\/posts\/8524\/revisions"}],"predecessor-version":[{"id":8565,"href":"https:\/\/anexia.com\/blog\/wp-json\/wp\/v2\/posts\/8524\/revisions\/8565"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/anexia.com\/blog\/wp-json\/wp\/v2\/media\/8473"}],"wp:attachment":[{"href":"https:\/\/anexia.com\/blog\/wp-json\/wp\/v2\/media?parent=8524"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/anexia.com\/blog\/wp-json\/wp\/v2\/categories?post=8524"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/anexia.com\/blog\/wp-json\/wp\/v2\/tags?post=8524"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}