Delivering Video Intelligence with Open Source Tools

Nov 19, 2024

Like many other developers, I’ve felt the impact of Microsoft’s decision to retire Azure Media Services as they shift focus to ‘secular’ growth areas—a change that directly affects my martial arts SaaS solution, http://www.jitzs.com, which relied heavily on this service. As I’ve been redesigning Jitzs to fully deliver our video intelligence capabilities using open-source technologies, I wanted to share some best practices for building such solutions. This might be helpful for others looking for alternatives to Azure Media Services.

Let’s walk through building an end-to-end video intelligence platform that encodes, analyzes with AI, and streams video using tools like FFmpeg, Whisper, and serverless functions.

The Business

First, why should we care about video intelligence? Encoding, streaming, and video analytics represent a significant Total Addressable Market (TAM). The video streaming market alone was valued at approximately $107.8 billion in 2023, and it’s expected to grow at a compound annual growth rate (CAGR) of 20.5%, reaching roughly $397.67 billion by 2030. Every minute, over 500 hours of video are uploaded to YouTube alone, and millions of minutes of video are watched globally within the same timeframe. This massive consumption creates a huge demand for encoding solutions that make these videos lighter and accessible across multiple platforms. Additionally, AI-powered video intelligence is essential for analyzing aspects like sentiment and brand presence, while robust streaming capabilities are required to deliver videos live and on demand. In short, there’s a lot of money to be made in this space.
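As a quick sanity check on those numbers, here is a minimal sketch that compounds the 2023 figure forward at the quoted CAGR. The function name is my own, and the seven-year horizon (2023 to 2030) is an assumption about how the forecast was computed.

```python
def project_market_size(base_value: float, cagr: float, years: int) -> float:
    """Compound a starting value forward by `years` at annual growth rate `cagr`."""
    return base_value * (1 + cagr) ** years


# Figures quoted above: $107.8B in 2023, 20.5% CAGR, projected to 2030 (7 years).
projected_2030 = project_market_size(107.8, 0.205, 2030 - 2023)
print(f"Projected 2030 market size: ${projected_2030:.2f}B")
```

Under that assumption the result lands on the $397.67 billion figure, so the two quoted numbers are at least consistent with each other.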

Architectural Pillars

Architecting such solutions means building a multifaceted platform on five pillars: storage, encoding, analytics, streaming, and integration. Let’s review them:

Storage: An end-to-end architecture for delivering video intelligence starts with efficient and scalable storage for raw video assets and output files. Cloud storage is typically used, and you can keep costs down by storing only the resolutions you actually need to serve. Security capabilities, such as Digital Rights Management (DRM) and HLS encryption, should be implemented to protect copyrights and intellectual property.
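To make the HLS encryption point concrete, here is a minimal sketch that generates an AES-128 key and the key-info file that FFmpeg's HLS muxer reads through its -hls_key_info_file option. The file names and the key URL are hypothetical placeholders, and this is not full DRM: it only encrypts segments, so the key endpoint itself still has to be protected.

```python
import secrets
import tempfile
from pathlib import Path


def write_hls_key_info(out_dir: Path, key_uri: str) -> Path:
    """Create a random AES-128 key plus the key-info file FFmpeg's HLS muxer expects.

    The key-info file has three lines: the URI players use to fetch the key,
    the local path FFmpeg reads the key from, and an optional IV in hex.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    key_path = out_dir / "enc.key"  # hypothetical file name
    key_path.write_bytes(secrets.token_bytes(16))  # AES-128 uses a 16-byte key

    iv_hex = secrets.token_hex(16)  # 16 random bytes written as 32 hex characters
    key_info_path = out_dir / "enc.keyinfo"  # hypothetical file name
    key_info_path.write_text(f"{key_uri}\n{key_path}\n{iv_hex}\n")
    return key_info_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical key URL; in practice this endpoint should require authorization.
        info = write_hls_key_info(Path(tmp), "https://example.com/keys/enc.key")
        print(info.read_text())
```

The resulting path would then be passed to FFmpeg as -hls_key_info_file when producing the HLS output.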

Encoding: For encoding, it’s essential to choose the best open-source libraries like FFmpeg, tailored to your preferred programming language and platform. This step involves converting raw video into formats suitable for streaming while optimizing for different devices and bandwidth conditions.
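As an illustration of the encoding step, here is a minimal sketch that builds an FFmpeg command line for a single 720p HLS rendition and runs it with subprocess. The bitrate, segment length, and output file names are assumptions chosen for the example, not recommendations, and it requires the ffmpeg binary on the PATH.

```python
import subprocess
from pathlib import Path


def build_hls_command(source: Path, out_dir: Path, height: int = 720,
                      video_bitrate: str = "2800k", segment_seconds: int = 6) -> list[str]:
    """Return an FFmpeg argument list that encodes `source` into one HLS rendition."""
    return [
        "ffmpeg", "-y",
        "-i", str(source),
        "-vf", f"scale=-2:{height}",          # keep aspect ratio, force an even width
        "-c:v", "libx264", "-b:v", video_bitrate,
        "-c:a", "aac", "-b:a", "128k",
        "-hls_time", str(segment_seconds),    # target segment duration in seconds
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", str(out_dir / "segment_%03d.ts"),
        "-f", "hls",
        str(out_dir / "index.m3u8"),
    ]


def encode_to_hls(source: Path, out_dir: Path) -> None:
    """Run the encode; raises CalledProcessError if FFmpeg exits with an error."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_hls_command(source, out_dir), check=True)
```

Keeping the command builder separate from the subprocess call makes it easy to log or unit-test the exact arguments before spending compute on an encode.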

Analytics: Video analytics is powered by AI models that analyze content for factors such as sentiment and brand presence. While building a model such as a Convolutional Neural Network (CNN) from scratch is possible, using open-source models like Whisper (for speech-to-text transcription) or paid services like Azure Video Indexer can accelerate development and improve accuracy.
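For the Whisper part, here is a minimal sketch that transcribes a media file with the open-source openai-whisper package and turns the timed segments into a WebVTT caption file. The "base" model size is an assumption, the helper names are my own, and the import sits inside the function so the formatting helpers can be used without Whisper installed.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_webvtt(segments) -> str:
    """Convert segments (dicts with 'start', 'end', 'text') into WebVTT text."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}")
        lines.append(seg["text"].strip())
        lines.append("")
    return "\n".join(lines)


def transcribe(media_path: str, model_name: str = "base"):
    """Transcribe a media file and return Whisper's timed segments."""
    import whisper  # pip install openai-whisper; also needs ffmpeg on the PATH

    model = whisper.load_model(model_name)
    result = model.transcribe(media_path)
    return result["segments"]
```

The captions can be stored next to the HLS output, and the same segments can feed downstream analysis such as keyword search or sentiment scoring.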

Streaming: The streaming component involves delivering processed video to users in various formats, such as HLS (HTTP Live Streaming) or MPEG-DASH. These protocols optimize streaming quality by adapting to the viewer’s hardware and internet connection. Security aspects should be considered here as well, such as using DRM and encryption capabilities provided by DASH and HLS to protect content and prevent unauthorized access. Combining streaming with a Content Delivery Network (CDN) ensures that content is delivered quickly and efficiently, closer to the end-user.
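To show how HLS adapts to the viewer's connection, here is a minimal sketch that builds a master playlist listing several renditions. The player picks among them based on measured bandwidth. The rendition sizes, bitrates, and URIs in the usage lines are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    width: int
    height: int
    bandwidth: int  # peak bits per second advertised to the player
    uri: str        # media playlist for this rendition


def build_master_playlist(renditions: list[Rendition]) -> str:
    """Return an HLS master playlist with renditions ordered from lowest to highest bandwidth."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for r in sorted(renditions, key=lambda item: item.bandwidth):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth},RESOLUTION={r.width}x{r.height}"
        )
        lines.append(r.uri)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_master_playlist([
        Rendition(1280, 720, 2_800_000, "720p/index.m3u8"),
        Rendition(640, 360, 800_000, "360p/index.m3u8"),
    ]))
```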

Integration: An event-driven architecture with serverless computing can integrate each layer of the video intelligence platform while decoupling the components. This allows for scalability, flexibility, and easier management of the entire system, ensuring that each part can operate independently and efficiently.
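The decoupling idea can be sketched with a tiny in-process event bus. This is only a stand-in for a real queue or event service with serverless handlers, and the event names and payload fields are hypothetical. Each handler knows only the event it consumes and the events it emits, never the other stages.

```python
from collections import defaultdict, deque


class EventBus:
    """Minimal in-memory event bus: handlers return follow-up events to publish."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._queue = deque()

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self._queue.append((event_type, payload))

    def run(self):
        """Process queued events until none remain; return the event types seen, in order."""
        processed = []
        while self._queue:
            event_type, payload = self._queue.popleft()
            processed.append(event_type)
            for handler in self._handlers[event_type]:
                self._queue.extend(handler(payload))
        return processed


def on_uploaded(payload):
    # Placeholder for the encoding stage: emits the location of the HLS manifest.
    manifest = f"media-{payload['media_id']}/index.m3u8"
    return [("video.encoded", {**payload, "manifest": manifest})]


def on_encoded(payload):
    # Placeholder for the analytics stage.
    return [("video.analyzed", {**payload, "transcript": "(placeholder)"})]


bus = EventBus()
bus.subscribe("video.uploaded", on_uploaded)
bus.subscribe("video.encoded", on_encoded)
bus.publish("video.uploaded", {"media_id": "123"})
stages = bus.run()
print(stages)
```

Swapping a stage, or adding a second subscriber to the same event, does not require touching the other handlers, which is the property the serverless version relies on.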

Example Architecture

This is an example architecture using open source tools that can run on Microsoft Azure or on-prem using the Functions Core Tools:

The Code

As is the norm on this blog, I like to share some code snippets because, as they say, “paper accepts everything.” But I prefer to prove the idea in practice. Most importantly, I want to help others accelerate their development. I’m sharing the Video Streaming Azure Function that delivers an HLS video from storage—this was a crucial component in my project. This function can stream a video from Azure Storage or CDN using a SAS token passed via query string.

import os
import tempfile
from urllib.parse import urlparse

import azure.functions as func

# NOTE: function_streamer (the Functions app or blueprint object) and storage_helper
# (a blob storage wrapper exposing download_as_text) are defined elsewhere in the project.


@function_streamer.route(route="http_streamer/{container}/{media_id}/{*filename}", auth_level=func.AuthLevel.FUNCTION)
def http_hls_streamer(req: func.HttpRequest) -> func.HttpResponse:

    filename = req.route_params.get('filename')
    container = req.route_params.get('container')
    media_id = req.route_params.get('media_id')

    # The whole query string is treated as the SAS token
    query_string = urlparse(req.url).query
    sas_token = query_string

    root_url = os.getenv("MEDIA_CDN_OR_STORAGE_URL")
    container_path = f"https://{root_url}/{container}"
    blob_url = f"{container_path}/media-{media_id}/{filename}?{sas_token}"

    # Echo the caller's origin when present. Browsers reject the '*' fallback
    # on credentialed requests.
    origin = req.headers.get('Origin') or '*'
    cors_headers = {
        'Access-Control-Allow-Origin': origin,
        'Access-Control-Allow-Methods': 'GET, HEAD, OPTIONS',
        'Access-Control-Allow-Headers': 'Content-Type, Authorization',
        'Access-Control-Allow-Credentials': 'true'
    }

    if filename.endswith(".m3u8"):
        # Manifests are rewritten so every referenced file carries the SAS token
        modified_manifest_path = modify_hls_manifest(storage_helper, container_path, f"media-{media_id}/{filename}", sas_token, query_string)
        with open(modified_manifest_path, 'r') as file:
            modified_manifest_content = file.read()
        os.remove(modified_manifest_path)  # do not leave temp files behind
        return func.HttpResponse(
            modified_manifest_content,
            mimetype='application/vnd.apple.mpegurl',
            headers=cors_headers
        )

    # Everything else (segments, captions) is redirected straight to storage or the CDN
    return func.HttpResponse(
        status_code=302,
        headers={'Location': blob_url, **cors_headers}
    )


def modify_hls_manifest(storage_helper, container_path, manifest_file, sas_token, current_query_string):
    """Rewrite a manifest so each referenced file is reachable with the SAS token.

    Returns the path of a temporary file holding the rewritten manifest.
    """
    manifest_content = storage_helper.download_as_text(blob_name=manifest_file)

    folder = ""

    # Find the folder of the manifest file
    if "/" in manifest_file:
        last_slash_index = manifest_file.rindex('/')
        folder = manifest_file[:last_slash_index] + "/"

    # Rewrite each line that references another file; tags and comments start with "#"
    modified_content = []
    for line in manifest_content.splitlines():
        if line and not line.startswith("#"):
            if line.endswith(".m3u8"):
                # Child manifests are served by this endpoint, so keep the relative
                # path and pass the query string along
                line = f"{line}?{current_query_string}"
            else:
                # Segment (.ts) files go directly to storage with the SAS token
                line = f"{container_path}/{folder}{line}?{sas_token}"
        modified_content.append(line)

    # Save the modified manifest to a temporary file; the caller reads it back
    with tempfile.NamedTemporaryFile("w", delete=False, suffix=".m3u8") as temp_file:
        temp_file.write("\n".join(modified_content))
    return temp_file.name
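
One caveat on the CORS headers in the streamer: browsers reject a credentialed response whose Access-Control-Allow-Origin is the wildcard '*'. Here is a minimal sketch of a helper that only enables credentials for origins on an allow-list. The helper name and the allow-list are my own assumptions, not part of the function above.

```python
def build_cors_headers(request_origin, allowed_origins):
    """Return CORS headers, enabling credentials only for explicitly allowed origins."""
    headers = {
        "Access-Control-Allow-Methods": "GET, HEAD, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type, Authorization",
    }
    if request_origin and request_origin in allowed_origins:
        # Echo the specific origin; required when credentials are allowed.
        headers["Access-Control-Allow-Origin"] = request_origin
        headers["Access-Control-Allow-Credentials"] = "true"
        headers["Vary"] = "Origin"  # caches must not reuse this response for other origins
    else:
        # Anonymous access only: the wildcard is fine as long as credentials are off.
        headers["Access-Control-Allow-Origin"] = "*"
    return headers


if __name__ == "__main__":
    # Hypothetical allow-list for illustration.
    print(build_cors_headers("https://www.jitzs.com", {"https://www.jitzs.com"}))
```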
