AI, Innovation, and Tech Leadership by Eumar Assis

Integrate Enterprise Apps with Databricks using REST and OAuth

In a previous post, we explored options for accessing data in Databricks from external apps and systems. In this post, we’ll dive into another integration method: using Databricks SQL REST APIs with Databricks OAuth for authorization.

We’ll demonstrate how to connect to Databricks with single sign-on (SSO) in both Power Apps and a custom Python application using OAuth. This approach is useful when your system or tool does not support a Databricks SDK, requiring direct REST API calls. It builds upon Kyle Hale’s post, “Step by Step: Connecting to Databricks SQL in Power Apps…”.

The business impact of this scenario is to enable secure and efficient development of data-driven apps while supporting single sign-on. It’s a real-world use case I’ve worked on with enterprise customers. Let’s dive in and happy building! 😊

Authorization in Databricks

Databricks provides a unified client authentication “framework” to simplify access to Lakehouse resources. Tools that use unified client authentication, such as the Databricks SDK, Databricks Connect, and the Terraform provider, transparently handle authentication and the OAuth flow. The “framework” also makes it easy to pass credentials by defining a default set of environment variables that the Databricks tools and SDKs recognize. With unified client authentication, apps that integrate with Databricks have the following authorization options:

OAuth for service principals (OAuth M2M): Short-lived OAuth tokens for service principals in unattended scenarios (e.g., automated CI/CD).

OAuth for users (OAuth U2M): Short-lived OAuth tokens for users in attended scenarios, where you authenticate in real time via a web browser.

Personal Access Tokens (PAT): Short-lived or long-lived tokens for users or service principals, suitable when the target tool does not support OAuth.

OAuth is a widely accepted industry standard for authorization. Using OAuth instead of PATs provides additional security because OAuth tokens can be shorter-lived and are less likely to be mishandled. Users don’t need to manage or share personal tokens. Apps get delegated access on behalf of the user, and administrators can revoke or modify permissions easily. Below is an illustration of the OAuth User-to-Machine (U2M) flow with Databricks:

Though not depicted, the Databricks Auth Endpoint works transparently with the Identity Provider, such as Microsoft Entra, AWS Cognito, or Okta. While the Identity Provider handles user authentication, Databricks authorizes access to Lakehouse resources.
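For the unattended (M2M) option listed above, the token request is a plain OAuth client-credentials POST. The sketch below only builds the request and does not send it. It assumes the workspace-level token endpoint, HTTP Basic authentication with the service principal's client ID and secret, and the all-apis scope; the helper name build_m2m_token_request is hypothetical.

```python
import base64
from urllib.parse import urlencode


def build_m2m_token_request(workspace_host: str, client_id: str, client_secret: str,
                            scope: str = "all-apis") -> dict:
    """Describe a client-credentials token request without sending it."""
    # HTTP Basic credentials: base64("client_id:client_secret")
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return {
        "method": "POST",
        "url": f"https://{workspace_host}/oidc/v1/token",
        "headers": {
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        "body": urlencode({"grant_type": "client_credentials", "scope": scope}),
    }


if __name__ == "__main__":
    # Placeholder host and credentials, for illustration only
    req = build_m2m_token_request("example.cloud.databricks.com", "my-client-id", "my-secret")
    print(req["method"], req["url"])
    print(req["body"])
```

Passing the url, headers, and body to any HTTP client should return a JSON document containing an access_token, provided the service principal has been granted access to the workspace.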

Databricks OAuth Endpoints

Databricks OAuth endpoints enable secure authentication and token management for accessing Databricks APIs. There are Workspace-level and Account-level endpoints:

Databricks Account-level OAuth Endpoints

Endpoint Type | Endpoint URL
OAuth Token | https://{account-console-url}/oidc/accounts/{your account id}/v1/token
OAuth Authorization | https://{account-console-url}/oidc/accounts/{your account id}/v1/authorize
OAuth Well-Known Configuration | https://{account-console-url}/oidc/accounts/{your account id}/.well-known/openid-configuration

Databricks Workspace-level OAuth Endpoints

Endpoint Type | Endpoint URL
OAuth Token | https://{databricks-workspace}/oidc/v1/token
OAuth Authorization | https://{databricks-workspace}/oidc/v1/authorize
OAuth Well-Known Configuration | https://{databricks-workspace}/oidc/.well-known/openid-configuration
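The URL patterns in the two tables above can be captured in a pair of small helpers. This is a minimal sketch; the function names are hypothetical, and the hosts are expected without the https:// prefix.

```python
def workspace_oauth_endpoints(workspace_host: str) -> dict:
    """Build the workspace-level OAuth endpoint URLs."""
    base = f"https://{workspace_host}/oidc"
    return {
        "token": f"{base}/v1/token",
        "authorize": f"{base}/v1/authorize",
        "well_known": f"{base}/.well-known/openid-configuration",
    }


def account_oauth_endpoints(account_console_host: str, account_id: str) -> dict:
    """Build the account-level OAuth endpoint URLs."""
    base = f"https://{account_console_host}/oidc/accounts/{account_id}"
    return {
        "token": f"{base}/v1/token",
        "authorize": f"{base}/v1/authorize",
        "well_known": f"{base}/.well-known/openid-configuration",
    }


if __name__ == "__main__":
    # Placeholder host, for illustration only
    for name, url in workspace_oauth_endpoints("example.cloud.databricks.com").items():
        print(f"{name}: {url}")
```

As with other OpenID Connect providers, an HTTP GET on the well-known configuration URL should return a JSON document that advertises the authorization and token endpoints, which is a handy way to confirm the URLs before wiring them into a connector.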

Registering OAuth Apps in the Account Console

Before an application like Power Apps can authenticate via OAuth, it must be registered through the Databricks Account Console or the Databricks CLI. Below are the steps for registering an OAuth app via the Account Console.

1. Open the Databricks Account Console -> Settings -> App Connections. AWS Link: https://accounts.cloud.databricks.com/settings/app-integrations

2. Click “Add Connection”:

3. Complete the fields below:

4. Save and record the Client ID and Client Secret.

If you prefer to script the process, use the Databricks CLI, which also supports managing OAuth app registrations. See the Databricks Enable/Disable OAuth documentation for detailed step-by-step instructions.
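When scripting the registration, the app definition is typically supplied as a JSON document. The sketch below only assembles such a document. The field names (name, redirect_urls, confidential, scopes) are assumptions based on my reading of the custom app integration docs, and the helper name is hypothetical, so verify both against the current Databricks documentation before relying on them.

```python
import json


def build_app_registration(name: str, redirect_urls: list[str], scopes: list[str],
                           confidential: bool = True) -> str:
    """Return a JSON body describing a custom OAuth app registration.

    Field names are assumptions; check them against the Databricks docs.
    """
    if not redirect_urls:
        raise ValueError("at least one redirect URL is required")
    body = {
        "name": name,
        "redirect_urls": redirect_urls,
        # A confidential client receives a client secret
        "confidential": confidential,
        "scopes": scopes,
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    print(build_app_registration(
        name="power-apps-connector",
        redirect_urls=["http://127.0.0.1:5098/callback"],
        scopes=["sql", "offline_access"],
    ))
```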

Connecting Power Apps to Databricks using OAuth

Kyle’s post demonstrates how to connect Power Apps to Databricks SQL using a Custom Connector and Personal Access Tokens (PATs). It illustrates how to pull NYC Taxi Rides data from Databricks using a Custom Connector. Here, we’re building upon his work, but using OAuth for authentication instead. So, follow Kyle’s post end-to-end, but in step 2 (Security), select OAuth2 as the authentication method and input the values as below.

High-Level Steps

1. Create a Custom Connector in Power Apps.

2. On Security, specify OAuth 2.0 details from your Databricks App Registration (steps above):

Client ID. Obtained from the Databricks App Registration.

Client Secret. Obtained from the Databricks App Registration.

Authorization Endpoint. E.g., [workspace_url]/oidc/v1/authorize

Token Endpoint. E.g., [workspace_url]/oidc/v1/token

Refresh Endpoint. E.g., [workspace_url]/oidc/v1/token

Scope. E.g., sql offline_access

3. Define a Power Apps action (e.g., GET /taxi_rides) that queries the Databricks SQL REST endpoint:

• You’ll pass an SQL statement in the body to filter NYC taxi rides.

• Use custom code to parse the JSON response into a list.

4. Update the App Registration in the Databricks Account Console with the Redirect URL generated by the Power Apps Custom Connector.
5. Test the connector by launching an OAuth login flow. Once you authorize, you can retrieve the data.
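Step 3 mentions parsing the JSON response into a list. In Power Apps that logic lives in the connector's custom code, so the Python sketch below only illustrates the transformation. It assumes the response uses the inline JSON array format, with column names under manifest.schema.columns and rows under result.data_array; the function name rows_to_records is hypothetical.

```python
def rows_to_records(response: dict) -> list[dict]:
    """Turn a SQL statement response into a list of {column: value} dicts."""
    columns = [col["name"] for col in response["manifest"]["schema"]["columns"]]
    # result or data_array may be absent when the statement returned no rows
    rows = response.get("result", {}).get("data_array") or []
    return [dict(zip(columns, row)) for row in rows]


if __name__ == "__main__":
    # Hand-written sample shaped like the assumed response format
    sample = {
        "manifest": {"schema": {"columns": [{"name": "trip_count"}, {"name": "pickup_zip"}]}},
        "result": {"data_array": [["12", "10001"], ["7", "10002"]]},
    }
    for record in rows_to_records(sample):
        print(record)
```

Values in the array are kept exactly as they arrive, so any numeric or date conversion is left to the caller.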

Below is my Power Apps Security configuration using OAuth2 with the Databricks workspace-level OAuth endpoints.

Connecting a Python App to Databricks using OAuth with REST

Now, let’s walk through a Python example that uses OAuth tokens to query the same NYC taxi rides table in Databricks. While this could leverage Databricks Connect or the Databricks SDK, as mentioned in my other post, the goal here is to illustrate the OAuth flow “from scratch.” Note that the app uses a Flask web endpoint to handle the OAuth Redirect – it exchanges an OAuth authorization code for an OAuth token.

Requirements:

  • Add “http://127.0.0.1:5098/callback” to the “Redirect URLs” of your App Registration in the Databricks Account Console
  • Add the following environment variables to your local machine:
    • DATABRICKS_CLIENT_ID – value from Databricks App Registration
    • DATABRICKS_CLIENT_SECRET – value from Databricks App Registration
    • WORKSPACE_URL – Databricks Workspace URL
    • WAREHOUSE_ID – Databricks Warehouse id
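Because the script prepends https:// to WORKSPACE_URL, it can help to validate and normalize the environment variables listed above before starting the flow. The sketch below is one way to do that; the load_settings helper and its behavior are my own additions, not part of the original script.

```python
import os
from urllib.parse import urlparse

REQUIRED_VARS = (
    "DATABRICKS_CLIENT_ID",
    "DATABRICKS_CLIENT_SECRET",
    "WORKSPACE_URL",
    "WAREHOUSE_ID",
)


def load_settings(env=None) -> dict:
    """Read the required variables and fail fast if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    # Accept either "host" or "https://host/" and keep only the hostname
    raw = settings["WORKSPACE_URL"].strip()
    parsed = urlparse(raw if "://" in raw else f"https://{raw}")
    settings["WORKSPACE_URL"] = parsed.netloc
    return settings


if __name__ == "__main__":
    try:
        print(load_settings()["WORKSPACE_URL"])
    except RuntimeError as err:
        print(err)
```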

The entire code is available on my GitHub repo. The key code snippets are outlined below:

a) Dependencies & Configuration

import os
import time
from threading import Thread
from urllib.parse import urlencode, quote_plus

import requests
from flask import Flask, request, jsonify

# Flask app for handling the OAuth redirect
app = Flask(__name__)

# OAuth client credentials from the Databricks App Registration
CLIENT_ID = os.getenv("DATABRICKS_CLIENT_ID")
CLIENT_SECRET = os.getenv("DATABRICKS_CLIENT_SECRET")
REDIRECT_URI = os.getenv("DATABRICKS_REDIRECT_URI", "http://127.0.0.1:5098/callback")

# WORKSPACE_URL is the workspace hostname without the "https://" prefix,
# because the scheme is added when the URLs are built below
WORKSPACE_URL = os.getenv("WORKSPACE_URL")
AUTH_URL = f"https://{WORKSPACE_URL}/oidc/v1/authorize"
TOKEN_URL = f"https://{WORKSPACE_URL}/oidc/v1/token"

WAREHOUSE_ID = os.getenv("WAREHOUSE_ID")

authorization_code = None  # Global variable to store the code

c) Get Authorization and Exchange Code for Token

def get_authorization_url():
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": "sql offline_access"
    }
    return f"{AUTH_URL}?{urlencode(params, quote_via=quote_plus)}"

def exchange_code_for_token(auth_code: str) -> dict:
    data = {
        "grant_type": "authorization_code",
        "code": auth_code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET
    }

    response = requests.post(TOKEN_URL, data=data)
    response.raise_for_status()
    return response.json()

After a successful login, Databricks redirects the browser to the redirect URI with an authorization code (the code query parameter), which you exchange for a token.
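The authorization request above omits two common hardening measures for the authorization code flow: a state value to tie the callback to the request, and PKCE (RFC 7636). The sketch below shows how both could be generated with the standard library. Whether PKCE is required for your app registration is an assumption to check against the Databricks documentation; the function names are hypothetical.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) using the S256 method."""
    # 64 random bytes encode to 86 URL-safe characters (allowed range: 43-128)
    verifier = secrets.token_urlsafe(64)
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # Base64url without padding, as RFC 7636 specifies
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def build_authorization_url(auth_url: str, client_id: str, redirect_uri: str,
                            state: str, code_challenge: str,
                            scope: str = "sql offline_access") -> str:
    """Build the authorize URL with state and PKCE parameters added."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
    }
    return f"{auth_url}?{urlencode(params)}"


if __name__ == "__main__":
    state = secrets.token_urlsafe(16)
    verifier, challenge = make_pkce_pair()
    # Placeholder host and client ID, for illustration only
    print(build_authorization_url(
        "https://example.cloud.databricks.com/oidc/v1/authorize",
        "my-client-id", "http://127.0.0.1:5098/callback", state, challenge))
```

With this approach, the callback would compare the returned state query parameter with the stored value before accepting the code, and the token exchange would send the stored verifier as an extra code_verifier form field.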

d) Query Databricks SQL

def query_databricks(access_token: str, query: str):
    """Run a statement through the Databricks SQL REST endpoint."""
    statements_url = f"https://{WORKSPACE_URL}/api/2.0/sql/statements"

    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json"
    }

    # The "date" parameter is only bound if the statement contains a :date
    # marker; the sample query in this post does not reference it.
    payload = {
        "statement": query,
        "warehouse_id": WAREHOUSE_ID,
        "parameters": [
            {"name": "date", "type": "DATE", "value": "2016-02-01"}
        ]
    }

    resp = requests.post(statements_url, headers=headers, json=payload)
    resp.raise_for_status()
    return resp.json()
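A statement may still be pending or running when the POST returns, in which case the response carries a statement_id and a status.state rather than rows. The sketch below polls until the statement reaches a terminal state. It takes the status lookup as a callable so it can run without a workspace; the function name, the poll limits, and the list of terminal states are assumptions to check against the Statement Execution API reference.

```python
import time
from typing import Callable

# Assumed terminal states of a SQL statement
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED", "CLOSED"}


def wait_for_statement(first_response: dict,
                       fetch_status: Callable[[str], dict],
                       poll_seconds: float = 1.0,
                       max_polls: int = 60) -> dict:
    """Poll until the statement finishes, then return the final response."""
    response = first_response
    polls = 0
    while True:
        state = response["status"]["state"]
        if state in TERMINAL_STATES:
            if state != "SUCCEEDED":
                raise RuntimeError(f"Statement ended in state {state}")
            return response
        if polls >= max_polls:
            raise TimeoutError("Statement did not finish in time")
        time.sleep(poll_seconds)
        response = fetch_status(response["statement_id"])
        polls += 1


if __name__ == "__main__":
    # Simulated status lookups, for illustration only
    fake = iter([
        {"statement_id": "s1", "status": {"state": "RUNNING"}},
        {"statement_id": "s1", "status": {"state": "SUCCEEDED"}},
    ])
    first = {"statement_id": "s1", "status": {"state": "PENDING"}}
    final = wait_for_statement(first, lambda sid: next(fake), poll_seconds=0)
    print(final["status"]["state"])
```

In the script above, fetch_status could be a small function that issues a GET to /api/2.0/sql/statements/{statement_id} with the same Authorization header and returns the parsed JSON.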

e) Handle OAuth Callback

@app.route("/callback", methods=["GET"])
def handle_callback():
    """Handle the OAuth2 redirect."""
    global authorization_code
    authorization_code = request.args.get("code")
    if authorization_code:
        return jsonify({"message": "Authorization code received", "code": authorization_code})
    return jsonify({"error": "No authorization code received"}), 400

f) Run the App Flow

if __name__ == "__main__":
    # Step 1: Start the Flask app in a background thread for redirect handling
    def run_flask_app():
        app.run(port=5098, debug=False)

    flask_thread = Thread(target=run_flask_app, daemon=True)
    flask_thread.start()

    # Step 2: Direct user to authorize
    print("Go to this URL to authorize your application:")
    print(get_authorization_url())

    # Step 3: Wait for the authorization code
    print("Waiting for the authorization code...")
    while not authorization_code:
        time.sleep(1)  # Check every second

    print("Authorization code received. Querying Databricks.")
    
    # Step 4: Retrieve the access token
    token_data = exchange_code_for_token(authorization_code)
    access_token = token_data["access_token"]

    # Step 5: Query Databricks SQL
    sql_query = "SELECT count(1) as trip_count, pickup_zip FROM samples.nyctaxi.trips GROUP BY pickup_zip LIMIT 10"
    results = query_databricks(access_token, sql_query)
    print("Sample NYC Taxi Rides:", results)

Running the App

Run the app, open the authorization URL it prints in your browser, and sign in. Once you authorize, voilà! You should see the following table output.

This script demonstrates a minimal OAuth flow. In a real-world scenario, you’d need to handle token refreshes and error cases, and store tokens securely. The script helps you understand what happens under the hood when using the Databricks SDK, as these OAuth flows are managed by the SDK through the Databricks unified client authentication.
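As a starting point for the token refresh mentioned above, the sketch below tracks expiry and builds a standard OAuth refresh_token request. It assumes the token response includes expires_in and, because the offline_access scope was requested, a refresh_token; the helper names and the 60-second safety margin are hypothetical.

```python
import time


def token_expires_at(token_data: dict, issued_at: float) -> float:
    """Compute the absolute expiry time from the token response."""
    return issued_at + float(token_data.get("expires_in", 0))


def needs_refresh(expires_at: float, now: float, skew_seconds: float = 60.0) -> bool:
    """Refresh slightly early so a token does not expire mid-request."""
    return now >= expires_at - skew_seconds


def build_refresh_request(refresh_token: str, client_id: str, client_secret: str) -> dict:
    """Form fields for a refresh_token grant sent to the token endpoint."""
    return {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    }


if __name__ == "__main__":
    # Hand-written sample token response, for illustration only
    token_data = {"access_token": "abc", "refresh_token": "def", "expires_in": 3600}
    expires_at = token_expires_at(token_data, issued_at=time.time())
    print("Refresh needed now:", needs_refresh(expires_at, now=time.time()))
```

In the script above, the form fields would be posted with requests.post(TOKEN_URL, data=...) in the same way as the original code exchange, and the new access token would replace the old one.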

Conclusion

By configuring Databricks OAuth and registering your applications, you can securely connect enterprise tools (like Power Apps) and custom apps (like Python scripts) to query your Lakehouse data without manually handling long-lived personal tokens.

From Power Apps: Create a custom connector that authenticates via OAuth, then run queries to fetch NYC taxi data (or any dataset).

From Custom Apps: Implement a lightweight OAuth flow that automatically redirects for sign-in and exchanges the code for a token.

Not only does this unify security and identity management under your organization’s identity provider, but it also enhances the overall user experience and governance model. If you’re eager to dive deeper, check out the OAuth App Registration documentation and the Partner OAuth Integration Guide for more details.

Happy building!
