This page demonstrates how to access connections, variables, and secrets within your Python scripts in Dataflow using Airflow hooks for streamlined and secure development workflows.
What Are They?
Connections: Secure database or external system configurations (PostgreSQL, MySQL, HTTP, AWS, etc.) stored centrally and accessed via Airflow hooks using conn_id.
Variables: Key-value pairs stored in Airflow for dynamic values such as configuration parameters.
Secrets: Sensitive credentials like API keys stored securely, ensuring they remain encrypted and never exposed in plaintext in notebooks or scripts.
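As a quick sketch of reading variables from a script, Airflow's standard Variable API can be used; the variable names `environment` and `etl_config` below are placeholders, not names defined by Dataflow:

```python
from airflow.models import Variable

# Plain string variable, with a fallback if it is not set
env = Variable.get("environment", default_var="dev")

# JSON variable deserialized into a Python dict
config = Variable.get("etl_config", deserialize_json=True, default_var={})
batch_size = config.get("batch_size", 500)
print(env, batch_size)
```

Using `default_var` keeps scripts from failing when a variable has not yet been created in the environment.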
Using Connections with Airflow Hooks
Connections are accessed through Airflow provider hooks. Each hook type takes its own connection-ID parameter (for example, postgres_conn_id for PostgresHook) that identifies which stored connection to use.
Example: PostgreSQL Connection
from airflow.providers.postgres.hooks.postgres import PostgresHook
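A minimal end-to-end sketch built on this import; the connection ID "my_postgres_conn" is a placeholder for whatever conn_id you configured in Dataflow:

```python
from airflow.providers.postgres.hooks.postgres import PostgresHook

# "my_postgres_conn" is a placeholder; use the conn_id configured in Dataflow
hook = PostgresHook(postgres_conn_id="my_postgres_conn")

# Run a query and fetch the results as a list of tuples
rows = hook.get_records("SELECT user_id, username FROM users LIMIT 10;")
for user_id, username in rows:
    print(user_id, username)
```

For larger result sets, the same hook also exposes get_pandas_df to load a query result directly into a pandas DataFrame.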
Example: Gemini API Key from a Secret
import google.generativeai as genai
from airflow.models import Variable

# "gemini_api_key" is a placeholder for the name under which the key is stored
gemini_api_key = Variable.get("gemini_api_key")

genai.configure(api_key=gemini_api_key)
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Hello from Dataflow!")
print(response.text)
Using Connections in Jupyter Notebooks with JupySQL
Dataflow supports JupySQL magic commands in Jupyter notebooks, allowing you to execute SQL queries directly using your configured connections with a simple and intuitive syntax.
Basic Usage
# Connect to your database using the connection ID
%sql conn_id
# Execute SQL queries directly
%sql SELECT * FROM users LIMIT 10;
# Multi-line queries
%%sql
SELECT
user_id,
username,
created_at
FROM users
WHERE status = 'active'
ORDER BY created_at DESC;
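Query results can also be captured into Python for further analysis. A minimal sketch, which runs only inside a notebook cell and assumes the users table from the queries above:

```python
# Assign the result of a %sql query to a Python variable (notebook cell)
result = %sql SELECT user_id, username FROM users LIMIT 10;

# JupySQL result sets convert to a pandas DataFrame
df = result.DataFrame()
print(df.head())
```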
Important Notes
Kernel Restart Required: If you add a new connection in Dataflow Studio while a notebook kernel is already running, you need to restart the kernel to use the new connection with %sql conn_id.
Alternative Without Restart: To avoid restarting the kernel, you can use the --section parameter:
%sql --section conn_id
This will work immediately without requiring a kernel restart.