Powering Penguin Insights: A Practical Guide to Snowflake Cortex AI with DeepSeek-R1 LLM
Snowflake Cortex is a suite of AI features that use large language models (LLMs) to understand unstructured data, answer freeform questions, and provide intelligent assistance. This tutorial is a practical, hands-on guide to analyzing penguin data with it.
Let us get started.
Part A: Load the data to be analyzed
- Download penguins.csv to your local machine from here
- Load the CSV into a new Snowflake table (a programmatic alternative is sketched below)
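If you prefer to do the load step in code rather than through the UI, a minimal Snowpark sketch along the following lines should work. The connection parameters and the table name PENGUINS are assumptions; adapt them to your own account:

# Minimal sketch: load penguins.csv into a PENGUINS table using Snowpark.
# The connection parameters below are placeholders -- fill in your own details.
import pandas as pd
from snowflake.snowpark import Session

connection_parameters = {
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}
session = Session.builder.configs(connection_parameters).create()

# Read the downloaded file and write it to Snowflake, creating the table if needed
pdf = pd.read_csv("penguins.csv")
session.write_pandas(pdf, "PENGUINS", auto_create_table=True, overwrite=True)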
At this point, you should see a new table called Penguins in your selected database.
Preview of the data (it has a total of 344 records):
Part B: Analysis code in Snowflake Notebook
Create a new Snowflake notebook — ANALYSIS_USING_LLM_PENGUINS
From the top right, import the relevant packages.
Implement the helper functions —
a) generate_deepseek_response() — This function calls the Snowflake Cortex COMPLETE function. The first parameter is the name of the reasoning model (DeepSeek-R1 in our case) and the second is the prompt to be sent to the model.
b) extract_think_content() — This function processes the response returned by the LLM, separating the model's reasoning (its <think> block) from the final answer.
# Helper functions
# json, re and the Snowpark session used below are imported / created in the
# Streamlit app cell further down; all notebook cells share the same namespace.

def generate_deepseek_response(prompt):
    # Wrap the prompt in instruction tags and send it to the Cortex COMPLETE
    # function, naming DeepSeek-R1 as the reasoning model
    cortex_prompt = f"'[INST] {prompt} [/INST]'"
    prompt_data = [{'role': 'user', 'content': cortex_prompt}]
    prompt_json = escape_sql_string(json.dumps(prompt_data))
    response = session.sql(
        "select snowflake.cortex.complete(?, ?)",
        params=['deepseek-r1', prompt_json]
    ).collect()[0][0]
    return response

def extract_think_content(response):
    # DeepSeek-R1 wraps its reasoning in <think>...</think> tags;
    # split that reasoning from the final answer
    think_pattern = r'<think>(.*?)</think>'
    think_match = re.search(think_pattern, response, re.DOTALL)
    if think_match:
        think_content = think_match.group(1).strip()
        main_response = re.sub(think_pattern, '', response, flags=re.DOTALL).strip()
        return think_content, main_response
    return None, response

def escape_sql_string(s):
    # Escape single quotes so the JSON prompt survives inside a SQL string literal
    return s.replace("'", "''")
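Before wiring these helpers into the app, you can sanity-check them with a one-off call. The question text below is just an example:

# Quick smoke test of the two helpers defined above
raw = generate_deepseek_response("In one sentence, what is a Gentoo penguin?")
thoughts, answer = extract_think_content(raw)
if thoughts:
    print("Model reasoning:", thoughts)
print("Final answer:", answer)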
Implement the code for the Streamlit app that takes user questions and sends them to the DeepSeek-R1 model —
# Streamlit app to send questions to the LLM
import streamlit as st
from snowflake.snowpark.context import get_active_session
import json
import pandas as pd
import re

# Write directly to the app
st.title("🐧 Ask about Penguins")

# Get the current credentials
session = get_active_session()

# penguinsData is assumed to reference the result of an earlier notebook cell
# that selects from the Penguins table
df = penguinsData.to_pandas()

user_queries = ["Which penguin has the longest bill length?",
                "Where do the heaviest penguins live?",
                "Which penguin has the shortest flippers?"]

question = st.selectbox("What would you like to know?", user_queries)
# question = st.text_input("Ask a question", user_queries[0])

prompt = [
    {
        'role': 'system',
        'content': 'You are a helpful assistant that uses provided data to answer natural language questions.'
    },
    {
        'role': 'user',
        'content': (
            f'The user has asked a question: {question}. '
            f'Please use this data to answer the question: {df.to_markdown(index=False)}'
        )
    },
    {
        # Generation settings; generate_deepseek_response() sends the whole list
        # as prompt text, so for Cortex to actually enforce them (including
        # Cortex Guard) they would normally go in a separate options argument to COMPLETE
        'temperature': 0.7,
        'max_tokens': 1000,
        'guardrails': True
    }
]

# Display the data used to answer the questions
df

if st.button("Submit"):
    status_container = st.status("Thinking ...", expanded=True)
    with status_container:
        response = generate_deepseek_response(prompt)
        think_content, main_response = extract_think_content(response)
        if think_content:
            st.write(think_content)
    status_container.update(label="Thoughts", state="complete", expanded=False)
    st.markdown(main_response)
For the question we asked, the model correctly responds:
The penguin with the longest bill length is Gentoo from Biscoe Island with a bill length of 59.6 mm (male)
One can try other questions as well.
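Since generate_deepseek_response() JSON-stringifies whatever it receives, you are not tied to the message-list format; a plain-string prompt works just as well. A minimal variant of the submit handler (same helpers, hypothetical prompt wording) could look like this:

if st.button("Submit"):
    # Flatten the instruction, the question and the data into one prompt string
    flat_prompt = (
        "You are a helpful assistant that uses provided data to answer natural language questions.\n"
        f"The user has asked a question: {question}.\n"
        f"Please use this data to answer the question:\n{df.to_markdown(index=False)}"
    )
    response = generate_deepseek_response(flat_prompt)
    think_content, main_response = extract_think_content(response)
    st.markdown(main_response)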
The advantage of using Snowflake Cortex AI is that accessing large language models (LLMs) is extremely easy, with no integrations or API keys to manage. Governance controls are also straightforward: Cortex Guard can filter out potentially inappropriate content.
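As an aside, the temperature, max_tokens and guardrails settings only take effect when COMPLETE receives them as a separate options argument; embedded in the prompt text (as in the app above) they are merely shown to the model. A minimal sketch of the options form, assuming the same session object, might look like this (with options, COMPLETE returns a JSON object containing choices and usage rather than plain text):

# Sketch: pass generation options (including Cortex Guard) as the third argument to COMPLETE
guarded_sql = """
select snowflake.cortex.complete(
    'deepseek-r1',
    [{'role': 'user', 'content': 'Which penguin has the longest bill length?'}],
    {'temperature': 0.7, 'max_tokens': 1000, 'guardrails': true}
)
"""
# The result is a JSON object (choices, usage, ...) rather than a bare answer string
print(session.sql(guarded_sql).collect()[0][0])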
Bonus
In case you are getting curious about the penguin species, here is how they look.
Conclusion
This tutorial demonstrates the following —
- Loading sample data from a local CSV into a Snowflake table
- Querying the data using Snowflake Cortex AI
- Using Cortex Guard & the DeepSeek-R1 LLM
If you like my tutorials, consider giving multiple claps & following me for more interesting reads.