Fine-tune LLM Model using Snowflake Cortex: Practical use case

4 min read · Mar 11, 2025

Problem Statement

A fictitious company receives a number of support tickets from different customers each day. The support staff has to go through them manually and categorize each one. Wouldn't it be better if this could be automated?

That's exactly what we will do in this assignment, using LLM models in Snowflake Cortex.

Part A: Data Preparation

  1. Get the sample dataset from here
  2. Load the dataset into a Snowflake table — support_tickets. Follow the steps mentioned in Part A of this tutorial
  3. Open a new Snowflake notebook and run:
# Import python packages
import streamlit as st
import altair as alt
import snowflake.snowpark.functions as F

# We can also use Snowpark for our analyses!
from snowflake.snowpark.context import get_active_session
session = get_active_session()
df_support_tickets = session.table('support_tickets')
df_support_tickets
Support_tickets table records

Part B: Analyze the dataset using Mistral models

Create a prompt that we will pass to the model in the next step:

prompt = """ You are an agent that helps organize the requests that come to your support team

The request category is the reason why the customer reached out. These are the possible types of request categories

Slow performance
Product Info
Account Management
Billing
Technical Support

Try doing it for this request & return only the request category

"""

Now we use the mistral-large model with the complete function to check how it performs:

mistral_large_response_sql= f""" 
select ticketid, request, trim(snowflake.cortex.complete('mistral-large', concat('{prompt}', request)), '\n') as mistral_large_response
from support_tickets
"""

df_mistral_large_response = session.sql(mistral_large_response_sql)
df_mistral_large_response

As you can see, this model does a decent job of categorization. However, it can be very expensive to run in production on a large dataset.

Is there an alternative?

Yes. We could try a smaller model — Mistral-7B.

mistral_7b_response_sql = f"""
select ticketid, request, trim(snowflake.cortex.complete('mistral-7b', concat('{prompt}', request)), '\n') as mistral_7b_response
from support_tickets
"""

df_mistral_7b_response = session.sql(mistral_7b_response_sql)
df_mistral_7b_response

As you can see, the smaller model has not done well and has produced low-quality results.

Let's compare the results from both models side by side:

df_llms = df_mistral_large_response.join(df_mistral_7b_response, 'ticketid')
df_llms

This is where fine-tuning comes to our rescue: we will fine-tune the Mistral-7B model on our sample dataset.

Before we jump there, let's look at our model architecture.

Architecture

Part C: Fine-tuning

Generate the dataset to fine-tune Mistral-7B, using the mistral-large responses as the completions:

df_fine_tune = df_mistral_large_response.with_column("prompt", F.concat(F.lit(prompt),F.lit(" "),F.col("request"))).select("ticketid","prompt","mistral_large_response")
df_fine_tune.write.mode('overwrite').save_as_table('support_tickets_finetune')
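Since the mistral-large responses serve as training labels, it is worth normalizing them before fine-tuning: the model sometimes returns extra whitespace, trailing punctuation, or different casing. A minimal sketch of such a cleanup helper — `clean_label` is a hypothetical name, not part of the tutorial — against the five categories from the prompt:

```python
# The five categories the prompt asks the model to choose from
CATEGORIES = [
    "Slow performance",
    "Product Info",
    "Account Management",
    "Billing",
    "Technical Support",
]

def clean_label(raw: str):
    """Normalize a model response to a known category, or None if it doesn't match."""
    label = raw.strip().strip(".")
    # Case-insensitive match against the allowed categories
    for category in CATEGORIES:
        if label.lower() == category.lower():
            return category
    return None
```

Rows whose label comes back as None can be dropped or re-labeled before writing the support_tickets_finetune table, so the fine-tuning set contains only clean prompt/completion pairs.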

Split the data into a training set and an evaluation set using random_split:

train_df, eval_df = session.table("support_tickets_finetune").random_split(weights=[0.8, 0.2], seed=42)
train_df.write.mode('overwrite').save_as_table('support_tickets_train')
eval_df.write.mode('overwrite').save_as_table('support_tickets_eval')

Fine-tuning via the UI is extremely easy.

Next, we need to provide the prompt and completion columns.

Now we need to select our validation data

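The wizard ultimately runs the Cortex FINETUNE function, so the same job can be started directly in SQL. A sketch of the equivalent call — the model name here is a hypothetical choice, and the inner queries map our table columns to the prompt/completion pair the function expects:

```sql
SELECT SNOWFLAKE.CORTEX.FINETUNE(
    'CREATE',
    -- hypothetical name for the tuned model
    'SUPPORT_TICKETS_FINETUNED_MISTRAL_7B',
    'mistral-7b',
    'SELECT prompt, mistral_large_response AS completion FROM support_tickets_train',
    'SELECT prompt, mistral_large_response AS completion FROM support_tickets_eval'
);
```

The call returns a job ID, which is what we pass to the DESCRIBE form of FINETUNE below to track progress.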

You can check the progress of your fine-tuning job using the following SQL. You need to provide the job ID as a parameter:

SELECT SNOWFLAKE.CORTEX.FINETUNE(
    'DESCRIBE',
    'ft_e7c079be-f011-4075-8f1f-f8e6c41375e7'
);

It took a couple of minutes to fine-tune the model.

Let's use this new model for our categorization problem.
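Assuming the tuned model was saved under the name chosen in the wizard (SUPPORT_TICKETS_FINETUNED_MISTRAL_7B is used here as a placeholder), it is queried with the same complete pattern as before — only the model name changes, and the categorization prompt from Part B is concatenated with each request:

```sql
SELECT ticketid,
       request,
       TRIM(SNOWFLAKE.CORTEX.COMPLETE(
            'SUPPORT_TICKETS_FINETUNED_MISTRAL_7B',
            CONCAT('<categorization prompt from Part B> ', request)
       ), '\n') AS fine_tuned_response
FROM support_tickets;
```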


The results are better than earlier.
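To put a number on "better", we can join the fine-tuned model's responses to the mistral-large labels on ticketid and compute an exact-match rate. A minimal pure-Python sketch of that comparison — `categorization_accuracy` is an illustrative helper, and in practice the two lists would come from the joined Snowpark DataFrame:

```python
def categorization_accuracy(reference, predicted):
    """Fraction of tickets where the predicted category exactly matches the reference."""
    if len(reference) != len(predicted):
        raise ValueError("Both lists must have the same length")
    matches = sum(r.strip() == p.strip() for r, p in zip(reference, predicted))
    return matches / len(reference)
```

Running this before and after fine-tuning (against the held-out support_tickets_eval rows, not the training rows) gives a simple measure of how much the fine-tune closed the gap with mistral-large.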

Further Improvements

You could try to improve the results further by increasing the number of epochs in sub-step 1 of the fine-tune wizard. However, this is not guaranteed to yield the desired results.

Conclusion

This tutorial demonstrated a practical use case of fine-tuning a large language model (mistral-7b) in Snowflake to create your own version of the model, working on a sample set of synthetically generated IT support tickets and using a larger model's (mistral-large) responses as training labels.

Fine-tuning is extremely easy in Snowflake — just a few clicks in the wizard and you can have your own version of the model.

Written by Ashish Agarwal

Engineer and Water Color Artist @toashishagarwal