Home > Resource Center > Migrate to Snowflake

Migrate to Snowflake

Snowflake is a cloud-based data warehouse designed to handle all aspects of data and analytics. It has become a leading database for data warehousing because of its patented new architecture. Snowflake’s architecture is a hybrid of traditional shared-disk and shared-nothing database architectures, and uses a central data repository for persisted data that is accessible from all compute nodes on the platform. The Snowflake system maintains all data in databases, with each database comprising one or more schemas, which are logical groupings of database objects, such as tables and views. Snowflake does not place any hard limits on the number of databases, schemas, or objects you can create.

Snowflake offers the following benefits:

  • Storage capacity: Snowflake is scalable and has a high storage capacity, making it ideal for businesses that handle a large amount of data.
  • Multiple-cloud availability: Snowflake can run on Microsoft Azure, Amazon Web Services (AWS), and Google Cloud Platform (GCP), giving organizations more options for where to store their data.
  • Automatic data optimization: Snowflake automatically optimizes how we store and query data. This makes it easier for businesses to combine structured and semi-structured data with no conversion or transformation into a fixed relational schema first.
  • User-friendly and responsive: Snowflake is user-friendly and has published guidelines for organizing data, which can lead to a highly responsive and optimally performing system. It does not require a full-time database administrator on staff.
  • Compatibility with popular business intelligence tools: Snowflake is compatible with popular business intelligence tools such as QuickSight, Looker, Power BI, and Tableau, enhancing its ability to provide valuable insights for organizations.
  • Security: Snowflake secures and encrypts all data, making it easier to generate business intelligence and comply with data warehousing certifications such as PCI DSS.

Snowflake has grown into one of the premier platforms for data warehouses in the cloud. Data professionals who design and manage Snowflake data warehouses can use ER/Studio to capture organizational requirements, translate requirements into design, and deploy the design to Snowflake.

Watch this video to discover the comprehensive support of ER/Studio for Snowflake provides better visibility and easier updates to data structures.

This session includes:

  • Forward and reverse engineering with data definition language (DDL) code
  • Compare and merge capabilities with ALTER script generation
  • Access to tables, views, columns, and constraints
  • Table persistence and clustering keys
  • Support for Snowflake features: procedures, functions, materialized views, sequences, users, roles, and schemas

And more!

Transcript

00:08

Migrate Databases to Snowflake using ER/Studio

Hi everyone, and welcome to our democast, where today we’re going to be talking through how we can use ER/Studio to help us migrate databases to snowflake.

00:28

ER/Studio

So, a quick introduction in terms of ER/Studio itself, which I always just do quickly with a couple of slides just to refresh everyone if they’ve used ER/Studio before or if you’re new to ER/Studio. Our focus with ER/Studio is very much around data modeling. So it’s what we call an enterprise-grade tool, which can be used by Data Architects to help them document and also design and modify the data assets within an organization.

01:01
ER/Studio helps businesses make it more accessible to other people within your organization who might be involved in data governance or data analytics.

01:22

ER/Studio Team Server

What I’ll probably do at the end is just for five minutes, I might just bring up some of that sharing capability which comes under what we call Team Server. Quickly going through this, what we’re really looking to achieve with ER/Studio as a modeling tool is we’re trying to really translate or transform, if you like, the very technical information that’s captured as part of a database schema and turn it into more meaningful information.

The slide here is just illustrating that on the right-hand side, if we look at our data sources, look at our data assets, they’re going to be stored potentially in many different database platforms. They’re going to have a technical schema associated with them. Some of the naming conventions that are in use may not be altogether meaningful. They might be kind of coded in for historic reasons. If we want to actually transform that and make it into something that’s easier to understand, ER/Studio can help us do that.

02:24

Reverse Engineering

The first step really is that what we’re able to do is connect to those data sources and extract the information from them with a process we call reverse engineering that allows us to create primarily a physical and logical model representing that information. On the bottom left of the slide here, we might transform the data assets and from there extract that information and produce a logical data model.

02:52

Logical Data Model

The logical data model can then have more meaningful information in there so it can have more meaningful names, they can be more business-friendly and be presented in a way that’s easy for people to understand.

The next thing we do with ER/Studio, again, using what we call our Team Edition, our Team Server is on the top left. We’re also then able to associate those things to an even higher level by creating links into business terms and business glossaries. We can then map what something means to the business, what’s a customer, what’s an employee, what’s a person, how that actually translates through into a logical model and then ultimately into the databases themselves. That’s the kind of thing that we can achieve with ER/Studio in quite a simple way. And we’ll illustrate some of this as we go through the demonstration.

03:47

ER/Studio Data Architect

In terms of ER/Studio itself. Again, just so we understand the terminology that we use. ER/Studio is made up of a number of different components. On the left-hand side is what we call ER/Studio Data Architect.

04:01

Data Modeling

And this is a Windows application and we’ll spend a lot of the demonstration actually inside ER/Studio data architect. This is where we’ll actually do the work in terms of creating, updating, and maintaining data models. It’s typically going to be used by a data architect or a data modeler, a fairly technical person within the organization.

04:39

Data Repository

The central section there is what we call Team Server Repository. Now, this is really something that enables team working. What this means is we can store our data models in a central location. The team server repository will then manage access to those data models so you can define security roles, and define who has access to those. Also, it will support versioning so we can check out, and make our changes.

05:06

Data Modeler

Check those changes back in again and it will make sure that we’re not overwriting other changes, and that everyone’s kind of working nicely together as part of the team. It also supports things like named releases which are a snapshot of the data model at that point in time. So you might have your one production model, your one-two production model.

05:25
And you can always go back and see the differences between those. The final piece on the right-hand side that I’m just going to talk through quickly is what we call Team Server Core. This adds a web browser-based interface to the team server repository.

Enterprise Data Model

This is really a mechanism for sharing information from the data models with a wider audience. If you got people in your organization who are more business focused, maybe they’re data stewards, maybe they’re more business analysts, they’re not going to be actually maintaining the data models, but they might like to be able to browse through the data models. They might like to search for specific content in the data models and be able to understand that.

06:04
Team server Core really provides that access just via a web browser, so they don’t need to install any software but enables them to see all that. Information that we’ve captured as part of the data models themselves.

06:18
That’s really the core elements that make up ER/Studio. For the demonstration today, we’ll spend most of our time inside ER/Studio Data Architect. As I say, towards the end, we might just briefly bring up the team Server Core web interface so you can see how that actually relates as well.

06:36
Okay, I’ll bring this slide back up again towards the end, but there’s lots and lots of good information out there on our websites and also on our YouTube channel. If you want to go and explore further information around ER/Studio, that’s just a few kinds of pointers as to good useful URLs to start with.

06:54
As I say, I will bring that back up again toward the end of the session.

07:41
Okay, so this is ER/Studio data architect. As I mentioned, this is a Windows application and this is where we do the creation of the models and work with the models and so on.

07:53

Snowflake Migration

Now, just for the purposes of the demo today, I’ve got an existing model here that’s already been created. Where we’re kind of coming from with the session we’re covering is if you do have existing database systems that might be stored in a SQL Server, might be stored in an Oracle, and you are looking at how you can migrate those or move those into other environments, specifically snowflake

08:25
For the data model we’re looking at here, I’ll just explain a few things briefly about ER/Studio. We’re first of all looking at what we call a physical data model.

08:34

Physical Data Model

This is actually something that was reverse-engineered from Microsoft SQL Server. We then added a few comments and kind of title bars to it just to make it clear that’s the case. If I do click on some of these objects, I can go see the details behind them and I’ll see the definitions of the columns, I can even see quite specific things related to SQL Server Ddl, how that would be created or how it came through. From reverse engineering, I can specify specific storage parameters and so on. These will all be different depending on the database platform that we’re working with.

09:10

Logical Data Model

But what ER/Studio allows us to do, is it allows us to have multiple physical models within our modeling projects. So I can in effect go from a SQL Server data model, I then typically produce a logical model from that, so the logical model is a little bit more abstracted away–it’s not specific to the platform.

09:33

Migrating Database Platforms

What I can then do is forward generate a new physical model and make that specific to a platform again. The process here that I’m kind of starting a little way into is we’ve reverse-engineered from SQL Server, we’ve produced our physical data model. I’ve also produced a logical data model. What I want to do now is how would we actually go through and use this as a basis for migrating to other database platforms.

10:04
Again here is just on the left-hand side, as I mentioned, I’ve got my physical data model which came from SQL Server. I’ve got another one here which I was kind of working with earlier, which is why that one says snowflake against it, but we’ll ignore that one for now.

10:18
We’ll generate it and explain the steps from scratch for you, but it’s detailing all the different components of that SQL Server database for me. So I’ve got my tables, my columns, functions, procedures, and so on.

10:32

Data Modeling Tool

I think the first thing as well, just to understand, with a tool like ER/Studio, we’re not a 100% migration utility, we are a data modeling tool. What we are able to do is take things like table definitions, and column definitions–which we’re able to move across different platforms.

If you have got specific code that’s installed procedures or packages depending on the platform, we’re not able to convert that for you automatically. There’s always going to be some manual Intervention required to take those across to the other platforms.

11:07
Again, if we look at this particular physical model, if we look at the properties behind it, if I click.

11:12
Here, at any point in time, I can go in and see the platform that’s associated with a physical data model. So I just say yes. This one is actually set up against Microsoft SQL Server 2017.

11:24

Reverse Engineer Database Schema

As I said, I generated this model by reverse engineering it from an existing schema. From there, what I’ve also created is a logical model. So the logical model is not specific to the database platform, but it contains very similar things.

11:46
I’ve kind of got a one-for-one mapping really here between table definitions in my physical layer and what we call entity definitions at the logical layer. The logical model can have them in a more friendly format. So instead of having things abbreviated to ADR, that’s actually mapping to something called to address at the logical layer.

12:05
And we have things called Naming Standard Templates, which will help you do that automatically as well.

12:09
There’s lots of good ways in which we can kind of maintain the differences and update between logical and physical data models. The first thing that I’m going to do here in terms of our demonstration is I’m going to go ahead from my logical model I’m going to say I’d like to generate a new physical model. Yes, let’s go ahead and do that.

12:32
What’s the name of the physical model going to be, just so I can find it easily?

12:36
I’m just going to call this one Snowflake and it’s part of our training database, so I just put TRN next to it. What’s the target platform going to be for this physical model? So, again, I’ve got a comprehensive drop-down list here of what I can select from.

12:52
The steps that I’m showing you here are not specific to Snowflake. They will work if we want to do the same process across other platforms. As I said in our demonstration today, we’re really going to talk through with Snowflake and I’m going to go through and say, yes, let’s do all of that.

13:10
I can select the different parts of the data model that I want to be generated into the physical.

13:17

We’ve got things called business data objects. We’ve also got our tables, we got our views, users’ roles and so on that are all part of the selection process.

13:44

Data Model Validation

ER/Studio does also support what we call validation rules. You can actually have a model validation routine executed against it. This can just check some kind of static definitions. Do things have a certain number of key definitions? Have they got a description? Have they got an explanation of what they’re used for? There are all sorts of different items that we can select here if we want to.

14:08
You can see a quick example there. We got duplicate check constraints. Are there more than ten foreign keys on a table?

14:15
So things that generally might be regarded as not always best practice, so there is a validation wizard that you can elect to run.

14:28

Snowflake Data Model

From my logical model, which was defined here, I’ve now created an all-new snowflake model. This is a physical model and again, if I just want to reassure myself, I can click on this, I can.

14:40
Select the database platform and we’ll see that this is set to Snowflake and its going to use kind of the Snowflake defaults for that as well. It works in the same way as the data model that we derived it from. Again, any of these items I can, for example, double click and open up and it’s going to show me the same kinds of things. There are my column definitions and my DDL code behind them.

15:04

Snowflake

In this case, this is now the snowflake-specific code that’s been generated. And you’ll also see some of the options are a little bit different. With Snowflake, we have options around storage parameters. Are they permanent, are they transient, are they temporary? Et cetera, et cetera.

15:19
All these different options as part of the property editors will change to reflect the database platform that you’ve selected.

15:29
In order to run the demonstration today as well, the next steps that we need to go through are basically from this snowflake-specific data model. How would I actually go about implementing that on a snowflake platform? Again, this is quite a simple example, but we’ll illustrate the steps for you.

17:19

Snowflake Migration

So this is my Snowflake environment. Snowflakes I’m sure is kind of a cloud-based system. This is my user interface for Snowflake. And I’ve got a list of the different databases that I’ve got as part of my warehouse. We’ve got a few different test ones that we’ve been using here.

17:42
What I’m going to do to start with here is I’m just going to go ahead and I’m going to create a brand new blank database in Snowflake. We’ll create this here and I’m just going to call that JH for John Harrison and let’s call this demo. So that’s our training demo model that we’re working on.

18:13
You’ll see it’s basically empty at the moment, so it doesn’t have any tables defined. It’s got no other information defined against it either. It’s just kind of existing there. So what I can now do is.

18:24
Jump back to ER/Studio and I’m now going to basically click through my wizard here and from my model and generate the database. I will just specify a location where I can find it if I need to. But I’ll show you as well that we can also open it up interactively inside ER/Studio.

19:03
We’ve got some options that we can generate here. So I’m going to select everything. We’re going to create all our tables. There are different options in that we can specify how we want the DDL to be generated.

19:14
Again, I can choose to select and create the views, the users, and the roles. I’m going to make sure that I’ve got my schema selected there so it is going to create the actual schema for me.

19:37
Okay, so I’m just working through this wizard and as I said, although we’re showing it on Snowflake today, these steps are applicable to other platforms as well.

19:45

SQL Code Generator

From ER/Studio, from a physical model, we can ask it to go ahead and generate the SQL code. I can also be specific if there are just certain tables that I want to include…

19:59

Create Database Schema

What I’m going to do here is just say open up the SQL preview window. That’s going to show me that this is SQL code that’s been generated by ER/Studio Data Architect. It’s got a date and timestamp target platform Snowflake and it’s then got the different codes that are required to actually create that database schema for us.

20:42
I’m going to jump back to my browser where we’ve got Snowflake running here. So this is my back-end snowflake system, again based on the cloud, and I’m going to go in and select the worksheet option.

21:06
Here I’m going to paste in that new set of SQL code. Again, if we go up to the top, we should see it’s telling us it was created by ER/Studio. I’m just going to check here.

21:19
I’m going to say that we’re going to host this in our warehouse, which is called Demo Warehouse.

21:25

So we’re going to do that and I’m going to take a default here, saying it’s allocated to the public schema. Having specified that, I should hope to be good to go.

21:41
So I’m going to say let’s select everything and let’s run that query.

21:47
It’s just going to work its way through the execution of that generated code for us now.

21:57
And it’s now just telling us that it’s executed successfully.

22:00
So now if I go back and look at my databases and we can now see that it’s actually created those different table names for us based on the physical data model itself.

22:21
We’ve executed the code in the Snowflake environment to actually create the tables within there.

Of course, we can drill into these as well on Snowflake, we’ll see that it’s basically got my different column names defined, it’s got the ordering types, etc. all added in there.

22:41

Platform-Agnostic Data Model

That’s quite a kind of powerful capability of ER/Studio in that it’s able to take the specifics of the physical data model and the platform that we have and actually then generate it in terms of the target platform that we’re moving to.

23:00
So, as I said, it’s not 100%, it won’t do a fully automatic migration for you, but it will give you all the basics of the actual schema itself to be able to start working with that.

23:11
What I’m going to show you next is how we can also manage this from ER/Studio in terms of when we make changes and have differences and so on that are being applied to the data model and how we push those forward.

27:01

Conceptual Data Model

ER/Studio does support something called sub-models. They’re sometimes called subject areas. These are really powerful ways of organizing the model. If you want to have different views onto the model and say views, I should say different layers on the model showing different elements, I can do that using sub-models.

Here, for example, I’ve got a different view here called the conceptual data model. I’ve got my main, more detailed training model. If I look up here in my logical data modelling environment, again, I’ve got kind of other options that I can explore in there as well.

27:34
I could go into my conceptual data model where I’ve hidden away a lot of the details because, for example, higher-level business users just want to know that we have these high-level concepts of a training course and an employee.

27:59

Snowflake Entity Relationship Diagram

I’m going to go back to my Snowflake ERD and zoom into it. So I’ve got my objects already existing, and these have all been created for me by that code execution on my snowflake back end. Again, just to keep it simple, what I’m going to do here is I’m just going to add to the physical model.

I’m going to say let’s insert a new table. I can go into the property editor for the table and I could add in some extra columns.

28:50
I’ve now added to my data model a new object called Entity Three. Now, that doesn’t exist in my snowflake back end. That’s a new object that I just created on the data model.

29:03
So how can I go about understanding differences and managing changes within ER/Studio?

We’ve got a really powerful utility or wizard that’s called Compare and Merge just for items like this.

29:21

Compare and Merge Data Models

It’s going to bring up a new set of menus for me and I can compare a number of different things here so I can compare to other models, so I can synchronize between logical and physical models. If I’ve been making changes to the logical, I want to push those into the physical. I can do that. I can connect to my studio team server repository and compare to other models I got stored in that central location. I can also compare things like a SQL file if I extracted that from the database. Or I can also go back and.

29:50
Reconnect to my data source. So what I’m going to do here is go through and I’m going to specify my credentials to connect to Snowflake. I specify the database that I want to connect and work with. There’s a list of my Snowflake databases and this is one that we just created.

30:27
So I’m just going to select that one, bring that over and I can specify things like owners and so on in here as well. For my simple example, I shouldn’t need to do that, so I can click through. Again, more options. We’ll just take the defaults. You can go down to quite fine levels of detail and granularity here.

30:46

Snowflake Data Modeling

And then what it will do is it will give me a list of the columns on Snowflake. This is Snowflake on the right hand side. This is my current data model.

30:57
Already it’s telling me that there’s something called Entity Three that’s not existing. I’m just going to get it to look just for this particular one, but I could compare multiple ones if I want to.

31:13
And what we should then see is it’ll bring up a little summary telling me that I’ve got something in my data model called Entity Three. It’s also got some columns defined in here and they don’t exist currently on the Snowflake platform.

31:27

Create Schema in Snowflake

So what do I want to do? I can select things individually or I can actually say let’s synchronize these. In my case, I’m going to say I’d like to take these changes from the model and come up with a way of merging those into my Snowflake schema.

31:44
Now ER/Studio will not make those changes directly. It’s a data modeling tool. What it will do though is it will generate scripts to make those changes happen and this one’s a very simple script.

32:03
It’s now basically showing me that we’ve got a script generated which is going to create that new table that we added to the data model. It’s got those particular columns in there, so I could then take that script, put it through my usual processes, check it, verify it in my test environments, my development environments, and then once I’m happy with it, then obviously it could be deployed and applied into the snowflake system at the back end as well.

32:30

Compare Data Model

We will generate alter statements so the script that gets generated, where it can basically generate an alter, it will do that. We also generate something called an extended alter, where it’s sometimes with different dependencies to generate the script in different ways. But basically by selecting those changes, by comparing and merging in the data model that allows us to actually apply those.

32:57
Generate the script out from there, generate that change script and then apply it through our usual processes into the system itself.

33:06
There are quite a few questions about how we actually update and compare things like the logical and physical models and the actual database schema.

And again, I’m showing this on Snowflake. That’s the same type of functionality and feature set that works across all of our supported data platforms. It could equally well be an on-premise Oracle or an on-premise SQL server, or a cloud-hosted snowflake system like we’ve just used today.

33:36
So I could then take that and apply it back into my snowflake system.

40:06
Just as a quick recap, in essence, what we’ve shown today is that with ER/Studio we can reverse engineer from an existing database, which could be SQL Server, Oracle or something else, or we could create a new data model by hand. From those data models, we can then generate physical models to correspond to different platforms. That enables me to actually generate the definitions for my objects, my tables, my columns, onto a platform like Snowflake. We will generate the code around that, which you can then take across, implement on that platform, and basically stop migrating.

ER/Studio Logo
Aligning complex data environments with business goals for over 30 years.
Copyright © 2024 Idera, Inc.