Irfan's World

January 25, 2011

How to load data from an excel file using SSIS 2008 when target table supports Unicode ?

Filed under: SSIS — Tags: , , , , — Irfan @ 12:47 am

Requirement (1): Load the data from an excel file to a Unicode database SQL Server 2008 database table.  

 
    Let’s first see what we have. We have an excel file in Office 2007 format that has 57 rows to load into a destination table in the Target database using SQL Server 2008 Integration Services.

Source : Excel file in MS Office 2007

No. of rows : 57
Destination : SQL Server 2008 Database

No. of rows after the load : 57

 

This would be an easy task, we can load the data without any issues. The source data file looks like below which has about 7 columns:

Now to load the data using SSIS, you need to install SSIS (SQL Server Integration Services) on you machine. 

    Open BIDS
    Create a Package and give it name and save it
     Create one excel Connection Manager and one OLE DB Connection Manager
    Create a Control Flow in which you will create a Data Flow. (This data flow is shown below)
  

And, the SSIS package developed in BIDS (Business Intelligence Development Studio) to load the data from excel file looks like below, It has a data flow that contains a connection to an excel data source file plus it has an OLE DB connection that connects to our Target table in the Destination database.

The destination table has the following data type’s that stores Unicode characters without any issues. Note to the readers, SSIS 2008 when reads first 8 records from the excel file it decides the data type of that column, Here the excel file string columns are read by SSIS as Unicode strings and our Target table is having Unicode data type’s which supports Unicode characters that can be stored easily. So the data type’s in source and target matches.

When the job is executed in BIDS (Business Intelligence Development Studio) it loads all the data to the destination table successfully.

Unicode ——-> to ——> Unicode (57 rows are loaded and the Job works perfectly, No worries) 🙂

Wait, what happens when we try to load the same excel file data to a non Unicode target table ?

<!–[if get mso 9]> Normal 0 false false false EN-US X-NONE AR-SA <![endif]–><!–[if gte mso 9]> <![endif]–> <!–[endif]–>

Source : Excel file in MS Office 2007 format

 

No. of rows : 57

Destination : SQL Server 2008

 

No. of rows after the load : 57

What is data ? How it is stored in Database ?

Filed under: Data Warehouse — Tags: — Irfan @ 12:29 am

Today, I am going to explain you a little about data. This four letter character is a huge topic that cannot be explained in a single post. I will try to keep it concise. In addition to that, we will see different types of data encoding schemes, their use in businesses. During this era where many companies all building their data warehouses by supporting Unicode or non Unicode formats, this topic has been highly discussed by DBA’s or Data Architect’s to support their projects and is top in their discussion list. In this post, I am leaning more towards Microsoft SQL Server and how it is used.?

Data :

A set of characters belonging to a language plus some special characters from a keyboard.

For example, A character set of English Language may contain letters from A..Z, numbers, punctuations, commas, ampersand, plus, minus, hyphen, backslash, tilde, ENTER, SHIFT and other special characters.

How Character data is stored:

In a computer, characters are represented by different patterns of bits being either ON or OFF.

There are 8 bits in a byte, and the 8 bits can be turned ON and OFF in 256 (28) different patterns. A program that uses 1 byte to store each character can therefore represent up to 256 different characters by assigning a character to each of the bit patterns.

There are 16 bits in 2 bytes, and 16 bits can be turned ON and OFF in 65,536 unique patterns. A program that uses 2 bytes to represent each character can represent up to 65,536 (216) characters.

Single-byte code pages (SBCS) are definitions of the characters mapped to each of the 256 bit patterns possible in a byte. Code pages define bit patterns for uppercase and lowercase characters, digits, symbols, and special characters such as !, @, #, or %. Each European language, such as German or Spanish, has its own single-byte code page. Although the bit patterns used to represent the Latin alphabet characters A through Z are the same for all the code pages, the bit patterns used to represent accented characters such as ‘Ă©’ and ‘á’ vary from one code page to the next.

For example, If data is exchanged between computers running different code pages, all character data must be converted from the code page of the sending computer to the code page of the receiving computer. If the source data has extended characters that are not defined in the code page of the receiving computer, data is lost. When a database serves clients from many different countries/regions, it is difficult to pick a code page for the database that contains all the extended characters required by all the client computers.

The common encoding formats used to store the data are American Standard Code for Information Interchange (ASCII) and IBM’s Extended Binary Coded Decimal Interchange Code (EBCDIC). Mostly these codes were used to represent a common language like English.

The advancement of technological challenges arose to support International Languages for global audience, a committee came with a new type of encoding format i.e. Unicode. There are different types of Unicode formats available such as UTF-8, UTF-16, UTF-32 and similarly UCS-2 used by SQL Server.

In order to support multiple languages around the world, Microsoft came up with their own UCS-2 encoding scheme to store and compare the data multiple language data in the database and it gave rise to a concept called Collation that will unique identify database’s language, sort rules, code page etc..

In brief, I have taken some excerpts about collation from Microsoft website that is shown below:

Collation:

English meaning of Collation from dictionary.com : assembling in proper numerical or logical sequence

A collation encodes the rules governing the proper use of characters for either a language, such as Greek or Polish, or an alphabet.

A SQL Server collation defines how the database engine stores and operates on character and Unicode data.

The physical storage of character strings in Microsoft® SQL Server™ 2000 is controlled by collations. A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared.

Collation Properties:

  • The sort order to use for Unicode data types (nchar, nvarchar, and ntext). A sort order defines the sequence in which characters are sorted (ascending or descending), and the way characters are evaluated in comparison operations. (<, >, =, <=, >=, < > ).
  • The sort order to use for non-Unicode character data types (char, varchar, and text).
  • The code page used to store non-Unicode character data.

SQL Server database collations can be specified at many levels :

  • When you install an instance of SQL Server database, you specify the default collation for that instance.
  • Each time you create a database, you can specify the default collation used for the database. If you do not specify a collation, the default collation for the database is the default collation for the instance.
  • Whenever you define a character column, you can specify its collation. If you do not specify a collation, the column is created with the default collation of the database.

SQL_SortRules[_Pref]_CPCodepage_<ComparisonStyle>

SortRules (Language Character Set) 

A string identifying the alphabet or language whose sorting rules are applied when dictionary sorting is specified. Examples are Latin1_General or Polish.

Pref

Specifies uppercase preference.

Codepage

Specifies a one- to four-digit number that identifies the code page used by the collation. CP1 specifies code page 1252, for all other code pages the complete code page number is specified. For example, CP1251 specifies code page 1251 and CP850 specifies code page 850.

CaseSensitivity

CI specifies case-insensitive, CS specifies case-sensitive.

AccentSensitivity

AI specifies accent-insensitive, AS specifies accent-sensitive.

BIN
Specifies the binary sort order to be used.

Codepages

Codepages are usually defined to support specific languages or groups of languages which share common writing systems. For example, codepage 1253 provides character codes required in the Greek writing system.

The order of the character codes in a codepage allows the system to provide the appropriate character code to an application like ASP.NET or Web Services.

When a new codepage is loaded, a set of different character codes are provided to the application. In Windows, codepages can be changed on-the-fly by the user, without changing the default system language in use. An application can determine which codepages a specific font supports and then can present language options to the user.

January 24, 2011

SQL Server Dates – Part 2 (Quarter)

In this post, I will be showing you how to generate Quarterly dates using getdate() database function.

Quarter Dates

The following code will generate First Day of the Previous Quarter, Last Day of the Previous Quarter, First Day of the Current Quarter, Last Day of the Current Quarter.

select ‘First Day of the Previous Quarter :’ , DATEADD(QQ,DATEDIFF(QQ,0,getdate())-1,0)
select ‘Last Day of the Previous Quarter :’ , DATEADD(QQ,DATEDIFF(QQ,0,getdate()),0)-1
select ‘First Day of the Current Quarter :’ , DATEADD(QQ,DATEDIFF(QQ,0,getdate()),0)
select ‘Last Day of the Current Quarter :’ , DATEADD(QQ,DATEDIFF(QQ,0,getdate())+1,0)-1
select ‘First Day of the Next Quarter :’ , DATEADD(QQ,DATEDIFF(QQ,0,getdate())+1,0)

Wait for my next post for more Quarterly dates. Click for Date Part 3

SQL Server Dates – Part 1 (Year)

To display today’s date in SQL Server 2008 select GETDATE().

Following are some of the dates that can be used in SQL Queries to fulfill the end user needs while retrieving data from the database or creating custom objects for delivering reports in BI tools.

select ‘First Day of the Previous Year :’ , DATEADD(YY,DATEDIFF(YY,0,getdate())-1,0)
select ‘Last Day of the Previous Year :’ , DATEADD(YY,DATEDIFF(YY,0,getdate()),0)-1
select ‘First Day of the Current Year :’ , DATEADD(YY,DATEDIFF(YY,0,getdate()),0)
select ‘Last Day of the Current Year :’ , DATEADD(YY,DATEDIFF(YY,0,getdate())+1,0)-1
select ‘First Day of the Next Year :’ , DATEADD(YY,DATEDIFF(YY,0,getdate())+1,0)

I will be continuing on how to generate Quarterly dates in my next post Quarter Dates

January 23, 2011

SQL Server Dates – Part 5 (Week)

Filed under: SQL Server 2000/2005/2008 — Tags: , , , — Irfan @ 10:28 pm

Here are some of the examples for generating weekly dates that are helpful to create weekly reports.

Week Dates

select ‘First Day of the Previous Week :’ , DATEADD(WW,DATEDIFF(WW,0,getdate())-1,0)

select ‘Last Day of the Previous Week :’ , DATEADD(WW,DATEDIFF(WW,0,getdate()),0)-1
select ‘First Day of the Current Week :’ , DATEADD(WW,DATEDIFF(WW,0,getdate()),0)
select ‘Last Day of the Current Week :’ , DATEADD(WW,DATEDIFF(WW,0,getdate())+1,0)-1
select ‘First Day of the Next Week :’ , DATEADD(WW,DATEDIFF(WW,0,getdate())+1,0)

SQL Server Dates – Part 4 (Month)

Most of the times when we develop Business Intelligence reports for trending, like trend reports, cross tabs or charts, compare analysis ( current month vs. previous month, current rolling 6 months vs. previous rolling 6 months, current rolling 12 months vs. previous rolling 12 months) which are widely used in BI industry.

Following are some date examples of how to generate those dates using GETDATE() function from SQL Server 2008 database that can be beneficial for your report development. These date examples are also very useful for scheduling reports. (i.e. something we call dynamic dates as parameters for scheduling the reports)

Month Dates

select ‘First Day of the Previous Month :’ , DATEADD(MM,DATEDIFF(MM,0,getdate())-1,0)
select ‘Last Day of the Previous Month :’ , DATEADD(MM,DATEDIFF(MM,0,getdate()),0)-1
select ‘First Day of the Current Month :’ , DATEADD(MM,DATEDIFF(MM,0,getdate()),0)
select ‘Last Day of the Current Month :’ , DATEADD(MM,DATEDIFF(MM,0,getdate())+1,0)-1
select ‘First Day of the Next Month :’ , DATEADD(MM,DATEDIFF(MM,0,getdate())+1,0)

Rolling 12 Months

select ‘Current Rolling 12 Months / Begin Date :’ , DATEADD(MM,DATEDIFF(MM,0,getdate())-11,0)
select ‘Current Rolling 12 Months / End Date :’ , DATEADD(MM,DATEDIFF(MM,0,getdate())+1,0)-1
select ‘Previous Rolling 12 Months / Begin Date :’ , DATEADD(MM,DATEDIFF(MM,0,getdate())-23,0)
select ‘Previous Rolling 12 Months / End Date :’ , DATEADD(MM,DATEDIFF(MM,0,getdate())-11,0)-1

Wait for my next post on Weekly dates.

SQL Server Dates – Part 3 (Quarter ..)

Quarterly Dates continued … Following are mostly used Quarterly Dates in Data Warehousing and Business Intelligence reports.

Previous Year Quarters:

select ‘First Day of the Previous Year / First Quarter :’ , DATEADD(Q,0,DATEADD(YY,DATEDIFF(YY,0,getdate())-1,0))
select ‘First Day of the Previous Year / Second Quarter :’ , DATEADD(Q,1,DATEADD(YY,DATEDIFF(YY,0,getdate())-1,0))
select ‘First Day of the Previous Year / Third Quarter :’ , DATEADD(Q,2,DATEADD(YY,DATEDIFF(YY,0,getdate())-1,0))
select ‘First Day of the Previous Year / Fourth Quarter :’ , DATEADD(Q,3,DATEADD(YY,DATEDIFF(YY,0,getdate())-1,0))

Current Year Quarters:

select ‘First Day of the Current Year / First Quarter :’ , DATEADD(Q,0,DATEADD(YY,DATEDIFF(YY,0,getdate()),0))
select ‘First Day of the Current Year / Second Quarter :’ , DATEADD(Q,1,DATEADD(YY,DATEDIFF(YY,0,getdate()),0))
select ‘First Day of the Current Year / Third Quarter :’ , DATEADD(Q,2,DATEADD(YY,DATEDIFF(YY,0,getdate()),0))
select ‘First Day of the Current Year / Fourth Quarter :’ , DATEADD(Q,3,DATEADD(YY,DATEDIFF(YY,0,getdate()),0))
select ‘First Day of the Current Year / Third Quarter :’ , DATEADD(QQ,DATEDIFF(QQ,3,getdate())-1,0)
select ‘Last Day of the Current Year / Second Quarter :’ , DATEADD(QQ,DATEDIFF(QQ,3,getdate())-1,0)-1

Blog at WordPress.com.