Irfan's World

January 30, 2011

How to Load Unicode Data from Excel to SQL Server that supports non Unicode format ?

Filed under: SSIS — Tags: , , — Irfan @ 9:23 am

Requirement (2): Load the data from an excel file to a non Unicode SQL Server 2008 database table. 

In my earlier post  here , we dealt with Unicode to Unicode format. Today, I will show you how to load Unicode data from a excel file to non Unicode table.

We do need both excel(source) and OLE DB (destination) connection managers. When we load Unicode data to non Unicode format database table, it throws the following error in SSIS Business Intelligence Development Studio (BIDS).

Error Message : Cannot convert between Unicode and non Unicode Character string….



To resolve the above error and move forward, we first need to convert the Unicode characters that are coming from excel source file and send it to the target component.

Moreover, SSIS does not even allow you to run the task without solving the error.  How do we resolve this ..? 

There is a special task called "Data Conversion" available in BIDS in Data Flow transformations. We will use this task to convert the data from Unicode to non Unicode format.

First insert the "Data Conversion" task between the source and target, after the insertion our Data Flow looks like this.




Now double click the Data Conversion task and change the data type as shown below.  The following screen shot shows the data type’s Converted from DT_WSTR(Unicode) to DT_STR(non Unicode string) 🙂



Click OK to close the Data conversion transformation editor. Go back to the Data Flow OLE DB Destination and map the columns correctly as shown below in .

We have to use the new columns that are non Unicode coming out of “Data Conversion” task and map to the Destination columns of the target table.



Now, Close the Destination by Clicking OK and execute the task by right clicking and selecting Execute Task from the Context menu. It will execute successfully and loads all the rows in the destination table.
Following screen shot displays the no. of rows read from the source, converted to string and loaded them successfully in the target table.



You can check the data in the target table either from BIDS or SQL Server Management Studio.  Hope I’ve explained clearly.  Thanks for reading.

January 27, 2011

Custom Date Format in Universe Designer for WebI reports

Date is an important attribute in doing business, capturing events etc.   As you know there are lot of different date formats used in different countries.  For example, In USA we use MM/DD/YYYY format, in UK they use DD/MM/YYYY etc.  A request came to me from one of my client to format a date that displays month name, something  similar  as "7th of February 2005"  or "23rd of February 2004" format, as they need to convey some information through a business letter to their customers living around the world.  The client’s reporting system mostly uses Web Intelligence reports.   My first idea was to train the users on how to create variable in the report and use it, Later I have decided that I should create an object in the Universe and came up with the following solution.  It is not that difficult once you understand what the code does to create a Universe object to support this requirement.

We have the following data in the SQL Server 2008 database, which shows the date format from the database.

SQL Code for the Universe Object (SQL Server) :

CASE
  
WHEN DATENAME(DAY, [Date]) IN (1,21,31) THEN
       
DATENAME(DAY, [Date]) + 'st ' + 'of ' +
       
DATENAME(MM, [Date]) + ' ' +
       
DATENAME(YYYY, [Date])
  
WHEN DATENAME(DAY, [Date]) IN (2,22) THEN
       
DATENAME(DAY, [Date]) + 'nd ' + 'of ' +
       
DATENAME(MM, [Date]) + ' ' +
       
DATENAME(YYYY, [Date])
  
WHEN DATENAME(DAY, [Date]) IN (3,23) THEN
       
DATENAME(DAY, [Date]) + 'rd ' + 'of ' +
       
DATENAME(MM, [Date]) + ' ' +
       
DATENAME(YYYY, [Date])
  
ELSE
       
DATENAME(DAY, [Date]) + 'th ' + 'of ' +
       
DATENAME(MM, [Date]) + ' ' +
       
DATENAME(YYYY, [Date])
END

Parse the Universe Object to make sure that the syntax and the code is correct, test the object on few reports before it goes to production.

Let’s test what data is returned by this object.

Hope this tip helps. See you later !!!.

January 25, 2011

How to load data from an excel file using SSIS 2008 when target table supports Unicode ?

Filed under: SSIS — Tags: , , , , — Irfan @ 12:47 am

Requirement (1): Load the data from an excel file to a Unicode database SQL Server 2008 database table.  

 
    Let’s first see what we have. We have an excel file in Office 2007 format that has 57 rows to load into a destination table in the Target database using SQL Server 2008 Integration Services.

Source : Excel file in MS Office 2007

No. of rows : 57
Destination : SQL Server 2008 Database

No. of rows after the load : 57

 

This would be an easy task, we can load the data without any issues. The source data file looks like below which has about 7 columns:

Now to load the data using SSIS, you need to install SSIS (SQL Server Integration Services) on you machine. 

    Open BIDS
    Create a Package and give it name and save it
     Create one excel Connection Manager and one OLE DB Connection Manager
    Create a Control Flow in which you will create a Data Flow. (This data flow is shown below)
  

And, the SSIS package developed in BIDS (Business Intelligence Development Studio) to load the data from excel file looks like below, It has a data flow that contains a connection to an excel data source file plus it has an OLE DB connection that connects to our Target table in the Destination database.

The destination table has the following data type’s that stores Unicode characters without any issues. Note to the readers, SSIS 2008 when reads first 8 records from the excel file it decides the data type of that column, Here the excel file string columns are read by SSIS as Unicode strings and our Target table is having Unicode data type’s which supports Unicode characters that can be stored easily. So the data type’s in source and target matches.

When the job is executed in BIDS (Business Intelligence Development Studio) it loads all the data to the destination table successfully.

Unicode ——-> to ——> Unicode (57 rows are loaded and the Job works perfectly, No worries) 🙂

Wait, what happens when we try to load the same excel file data to a non Unicode target table ?

<!–[if get mso 9]> Normal 0 false false false EN-US X-NONE AR-SA <![endif]–><!–[if gte mso 9]> <![endif]–> <!–[endif]–>

Source : Excel file in MS Office 2007 format

 

No. of rows : 57

Destination : SQL Server 2008

 

No. of rows after the load : 57

What is Unicode and non Unicode data formats?

Filed under: Data Warehouse — Tags: , — Irfan @ 12:33 am

In my earlier post, I have explained about data. In this post, I am going to explain you some basics about Unicode and non Unicode formats:

Unicode :
A Unicode character takes more bytes to store the data in the database. As we all know, many global industries wants to increase their business worldwide and grow at the same time, they would want to widen their business by providing services to the customers worldwide by supporting different languages like Chinese, Japanese, Korean and Arabic. Many websites these days are supporting international languages to do their business and to attract more and more customers and that makes life easier for both the parties.

To store the customer data into the database the database must support a mechanism to store the international characters, storing these characters is not easy, and many database vendors have to revised their strategies and come up with new mechanisms to support or to store these international characters in the database. Some of the big vendors like Oracle, Microsoft, IBM and other database vendors started providing the international character support so that the data can be stored and retrieved accordingly to avoid any hiccups while doing business with the international customers.

The difference in storing character data between Unicode and non-Unicode depends on whether non-Unicode data is stored by using double-byte character sets. All non-East Asian languages and the Thai language store non-Unicode characters in single bytes. Therefore, storing these languages as Unicode uses two times the space that is used specifying a non-Unicode code page. On the other hand, the non-Unicode code pages of many other Asian languages specify character storage in double-byte character sets (DBCS). Therefore, for these languages, there is almost no difference in storage between non-Unicode and Unicode.

Encoding Formats:
Some of the common encoding formats for Unicode are UCS-2, UTF-8, UTF-16, UTF-32 have been made available by database vendors to their customers. For SQL Server 7.0 and higher versions Microsoft uses the encoding format UCS-2 to store the UTF-8 data. Under this mechanism, all Unicode characters are stored by using 2 bytes.

Unicode data can be encoded in many different ways. UCS-2 and UTF-8 are two common ways to store bit patterns that represent Unicode characters. Microsoft Windows NT, SQL Server, Java, COM, and the SQL Server ODBC driver and OLEDB provider all internally represent Unicode data as UCS-2.

The options for using SQL Server 7.0 or SQL Server 2000 as a backend server for an application that sends and receives Unicode data that is encoded as UTF-8 include:

For example, if your business is using a website supporting ASP pages, then this is what happens:

If your application uses Active Server Pages (ASP) and you are using Internet Information Server (IIS) 5.0 and Microsoft Windows 2000, you can add "<% Session.Codepage=65001 %>" to your server-side ASP script. This instructs IIS to convert all dynamically generated strings (example: Response.Write) from UCS-2 to UTF-8 automatically before sending them to the client.

If you do not want to enable sessions, you can alternatively use the server-side directive "<%@ CodePage=65001 %>".

Any UTF-8 data sent from the client to the server via GET or POST is also converted to UCS-2 automatically. The Session.Codepage property is the recommended method to handle UTF-8 data within a web application. This Codepage setting is not available on IIS 4.0 and Windows NT 4.0.

Sorting and other operations :

The effect of Unicode data on performance is complicated by a variety of factors that include the following:

1. The difference between Unicode sorting rules and non-Unicode sorting rules
2. The difference between sorting double-byte and single-byte characters
3. Code page conversion between client and server

Performing operations like >, <, ORDER BY are resource intensive and will be difficult to get correct results if the codepage conversion between client and server is not available.

Sorting lots of Unicode data can be slower than non-Unicode data, because the data is stored in double bytes. On the other hand, sorting Asian characters in Unicode is faster than sorting Asian DBCS data in a specific code page, because DBCS data is actually a mixture of single-byte and double-byte widths, while Unicode characters are fixed-width.

Non-Unicode :

Non Unicode is exactly opposite to Unicode. Using non Unicode it is easy to store languages like ‘English’ but not other Asian languages that need more bits to store correctly otherwise truncation will occur.

Now, let’s see some of the advantages of not storing the data in Unicode format:

1. It takes less space to store the data in the database hence we will save lot of hard disk space.
2. Moving of database files from one server to other takes less time.
3. Backup and restore of the database makes huge impact and it is good for DBA’s that it takes less time

Non-Unicode vs. Unicode Data Types: Comparison Chart

The primary difference between unicode and non-Unicode data types is the ability of Unicode to easily handle the storage of foreign language characters which also requires more storage space.

Non-Unicode Unicode
(char, varchar, text) (nchar, nvarchar, ntext)
Stores data in fixed or variable length Same as non-Unicode
char: data is padded with blanks to fill the field size. For example, if a char(10) field contains 5 characters the system will pad it with 5 blanks nchar: same as char
varchar: stores actual value and does not pad with blanks nvarchar: same as varchar
requires 1 byte of storage requires 2 bytes of storage
char and varchar: can store up to 8000 characters nchar and nvarchar: can store up to 4000 characters
Best suited for US English: "One problem with data types that use 1 byte to encode each character is that the data type can only represent 256 different characters. This forces multiple encoding specifications (or code pages) for different alphabets such as European alphabets, which are relatively small. It is also impossible to handle systems such as the Japanese Kanji or Korean Hangul alphabets that have thousands of characters."1 Best suited for systems that need to support at least one foreign language: "The Unicode specification defines a single encoding scheme for most characters widely used in businesses around the world. All computers consistently translate the bit patterns in Unicode data into characters using the single Unicode specification. This ensures that the same bit pattern is always converted to the same character on all computers. Data can be freely transferred from one database or computer to another without concern that the receiving system will translate the bit patterns into characters incorrectly.

I think I should stop here, I will go on writing otherwise.

What is data ? How it is stored in Database ?

Filed under: Data Warehouse — Tags: — Irfan @ 12:29 am

Today, I am going to explain you a little about data. This four letter character is a huge topic that cannot be explained in a single post. I will try to keep it concise. In addition to that, we will see different types of data encoding schemes, their use in businesses. During this era where many companies all building their data warehouses by supporting Unicode or non Unicode formats, this topic has been highly discussed by DBA’s or Data Architect’s to support their projects and is top in their discussion list. In this post, I am leaning more towards Microsoft SQL Server and how it is used.?

Data :

A set of characters belonging to a language plus some special characters from a keyboard.

For example, A character set of English Language may contain letters from A..Z, numbers, punctuations, commas, ampersand, plus, minus, hyphen, backslash, tilde, ENTER, SHIFT and other special characters.

How Character data is stored:

In a computer, characters are represented by different patterns of bits being either ON or OFF.

There are 8 bits in a byte, and the 8 bits can be turned ON and OFF in 256 (28) different patterns. A program that uses 1 byte to store each character can therefore represent up to 256 different characters by assigning a character to each of the bit patterns.

There are 16 bits in 2 bytes, and 16 bits can be turned ON and OFF in 65,536 unique patterns. A program that uses 2 bytes to represent each character can represent up to 65,536 (216) characters.

Single-byte code pages (SBCS) are definitions of the characters mapped to each of the 256 bit patterns possible in a byte. Code pages define bit patterns for uppercase and lowercase characters, digits, symbols, and special characters such as !, @, #, or %. Each European language, such as German or Spanish, has its own single-byte code page. Although the bit patterns used to represent the Latin alphabet characters A through Z are the same for all the code pages, the bit patterns used to represent accented characters such as ‘é’ and ‘á’ vary from one code page to the next.

For example, If data is exchanged between computers running different code pages, all character data must be converted from the code page of the sending computer to the code page of the receiving computer. If the source data has extended characters that are not defined in the code page of the receiving computer, data is lost. When a database serves clients from many different countries/regions, it is difficult to pick a code page for the database that contains all the extended characters required by all the client computers.

The common encoding formats used to store the data are American Standard Code for Information Interchange (ASCII) and IBM’s Extended Binary Coded Decimal Interchange Code (EBCDIC). Mostly these codes were used to represent a common language like English.

The advancement of technological challenges arose to support International Languages for global audience, a committee came with a new type of encoding format i.e. Unicode. There are different types of Unicode formats available such as UTF-8, UTF-16, UTF-32 and similarly UCS-2 used by SQL Server.

In order to support multiple languages around the world, Microsoft came up with their own UCS-2 encoding scheme to store and compare the data multiple language data in the database and it gave rise to a concept called Collation that will unique identify database’s language, sort rules, code page etc..

In brief, I have taken some excerpts about collation from Microsoft website that is shown below:

Collation:

English meaning of Collation from dictionary.com : assembling in proper numerical or logical sequence

A collation encodes the rules governing the proper use of characters for either a language, such as Greek or Polish, or an alphabet.

A SQL Server collation defines how the database engine stores and operates on character and Unicode data.

The physical storage of character strings in Microsoft® SQL Server™ 2000 is controlled by collations. A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared.

Collation Properties:

  • The sort order to use for Unicode data types (nchar, nvarchar, and ntext). A sort order defines the sequence in which characters are sorted (ascending or descending), and the way characters are evaluated in comparison operations. (<, >, =, <=, >=, < > ).
  • The sort order to use for non-Unicode character data types (char, varchar, and text).
  • The code page used to store non-Unicode character data.

SQL Server database collations can be specified at many levels :

  • When you install an instance of SQL Server database, you specify the default collation for that instance.
  • Each time you create a database, you can specify the default collation used for the database. If you do not specify a collation, the default collation for the database is the default collation for the instance.
  • Whenever you define a character column, you can specify its collation. If you do not specify a collation, the column is created with the default collation of the database.

SQL_SortRules[_Pref]_CPCodepage_<ComparisonStyle>

SortRules (Language Character Set) 

A string identifying the alphabet or language whose sorting rules are applied when dictionary sorting is specified. Examples are Latin1_General or Polish.

Pref

Specifies uppercase preference.

Codepage

Specifies a one- to four-digit number that identifies the code page used by the collation. CP1 specifies code page 1252, for all other code pages the complete code page number is specified. For example, CP1251 specifies code page 1251 and CP850 specifies code page 850.

CaseSensitivity

CI specifies case-insensitive, CS specifies case-sensitive.

AccentSensitivity

AI specifies accent-insensitive, AS specifies accent-sensitive.

BIN
Specifies the binary sort order to be used.

Codepages

Codepages are usually defined to support specific languages or groups of languages which share common writing systems. For example, codepage 1253 provides character codes required in the Greek writing system.

The order of the character codes in a codepage allows the system to provide the appropriate character code to an application like ASP.NET or Web Services.

When a new codepage is loaded, a set of different character codes are provided to the application. In Windows, codepages can be changed on-the-fly by the user, without changing the default system language in use. An application can determine which codepages a specific font supports and then can present language options to the user.

January 24, 2011

ORA – 12514 TNS: listener does not currently know of service requested in connect descriptor

Filed under: Oracle 9i/10g/11g — Tags: , — Irfan @ 10:38 pm

When you connect using SQL Developer or any other third party tools that uses JDBC thin client would return this error, but remember it will not give you this error when you connect using SQL*PLUS.

    Error ORA - 12514 : TNS listener does not currently know of service requested in connect descriptor

As the error message suggests, the database server did not register itself with the database listener.  To troubleshoot this connectivity issue, we will start with looking at TNSNAMES.ORA and LISTENER.ORA files on the client machine from which you want to connect. In other words, I would say there is something like SERVICE NAME or HOST name is incorrect in the listerner.ora or tnsnames.ora file on the client machine.

Note : If you receive this error first contact your DBA to resolve it first.

The files that we are discussing here are Oracle systems files, take the backup of these files before you change any settings.  These files contain information regarding the HOST name or IP address of Oracle database server(s) in your network.  When a client machine tries to connect to the server using oracle client middle ware then you have to make sure that the client machine has correct information of the database server (HOST name or IP Address).  I repeat HOST or IP Address.

Follow these two steps to resolve this issue:

First of all, make sure that both of these files are in sync, you must use only one of the below options, it means either

   1.     Use HOST name in both the files (TNSNAMES.ORA and LISTENER.ORA)

       or

          Use IP Address of the Database Server in both the files (TNSNAMES.ORA and LISTENER.ORA)

   2. Register the database with the listener.



Step 1:

The following screen shots shows that I am using the HOST name as ‘irfan-PC’. Here is a sample of tnsnames.ora file from my client machine.

Here is a sample of listener.ora file from my client machine in that you will see the HOST name of the database server. 

Step 2:

Now connect to SQL*PLUS as DBA and issue the following command:

ALTER SYSTEM REGISTER

This command will register the database with the listener.

Now connect SQL Developer to oracle database :

Following screen shot shows some mandatory details when you connect using SQL Developer that will ensure to connect to an oracle database using SQL Developer.

Click ‘Test’ to check the connection and it should connect without issues. Hope you saved some time by resolving this error :).

Let me know if there are any other alternatives to resolve this error.

SQL Server Dates – Part 2 (Quarter)

In this post, I will be showing you how to generate Quarterly dates using getdate() database function.

Quarter Dates

The following code will generate First Day of the Previous Quarter, Last Day of the Previous Quarter, First Day of the Current Quarter, Last Day of the Current Quarter.

select ‘First Day of the Previous Quarter :’ , DATEADD(QQ,DATEDIFF(QQ,0,getdate())-1,0)
select ‘Last Day of the Previous Quarter :’ , DATEADD(QQ,DATEDIFF(QQ,0,getdate()),0)-1
select ‘First Day of the Current Quarter :’ , DATEADD(QQ,DATEDIFF(QQ,0,getdate()),0)
select ‘Last Day of the Current Quarter :’ , DATEADD(QQ,DATEDIFF(QQ,0,getdate())+1,0)-1
select ‘First Day of the Next Quarter :’ , DATEADD(QQ,DATEDIFF(QQ,0,getdate())+1,0)

Wait for my next post for more Quarterly dates. Click for Date Part 3

SQL Server Dates – Part 1 (Year)

To display today’s date in SQL Server 2008 select GETDATE().

Following are some of the dates that can be used in SQL Queries to fulfill the end user needs while retrieving data from the database or creating custom objects for delivering reports in BI tools.

select ‘First Day of the Previous Year :’ , DATEADD(YY,DATEDIFF(YY,0,getdate())-1,0)
select ‘Last Day of the Previous Year :’ , DATEADD(YY,DATEDIFF(YY,0,getdate()),0)-1
select ‘First Day of the Current Year :’ , DATEADD(YY,DATEDIFF(YY,0,getdate()),0)
select ‘Last Day of the Current Year :’ , DATEADD(YY,DATEDIFF(YY,0,getdate())+1,0)-1
select ‘First Day of the Next Year :’ , DATEADD(YY,DATEDIFF(YY,0,getdate())+1,0)

I will be continuing on how to generate Quarterly dates in my next post Quarter Dates

January 23, 2011

SQL Server Dates – Part 6 (Year, MI, SS)

Filed under: SQL Server 2000/2005/2008 — Tags: , , , , — Irfan @ 10:34 pm

Today, I am going to show you how to add hours, minutes or seconds to a datetime field GETDATE() function from SQL Server 2008 always returns Current Date and Time.  I am referencing Current Date and Time as "now" in this article.

To display the current date and time, type the following query in SQL Server editor:

SELECT GETDATE() as ‘Current Date and Time’

Hours

Add 4 hours from now.
SELECT DATEADD(HOUR,4, GETDATE()) as ‘Add Four hours from now’

Subtract 6 hours from now.
SELECT DATEADD(HOUR,-6, GETDATE()) as ‘Subtract 6 hours from now’

Minutes

Add 10 minutes to the current date and time
SELECT DATEADD(MI,10, GETDATE()) as ‘Add 10 min to now’

Subtract 60 minutes from the current date and time
SELECT DATEADD(MI,-60, GETDATE()) as ‘Subtract 60 min from now’

Seconds

Add 50 sec to the current date and time
SELECT DATEADD(SS,50, GETDATE()) as ‘Add 50 sec to now’

Subtract 60 minutes from the current date and time
SELECT DATEADD(SS,-40, GETDATE()) as ‘Subtract 40 sec to now’

Following are the results when we run the above queries in SQL Server Management Studio.

SQL Server Dates – Part 5 (Week)

Filed under: SQL Server 2000/2005/2008 — Tags: , , , — Irfan @ 10:28 pm

Here are some of the examples for generating weekly dates that are helpful to create weekly reports.

Week Dates

select ‘First Day of the Previous Week :’ , DATEADD(WW,DATEDIFF(WW,0,getdate())-1,0)

select ‘Last Day of the Previous Week :’ , DATEADD(WW,DATEDIFF(WW,0,getdate()),0)-1
select ‘First Day of the Current Week :’ , DATEADD(WW,DATEDIFF(WW,0,getdate()),0)
select ‘Last Day of the Current Week :’ , DATEADD(WW,DATEDIFF(WW,0,getdate())+1,0)-1
select ‘First Day of the Next Week :’ , DATEADD(WW,DATEDIFF(WW,0,getdate())+1,0)

Older Posts »

Create a free website or blog at WordPress.com.