Skip to main content

Fuzzy lookup Transformation


The Fuzzy Lookup transformation differs from the Lookup transformation in its use of fuzzy matching. The Lookup transformation uses an equi-join to locate matching records in the reference table. It returns records with at least one matching record, and returns records with no matching records. In contrast, the Fuzzy Lookup transformation uses fuzzy matching to return one or more close matches in the reference table.
Step 1:
Create a fuzzy lookup source and fuzzy lookup reference tables.
create table fuzzyLookupSource
(
      firstName varchar(10),
      LastName varchar(10),
      BirthDate datetime
)
insert into fuzzyLookupSource
select 'Masud','Pervez','02/07/1980' union all
select 'Tamirul','Islam','03/31/1983' union all
select 'Animesh','Chandra','04/09/1980'  union all
select 'Shahriar','Bin Elahi','05/05/1980'  union all
select 'Masud','Rana','04/15/1980' 
GO
 
create table fuzzyLookupReference
(
      firstName varchar(10),
      LastName varchar(10),
      BirthDate datetime
)
insert into fuzzyLookupReference
select 'Masud','Parvez','02/07/1980' union all
select 'Tamirul','Islam','03/31/1983' union all
select 'Animesh','Chandra De','06/05/1980'  union all
select 'Shahriar','Elahi','04/09/1980'

GO




Step 2:
Click Windows button and go to SQL Server 2008 R2 and run as a administrator.

It shows a bellow window.

Step 3:
Go to a views and create a new project (Short cut of New Project is Ctrl + Shift +N).

Step 4:
Give a project name (e.g. IS) and click OK button and then shows a bellow window.
Step 5:
To drag and drop the Data Flow Task
Step 6: 
To edit the Data Flow Task then it open a Data Flow
Step 7:
Drag and drop OLEDB source and edit it.

Step 8:
Give a connection manager and select table as dbo.fuzzyLookupSource.

Step 9:

Drag and drop a Fuzzy Lookup transformation to the Data Flow and right click to edit it.

Step 10:
Select a reference table as  dbo.fuzzyLookupSourceReference.

Step 11:
Go to columns and select which columns we want.

Step 12:
In advance set similarity threshold is 50%

Step 13:
Drag and drop conditional split and edit it.

Step 14:
Conditions
 "Solid Matched"  is "_Similarity > 0.85 &&_Confidence > 0.8"
 "Likely Matched" is "_Similarity > .65 && _Confidence > 0.75"
Step 15:
Drag and drop three derived columns and connection it

Step 16:
Drag and drop union all
Step 17:
Drag and drop data reader destination

Step 18:
Execute a task

Comments

  1. Great article,Thank you for sharing this valuable info with us.
    Keep updating...

    MSBI Online Training

    ReplyDelete
  2. Thank you for your valuable feedback

    ReplyDelete

Post a Comment

Hi User,
Thanks for visiting My Blog and please provide your valuable feedback and subscribe for more updates. Please don't post any spam content or comments.
Thank You

Popular posts from this blog

SSRS INTERVIEW QUESTIONS

Q: What is SSRS? Ø   SSRS or SQL Server Reporting Service is a server-based report generation software systems from Microsoft and is part of Microsoft BI. Ø   It is used for preparing and delivering interactive and variety of reports. Ø   It is administered through an web based interface. Ø   Reporting services utilizes a web service interface for supporting and developing of customized reporting applications. Ø   SSRS lets you create very rich reports (Tabular/Graphical/Interactive) from various datasources with rich data visualization (Charts, Maps, sparklines) Ø   SSRS allows are reports to be exported in various formats (Excel, PDF, word etc) Q: Explain SSRS Architecture? Reporting services architecture comprises of integrated components. It is a multi-tiered, included with application, server and data layers. This architecture is scalable and modular. A single installation can be used across multiple computers. It includes the fo...

Exception deserializing the package "The process cannot access the file because it is being used by another process."

TITLE: Microsoft Visual Studio ------------------------------ Failed to start project ------------------------------ ADDITIONAL INFORMATION: Exception deserializing the package "The process cannot access the file 'E:\SSASCube\HistoricalDataLoad\HistoricalDataLoad\bin\Development\HistoricalDataLoad.ispac' because it is being used by another process.". (Microsoft.DataTransformationServices.VsIntegration) ------------------------------ The process cannot access the file 'E:\SSASCube\HistoricalDataLoad\HistoricalDataLoad\bin\Development\HistoricalDataLoad.ispac' because it is being used by another process. (mscorlib) ------------------------------ BUTTONS: OK ------------------------------ While running SSIS package i got the error “The process cannot access the file ‘*.ispac’ because it is being used by another process”. I tried to close SSDT and run it again but, I still got the same error while compiling. Then, after searching over internet, I got...

Failed to execute the package or element. Build errors were encountered

Error: TITLE: Microsoft Visual Studio ------------------------------ Failed to execute the package or element.   Build errors were encountered. For more information, see the Output window. ------------------------------ BUTTONS: OK ------------------------------   Solution: We tried to close SSDT and run it again but, we still got the same error while running SSIS package. Then, we need to follow bellow solution: Step 1: Go to Task Manager–> Details Tab. Step 2: Locate the process “ DtsDebugHost.exe “. Kill this process. There might be multiple instances of this process. Kill all of them. Step 3: Rerun SSIS package