Thursday, August 21, 2014

Sorter Transformation

Sorter Transformation:

  • Active and Connected. Active as when we use distinct option the number of rows might reduce.
  • We can use sorter to sort data in ascending or descending order according to a specified sort key.
  • You can sort data from relational or flat file sources. We can also use it to pass data to aggregator transformation configured for sorted input.
  • The Sorter transformation contains only input/output ports.
  • Sort key could be single or multiple columns. When multiple columns are used the order the ports appear in the Ports tab determines the
    succession of sort operations.


  • Sorter transformation properties:
  1. Sorter Cache Size: The Integration Service uses the Sorter Cache Size property to determine the maximum amount of memory it can allocate to perform the sort operation. The Integration Service passes all incoming data into the Sorter transformation before it performs the sort operation. You can configure a numeric value for the Sorter cache, or you can configure the Integration Service to determine the cache size at run time. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.
     
If the Integration Service runs a partitioned session, it allocates the specified amount of Sorter cache memory for each partition.

If it cannot allocate enough memory, the Integration Service fails the session.

Sorter cache size is set to 16,777,216 bytes by default.
  1. Case Sensitive: When you enable the Case Sensitive property, the Integration Service sorts uppercase characters higher than lowercase characters.
  2. Work Directory: You must specify a work directory the Integration Service uses to create temporary files while it sorts data. After the Integration Service sorts the data, it deletes the temporary files. By default, the Integration Service uses the value specified for the $PMTempDir process variable.
  1. Distinct Output Rows: You can configure the Sorter transformation to treat output rows as distinct. You can configure the Sorter transformation to treat output rows as distinct.
  1. Tracing Level: Configure the Sorter transformation tracing level to control the number and type of Sorter error and status messages the Integration Service writes to the session log.
  1. Null Treated Low: Enable this property if you want the Integration Service to treat null values as lower than any other value when it performs the sort operation. Disable this option if you want the Integration Service to treat null values as higher than any other value.
  1. Transformation Scope:

♦ Transaction . Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions.

♦ All Input. Applies the transformation logic on all incoming data. When you choose All Input, the PowerCenter drops incoming transaction boundaries. Choose All Input when a row of data depends on all rows in the source.

Aggregator Transformation

                                                     Informatica Transformations

Aggregator Transformation  
  • Active and Connected.
  • The Aggregator transformation performs aggregate calculations, such as averages and sums.
  • The Aggregator transformation makes use of aggregator cache to perform aggregations.
  • What is Difference between Aggregator and Expression Transformation: Aggregator transformation to perform calculations on groups. The Expression transformation permits we to perform calculations on a row-by-row basis only.
  • What is incremental aggregation option:
After we have used the aggregator transformation in our mapping we can can enable the session option, Incremental Aggregation. Note that this is done using session properties option. This means this can be done only through session and not When the Integration Service performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally.
Explain Components of aggregator transformation:

The Aggregator transformation has the following components and options:
  • Aggregate cache. The Integration Service stores data in the aggregate cache until it completes aggregate calculations. It stores group values in an index cache (when we are using group by) and row data in the data cache.
  • Aggregate expression. Enter an expression in an output port. Note that we need to create a new output port and then enter the expression. We cant do that on input-output port.The expression can include nont-aggregate expressions and conditional clauses.
  • Group by port. Indicate how to create groups. The port can be any input, input/output, output, or variable port. When grouping data, the Aggregator transformation outputs the last row of each group unless otherwise specified.
  • Sorted input. Select this option to improve session performance. To use sorted input, we must pass data to the Aggregator transformation sorted by group by port, in ascending or descending order.
 Explain Aggregator Transformation properties:


Property
Description.
Cache Directory
Local directory where the Integration Service creates the index and data cache files. By default: $PMCacheDir.
If we have enabled incremental aggregation, the Integration Service creates a backup of the files each time we run the session. The cache directory must contain enough disk space for two sets of the files.
Tracing Level
Amount of detail displayed in the session log for this transformation.
Sorted Input
Indicates input data is presorted by groups. Select this option only if the mapping passes sorted data to the Aggregator transformation.
Aggregator Data Cache Size
Data cache size for the transformation. Default cache size is 2,000,000 bytes.
we can configure the Integration Service to determine the cache size at run time, or we can configure a numeric value. If we configure the Integration Service to determine the cache size, we can also configure a maximum amount of memory for the Integration Service to allocate to the cache.
Aggregator Index Cache Size:
Index cache size for the transformation. Default cache size is 1,000,000 bytes.
we can configure the Integration Service to determine the cache size at run time, or we can configure a numeric value. If we configure the Integration Service to determine the cache size, we can also configure a maximum amount of memory for the Integration Service to allocate to the cache.
Transformation Scope
Specifies how the Integration Service applies the transformation logic.
Transaction: Applies the transformation logic to all rows in a transaction. Choose
Transaction when a row of data depends on all rows in the same transaction, but
does not depend on rows in other transactions. Traansaction commit points can be created using Transaction control transformation.

All Input: Applies the transformation logic on all incoming data. When we choose All Input, the PowerCenter drops incoming transaction boundaries. Choose All Input when a row of data depends on all rows in the source.
This is what it generally should be.

All the above properties can be further overridden at the session level:
  • How does aggregator use Aggregator cache:
  1. The Integration Service creates index and data caches in memory to process the transformation. If the Integration Service requires more space, it stores overflow values in cache files. Note that only if more space is required cache files are created.
  2. We can configure the index and data caches in the Aggregator transformation or in the session properties.
  3. Why no cache required for Sorted inputs:
The Integration Service uses memory to process an Aggregator transformation with sorted ports. It does not use cache memory. We do not need to configure cache memory for Aggregator transformations that use sorted ports
  • Explain Aggregate expressions:
  1. The Designer allows aggregate expressions only in the Aggregator transformation. An aggregate expression can include conditional clauses and non-aggregate functions.
  2. It can also include one aggregate function nested within another aggregate function, such as: MAX( COUNT( ITEM ))
  • What Aggregate functions can be used:
  1. Following functions can be used with aggregator:
AVG
COUNT
FIRST
LAST
MAX
MEDIAN
MIN
PERCENTILE
STDDEV
SUM
VARIANCE
  1. We can nest one aggregate function within another aggregate function.
  1. We can include multiple single-level or multiple nested functions in different output ports in an Aggregator transformation. However, we cannot include both single-level and nested functions in an Aggregator transformation. Therefore, if an Aggregator transformation contains a single-level function in any output port, we cannot use a nested function in any other port in that transformation. When we include single-level and nested functions in the same Aggregator transformation, the Designer marks the mapping or mapplet invalid. If we need to create both single-level and nested functions, we should create separate Aggregator transformations.
  1. We can Use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.

Ex: SUM( COMMISSION, COMMISSION > QUOTA )

  1. We can also use non-aggregate functions in the aggregate expression.
  1. When we configure the Integration Service, we can choose how we want the Integration Service to handle null values in aggregate functions. we can choose to treat null values in aggregate functions as NULL or zero. By default, the Integration Service treats null values as NULL in aggregate functions.
Define a default value for each port in the group to replace null input values. This allows the Integration Service to include null item groups in the aggregation. 
  • Explain how to use Group By Ports:
  1. The Aggregator transformation lets we define groups for aggregations, rather than performing the aggregation across all input data.
  2. To define a group for the aggregate expression, select the appropriate input, input/output, output, and variable ports in the Aggregator transformation. we can select multiple group by ports to create a new group for each unique combination.
  1. When we group values, the Integration Service produces one row for each group. If we do not group values, the Integration Service returns one row for all input rows. The Integration Service typically returns the last row of each group (or the last row received) with the result of the aggregation. (unless we some function like FIRST).
  1. When selecting multiple group by ports in the Aggregator transformation, the Integration Service uses port or der to determine the order by which it groups.
  1. Use non-aggregate expressions in group by ports to modify or replace groups. For example, if we want to replace ‘AAA battery’ before grouping, we can create a new group by output port, named CORRECTED_ITEM, using the following expression:
IIF( ITEM = 'AAA battery', battery, ITEM )
  • Explain how to Use Sorted Input:
  1. We can improve Aggregator transformation performance by using the sorted input option. When we use sorted input, the Integration Service assumes all data is sorted by group and it performs aggregate calculations as it reads rows for a group.
  2. Sorted input uses only memory and no cache is involved and hence it is beneficial as far as performance is concerned.
  3. This prevent aggregator to wait for entire data to be first read.
When we do not use sorted input, the Integration Service performs aggregate calculations as it reads. Since the data is not sorted, the Integration Service stores data for each group until it reads the entire source to ensure all aggregate calculations are accurate.
  1. If we use sorted input and do not presort data correctly, we receive unexpected results.
  2. Do not use sorted input if either of the following conditions are true:
The aggregate expression uses nested aggregate functions.
The session uses incremental aggregation. Very imp to note that when using incremental aggregation we cannot use sorted input.
  1. Data must be sorted in the following ways:
By the Aggregator group by ports, in the order they appear in the Aggregator transformation.
Using the same sort order configured for the session.
  1. For relational and file sources, use the Sorter transformation to sort data in the mapping before passing it to the Aggregator transformation.
If the session uses relational sources, we can also use the Number of Sorted Ports option in the Source Qualifier transformation to sort group by columns in the source database. Group by columns must be in the same order in both the Aggregator and Source Qualifier transformations.

Note: We cannot add expressions to existing input-output ports. So we need to create new output or variable ports.

Tips:
Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
 

Wednesday, August 20, 2014

About Me

Hi everyone this is Ranjith, An enthusiastic of Oracle Business Intelligence, have been 
working in the field of BI/DW in different roles. Inspired with many others and have decided to start my own blog. All the views expressed here are my own and does not reflect the views of my employers/clients who i work for. Thanks a lot for visiting this blog, your comments are most welcome.