Talend Best Practices

1. Reusability

2. Error Handling

3. Performance

4. Logging

 

1. Reusability:

Talend provides many ways to make our code reusable. There are also techniques that we can use to cover some of the things that Talend does not always help with.

  • Jobs – A Job can call another Job by using the tRunJob component.

  • Joblets – A Joblet is a specific component that replaces a group of Job components. It factors out recurring processing or complex transformation steps so that a complex Job is easier to read. Joblets can be reused in different Jobs or several times in the same Job.

  • Context Groups – Context describes the user-defined parameters that are passed to our Job at runtime. Context variables hold values that may change as we promote our Job from Development through Test to Production. Values may also change as our environment changes; for example, passwords may be changed from time to time.

Context Groups allow us to define a group of context variables once and add it to a Job as a group, rather than adding each context variable individually to every Job that needs it. The values of context variables can also be loaded from external sources, such as a file or a database table, at runtime (for example, with the tContextLoad component).

  • Code Routines – Routines are reusable pieces of Java code. Sometimes we want to write our own routines: this lets us write code once and call it many times, from within the same Job or from several Jobs. We may also want to refactor complex code for readability, or use a routine to wrap calls to external libraries that we’ve imported.

  • Metadata – Metadata is defined as “data about data”: it describes the data but isn’t the data itself. In Talend Open Studio, metadata refers to reusable configurations that describe the data, its attributes, or its containers. For example, we could define metadata in the Studio that describes an XML schema, a web service definition, or an FTP connection.

Once the metadata is defined, it can be used across multiple Jobs, and it gives us a single place to update a configuration that many Jobs share. For example, if the password of an FTP account changes and that FTP connection is used in 10 different Jobs, the details would otherwise have to be updated 10 times. If the connection is stored once as metadata, it only needs to be updated in one place.

  • Custom Code – The Custom Code components let us write our own Java code for specific needs quickly and efficiently. Talend provides several such components, including tJava, tJavaRow, tJavaFlex, and tLibraryLoad.
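The following plain-Java sketch illustrates the idea behind Context Groups: one fixed set of variable names whose values come from an external key=value source chosen per environment. It is not Talend's generated code. The `ContextLoader` class, the DEV/PROD environment names, the keys, and the in-memory sources (standing in for per-environment files) are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;

// Hypothetical illustration, not Talend API: one set of context variable names
// whose values come from an external key=value source chosen per environment.
class ContextLoader {

    // In-memory stand-ins for external files such as dev.properties / prod.properties.
    static final Map<String, String> SOURCES = Map.of(
            "DEV", "db_host=localhost\ndb_port=5432\n",
            "PROD", "db_host=prod-db.example.com\ndb_port=5432\n");

    // Parses key=value lines into a Properties object.
    static Properties load(String source) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(source));
        } catch (IOException e) {
            // Properties.load declares IOException even though a StringReader does no real I/O.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the context values for one environment, e.g. "DEV" or "PROD".
    static Properties forEnvironment(String env) {
        String source = SOURCES.get(env);
        if (source == null) {
            throw new IllegalArgumentException("Unknown environment: " + env);
        }
        return load(source);
    }

    public static void main(String[] args) {
        String env = args.length > 0 ? args[0] : "DEV";
        Properties ctx = forEnvironment(env);
        // The Job code reads the same keys in every environment; only the values differ.
        System.out.println(env + " -> " + ctx.getProperty("db_host") + ":" + ctx.getProperty("db_port"));
    }
}
```

In this model, promoting a Job means switching the source of the values, not editing the Job itself.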

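A user routine is essentially a Java class with public static methods that Jobs can call, for example from a tMap expression or a tJava component. The sketch below shows that shape. The `DataCleaning` class and its methods are hypothetical examples, not routines shipped with Talend. In Studio the class would be declared `public` in the `routines` package. It is left package-private here only so that the snippet compiles as a standalone file.

```java
// Hypothetical routine sketch. In Talend Studio a user routine would be written as
// "package routines; public class DataCleaning { ... }" and called from a Job,
// for example as DataCleaning.normalizePhone(row1.phone) in a tMap expression.
class DataCleaning {

    // Keeps only digits, so "(555) 123-4567" and "555.123.4567" compare equal.
    public static String normalizePhone(String raw) {
        if (raw == null) {
            return null;
        }
        return raw.replaceAll("[^0-9]", "");
    }

    // Returns the trimmed value, or the fallback when the value is null or blank.
    public static String defaultIfBlank(String value, String fallback) {
        if (value == null || value.trim().isEmpty()) {
            return fallback;
        }
        return value.trim();
    }

    public static void main(String[] args) {
        System.out.println(normalizePhone("(555) 123-4567"));
        System.out.println(defaultIfBlank("   ", "UNKNOWN"));
    }
}
```

Because the logic lives in one class, a fix to `normalizePhone` applies to every Job that calls it.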
2. Error Handling:

Talend provides several components for handling errors. With these components we can store log information about the execution of each Job.

We can catch exceptions with the tLogCatcher component and record them in a database table or a file, or send them by email.

Sending exception details by email:

If a Job is scheduled as a cron job, we can have an email notification sent immediately whenever the Job throws an exception. Based on that exception we can check the logs and take appropriate action, so there is no need to monitor the cron job continuously. We use this procedure in our current system.

Talend provides a component called tSendMail, which we can use to email the details of each exception caught by tLogCatcher. The exception details include the Job name, the time, the project name, the type, the message, and the name of the component that threw the exception.
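The text that tSendMail sends is typically assembled from the columns of the row produced by tLogCatcher. The sketch below shows one possible way to build a subject and body from those details in plain Java. The `LogEntry` record is a hypothetical stand-in whose fields mirror the details listed above. The message layout and the sample values are only examples.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical stand-in for one row emitted by tLogCatcher; the fields mirror
// the details listed above, but this record is not a Talend class.
record LogEntry(LocalDateTime moment, String project, String job,
                String type, String origin, String message) { }

class ExceptionMail {

    private static final DateTimeFormatter TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Short subject line so failures are easy to spot in an inbox.
    static String subject(LogEntry e) {
        return "[" + e.project() + "] Job " + e.job() + " failed";
    }

    // Multi-line body with the details needed to locate the failure in the logs.
    static String body(LogEntry e) {
        return String.join("\n",
                "Time:      " + TIME.format(e.moment()),
                "Project:   " + e.project(),
                "Job:       " + e.job(),
                "Type:      " + e.type(),
                "Component: " + e.origin(),
                "Message:   " + e.message());
    }

    public static void main(String[] args) {
        // Sample values for illustration only.
        LogEntry e = new LogEntry(LocalDateTime.of(2024, 1, 15, 2, 30, 0),
                "DWH", "load_customers", "Java Exception", "tDBOutput_1",
                "Connection refused");
        System.out.println(subject(e));
        System.out.println(body(e));
    }
}
```

Including the component name in the body lets the reader go straight to the failing step instead of searching the whole Job.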

Talend also provides a feature called AMC (Activity Monitoring Console) in its Enterprise editions, specifically for monitoring Job activity.

Monitoring jobs with Talend Activity Monitoring Console:

Talend Activity Monitoring Console provides detailed monitoring capabilities. We can use them to consolidate the collected log information, understand how components and Jobs interact, prevent faults that could otherwise occur unexpectedly, and support system management decisions. For more information, see the Talend Activity Monitoring Console User Guide attached to this documentation.

References:

https://help.talend.com/display/TalendActivityMonitoringConsoleUserGuide51bEN/2.1+Talend+Activity+Monitoring+Console%27s+GUI
https://help.talend.com/display/TalendActivityMonitoringConsoleUserGuide51bEN/1.2+Accessing+Talend+Activity+Monitoring+Console

 

3. Performance:

Performance (usually meaning speed of execution) is often a key metric of a Job's success or failure. There are many aspects to tuning a Job, including general Job design, memory management, I/O, sharing the load with inputs and outputs (e.g. databases), and parallelism.

We can speed up a Job in the following ways:

  1. Global Connection: we can create a single connection and reuse it throughout the Job by enabling the “Use an existing connection” option in each component that needs it. The same connection can also be shared with different sub jobs.

  2. Global Context: we can define a context group once and read the appropriate context variables from it throughout the Job. The same context can also be passed to sub jobs (for example, with the “Transmit whole context” option of tRunJob).

  3. If a Job is complex, we can divide it into multiple sub jobs to reduce processing time and increase speed.

  4. We can also use the multi-threaded execution feature to improve performance.

For example, whenever we need to run several sub jobs in parallel (loading different files into different tables, extracting data from different tables into different files, and so on), we can enable multi-threaded execution so that the sub jobs run in parallel.
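The Global Connection tip (item 1) comes down to opening a connection once and handing the same object to every component that needs it. Talend's generated code keeps shared objects in a map named `globalMap`, and the sketch below imitates that pattern with a plain `HashMap`. `FakeConnection`, the key `conn_main_db`, and the open counter are hypothetical. They were chosen so that the example runs without a database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "open once, reuse everywhere" pattern.
// FakeConnection stands in for a java.sql.Connection so no database is needed;
// globalMap only imitates the shared map used by Talend-generated code.
class SharedConnectionDemo {

    static int opened = 0; // how many physical connections were created

    static final class FakeConnection {
        final int id;
        FakeConnection(int id) { this.id = id; }
    }

    static final Map<String, Object> globalMap = new HashMap<>();

    // Like a connection component: opens the connection only if none is registered yet.
    static FakeConnection connect(String key) {
        return (FakeConnection) globalMap.computeIfAbsent(key, k -> new FakeConnection(++opened));
    }

    // Like a component with "Use an existing connection" checked: look up, never open.
    static FakeConnection existing(String key) {
        Object conn = globalMap.get(key);
        if (conn == null) {
            throw new IllegalStateException("No connection registered under " + key);
        }
        return (FakeConnection) conn;
    }

    public static void main(String[] args) {
        connect("conn_main_db");
        FakeConnection a = existing("conn_main_db"); // e.g. an input component in one sub job
        FakeConnection b = existing("conn_main_db"); // e.g. an output component in another
        System.out.println("same connection: " + (a == b) + ", opened: " + opened);
    }
}
```

Both lookups return the same object and only one connection is opened. The cost of setting up a connection is then paid once per run instead of once per component.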

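Multi-threaded execution is switched on in the Job settings rather than coded by hand. Its effect resembles submitting independent tasks to a thread pool. The plain-Java sketch below runs hypothetical "load file into table" tasks in parallel with an `ExecutorService`. The file and table names are made up, and each load is simulated with a short sleep.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: independent "sub jobs" submitted to a fixed-size thread pool.
class ParallelLoads {

    // Simulated sub job: pretends to load one file into one table.
    static String loadFile(String file, String table) throws InterruptedException {
        Thread.sleep(100); // stands in for I/O-bound work
        return file + " -> " + table;
    }

    // Starts every load at once, then waits for all of them, keeping input order.
    static List<String> runAll(Map<String, String> fileToTable, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Map.Entry<String, String> e : fileToTable.entrySet()) {
                futures.add(pool.submit(() -> loadFile(e.getKey(), e.getValue())));
            }
            List<String> done = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    done.add(f.get());
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for loads", ex);
                } catch (ExecutionException ex) {
                    // A failed load fails the whole run, much as a failed sub job would.
                    throw new IllegalStateException("Load failed", ex.getCause());
                }
            }
            return done;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, String> loads = new LinkedHashMap<>();
        loads.put("customers.csv", "CUSTOMERS");
        loads.put("orders.csv", "ORDERS");
        loads.put("products.csv", "PRODUCTS");

        long start = System.nanoTime();
        List<String> done = runAll(loads, 3);
        long ms = (System.nanoTime() - start) / 1_000_000;
        // With three threads the 100 ms loads overlap, so the total is usually
        // near 100 ms rather than the ~300 ms a sequential run would take.
        System.out.println(done + " in " + ms + " ms");
    }
}
```

This only pays off when the sub jobs are independent. Two loads that write to the same table, or one that depends on another's output, still need to run in sequence.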
 

 

4. Logging:

Talend also provides logging components that record information about Job executions over time. This helps with performance tuning and troubleshooting.

When a Job is run by cron, we can write its log information to a specific file. When we run it manually, the output is displayed on the console.

Talend provides several components for this purpose, such as tLogRow and tLogCatcher.

tLogRow: This component writes row data to the Job log file or, if we’re running the Job from within Talend Studio, to the console window.
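The sketch below gives a rough picture of tLogRow's basic mode. It prints each row as one line, with the column values joined by a separator (by default a vertical bar, `|`). Because it writes through a `PrintWriter`, the same lines can go to the console when the Job is run manually or to a log file when it runs from cron. The `RowLogger` class is hypothetical. Printing null values as empty strings is a choice of this sketch, not documented tLogRow behavior.

```java
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of tLogRow-style "basic" output: one line per row,
// column values joined by a separator. Nulls become empty strings here.
class RowLogger {

    private final PrintWriter out;
    private final String separator;

    RowLogger(PrintWriter out, String separator) {
        this.out = out;
        this.separator = separator;
    }

    // Formats one row as a single delimited line.
    String format(List<?> row) {
        return row.stream()
                .map(v -> v == null ? "" : String.valueOf(v))
                .collect(Collectors.joining(separator));
    }

    // Writes the formatted row and flushes so lines appear immediately.
    void log(List<?> row) {
        out.println(format(row));
        out.flush();
    }

    public static void main(String[] args) {
        // Console output; wrap a FileWriter in the PrintWriter instead to log to a file.
        RowLogger logger = new RowLogger(new PrintWriter(System.out), "|");
        logger.log(List.of(1, "Alice", "Paris"));
        logger.log(Arrays.asList(2, "Bob", null));
    }
}
```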

tLogCatcher: This component catches log messages and exception details raised by other components in the Job, such as Java exceptions and messages from tDie and tWarn.