Defining Recipes (2.9)
This page discusses how to create recipes. For related reference information, see Activity Type Reference.
Creating a Recipe
To define a recipe:
-
Click the Recipes icon in the application menu.
-
Click New Recipe.
-
For Recipe Name, enter a descriptive name to display in the Recipes Dashboard. This name cannot be changed later.
-
For Recipe Short Name, enter a short name to be used within the names of the generated tables used by this recipe. This name cannot be changed later.
-
For Recipe Group, optionally select the name of a recipe group.
You can move this recipe later to another group or remove it from the group.
-
Click Submit.
The system then displays a page where you can add activities to the recipe.
Adding a Staging Activity
A staging activity populates one or more staging tables. To add a staging activity to a recipe:
-
Display the recipe:
-
Click the Recipes icon in the application menu.
-
If the recipe is in a group, expand that group.
-
Click the recipe name.
The recipe is then displayed on the right side of the page.
-
-
Click Add Staging Activity.
-
For Name, enter a descriptive name to display within the recipe.
-
For Short Name, enter a short name to be used within the names of the generated tables used by this recipe. This name cannot be changed later.
If you are going to use only one data source in this recipe, you could use a generic short name like Load, but if you are going to use multiple data sources, it may be helpful for the short names to indicate the data source.
-
For Data Source, select a data source.
This selection cannot be changed later.
-
Click Submit.
The system then displays a page where you can edit the details of how to use this data source. This page lists all the schemas that belong to this data source, and each schema represents a table, an specific API call, or a specific format of file. You can load data from any number of these items, within this staging activity.
-
For each schema that you want to include in this staging activity:
-
Click the schema name in the left. The right side of the page then displays information about the schema, including (if applicable) information about the last time data was loaded from the associated data source.
-
Specify how to use this data schema. First, optionally select the following check boxes:
-
Stage on Target Table—Enables you to copy data to the table name of your choice. The table is generated for you. If you select this, you are prompted for a full table name.
-
Use SQLQuickload—Enables the recipe to perform a bulk load for this step. If you select this, also specify the following options:
-
Action On Dropped Records—Specifies the action to take when the bulk load encounters more than the specified number of dropped records. The Use SQLQuickload option drops a record if that record cannot be inserted, which can occur if the record is invalid in some way (such as not having a required field or having a value in an incorrect format). For Action On Dropped Records, select one of the following, to specify how the system should behave if the maximum number of dropped records is reached:
-
Abort—The recipe is automatically aborted, without any workflow intervention.
-
WorkflowStrict—The recipe is stopped and a workflow task is generated. The workflow task provides the options Abort and Retry.
-
Workflow—The recipe is stopped and a workflow task is generated. The workflow task provides the options Abort, Retry, and Ignore and Proceed.
-
Proceed—The recipe continues.
-
-
Max Dropped Records—Specifies the maximum number of dropped records before the specified action is triggered.
-
-
-
Choose the fields to load. To do so, either click the check box at the top of the column of check boxes or click the check box for each field of interest.
-
-
When you are temporarily done with this activity, click Save Draft. Or if you are done defining the activity, click Publish Activity.
When you save the draft for the first time, the Staging Table field is updated to show the name of the staging table. The table is not generated until you publish the activity. If you want to add additional activities that use the staging table, it is necessary to publish the staging activity.
-
Click the Recipe link in the upper left to return to the recipe.
Adding a Transformation Activity
A transformation activity transforms data in a staging table. To add a transformation activity to a recipe:
-
Display the recipe.
-
Make sure that you have published the staging activities for the staging tables you want to transform.
-
Click Add Transformation Activity.
-
For Name, enter a descriptive name to display within the recipe.
-
For Target Source, select a schema; the list includes all schemas used in published staging activities within the same recipe.
The system then displays the fields in the associated staging table, with options to specify transformations for any of the fields.
-
For each field where you want to apply a transformation, select the field name.
The system then displays options on the right side of the page. Make edits described below and then click Apply.
-
Function Type—See Transformation Functions.
-
Overwrite—Select this if you want the new value to replace the old value. In this case, the system will not generate a new field in the staging table.
Otherwise, leave Overwrite cleared and specify a new field name (see below).
-
Transformation Name—Specify a short name for this field transformation.
-
Result Field Name—Specify a name for the new field that will store this transformed value. In this case, the system will automatically add a field to the staging table; the actual name of the new field is %T_newname where newname is the name you specified.
-
Result Field Type—Select the type for the new result field.
-
Other options—See Transformation Functions.
-
-
When you are done editing this activity, click Save Draft. Or click Publish Activity so that the recipe will use it.
-
Click the Recipe link in the upper left to return to the recipe.
Adding a Validation Activity
A validation activity validates data in a staging table and generates a validation report; the report will flag any rows that failed the validation rules that you specify. For each validation rule, you specify whether to halt the recipe or to simply issue a warning. In the case of any validation error, the system generates a workflow task for the assigned role (specified when the recipe is scheduled).
To add a validation activity to a recipe:
-
Display the recipe.
-
Click Add Validation Activity.
-
For Name, enter a descriptive name to display within the recipe.
-
For Target Source, select the schema. The list includes all schemas used in published staging activities within the same recipe.
The system then displays the fields in the staging table, with options to specify validation rules for any of the fields. Note that any field can have at most one validation rule, within a given validation activity.
-
For each field where you want to apply a validation rule, select the field name.
The system then displays options on the right side of the page. Make edits described below and then click Apply.
-
Function Type—See Validation Functions.
-
Validation Name—Specify a short name for this field transformation.
-
Result Field Name—Specify a name for the report field that will store either 1 (if the field value passes the validation rule) or 0 (if it does not). It is useful for the name to indicate the field being examined, for example CheckItemCount.
-
Validation Failure—Specify how the system should respond when it encounters a field value that fails this specific rule. A Fatal failure halts the recipe. A Warning failure logs a warning. Both kinds of failures are visible in the Workflow Inbox.
-
Other options—See Validation Functions.
-
-
When you are done editing this activity, click Save Draft. Or click Publish Activity so that the recipe will use it.
-
Click the Recipe link in the upper left to return to the recipe.
Adding a Reconciliation Activity
A reconciliation activity compares data in a staging table to comparison data, which can be either data that you have loaded (and if needed transformed and validated) within the same recipe or data from another data source. This activity generates a reconciliation report.
To add a reconciliation activity to a recipe:
-
Display the recipe.
-
Click Add Reconciliation Activity.
-
For Reconciliation Activity Name, enter a descriptive name to display within the recipe.
-
In the Primary Source section, specify the table that you ultimately want to promote and use downstream. To do so:
-
For Data Source, select the location of the table that you want to use. Typically this is Staging Tables, because the most common use case is to reconcile data you have loaded within a recipe.
-
For Tables, select the table.
-
-
In the Secondary Source section, specify the comparison table. To do so:
-
For Data Source, select the location of the table that you want to use.
-
For Tables, select the table.
-
-
Click Create Activity.
-
On the Key Map Definition tab, optionally click the Report Missing Records from Primary Source (Full Outer Join) check box. If you select this check box, the comparison will report whenever there are records in the primary source that are not found in the secondary source.
Below this check box, the page initially displays a list of the fields in the primary source, with no information about how these are to be matched to fields in the secondary source.
-
The next step is to specify how to match a row in the primary source to a row in the secondary source. Typically, each source has a primary key (consisting of one or more fields), and the primary keys can be matched to each other; that is a record that has a specific primary key in the one source will be matched to the record that has the same primary key in the second source.
If the sources both have a primary key, the easiest approach is to click Auto Match Primary Keys. When you do so, the Key Map area shows the primary key fields from the primary source on the left, with the primary key fields from the secondary (comparison) source on the right.
-
Now add fields to the comparison, as follows:
-
Click a field name under Primary Source—this adds the field to the left column in Key Map area.
-
Look for the field under Secondary Source that should be used for comparison for this field. Click that field name. The system adds this field to the right column in Key Map area.
Or click Try to Match Selected Column. In this case, the system looks for an appropriately named field in the secondary source.
To remove an item in the Key Map area, click its X button.
-
-
When you are done adding fields, click Create. The system then creates the key map for this reconciliation activity. If needed, you can return later and make edits.
-
Click Reconciliation.
Now the page displays a table listing the fields from the two sources. Here you specify rules used to compare the values.
-
Briefly review the table for familiarity. The Primary Field column lists the fields from the primary source, and the Secondary Field column list the fields from the secondary source. This information is the same as in the key map. In addition, note that Data Type indicates the data types of the fields from the primary source (which is helpful in choosing a comparison function).
-
If many or most of the fields have the same name in the two tables, click Quick Match All to quickly create a set of reconciliation rules that you can then review and edit. With this option, the system automatically creates a rule for each pair of fields that have the same name (including case). Each automatically generated rule includes a unique rule name, a comparison function, a generated field name (to hold the comparison result), and (for numeric comparisons), an initial setting for Tolerance.
Some fields may not yet have rules (which is OK).
-
Make edits (as needed) for each rule:
-
In Function Type, choose a comparison function.
-
For Rule Name, click the Edit icon and then type a unique rule name.
-
For Result Field, click the Edit icon and then type a unique field name. The reconciliation activity will write a 1 or a 0 to this field, to indicate whether reconciliation was successful.
-
For Tolerance (required for numeric comparison), click the Edit icon and then type the numeric tolerance.
If the tolerance ends with a percent sign (%), then the system treats it as a percentage.
-
If failing this reconciliation check should be a fatal error (halting the recipe), select the Fatal check box.
-
-
When you are done editing this activity, click Save Draft. Or click Publish Activity so that the recipe will use it.
-
Click the Recipe link in the upper left to return to the recipe.
Viewing the Staging Table Details
Before you can add a promotion activity, it is helpful to review the structure of the staging table or tables used by the recipe. In particular, make a note of the field names that you need to use. To see the definition of a staging table:
-
Display the recipe.
-
At the bottom of the page, see the list in Indices and Staging Table Details.
This section shows one row for each data source used in the recipe.
-
Click the row for a given data source. The page then lists the generated staging tables associated with this data source.
-
Click a table name in this list. The right side of the page then provides information about the indices on the table, along with details about the fields in this table.
-
Optionally add indices as needed. To add an index, click New Index and then specify an index name and select one or more properties to be indexed.
Adding a Promotion Activity
A promotion activity copies data from staging tables to their final targets, which can be any of the following:
-
A table in Data Fabric Studio
-
A table in an external database
-
A file in an S3 bucket
The activity consists of one or more promotion items, each of which uses custom SQL to delete, insert, or update records.
Because the staging table name is generated, it is necessary to refer to the staging table name in an indirect way. Similarly, it is necessary to refer indirectly to the current run of the recipe. Accordingly, the product provides options that insert special tokens that you can use within your SQL; these tokens are automatically replaced when the recipe is run.
To add a promotion activity to a recipe:
-
Display the recipe.
-
Click Add Promotion Activity.
-
For Name, enter a descriptive name to display within the recipe.
-
For Run Order, select or type an integer, which controls the order in which the promotion activities are run, with respect to other promotion activities in the same recipe.
-
For Promotion Type, select one of the following:
-
Internal — Select this to load data into a table within this database (that is, the database native to Data Fabric Studio).
-
External — Select this to load data into a table in another database or a file in an S3 bucket.
If you select External, also specify Target Connection. This must be either the name of an external database or an S3 bucket, in either case previously configured as a data source.
-
-
Click Submit.
-
Click New Item.
-
Specify the item as described in the next section.
-
Click Save Activity Item.
-
If necessary, add other items to this promotion activity.
-
When you are done editing this activity, click Save Draft. Or click Publish Activity so that the recipe will use it.
For each promotion activity item, the system compares the number of source fields to the number of target fields; if these numbers do not match, the system displays an error and does not let you publish the activity. Similarly, the system compares the data types of the data being promoted to the data types of the fields in the target. If the data types do not match, the system displays an error and does not let you publish the activity.
-
Click the Recipe link in the upper left to return to the recipe.
Adding a Promotion Activity Item
A promotion activity consists of one or more activity items. For each activity item, specify the following values:
-
Run Order—Specify the order in which this item should be executed, relative to other items in the same promotion activity. Select an integer.
-
Description—Enter an optional description of this item, as an aid to yourself and others reviewing this recipe later.
-
Operation Type—Choose one of the following:
-
Insert
-
Insert or Update on Primary Key
-
-
SQL Expression—Enter an SQL UPDATE, INSERT, or DELETE statement.
The right side of the page provides samples to assist you in writing the query. When you are getting started, it is a good idea to review these and find a example that matches your scenario, to use as a starting place.
Below SQL Expression, the page provides options that you can use to insert helpful tokens into the query. To insert a reference to a staging table:
-
Place the cursor within SQL Expression at the position where you need the staging table reference.
-
For Placeholders, select Staging Activity.
-
For Staging Activities, select the short name of the staging activity to which the desired staging table belongs. (For example, mystage.)
-
For Data Source Items, select the data schema to which the desired staging table belongs. (For example, samplecsv.)
-
Click Add to SQL.
This inserts the following string into your query:
{mystage}.samplecsv
This is a generic reference to the staging table for the samplecsv, generated and populated by the mystage staging activity within this recipe. When the recipe is run, this token is replaced with the actual name of this staging table.
Similarly, when selecting records from the staging table (to insert or update the final table or file), you typically select only the records from the current run of the recipe. To modify your query to do this:
-
If there is no WHERE clause yet, move the cursor to the end of SQL Expression and type WHERE followed by a space.
-
For Placeholders, select BatchID.
-
Click Add to SQL.
This inserts the following string into your query:
%BatchId={%BatchId}
Note that %BatchId is the name of the field that stores the recipe run ID, and the token {%BatchId} is converted automatically into the ID of the current recipe run.
You can also type these syntaxes directly into the query.
Specifying the Target File Settings
If this promotion activity creates a target file, click the Target File Settings tab and specify the following values:
-
Target subfolder—Specifies the subfolder into which the target file should be written. This option is optional. If you omit this value, the file is written to the main directory.
-
Target Filename Pattern—Specifies the name of the file to create. The filename can include the %RUNDATE token, which is replaced with the date and time when the file is created.
-
Delimiter—Specifies the field delimiter. This must be a single ASCII character. To specify a control character, wrap the name of the control character in angle brackets.
-
Line Terminator—Specifies the line terminator to use when creating this file. Use a value appropriate for the target operating system.
Adding a Custom Activity
A recipe can include custom activities. For example, it may be necessary to have a custom activity that retrieves the data from a data source (for example, to place files within the file system). In such cases, two items are needed:
-
Custom code in the form of a business host added within the interoperability engine that system uses internally. The business host must be a business operation or a business process that accepts as input a message of type SDS.DataLoader.CustomActivityRequest. Details are currently beyond the scope of the documentation.
-
Configuration of the recipe to use that code at the applicable phase. To configure a recipe to use custom code:
-
Display the recipe.
-
Click Add Custom Activity. This displays a dialog box where you specify the details.
-
For Name, enter the name of this activity.
-
For Target Business Host, enter the configuration name of the business host.
-
For When to Run, select Before Staging, Before Transformation, Before Validation, Before Reconciliation, Before Promotion, or After Promotion.
-
For Enabled, select this check box if the activity should be enabled.
-
Click Submit.
-
Step Mode
You can run a recipe in step mode, a mode that lets you use a subset of data—perhaps a very small subset—so that you can test the recipe quickly from end to end. In this mode, you specify the number of rows of data to load into the staging tables (and then to use in the subsequent processing).
To configure a recipe for step mode:
-
Display the recipe.
-
Click the Step Mode toggle.
-
In the box next to Step Mode, select or type the number of records to use.
A recipe remains in step mode until you change the Step Mode toggle back to off.
See Also
-
Viewing Run History (includes information on viewing dropped records, validation reports, and reconciliation reports)