Editing and Managing Schemas
In InterSystems® Data Fabric Studio™, a schema describes the structure of a single data element such as a table or a file using a specific file format. This page describes how to edit and manage schemas; another page provides information on importing them.
Viewing the Data Catalog
To view the Data Catalog and the schemas currently contained in it:
-
Click the Data Catalog icon in the application menu.
-
Click Data Catalog Browser.
This page enables you to see all available schemas, either as a table or as a set of rectangular cards.
For each schema, the page displays the name of the data source that contains this schema, the actual name of the schemas, and any categories to which it belongs. If you click Card View, the cards also display the descriptions.
-
Optionally filter the display by using the options at the top of the page:
-
Schema Definition Item Name—Filters the display to include only the schemas whose short names contain the given string. After typing into this field, click the Find icon.
-
Data Source—Filters the display to include only schemas from a specific data source.
-
Category—Filters the display to include only schemas labelled with a specific category. Note that schemas can be labelled with any number of categories, including no categories at all.
-
Recipe—Filters the display to include only schemas used in a specific recipe.
-
Drafts —If selected, displays only the schemas that are drafts.
-
Reset —Clears all the filter options.
-
Click any schema in order to view it or make changes.
Displaying a Schema
If you have just imported the schema and you are viewing the Results of Last Import tab, click the Inspect icon in the row for a schema.
Otherwise, to display a schema:
-
Click the Data Catalog icon in the application menu.
-
Click Data Catalog Browser.
-
Click the schema name.
If there is a draft of the schema, you can choose to display the published version or display the draft (which is shown as a separate item as in the following example).
Editing a Schema
To edit a schema, first display it. If the schema has been published, click Edit Schema in the upper right to make the schema editable. (If there is a draft of the schema and you are viewing the published version, this button opens the draft version of the schema.)
Then make any of the following changes:
-
To add or edit a description, click the Edit icon next to Description and then make edits.
-
To add a category, click Add Category... and then click a category.
-
To remove a category, click the X icon for the category.
-
To modify the details of how the data source is handled, click the Edit icon in that right box. This displays a dialog box where you can make changes. The details depend on the kind of schema; see Schema Details.
When you are done, click Save to save these changes or Cancel to discard them.
-
To make changes to the schema fields, see the next section.
When you are done with the schema, click Save Draft to save these changes or Delete Draft to discard them (and return to the previous schema definition).
Note that if you leave this page without saving your changes, the system automatically saves your changes as a draft.
Schema Options
This section provides reference information on the options that control how a schema (and its data source) are interpreted and used.
Applies to all schema types. Specifies which records to use from the data source.
For SQL-based schemas (tables and custom queries), the options are as follows:
-
General Table—Retrieves all new records and any changed records.
A new record is a record with a new primary key value. A changed record is a record with an existing primary key value but a change to one or more other fields.
-
Append only transactional table with a numeric sequential key—Retrieves all new records since the last sequential primary key.
If you specify this, also specify Extraction Strategy Field, which must be the name of the field that contains the sequential primary key.
-
Transactional table with a changed timestamp field—Retrieves all records changed since the last timestamp.
If you specify this, also specify Extraction Strategy Field, which must be the name of the field that contains the timestamp.
-
Simple load—Retrieves all records, whether or not the system has them already.
-
Change data capture
For file-based schemas, the options are General Table and Simple Load.
Applies to table-based schemas. Specifies whether to use a pseudocolumn in this table as the primary key. If you select this option, also specify PrimaryKeyPseudocolumnName, which must be the name of the pseudocolumn. Does not apply to a schema based on a custom query.
Applies to SQL-based schemas (tables and custom queries). Projects this table or query as a foreign table, which is read-only and which does not copy the data into Data Fabric Studio.
If you select this option, you can override the default schema used for foreign tables from this data source. To do so, select ForeignTableLocalSchemaOverride and then specify a schema name for ForeignTableLocalSchema. (An additional override is available for individual tables, if needed.)
Applies to file-based schemas. Specifies the delimiter that separates fields in the file.
When loading delimited files, you must define and follow specific conventions so that they can be read appropriately. The first consideration is the delimiter that separates the fields, controlled by the Delimiter setting.
If the character specified by Delimiter is also used within any field, you must wrap the field in quotation marks (single or double, as you choose—or possibly some other character). To control which character is used to wrap a field, you specify the QuoteChar option, whose default is the double quote character ". If the character specified by QuoteChar is used within any field that is already wrapped in quotation marks, see the DoubleQuote and EscapeChar settings.
Applies to file-based schemas. Specifies which files are read and used as input, for this schema. That is, input files whose names conform to this pattern are processed, and other files are ignored. You can specify any legal filename string or a filename wildcard expression, consistent with the operating system. The initial pattern is simply the name of the file that you initially loaded.
Applies to file-based schemas. Specifies the form of pattern matching to use (for FileNamePattern).
Applies to file-based schemas. Specifies the prefix that indicates the start of a comment line in this file. A comment line is ignored.
Applies to file-based schemas. Specifies how to interpret a pair of quotation marks contained within a field that is wrapped in quotation marks. (Here the term quotation marks refers to the character specified by the QuoteChar option.)
If this option is Yes, then in the stated context, a pair of quotation marks is interpreted as a single quotation mark.
For example, suppose that Delimiter is a comma, QuoteChar is ", and DoubleQuote as Yes. Then the value "here is a ""value"" to load" is loaded as here is a "value" to load
Applies to file-based schemas. Specifies the character used to start an escape sequence, so that you can include a quotation mark within a field that is wrapped in quotation marks. Applies to file-based schemas. Specifies how to interpret a pair of quotation marks contained within a field that is wrapped in quotation marks. (Here the term quotation marks refers to the character specified by the QuoteChar option.)
For example, suppose that QuoteChar is " and EscapeChar is a backslash \). Then the value "here is a \"value\" to load" is loaded as here is a "value" to load
Applies to file-based schemas. Specifies the character or sequence of characters that indicate the end of a line in this file.
Applies to file-based schemas. Specifies the character used to wrap any field that contains the field delimiter character. For example, if fields are delimited by a comma, then if a field contains a comma, that field must be wrapped in a quotation mark, as specified by QuoteChar.
If the character specified by QuoteChar is used within any field that is already wrapped in quotation marks, see the DoubleQuote and EscapeChar settings.
Applies to file-based schemas. Specifies how to handle space characters that appear within a field after the field delimiter. If this option is Yes, initial space characters are trimmed when the data is loaded. Note that you can also define a transformation that trims whitespace on a field-by-field basis.
Applies to file-based schemas. Specifies whether the files have a header row. Select Yes or No. The header row is ignored when the data is loaded.
Editing Schema Fields
For any kind of data source, when you edit the schema, the bottom part of the page displays a table labeled Schema Fields. Here you can modify the schema fields as follows:
-
To specify the primary key for this schema, click the Primary Key check box for all the fields that make up the primary key.
-
To modify other details for a specific field, click the Edit Attributes icon for that field. This displays a dialog box where you can edit the following:
-
Data Type—Specifies the data type for this field.
-
Description—Specifies an optional description for the field.
-
Required—Specifies whether this field is permitted to be null. If a field is required, an error is thrown if the value is null.
-
Default Value—Specifies the value to use if this field is null.
-
Data Format (for date, time, and timestamp fields)—Select the format that data is expected to be in for this field.
-
Min Value (for double and integer fields)—Specifies the minimum allowed value of this field.
-
Max Value (for double and integer fields)—Specifies the maximum allowed value of this field.
-
Scale Value (for double fields)—Specifies the number of digits to display after the decimal point. This has no effect on how data is stored.
-
Length Value (for string fields)—Specifies the maximum number of characters allowed in this field.
When you are done with this dialog box, click Save to save these changes or Cancel to discard them.
-
Publishing a Schema
A schema cannot be used in recipes unless it has been published. To publish a schema:
-
Display it.
-
Click Publish Schema in the upper right.
Reimporting a Schema
To reimport a schema:
-
Display it.
-
Click Re-import Schema in the upper right. This change automatically creates a new draft of the schema.
-
Edit the schema again as needed.
If the schema has not yet been published, you cannot reimport it. You can, however, delete it and then import it again.
Deleting a Schema
To delete a schema:
-
Display it.
-
Click Delete in the upper right.
-
Click Delete to confirm.
You cannot delete a schema that is used in a recipe; if you attempt to do so, you are notified about any recipe activities that depend on this schema. You can then go edit the recipe to remove the dependency.