Importing Schemas (2.12)
This page describes how to import the schemas that make up the Data Catalog . Each of these schemas (not to be confused with SQL schemas) is associated with a specific external data source and describes the structure of a single data element such as a table or a delimited file provided by that data source.
After importing a schema, it is generally necessary to edit the details such as the data extraction strategy and field types.
Adding Tables from a JDBC Data Source
To add one or more tables from a JDBC data source:
-
Click the Data Catalog
icon in the application menu.
-
Click Data Schema Importer .
-
Select a JDBC data source from the Data Source drop-down menu.
-
Select an SQL schema (group of tables) from the Schema drop-down list.
The section Available Items from JDBC Source then displays the tables in the selected SQL schema.
-
Add tables to be imported in either of two ways:
-
Click the check box next to each table name and then click Select for Import .
-
Click the Add to Imports
button in the applicable row or rows.
The table or tables are then moved to the section Items Selected For Import ; the system has not yet imported their metadata.
-
-
Optionally select a different item from the Schema drop-down list and repeat these steps as needed. This process adds to the list in Items Selected For Import ; the system has not yet imported their metadata.
-
Optionally, to remove a table from the list in Items Selected For Import , click the Delete
icon in the applicable row. -
To import the metadata for the selected tables, click Import .
The system then imports metadata for all the selected tables and displays the Results of Last Import tab, which lists all the tables whose metadata it just imported.
JDBC Source Import Behavior
When importing a schema from a JDBC source, any remarks defined on the source table or its columns are imported as descriptions. These descriptions appear in the Schema Definition and automatically populate the Description fields for the schema and individual fields, respectively.
These description fields are editable in Data Studio ; however, changes made in Data Studio do not affect the remarks in the original data source.
Adding an SQL Query from a JDBC Data Source
Instead of (or in addition to) adding tables from a JDBC data source, you can directly use a custom SQL SELECT query. To do so:
-
Click the Data Catalog
icon in the application menu.
-
Click Data Schema Importer .
-
Select a JDBC data source from the Data Source drop-down menu.
-
Click the SQL Query tab.
-
Type an SQL SELECT statement into the box.
-
Optionally modify Row Count , which controls the number of rows that are sampled and displayed.
-
Click Run Query .
The page then displays the query results, which you can use to verify that the query is as expected.
-
To add a schema to the catalog based on this query, click Save to Catalog .
The system then displays a dialog box.
-
For Name , type a short, unique name for the new schema. You cannot change this name later.
-
For Description , type an optional description of the new schema.
The system then generates metadata for the query and displays the new schema, which you can now edit as described in Editing Schemas .
Adding a File-Based Schema
A file-based data source is a UTF-8 encoded file with one record per line, where each line follows a convention that uses a specific delimiter between fields (typically a comma or a tab). This convention implicitly defines a schema. To define this schema within the product, you need to upload and then import a sample file and specify the delimiter and the field names. (Excel files are a special case of a file-based data source in which you do not need to specify the delimiter that separates fields.)
FileDir , ExcelSingleFileDir , and S3Delimited data sources are all file-based data sources.
To add a file-based schema:
-
Obtain a sample file for the schema, applicable to a specific file-based data source.
The sample file can consist of only one line. Also, the sample file must have a header row—an initial line that contains the names of the fields. (In other words, the sample file may or may not contain any actual data.)
-
Rename the sample file so it has an appropriate and useful short filename, because this short filename becomes the schema name within the system (with punctuation characters removed), and it cannot be edited.
For example, if you load a sample file named sampledata.csv , that becomes a schema named sampledatacsv .
-
Use the File Manager to upload this file to the Samples directory used by the chosen file-based data source.
-
Click the Data Catalog
icon in the application menu.
-
Click Data Schema Importer .
-
Select the chosen file-based data source.
The section Available Items from Source lists the files in the Samples subdirectory for this data source.
-
Add files to be imported in either of two ways:
-
Click the check box next to each filename and then click Select for Import .
-
Click the Add to Imports
button in the applicable row or rows.
The filename or filenames are then moved to the section Items Selected For Import ; the system has not yet imported their metadata.
-
-
If the right side of the page contains a long list of files to import, you may want to double check the list. In doing so, if you want to filter this display, you can type into the Members Selected for Import filter. This option affects what this page displays , but does not affect the import of metadata.
-
Optionally, to remove a file from this list, click the Delete
icon
in the applicable row. -
To import the selected files, click Import .
The system then imports all the selected files and generates metadata from them.
The page then displays the Results of Last Import tab, which lists all the files it imported.
Adding a REST Schema
After you create a REST data source, you must define both the response-schema metadata and the API metadata in the REST Registry before you can import your schema into the Data Catalog. The REST Registry is available at Data Catalog > REST Registry.
To define a REST schema:
-
Click the Data Catalog
icon in the application menu.
-
Click REST Registry.
-
On the REST Schema tab, define the response schema metadata:
-
Vendor (Required) — Type the vendor name. This value becomes the schema name within the system and cannot be changed later.
-
Data Set (Required) — Type the name of the data set represented by this schema.
-
Field Name (Required) — Type the name of a field returned in the API response.
-
IRIS Field Type (Required) — Choose a field type from the list.
-
Description (Optional) — Type an optional description for the schema or individual fields.
-
-
Click Insert to add the schema entries.
-
On the REST Endpoint tab, define the API metadata that tells Data Studio how to retrieve data from the REST API:
-
Vendor (Required) — Type the vendor name. This value must match the vendor defined in the REST Schema tab.
-
Data Set (Required) — Type the data set name that this endpoint returns.
-
API Verb (Required) — Select GET or POST.
-
API Path (Optional) — Type the path or full URL for the REST API call (for example, /api/csaperiods). If you provide a relative path, it is appended to the API Server defined for the data source.
-
API Input (Optional) — Provide JSON arguments for the API request (for example, {"sinceDateTimeUtc":"20250121134009"}).
-
Load Type (Optional) — Choose simple or transactional. Simple loads retrieve data exactly as defined. Transactional loads update input parameters on subsequent runs and are typically used when retrieving incremental or time-based data.
-
Redirected Path (Optional) — Specify the JSON or XML path that identifies the target data within the API response. Use this field when the response contains metadata, nested objects, or references to the data you want to ingest.
Depending on the structure of the API response, the target data may be:
-
The entire response object.
-
A nested object within the response.
-
Data referenced indirectly, which requires Data Studio to follow response values to retrieve the final dataset.
-
-
-
Click Insert.
You can return to the REST Registry at any time to add fields, modify endpoint details, or update load strategies. Click an endpoint row to display its metadata at the top of the page. After making changes, click Update.
Importing a REST Schema
After defining the REST Schema and REST Endpoint entries, you can import the schema. To do so:
-
Click the Data Catalog
icon in the application menu.
-
Click Data Schema Importer .
-
Select the chosen REST data source from the Data Source drop-down menu.
-
Choose the items available for import from the Available Items from <Your Data Source> table in the Schema Import tab.
-
Click Import.
REST Schema Resolution Behavior
When processing REST endpoints, Data Studio determines the response schema automatically. If a matching REST Schema entry exists for the vendor and data set, Data Studio uses that predefined schema. If no matching schema exists at the time the endpoint is processed, Data Studio retrieves a sample API response and generates the schema dynamically.
You do not need to take any additional action to control this behavior.
Publishing a REST Schema
After you define and import your REST schema, you must publish it before using it in a Recipe. To do so:
-
Click the Data Catalog
icon in the application menu.
-
Click the Data Catalog Browser.
-
In the Schema Definitions table, click the row for the schema you want to publish. The Schema Definition page opens.
-
Select an Extraction Strategy from the drop-down.
-
Click Publish Schema.
Your schema is now ready to be used in a Recipe.
Filter Options When Importing Schemas
Depending on the type of data source, there may be a large number of schemas to choose among. The import page provides options to help you filter the display. You can filter and sort the items listed in the Available Items From Source section on the left, as well as the items listed in the Items Selected For Import on the right. In both areas, you can do the following:
-
To show only names containing a specific string, type that string into the filter above the Item Name column.
-
To display items based on whether they are in the catalog, click the drop-down list above the In Catalog column and select either or both of the following check boxes as needed:
-
Yes —display the items that are currently in the catalog.
-
No —display the items that are not currently in catalog.
-
-
You can click the Item Name and Item Name column headers to sort those columns.
These filters do not affect what is imported when you click Import . The only change is to the display of the items on this page.