Importing Schemas (2.7)
This page describes how to import the schemas that make up the Data Catalog. Each of these schemas (not to be confused with SQL schemas) is associated with a specific external data source and describes the structure of a single data element such as a table or a delimited file provided by that data source.
After importing a schema, it is generally necessary to edit the details such as the data extraction strategy and field types.
Adding Tables from a JDBC Data Source
To add one or more tables from a JDBC data source:
-
Click the Data Catalog icon in the application menu.
-
Click Data Schema Importer.
-
Select a JDBC data source from the Data Source dropdown menu.
-
Select an SQL schema (group of tables) from the Schema dropdown list.
The section Available Items from JDBC Source then displays the tables in the selected SQL schema.
-
Add tables to be imported in either of two ways:
-
Click the check box next to each table name and then click the Add to Imports button above the table.
-
Click the Add to Imports button in the applicable row or rows.
The table or tables are then moved to the section Items Selected For Import; the system has not yet imported their metadata.
-
-
Optionally select a different item from the Schema dropdown list and repeat these steps as needed. This process adds to the list in Items Selected For Import; the system has not yet imported their metadata.
-
Optionally, to remove a table from the list in Items Selected For Import, click the Delete icon in the applicable row.
-
To import the metadata for the selected tables, click Import.
The system then imports metadata for all the selected tables and displays the Results of Last Import tab, which lists all the tables whose metadata it just imported.
-
From the Results of Last Import tab, optionally start refining any of the imported schemas. To do so, click the Inspect icon in the applicable row. Then edit that item as described in Editing Schemas.
Adding an SQL Query from a JDBC Data Source
Instead of (or in addition to) adding tables from a JDBC data source, you can directly use a custom SQL SELECT query. To do so:
-
Click the Data Catalog icon in the application menu.
-
Click Data Schema Importer.
-
Select a JDBC data source from the Data Source dropdown menu.
-
Click the Query tab.
-
Type an SQL SELECT statement into the box.
-
Optionally modify Row Count, which controls the number of rows that are sampled and displayed.
-
Click Run Query.
The page then displays the query results, which you can use to verify that the query is as expected.
-
To add a schema to the catalog based on this query, click Save to Catalog.
The system then displays a dialog box.
-
For Name, type a short, unique name for the new schema. You cannot change this name later.
-
For Description, type an optional description of the new schema.
The system then generates metadata for the query and displays the new schema, which you can now edit as described in Editing Schemas.
Adding a File-Based Schema
A file-based data source is a UTF-8 encoded file with one record per line, where each line follows a convention that uses a specific delimiter between fields (typically a comma or a tab). This convention implicitly defines a schema. To define this schema within the product, you need to upload and then import a sample file and specify the delimiter and the field names. (Excel files are a special case of a file-based data source in which you do not need to specify the delimiter that separates fields.)
FileDir, ExcelSingleFileDir, and S3Delimited data sources are all file-based data sources.
To add a file-based schema:
-
Obtain a sample file for the schema, applicable to a specific file-based data source.
The sample file can consist of only one line. Also, the sample file must have a header row—an initial line that contains the names of the fields. (In other words, the sample file may or may not contain any actual data.)
-
Rename the sample file so it has an appropriate and useful short filename, because this short filename becomes the schema name within the system (with punctuation characters removed), and it cannot be edited.
For example, if you load a sample file named sampledata.csv, that becomes a schema named sampledatacsv.
-
Use the File Manager to upload this file to the Samples directory used by the chosen file-based data source.
-
Click the Data Catalog icon in the application menu.
-
Click Data Schema Importer.
-
Select the chosen file-based data source.
The section Available Items from Source lists the files in the Samples subdirectory for this data source.
-
Add files to be imported in either of two ways:
-
Click the check box next to each filename and then click the Add to Imports button above the table.
-
Click the Add to Imports button in the applicable row or rows.
The filename or filenames are then moved to the section Items Selected For Import; the system has not yet imported their metadata.
-
-
If the right side of the page contains a long list of files to import, you may want to double check the list. In doing so, if you want to filter this display, you can type into the Members Selected for Import filter. This option affects what this page displays, but does not affect the import of metadata.
-
Optionally, to remove a file from this list, click the Delete icon in the applicable row.
-
To import the selected files, click Import.
The system then imports all the selected files and generates metadata from them.
The page then displays the Results of Last Import tab, which lists all the files it imported.
Filter Options When Importing Schemas
Depending on the type of data source, there may be a large number of schemas to choose among. The import page provides options to help you filter the display. You can filter and sort the items listed in the Available Items From Source section on the left, as well as the items listed in the Items Selected For Import on the right. In both areas, you can do the following:
-
To show only names containing a specific string, type that string into the filter above the Item Name column.
-
To display based on whether they are in the catalog, click the dropdown above the In Catalog column and select either or both of the following check boxes as needed:
-
Yes—display the items that are currently in the catalog.
-
No—display the items that are not currently in catalog.
-
-
You can click the Item Name and Item Name column headers to sort those columns.
These filters do not affect what is imported when you click Import. The only change is to the display of the items on this page.