Translate

Introduction to Ingestion

In Rosetta Core, Ingestion is the process of converting an (electronic) input file into CDM.

The input file can be one of a number of different standard formats for electronic data storage and transport, such as XML or JSON. The CDM output can be:

  • An electronic file, using the same range of supported formats as the input file and with CDM as the underlying data model (a.k.a. a Serialised CDM document)
  • A CDM object that computer code can be directly executed on (a.k.a. a De-Serialised CDM object).

The Serialised view is useful for displaying the CDM output in an interface, in a way that a user can browse and understand. This is the view that is used when demonstrating the Ingestion application in Rosetta Core. The De-Serialised object is not meant to be readable.
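The difference between the two outputs can be illustrated with a minimal Python sketch. It uses a plain JSON string as a stand-in for a Serialised document and a dictionary as a stand-in for a De-Serialised object. The field names (`tradeDate`, `quantity`) are hypothetical and are not taken from CDM, and Rosetta's own object model is not shown here.

```python
import json

# Hypothetical serialised document: text that can be stored, transported or displayed.
serialised = '{"tradeDate": "2020-01-15", "quantity": 1000}'

# De-serialising turns the text into an in-memory object that code can operate on.
deserialised = json.loads(serialised)
doubled_quantity = deserialised["quantity"] * 2

# Serialising again produces a view of the same data that a user can browse.
pretty = json.dumps(deserialised, indent=2)
print(pretty)
```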

File Explorer Overview

This view allows users to interact with sample files specific to the loaded workspace.

../_images/translate_ingestion_explorer_overview.png

Interface

Filter Bar

The Filter Bar provides the ability to limit which files are displayed.

../_images/translate_ingestion_filter_bar.png

The Filter Bar is made up of the following components:

  • Synonym Sources: Use this drop-down to view only sample files that run against a particular synonym source.
  • Show Results: Tick this checkbox to show only sample files whose tests have been run and have results to display.
  • File Name: Enter text here to filter by sample files whose name contains that text.
  • Run Button: Select to run all files in the current filtered view. This button also shows the number of results.
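The way these filters narrow the list can be sketched as follows. This is an illustration only, not Rosetta's implementation: the `SampleFile` class, the `filter_files` function and the source names `SOURCE_A` and `SOURCE_B` are hypothetical, and the sketch assumes that the filters are combined (a file must pass all of them) and that the name match ignores case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SampleFile:
    name: str
    synonym_source: str
    has_results: bool


def filter_files(
    files: List[SampleFile],
    synonym_source: Optional[str] = None,
    show_results: bool = False,
    name_text: str = "",
) -> List[SampleFile]:
    """Keep only the files that pass every active filter (assumed AND logic)."""
    selected = []
    for f in files:
        if synonym_source is not None and f.synonym_source != synonym_source:
            continue  # runs against a different synonym source
        if show_results and not f.has_results:
            continue  # no results to display yet
        if name_text.lower() not in f.name.lower():
            continue  # name does not contain the text (case-insensitive: assumption)
        selected.append(f)
    return selected


files = [
    SampleFile("swap-ex01.xml", "SOURCE_A", True),
    SampleFile("swap-ex02.xml", "SOURCE_A", False),
    SampleFile("option-ex01.json", "SOURCE_B", True),
]
visible = filter_files(files, synonym_source="SOURCE_A", show_results=True)
print(len(visible))  # the count a Run button would display for this view
```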

File upload

The file upload button allows users to select sample files from their desktop and store them either on the server or in their browser on their computer. The decision to store on the server or not depends on whether the “Upload to server” checkbox beneath the upload button is ticked.

Warning

Files uploaded to the browser on the user's computer will be removed on logout.

File Explorer

Groups

The file explorer view is split into group rows and file rows. The group row hierarchy is dictated by the location of the sample files within the CDM project, and the groups serve as a logical categorisation of the sample files.

../_images/translate_ingestion_file_group.png

The group row itself is made up of the following components:

  • Group Name: Sample file group name.
  • Synonym Source: The oval synonym source marker shows which synonym sources are used by each of the sample files in that group.
  • Group Status: The group status pie chart shows how many files have been run in this group and what percentage of them were successful.
  • Group Completeness: These three pie charts show the individual mapping, validation and qualification success percentages.
  • Run All Action: Runs all sample files within the group.
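The figures behind the Group Status chart can be sketched as follows. This is a rough illustration rather than the tool's actual calculation: the `group_status` function and the state strings are hypothetical, and the sketch assumes that every file not in the pending state counts as having been run.

```python
from typing import List, Tuple


def group_status(states: List[str]) -> Tuple[int, float]:
    """Return (number of files run, percentage of run files that succeeded)."""
    run = [s for s in states if s != "pending"]  # assumption: pending means not yet run
    successes = sum(1 for s in run if s == "success")
    percentage = 100.0 * successes / len(run) if run else 0.0
    return len(run), percentage


files_run, success_pct = group_status(
    ["success", "success", "failed", "pending", "success"]
)
print(files_run, success_pct)
```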

Sample Files

The sample file rows allow the user to interact with each sample file, trigger test runs, view run summary information and navigate to the detailed analysis page.

../_images/translate_ingestion_file_row.png

Sample file rows are made up of the following components:

Diagnostic Summary: This summary shows a pie chart with the percentage success calculated against an expected value for each of the following:

  • Mapping: This is the number of paths in the source file that have mapped successfully to paths in the target CDM.
  • Validation: This result shows what percentage of validation rules passed for the qualified type.
  • Qualification: A qualification is the type that an ingested source is assigned having passed a set of criteria in a qualification function.

../_images/translate_ingestion_file_row_detailed_summary.png

Detailed Diagnostic Summary: Hovering over any of the diagnostic summary charts described above opens a popup summary, which contains the following:

  • Success: The actual vs expected success count for that result type.
  • Failure: The actual vs expected failure count for that result type.
  • Excluded: The actual vs expected excluded count for that result type.
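The content of one popup summary can be modelled as three actual-versus-expected pairs. The sketch below is illustrative only: the class names and the `meets_expectations` check are hypothetical, and treating a result as passing only when every actual count equals its expected count is an assumption, not a documented rule.

```python
from dataclasses import dataclass


@dataclass
class Count:
    actual: int
    expected: int


@dataclass
class DiagnosticDetail:
    success: Count
    failure: Count
    excluded: Count

    def meets_expectations(self) -> bool:
        # Assumption: a result passes when all three actual counts match expectations.
        return all(
            c.actual == c.expected
            for c in (self.success, self.failure, self.excluded)
        )


mapping = DiagnosticDetail(Count(40, 42), Count(2, 0), Count(3, 3))
print(mapping.meets_expectations())
```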

../_images/translate_ingestion_file_row_rerun.png

Rerun: The rerun buttons appear when the results have gone stale due to a model change and allow the user to refresh the results.

../_images/translate_ingestion_file_row_states.png

File Run States: Here are all the row states:

  • Pending rows will have a grey background.
  • Successful rows will have a green background and mean that all their expectations have passed.
  • Failed rows will have a red background and mean that the file has failed to pass the expected ingestion tests.
  • Warning rows will have a yellow background and mean that an error occurred when running the ingestion.
  • Stale rows will have a grey background.
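The states and colours listed above can be summarised as a simple lookup. The `RunState` enum, the `ROW_COLOUR` mapping and the `needs_attention` helper are hypothetical names used for illustration; only the state-to-colour pairs come from the list above.

```python
from enum import Enum


class RunState(Enum):
    PENDING = "pending"
    SUCCESS = "success"
    FAILED = "failed"
    WARNING = "warning"
    STALE = "stale"


# Background colours as listed above; pending and stale rows are both grey.
ROW_COLOUR = {
    RunState.PENDING: "grey",
    RunState.SUCCESS: "green",
    RunState.FAILED: "red",
    RunState.WARNING: "yellow",
    RunState.STALE: "grey",
}


def needs_attention(state: RunState) -> bool:
    """Hypothetical helper: rows a user would likely want to inspect or rerun."""
    return state in (RunState.FAILED, RunState.WARNING, RunState.STALE)


print(ROW_COLOUR[RunState.WARNING], needs_attention(RunState.WARNING))
```

Because pending and stale rows share a colour, the colour alone does not distinguish them.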

Uploaded Files

Uploaded files have a slightly different view as outlined below. An uploaded file does not have a result summary as there are no expectations to compare the results against for these file types. These rows can still be interacted with to run and rerun in the same way as sample files.

../_images/translate_ingestion_file_upload_row.png

The uploaded file rows contain the following components:

  • Synonym Source Selector: This allows a user to change which synonym source to ingest a file against.
  • File location icon: Files which have been stored on the server display the cloud icon.

Uploading a File

To upload an electronic file for ingestion, click the upload button and select the file from the popup. Before uploading the file, a user can opt to save it on the server by ticking the checkbox. The file will then still be present after logging in on a different computer.

During the upload process, the system will guess which synonym source to ingest the file against. This can be changed using the synonym source selector on the file upload row after the file has been uploaded.

Note

The file needs to be physically accessible from the user’s computer.

Note

Only supported files will be selectable.

Note

Ingestion in Rosetta Core currently supports XML and JSON input types, but further formats will be added in future.

Running Ingestion

Provided that all the required code in the user workspace is ready to be run, a user can process one or multiple files through ingestion by doing one of the following:

  • Single File Run: Clicking on a sample file row will start the process of ingestion. A spinner will appear on the row.
  • Group File Run: Clicking on the Run Group button on a group row will start processing all sample files in that group. A spinner will appear next to all files being ingested.
  • Filtered Run: After applying a filter, the run all button will appear with a number of results. Clicking this button will start processing all results.
  • Rerunning Files: Once a file has been processed and results are displayed, a user can reprocess the file by clicking the rerun button. A group can be reprocessed by clicking the group run button at any time.

Viewing Results

When a file has results, a user can click on a sample file row to open the File Viewer.

File Viewer Overview

Interface

../_images/translate_ingestion_viewer_overview.png

This view displays the result for a given sample file. The file viewer is split horizontally into three panels. The first is the diagnostics panel, which displays statistics and other key information summarising the success (or failure) of ingesting the file. The second and third panels are linked and allow the user to review how the system has ingested the file.

Note

Each panel can be opened and closed by clicking on its header.

Code viewer

Each code viewer has two modes: Code View and Formatted Document View. The Code View displays the file in its original format. The Formatted Document View creates a tree structure which is colour coded to indicate the result of the ingestion process. The colours used are:

  • Red: Invalid or unmapped values
  • Dark Green: Mapped values
  • Dark Green with underline: Mapped and linked values from input to CDM
  • Light Green: Conditional values
  • Yellow: Excluded values
  • Black: No mapping data for these values

Note

Clicking on a dark green field which is underlined will highlight and scroll into view the corresponding fields in the input or CDM code view.