Data

The Data tab lists all datasets currently linked to the selected Use Case by you and other team members. To access this tab, open a Use Case and click Data.

From this tab, you can:

No. | Element | Description
1 | Add new | Add a dataset, experiment, or notebook to your Use Case, or create a new Use Case.
2 | Search | Search for a specific dataset.
3 | Sort | Sort the dataset columns.
4 | More options | Click More options to interact with a dataset:
  • Explore: View exploratory data insights.
  • Wrangle/Continue Wrangling: Perform data wrangling on datasets retrieved from a data connection.
  • Start modeling: Set up an experiment using the dataset.
  • Remove from Use Case: Remove the dataset from the Use Case, which also removes access for other team members. The dataset remains available via the Data Registry.
5 | Asset type icons | Each asset is preceded by an icon that indicates its type:
  • (dataset icon): Indicates that the asset is a registered dataset.
  • (recipe icon): Indicates that the asset is a wrangling recipe.

Explore data

While a dataset is being registered in Workbench, DataRobot also performs exploratory data analysis (EDA1)—analyzing and profiling every feature to detect feature types, automatically transform date-type features, and assess feature quality. Once registration is complete, you can explore the information uncovered while computing EDA1.
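
As a rough illustration of what profiling every feature involves, the sketch below infers a coarse type and a missing-value count for each column of a small in-memory dataset. It is not DataRobot's EDA1 implementation: the function names `infer_feature_type` and `profile_dataset`, the type labels, and the single `%Y-%m-%d` date format are all assumptions made for this example.

```python
from datetime import datetime


def infer_feature_type(values):
    """Guess a coarse feature type from raw string values (illustrative heuristic)."""
    present = [v for v in values if v not in ("", None)]
    if not present:
        return "empty"

    def all_parse(parser):
        # True only if every non-missing value is accepted by the parser.
        for v in present:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all_parse(float):
        return "numeric"
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    # Repeated values suggest categories; all-unique strings are treated as text.
    return "categorical" if len(set(present)) < len(present) else "text"


def profile_dataset(rows):
    """Profile each column of a list-of-dicts dataset: inferred type and missing count."""
    columns = list(rows[0].keys()) if rows else []
    profile = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        profile[col] = {
            "type": infer_feature_type(values),
            "missing": sum(1 for v in values if v in ("", None)),
        }
    return profile
```

A real profiler would handle many more date formats, numeric edge cases, and quality checks; the point here is only the per-feature loop that produces a type and a simple quality signal.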

Preview

Support for dynamic datasets in Workbench is on by default.

When this feature is enabled:

  • Datasets added via a data connection will be registered as dynamic datasets in the Data Registry and Use Case.
  • Dynamic datasets added via a connection will be available for selection in the Data Registry.
  • DataRobot will pull a new live sample when viewing Exploratory Data Insights for dynamic datasets.

Feature flag: Enable Dynamic Datasets in Workbench

To open the data explore page:

  1. In a Use Case, navigate to the Data tab.
  2. Click the More options icon next to the dataset you want to view and select Explore. Alternatively, click the dataset name to view its insights.

  3. For each feature in the dataset, DataRobot displays various feature details, including a histogram and summary statistics.

  4. To drill down into a specific feature, click its histogram chart along the top.
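
To make the histogram and summary statistics concrete, here is a minimal sketch of how such values could be computed for one numeric feature using only the Python standard library. The function name `summarize_numeric`, the equal-width binning, and the choice of statistics are assumptions for illustration; the page does not state how DataRobot computes its charts.

```python
import statistics


def summarize_numeric(values, bins=5):
    """Return summary statistics and equal-width histogram counts for one numeric feature."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return {
        "min": lo,
        "max": hi,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "histogram": counts,
    }
```

For example, `summarize_numeric([1, 2, 2, 3, 4, 10], bins=3)` yields the bin counts `[4, 1, 1]`, which shows how a single large value stretches the range and leaves most observations in the first bin.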

Data explore page

Availability information

The new data explore page is available for preview and off by default.

Feature flag(s): Enable Enhanced Data Explore View

The data explore view, accessed by clicking More options to the right of a dataset and selecting Explore, includes the following improvements for profiling and working with your datasets:

  • View a version history of the dataset that displays all snapshot datasets created from the live data, allowing you to easily work with and view multiple dataset versions from the same page.
  • Work with feature lists on the dedicated Feature list tab.

Dataset versioning

The data explore page supports dataset versioning, allowing you to access a history of data snapshots as well as create new snapshots from the same page. To access dataset versions, click the dropdown next to Data actions or open Dataset Versions in the right panel.

View dataset versioning in the data explore view.

No. | Element | Description
1 | Snapshot policy | Displays the selected dataset version. If a snapshot version is selected, DataRobot displays the date and time the snapshot was created. Click the dropdown to access the following:
  • Version history: An abbreviated version history that displays the dynamic dataset (live data) and most recent snapshot.
  • + Create snapshot: Creates a snapshot of the dataset you're viewing. After registration is complete, the new snapshot is listed as the latest version and can also be accessed in the Use Case and Data Registry.
  • Select version: Opens Dataset Versions in the right panel.
2 | Dataset Versions | Displays a version history of the dataset. Click a version to view it.
3 | + Create snapshot / Upload new version | Adds a new version of the dataset. After registration is complete, the new version appears in the version history and is added to the Use Case and Data Registry.
  • If the snapshot policy of the original dataset is dynamic or snapshot, the + Create snapshot button is available, which creates a snapshot of the dataset you're viewing.
  • If the original dataset is static (i.e., uploaded as a local file), the Upload new version button is available, which allows you to upload updated local versions of the dataset.

Data actions for snapshot policies

The data explore page supports the following snapshot policies:

  • Dynamic: DataRobot is connected to the data source and uses live data to perform the selected data action.
  • Snapshot: A fixed snapshot that is stored in DataRobot and used to perform the selected data action. This policy is recommended for repeatable experimentation when the live data changes often.
  • Static: A local file used to perform the selected data action.
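
The relationship between snapshot policies and version history described above can be sketched as a small data model. This is an illustrative model only, not DataRobot's API: the class names `SnapshotPolicy`, `DatasetVersion`, and `VersionedDataset`, and their methods, are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class SnapshotPolicy(Enum):
    DYNAMIC = "dynamic"    # live data read from the connected source
    SNAPSHOT = "snapshot"  # fixed copy stored in the platform
    STATIC = "static"      # uploaded local file


@dataclass
class DatasetVersion:
    policy: SnapshotPolicy
    created_at: datetime


@dataclass
class VersionedDataset:
    name: str
    original_policy: SnapshotPolicy
    versions: list = field(default_factory=list)

    def add_version_action(self):
        """Return the button offered for adding a version, following the rules above."""
        if self.original_policy in (SnapshotPolicy.DYNAMIC, SnapshotPolicy.SNAPSHOT):
            return "+ Create snapshot"
        return "Upload new version"

    def create_snapshot(self):
        """Append a new snapshot, which then becomes the latest version."""
        if self.original_policy is SnapshotPolicy.STATIC:
            raise ValueError("static datasets take uploaded versions, not snapshots")
        version = DatasetVersion(SnapshotPolicy.SNAPSHOT, datetime.now(timezone.utc))
        self.versions.append(version)
        return version

    def latest(self):
        """Return the most recently added version, or None if there are none."""
        return self.versions[-1] if self.versions else None
```

Modeling the history as an append-only list mirrors the behavior described on this page: each new snapshot is listed as the latest version while earlier snapshots remain available for repeatable experiments.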

Feature list view

The Feature list view replaces the dropdown with a dedicated page to view, manage, and create feature lists.

Feature list tab open in the data explore view

To control the columns displayed here, click Settings and select the box next to the columns you want to view. Then, click Apply.

Open feature list column settings.

Dataset actions

From the data explore page, you can perform the following actions:

Available dataset actions from the data explore view.

No. | Element | Description
1 | Dataset name | To rename the dataset, click its name. To save your changes, click outside of the text field.
2 | Data actions | Open the Data actions dropdown to perform one of the following actions with the dataset you're currently viewing:
  • Start wrangling: Perform data wrangling on the dataset. Only available for dynamic datasets.
  • Start modeling: Set up an experiment using the currently selected dataset version. By default, the latest version of the dataset is used.
  • Start feature discovery: Use Feature Discovery to perform multi-dataset, interaction-based feature creation.
  • Download dataset: Download the dataset locally. Only available for snapshotted datasets.
  • Remove dataset: Remove the dataset from the Use Case. It will no longer be visible on the Data tab; however, it remains available in the Data Registry, and experiments created with the dataset are not affected.
3 | Data Versions actions | Under Dataset Versions, click More options to perform one of the following actions on a specific snapshot dataset:
  • Start modeling: Set up an experiment using this dataset.
  • Download dataset: Download the dataset locally. Only available for snapshotted datasets.
  • Delete: Remove the dataset from Version History; however, it remains available in the Data Registry, and experiments created with the dataset are not affected.

Feature lists

Preview

Support for feature lists in Workbench is on by default.

Feature flags: Enable Data and Feature Lists tabs in Workbench, Enable Feature Lists in Workbench Preview, Enable Workbench Feature List Creation

After adding a dataset to your Use Case, DataRobot generates feature lists as part of EDA. Feature lists control the subset of features that DataRobot uses to build models and make predictions. Each model has a feature list associated with it.

You might want to use feature lists to:

  • Remove features that cannot be used in the model for any reason, for example, a feature that is causing target leakage.
  • Make predictions faster by removing unimportant features (i.e., ones that don't improve the model's performance).

You can use one of the automatically created lists—Informative and Raw—or create a custom feature list.
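
Conceptually, a feature list is a named subset of a dataset's columns. The sketch below shows that idea in plain Python, including deriving a custom list that drops a leaking feature. The names `FeatureList`, `apply_feature_list`, and `without_features`, as well as the example column names, are hypothetical and are not part of any DataRobot API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureList:
    name: str
    features: tuple
    description: str = ""  # optional, as when creating a list in the UI


def apply_feature_list(rows, feature_list):
    """Keep only the columns named in the feature list."""
    return [
        {f: row[f] for f in feature_list.features if f in row}
        for row in rows
    ]


def without_features(base, excluded, name, description=""):
    """Derive a custom list from an existing one by dropping unwanted features."""
    kept = tuple(f for f in base.features if f not in excluded)
    return FeatureList(name, kept, description)


# Example: drop a column that would leak the target into the model.
raw = FeatureList("Raw", ("age", "plan", "churn_date", "churned"))
no_leakage = without_features(raw, {"churn_date"}, "No leakage")
```

Keeping the list separate from the data means the same dataset can be viewed or modeled with different subsets without copying or altering the underlying columns.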

View feature lists

Before setting up an experiment, use exploratory data insights to compare feature lists and choose the appropriate one for modeling.

To explore insights for a feature list:

  1. In the Data tab, click the More options icon next to the dataset you want to view and select Explore. Alternatively, click the dataset name.

  2. To access your feature lists, click the dropdown at the top of the page and select an available feature list. The preview updates to show only the features in the selected list.

Create a feature list

To create a custom feature list:

  1. While exploring a dataset, click the dropdown at the top of the page and select + New feature list. This opens the Features view.

  2. Select the box next to each feature you want to include in your custom list. Then, click Create feature list.

  3. Enter a name and, optionally, a description for the new feature list.

  4. Click Save changes. You can now access the new feature list in the dropdown.

Updated May 16, 2024