Performing a Dataset Integration: Overview
WILL NOT BE IN FINAL VERSION(This is my concept outline)
MODIFY MUCH OF TEXT THAT IS TAKEN FROM Spielmann & Kintigh 2010 & Manney 2010
Overview of Integration:
What is it - Conceptually
- Why you would want to do it
- What has to be true about the datasets for them to be integrated
## Coding sheets (conceptual description)
## Linkages to ontologies (conceptual description)
- Selection Tables to integrate
- Columns
- Filtering columns
- The resulting outputs
You need at least two datasets
- Finding an existing dataset
- Archiving a new dataset
- Special Metadata Considerations for Datasets that will be integrated
## General Metadata
## Mapping Dataset Columns
## Ontologies
### Creating
### Finding
### Your Workspace
#### Bookmark your datasets
#### Selecting Datasets in your workspace
#### Selecting Tables to integrate
#### The Excel Spreadsheet output
What is Data Integration?

In addition to serving as a secure digital archive, tDAR allows users to integrate digital datasets: disparate datasets are combined and normalized into a single cohesive dataset with analytically comparable observations. Rather than simply synthesizing the conclusions of many separate analyses, tDAR allows a user to create integrated datasets of original observations (Spielmann and Kintigh 2010). This type of analysis lets a user focus on questions at larger spatial and temporal scales (examples of synthetic research done with tDAR datasets).
In order to perform an integration some initial considerations must be made.
- Identify & Bookmark Datasets you want to Integrate
- Create Metadata and Normalize Datasets with coding sheets, mapping, and ontologies
- Decide Which Variables/Columns you want to Integrate, working from tDAR's "My Workspace"
Then, from your workspace, you are ready to begin the integration process!
Initial Steps
Click on blue text in this document to open a new browser tab with additional information on that topic.
Archiving Databases in tDAR: tDAR accepts a wide range of digital archaeological data. In order to contribute digital data to tDAR, users must first register and agree to the terms of a user agreement. In tDAR, digital documents or data are registered either as independent digital resources (for example, a single methodological article) or as part of a suite of information resources associated with a single project such as a single excavation or a survey. The digital information resources resulting from a project might include any number of separate datasets, documents, and image files resulting from the fieldwork.
In registering a project or information resource, the contributor provides the archival and descriptive information (metadata) that will permit its long-term preservation and scientific use. To simplify metadata entry, the project-level metadata (e.g., sponsor, location, culture) can be applied to (inherited by) all of the project’s component information resources (databases, documents, etc.). However, you may choose to inherit all, some, or none of the metadata values. Within each project, individual information resources are described with additional metadata specific to that digital object, so that it can be discovered by a search and properly preserved and used in the very long term.
As we discuss below, integration of multiple datasets is necessary to address larger-scale research, and detailed metadata make it possible for observations to be made comparable across databases. For databases (and spreadsheets), the metadata includes information on the individual tables, and columns along with the coding sheets that provide the semantic labels for encoded values. For example, a column labeled “Taxon” encodes information on a bone’s taxonomic assignment and in that column the database value 101 may represent “Lepus”. A translation function in tDAR creates a dataset with both the value labels and the original numeric codes.
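The translation step can be pictured with a short Python sketch. This is only an illustration of the idea, not tDAR's implementation: the function name `translate_column`, the "UNMAPPED" placeholder, and every code other than 101 = “Lepus” are assumptions made up for the example.

```python
# Hypothetical coding sheet for a "Taxon" column: numeric code -> label.
# Only the pair 101 -> "Lepus" comes from the text; the others are invented.
TAXON_CODING_SHEET = {101: "Lepus", 102: "Sylvilagus", 201: "Antilocapra"}


def translate_column(rows, column, coding_sheet):
    """Return copies of rows with a '<column>_label' field added.

    The original numeric code is kept, so both the code and its label
    are available in the translated dataset.
    """
    translated = []
    for row in rows:
        new_row = dict(row)  # copy so the original record is not modified
        code = row[column]
        # Codes missing from the coding sheet are flagged, not dropped.
        new_row[f"{column}_label"] = coding_sheet.get(code, "UNMAPPED")
        translated.append(new_row)
    return translated


records = [
    {"Bone": 1, "Taxon": 101},
    {"Bone": 2, "Taxon": 201},
    {"Bone": 3, "Taxon": 999},  # a code with no coding-sheet entry
]
translated = translate_column(records, "Taxon", TAXON_CODING_SHEET)
for row in translated:
    print(row)
```

Keeping the numeric code next to its label mirrors the point made above: the translated dataset carries both the value labels and the original codes.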
Identify & Bookmark Datasets you want to Integrate:
For information on how to identify and bookmark datasets you would like to integrate click on the links below.
Searching for a Dataset to Integrate
General Metadata Considerations:
FILL OUT THIS SECTION FROM TUTORIAL / HELP THAT YOU HAVE CREATED
Mapping Dataset Columns:
FILL OUT THIS SECTION FROM TUTORIAL / HELP THAT YOU HAVE CREATED
Developing Ontologies: We recognize that there is significant variation in how researchers code archaeological data. Our goal is not to standardize what individual analysts do, but instead to make it possible to integrate their data with those of others using a shared conceptual framework for analysis. In tDAR we feel that it is essential to maintain the data as they were originally encoded, along with the associated coding keys. To accomplish that goal while enabling integration, we employ “ontologies.” In tDAR, ontologies are hierarchically organized maps of concepts. In faunal analysis for example, a variable “burning” might have been recorded by one analyst as present/absent, while another may have subdivided “present” into charred, burned, and calcined. If communities of researchers agree upon a shared ontology, for example for the variable burning such that charred and calcined are subcategories of burned, the data integration tools of tDAR allow an analyst to map the individual translated codes in their databases (e.g., calcined) to ontology values (in this case, burned) used by other analysts. The result is that variables that are recorded differently in different databases can be integrated because the original encodings are mapped to shared values. The process of developing general ontologies involves a community of users moving toward a consensus on a framework that can be shared.
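As a rough illustration of the burning example, the Python sketch below represents an ontology as a child-to-parent table and maps two differently coded datasets onto it. The dataset names, the "Not Burned" node, and the function `map_observation` are hypothetical; tDAR's own data structures are not described in this document.

```python
# Hypothetical ontology for the variable "burning": each node maps to its
# parent, and None marks a top-level concept. "Not Burned" is an assumed node.
BURNING_ONTOLOGY = {
    "Not Burned": None,
    "Burned": None,
    "Charred": "Burned",
    "Calcined": "Burned",
}

# Hypothetical mappings from each dataset's translated code labels to
# ontology nodes. Dataset A recorded presence/absence only; dataset B
# subdivided burning into degrees.
MAPPINGS = {
    "dataset_a": {"present": "Burned", "absent": "Not Burned"},
    "dataset_b": {
        "charred": "Charred",
        "burned": "Burned",
        "calcined": "Calcined",
        "unburned": "Not Burned",
    },
}


def map_observation(dataset, original_label):
    """Attach an ontology node to an observation, keeping the original label."""
    node = MAPPINGS[dataset][original_label]
    if node not in BURNING_ONTOLOGY:
        raise ValueError(f"{node!r} is not a node in the ontology")
    return {"dataset": dataset, "original": original_label, "ontology_node": node}


mapped = [
    map_observation("dataset_a", "present"),
    map_observation("dataset_b", "calcined"),
]
for observation in mapped:
    print(observation)
```

The original encodings are never overwritten: each observation keeps its own label and gains a shared ontology value alongside it.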
How to create an ontology (hyperlink here)
Mapping to Ontologies and Data Integration: Data integration in tDAR requires that the variables of interest have been mapped to the shared ontologies for those variables. Ideally, the original analyst or the person uploading the dataset performs these mappings; however, any tDAR user can create the mappings themselves. Datasets (one’s own or those developed by others) are then moved into the user’s workspace, and those to be integrated are identified.
The tDAR faunal integration tool allows the analyst to choose the variables that are to be integrated, as well as the level at which integration is to take place. For example, while two datasets may have specific degrees of burning intensity coded (e.g., charred, burned, calcined), the analyst may only be interested in the presence or absence of burning. In that case, as illustrated in Figure 2, selecting “burned” would include all those cases coded to a more specific “burned” value. Likewise, if an analyst were interested in comparing artiodactyls and lagomorphs, she could choose only those taxonomic values. Cases coded to more specific taxonomic levels under artiodactyl or lagomorph (e.g., Antilocapra sp. or Lepus sp.) would aggregate up.
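The "aggregate up" behavior can be sketched in Python as a walk from each observation's ontology node toward the root until a selected node is reached. This is a minimal sketch under assumed names (`PARENT`, `roll_up`, `integrate`) and an invented taxonomy fragment, not a description of how the tDAR tool is implemented.

```python
from collections import Counter

# Hypothetical fragment of a taxonomic ontology: node -> parent node.
PARENT = {
    "Artiodactyla": None,
    "Antilocapra sp.": "Artiodactyla",
    "Odocoileus sp.": "Artiodactyla",
    "Lagomorpha": None,
    "Lepus sp.": "Lagomorpha",
    "Sylvilagus sp.": "Lagomorpha",
    "Carnivora": None,
    "Canis sp.": "Carnivora",
}


def roll_up(node, selected):
    """Return the nearest selected node at or above `node`, or None."""
    current = node
    while current is not None:
        if current in selected:
            return current
        current = PARENT[current]
    return None


def integrate(observations, selected):
    """Count observations under each selected node, skipping all others."""
    counts = Counter()
    for node in observations:
        target = roll_up(node, selected)
        if target is not None:
            counts[target] += 1
    return counts


observations = [
    "Lepus sp.", "Antilocapra sp.", "Sylvilagus sp.",
    "Canis sp.", "Artiodactyla", "Lepus sp.",
]
counts = integrate(observations, {"Artiodactyla", "Lagomorpha"})
print(dict(counts))
```

In this sketch, cases coded at a more specific level count toward the selected ancestor, and cases outside the selected branches (here, the Canis sp. record) are left out of the result.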
How to map an ontology (hyperlink here)
The output from the integration can be exported as an Excel file in which each dataset is a separate sheet. These spreadsheets, or a combined spreadsheet, can then be analyzed or uploaded into a statistical package for further analysis. We are currently working with our computer scientist partners to streamline ontology mapping and to make it possible to re-run previously used integration run streams.
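For readers who want a single combined table, the sketch below shows one way to stack per-dataset sheets with Python's standard library once their rows have been read into memory. The sheet names, column names, and the added `source_dataset` column are assumptions for illustration; reading the Excel file itself is left out because it depends on the spreadsheet library you use.

```python
import csv
import io


def combine_sheets(sheets):
    """Stack rows from every sheet, tagging each row with its source sheet."""
    combined = []
    for name, rows in sheets.items():
        for row in rows:
            combined.append({"source_dataset": name, **row})
    return combined


def to_csv(rows):
    """Serialize rows to CSV text, using the union of all column names."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval fills in columns that a given sheet does not have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical integration output: one list of rows per dataset sheet.
sheets = {
    "Site A fauna": [{"Taxon": "Lagomorpha", "Burning": "Burned"}],
    "Site B fauna": [{"Taxon": "Artiodactyla", "Burning": "Not Burned"}],
}
combined = combine_sheets(sheets)
csv_text = to_csv(combined)
print(csv_text)
```

Tagging each row with its source keeps the combined table traceable back to the dataset it came from, which matters when the integrated observations are analyzed together.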
Summary
The integration tool in tDAR allows a user to integrate two or more databases or spreadsheets. To perform a data integration you must be a registered tDAR user and be logged in. To learn how to register as a tDAR user, or to register now, click on one of the links below.
Datasets to be integrated must exist in tDAR and must be bookmarked. If one or more of the datasets has not yet been uploaded to tDAR, upload it first and then bookmark it. For more detailed information on how to create a new dataset, find a dataset, or bookmark an existing dataset, please click on one of the following links:
Note: if you create new datasets, make sure that you bookmark them after you have submitted them to tDAR.
References