The integration of data from two or more domains is required for addressing many fundamental scientific questions and understanding how to mitigate challenges impacting humanity and our planet, according to a new workshop report from the American Institute of Biological Sciences (AIBS). The publication identifies key barriers to complex data integration and offers recommendations for the research community, research funding organizations, and others.

The workshop, which was held in March 2015 in Arlington, Virginia, brought together more than two dozen experts in genetics, genomics and metagenomics, biology, systematics, taxonomy, ecology, bio- and eco-informatics, and cyberinfrastructure development. The workshop report summarizes use cases that highlight barriers and solutions to complex data integration; impediments, technical problems, and crosscutting issues related to integrating data; and recommendations and next steps required to achieve better data integration.

"There are incredibly important but terribly complex questions in the areas of human health and environmental sustainability for which society needs answers. We need to better understand how humans or agriculturally important species respond at the genetic level to environmental stresses, for example. To do this, researchers need to be able to find, access, and integrate data ranging from genetic to environmental," said Dr. Robert Gropp, Interim Co-Executive Director of AIBS and a workshop organizer.

Dr. Paula Mabee of the University of South Dakota, a workshop co-chair, agrees. "This meeting was unique in that we were able to bring together such a diverse group. It is informative that the challenges surfaced from different fields are often quite similar."

"One of our common challenges is learning to talk to each other. We often use the same words, but they might have different meanings depending upon whether one is a biologist or a computer scientist. Developing the capacity or the cadre of professionals who can translate is important," said Dr. Corinna Gries, a workshop co-chair and Lead Information Manager at the Center for Limnology at the University of Wisconsin.

The report includes recommendations related to governance, education and training, data discovery and access, evaluation of data for fitness of use, and the process for data integration.

The full report, "Enhancing Complex Data Integration across Research Domains: A Workshop Report," is available from AIBS at www.aibs.org/public-policy/complex_data.html.

"AIBS looks forward to working with our members and partners to explore how we can advance the recommendations from this important meeting," said Gropp.

The U.S. National Science Foundation (NSF) provided funding for the workshop (EF-1450894), and the NSF-funded Phenotype RCN (DEB-0956049) supported the participation of several workshop participants.

With support from the National Science Foundation, the American Institute of Biological Sciences (AIBS) will host a meeting in March 2015 with the goal of identifying technical barriers and problems in integrating large, complex data sets (from genotypes to phenotypes to ecosystems and macrosystems) that could address the "grand challenges" in science.

The outcome of the meeting will be a workshop report that identifies: 

  • major research questions that could be addressed by integrating complex data sets, 
  • barriers and technical problems and their causes related to integrating data,
  • suggestions for addressing the barriers and technical problems, and 
  • next steps for continuing to address the barriers and to otherwise achieve better data integration.

As meeting co-chairs, we would like to gather opinions from across the spectrum of science related to this meeting and its goal. Please take a few moments to add your responses to the questions below. It would be most helpful if you identified yourself in your comments. All of the comments will be presented and discussed as part of the meeting.

Thank you for your time.

Sincerely,

Corinna Gries
University of Wisconsin

Robert Gropp
American Institute of Biological Sciences

Paula Mabee
University of South Dakota

For additional information about complex data and integration see:

Frontiers in Ecology and the Environment, Volume 12, Issue 1 (February 2014)

Phenomics: Genotype to Phenotype (a report of the Phenomics workshop sponsored by the USDA and NSF, 2011)


There are five (5) questions listed for your response. To begin entering your responses, click a question below (or select one from the menu in the top right corner of this page). If desired, you can review others' comments (below) before creating your response. To create your response to each question, please:

  1. Enter the contact information requested. NOTE: You only have to enter this once; check the Remember contact info box to reuse the information each time you answer a question.
  2. Enter your comment. 
  3. Enter the captcha characters. (This prevents the responses from being littered with spam.)
  4. Click submit. 
  5. If the captcha characters were not accurate, you will be asked to re-enter them. 
  6. If your response was accepted, you can move to the next question either by (a) selecting Return to the Original Entry and choosing the next question from the right-hand menu, or (b) selecting Return to the Main Page and choosing any question from the right-hand menu.


Our ultimate goal is to understand relationships between phenology (as a measure of biological activity) and physical variables, particularly meteorological and climatological variables. This will require data integration and the development of models that capture not only the climatological drivers of phenology, for example, but also the feedbacks of phenology on weather and climate (e.g., latent heat flux). Unlike much prior synthetic work on biodiversity, we are working at the daily time step, with a focus on seasonal and intraseasonal variation at the organismal level, so we need access to datasets with high spatial and temporal resolution covering roughly two centuries up to the present.



To develop real-time or short-term forecasts of phenology (e.g., based on phenological models), we need real-time access (e.g., via web services) to current and near-term environmental conditions at high spatiotemporal resolution. Daily weather data and numerical weather forecasts are generally available, but we also need 3-month projections of "climate," with estimates of uncertainty, at scales of days and kilometers.



Cultural issues are likely the most important; technological issues remain, but they are typically constrained by culture (e.g., history, control, and PIs' desire to understand how their data will be applied).


The issue here has more to do with the integration of data than with its use. Integration should include terms of use (e.g., allowed uses, attribution), standardized or cross-walked metadata, and the development and publication of ontologies. These must be in place up front.