Data

The IsoGenie Database (IsoGenieDB)

Overview

Because of the nature of the data collected at Stordalen Mire, a standard one-size-fits-all database system is insufficient. The data is not only large but also highly heterogeneous, including (but not limited to) soil organic matter chemical spectra, DNA and RNA sequencing data, metaproteomic data, geochemistry, climate, and vegetation surveys. IsoGenie postdoctoral fellow Dr. Benjamin Bolduc therefore led the effort to design a database management system, the IsoGenieDB, which allows easy storage of and access to data collected by members of the IsoGenie project.

Use and Capacity

The IsoGenieDB allows data to be connected within and between different data types, a structure that is fundamental to environmental systems. Given this “meta”-level awareness, data within the IsoGenieDB can be queried on numerous properties, e.g., retrieving organic matter metabolite data associated with metatranscriptomic samples that show a correlation between recent rainfall and pH. The results of these analyses can reveal underlying or overarching patterns of interaction that often remain invisible within ecosystems because the relationships among data types are incompletely integrated.
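As a rough illustration of what a query across linked data types can look like, the sketch below uses plain Python rather than the IsoGenieDB itself. It retrieves metabolite files only for samples that also have a metatranscriptome and meet a simple pH threshold, which is a much simpler condition than the rainfall and pH correlation mentioned above. The sample IDs, field names, file names, and values are all hypothetical and made up for illustration; they do not reflect the actual IsoGenieDB schema or any real measurements.

```python
# Hypothetical, made-up records; NOT the IsoGenieDB schema or real data.
samples = {
    "S1": {"site": "bog", "pH": 4.1},
    "S2": {"site": "fen", "pH": 5.8},
    "S3": {"site": "fen", "pH": 6.0},
}

# Each measurement points back to the sample it was derived from.
measurements = [
    {"sample": "S1", "type": "metatranscriptome", "file": "s1_rna.fastq"},
    {"sample": "S1", "type": "metabolites", "file": "s1_om.csv"},
    {"sample": "S2", "type": "metatranscriptome", "file": "s2_rna.fastq"},
    {"sample": "S2", "type": "metabolites", "file": "s2_om.csv"},
    {"sample": "S3", "type": "metabolites", "file": "s3_om.csv"},
]


def linked_metabolites(samples, measurements, min_ph):
    """Return metabolite files for samples that also have a
    metatranscriptome and whose pH is at least min_ph."""
    # Samples that have at least one metatranscriptomic measurement.
    with_rna = {
        m["sample"] for m in measurements if m["type"] == "metatranscriptome"
    }
    return sorted(
        m["file"]
        for m in measurements
        if m["type"] == "metabolites"
        and m["sample"] in with_rna
        and samples[m["sample"]]["pH"] >= min_ph
    )


result = linked_metabolites(samples, measurements, min_ph=5.0)
print(result)
```

The point of the sketch is only that the filter spans two data types joined through a shared sample; in a graph database the same link would be expressed as relationships between nodes rather than as a shared key.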

Structure and Design

The IsoGenieDB is a full software stack that provides storage of raw and processed data, an efficient means of querying the data, and a web-based user interface for private and public dissemination of collected data. It consists of a Neo4j-powered graph database that uses a property graph model to represent and store data generated by the IsoGenie consortium. The flexible structure and property model design of graph databases allow new data types to be included without a priori knowledge of future data.
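To make the property graph idea concrete, here is a minimal in-memory sketch in plain Python. It is not Neo4j and not the IsoGenieDB implementation; the class `PropertyGraph`, the labels (`Core`, `Metatranscriptome`, `VegetationSurvey`), the relationship type `HAS_DATA`, and all property names and values are hypothetical. It only illustrates why a new data type can be added later without redefining a schema: nodes and relationships carry free-form labels and properties.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory property graph: labeled nodes and typed
    relationships, each carrying free-form properties."""

    def __init__(self):
        self.nodes = {}                 # node_id -> {"label", "props"}
        self.edges = defaultdict(list)  # src -> [(rel_type, dst, props)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, "props": props}

    def add_edge(self, src, rel_type, dst, **props):
        self.edges[src].append((rel_type, dst, props))

    def neighbors(self, src, rel_type=None):
        """Return IDs of nodes reachable from src, optionally
        restricted to one relationship type."""
        return [
            dst
            for rel, dst, _ in self.edges[src]
            if rel_type is None or rel == rel_type
        ]


g = PropertyGraph()
g.add_node("core1", "Core", depth_cm=10)           # hypothetical values
g.add_node("rna1", "Metatranscriptome", reads=1000)
g.add_edge("core1", "HAS_DATA", "rna1")

# A data type nobody planned for can be attached later:
# no table definition or migration is needed, just a new label.
g.add_node("veg1", "VegetationSurvey", cover_percent=40)
g.add_edge("core1", "HAS_DATA", "veg1")

labels = sorted(g.nodes[n]["label"] for n in g.neighbors("core1", "HAS_DATA"))
print(labels)
```

In a real graph database the same structure would be stored and queried by the database engine; the sketch only mirrors the data model.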