Click on the headings below to read about the recent progress on all of the MASC-related projects and resources.
The Arabidopsis community has always been very open, so today researchers and funding bodies can look back on more than 20 years of strong international collaboration and data sharing. The efforts of the Arabidopsis community have always been guided by decadal plans, which in turn led to the establishment of many Arabidopsis community projects and resources:
- The Arabidopsis genome research project (1990-2001) led to the completion of the Arabidopsis genome. During this decade two of the three stock and resource centers, ABRC (Arabidopsis Biological Resource Center, US) and NASC (Nottingham Arabidopsis Stock Center, UK), were founded.
- The Multinational Coordinated Arabidopsis thaliana Functional Genomics Project (2002-2011) led to the functional annotation of most of the Arabidopsis thaliana genes. In parallel, The Arabidopsis Information Resource (TAIR) was founded in 2001 to meet the needs of the growing Arabidopsis research community.
- From Bench to Bountiful Harvests (2012-2021) aims to obtain in-depth knowledge of how the genome is translated into a continuum of processes, from the single molecule to cells and tissues, the whole plant, plant populations, and fields of plants, to be able to build a predictive model of an Arabidopsis plant. To provide a flexible platform for open sharing of the vast amount of data generated by today’s omics approaches, the International Arabidopsis Informatics Consortium (IAIC) founded the Arabidopsis Information Portal (Araport) in 2013.
The directors of Arabidopsis community projects and resources have been contributing to the MASC reports for several years, by presenting their respective goals, progress and news. Since 2014, general plant projects and resources have also been included, reflecting the growing connections between researchers focussing on different plant species.
Resource and Stock Centers
Arabidopsis Biological Resource Center (ABRC)
By Erich Grotewold (Director) and Jelena Brkljacic (Associate Director), www.abrc.osu.edu.
The community continues to donate novel types of resources, along with some previously published, highly requested mutant and transgenic stocks. Genome editing resources are among the most highly requested new resources and comprise multiple sets of CRISPR/Cas9 vectors suitable for multiplexed genome editing in monocots and dicots, including those with egg cell-specific promoters to facilitate generation of homozygous mutants. Traffic lines, a set of lines used as a tool to identify homozygous mutants, have also recently been added to the collection. ABRC continues to solicit donations of genome editing and other novel and high-throughput resources, and donating is encouraged through the Stocks for Stocks Rewards program and by providing paid shipping to those who donate. A new policy allows us to accept donations of DNA and seed stocks from other organisms, depending on available space, growth requirements and other criteria. We continue to perform rigorous testing on newly donated and reproduced stocks, as well as on the stocks that users report as problematic. Our latest quality control statistics can be found at http://abrc.osu.edu/quality-control. In collaboration with the Nottingham Arabidopsis Stock Centre (NASC), ABRC is working on the development of a robust, expandable and fully sustainable Stock Center Module that will be integrated with the Arabidopsis Information Portal (Araport). We distributed 178,000 samples (including sets) in 2015/2016, with over 2,000 stocks sent as part of our education kits.
We are happy to announce that our new custom-made robotic system (Seed Aliquoting Machine, SAM) was able to dispense more than 250,000 seed stock samples in the first year. This enabled the preparation of eight large sets of T-DNA insertion lines, the distribution of which has increased 2.5-fold compared with the same period last year.
The Center is introducing a new category of stocks to the collection - antibodies and their corresponding antigens. Current inventory includes about 100 different antibodies, most of which were generated against Arabidopsis proteins, while the rest includes antibodies generated against antigens of other plant species or plant viruses. Ordering will be available starting in the summer of 2016.
ABRC is supported by the NSF for the 25th consecutive year. Our newest NSF grant will support the activities of the Center from 2016 to 2019.
Nottingham Arabidopsis Stock Center (uNASC)
By Sean May, Director & Marcos Castellanos-Uribe, Operations Manager, http://www.arabidopsis.info.
Ordering statistics at NASC continue to be healthy and high for our seed service, with well over 140,000 individual stocks sent this year (April 2015-16). Increased distribution statistics alongside regular new donations of seed show that the Arabidopsis community continues to flourish and expand. Please do remember that we can save you time and effort and strengthen your research impact by distributing seed on your behalf to the wider plant community.
Please see our site for a comprehensive and up-to-date list of new stocks, collections and lines (as always) and consider viewing @NascArabidopsis (http://twitter.com/#!/NASCArabidopsis).
RIKEN BioResource Center (BRC)
By Masatomo Kobayashi (coordinator), http://epd.brc.riken.jp/en/.
Since its establishment in 2001, the RIKEN BioResource Center (RIKEN BRC) has provided plant, animal, human and microbial resources to the international research community. Our aim is to promote the life sciences and thereby contribute to solving problems of food, health and the environment, which are urgent global issues for humanity. Since 2002, we have participated in the National BioResource Project (NBRP) funded by the Japanese Government (http://www.nbrp.jp/index.jsp). The Experimental Plant Division has been selected by NBRP as the Core Facility for Arabidopsis/Cultured plant cells and genes, and is responsible for distributing plant resources from RIKEN BRC. Approximately 830,000 materials are preserved, and they have been distributed to ca. 1,900 laboratories and research groups in 45 countries.
We focus our efforts on collecting, preserving and distributing Arabidopsis resources that have been established by Japanese researchers. Among them, the RIKEN Arabidopsis full-length cDNA (RAFL) clones are the best-known resource. The collection totals ca. 250,000 clones, of which ca. 20,000 have been fully sequenced. Another well-known resource is the collection of RIKEN Transposon-tagged Mutant (RATM) lines. More than 17,000 lines are in our catalogue, and homozygous seed stock is available for ca. 3,000 lines. Both RAFL clones and RATM lines are especially useful for reverse genetics approaches. For forward genetics approaches, we provide seed pools constructed from various types of Arabidopsis seed lines for screening purposes. They include activation (T-DNA)-tagged lines, FOX hunting lines, RATM lines and natural accessions. We also provide Arabidopsis T87 cells and At wt cells, which are useful for various research purposes. In 2015, we started distributing the TAC clone library deposited by the Kazusa DNA Research Institute.
In order to establish and support the pipeline “from bench to bountiful harvests”, RIKEN BRC also provides model plant resources such as Tobacco BY-2 cells, Rice Oc cells, and full-length cDNA clones of various plant species including Brachypodium distachyon, a monocot model plant. DNA materials are shipped within 2 weeks of the arrival of the ordering documents, while shipment of plant cells abroad requires additional time to prepare a culture that tolerates transportation and to complete the documentation necessary for customs clearance.
We believe that quality control is the most important issue for a resource project. Before shipment, the end sequence of every DNA material is obtained and compared with the data in our database. The insertion site of the Ds transposable element in each RATM line is confirmed by PCR. The results are provided to the recipients before or at the time of shipment. Other information necessary to utilize the resources is provided via our website and/or e-mail.
Any questions and comments from the community are appreciated.
Arabidopsis Informatics and Data Sharing Resources
International Arabidopsis Informatics Consortium (IAIC)
By Blake C. Meyers (Interim Director) and Joanna D. Friesner (Assistant), http://www.arabidopsisinformatics.org/.
The IAIC was initiated by Arabidopsis community members in 2009 and formally established in 2011 via a US National Science Foundation RCN award (Award #1062348) to PI Blake Meyers (Danforth Center). The RCN award is set to conclude this year (2016); however, mechanisms for sustaining IAIC are currently being explored. Key community coordination overlap exists between the IAIC steering committee and the steering committee of a recent NSF award to Siobhan Brady (PI) and Joanna Friesner (co-PI) entitled ‘RCN Arabidopsis Research and Training for the 21st century (ART-21)’; NSF Award #1518280 (http://www.nsf.gov/awardsearch/showAward?AWD_ID=1518280). The ART-21 steering committee consists of the 8 member North American Arabidopsis Steering Committee (NAASC), key IAIC liaisons (Blake Meyers and Nick Provart), and Terri Long (USA), Jim Murray (UK), and Ute Kraemer (Germany).
The purpose of establishing the IAIC was to facilitate a coordinated global Arabidopsis informatics effort to maintain the continuity of key Arabidopsis resources while simultaneously expanding their breadth and depth. Key aims were to include in the IAIC new technologies, resources, and participants on a global scale and advance plant biology while creating novel opportunities for research and education, and strengthening international collaborations. Arabidopsis community members, led largely by elected NAASC members, have participated in all stages of Consortium development and activities. In addition, a Scientific Advisory Board was elected by the Multinational Arabidopsis Steering Committee (MASC) and community participation was solicited in ten workshops including several organized specifically to enable IAIC design and development, and others at public conferences such as the annual International Conference on Arabidopsis Research (ICAR) and the Plant and Animal Genomes (PAG) meetings.
Two key project goals that have been achieved were (1) to facilitate a collaborative effort to establish and fund a new web-based “Arabidopsis Information Portal” for the global plant biology community (now entitled ‘Araport’, see NSF Award #1262414: ABI Development: The Arabidopsis Information Portal, http://www.nsf.gov/awardsearch/showAward?AWD_ID=1262414) and (2) to develop a public platform via an IAIC web page to coordinate activities and serve as a community resource (http://www.arabidopsisinformatics.org/).
The IAIC jointly sponsored a community workshop (with Araport) at the Plant and Animal Genomes (PAG) meeting in San Diego, California, in January 2016. On behalf of the IAIC and NAASC, Joanna Friesner gave a presentation entitled “Community Collaborations: Advancing Arabidopsis Research and Training (ART-21) and the International Arabidopsis Informatics Consortium (IAIC)”. The publicly available presentation can be downloaded and viewed at: http://bit.ly/1QfMh4V.
Additional presentations on Araport were given by Araport staff, including “A Tour of the Arabidopsis Information Portal” (Agnes P Chan, J. Craig Venter Institute) and “Module Development for Araport” (Jason R. Miller, J. Craig Venter Institute).
NAASC recently received a 5-year NSF award for a collaborative project entitled ‘Arabidopsis Research and Training for the 21st Century (ART-21)’, mentioned above. This coordinated program has three core objectives: (1) identify emerging technologies where using Arabidopsis as a model organism will provide fundamental discoveries and enable translational research in crop species; (2) enhance interdisciplinary training of scientists for academia and extra-academic careers; and (3) increase the diversity of Arabidopsis research scientists. The IAIC project has overlapping steering committee members with ART-21 and intends to partner with NAASC to expand its outreach to community members to enable analysis of future training needs and emerging bioinformatic and computational skills.
The proposed collaboration includes several activities:
(a) May 2016: A NAASC and IAIC co-organized Focus Group on “Computational training of biologists for academia and industry in the 21st Century” led by Blake Meyers, Nick Provart, Siobhan Brady and Joanna Friesner. The Focus Group will include 35 participants discussing these over-arching questions: (1) What are the bioinformatics and computational skills needed by plant scientists of the 21st century to deal with more complex datasets (predictive, quantitative and theory-driven)? (2) What are the bottlenecks to providing students with the needed skills? (3) What do employers (of various types) need/want from employees; what are marketable skills in this area? Key topics include: (i) Training and Education: Skills needed for positions: Industry Positions; Faculty Positions; Undergraduate, Graduate and Postdoc Education; (ii) Collaborations: Working with a biologist: a quantitative expert’s perspective; Working with a quantitative expert: a biologist’s perspective; Retraining: Yourself; From a funding perspective; (iii) Training Arabidopsis Biologists for High-Throughput Phenotyping; and (iv) Translating from Arabidopsis to Crop Species, and Vice versa. A workshop white paper, led by IAIC members Blake Meyers and Nick Provart, will be produced and include recommendations and analysis.
(b) Another future IAIC goal is to incorporate the outcomes from the activity described above into a hands-on workshop prior to ICAR 2017, envisioned to span 4-5 days and encompass both wet-lab and computational and bioinformatic analysis and techniques. ICAR 2017 will be organized by NAASC in St. Louis, USA, June 19-23, 2017.
Conferences and Workshops
- Plant and Animal Genomes (PAG) meetings: IAIC presentation: January 2016: San Diego. Presentation available at: http://bit.ly/1QfMh4V
- International Conference on Arabidopsis Research (ICAR): July 2015 (Paris)
The Arabidopsis Information Portal (Araport)
By Chris Town (Principal Investigator), www.araport.org.
The Araport team extended its fully functional web portal by adding many data types to its ThaleMine data mining tool and many tracks to its JBrowse browser. The team delivered web site infrastructure that, even in its prototype stage, allows community participants to develop and deploy their own web services and data integration applications.
We have also completed the most up-to-date and complete re-annotation of the Col-0 genome to produce Araport11, which consists of 37,523 genes (27,688 protein-coding, 5,051 non-coding, 952 pseudogenic, and 3,901 transposable element-related loci) and 58,149 transcripts. The annotation contains 738 new protein-coding loci and a further 508 novel transcribed loci. In addition, we retired 388 genes that encoded short (hypothetical) proteins for which there was no database or RNA-seq support. Araport11 is available on the Araport project site (http://www.araport.org) and will also be released in GenBank by the time this report is published.
JBrowse and ThaleMine continue to be central features of the portal’s user interface
JBrowse now hosts over 100 data tracks, including the latest gene models from Araport11 and their supporting evidence, as well as many community sourced tracks including 1001 genomes SNP data. Methods to allow community members to post and share their data through JBrowse using either GitHub or the CyVerse data store are in active development.
ThaleMine is a data warehouse which hosts and integrates a large collection of Arabidopsis genomics data including gene expression, orthologs, pathways, interactions, publications and others. We have continued to add new content and functionalities to ThaleMine. These include GeneRIFs, together with a portal to NCBI’s submission page that will allow community members to submit their own comments on gene function, and phenotype and stock data with links to ordering from ABRC and eNASC. The most recent addition is an RNA-seq-based expression module that allows users to view expression levels of their favorite genes across the 113 RNA-seq data sets used in the Araport11 re-annotation process.
Science Apps, Web Services and Modules
Despite its technical success and demonstrated ability to assimilate and integrate a wide range of data types, the site sees many fewer visitors than expected. Furthermore, although the attendees at the 2011 Design Workshop were enthusiastic about their vision of a federated data model with many community-contributed modules, their enthusiasm has so far not translated into the level of participation envisaged in the white paper. This is of concern to all of us, including our funders - the US National Science Foundation and the UK Biotechnology and Biological Sciences Research Council. As we develop a proposal for continued funding of the project, we will be pro-actively recruiting major data generators to the project to facilitate assimilation of their data into Araport and demonstrate the value of integration of multiple data types within the portal.
Conferences and Workshops
Project PIs attended the 25th ICAR in Paris in July 2015. In addition to a talk in a Plenary Session, there was a well-attended Araport workshop with contributions both from project PIs and from community contributors. We attended the ASPB meeting in Minneapolis in July 2015, presented a talk in the “Bioinformatics Resources for Plant Biology Research” session, and staffed a booth in the Exhibitor area together with colleagues from other resources. Project staff presented posters and/or talks at the Mid-Atlantic ASPB meeting (April 2015), the University of Maryland Mini-symposium (May 2015), and the Mid-Atlantic Plant Molecular Biology Society Meeting (August 2015). Two team members spent one and a half days at Purdue University in November 2015 giving talks, running a hands-on workshop and holding one-on-one discussions with various faculty members. We organized the IAIC/Araport workshop at PAG, San Diego, in January 2016, which included presentations from project personnel and community members.
Araport gave a talk at the ASPB mid-Atlantic Regional Meeting at Swarthmore in April 2016 and has been invited to give talks at ICAR 2016 in Korea in July and at the “GARNet2016: Innovation in the Plant Sciences” meeting in Wales in September 2016. We also plan a presence at the ASPB meeting in Austin in July 2016.
The Arabidopsis Information Resource (TAIR)
By Tanya Berardini (TAIR curator) and Eva Huala (Director), www.arabidopsis.org.
TAIR has been supported by community contributions in the form of non-profit subscriptions since April 2014. We are grateful for the strong show of support from the worldwide Arabidopsis and plant community and continue to strive to provide data and analysis tools that will help drive scientific discovery in the field of plant biology forward.
Curation and New Features
Approximately 2100 new Arabidopsis gene-related research articles are published each year. TAIR continues to mine this rich trove of new experimental results to provide researchers with continuously updated information about Arabidopsis gene function, expression and mutant phenotypes. Newly published gene function information is manually extracted from experimental results reported in the research literature by TAIR curators and authors who curate their own publications, thereby bringing more visibility to their work. This new experimentally derived knowledge is captured in the form of Gene Ontology (GO) and Plant Ontology (PO) annotations, individually composed gene summaries and phenotype descriptions, and new links between articles and genes, which are added to TAIR on a weekly basis. Within TAIR, the new experimental data are integrated with annotations made by UniProt, the Gene Ontology consortium, the IntAct effort and computational predictions about Arabidopsis genes, proteins, and RNAs to present a comprehensive overview of gene function.
In 2015, we added nearly 600 new gene symbols and full names, created new and updated gene summaries for many genes and integrated new GO and PO annotations for 8835 genes (including experiment-based annotations from almost 770 research articles to 2996 genes). We have also put additional effort into integrating more allele and phenotype information. To increase the visibility of new Arabidopsis research, we have begun featuring a Paper of the Month on our home page and we have also added a browsable list of recently published Arabidopsis papers. In the coming year, we will be integrating gene family data including orthologs from other species to facilitate the translation of Arabidopsis research results to other plants.
Subscriptions and Free Access
TAIR is financially supported by contributions from individual researchers, academic institutions and consortia, companies and country-level subscriptions that together represent 29 countries. As of March 2016, subscribers included: 2 countries (China and Switzerland), 4 academic consortia, 165 individual institutions (list at http://bit.ly/1RPlaeu), and approximately 340 individuals. Corporate subscribers include 4 major agricultural companies and 4 smaller companies. We have provided free access for undergraduate and graduate classes at 13 non-subscribing academic institutions, enabling teachers and students to use TAIR as part of their curriculum.
Outreach: Conferences, Workshops, Social Media
TAIR staff presented posters, gave talks, and were available for one-on-one interactions at exhibit booths at the following meetings: ICAR 2015 (Paris), ASPB 2015 (Minneapolis), and PAG 2016 (San Diego). TAIR curators will be attending both ICAR 2016 in Korea and ASPB 2016 in Austin, TX to spread the word about our continuing efforts to provide up-to-date literature-based functional annotation and analysis tools to the research community.
- The Arabidopsis Information Resource: Making and mining the “gold standard” annotated reference plant genome. Berardini TZ, Reiser L, Li D, Mezheritsky Y, Muller R, Strait E and Huala E (2015) genesis 53: 474-85. DOI: 10.1002/dvg.22877
- Sustainable funding for biocuration: The Arabidopsis Information Resource (TAIR) as a case study of a subscription-based funding model. Reiser L, Berardini TZ, Li D, Muller R, Strait EM, Li Q, Mezheritsky Y, Vetushko A, and Huala E (2016) Database baw018. DOI: 10.1093/database/baw018
Plant Projects and Resources with Strong Participation of the Arabidopsis Community
Bio-Analytic Resource for Plant Biology (BAR)
By Nicholas Provart (Director), http://bar.utoronto.ca.
The Bio-Analytic Resource is a collection of user-friendly web-based tools for working with functional genomics and other data for hypothesis generation and confirmation. Most are designed with the plant (mainly Arabidopsis) researcher in mind. Data sets include:
- 150 million gene expression measurements (75 million from A.th.), plus “expressologs” (homologs showing similar patterns of expression in equivalent tissues) for many genes across 10 species. View expression patterns with our popular eFP Browser.
- 70,944 predicted protein-protein interactions plus 36,306 documented PPIs (rice interologs also available!).
- 29,180 predicted protein tertiary structures and 885 experimentally-determined structures.
- Millions of non-synonymous SNPs from the 1001 Arabidopsis Genomes project, delivered through the MASC Proteomics Subcommittee’s site at 1001proteomes.masc-proteomics.org.
- Documented subcellular localizations for 9.3k proteins, and predicted localizations for most of the Arabidopsis proteome, from the SUBA database at the University of Western Australia.
The BAR released a new, easier to navigate homepage in honour of its 10th birthday! Check it out at http://bar.utoronto.ca/. Thanks for citing papers in which our tools have been published, and for liking us on Facebook at https://www.facebook.com/BioAnalyticResource/. Hopefully this will help with getting funding to support our efforts.
Two new tools were released on the BAR - first, our Expression Angler tool has been updated to easily identify genes with any desired pattern of expression, simply by painting that pattern onto “eFP”-like pictographs of a plant’s anatomy (or you can search with a desired gene of interest). Second, with the set of genes identified in this manner (or your own, by e.g. expression profiling) you can easily explore the promoters of those genes for known motifs from JASPAR, Weirauch et al. 2014, PLACE and other sources, using Cistome. See http://bar.utoronto.ca/ExpressionAngler/ and http://bar.utoronto.ca/cistome.
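The pattern-matching idea behind a tool like Expression Angler can be illustrated with a short sketch. This is not the BAR's implementation: it assumes that similarity to the painted pattern is scored with the Pearson correlation coefficient, and the gene names, tissue order, expression values and the `min_r` cut-off are all hypothetical toy choices made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no pattern to match
    return cov / sqrt(var_x * var_y)


def rank_by_pattern(target, profiles, min_r=0.75):
    """Return (gene, r) pairs with r >= min_r, best match first."""
    scored = [(gene, pearson(target, values)) for gene, values in profiles.items()]
    hits = [(gene, r) for gene, r in scored if r >= min_r]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: expression in (root, leaf, flower, seed).
profiles = {
    "geneA": [10.0, 1.0, 1.0, 1.0],  # root-specific
    "geneB": [1.0, 9.0, 8.0, 1.0],   # leaf and flower
    "geneC": [5.0, 5.0, 5.0, 5.0],   # flat, no tissue preference
    "geneD": [20.0, 3.0, 2.0, 2.5],  # mostly root
}

# The "painted" pattern: high in root, absent elsewhere.
target = [1.0, 0.0, 0.0, 0.0]

hits = rank_by_pattern(target, profiles)
for gene, r in hits:
    print(f"{gene}\t{r:.3f}")
```

With this toy input only the two root-enriched genes pass the cut-off. Correlation is used here because it compares the shape of a profile and ignores its absolute level, which suits a query drawn as a relative pattern.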
We also released high resolution expression data for the Shoot Apical Meristem from Yadav et al. 2014 in the Tissue Specific Data Source of the Arabidopsis eFP Browser. New Lateral Root Initiation data sets from the Gifford, Muday and Bennett groups will be added in the 2nd quarter of 2016 to the Lateral Root Initiation Data Source.
We have added a linkout icon to a Genevestigator-developed resource, “Genevisible”, in the outputs of our eFP Browser. Now you can easily see the expression levels of your gene of interest in the top 10 tissues or perturbations (environmental stresses or mutations). In the latter instance, the top 10 also include cases where your gene exhibits the greatest decrease in expression. Another linkout icon to your gene of interest’s information at Araport.org was also added to the eFP Browser outputs.
The BAR is also pleased to announce the release of three new tools for exploring three gene expression atlases from two plant species and from human: a Moss (Physcomitrella patens) eFP Browser, a Camelina sativa eFP Browser and a Human eFP Browser. The Moss eFP Browser was developed by Joerg Becker’s group at the IGC in Portugal and may be accessed at http://bar.utoronto.ca/efp_physcomitrella/cgi-bin/efpWeb.cgi. The Camelina sativa eFP Browser was developed in conjunction with Isobel Parkin’s group at Agriculture and AgriFood Canada in Saskatoon and may be accessed at http://bar.utoronto.ca/efp_camelina/cgi-bin/efpWeb.cgi. The BAR’s own group created the Human eFP Browser (Patel et al., 2016, PLoS ONE, http://dx.doi.org/10.1371/journal.pone.0150982), and this may be accessed at http://bar.utoronto.ca/efp_human/cgi-bin/efpWeb.cgi.
Conferences and Workshops
The BAR participated in the 2015 ASPB Plant Biology conference in Minneapolis at the Plant Genome Resource Outreach booth. We also conducted user testing for our new “ePlant” app, which explores Arabidopsis data integratively from the kilometre to the nanometre scale, at the 2015 International Conference on Arabidopsis Research - thanks for your feedback!
26 co-authors, 308 references, 17 sections and a brief survey of 54033 Arabidopsis articles ;-) Our review of 50 Years of Arabidopsis Research in honour of the 1st International Conference on Arabidopsis Research in 1965 is out now! See http://onlinelibrary.wiley.com/doi/10.1111/nph.13687/full. The Arabidopsis publications and their citations may be explored interactively at http://bar.utoronto.ca/50YearsOfArabidopsis/.
BrassiBase
By Marcus A. Koch (director), http://brassibase.cos.uni-heidelberg.de/.
BrassiBase is being continuously developed into a comprehensive Brassicaceae knowledge database system. During 2015/2016 a first family-wide species checklist was created. In total, more than 15,000 taxonomic entities (“names” of species, subspecies, etc., including synonyms) have been collected, checked and cross-referenced. We are now in the process of using this current and accurate species checklist as the “backbone” for BrassiBase and linking existing information to it wherever possible.
Furthermore, morphological descriptions of the characters of every genus are now finalized and implemented in an interactive key to the genera. We hope that this will help to identify cultivated and/or collected material more easily, particularly if used in combination with the “Phylogenetic placement tool” implemented in BrassiBase.
We intend to release the third version of BrassiBase during 2016, and we invite and encourage the Arabidopsis community to register with BrassiBase (it’s free) and help improve the system by reporting and contributing results and data, spotting problems, and making suggestions for future releases.
Conferences and Workshops
A BrassiBase workshop was held in Heidelberg in October 2015.
CyVerse
By Parker Antin (principal investigator), Eric Lyons (co-principal investigator), Nirav Merchant (co-principal investigator), Matthew Vaughn (co-principal investigator) and Doreen Ware (co-principal investigator), http://www.cyverse.org/.
CyVerse is one of eight projects funded by the National Science Foundation (NSF) Directorate for Biological Sciences. CyVerse is a dynamic virtual organization led by the University of Arizona to fulfill a broad mission to design, deploy, and expand a national cyberinfrastructure for life sciences research and train scientists in its use. CyVerse partner institutions each contribute an important component to the endeavor: Texas Advanced Computing Center, Cold Spring Harbor Laboratory, and the University of North Carolina at Wilmington.
Developing the Science of the Future
CyVerse fills a niche created by the computing epoch and a rapidly evolving world. Developing solutions to today’s grand scientific challenges means that we must understand how the organisms that contribute to our food, fuels, and ecosystem are shaped by interactions with their environment. CyVerse provides life scientists with powerful computational infrastructure to handle huge datasets and complex analyses, thus enabling data-driven discovery. CyVerse provides access to a comprehensive and cohesive suite of computational resources supporting data management, cloud computing, high-performance computing, high-throughput computing, identity management, and collaboration tools, all built from open source components. CyVerse resources are accessible using multiple methods, including web-accessible applications, command-line-based access and well-described Application Programming Interfaces (APIs) for ease of automation and performing scalable data analysis. The powerful extensible platforms provide data storage, bioinformatics tools, image analyses, cloud services, and more. Answering the need of an era of data science, CyVerse makes broadly applicable computational resources available across the life sciences.
Engaging the Data Science Community
CyVerse was launched in 2008 as the iPlant Collaborative, aiming to serve the plant science research community. From its inception, iPlant quickly grew into a mature organization providing powerful resources and offering scientific and technical support services to researchers nationally and internationally. Now rebranded as CyVerse, the project has expanded its mandate to provide cyberinfrastructure (CI) support across the life sciences. The CyVerse CI architecture and implementation are agnostic with regard to scientific domain and support many different life science disciplines and their associated data types and analyses. CyVerse allows researchers to analyze their growing datasets more efficiently, with greater flexibility, and to address previously difficult or impossible questions. The CyVerse CI permits researchers to deposit and share new data, programmers to easily deploy new tools and analytical workflows, and researchers of all skill levels to easily use and reuse those data and tools. CyVerse has created a robust, widely used, and evolving CI that is profoundly impacting life sciences and bioinformatics. CyVerse also provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams.
Creating Global Collaborations
CyVerse envisions a future where all biologists have access to, are able to use, and know how to extend CI to solve problems and advance scientific discovery in research and apply CI to education. Through partnerships and direct engagement, CyVerse has helped accelerate the pace of science for many labs and individual researchers by offering computational and data management solutions that meet the demands of modern scientific technologies. Going forward, CyVerse aims to promote computational thinking and empower researchers to new scientific discoveries by enabling global collaborations in data sharing, management, analysis, and visualization.
European Plant Phenotyping Network (EPPN) & EMPHASIS
By Roland Pieruschka & Ulrich Schurr, http://www.plant-phenotyping-network.eu/.
The European Plant Phenotyping Network (EPPN) is an EU-funded project that successfully integrated the European plant phenotyping community by creating structural and functional collaborations between the leading plant phenotyping institutions in Europe. EPPN followed the vision that a network of leading phenotyping infrastructures forms the nucleus for the structured and efficient development of a persistently competitive plant phenotyping community in Europe.
EPPN addressed a wide stakeholder community from academia and industry at different levels of interaction. Joint Research Activities developed, adapted and benchmarked novel sensors and established experimental as well as IT standards for application in plant phenotyping. The standards were made available to the wider plant phenotyping community on the EPPN website and through publications in scientific journals. Networking Activities provided a link between phenotyping experts, user communities, and technology developers within Europe and beyond. EPPN carried out communication, networking, and education throughout the duration of the project at different levels: i) between existing and newly developing phenotyping platforms; ii) between phenotyping platforms and users from academia and industry; iii) between platforms, developers, and users; iv) with other leading international phenotyping centres. This effort formed the basis for novel scientific approaches in the utilisation of the existing facilities through Transnational Access. Access was based on a demand-driven, transparent procedure, which included independent reviewers from outside EPPN. High demand from users across Europe for access to plant phenotyping facilities resulted in 66 experiments, mostly from young scientists and new users of phenotyping facilities.
EPPN became an important nucleus for the integration of the plant phenotyping community by the establishment of cooperation with the user community and a number of national and international projects and initiatives. Successful EPPN activities have led to the creation of the EMPHASIS project, which was initiated by EPPN core members and has been listed in the ESFRI roadmap. EMPHASIS will facilitate structured development and use of plant phenotyping infrastructure in Europe based on the foundation of EPPN. Additionally, EPPN members represent the group of the International Plant Phenotyping Network (IPPN) which has been initiated as an association and an important hub for networking activities to successfully continue the integration of the plant phenotyping community on a global scale.
Phenotypic analysis has become a major limiting factor in genetic and physiological analyses in plant sciences as well as plant breeding. Molecular plant biology and molecular-based breeding techniques have developed rapidly within the last decade. In contrast, the understanding of the link between genotype and phenotype has progressed more slowly. Faster progress is currently hampered by insufficient technical and conceptual capacity in the plant science community to analyse the interaction between phenotypes of existing genetic resources and the environment. Improvement in phenotyping is a key factor for success in modern breeding as well as for advancement in basic plant research. Multi-scale plant phenotyping to analyse genotype performance under diverse environmental conditions is at the centre of the EMPHASIS project, a new large-scale European project coordinated by researchers at Forschungszentrum Jülich, Germany. EMPHASIS is part of the new ESFRI roadmap, in which the member states of ESFRI (European Strategy Forum on Research Infrastructures, http://www.esfri.eu) coordinate pan-European research strategies.
The current roadmap was made public on March 10th in Amsterdam, within the framework of the Dutch EU presidency. The project EMPHASIS (European Multi-Environment Plant Phenomics and Simulation Infrastructure) aims to create an integrated European network of unique infrastructures for plant phenotyping. This includes research infrastructures bridging four dimensions: (1) deep and high-throughput research infrastructure in controlled environments; (2) intense field site installations such as FACE facilities, field labs, etc.; (3) lean phenotyping approaches with field sites across European climate zones and diverse soil conditions; and (4) modelling platforms. The installations will be connected through common data management and standards, establishing the competence to link phenotypic with genotypic data. EMPHASIS links national plant phenotyping platforms, such as the German Plant Phenotyping Network (DPPN, http://www.dppn.de/dppn/EN) and the French Plant Phenomic Network PHENOME (FPPN, https://www.phenome-fppn.fr/phenome_eng/) as well as the platforms in Great Britain (http://www.ukppn.org.uk/) and Belgium. EMPHASIS will also establish links with institutions and include other European countries. The project will cooperate with users from industry, such as technology developers and breeders, and with other international research organisations. After a preparatory phase funded by the European Union, EMPHASIS will be implemented and fully operational in the next few years, with the goal of enabling access to the key plant phenotyping facilities in Europe. Forschungszentrum Jülich will coordinate EMPHASIS in close cooperation with partners in France.
Gramene: A comparative resource for plants
By Marcela Karey Tello-Ruiz (Project Manager) and Doreen Ware (PI), http://www.gramene.org/.
The Gramene database (http://www.gramene.org) is an integrated resource for comparative genome and functional analysis in plants. The database provides agricultural researchers and plant breeders with valuable biological information on genomes and plant pathways of numerous crops and model species, thus enabling powerful comparisons across species.
The genomes component of the Gramene project is developed in collaboration with the Ensembl Genomes project (EMBL-EBI) in the Ensembl infrastructure. The main pathways component of the project is the Plant Reactome (http://plantreactome.gramene.org); it was built on the Reactome framework.
The Gramene project has had 7 data releases since January 2015. The current data release contains 39 reference genomes including Arabidopsis thaliana and A. lyrata, rice, maize, wheat, barley, soybean, Brassicas, poplar, medicago, tomato, potato, banana, cocoa, peach, grapevine, Amborella, spikemoss and algae. Evolutionary histories are provided in phylogenetic gene trees classifying orthologous and paralogous relationships as speciation and duplication events. Orthologous genes inform synteny maps that enable inter-species browsing across ancestral regions. In addition, genome browsers from multiple species can be viewed simultaneously, with links showing homologous gene and whole-genome alignment mappings (WGAs). Within the last year, we added WGAs for tomato (Solanum lycopersicum), potato (S. tuberosum), grape (Vitis vinifera), and cocoa (Theobroma cacao) to enrich our existing collection against Arabidopsis thaliana (dicot model crop) and Oryza sativa Japonica (monocot staple food crop). Since April 2015, we have provided links to gene annotations from external sources such as Araport and Expression Atlas. SNP and structural diversity data, including individual genotypes, are available for 11 species including A. thaliana, and are displayed in the context of gene annotation, along with the consequence of variation (e.g. missense variant). The Arabidopsis variation database contains data from the screening of 1,179 strains using the Affymetrix 250k Arabidopsis SNP chip (Horton et al., 2012), and an updated data set produced through a BBSRC-funded multi-institutional collaboration involving resequencing 18 Arabidopsis lines (Clark et al., 2007). It also contains 392 strains from the 1001 Genomes Project (80 strains from the Cao pilot study; 132 strains from a study by the Salk Institute; and 180 strains from a study by the Nordborg group at GMI). Phenotype data were also added from a GWAS study of 107 phenotypes in 95 inbred lines carried out by Atwell et al. (2010).
The 1001 Arabidopsis Genomes project released data freely in a pre-publication format from the Salk Institute, WTCHG, MPI, and GMI, under the Fort Lauderdale agreement. Visual displays can be downloaded as high-resolution, publication-ready image files. Our Blast and BioMart interfaces enable complex queries of sequence, annotation, homology, and variation data. Also in the past year, a new search interface (http://search.gramene.org) was developed. It provides a simple interface for expressive comparative queries and tools to view large datasets. New detailed views for search results feature gene trees, pathways, and expression data from Atlas (EMBL-EBI). In addition to ~240 curated rice pathways, the Plant Reactome incorporates orthology-based pathway projections to 58 plant species, including both Arabidopsis thaliana and A. lyrata.
Outreach, Conferences & Workshops
During the reporting period, Gramene staff attended 1 international and 7 domestic conferences with a total of 15 oral presentations and 9 posters. Project workshops with live demos were presented at the Plant & Animal Genome conferences (PAG 2015 & 2016), and another will be presented at the 2016 Maize Genetics Conference. At the PAG conferences, we also co-organized a community outreach booth with the participation of the following Arabidopsis bioinformatics resources: Araport, BAR & NASC. We published 7 peer-reviewed articles and 5 book chapters, and another 2 manuscripts are under review. We delivered 12 monthly webinars between February 2015 & February 2016, including one devoted to Arabidopsis resources on Gramene (July 14, 2015) that is available on the Gramene YouTube channel. We are in the process of generating video tutorials from the 6 talks offered during the Gramene Project Workshop at the 2016 Plant and Animal Genome Conference. We continue to foster ~55 international collaborations. Between CSHL & OSU, we trained 7 postdoctoral researchers, 3 graduate students, 2 undergraduate students (summer only), 1 high-school student & 1 visiting scholar at OSU (Oct 2014-Oct 2015).
Besides the projects and resources listed above, there are many other international and multinational initiatives with major contributions from Arabidopsis researchers, e.g. the 1001 Genomes Project (http://www.1001genomes.org/), the Epigenomics of Plants International Consortium (EPIC; http://www.plant-epigenome.org/), the Plant and Microbial Metabolomics Resource (http://metnetdb.org/PMR/) and the International Plant Phenotyping Network (http://www.plant-phenotyping.org/).