
Science has always been a ‘big’ topic – and not just those terabucks physics projects – it asks some of the biggest questions of all: who are we? why are we here? why are you reading this? But as technology has enhanced our ability to tackle those big questions, it has also increased our capacity to generate even greater amounts of data. The trouble is, where can you publish all of this ‘stuff’: the large data sets themselves, and the increasingly important associated so-called metadata – ‘data about data’? Well, just as in the olden days if you wanted to publish on obscure topics you started your own journal, the 21st century equivalent – which fittingly exploits some of this new technology – is GigaScience, Giga-database and GigaBlog: ‘new resources for the big-data community’. GigaScience, ‘a new type of journal’ from BioMed Central and BGI, is now taking submissions with the ‘goal of addressing many of the issues surrounding “big-data”’. GigaScience aims to ‘revolutionize data dissemination, organization, understanding, and use. An online open-access open-data journal, we publish “big-data” studies from the entire spectrum of life and biomedical sciences. To achieve our goals, the journal has a novel publication format: one that links standard manuscript publication with an extensive database that hosts all associated data and provides data analysis tools and cloud-computing resources’. In case you’re wondering what qualifies as ‘big’ or ‘large-scale’ in this context, the official answer is… ‘it depends’(!). However, this move would seem to be timely as we read of a multi-institutional effort supported by the US Department of Energy (DOE) that takes many separate streams of biological information to create a single, integrated cyber-‘knowledgebase’ (Kbase for short).
A major goal of Kbase is to ‘focus on a specific assortment of plants and microbes that the Energy Department hopes to exploit to produce biofuels, to sequester carbon in the ecosystem, and to clean up environmental pollution’. To help in expanding ecosystem research, the first part of a database, which includes ‘3 million traits for 69,000 of the world’s roughly 300,000 plant species’, has been published by Jens Kattge and >130 colleagues (Global Change Biology; doi:10.1111/j.1365-2486.2011.02451.x). Amongst the ambitions of TRY (as it is known, which is not an acronym, but ‘rather an expression of sentiment’) are that the improved availability of plant trait data in its unified global database will ‘support a paradigm shift from species to trait-based ecology, offer new opportunities for synthetic plant trait research and enable a more realistic and empirically grounded representation of terrestrial vegetation in Earth system models’. Let us hope the team keeps… err… trying! Finally, and on a more modest scale but also contributing to large datasets, the UK’s Centre for Ecology & Hydrology has released its third Land Cover Map for the UK. Produced at 25-m resolution, land cover was derived from ‘more than 70 satellite images’ and contains spectral information that corresponds to different ground surfaces and vegetation types in both summer and winter. An automated classification process was used to assign a land cover type to each of approximately 10 million land parcels; the types are based on the existing Biodiversity Action Plan (BAP) Broad Habitats, which are widely used in monitoring and reporting on the UK countryside. The new map reveals UK land cover as comprising mainly ‘Arable and Horticulture’ and ‘Improved Grassland’ habitats (25 % each). Aah, such a green and pleasant land… so best not to dwell on how much ‘Semi-natural Grassland’, ‘Mountain, Heath and Bog’, and ‘Broadleaved Woodland’ might have disappeared since the previous maps in 2000 and 1990.