ST117 2024 Final project assignment for WR (written report) – Phase 1
Your written report will be a summary of a real-world data analysis project using the UK ECN repository introduced in the last lecture in Week 10 of Term 2. The assignment is released in two phases. The first phase (released along with Log3) is detailed below. The second phase will be released along with Log4.
Phase 1: Scientific background, study design, data download, data subsets, and EDA
The goal of this phase is to download and familiarise yourself with the datasets and their context. This involves reading about the scientific methods used to collect the data and using R to explore them from many angles.
1. ECN is a UK-based multi-agency programme with funding and monitoring from a consortium of UK government departments and agencies. The network is coordinated by staff at the UK Centre for Ecology & Hydrology (UKCEH). UKCEH manage the data generated by the programme, which are stored in a central database and are made available for research and education. ECN is a highly valuable long-term data collection that started in the early 1990s.
"In its first two decades of operation the ECN has accumulated a robust set of baseline data that describe environmental and biological variability across a range of habitats in unprecedented detail. With appropriate, informed development, these should prove invaluable in discerning the causes and consequences of environmental change for decades to come." (Sier & Monteith, 2016)
2. The scientific basis for the ECN data repository is explained in Rennie, S. (2016). Providing information on environmental change: Data management, discovery and access in the UK Environmental Change Network Data Centre. Ecological Indicators, 68, 13-20. DOI: 10.1016/j.ecolind.2016.01.060 (see Moodle link).
3. The ECN homepage https://ecn.ac.uk is your starting point. Get an overview of the subpages available there, specifically:
   - information about the terrestrial monitoring locations: https://ecn.ac.uk/sites/site/terr
   - publications based on ECN data: https://ecn.ac.uk/what-we-do/science/20yrs-si-keypoints
   - available datasets: https://ecn.ac.uk/data/available-data
4. The data are held at UKCEH. The landing page for selecting datasets to download is https://catalogue.ceh.ac.uk/documents/4971bce4-8a81-4e23-9637-7fcff37c5f21
5. Plan where you will store the data, supplementary information and R code for this project on your computer. For example, you can create an R project to store everything relevant in the same location (see Log2). Download the raw data and the supporting documentation for the following UK ECN datasets:
   - bat data (1993-2015)
   - bird data (1995-2015) (note: do not confuse this with the other bird dataset, which covers a different year range)
   - moth data (1992-2015)
   - meteorology data (1991-2015)
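One possible way to organise the project folder is sketched below. This is only an illustration: the helper `make_project_dirs()` and the folder names (`data_raw`, `docs`, `R`, `output`) are assumptions made for this example, not a required layout. The demonstration runs in a temporary folder so that it does not touch your own files.

```r
# Create a simple folder layout below `root`.
# The function name and the folder names are illustrative assumptions.
make_project_dirs <- function(root,
                              subdirs = c("data_raw", "docs", "R", "output")) {
  paths <- file.path(root, subdirs)
  for (p in paths) {
    # recursive = TRUE also creates `root` itself if it does not exist yet
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Demonstration in a temporary folder; in practice, pass your R project folder.
demo_root <- file.path(tempdir(), "ecn_project")
created <- make_project_dirs(demo_root)
print(basename(created))
```

Keeping the downloaded files unchanged in one folder and writing any derived files elsewhere makes it easier to trace every result back to the raw data.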
6. Write a summary describing the aspects of these datasets that are most essential for statistical analyses (e.g. focus on when, where, and how the data were collected, as well as the study objectives). Your summary should not exceed 1 page in 11pt font.
7. Using the documentation, draw diagrams that visualise the structure of these datasets, taking into account the spatial and temporal structure of the data collection and, where applicable, the mode of data collection and recording schedule.
8. Carry out exploratory data analysis (EDA) for the bat, bird, and moth data. In selecting your summaries and plots, keep in mind that the goal is to familiarise yourself with the data, considering:
   - guidance and resources in the EDA paragraph of the Log3 section on Data Analysis Cycle;
   - using Tidy R (see Log3) to manage the data structures with the piping technique (which can be more readable than iterated subsetting and brackets);
   - the resources about data visualisation given for Activities 1 and 2;
   - the summary and diagram from Steps 6 and 7, which can guide you in selecting suitable plots;
   - EDA includes data quality analysis (DQA), as mentioned in Log3 as well;
   - if there are obvious recording errors, you may fix those before further analysis, but you should leave a note in your report about this for transparency;
   - most of all, we are interested in the counts of the animals obtained during the recording periods (in the datasets these are in the column VALUE), which can be visualised from a variety of perspectives (across years, across recording periods, across sites, etc.).
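The points above on piping, data quality and counts can be sketched as follows. This is a minimal illustration on made-up data, not the real ECN files: only VALUE and FIELDNAME are column names taken from this assignment, while SITECODE, SDATE, the species codes and the site codes are assumptions that you should check against the dataset documentation. Treating negative counts as errors is likewise just an example of a quality rule.

```r
library(dplyr)

# Toy data standing in for one ECN file (all values are invented).
# SITECODE and SDATE are assumed column names; check the documentation.
toy <- data.frame(
  SITECODE  = c("T01", "T01", "T01", "T02", "T02", "T02"),
  SDATE     = as.Date(c("2001-06-01", "2001-07-01", "2002-06-01",
                        "2001-06-01", "2002-06-01", "2002-07-01")),
  FIELDNAME = c("SP1", "SP2", "SP1", "SP1", "SP1", "SP2"),
  VALUE     = c(3, 5, NA, 2, -1, 4)
)

# Data quality check: how many counts are missing or implausible?
quality <- toy %>%
  summarise(
    n_rows     = n(),
    n_missing  = sum(is.na(VALUE)),
    n_negative = sum(VALUE < 0, na.rm = TRUE)
  )
print(quality)

# Counts per site and year, after setting aside the flagged rows.
counts_by_site_year <- toy %>%
  filter(!is.na(VALUE), VALUE >= 0) %>%
  mutate(YEAR = as.integer(format(SDATE, "%Y"))) %>%
  group_by(SITECODE, YEAR) %>%
  summarise(total = sum(VALUE), n_records = n(), .groups = "drop")
print(counts_by_site_year)
```

Each step of the pipe reads top to bottom, which is the readability advantage over nested brackets. If you do exclude or correct rows like this, remember to note it in your report.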
9. By Tuesday you will receive an email with the codes for 3 bat species, 4 bird species, 5 moth species, and 4 locations. These are specific to your report pod.
   - Carry out a more detailed descriptive analysis of these species (in the datasets this is the column FIELDNAME).
   - Compare your observations with those of the full datasets of bats, birds, and moths, respectively.
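A comparison between your assigned species and the full dataset could be set up along the following lines. The data and the species codes `SP1` to `SP3` are invented for illustration, and the helper `summarise_counts()` is a hypothetical name. The same pattern would apply to your assigned locations, using whichever column holds the site code.

```r
library(dplyr)

# Toy data standing in for one ECN file (values are invented).
toy <- data.frame(
  FIELDNAME = c("SP1", "SP1", "SP2", "SP2", "SP3", "SP3"),
  VALUE     = c(1, 3, 10, 20, 4, 6)
)

# Hypothetical species codes; replace with the codes from your email.
my_species <- c("SP1", "SP3")

# Hypothetical helper: the same summary for any subset of the data.
summarise_counts <- function(df) {
  df %>%
    summarise(
      n_records    = n(),
      mean_count   = mean(VALUE, na.rm = TRUE),
      median_count = median(VALUE, na.rm = TRUE)
    )
}

# Stack the two summaries so they can be read side by side.
comparison <- bind_rows(
  "assigned species" = toy %>% filter(FIELDNAME %in% my_species) %>% summarise_counts(),
  "full dataset"     = summarise_counts(toy),
  .id = "subset"
)
print(comparison)
```

Using one summary function for both subsets ensures that the numbers being compared were computed in exactly the same way.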
10. The meteorology data are spread across many (big) files, so EDA for all of them would be beyond the scope of this project. However, do carry out EDA for the meteorological variables for the year in which you were born.
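One way to restrict many files to a single year is sketched below. Everything specific here is an assumption for illustration: the two small CSV files are created on the fly in a temporary folder, the column name SDATE and its date format are guesses that must be checked against the meteorology documentation, and `VAR1`, `VAR2` and the year 2004 are placeholders.

```r
library(dplyr)

# Two tiny temporary CSV files standing in for the (much larger) real files.
tmp_dir <- file.path(tempdir(), "met_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(
  data.frame(SDATE = c("2003-12-31", "2004-01-01"),
             FIELDNAME = "VAR1", VALUE = c(1.5, 2.0)),
  file.path(tmp_dir, "met_a.csv"), row.names = FALSE
)
write.csv(
  data.frame(SDATE = c("2004-06-15", "2005-01-01"),
             FIELDNAME = "VAR2", VALUE = c(18.2, 0.4)),
  file.path(tmp_dir, "met_b.csv"), row.names = FALSE
)

birth_year <- 2004  # placeholder; use your own year of birth

files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file, keep only the rows from the chosen year, then stack them.
met_year <- lapply(files, function(f) {
  read.csv(f, stringsAsFactors = FALSE) %>%
    mutate(SDATE = as.Date(SDATE),                    # assumes ISO dates
           YEAR  = as.integer(format(SDATE, "%Y"))) %>%
    filter(YEAR == birth_year)
}) %>%
  bind_rows()

# One summary row per meteorological variable.
met_summary <- met_year %>%
  group_by(FIELDNAME) %>%
  summarise(n = n(), mean_value = mean(VALUE), .groups = "drop")
print(met_summary)
```

Filtering each file as it is read, rather than after combining all files, keeps the amount of data held in memory small.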
The outputs of steps 6 to 10 will be used as a basis for your report writing. Note that you should also explain what you intended to show with each type of figure and what it actually shows for the data, including what that means in the real-world context. (Detailed guides and submission templates for the final version will be given along with Phase 2 of the assignment.)