The storage infrastructure consists of two short-term on-premise processing servers with a combined capacity of 260 TB and one long-term storage system provided by EPFL as Amazon S3 buckets. After publication, the archive is moved to 10-year long-term storage with access via the standard S3 (AWS) protocol; the rest of the data is stored for as long as the project lasts, from 5 to 15 years.
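As a minimal illustration of what access via the standard S3 protocol looks like in practice, the sketch below lists and retrieves archived files with boto3. The endpoint URL, bucket name, and object keys are hypothetical placeholders, not the actual EPFL configuration.

```python
# Minimal sketch of reading the long-term archive over the standard S3 protocol.
# Endpoint, bucket, and key names are hypothetical placeholders; credentials are
# taken from the environment as usual for boto3.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.epfl.example",  # placeholder endpoint
)

# List archived movies under a project prefix (placeholder names).
response = s3.list_objects_v2(Bucket="long-term-archive", Prefix="oates-lab/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Download a single archived file back to local short-term storage.
s3.download_file("long-term-archive", "oates-lab/movie_001.tif", "movie_001.tif")
```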
Large image data poses challenges beyond image acquisition: storage space, associated costs, speed of transfer and processing, and data quality.
The laboratory of Andrew Oates currently generates about 913 GB of raw data per movie, at a rate of about six movies per week. If the acquisition rate remains the same for a year, the data volume amounts to roughly 270 TB.
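A back-of-the-envelope check of this figure, using the per-movie size and weekly rate above; the number of active acquisition weeks per year is an assumption.

```python
# Back-of-the-envelope estimate of annual raw-data volume.
# Per-movie size and weekly rate are from the text; the number of active
# acquisition weeks per year is an assumption.
GB_PER_MOVIE = 913
MOVIES_PER_WEEK = 6
WEEKS_PER_YEAR = 50  # assumed; 48-52 weeks gives roughly 260-285 TB

annual_tb = GB_PER_MOVIE * MOVIES_PER_WEEK * WEEKS_PER_YEAR / 1000
print(f"Estimated annual volume: {annual_tb:.0f} TB")  # ~274 TB, in line with the ~270 TB quoted above
```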
Such a data volume poses serious challenges.
First, the data must be transferred over a costly high-bandwidth network to a short-term storage system. Transferring and processing data at this scale takes substantial time, delaying the research.
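To make the delay concrete, here is a rough per-movie transfer-time estimate; the link speed and effective throughput fraction are assumptions for illustration, not measured values.

```python
# Rough per-movie transfer time over a fast network link.
# Link speed and effective utilisation are assumptions, not measured values.
MOVIE_GB = 913
LINK_GBIT_PER_S = 10          # assumed nominal link speed
EFFECTIVE_UTILISATION = 0.7   # assumed real-world throughput fraction

seconds = MOVIE_GB * 8 / (LINK_GBIT_PER_S * EFFECTIVE_UTILISATION)
print(f"~{seconds / 60:.0f} minutes per 913 GB movie")  # roughly 17 minutes
```

Even under these optimistic assumptions, a week of acquisitions ties up the link for hours, and slower shared networks stretch transfers further.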
Second, data storage costs grow rapidly with data volume. For example, a centralised workstation for image processing, GPU computing and big-data storage (130 TB) costs $120,000, and electricity, cooling and IT staff maintenance add substantially to that figure.
Third, large datasets fill up storage space quickly. The simple solutions are either to buy additional hardware or to delete data to free up space; the latter runs counter to recommendations for data re-use. The less data is re-used, the more animals are needed to regenerate it. The most reasonable solution is therefore data compression. However, typical lossless algorithms offer low compression ratios, while lossy compression does not preserve the full information and thus degrades data quality.
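The trade-off can be seen even on synthetic data. The sketch below applies standard-library lossless compression (zlib) to a noisy 16-bit array standing in for a raw microscopy frame; because camera noise dominates the low bits, the ratio stays modest. The image content is synthetic, and real ratios depend on the actual data and compressor used.

```python
# Lossless compression of a synthetic noisy 16-bit image, illustrating why
# typical lossless algorithms achieve only modest ratios on raw microscopy data.
# The image here is synthetic; real ratios depend on the actual data.
import zlib
import numpy as np

rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:2048, 0:2048]
# Smooth Gaussian "signal" plus camera-like noise and a baseline offset.
signal = 1000 * np.exp(-((xx - 1024) ** 2 + (yy - 1024) ** 2) / (2 * 400**2))
noise = rng.normal(0, 50, size=signal.shape)
image = np.clip(signal + noise + 100, 0, 65535).astype(np.uint16)

raw = image.tobytes()
compressed = zlib.compress(raw, level=6)
print(f"lossless ratio: {len(raw) / len(compressed):.2f}x")
# Ratio is modest (roughly 1.5-2x) because the noisy low bits are incompressible.
```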