Thanks for your email, your advice, and the well-intended warnings and heads-up.
I wanted to clarify that it wouldn't be hosting people's datasets for them, but rather opening them up to the scientific community, like open source.
It is inspired by my own studies with EEG, and by listening to computational audiology and the computational sciences: "what could someone else do with the EEG data of my own brain?" This project is to explore whether there's any interest out there.
People who have data could put it into a pool which is then available to everyone (via, yes, "hosting", but it's actually an SSH connection to the box, so it's not "on the Internet" for everyone and their dog).
It's not a backup, nor the only copy, nor hosting. Users would obviously be advised to keep their own local copy, backups, etc., as they normally do. They maintain their own data as per their own local regulations and/or needs.
This NAS, if you want to call it that, cost 300 in total, using enterprise-grade hardware, what's called e-waste: corporations dump their gear on a four-year cycle. It's currently 8GB RAM on an FM2+ dual core (this is 2011-era hardware capable of 64GB RAM, but that only affects speed of access and multitasking, so it's lower on the agenda) with 10 drive bays, 5 of them occupied at the moment. I anticipate that the next cycle of big corporations dumping their "old" gear, specifically hard drives, will bring 2TB or 4TB drives. Adding those would boost the array from 5TB raw (usable is only 3TB due to ZFS RAID-Z2) by ten TB, to 15 minus 2, a total of 13TB usable. Or even by 20TB, for 23TB usable. That cycle is realistically two years away, so it's 3TB for now. Each drive sells for peanuts (think the price of three craft beer pints) compared to its retail value, as corporations simply wish to move to bigger or faster NVMe-style SSDs, which are more energy efficient, smaller, more reliable, and something like 4000x faster.
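The capacity figures above can be checked with a short sketch. It only reproduces the back-of-the-envelope arithmetic in this email (raw capacity minus a fixed 2TB of parity); the function name `usable_tb` and the fixed-overhead model are assumptions made for illustration. A real RAID-Z2 layout reserves two drives' worth of parity per vdev and treats every drive in a vdev as the size of the smallest one, so actual numbers with mixed drive sizes may well differ.

```python
def usable_tb(raw_tb, parity_tb=2.0):
    """Simplified model from the email: usable = raw - fixed parity overhead.

    Assumption: parity costs a flat 2 TB regardless of drive sizes.
    This is NOT how ZFS sizes a mixed-drive pool; it is only the
    rough estimate used in the message above.
    """
    return max(raw_tb - parity_tb, 0.0)


current_raw_tb = 5.0  # five drives installed today, 5 TB raw in total

# Scenarios from the email: no upgrade, add 10 TB, add 20 TB.
for added_tb in (0.0, 10.0, 20.0):
    raw = current_raw_tb + added_tb
    print(f"raw {raw:.0f} TB -> usable about {usable_tb(raw):.0f} TB")
```

Under this simplified model the three scenarios come out at 3TB, 13TB and 23TB, matching the figures quoted above.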
My question in the email to the AUDITORY List was more about seeing what people in our fields might imagine we could do with such a dataset, specifically about hearing. Correlational studies? Large data sets? The Computational Audiology Network is an example of these types of concepts out there.
Best
- Nathan
Hi Nathan,
Respectfully, this sounds like a very bad idea for many reasons. Don’t make yourself responsible for maintaining other people's data. I would even advise against building a NAS to store your own data, as there are plenty of ways to securely store ~20TB that will be easier, cheaper, and quicker than self-hosting.
S.
Dear AUDITORY List members,
Would there be any interest from members in a collective server with a data array of 20TB, the aim being to securely store auditory data from human participants in line with human ethics boards' requirements on data longevity? Think large open data sets.
Think of others being able to browse your dataset for new insights, in a new way, with their own specialism.
Myself, I have found that long-term storage requires a specialised array to extend past the 10-year mark. This applies, for example, to the finalised dataset of semi- or even fully anonymised data, or equally to the large original recordings (video, EEG, etc.) that span several GB per file.
A description of the server can be found at:
https://tinyurl.com/auditory-serve
(Note: the drives in the actual server are six 5TB HDDs in an array, with data longevity exceeding 15 years, but the description shows six 1TB drives, an older configuration.)
It runs Linux but is SMB-based (allowing Windows compatibility) and uses exFAT (limiting files to 4GB each), and it currently uses SSH for secure connection and data transfer.
I look forward with interest to any correspondence with AUDITORY members and their learned opinions on whether such an open-source, non-profit server is of use within our auditory neuroscientific community, or indeed even to hearing about your existing options in this space.
Ngā mihi (kind regards),
Nathan
--
Nathan Barlow
BSc, PGDip, MSc(SpchSci)(Hons), CoP, MSc(Clinical Audiology)(Soton)