Re: [AUDITORY] F.O.S.S auditory data server 20TB

Subject: Re: [AUDITORY] F.O.S.S auditory data server 20TB - feasibility/utility ?

From: Krzysztof Basiński <k.basinski@xxxxxxxxxxxx>

Date: Fri, 27 Sep 2024 09:13:10 +0200

Comments: To: Nathan Barlow <nb.audiology@xxxxxxxxx>

In-reply-to: <CAP5KYLotHeYvWHEdK9kpgvErVOycemSQR9QTXWvy0hT0RhWs8A@mail.gmail.com>

References: <CAP5KYLotHeYvWHEdK9kpgvErVOycemSQR9QTXWvy0hT0RhWs8A@mail.gmail.com>

Reply-to: Krzysztof Basiński <k.basinski@xxxxxxxxxxxx>

Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

Nathan,

We’re running a 6TB NAS on our local network for our lab needs and use OSF repositories (https://osf.io/) for publishing open datasets. I think the issue here is longevity, and as a community we should aim for 50+ years of preservation guarantees. OSF has been set up for exactly that, with an institutional fund to support their servers for 50+ years.

Personally, I’ve been in a situation where I needed data that someone had openly shared in the early days of open science (think early 2000s). They’d shared it on their institution’s server, which seemed sensible at the time. However, the server has since been updated a million times and the URL is broken. The corresponding author seems to have left academia, the senior author died, and the other authors have no clue what happened to the data…

So even though it’s quirky, I prefer to have an OSF DOI and a guarantee that it’s going to point to the dataset for the next 50+ years.

Storing data that’s not anonymised is another issue altogether, but I’d prefer to have something local for that.

Best,

Chris

Krzysztof Basiński, PhD
PRINCIPAL INVESTIGATOR

Auditory Neuroscience Laboratory

anl.gumed.edu.pl

ASSISTANT PROFESSOR
Division of Quality of Life Research
Medical University of Gdańsk

krzysztof.basinski@xxxxxxxxxxxx
zbnjz@xxxxxxxxxxxx
+48 58 349 1569
Tuwima 15, 80-210 Gdańsk

mug.edu.pl
farU.edu.pl/en

On 26 Sep 2024, at 09:38, Nathan Barlow <nb.audiology@xxxxxxxxx> wrote:

Dear AUDITORY List members,

Would there be any interest from members in a collective server with a 20TB data array? The aim is to securely store auditory data from human participants, in line with human ethics boards' requirements on data longevity. Think large open datasets. Think of colleagues being able to browse your dataset for new insights, in a new way, through their own specialism.

I have found that long-term storage requires a specialised array to extend past the 10-year mark. This applies, for example, to the finalised dataset of semi- or even fully anonymised data, and equally to large original recordings (video, EEG, etc.) that span several GB per file.

A description of the server can be found at: https://tinyurl.com/auditory-serve
(Note: the actual server has six 5TB HDDs in an array, with data longevity exceeding 15 years, but the description shows six 1TB drives, an older configuration.)
It runs Linux, shares files over SMB (for Windows compatibility), uses exFAT (limiting files to 4GB each), and currently uses SSH for secure connection and data transfer.
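Since the original recordings mentioned above can span several GB per file, it may help to check a dataset against the per-file limit before transferring it. The following is only a minimal sketch: the function name `find_oversized_files` and the constant `MAX_FILE_BYTES` are hypothetical, and the limit is assumed to be 4 GiB, taken from the "4GB per file" figure in this message rather than measured on the server.

```python
from pathlib import Path

# Assumed per-file limit from the message above ("4GB per file"), read here as 4 GiB.
MAX_FILE_BYTES = 4 * 1024**3


def find_oversized_files(root, limit=MAX_FILE_BYTES):
    """Return (path, size_in_bytes) pairs for files under `root` whose size
    is at or above `limit`, sorted with the largest file first."""
    oversized = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            if size >= limit:
                oversized.append((path, size))
    return sorted(oversized, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical usage: scan the current directory and report offending files.
    for path, size in find_oversized_files("."):
        print(f"{path}: {size / 1024**3:.2f} GiB")
```

Any files reported would need to be split or compressed before upload, or stored elsewhere, if the limit does apply.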

I look forward with interest to any correspondence with AUDITORY members and their learned opinions on whether such an open-source, non-profit server is of use within our auditory neuroscience community, or indeed to hearing about your existing options in this space.

Ngā mihi (kind regards),
Nathan

--
Nathan Barlow
BSc, PGDip, MSc(SpchSci)(Hons), CoP, MSc(Clinical Audiology)(Soton)
www.eresope.wordpress.com
@eres_ope