Assisting Big Science with Big Data

1 Feb 2017

Category: Tech matters


The Square Kilometre Array (SKA) is an international, multibillion-dollar project to build the world’s largest radio telescope, co-sited in Africa and Australia. Once completed, it will be the world’s largest public science data project, producing more than 100 terabytes of raw data per second for researchers all around the world.

One hundred terabytes of data per second: over a single day, that equates to around three times more data than is currently produced daily across the entire Internet, and that’s only one experiment.
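To put that rate in perspective, here is a rough back-of-the-envelope sketch of what 100 terabytes per second adds up to over a day (the daily Internet figure is only the comparison implied above, not an official statistic):

```python
# Rough sketch: what 100 TB/s adds up to over a full day.
SKA_RATE_TB_PER_S = 100
SECONDS_PER_DAY = 24 * 60 * 60

daily_tb = SKA_RATE_TB_PER_S * SECONDS_PER_DAY   # terabytes per day
daily_eb = daily_tb / 1_000_000                  # exabytes per day

print(f"SKA raw output: {daily_tb:,} TB/day (~{daily_eb:.1f} EB/day)")
# ~8,640,000 TB/day, or roughly 8.6 exabytes every single day
```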

Big Science projects like the SKA (and the “elephant data flows” they produce) are nothing new; however, they are becoming more common, according to Brett Rosolen from Australia’s Academic and Research Network (AARNet).

Brett Rosolen is the Data Program Manager, eResearch, at AARNet.

“Emerging groups like geneticists, geologists and climatologists are starting to take advantage of more sensitive and affordable measuring tools, which in turn are generating very large volumes of data that many researchers find difficult to handle. This often requires them to transfer the data to a collaborator with higher computing power to process it for them,” says Brett, whose role in AARNet’s eResearch Data Program is to work with researchers across Australia to “come to grips” with how they can transfer large data flows.

“Physicists, having been transferring these elephant data flows for years, have become very good at it,” says Brett. Recently a physics group at the University of Melbourne transferred 53 terabytes in 24 hours, at 100% efficiency, from collaborators in Germany as part of research associated with CERN’s Large Hadron Collider.
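To get a feel for what that transfer involved, a quick calculation (a sketch only, using the 53 TB and 24-hour figures quoted above and decimal units) gives the average rate the link had to sustain:

```python
# Sketch: average throughput implied by moving 53 TB in 24 hours.
TRANSFER_TB = 53
SECONDS = 24 * 3600

bits = TRANSFER_TB * 1e12 * 8        # decimal terabytes converted to bits
avg_gbps = bits / SECONDS / 1e9

print(f"Average rate: {avg_gbps:.1f} Gbps sustained for 24 hours")
# Roughly 4.9 Gbps held continuously for a full day
```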

“The branches of science without a background in computing need someone to advise them on how they can take advantage of National Research and Education Networks (NRENs).”

Providing fast and affordable transit

AARNet was established in 1989 to interconnect Australia’s research and education community with each other and with the rest of the world.

It currently connects over one million users — researchers, faculty, staff and students — all of whom communicate directly with other universities, research and education organizations globally without touching the commercial Internet.

“The real reason for our existence is to support peak research traffic requirements that a typical commercial ISP would not be able to meet economically,” says David Wilde, AARNet’s Chief Technology Officer.

“We have an agreement between all NRENs to peer with each other, so we don’t charge institutions for data that flows between them. This means we don’t penalize our community for research and education collaboration, but rather encourage it.”

David says it’s challenging to develop and offer such a network given that Australia is one of the most geographically isolated and expansive countries in the world. However, it is exactly what AARNet’s Network Architecture and Infrastructure Development Group is committed to achieving.

David Wilde says he and the Network Architecture and Infrastructure Development Group at AARNet make sure they are building a network ahead of current requirements.

“A big part of this has been developing our own local infrastructure, for instance, dark fibre,” says David. “We’ve spent a lot of time building or procuring dark fibre between cities and campuses because that underpins everything else.

“Most importantly, we try to make sure we are building just ahead of requirements,” something David and his team have done recently by upgrading AARNet’s transmission and IP networks to 100 Gbps. The new upgrades can potentially allow researchers to transfer a petabyte of data in a day, a request that Brett says is becoming increasingly common among the researchers he meets.
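The same back-of-the-envelope arithmetic shows why a 100 Gbps backbone matters for those petabyte-a-day requests (a sketch using decimal units; real transfers also lose some capacity to protocol overhead):

```python
# Sketch: sustained rate needed to move one petabyte in 24 hours.
PETABYTE_BITS = 1e15 * 8            # one decimal petabyte, in bits
SECONDS_PER_DAY = 24 * 3600

required_gbps = PETABYTE_BITS / SECONDS_PER_DAY / 1e9
print(f"Required: ~{required_gbps:.0f} Gbps sustained for the whole day")
# ~93 Gbps, which leaves little headroom even on a 100 Gbps link
```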

Educating new users to take advantage of the infrastructure

In the past five years, AARNet’s eResearch team has grown from two to five people as the organization has recognized the need to educate new users and to provide services tailored to them.

“We’ve done architecture and infrastructure very well; now education is critical,” says Brett.

“We used to say ‘there’s your pipe, knock yourselves out’; now it’s ‘here’s your pipe and here’s what you need to do to make the most of it’.

“It’s also about working with researchers and research support teams to understand their requirements, and then relaying that back to David’s team, who can help develop the systems and software that allow researchers to do this.”

One example of such a system is the Science DMZ, which partitions a university network to create an optimized research segment that can handle large data flows.

 

The Science DMZ concept was created by the Energy Sciences Network (ESnet) in the US. It is defined as a portion of the network, built at or near the campus or laboratory’s local network perimeter, that is designed so that the equipment, configuration, and security policies are optimized for high-performance scientific applications rather than for general-purpose business systems or “enterprise” computing.
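As a rough illustration of the idea (a sketch only; the subnet and policy labels below are hypothetical, not AARNet’s or ESnet’s actual configuration), the essence of a Science DMZ is a policy split: traffic to and from dedicated data transfer nodes bypasses the stateful campus firewall, while everything else keeps the usual enterprise protections:

```python
from ipaddress import ip_address, ip_network

# Hypothetical subnet for dedicated data transfer nodes in the Science DMZ.
SCIENCE_DMZ = ip_network("192.0.2.0/24")

def policy_for(host: str) -> str:
    """Return the (illustrative) security policy applied to a host's traffic."""
    if ip_address(host) in SCIENCE_DMZ:
        # Research flows: lightweight ACLs and flow monitoring instead of a
        # stateful firewall that would slow large transfers down.
        return "science-dmz: ACLs + flow monitoring, no stateful inspection"
    # Everything else stays behind the general-purpose campus firewall.
    return "enterprise: stateful firewall + deep inspection"

print(policy_for("192.0.2.10"))    # data transfer node
print(policy_for("198.51.100.7"))  # ordinary campus host
```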

 

“We often work with campuses to establish a DMZ to allow their data-intensive researchers to bypass campus firewalls that can slow big research data flows down, while not compromising the university’s general network,” says David.

“It’s all about allowing scientists to focus on doing the research and producing results of national and international significance faster, whether that be developing genetic treatments for Alzheimer’s disease or improved computer processing abilities, rather than trying to be part-time IT geniuses at the same time.”
