Monday, January 21, 2013

Big analog data

USA: In test and measurement applications, engineers and scientists can collect vast amounts of data every second of every day. For every second that the Large Hadron Collider at CERN runs an experiment, the instrument can generate 40 terabytes (1 TB = 1E12 bytes) of data.

For every 30 minutes that a Boeing jet engine runs, it generates 10 terabytes of operational data. For a single journey across the Atlantic Ocean, a four-engine jumbo jet can create 640 terabytes of data. Multiply that by the more than 25,000 flights flown each day, and you get a sense of the enormous amount of data being generated, as noted by John Gantz and David Reinsel in their November 2011 article “Extracting Value from Chaos.” That’s “big data.”
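
As a rough back-of-the-envelope check on these figures, the short Python calculation below reproduces the 640-terabyte per-flight number and scales it up; the eight-hour transatlantic flight time is an assumption made for this illustration, not a figure from the article.

```python
# Back-of-the-envelope arithmetic using the figures cited above.
# The eight-hour flight duration is an assumption, not from the source.
TB_PER_ENGINE_PER_HALF_HOUR = 10      # 10 TB per 30 minutes of engine operation
ENGINES = 4                           # four-engine jumbo jet
FLIGHT_HOURS = 8                      # assumed transatlantic flight time
FLIGHTS_PER_DAY = 25_000              # flights flown each day (from the article)

per_flight_tb = TB_PER_ENGINE_PER_HALF_HOUR * ENGINES * FLIGHT_HOURS * 2
per_day_pb = per_flight_tb * FLIGHTS_PER_DAY / 1_000   # 1 PB = 1,000 TB

print(f"Data per transatlantic flight: {per_flight_tb} TB")              # 640 TB
print(f"Data per day (if every flight were comparable): {per_day_pb:,.0f} PB")
```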

Drawing accurate and meaningful conclusions from such large amounts of data is a growing problem, and the term “big data” describes this phenomenon. Big data brings new challenges to data analysis, search, data integration, reporting, and system maintenance, all of which must be met to keep pace with the exponential growth of data.

The technology research firm IDC recently performed a study on digital data, which includes measurement files, video, music files, and so on. This study estimates that the amount of data available is doubling every two years. In 2011 alone, 1.8 zettabytes (1 ZB = 1E21 bytes) of data were created, according to Adam Hadhazy in his May 2010 Live Science article “Zettabytes Now Needed to Describe Global Data Overload.” To get a sense of the size of that number, consider this: if all 7 billion people on Earth joined Twitter and tweeted continually for one century, they would generate 1 zettabyte of data. Almost double that amount was generated in 2011, according to Shawn Rogers in the September 2011 Information Management article “Big Data is Scaling BI and Analytics.”

The fact that data production is doubling every two years mirrors one of electronics’ most famous laws: Moore’s law. In 1965, Gordon Moore observed that the number of transistors on an IC was doubling roughly every year, a rate he later revised to every two years, and he expected the trend to continue “for at least 10 years.” Nearly five decades later, Moore’s law still influences many aspects of IT and electronics. As a consequence of this law, technology is more affordable, and the latest innovations help engineers and scientists capture, analyze, and store data at faster rates than ever before.
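
To make that doubling rule concrete, a quantity that doubles every two years grows as N(t) = N0 x 2^(t/2). The short sketch below applies the same rule to data volume; the projection is purely illustrative, not an IDC forecast.

```python
# Minimal sketch of a doubling-every-two-years growth curve, the same rule
# the article applies to both transistor counts and data volume.
def doubled_every_two_years(initial: float, years: float) -> float:
    """Return the projected quantity after `years`, doubling every 2 years."""
    return initial * 2 ** (years / 2)

# Example: 1.8 zettabytes in 2011 projected forward a decade under this rule.
print(doubled_every_two_years(1.8, 10))   # ~57.6 zettabytes (illustrative only)
```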

Consider that in 1995, 20 petabytes (1 PB = 1E15 bytes) of total hard drive space was manufactured. Today, Google processes more than 24 petabytes of information every single day. Similarly, the cost of storing all of this data has decreased exponentially, from $228 per gigabyte (1 GB = 1E9 bytes) in 1998 to $0.06 per gigabyte in 2010. Changes like these, combined with the advances in technology resulting from Moore’s law, are fueling the big data phenomenon.
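
To put that price drop in perspective, the quick check below shows it amounts to roughly a 3,800-fold decrease over 12 years, or a halving of cost about every 12 months; the numbers are taken directly from the figures above.

```python
import math

# Quick check of the storage-cost decline cited above.
cost_1998 = 228.00   # dollars per gigabyte in 1998
cost_2010 = 0.06     # dollars per gigabyte in 2010
years = 2010 - 1998

factor = cost_1998 / cost_2010             # ~3,800x cheaper
halving_time = years / math.log2(factor)   # ~1 year per halving

print(f"Cost decreased by a factor of {factor:,.0f}")
print(f"Cost halved roughly every {halving_time * 12:.0f} months")
```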

Engineers and scientists are producing test and measurement data in greater volumes, in a wider variety of forms, and often at higher velocities. Immediately after this data is acquired, a “Big Analog Data” problem exists, and advanced tools and techniques are required for data transfer and management, as well as for systems management of the many automated test systems, said Dr. Tom Bradicich, Research & Development Fellow, National Instruments.

Value of big data
Small data sets often limit the accuracy of conclusions and predictions. Consider a gold mine where only 20 percent of the gold is visible. The remaining 80 percent is in the dirt, where you can’t see it, and mining is required to realize the full value of the mine’s contents. This gives rise to the term “digital dirt,” meaning that digitized data can have concealed value. Hence, big data analytics and data mining are required to uncover insights that have never been seen before.

A generalized three-tiered solution to Big Analog Data challenges includes sensors or actuators, distributed automated test nodes, and an IT infrastructure for big data analytics and mining.
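
As a hypothetical illustration of how data might flow through those three tiers, the sketch below models a sensor sample, a distributed automated test node that buffers it, and a back-end analytics step; all class and function names are invented for this example and are not part of any vendor’s API.

```python
# Hypothetical three-tier flow: sensor -> test node -> IT/analytics tier.
# Names are illustrative only, not a real product interface.
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class SensorSample:                # Tier 1: the physical world, digitized
    channel: str
    value: float

@dataclass
class TestNode:                    # Tier 2: a distributed automated test node (DATN)
    node_id: str
    buffer: list = field(default_factory=list)

    def acquire(self, sample: SensorSample) -> None:
        self.buffer.append(sample)

    def flush(self) -> list:
        """Hand buffered measurements to the IT tier and clear the buffer."""
        data, self.buffer = self.buffer, []
        return data

def analytics_backend(samples: list) -> dict:   # Tier 3: IT infrastructure / analytics
    """Stand-in for server-side mining; here it just averages each channel."""
    channels = {s.channel for s in samples}
    return {c: mean(s.value for s in samples if s.channel == c) for c in channels}

# Example: one node streams two vibration readings upstream for analysis.
node = TestNode("DATN-01")
node.acquire(SensorSample("vibration", 0.12))
node.acquire(SensorSample("vibration", 0.18))
print(analytics_backend(node.flush()))          # {'vibration': 0.15}
```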

Big analog data and the engineer and scientist
The sources of big data are many. However, among the most interesting to the engineer and scientist is data derived from the physical world. This is analog data that is captured and digitized; thus, it can be called “Big Analog Data”—derived from measurements of vibration, RF signals, temperature, pressure, sound, image, light, magnetism, voltage, and so on.

In the test and measurement field, data can be acquired at rates as high as many terabytes per day, and Big Analog Data issues are a growing challenge for automated test and analysis systems. When there are many devices under test, many distributed automated test nodes (DATNs) are needed, often connected in parallel to computer networks.

Since DATNs are effectively computer systems with software drivers and images, remote, network-based systems management tools are needed to automate their configuration, maintenance, and upgrades. The volume of test and measurement data is also fueling a growing need in global companies to give many more engineers access to this data than in the past. This requires network gear and data management systems that can accommodate multiuser access, which in turn drives the need to geographically distribute the data and its access. A growing approach to providing this distributed data access is the use of cloud technologies.
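
The sketch below is one hypothetical way such a remote management tool might collect configuration information from a DATN over the network; the endpoint URL, payload fields, and function name are assumptions made for this example and do not describe any actual product.

```python
# Hypothetical illustration of network-based systems management for DATNs.
# The manager URL and payload format are assumptions, not a real API.
import json
from urllib import request

def report_node_status(manager_url: str, node_id: str, driver_version: str) -> None:
    """Post a node's identity and driver-image version to a central manager."""
    payload = json.dumps({"node_id": node_id,
                          "driver_version": driver_version}).encode()
    req = request.Request(manager_url, data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:   # the manager can respond with upgrade actions
        print(resp.status)

# Usage (requires a management server at this hypothetical address):
# report_node_status("http://test-manager.example.com/nodes", "DATN-01", "2.4.1")
```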

Big Analog Data applications create a strong dependency on IT equipment such as servers, storage, and networking. In addition, software is needed to manage, organize, and analyze the data. Traditional IT technologies are therefore part of the user’s total solution after data capture, ensuring efficient data movement, archiving, and the execution of analytics and visualization for both in-motion and at-rest data. Vendors such as Averna, Virinco, National Instruments, and OptimalTest already offer products to help manage Big Analog Data solutions.

To analyze and manage billions of data points across millions of files, engineers and scientists can use National Instruments DIAdem to mine, inspect, and generate reports on measurement data. They can also use DIAdem to interface with existing IT solutions or to create new servers that can be accessed globally, helping them make decisions from their data faster.

For manufacturing, Proligent from Averna and WATS from Virinco deliver solutions that provide visibility into, and control over, product quality, test equipment, processes, and operations. Qualcomm successfully leveraged tools from OptimalTest to optimize its test process, which accumulates 4 terabytes of data per quarter (Evaluation Engineering, October 24, 2011). Visibility into actionable test data can help engineers identify emerging trends and make decisions proactively.

As it becomes both necessary and easier to capture large amounts of data at high speeds, engineers will face challenges in creating end-to-end solutions that require a close relationship between automated test equipment and IT equipment. This is driving demand for test and measurement system providers that work with IT providers to offer integrated, bundled solutions for automated test applications.

-- National Instruments, USA.
