In this blog post, we explore whether big data can be the key to technological innovation and social progress, or whether it risks becoming a tool for surveillance and control.
In the fairy tale “Hansel and Gretel,” two siblings who lost their way in the forest dropped bread crumbs to mark their path. The siblings intentionally left traces of themselves, but what if something representing us was leaking out without our knowledge and marking our path? It sounds like a fairy tale, but with smartphone penetration exceeding 60% and CCTV cameras installed in every corner of Seoul, and with computer networks now commonplace in every organization, we are leaving traces of information everywhere we go, without even realizing it. What’s more, these “footprints” do not disappear once they are recorded, but are stored in places unknown to us. Efforts to create useful value by utilizing this accumulated data have been ongoing for a long time, and we can already easily find examples of its use in various aspects of our lives.
The term “big data” may be unfamiliar to those who hear it for the first time, but it is more closely related to our lives than we think. The moment we leave our homes and tap our commuter pass on the bus, information about our commute and means of transportation is generated. The moment we swipe our credit card for lunch, information about our favorite menu items is stored, and even at the supermarket, information about our favorite foods and consumption patterns is entered somewhere. The huge pile of information that accumulates from this data is called big data. In its report, “Big Data, Big Impact: New Possibilities for Global Development,” the World Economic Forum stated that “researchers and policymakers are beginning to realize the potential that can be unlocked by harnessing this flood of data.” The expression “harnessing” implies that big data has no value if it is simply understood as a collection of data. Therefore, big data does not only refer to a large amount of data accumulated over a long period of time, but also includes techniques for extracting systematic rules and trends from that data.
There are various methods for finding and analyzing rules in large amounts of data. Natural language processing is a technology that mechanically analyzes human language and converts it into a form that machines can process, enabling the extraction and processing of useful information. The social media we use is also an excellent source of data. Social network analysis is a technique that analyzes the structure and strength of the connections among social media users to track the spread and influence of information. Cluster analysis is a method of grouping items with similar characteristics so that each group can be examined as a set. In addition, various other technologies for extracting information from data are being developed. One of the most widely used platforms for running these analyses at scale is Hadoop, an open-source framework for distributed storage and processing that is used by Yahoo and Facebook.
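To make cluster analysis a little more concrete, here is a minimal sketch of k-means, one common clustering method. It is only an illustration: the function name `k_means`, the made-up shopper data, and the hand-picked starting centroids are all assumptions for this example, not something taken from a real system.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, centroids, iterations=10):
    """Group 2-D points around the nearest centroid (plain k-means)."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid

        if new_centroids == centroids:
            break  # nothing moved, so the grouping is stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical shoppers: (store visits per month, average spend per visit)
shoppers = [(2, 10), (3, 12), (2, 11), (20, 50), (22, 48), (21, 52)]
centroids, clusters = k_means(shoppers, centroids=[(2, 10), (20, 50)])

for center, members in zip(centroids, clusters):
    print(center, members)
```

With this toy data the occasional shoppers and the frequent shoppers end up in separate groups. Real consumption data would have far more dimensions and would usually be handled by a dedicated library rather than hand-written loops.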
A good example of big data being used in our daily lives is the voice recognition feature found in almost all mobile phones. No matter how smart machines are, there are limits to their ability to accurately infer the meaning, intent, and grammatical relationships of human speech. Apple’s voice recognition technology, Siri, analyzes the grammatical structure of user commands by extracting repetitive language patterns from a database that organizes and categorizes the countless texts found on the Internet, and uses the results as the grammatical basis for the program. In other words, the web itself acts as the brain of the program.
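The idea of extracting repetitive language patterns from a pile of text can be sketched in a few lines. The example below simply counts how often each pair of adjacent words (a bigram) appears. It is a deliberately simplified illustration and not a description of how Siri actually works; the function name `bigram_counts` and the sample commands are invented for this sketch.

```python
import re
from collections import Counter


def bigram_counts(texts):
    """Count how often each pair of adjacent words appears across texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep only runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        # Pair each word with the word that follows it.
        counts.update(zip(words, words[1:]))
    return counts


# Hypothetical voice commands standing in for a much larger text collection.
corpus = [
    "Set an alarm for seven",
    "Set an alarm for six thirty",
    "Please set a timer for ten minutes",
]

counts = bigram_counts(corpus)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Pairs such as "set an" and "alarm for" show up repeatedly even in this tiny sample, which hints at how recurring patterns can emerge from a large enough body of text.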
In the 2012 US presidential election, Obama became a hot topic for his personalized campaign that actively utilized big data. The Obama camp went to great lengths to collect and organize information that characterized voters, such as credit card and loan information, car models, newspaper subscriptions, and religion, and then analyzed and categorized their tendencies. The campaign then delivered messages tailored to each group’s main interests through social media. For example, a housewife with children attending public school who recently tweeted about organic food would receive Michelle Obama’s eco-friendly message via Twitter. Through this active use of data, Obama was able to win key battleground states.
Google is also one of the most successful examples of commercializing big data. Google collects data on users’ interests by referencing their search history, visited sites, email content, and Google’s networking service, Google Plus. This data is analyzed to provide users with advertisements most relevant to their interests through Google’s email service, Gmail. In addition, Google Translate boasts the highest accuracy among similar programs, and the secret to its success lies in its vast amount of data and how that data is used. Google Translate analyzes the text of numerous web pages written in more than 20 European languages, identifies grammatical rules, and uses them as a basis for translation, giving it a dominant share of the market in Europe. Beyond these examples, the potential of big data is vast. From commercial applications such as restaurant reviews and shopping mall product recommendations to police criminal profiling, preliminary research for government policy decisions, and even team performance analysis in baseball, big data and its applications are bringing enormous cost savings and new possibilities.
As the potential of big data has begun to attract attention, many companies have been trying to apply it, but surprisingly, not many seem to be utilizing it properly. Harper Reed, former Chief Technology Officer of the Obama campaign, expressed concern about the overuse of the term “big data,” saying, “Big data is bullshit.” He stated that among those who claim to deal with big data, almost none have enough data to qualify as “big,” and pointed out the public’s lack of understanding of big data, saying, “The term ‘big data’ has come to refer to the analysis tool rather than the data itself.”
With the increasing use of big data, there are growing concerns that personal information is being used excessively for corporate profit. There are also concerns that the spread of big data will make it easier for governments to control individuals, leading to a “Big Brother” society. According to a study by the Korea Information Society Agency, the average American is captured by a camera more than 200 times a day. In addition, too many institutions, including insurance companies and banks, already hold our information. Edward Snowden, who formerly worked for the US National Security Agency, revealed the extent of government surveillance, saying, “The scope of mass information collection extends to the general public.” As such, concerns about security and anxiety over the growing information-gathering capabilities of states are major obstacles to the spread of big data technology.
It seems that it will take some time for big data to become fully established in South Korea. In a survey of 240 companies and public institutions in Korea conducted by the Korea Information Society Agency, 77.1% (208 institutions) responded that they utilize databases, but their score for data utilization technology was only 57.1 out of 100, and the percentage of data utilization in decision-making was only 27.0%. This is attributed to the fact that big data was introduced in Korea only recently, so the scale of data is small compared to advanced countries, and the methodology for handling it is not yet technically mature.
The robots that appear in the movie “Ex Machina” boast the ability to think and speak at a level almost identical to that of humans, and the software that forms the backbone of these robots is none other than the world’s largest search engine. In other words, all the information that people leave on the Internet is the brain itself. This may be an unrealistic story, but it reflects the reality that big data is used as an important basis for thought and judgment in modern society. It is up to the data miners of future generations to extract value from the ever-growing mountain of information and decide how to process it. If used effectively, it will contribute to humanity in almost all fields, including politics, economics, society, and culture.