Special Projects

BIG DATA Technology


Students: Manauov D.B., Nurlan A., Abai M.A.

Teacher: Darkenbayev D.K.




Abstract: This article discusses the characteristics of large volumes of data, as well as issues related to their storage and analysis. Detailed information is given on the field that studies the analysis of information too large or complex for traditional tools. The article also addresses the importance of big data management and surveys different notions about analytical methods used in big data analysis. Data analysis has been of great interest to many organizations over the years and has been used in many different applications; therefore, information on methods of data analysis is also provided. Finally, the article discusses how to solve the basic problems involved in getting the most out of big data.


Keywords: big data, data volume, data management, analysis, analytics



         The big data field is characterized by features such as scale, diversity, velocity, data accuracy, and the value of the information collected.

        In many cases, work with big data follows a common workflow, from the collection of raw data to the extraction of usable information.

        The main purpose of working with big data is to obtain valuable analytical conclusions from it for practical application.

        Big data is widely used in many areas of business. It is used in healthcare, telecommunications, trade, logistics, and financial companies, as well as in public administration. Currently, many companies compete with each other by increasing the volume of big data they process, as the number of consumers of their software grows every year. As a result, software development companies must continually improve their big data processing. Today, processing large data and ensuring its safety and secure storage is very important for everyone, and the solution to this task must be fast and reliable [1]. Through the use of Big Data technology, companies can obtain important information in a matter of seconds. This, in turn, increases the efficiency of economic decisions and allows them to respond more quickly to changes in customer behavior and to identify market processes at the earliest stages in real time. Big data typically accumulates continuously as various types of unstructured data. The term describes data sets that grow exponentially and are too large, raw, and unstructured for analysis by relational database methods.

         The state, which holds a large amount of data on individuals and legal entities, also plays an important role in the development of Big Data.


Big data

The term "Big Data" has recently been applied to datasets that grow so large that they become awkward to work with using traditional on-hand database management tools. They are data sets whose size is beyond the ability of commonly used software tools and storage systems to capture, store, manage, as well as process the data within a tolerable elapsed time. Big data also refers to databases which are measured in terabytes and above, and are too complex and large to be effectively used on conventional systems.

Big data sizes are a constantly moving target, currently ranging from a few dozen terabytes to many petabytes of data in a single data set. Consequently, some of the difficulties related to big data include capture, storage, search, sharing, analytics, and visualization. Today, enterprises are exploring large volumes of highly detailed data so as to discover facts they didn't know before [14]. Business benefit can commonly be derived from analyzing larger and more complex data sets that require real-time or near-real-time capabilities; however, this leads to a need for new data architectures, analytical methods, and tools. In this section, we will discuss the characteristics of big data, as well as the issues surrounding storing and analyzing such data.


Big data characteristics

             Big Data (English: Big Data, [ˈbɪɡ ˈdeɪtə]) is the study of approaches to analyzing, extracting information from, or otherwise processing data sets that are too large or complex for traditional data-processing software. In data analysis, studying many cases, or many attributes per case, leads to a higher degree of accuracy [2]. Big data is used in data analysis, data storage, information retrieval, visualization, and more. The basic concepts of big data are volume, variety, and velocity.

Big data is data whose scale, distribution, diversity, and/or timeliness require the use of new technical architectures, analytics, and tools in order to enable insights that unlock new sources of business value. Big data is characterized by three main features: volume, variety, and velocity. The volume of the data is its size, and how enormous it is. Velocity refers to the rate with which data is changing, or how often it is created. Finally, variety includes the different formats and types of data, as well as the different kinds of uses and ways of analyzing the data.
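The three characteristics above can be made concrete with a small sketch. The snippet below, using illustrative records that are purely an assumption (not data from this article), measures volume as serialized bytes, velocity as records per second over the observed window, and variety as the number of distinct formats in the stream:

```python
import json
from datetime import datetime

# Hypothetical records: a mix of structured and semi-structured entries
# with arrival timestamps, standing in for a real ingest stream.
records = [
    {"ts": "2024-01-01T00:00:00", "type": "json", "payload": {"user": 1, "amount": 9.99}},
    {"ts": "2024-01-01T00:00:01", "type": "text", "payload": "free-form log line"},
    {"ts": "2024-01-01T00:00:01", "type": "json", "payload": {"user": 2, "amount": 4.50}},
    {"ts": "2024-01-01T00:00:03", "type": "binary", "payload": b"\x00\x01\x02"},
]

# Volume: total size of the serialized payloads, in bytes.
def payload_size(p):
    return len(p) if isinstance(p, (bytes, str)) else len(json.dumps(p))

volume_bytes = sum(payload_size(r["payload"]) for r in records)

# Velocity: records per second over the observed time window.
times = [datetime.fromisoformat(r["ts"]) for r in records]
window = (max(times) - min(times)).total_seconds() or 1.0
velocity = len(records) / window

# Variety: the number of distinct formats present in the stream.
variety = len({r["type"] for r in records})

print(f"volume={volume_bytes} bytes, velocity={velocity:.2f} rec/s, variety={variety} formats")
```

In a real system each of these measures would be tracked continuously by the ingest layer rather than computed over a fixed list.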


Importance of managing big data

According to Manyika et al., there are five broad ways in which using big data can create value. First, big data can unlock significant value by making information transparent and usable at a much higher frequency. Second, as organizations create and store more and more transactional data in digital form, they can collect more accurate and detailed performance information on everything from product inventories to sick days. This can expose variability in the data and boost performance.

Third, big data allows a narrower segmentation of customers and therefore much more precisely tailored products or services to meet their needs and requirements. Fourth, sophisticated analytics performed on big data can substantially improve decision making. Finally, big data can also be used to improve the development of the next generation of products and services. For example, manufacturers are currently using data obtained from sensors embedded in products to create innovative after-sales service offerings such as proactive maintenance, preventive measures that take place before a failure occurs or is even noticed by the customer.
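The customer segmentation mentioned above is often performed with clustering. The following is a minimal k-means sketch using only the Python standard library; the two features (annual spend, visits per month) and the customer values are illustrative assumptions, not real data:

```python
import random

# Toy customer data: (annual spend, visits per month). Illustrative only.
customers = [
    (120.0, 2), (135.0, 3), (110.0, 2),      # low-spend, infrequent
    (980.0, 14), (1010.0, 15), (950.0, 13),  # high-spend, frequent
]

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k initial centroids at random
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[idx].append(p)
        # Recompute each centroid as the mean of its assigned points.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = tuple(sum(dim) / len(cl) for dim in zip(*cl))
    return centroids, clusters

centroids, clusters = kmeans(customers, k=2)
for c, members in zip(centroids, clusters):
    print(f"segment centered at {c} with {len(members)} customers")
```

Each resulting segment can then be targeted with a tailored offering, which is the business value the paragraph above describes.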


Big data analytics

Big data analytics is where advanced analytic techniques operate on big data sets. Analytics based on large data samples reveals and leverages business change. However, the larger the data set, the more difficult it becomes to manage [14]. Sophisticated analytics can substantially improve decision making, minimize risks, and unearth valuable insights from the data that would otherwise remain hidden. Sometimes decisions do not necessarily need to be automated, but rather augmented, by analyzing huge, entire datasets using big data techniques and technologies instead of just the smaller samples that individuals with spreadsheets can handle and understand. Therefore, decision making may never be the same. Some organizations are already making better decisions by analyzing entire datasets from customers, employees, or even sensors embedded in products. In this section, we will discuss the data analytics lifecycle, followed by some advanced data analytics methods, as well as some possible tools and methods for big data analytics in particular.
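One way to analyze an entire dataset rather than a spreadsheet-sized sample is to process it as a stream in constant memory. As a sketch of this idea, Welford's online algorithm below maintains a running mean and variance one record at a time; the simulated number stream is an illustrative assumption:

```python
# Welford's online algorithm: running mean and variance in one pass,
# using constant memory, so the entire dataset can be analyzed.
def running_stats(stream):
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # accumulates the sum of squared deviations
    variance = m2 / n if n else 0.0
    return n, mean, variance

# Simulate a large stream without ever materializing it in memory.
n, mean, var = running_stats(x % 10 for x in range(1_000_000))
print(f"n={n}, mean={mean:.3f}, variance={var:.3f}")
```

The same pattern scales to arbitrarily large inputs, since nothing beyond the three accumulators is kept in memory.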



Advanced data analytics methods

With the evolution of technology and the increased multitudes of data flowing in and out of organizations daily, there has become a need for faster and more efficient ways of analyzing such data. Having piles of data on hand is no longer enough to make efficient decisions at the right time. As Oueslati and Akaichi acknowledged, the acquired data must not only be accurate, consistent, and sufficient enough to base decisions upon, but it must also be integrated and subject-oriented, as well as nonvolatile and time-variant. New tools and algorithms have been designed to aid decision makers in automatically filtering and analyzing these diverse pools of data.

Data analytics is the process of applying algorithms in order to analyze sets of data and extract useful and unknown patterns, relationships, and information. Furthermore, data analytics are used to extract previously unknown, useful, valid, and hidden patterns and information from large data sets, as well as to detect important relationships among the stored variables. Thus, analytics have had a significant impact on research and technologies, since decision makers have become more and more interested in learning from previous data, thus gaining competitive advantage.
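A classic example of extracting hidden patterns is frequent-itemset mining, the building block of association-rule analysis. The sketch below counts item pairs that co-occur across transactions and keeps those above a support threshold; the transactions are illustrative assumptions, not data from the article:

```python
from itertools import combinations
from collections import Counter

# Toy market-basket transactions. Illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

min_support = 3  # a pair must appear in at least 3 transactions

# Count every item pair that co-occurs within a transaction.
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

frequent_pairs = {p: n for p, n in pair_counts.items() if n >= min_support}
for pair, n in sorted(frequent_pairs.items()):
    print(f"{pair} appears in {n} of {len(transactions)} transactions")
```

From such frequent pairs, association rules (e.g., customers who buy bread also tend to buy milk) can be derived by comparing pair counts to single-item counts.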

Nowadays, people don’t just want to collect data, they want to understand the meaning and importance of the data, and use it to aid them in making decisions. Data analytics have gained a great amount of interest from organizations throughout the years, and have been used for many diverse applications. Some of the applications of data analytics include science, such as particle physics, remote sensing, and bioinformatics, while other applications focus on commerce, such as customer relationship management, consumer finance, and fraud detection.


Big Data Challenges

Several issues will have to be addressed in order to capture the full potential of big data. Policies related to privacy, security, intellectual property, and even liability all need to be addressed in a big data world. Organizations need to put the right talent and technology in place, as well as additionally structure workflows and incentives to optimize the use of big data. Access to data is critical, and companies will need to increasingly integrate information from multiple data sources, often from third parties or different locations. Furthermore, questions on how to store and analyze data with volume, variety, and velocity have arisen, and current research lacks the capability for providing an answer.

Consequently, the biggest problem has become not only the sheer volume of data, but the fact that the type of data companies must deal with is changing. In order to accommodate for the change in data, the approaches for storing data have changed throughout the years. Data storage started with data warehouses, data marts, data cubes, and then moved on to master data management, data federation and other techniques such as in-memory databases. However, database suppliers are still struggling to cope with enormous amounts of data, and the emergence of interest in big data has led to a need for storing and managing such large amounts of data.

Several consultants and organizations have tried to come up with solutions for storing and managing big data. Thus, Longbottom [1] recommends that organizations carefully research the following aspects of a suggested big data solution before adopting one:

• Can this solution deal with different data types, including text, image, video and sound?

• Can this solution deal with disparate data sources, both within and outside of the organization's environment?

• Will the solution create a new, massive data warehouse that will only make existing problems worse, or will it use metadata and pointers to minimize data replication and redundancy?

• How can, and will, the solution present findings back to the organization, and will this only be based on what has already happened, or can it predict with some degree of certainty what may happen in the future?

• How will the solution deal with back-up and restore of data? Is it inherently fault tolerant, and can more resources easily be applied to the system as required?

Thus, one of the challenges of big data is finding or creating a solution that meets the above criteria with regard to the organization.

Big Data Development trends in Kazakhstan

      Currently, government agencies in Kazakhstan are working to implement the concepts of Big Data and Open Data.

       For example, the Ministry of Information and Communications of the Republic of Kazakhstan plans to introduce new technologies for the storage and processing of large amounts of information. The agency also plans to introduce new Big Data technologies in Kazakhstan.

       The ministry has stated that it faces the major task of bringing the country's ICT sector to the forefront in the world.

       Big data technology is used in healthcare, banking, retail, telecommunications, and other sectors. Its use in many industries, many of which are standardized or actively being standardized at the national as well as international levels, raises the issue of big data standardization.

       Currently, several major standardization institutions, including the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC) [3], the International Telecommunication Union (ITU) [4], the British Standards Institution (BSI), and the US National Institute of Standards and Technology (NIST) [5], are involved in the development of big data standards.

        The joint technical committee of the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC JTC 1) has created three working groups aimed at standardization in the following areas: big data (ISO/IEC JTC 1/WG 9 "Big Data"), the Internet of Things (ISO/IEC JTC 1/WG 10 "Internet of Things"), and smart cities (ISO/IEC JTC 1/WG 11 "Smart Cities").

        Within ISO, the Big Data Working Group leads the big data standardization program and identifies gaps in standardization. It develops basic standards, including a reference architecture.

        Today, the international standardization working group ISO/IEC JTC 1/WG 9 "Big Data" is developing the following draft international standards: a set of standards for a big data reference architecture (the ISO/IEC 20547 series) and a standard for terms and definitions (ISO/IEC 20546). These projects are at the draft preparation stage (stage code 30).

        There are several areas of big data activity in the ITU.  ITU documents name the following areas of activity:

 - scalable network infrastructure with very high reliability, flexibility, high throughput, and low interference;

 - merging and deleting data sets.

        At the end of 2015, ITU members agreed on an international standard for big data. The new ITU-T standard, Y.3600 [6], is called "Big data - cloud computing based requirements and capabilities".

        The standard describes how cloud computing systems can be used to provide big data services. Most importantly, it describes the requirements for cloud-computing-based big data: data collection, processing, and storage requirements; analysis, visualization, and management requirements; and data security and protection requirements.

         In accordance with the state program "Information Kazakhstan - 2020", work is underway this year within the framework of national standardization to develop the national standard (MS RK) "Big data. Cloud computing based requirements and capabilities" (based on Y.3600: "Big data - cloud computing based requirements and capabilities").

         The MS RK "Big data. Cloud computing based requirements and capabilities" project provides for the use of cloud computing to address the problems of big data usage.


     In addition, proposals were sent to the State Standardization Plan to harmonize the following data processing standards in RC 34 "Information Technology", on the basis of JSC National Infocommunications Holding "Zerde".

     For 2017:

    - MS RK "Information technology. Data processing centers. Key performance indicators. Part 1: Overview and general requirements", on the basis of ISO/IEC 30134-1:2016;

    - MS RK "Information technology. Data processing centers. Key performance indicators. Part 2: Power usage effectiveness (PUE)", on the basis of ISO/IEC 30134-2:2016;

    - MS RK "Information technology. Data processing centers. Key performance indicators. Part 3: Renewable energy factor (REF)", on the basis of ISO/IEC 30134-3:2016.

      For 2018-2020:

     - MS RK "Information technology. Sustainability for and by information technology. Smart data centre resource monitoring and control", on the basis of ISO/IEC 19395:2015;

     - MS RK "Information technology. Telecommunications and information exchange between systems. High-level data link control (HDLC) procedures", on the basis of ISO/IEC 13239:2002;

     - MS RK "Information technology. Document processing and related communication. Conformance testing for Standard Generalized Markup Language (SGML) systems", on the basis of ISO/IEC 13673:2000;

     - MS RK "Information technology. Document description and processing languages. HyperText Markup Language (HTML)", on the basis of ISO/IEC 15445:2000;

      - MS RK "Information technology. Radio frequency identification (RFID) for item management. Software system infrastructure. Part 2: Data management".



In this paper we examined the concept of big data, as well as some of its different opportunities and challenges. In the first section of the paper, we discussed big data in general, as well as some of its common characteristics. After looking at the importance of big data and its management within organizations, and the value it can add, we discussed big data analytics as an option for data management and the extraction of essential information from such large amounts of data. Association rules, clustering, and decision trees were covered.

However, with enormous amounts of data, performing typical analytics is not enough. Thus, in the following section, we discussed Hadoop, which consists of the HDFS and MapReduce. These facilitate the storage of big data as well as its parallel processing. Finally, we covered the challenges which arise when dealing with big data and still need further research. Future research can include applying the big data analytics methods discussed to real business cases within organizations facing big data problems. Furthermore, the previously discussed challenges related to big data can be tackled or studied in more detail.
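The MapReduce model mentioned above can be illustrated with a toy word-count sketch: a map phase emits (key, value) pairs, a shuffle groups them by key, and a reduce phase aggregates each group. A real Hadoop job distributes these phases across a cluster; the documents below are illustrative assumptions:

```python
from collections import defaultdict

# Toy input "documents". Illustrative only.
documents = [
    "big data needs big storage",
    "big data needs fast analysis",
]

def map_phase(doc):
    # Emit one (word, 1) pair per word occurrence.
    for word in doc.split():
        yield word, 1

def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Aggregate each key's values into a single count.
    return key, sum(values)

pairs = (pair for doc in documents for pair in map_phase(doc))
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(counts)  # word frequencies across all documents
```

Because map and reduce operate independently per record and per key, the framework can run them in parallel over partitions of a very large dataset.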

Thus, we have seen that big data is a very important concept nowadays, which comes with many opportunities and challenges. Organizations need to seize these opportunities and face the challenges in order to get the most value and knowledge out of their massive data piles.