  • Subject Name : IT Computer Science

Table of Contents


Research question

Literature review

Search techniques

Repeatable search processes

Gaps in literature



Introduction to Big Data Implementation

This era can be described as the decade of digitization, globalization, sensors of many kinds, wearable digital devices, and the rise of social networking platforms. As a result, internet use is enormous and gigantic quantities of data are generated every day. Over the past two decades there has been a remarkable escalation in data in every field, including education, government, and lifestyle. This project therefore focuses on Big Data implementation: its background context, research techniques, and the tools used to analyze an increasingly data-centric world. Big Data is a term used globally for data sets so large and complex that conventional application software is inadequate to store or process them. Before 2020, when data volumes and analytics demands were low, transaction-processing application software was what everyone used.

The amount of information technology data in 2020 is extremely large. A research study by the International Data Corporation claims that in 2020 the total sum of data will exceed thirty-five trillion GB, a quantity comparable in vastness to the sky of our universe. Storing structured and unstructured data with little delay is therefore a challenge, and traditional software is of little use at this volume. That is the main reason IT systems worldwide are adopting Big Data tools and products (Wolfert et al., 2017). Global companies have started selling these Big Data analytics tools and products to governments and organizations. As a result, these tools continue to gain recognition globally as data keeps growing. Examples include Hadoop, Apache Spark, and Talend.

Research question

This research is guided by the following questions.

1. What are the prospects of Big Data implementation?

2. Why did big data evolve over time, and how did it evolve?

3. What tools and transformations are used to implement big data analytics?

Literature Review of Big Data Implementation

Big data is associated with developments in social media and weblogs (Oussous et al., 2018). The area covers both business analytics and business intelligence. A large amount of data is generated every day, and the amount keeps increasing. Big data analysis is therefore important for managing data on the internet and for maintaining the security of private data. In recent years, in-memory databases have made it possible to hold large amounts of data at once and to manage data collected from the internet. A large number of technologies handle internet data in various ways (Oussous et al., 2018). They include the following:

1. Distributed Computing:

Big data is distributed across many computing systems, typically using open-source technologies.

2. Flash Memory:

Flash memory is solid-state storage, used in solid-state drives, that lets computers read and write data quickly.

3. Mobile Devices:

A mobile device works like a computer and likewise produces a huge amount of data (Zhou et al., 2016).

4. Cloud Computing:

This modern technology provides facilities such as storage, databases, and services. It is considered a major solution for maintaining big data.

Much of this data is unstructured and comes from sources such as websites, weblogs, text files, emails, images, and social media posts.

Renowned big data companies such as Google, Facebook, and Twitter use big data applications efficiently and effectively (Erevelles et al., 2016).

Analytics covers evaluating, exploring, and visualizing data and communicating current data trends. Several approaches have evolved in big data analytics. They are as follows:

Descriptive analytics: This technique is the simplest. It summarizes what the data or database contains (Erraissi et al., 2017).

Predictive analytics: This more advanced technique builds models on top of descriptive analysis to predict the values of variables.

Diagnostic analysis: This technique is useful for monitoring the health of a system and for examining past data.

Prescriptive analysis: It uses methods from decision science, management science, and mathematics to recommend how resources should be allocated (Daki et al., 2017).
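The difference between the descriptive and predictive approaches can be illustrated with a minimal Python sketch. The daily volumes below are hypothetical values chosen for illustration, and the straight-line fit is only one simple predictive model among many; it is not taken from the cited studies.

```python
from statistics import mean

# Hypothetical daily data volumes (in GB) for five consecutive days.
days = [1, 2, 3, 4, 5]
volume_gb = [10.0, 12.0, 14.0, 16.0, 18.0]


def describe(values):
    """Descriptive analytics: summarize what the data already contains."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def fit_line(xs, ys):
    """Predictive analytics: fit y = slope * x + intercept by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


summary = describe(volume_gb)                 # what happened
slope, intercept = fit_line(days, volume_gb)  # model of the trend
forecast_day_6 = slope * 6 + intercept        # what is likely to happen next
print(summary, forecast_day_6)
```

The summary only restates the observed data, while the fitted line extrapolates beyond it, which is the essential distinction between the two approaches.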

Two terms commonly used in analytics are:

  • Business Analytics:

This term is defined as the combination of prescriptive, descriptive, and predictive analysis. Its aim is to present these analyses in a new, creative, and informative way.

  • Business Intelligence:

It refers to reports of business analysis and information. It also identifies the areas where business action is needed (Erraissi et al., 2017).

Big data analytics software is built to run on a big data platform. The main platforms are:

  • The Hadoop ecosystem:

It is open-source software from Apache. It can handle both structured and unstructured data efficiently and reliably.

  • Common Big data analytic tool:

It applies machine learning methods and data mining algorithms (Erevelles et al., 2016). It is helpful for analyzing both small-scale and large-scale data.

With so much data available, companies are likely to face issues in handling and managing it. Looking back a few years, the rate at which data accessibility has increased is startling: companies now have access to twice the data they had previously. They hold data on almost everything, from consumer expectations and favourite restaurants to the reactions of a nation's youth to a given trend or issue (Attaran et al., 2018). The problem is not that so much detailed data is generated but that managing it is hard, because it will soon be difficult to compute, save, and retrieve such gigantic amounts. The volume in question covers both structured and unstructured data, so the increase also includes data stored as audio, video, and social media content, as well as data held on smartphones. Companies therefore use certain databases to help store and analyze the data generated.

The most common among them are Cassandra, CouchDB, Greenplum Database, HBase, MongoDB, and Vertica, and to deal with data volumes companies rely on Hadoop. While these databases help store large amounts of data, data privacy and real-time data availability remain issues (Attaran et al., 2018). Having access to the data is not the end of the challenge. Data is updated every second, and a company without a dedicated team to track it, or without the tools and technologies to keep up with the data generated each minute, may lose important data that could have yielded valuable insights and improved its decision-making. Most companies these days therefore use the Hadoop framework for data analytics and for managing large amounts of data. While Hadoop helps with parallel processing of data, it faces limits in computation (CPU) power and in data storage (Verma et al., 2018).

The analysis of Big Data in recent times has coincided with a considerable decrease in the cost of storage and computing power, with cost-effective and flexible data centres, and with the development of data frameworks that further enhance distributed computing over large amounts of data through flexible and easy parallel processing (Verma et al., 2018). Data security is a key concern. With so much structured and unstructured data generated every minute from sources across the globe, companies should adopt data privacy policies that secure the storage and retrieval of data and prevent it from being hacked or accessed by unauthorized individuals who might tamper with it (Scutari et al., 2019).
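The parallel processing that Hadoop provides follows the MapReduce model, which can be imitated in a few lines of plain Python. The sketch below is a single-process illustration only: it does not use Hadoop's API, and the function names and sample chunks are hypothetical.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_chunks):
    """Shuffle: group all emitted values by key across every chunk."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Each chunk stands in for a block of a file stored on a different node.
chunks = ["big data tools", "big data platforms", "data volume"]

# In a real cluster each map call could run on a separate machine;
# here they run one after another in the same process.
mapped = [map_phase(chunk) for chunk in chunks]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```

Because each map call touches only its own chunk, the work can be spread across machines, and only the grouped intermediate results need to be brought together for the reduce step.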

Search Techniques

Big Data requires specialized search mechanisms and systems, because the large volume of data mined makes traditional indexing difficult. Data is usually spread across several servers, and as a result traditional database management systems cannot perform otherwise simple tasks on it (Scutari et al., 2019). To solve these issues, many specialized algorithms have been developed to make operations over these large volumes of data faster.

Although good software for Big Data search is scarce, a few popular options exist. One example is the Hadoop Aggressive Indexing Library (HAIL), developed by Professor Jens Dittrich of Saarland University and his team. It is an enhancement to the Hadoop Distributed File System (HDFS) and can increase performance up to 64 times over non-indexed searches (Huang et al., 2018). HAIL typically starts acting on data at upload time and converts it to a binary layout called Partition Attributes Across (PAX). It picks row-based chunks of data from HDFS and arranges them into columnar-orient
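The row-to-column rearrangement behind a PAX-style layout can be sketched in Python. This is an illustrative toy under stated assumptions, not HAIL's implementation: the function name `to_pax_blocks`, the block size, and the sample records are hypothetical, and every row is assumed to carry the same fields.

```python
def to_pax_blocks(rows, rows_per_block):
    """Split rows into blocks, then store each block column by column.

    Rows stay partitioned horizontally (a block holds whole records),
    but inside a block the values of one attribute sit next to each other.
    """
    blocks = []
    for start in range(0, len(rows), rows_per_block):
        block_rows = rows[start:start + rows_per_block]
        columns = {}
        for row in block_rows:
            for name, value in row.items():
                columns.setdefault(name, []).append(value)
        blocks.append(columns)
    return blocks


# Hypothetical row-oriented records.
rows = [
    {"id": 1, "city": "Sydney", "bytes": 120},
    {"id": 2, "city": "Perth", "bytes": 300},
    {"id": 3, "city": "Hobart", "bytes": 80},
]

blocks = to_pax_blocks(rows, rows_per_block=2)
print(blocks)
```

A query that needs only one attribute, such as `bytes`, can then read a single list per block instead of scanning every full record.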

Other tools worth citing are Apache Lucene, Apache Solr, and Elasticsearch. Lucene is a search library released under the Apache License. It can index at a rate of up to 100 GB per hour and provides ranked, fielded, and range search results. Solr is scalable and fault-tolerant and can perform near real-time indexing along with dynamic clustering, faceted search, and geospatial search (Huang et al., 2018). Elasticsearch is built on Lucene and is JSON based. It can search through multiple indices and offers easy rebalancing and rerouting.
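Search engines in the Lucene family are built around an inverted index, a map from each term to the documents that contain it. The following Python sketch shows the idea in its simplest form. It is not Lucene's API: the function names and sample documents are hypothetical, and real engines add tokenization rules, ranking, and on-disk storage.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map every lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents keyed by id.
documents = {
    1: "Hadoop stores big data",
    2: "Solr indexes big data in near real time",
    3: "Elasticsearch exposes a JSON interface",
}

index = build_inverted_index(documents)
print(search(index, "big data"))
print(search(index, "json"))
```

The index is built once, and each query afterwards is answered by looking up a few sets rather than scanning the text of every document.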

Repeatable Search Processes

Any search technique should satisfy three R's: it should be reliable, repeatable, and relatable. With Big Data, repeatability becomes especially important given the massive volume of data being parsed (Adrian et al., 2017). The constant inflow of data means that processes must be repeatable and capable of providing similar results each time in order to remain reliable. The first step towards a repeatable search is data management. As long as the data is kept in order, the search algorithm will fetch similar results under similar circumstances. Data science is built on scientific methods and algorithms, along with repeatable processes. Extracting knowledge from structured and unstructured data equally well and reliably is the crux of any Big Data system (Adrian et al., 2017). A system that cannot search and analyse efficiently is just an information dump of no significance, and for that efficiency alone it needs to be able to perform repeatable searches.

By understanding how best to structure and store data, and how best to query it, one can ensure that the system performs searches easily and repeats them with tweaked parameters just as quickly. In a large data set, it is most effective to group and index the elements that fit a certain criterion. Because such searches, or some version of them, are bound to recur, this simple technique can save the system a lot of time and effort (Schwartz and Lutz, 2018). The system can then perform searches and repeat the analysis without going through the massive data set all over again.
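The grouping-and-indexing technique described above can be sketched in Python. The records, field names, and helper functions are hypothetical and serve only to illustrate the idea of building an index once and reusing it for repeated or tweaked queries.

```python
from collections import defaultdict

# Hypothetical records; in practice these would come from a large data store.
records = [
    {"id": 1, "region": "APAC", "size_mb": 40},
    {"id": 2, "region": "EMEA", "size_mb": 75},
    {"id": 3, "region": "APAC", "size_mb": 90},
    {"id": 4, "region": "EMEA", "size_mb": 10},
]


def build_group_index(items, key):
    """Group records by one field so later queries skip unrelated records."""
    index = defaultdict(list)
    for item in items:
        index[item[key]].append(item)
    return index


def query(index, group, min_size_mb):
    """Return sorted ids in one group whose size meets the threshold."""
    matches = index.get(group, [])
    return sorted(r["id"] for r in matches if r["size_mb"] >= min_size_mb)


by_region = build_group_index(records, "region")  # built once

first = query(by_region, "APAC", 50)
again = query(by_region, "APAC", 50)    # same inputs, same result
tweaked = query(by_region, "APAC", 30)  # tweaked parameter, index reused
print(first, again, tweaked)
```

Sorting the result makes the output deterministic, so repeating the query on unchanged data always yields the same answer, and changing a parameter only requires re-reading one group rather than the whole data set.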

Gaps in Literature

From the mighty multinational to the small vendor producing day-to-day data, companies generate tons of data every second. To hold and save this amount of data they need high-capacity servers that can handle both internal and external data. Companies like Google, Facebook, and Twitter need very smooth data processing to handle input and output around the world. The studies done so far are somehow not being followed by companies. The quantity of data increases day by day (Schwartz and Lutz, 2018) and is likely to become as vast as the universe. Companies around the world need more general rules for reviewing the data held on their servers. In doing so a company may face several challenges. The limitations of holding very large data sets are listed below:

1) Junk information: servers are continuously overloaded by unstructured data such as pictures and videos. These files can slow down the processing speed of the server. It is very important for companies to remove junk files from the server regularly to keep data searches flowing freely (Mazumder, 2016).

2) Shortage of tech experts: to maintain the huge quantity of data, companies all over the world need technically skilled people who can manage it. A technology firm has to hire data experts and data scientists, which can easily increase the cost of the business.

3) Privacy policy of the company: keeping stakeholders' data is probably the biggest threat a company faces. Weak privacy and security controls are an open door for cyber attackers. Many large companies around the world have experienced a major data breach at least once (Mazumder, 2016). Privacy and security requirements change constantly, so protections must be strengthened regularly.

Recent studies of big data analysis report that it can improve the efficiency of aging manufacturing systems, detecting faults and helping to overcome difficulties. HDR can handle the various distance calculations and the concentration of norms. This methodology helps to isolate and find faults in the data. The goal of present studies is to collect the maximum amount of information and connect all sources at one point for better functioning (Zhou et al., 2016). Modern studies of big data implementation also try to make predictions about city dynamics. City dynamics require detailed study of the differences and correlations among the multiple signals of a city. Digitalization causes large amounts of data to flow through a city. To convert an urban city into a smart city with a better environment, data from various sources should be visualized and analyzed. In recent years big data has been applied in studies of various real-life situations to help businesses perform better (Wolfert et al., 2017).

Conclusion on Big Data Implementation

This research concludes that big data implementation is likely to bring remarkable benefits. The rapidly increasing amount of big data held by enterprises and social networks is changing how data is implemented. Big data therefore offers gigantic potential to improve how different organizations function and to provide a viable benefit. Implementation products and analytical tools such as Hadoop and cloud-based analytics are used in big data and have influenced the cost of technology associated with data warehouses. Big data is not, however, a substitute for a data warehouse, and it has limitations of its own. In this research, Big Data is associated with solutions developed through different products for commercial and non-commercial purposes. The trillions of data items stored in these big data products must be used well to make them a useful resource in the years ahead.

Reference List for Big Data Implementation

Adrian, C., Abdullah, R., Atan, R. and Jusoh, Y.Y., 2017, July. Factors influencing to the implementation success of big data analytics: A systematic literature review. In 2017 International Conference on Research and Innovation in Information Systems (ICRIIS) (pp. 1-6). IEEE.

Attaran, M., Stark, J. and Stotler, D., 2018. Opportunities and challenges for big data analytics in US higher education: A conceptual model for implementation. Industry and Higher Education, 32(3), pp.169-182.

Daki, H., El Hannani, A., Aqqal, A., Haidine, A. and Dahbi, A., 2017. Big Data management in smart grid: concepts, requirements and implementation. Journal of Big Data, 4(1), pp.1-19.

Erevelles, S., Fukawa, N. and Swayne, L., 2016. Big Data consumer analytics and the transformation of marketing. Journal of Business Research, 69(2), pp.897-904.

Erraissi, A., Belangour, A. and Tragha, A., 2017. A Big Data Hadoop building blocks comparative study. International Journal of Computer Trends and Technology. Accessed June, 18.

Huang, C.K., Wang, T. and Huang, T.Y., 2018. Initial Evidence on the Impact of Big Data Implementation on Firm Performance. Information Systems Frontiers, pp.1-13.

Mazumder, S., 2016. Big data tools and platforms. In Big Data Concepts, Theories, and Applications (pp. 29-128). Springer, Cham.

Oussous, A., Benjelloun, F.Z., Lahcen, A.A. and Belfkih, S., 2018. Big Data technologies: A survey. Journal of King Saud University-Computer and Information Sciences, 30(4), pp.431-448.

Schwartz, B. and Lutz, W., 2018. Personalized predictions and clinical support tools based on big data: Development and implementation. Big Data in Psychology 2018, Trier, Germany.

Scutari, M., Vitolo, C. and Tucker, A., 2019. Learning Bayesian networks from big data with greedy search: computational complexity and efficient implementation. Statistics and Computing, 29(5), pp.1095-1108.

Verma, S., Bhattacharyya, S.S. and Kumar, S., 2018. An extension of the technology acceptance model in the big data analytics system implementation environment. Information Processing & Management, 54(5), pp.791-806.

Wolfert, S., Ge, L., Verdouw, C. and Bogaardt, M.J., 2017. Big data in smart farming–a review. Agricultural Systems, 153, pp.69-80.

Zhou, K., Fu, C. and Yang, S., 2016. Big data driven smart energy management: From big data to big insights. Renewable and Sustainable Energy Reviews, 56, pp.215-225.
