Big Data Analytics: Technologies, Current Research Trends, and Future Needs
Over the past two decades, the volume of data generated in a single day has grown from gigabytes to petabytes (and beyond). Scalability has been among the attributes most affected by this massive increase in data volume, commonly referred to as big data, which continually challenges the underlying hardware and analytical approaches to scale accordingly. As a result, various big data technologies have emerged, each taking a different approach to scalability: for example, Hadoop (a distributed system), NoSQL databases (with BASE properties instead of traditional ACID properties), data lakes (huge data repositories), Apache Spark (a non-MapReduce platform), and in-memory databases. Over the past decade, the big data field has attracted significant attention from many industries, but few initiatives have been taken in academia to prepare young learners at the undergraduate and graduate levels. Because these technological paradigms keep changing, it is of utmost importance to prepare the next-generation workforce to learn, understand, research, and adapt to these changes. Forbes.com has estimated that the number of professionals trained in big data will soar by 28% by the end of 2020.
Given the current research trends and applications of big data mining, and its potential to keep growing, it is important for both industry and academia to adapt to this change. For example, automatically computing the sentiment of online reviews to distinguish negative from positive polarity has reshaped the analysis of customer satisfaction with products. Opinion mining of large volumes of reviews (e.g., Twitter tweets, Amazon reviews) has reshaped e-commerce and the operations of many service-based industries, where customers’ opinions about a product are analyzed to stay ahead of competitors. Opinion mining and sentiment analysis of large volumes of textual data have also been studied as a way to improve the next release of a product, using new ideas or product features mined from the reviews. Other observed research trends include generating automatic text summaries (from large textbooks or online reviews) to provide text highlights, analyzing tweets for stock prediction, and processing streaming data (e.g., from YouTube).
Only a few research institutions recognized the need for big data research, training, and analytics early on, adapted to it quickly, and demonstrated the importance of big data analytics. Although this adaptation is highly desirable in the current academic curriculum, not all higher education institutions are ready for the change. The challenge most non-research higher education institutions face is a lack of research infrastructure, which becomes the bottleneck in adopting a big data curriculum. To that end, some cloud-based solutions for big data computing can help build this computing infrastructure, either at a very small investment or at no cost to educators and researchers. Available resources such as Cloudera, Hortonworks, Amazon EC2, Microsoft Azure, Databricks, and MongoDB provide access and extensive resources for adapting to this changing need and training young learners for the future.
Big data analytics, big data mining, big data research trends
Dr. Maninder Singh is an assistant professor in the Department of Computer Science & Information Technology at St. Cloud State University, St. Cloud, USA. Dr. Singh received his Ph.D. degree in computer science from North Dakota State University, USA, in 2019. His research interests are multidisciplinary and focus on big data analytics, graph mining, natural language processing, and empirical software engineering. Dr. Singh has published over 20 research articles in peer-reviewed conferences, journals, and book chapters at prominent national and international venues. His Ph.D. work focused on developing and validating effective methods and tools for improving and measuring the quality of software artifacts by applying supervised and unsupervised learning methods. His research also focuses on developing big data algorithms for graph processing that use vertical processing of data to address scalability issues. Dr. Singh has taught various graduate and undergraduate courses, including introduction to data mining, big data analytics, databases, object-oriented programming, artificial intelligence, and software quality.