Big Data Testing
“Enterprise data will grow 650% in the next five years. Also, through 2015, 85% of Fortune 500 organizations will be unable to exploit Big Data for competitive advantage.” – Gartner (mid-2015)
With the growth and complexity of data come challenges in putting it to use and making effective decisions. The ‘data’ we are talking about here could be anything from a short 3–5 word tweet (a few KB), to photos and videos uploaded to social sites (MBs), to a full-length movie on YouTube or other sites (GBs).
Now think beyond that, and we reach the Terabyte (about two years of nonstop MP3 listening is roughly 1 TB) and the Petabyte (about 100 years of television is roughly 1 PB; the photos on Flickr totalled some 60 PB by 2011).
Challenges in testing big data –
Gone are the days of Gigabytes (1 GB = 1024 MB). The big data landscape has already moved to Terabytes (1 TB = 1024 GB) and even Petabytes (1 PB = 1024 TB).
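The binary unit ladder above can be sketched in a few lines; the helper below simply computes each unit as a power of 1024.

```python
# Binary storage units, each 1024 times the previous one.
UNITS = ["KB", "MB", "GB", "TB", "PB"]

def to_bytes(value, unit):
    """Convert a value in the given binary unit to bytes."""
    exponent = UNITS.index(unit) + 1  # KB -> 1024**1, MB -> 1024**2, ...
    return value * 1024 ** exponent

# 1 PB expressed in GB: 1024 TB x 1024 GB/TB
print(to_bytes(1, "PB") // to_bytes(1, "GB"))  # -> 1048576
```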
Such huge volumes of data must be audited for fitness for the business purpose. Preparing test cases in this scenario has always been a challenge.
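What such an audit might look like in a test suite can be sketched as below. This is a minimal sketch, assuming hypothetical field names and a made-up null-ratio rule; real fitness checks would cover schemas, ranges, duplicates and reconciliation against source counts.

```python
# A minimal "fitness for purpose" audit over a batch of record dicts.
# Field names and thresholds here are hypothetical, for illustration only.
def audit(records, required_fields, max_null_ratio=0.05):
    """Return a dict mapping each failing field to a failure message."""
    failures = {}
    total = len(records)
    for field in required_fields:
        nulls = sum(1 for r in records if r.get(field) in (None, ""))
        if total and nulls / total > max_null_ratio:
            failures[field] = f"{nulls}/{total} records missing '{field}'"
    return failures

sample = [{"user_id": 1, "tweet": "hello"}, {"user_id": 2, "tweet": ""}]
# One of two tweets is empty (50% > 25%), so 'tweet' fails the check.
print(audit(sample, ["user_id", "tweet"], max_null_ratio=0.25))
```

In practice such checks would run over sampled partitions of the data rather than the full volume, which is exactly where test-case design gets hard.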
Given the shortage of expertise, organizations may need to invest in training and in developing automated solutions for big data. Moreover, this requires a mindset shift within testing units: testers will now have to be on par with developers in leveraging big data technologies.
Example (1). Suppose we need to generate a report from Twitter that captures, as percentages, the emotions people express on a topic. Sounds strange? Yes: understanding the ‘emotion’ factor in the available data is the challenge.
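A toy version of such an emotion-percentage report might look like the sketch below. The keyword lexicons are invented for illustration; a real system would use a trained sentiment model rather than word lists.

```python
# Toy emotion report: classify each tweet by keyword lookup, then
# convert the class counts into percentages. Lexicons are hypothetical.
from collections import Counter

POSITIVE = {"love", "great", "awesome"}
NEGATIVE = {"hate", "awful", "terrible"}

def emotion_report(tweets):
    counts = Counter()
    for text in tweets:
        words = set(text.lower().split())
        if words & POSITIVE:
            counts["positive"] += 1
        elif words & NEGATIVE:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    total = len(tweets)
    return {k: round(100 * v / total, 1) for k, v in counts.items()}

tweets = ["I love this topic", "This is awful",
          "No opinion here", "Great stuff"]
print(emotion_report(tweets))
# -> {'positive': 50.0, 'negative': 25.0, 'neutral': 25.0}
```

Testing such a pipeline means verifying not just the arithmetic but whether the classification itself is trustworthy, which is where the real difficulty lies.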
Example (2). There are websites that help us search for similar-sounding songs. Just imagine: the song’s metadata needs to be compared with millions of songs in the database, and the results need to be displayed in seconds.
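The matching step can be sketched under simplified assumptions: reduce each song to a small numeric feature vector (the features below are hypothetical) and rank the catalogue by distance. Production systems index millions of audio fingerprints with approximate nearest-neighbour structures to answer within seconds.

```python
# Naive similar-song search: rank catalogue entries by Euclidean
# distance between feature vectors. Feature values are made up.
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(query, catalogue, top_n=2):
    """catalogue: dict mapping song title -> feature vector."""
    ranked = sorted(catalogue, key=lambda title: distance(query, catalogue[title]))
    return ranked[:top_n]

catalogue = {
    "Song A": [120.0, 0.80, 0.30],  # hypothetical (tempo, energy, acousticness)
    "Song B": [121.0, 0.75, 0.35],
    "Song C": [60.0, 0.20, 0.90],
}
print(most_similar([119.0, 0.80, 0.30], catalogue))  # -> ['Song A', 'Song B']
```

A brute-force scan like this is exactly what does not scale to millions of songs, which is why testing must also cover response-time requirements, not just correctness.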
Big data is more than just size. Its significance lies in the 4 V’s: Volume (magnitude), Velocity (the rate at which data is generated and transported), Variety (type of data) and Veracity (accuracy and quality).
So, what are the key aspects that we need to focus on while dealing with big data testing?