What is Data?
Everything in this world is data. Have you ever wondered how Facebook reliably pops up a reminder to wish your connections a happy birthday? Or how Google stores such enormous volumes of data, and how its web index matches queries with accurate results? That is where big data comes into play.
What is Big Data?
Big data refers to a large amount of data from various sources in different formats, such as audio, video, and text files. Traditional data processing systems are incapable of dealing with such huge amounts of unstructured data. In other words, big data is data so large that it cannot be stored and processed using the traditional approach within a given time frame. It is an important asset that can yield many benefits.
Big data is broadly classified into three types:
- Structured
- Unstructured
- Semi-structured
- Structured Data – Data that is already stored in a database in an ordered manner. It accounts for about 20% of the total existing data.
Examples: Data held in MS Access, Excel, SQL Server, or a data warehouse.
- Unstructured Data – The opposite of structured data: it has no clear format or storage structure. It accounts for about 80% of the total existing data. Until recently, there was not much one could do with unstructured data except store it and analyze it.
Examples: Social media likes, comments, tweets, followers, audio, video, geospatial data.
- Semi-structured Data – Information that is not in a traditional database format but has some organizational properties that make it easier to process.
Example: Log files, XML data, web searches, sensor data.
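To make the three types concrete, here is a minimal Python sketch. The sample values and the helper `field_names` are made up for illustration and do not come from any real system. It shows that structured rows share fixed columns, semi-structured records carry their own keys but can differ in shape, and unstructured text has no schema at all.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (hypothetical sample).
structured_csv = "id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing keys, but records may differ in shape.
semi_structured = '[{"id": 1, "tags": ["a", "b"]}, {"id": 2, "device": {"temp": 21.5}}]'
records = json.loads(semi_structured)

# Unstructured: free text with no schema; without extra tooling we can
# only do crude processing such as splitting on whitespace.
unstructured = "Loved the new phone, battery lasts all day!"
words = unstructured.lower().split()


def field_names(record):
    """Return the sorted key names of one semi-structured record."""
    return sorted(record.keys())


print(rows[0]["name"])                     # a column lookup works on every row
print([field_names(r) for r in records])   # keys vary from record to record
print(len(words))                          # only a rough word count is available
```

The structured rows can be queried by column name right away, while the JSON records need a per-record check of which fields exist, and the free text needs further processing before any fields can be extracted.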
Big Data Characteristics – The 5 V’s
There is no place where big data does not exist, and it has been skyrocketing in the past few years. It is estimated that 80% of the total existing data was created in the previous 2 years. The characteristics of big data can be defined by the 5 V’s, which are interconnected.
- Volume – Volume refers to the terabytes, petabytes, exabytes, and zettabytes of data generated by systems or the internet, which generally need large storage. Distributed systems are currently used to store data in several locations and bring it together through a software framework such as Hadoop.
- Velocity – Velocity refers to the speed at which data is generated and processed.
- Variety – Variety refers to the structured, unstructured, and semi-structured data gathered from multiple sources. Much recent data takes the form of photos, videos, audio, GIFs, and more, making about 80% of the total data completely unstructured.
- Veracity – Veracity means the degree of reliability offered by the data. Since a huge amount of data is unstructured, big data systems need an alternative way to filter it efficiently and translate it into data that is usable.
- Value – Value is not just about the amount of data being stored or processed. It is the amount of valuable, reliable, and relevant data that needs to be stored, processed, and analyzed to find insights.
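The Volume point mentions storing data in several locations and bringing it together through a framework such as Hadoop. The sketch below illustrates that idea only in spirit. It is not Hadoop's API: plain Python lists stand in for storage nodes, and the names `partitions`, `map_partition`, and `merge_counts` are hypothetical. Each "node" counts words in its own partition, and the partial results are then merged into one total.

```python
from collections import Counter
from functools import reduce

# Hypothetical "nodes": each inner list is the partition stored at one location.
partitions = [
    ["big data", "data lake"],
    ["big cluster"],
    ["data data"],
]


def map_partition(lines):
    """Count words locally, as each node would on its own partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def merge_counts(left, right):
    """Combine two partial word counts into one."""
    return left + right


# Step 1: every partition is processed independently.
partial_counts = [map_partition(p) for p in partitions]

# Step 2: the partial results are brought together into a single answer.
total = reduce(merge_counts, partial_counts, Counter())

print(total["data"])  # occurrences of "data" across all partitions
```

Because each partition is processed on its own, the first step could run on separate machines in a real distributed system. Only the small partial counts need to travel over the network to be merged.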