5. Conclusions and Future Work

Impala is capable of handling vast amounts of data and is more efficient than Hive. Pig is not well suited to this data set; it is better suited to complex queries. Impala is intended to handle real-time ad hoc queries for data exploration and is well suited to executing SQL queries for interactive exploratory analytics on large datasets. The performance of Impala scales with the number of hosts. However, these results were obtained on low-cost hardware, and the performance of some of the software may change when better hardware is used. Performance also varies as the number of data nodes increases. Future work can therefore compare the performance of each software package in a better hardware environment and with a larger number of hosts.