Pandas is a fast, powerful, and easy-to-use open-source data analysis and manipulation tool for the Python programming language. It is built on top of NumPy and is often described as "NumPy with labels".
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). It consists of three principal components: the data, the rows, and the columns.
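To make the three components concrete, here is a minimal sketch that builds a small DataFrame and inspects its data, row labels, and column labels separately. The column names and values are made up for illustration.

```python
import pandas as pd

# A small DataFrame with a string column and an integer column (heterogeneous types)
df = pd.DataFrame({'Name': ['Tom', 'James'], 'Age': [20, 22]})

print(df.index)       # the row labels (a default RangeIndex here)
print(df.columns)     # the column labels
print(df.to_numpy())  # the underlying data as a 2-D NumPy array
print(df.dtypes)      # one dtype per column

# DataFrames are mutable: a new column can be added in place
df['Senior'] = df['Age'] > 21
print(df)
```

Each column keeps its own dtype, which is what "heterogeneous" means in practice: `Name` holds strings while `Age` holds integers.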
In practice, a DataFrame is usually created by loading a dataset from existing storage, such as an SQL database, a CSV file, or an Excel file. A DataFrame can also be created from lists, dictionaries, a list of dictionaries, and so on. Here are some of the ways to create a DataFrame.
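As a sketch of the "load from existing storage" case, the snippet below creates a throwaway in-memory SQLite database with Python's built-in sqlite3 module and reads a query result into a DataFrame with pd.read_sql_query. The table name `people` and its rows are hypothetical; with a real database you would pass your own connection and query.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory database standing in for real storage
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE people (name TEXT, age INTEGER)')
conn.executemany('INSERT INTO people VALUES (?, ?)',
                 [('Tom', 20), ('James', 22), ('Ayden', 10)])
conn.commit()

# Run a query and load the result set as a DataFrame
df = pd.read_sql_query('SELECT name, age FROM people ORDER BY age', conn)
conn.close()
print(df)
```

The column names of the DataFrame come from the query's result columns, and the rows arrive in the order the query returns them.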
Creating a DataFrame from a list: a DataFrame can be created from a single list or from a list of lists.
import pandas as pd
# list of strings
lt = ['This', 'is', 'Pandas']
# Calling DataFrame constructor on list
df = pd.DataFrame(lt)
print(df)
Output:
        0
0    This
1      is
2  Pandas
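The text above mentions a list of lists as well. A minimal sketch: each inner list becomes one row, and the optional `columns` argument supplies the column labels (the labels and values here are illustrative).

```python
import pandas as pd

# Each inner list is one row: [name, age]
rows = [['Tom', 20], ['James', 22], ['Ayden', 10]]

# Without columns=..., the columns would simply be labeled 0 and 1
df = pd.DataFrame(rows, columns=['Name', 'Age'])
print(df)
```

All inner lists should have the same number of items, one per column.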
Creating a DataFrame from a dict:
To create a DataFrame from a dict of ndarrays/lists, all the arrays must have the same length. If an index is passed, its length must equal the length of the arrays. If no index is passed, the index defaults to range(n), where n is the array length.
import pandas as pd
data = {'Name': ['Tom', 'James', 'Ayden'],
        'Age': [20, 22, 10]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
print(df)
Output:
    Name  Age
0    Tom   20
1  James   22
2  Ayden   10
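The rules about lengths and the index can be checked directly. This sketch reuses the same data and passes an explicit index (the labels 'a', 'b', 'c' are arbitrary), shows that lists of unequal length raise a ValueError, and also builds a DataFrame from a list of dictionaries, where each dict is one row and a missing key becomes NaN.

```python
import pandas as pd

data = {'Name': ['Tom', 'James', 'Ayden'],
        'Age': [20, 22, 10]}

# Explicit index: its length must match the length of the lists (3)
df = pd.DataFrame(data, index=['a', 'b', 'c'])
print(df)

# Lists of different lengths cannot be combined into one DataFrame
raised = False
try:
    pd.DataFrame({'Name': ['Tom', 'James'], 'Age': [20, 22, 10]})
except ValueError as err:
    raised = True
    print('ValueError:', err)

# A list of dictionaries: each dict becomes a row, keys become columns
records = [{'Name': 'Tom', 'Age': 20},
           {'Name': 'James'}]  # no 'Age' key, so that cell is NaN
df_records = pd.DataFrame(records)
print(df_records)
```

Because of the NaN, the `Age` column in `df_records` is stored as floating point rather than integer.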
Reading a CSV file:
import pandas as pd
# load the CSV file into a DataFrame
data = pd.read_csv('nobels.csv')
# display the first 5 rows
print(data.head())
# display the last 5 rows
print(data.tail())
# display the first 20 rows
print(data.head(20))
# list the names of all columns
print(data.columns)
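Because nobels.csv may not be at hand, here is a self-contained sketch that passes read_csv an in-memory CSV through io.StringIO (read_csv accepts file-like objects as well as paths). The column names and rows are hypothetical placeholders, not the contents of nobels.csv.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = """year,category,laureate
2001,Physics,Laureate A
2002,Chemistry,Laureate B
2003,Medicine,Laureate C
2004,Peace,Laureate D
2005,Literature,Laureate E
2006,Physics,Laureate F
"""

data = pd.read_csv(io.StringIO(csv_text))

print(data.head())         # first 5 rows (years 2001 to 2005)
print(data.tail(2))        # last 2 rows (years 2005 and 2006)
print(data.head(20))       # asks for 20 rows; only 6 exist, so all 6 are returned
print(list(data.columns))  # ['year', 'category', 'laureate']
```

The first line of the CSV is used as the header by default, so it supplies the column names.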