Data Handling Using Pandas – I

What are Python Libraries

Python libraries contain collections of ready-made modules that allow us to perform many actions without writing detailed programs for them. Each library in Python contains a large number of modules that one can import and use.

What is NumPy?

NumPy (Numerical Python), Pandas and Matplotlib are three well-established Python libraries for scientific and analytical use.

These libraries allow us to manipulate, transform and visualise data easily and efficiently.

NumPy uses a multidimensional array object and has functions and tools for working with these arrays. Elements of an array stay together in memory, hence, they can be quickly accessed.
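As a minimal sketch of the idea above (the variable names marks, table and doubled are chosen only for illustration), the following creates one-dimensional and two-dimensional arrays and applies one operation to every element:

```python
import numpy as np

# A one-dimensional ndarray built from a Python list
marks = np.array([25, 35, 15, 40])

# A two-dimensional ndarray with 2 rows and 3 columns
table = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(marks.ndim, marks.shape)   # 1 (4,)
print(table.ndim, table.shape)   # 2 (2, 3)

# Arithmetic is applied to every element of the array at once
doubled = marks * 2
print(doubled)                   # [50 70 30 80]
```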

What is PANDAS?

PANDAS (PANel DAta) is a high-level data manipulation tool used for analysing data. It is very easy to import and export data using the Pandas library which has a very rich set of functions. It gives us a single, convenient place to do most of our data analysis and visualisation work.

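Pandas has three important data structures, namely Series, DataFrame and Panel, which make the process of analysing data organised, effective and efficient. Note that Panel has been removed from recent versions of Pandas, so Series and DataFrame are the two structures in everyday use.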

What is Matplotlib?

The Matplotlib library in Python is used for plotting graphs and visualization. Using Matplotlib, with just a few lines of code we can generate publication-quality plots, histograms, bar charts, scatterplots, etc.

Data Structure in Pandas

A data structure is a collection of data values and the operations that can be applied to that data. It enables efficient storage, retrieval and modification of the data. Examples are ndarray in NumPy, and Series and DataFrame in Pandas.

Series

A Series is a one-dimensional array containing a sequence of values of any data type (int, float, list, string, etc.) together with data labels, which by default are numeric and start from zero.

The data label associated with a particular value is called its index. We can also assign values of other data types as index.

Index      Value
0          Armit
1          Pooja
2          Rama
3          Tanmay
4          Lakshit
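The table above could be built as a Series roughly as follows. This is only a sketch; the variable name names is an assumption made for illustration.

```python
import pandas as pd

# Values from the table above; the default index 0..4 is assigned automatically
names = pd.Series(["Armit", "Pooja", "Rama", "Tanmay", "Lakshit"])

print(names)
print(list(names.index))   # [0, 1, 2, 3, 4]
print(names[2])            # Rama  (value stored at index label 2)
```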

Creating Series Objects

A Series object can be created in many ways using the Series( ) method of the Pandas library.

To create or use a series, we first need to import the Pandas library.

Creating Empty Series Object

Use the Series( ) method with no parameters to create an empty Series object.

seriesObject = pandas.Series( )                 # Create an empty series object.

import pandas as pd

seriesObj1 = pd.Series( )

>>> seriesObj1

Series([], dtype: float64)

Note: older versions of Pandas report float64 as the default data type of an empty series; Pandas 2.0 and later report dtype: object instead.
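Because the default data type of an empty series differs between Pandas versions, one option is to state the data type explicitly. A minimal sketch (the name empty is illustrative):

```python
import pandas as pd

# Passing dtype explicitly gives the same result on every Pandas version
empty = pd.Series(dtype="float64")

print(empty.dtype)    # float64
print(len(empty))     # 0
print(empty.empty)    # True
```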

Creating Non-Empty Series Object

A non-empty Series object is created by specifying the parameters for data and index:

seriesObject = pd.Series(data, index=indValue)

where data is the data part of the Series object. It can be:

(a) Python sequence,

(b) A Scalar Value,

(c) An ndarray,

(d) A Python Dictionary 

Creation of Series from Python Sequence:-

A Series can be created using python sequence values.

#importing Pandas with an alias pd
import pandas as pd                                      

#creating a Series
series1 = pd.Series([100,200,300])                       

#display the series
print(series1)                                            

Output:

0    100
1    200
2    300
dtype: int64

#importing Pandas with an alias pd
import pandas as pd                     

#creating a Series with explicitly given index argument
series1 = pd.Series([100,200,300], index=['jan', 'feb', 'mar'])

#display the series
print(series1)

Output:

jan    100
feb    200
mar    300
dtype: int64

The output is shown in two columns: the index is on the left and the data values are on the right.

If we do not explicitly specify an index for the data values while creating a series, then by default indices range from 0 through N – 1. Here N is the number of data elements.
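The index of a series can be inspected through its index attribute. The sketch below (the names sales and labelled are illustrative) compares the default index with an explicitly given one:

```python
import pandas as pd

# No index given: labels run from 0 to N - 1
sales = pd.Series([100, 200, 300])
print(list(sales.index))       # [0, 1, 2]

# Index given explicitly: the labels are used as supplied
labelled = pd.Series([100, 200, 300], index=["jan", "feb", "mar"])
print(list(labelled.index))    # ['jan', 'feb', 'mar']

# A value is looked up through its label
print(labelled["feb"])         # 200
```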

Creation of Series from Scalar Value :-

A Series can be created using a scalar value.

The data given to Series( ) may be a single value, i.e. a scalar value. An index is normally given in this case: it can contain one or more values, and the scalar value (given as data) is repeated to match the length of the index. The index can be any sequence of numbers or labels. If no index is given, the result is a series with a single element at index 0.

seriesObject = pd.Series( scalarValue, index=[ ])

 import pandas as pd            # importing Pandas with an alias pd
 s = pd.Series(100)             # creating Series
 print(s)                       # display the series 

Output:

0    100
dtype: int64

 s = pd.Series(100, index=['a', 'b', 'c'])
 print(s)

Output:

a    100
b    100
c    100
dtype: int64

>>> score = pd.Series(12, index=range(4))
>>> score
Output:
0    12
1    12
2    12
3    12
dtype: int64
>>> attendance = pd.Series('Present', index=['Amrit', 'Tanmay'])
>>> attendance
Output:
Amrit     Present
Tanmay    Present
dtype: object
>>> absent = pd.Series("Absent", index=[1, 5, 8, 9])   # named absent so that the built-in abs( ) is not overwritten
>>> absent
Output:
1    Absent
5    Absent
8    Absent
9    Absent
dtype: object

Creation of Series from NumPy Arrays:-

A series can be created from a one-dimensional (1D) NumPy array.

Example:-

 import numpy as np                  # import NumPy with alias np
 import pandas as pd
 array1 = np.array([1,2,3,4])
 series3 = pd.Series(array1)
 print(series3) 

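Output:

0    1
1    2
2    3
3    4
dtype: int32

Note: the dtype of a series built from a NumPy array depends on the platform and NumPy version, so it may appear as int64 instead of int32.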

Note : When index labels are passed with the array, then the length of the index and array must be of the same size, else it will result in a ValueError.

>>> series5 = pd.Series(array1, index=["Jan", "Feb", "Mar"])

ValueError: Length of passed values is 4, index implies 3
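The length mismatch can be reproduced and handled as in the sketch below. The flag mismatch_raised is a hypothetical name used only to record that the error occurred.

```python
import numpy as np
import pandas as pd

array1 = np.array([1, 2, 3, 4])

# Four values but only three labels: Pandas raises a ValueError
try:
    pd.Series(array1, index=["Jan", "Feb", "Mar"])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)

# With one label per value the series is created normally
series5 = pd.Series(array1, index=["Jan", "Feb", "Mar", "Apr"])
print(series5)
```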

Creation of Series from Dictionary: – 

A Series can be created from a dictionary. The keys of the dictionary are used to construct the index of the Series, as shown in the following example. Here, the keys of the dictionary dict1 become the indices of the series.

>>> dict1 = {'India': 'NewDelhi', 'UK': 'London', 'Japan': 'Tokyo'}

>>> print(dict1)                               #Display the dictionary

{'India': 'NewDelhi', 'UK': 'London', 'Japan': 'Tokyo'}

>>> series8 = pd.Series(dict1)

>>> print(series8)                            #Display the series

India    NewDelhi
UK         London
Japan       Tokyo
dtype: object
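When an index is passed together with a dictionary, Pandas selects the values whose keys match the given labels and fills NaN for labels that are not keys. A small sketch, where the name series9 and the label 'France' are assumptions added for illustration:

```python
import pandas as pd

dict1 = {"India": "NewDelhi", "UK": "London", "Japan": "Tokyo"}

# Only the listed labels are kept, in the order given;
# 'France' is not a key of dict1, so its value is NaN
series9 = pd.Series(dict1, index=["Japan", "India", "France"])
print(series9)
```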

Creating Series Objects – Using Additional Features

Specifying / Adding NaN values in a Series Object

If the data is incomplete, a series object can still be created with some values missing. The missing data is represented by the legal empty value NaN (Not a Number), which is defined in the NumPy module as np.nan. (The older spelling np.NaN is not available in NumPy 2.0 and later.)

>>> import numpy as np
>>> sc = pd.Series([12.5, np.nan, 18.75, np.nan, 25.0])
>>> sc
Output:
0    12.50
1      NaN
2    18.75
3      NaN
4    25.00
dtype: float64
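Missing values can be located and are skipped by most summary methods. The following is a minimal sketch using the same data as above:

```python
import numpy as np
import pandas as pd

sc = pd.Series([12.5, np.nan, 18.75, np.nan, 25.0])

# isna() marks the missing entries with True
print(sc.isna().tolist())   # [False, True, False, True, False]

# count() counts only the values that are present
print(sc.count())           # 3

# sum() ignores NaN by default
print(sc.sum())             # 56.25
```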

Marks = [25, 35, 15, 40, 36]

Name = ["Amit", "Sonal", "Mohit", "Ramesh", "Pragya"]

>>> scObj = pd.Series(data=Marks, index=Name)

The data keyword is optional:

score1 = pd.Series(Marks, index=Name)

The index can also be generated with a loop (a list comprehension):

score2 = pd.Series(Marks, index=[n for n in Name])

score3 = pd.Series(Marks, index=[x for x in 'pqrst'])
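A loop is most useful when the labels have to be computed. In the sketch below, the labels R1 to R5 and the names roll_labels and score4 are hypothetical and used only for illustration:

```python
import pandas as pd

Marks = [25, 35, 15, 40, 36]

# Build one label per mark: R1, R2, ..., R5
roll_labels = ["R" + str(n) for n in range(1, len(Marks) + 1)]

score4 = pd.Series(Marks, index=roll_labels)
print(score4)
```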

Using a mathematical function / expression to create data array in Series

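The Pandas Series( ) method also accepts a mathematical expression or function that calculates the values of the data sequence.

import numpy as np
import pandas as pd
Marks = [25, 35, 15]
scObj = pd.Series(data=Marks * 2)       # a list multiplied by 2 is repeated twice
print(scObj)
Output:
0    25
1    35
2    15
3    25
4    35
5    15
dtype: int64
arr = np.array([25, 35, 15])
scObj = pd.Series(data=arr * 2)         # every element of the array is multiplied by 2
print(scObj)
Output:
0    50
1    70
2    30
dtype: int32

Note: Multiplying a Python list by 2 (Marks * 2) repeats the list, whereas multiplying a NumPy array by 2 (arr * 2) is a vectorised operation, i.e. it is applied to every element of the array. The dtype of a series built from a NumPy array depends on the platform and NumPy version, so it may appear as int64 instead of int32.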
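The difference between repeating a list and multiplying element by element can be checked side by side. This is a sketch only; the names repeated, doubled and also_doubled are illustrative.

```python
import numpy as np
import pandas as pd

Marks = [25, 35, 15]

# List * 2 repeats the list, so the series has six values
repeated = pd.Series(Marks * 2)

# Array * 2 multiplies every element, so the series has three values
doubled = pd.Series(np.array(Marks) * 2)

# A Series is vectorised as well: multiplying it by 2 doubles each value
also_doubled = pd.Series(Marks) * 2

print(repeated.tolist())       # [25, 35, 15, 25, 35, 15]
print(doubled.tolist())        # [50, 70, 30]
print(also_doubled.tolist())   # [50, 70, 30]
```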


Class 12 Informatics Practices (065) Notes

