- Python Library
- Data Structure in Pandas – Series
- Creation of Series
- Series Attributes & Methods
- Accessing, Selecting, Slicing & Updating Series
- Series Mathematical Operation
- drop( ) and reindex( )
What are Python Libraries
Python libraries contain a collection of built-in modules that allow us to perform many actions without writing detailed programs for them. Each library in Python contains a large number of modules that one can import and use.
What is NumPy?
NumPy (Numerical Python), Pandas and Matplotlib are three well-established Python libraries for scientific and analytical use.
These libraries allow us to manipulate, transform and visualise data easily and efficiently.
NumPy uses a multidimensional array object and has functions and tools for working with these arrays. Elements of an array stay together in memory, hence, they can be quickly accessed.
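As a small illustration of the multidimensional array object described above, the sketch below builds a two-dimensional array and inspects it. The values and the name grid are made up for this example.

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns (sample values only)
grid = np.array([[1, 2, 3], [4, 5, 6]])

print(grid.ndim)    # number of dimensions
print(grid.shape)   # size along each dimension: (rows, columns)
print(grid.sum())   # one of NumPy's built-in functions, applied to all elements
```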
What is PANDAS?
PANDAS (PANel DAta) is a high-level data manipulation tool used for analysing data. It is very easy to import and export data using the Pandas library which has a very rich set of functions. It gives us a single, convenient place to do most of our data analysis and visualisation work.
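Pandas has three important data structures, namely Series, DataFrame and Panel, which make the process of analysing data organised, effective and efficient. Panel has been removed from recent versions of Pandas, so Series and DataFrame are the two in current use.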
What is Matplotlib?
The Matplotlib library in Python is used for plotting graphs and visualization. Using Matplotlib, with just a few lines of code we can generate publication-quality plots, histograms, bar charts, scatterplots, etc.
Data Structure in Pandas
A data structure is a collection of data values and operations that can be applied to that data. It enables efficient storage, retrieval and modification of the data. Examples are ndarray in NumPy, and Series and DataFrame in Pandas.
Series
A Series is a one-dimensional array containing a sequence of values of any data type (int, float, list, string, etc) and numeric data labels (by default) starting from zero.
The data label associated with a particular value is called its index. We can also assign values of other data types as index.
To create or use series, we first need to import the Pandas library. There are different ways in which a series can be created in Pandas.
Creating Empty Series Object
Use the Series( ) method with no parameters to create an empty series object.
seriesObject = pandas.Series( ) # Create an empty series object. The default data type is float64 in older Pandas versions and object from Pandas 2.0 onwards.
import pandas as pd
seriesObj1 = pd.Series( )
>>> seriesObj1
Series([], dtype: float64)
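Because the default data type of an empty series differs between Pandas versions, one option is to state the dtype explicitly. The following is a minimal sketch of that idea; the name empty_series is chosen only for this example.

```python
import pandas as pd

# Passing dtype explicitly gives the same result on old and new Pandas versions
empty_series = pd.Series(dtype="float64")

print(empty_series)          # an empty series with dtype float64
print(len(empty_series))     # number of elements
print(empty_series.empty)    # True when the series holds no data
```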
Creating Non-Empty Series Object
A non-empty Series object is created by specifying the parameters for data and index:
seriesObject = pd.Series(data, index=indValue)
where data is the data part of the Series object. It can be:
(a) Python sequence,
(b) A Scalar Value,
(c) An ndarray,
(d) A Python Dictionary
Creation of Series from Python Sequence:-
A Series can be created using python sequence values.
#importing Pandas with an alias pd
import pandas as pd
#creating a Series
series1 = pd.Series([100,200,300])
#display the series
print(series1)
Output:
0 100
1 200
2 300
dtype: int64
#importing Pandas with an alias pd
import pandas as pd
#creating a Series with explicitly given index argument
series1 = pd.Series([100,200,300], index=['jan', 'feb', 'mar'])
#display the series
print(series1)
Output:
jan 100
feb 200
mar 300
dtype: int64
The output is shown in two columns: the index is on the left and the data value is on the right.
If we do not explicitly specify an index for the data values while creating a series, then by default indices range from 0 through N – 1. Here N is the number of data elements.
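The default labelling can be checked directly. The short sketch below reuses the sample values from above and looks at the index that Pandas assigns when none is given.

```python
import pandas as pd

values = [100, 200, 300]         # N = 3 data elements
series1 = pd.Series(values)      # no index argument given

print(list(series1.index))       # default labels run from 0 to N - 1
print(series1[2])                # value stored under the last label
```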
Creation of Series from Scalar Value :-
A Series can be created using scalar values.
The data given to Series( ) may be a single value, i.e. a scalar value. If an index is given, the scalar value is repeated to match the length of the index. If no index is given, the series contains just that one value with the label 0. The index can be any type of sequence, such as numbers or labels.
seriesObject = pd.Series( scalarValue, index=[ ])
import pandas as pd # importing Pandas with an alias pd
s = pd.Series(100) # creating Series
print(s) # display the series
Output:
0 100
dtype: int64
s = pd.Series(100, index=['a', 'b', 'c'])
print(s)
Output:
a 100
b 100
c 100
dtype: int64
>>> score = pd.Series(12, index=range(4))
>>> score
Output:
0 12
1 12
2 12
3 12
dtype: int64
>>> attendance = pd.Series('Present', index=['Amrit', 'Tanmay'])
>>> attendance
Output:
Amrit Present
Tanmay Present
dtype: object
>>> absent = pd.Series("Absent", index=[1,5,8,9]) # named absent so that the built-in abs( ) is not overwritten
>>> absent
Output:
1 Absent
5 Absent
8 Absent
9 Absent
dtype: object
Creation of Series from NumPy Arrays:-
A series can be created from a one-dimensional (1D) NumPy array.
Example:-
import numpy as np # import NumPy with alias np
import pandas as pd
array1 = np.array([1,2,3,4])
series3 = pd.Series(array1)
print(series3)
Output:
0 1
1 2
2 3
3 4
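dtype: int32
The dtype shown depends on the platform and the NumPy version. On many systems it appears as int64.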
Note: When index labels are passed along with the array, the number of labels must equal the number of elements in the array. Otherwise a ValueError is raised.
>>> series5 = pd.Series(array1, index = ["Jan", "Feb", "Mar"])
ValueError: Length of passed values is 4, index implies 3
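The sketch below shows one way to handle this error in a program. The fourth label "Apr" is an assumption made for illustration. The source does not say what the missing label should be.

```python
import numpy as np
import pandas as pd

array1 = np.array([1, 2, 3, 4])
labels = ["Jan", "Feb", "Mar"]       # only three labels for four values

try:
    series5 = pd.Series(array1, index=labels)
except ValueError as err:
    print("Could not create the series:", err)
    # Assumed fix for this example: supply a fourth label
    series5 = pd.Series(array1, index=labels + ["Apr"])

print(series5)
```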
Creation of Series from Dictionary:-
A Series can be created from a dictionary. The dictionary keys are used to construct the index of the Series, as shown in the following example. Here, the keys of the dictionary dict1 become the indices of the series.
>>> dict1 = {'India': 'NewDelhi', 'UK': 'London', 'Japan': 'Tokyo'}
>>> print(dict1) #Display the dictionary
{'India': 'NewDelhi', 'UK': 'London', 'Japan': 'Tokyo'}
>>> series8 = pd.Series(dict1)
>>> print(series8) #Display the series
India NewDelhi
UK London
Japan Tokyo
dtype: object
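Since the dictionary keys become the index, a value can be looked up by its key in the same way as in the dictionary. A short sketch using the same sample data:

```python
import pandas as pd

dict1 = {"India": "NewDelhi", "UK": "London", "Japan": "Tokyo"}
series8 = pd.Series(dict1)

print(list(series8.index))    # the dictionary keys, in the same order
print(series8["Japan"])       # look up a value by its label
```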
Creating Series Objects – Using Additional Features
Specifying / Adding NaN values in a Series Object
If you do not have complete data, you can still create a series object with some values missing. The missing data can be represented by the legal empty value NaN (Not a Number). NaN is defined in the NumPy module and is written as np.nan. The older spelling np.NaN was removed in NumPy 2.0.
>>> sc = pd.Series([12.5, np.nan, 18.75, np.nan, 25.0])
>>> sc
Output:
0 12.50
1 NaN
2 18.75
3 NaN
4 25.00
dtype: float64
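Once a series contains NaN values, it is often useful to know how many there are. The sketch below is one possible way to check this, using the Series methods isna( ) and count( ). The names missing and present are chosen only for this example.

```python
import numpy as np
import pandas as pd

sc = pd.Series([12.5, np.nan, 18.75, np.nan, 25.0])

missing = sc.isna().sum()   # number of NaN entries
present = sc.count()        # number of non-NaN entries

print(missing, present)
```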
Marks = [25, 35, 15, 40, 36]
Name = ["Amit", "Sonal", "Mohit", "Ramesh", "Pragya"]
scObj = pd.Series(data=Marks, index=Name)
score1 = pd.Series(Marks, index=Name)
Index generated using a loop:
score2 = pd.Series(Marks, index=[n for n in Name])
score3 = pd.Series(Marks, index=[x for x in 'pqrst'])
Using a mathematical function / expression to create data array in Series
The Pandas Series( ) method allows us to create a series object using a mathematical expression or function that calculates the values of the data sequence.
Marks = [25, 35, 15]
scObj = pd.Series(data=Marks * 2) # multiplying a list by 2 repeats the list
print(scObj)
Output:
0 25
1 35
2 15
3 25
4 35
5 15
dtype: int64
arr = np.array([25, 35, 15])
scObj = pd.Series(data=arr * 2) # multiplying an array by 2 doubles each element
print(scObj)
Output:
0 50
1 70
2 30
dtype: int32
Note: A NumPy array supports vectorised operations, i.e. arr * 2 is applied to every element of the array. A Python list multiplied by 2 is simply repeated twice.
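The same idea extends to other expressions and NumPy functions. The sketch below is only an illustration: it computes the data values from the index values themselves. The names a, squares and roots are not from the original notes.

```python
import numpy as np
import pandas as pd

a = np.arange(1, 5)                          # array holding 1, 2, 3, 4

squares = pd.Series(data=a ** 2, index=a)    # expression applied to every element
roots = pd.Series(np.sqrt(a), index=a)       # NumPy function applied to every element

print(squares)
print(roots)
```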