Overview of Python in Data Science

1. Data Structures

In addition to strings, Python provides the following built-in data structures:

  1. List
  2. Tuple
  3. Dictionary
  4. Set

1.1 List

Lists are similar to arrays: their elements are accessed by index, starting from 0.

The elements are enclosed within square brackets, separated by commas, and can be of any data type.

They can be added, removed or changed through list methods and index assignment.

In [1]:
li = [1, 1.5, "simran", True, print(), [1,2], ("a","b")]
for i in li:
    print(i, "\t",type(i))
1 	 <class 'int'>
1.5 	 <class 'float'>
simran 	 <class 'str'>
True 	 <class 'bool'>
None 	 <class 'NoneType'>
[1, 2] 	 <class 'list'>
('a', 'b') 	 <class 'tuple'>
In [2]:
print(li[0:-2:2])   # slicing in lists: start, stop, step
[1, 'simran', None]
In [3]:
li.append("oops!") #adding new element at the end
print(li)
li.pop() #removing last element
print(li)
li[4]="i replaced print" #update element at an index
print(li)
[1, 1.5, 'simran', True, None, [1, 2], ('a', 'b'), 'oops!']
[1, 1.5, 'simran', True, None, [1, 2], ('a', 'b')]
[1, 1.5, 'simran', True, 'i replaced print', [1, 2], ('a', 'b')]
In [4]:
new_li = [int(x) for x in input().split()] #comprehension : to input list elements as integers
for i in new_li:
    print(i, "\t",type(i))
1 	 <class 'int'>
3 	 <class 'int'>
6 	 <class 'int'>
2 	 <class 'int'>
4 	 <class 'int'>
9 	 <class 'int'>
6 	 <class 'int'>
7 	 <class 'int'>
666 	 <class 'int'>
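
Beyond append() and pop(), a few other list methods cover the "added, removed or changed" operations mentioned above. The following is a minimal sketch; the list nums is a made-up example and not part of the original notebook.

```python
nums = [3, 1, 2]

nums.insert(1, 10)        # insert 10 at index 1      -> [3, 10, 1, 2]
nums.extend([7, 8])       # add several items at end  -> [3, 10, 1, 2, 7, 8]
nums.remove(10)           # remove first matching value -> [3, 1, 2, 7, 8]
del nums[0]               # delete by index           -> [1, 2, 7, 8]
nums.sort(reverse=True)   # sort in place, descending -> [8, 7, 2, 1]

print(nums)
```

Note that all of these methods modify the list in place and return None, except pop(), which returns the removed element.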

1.2 Tuple

A tuple is an immutable sequence: its elements can be iterated over, but they cannot be reassigned, added or removed.

Unlike lists, tuples are declared within round brackets.

Like lists, they can store heterogeneous elements.

In [5]:
tu = (1,1.5,"simran",False,len("hey there"),[1,2],("a","b"))
for i in tu:
    print(i,"\t",type(i))
1 	 <class 'int'>
1.5 	 <class 'float'>
simran 	 <class 'str'>
False 	 <class 'bool'>
9 	 <class 'int'>
[1, 2] 	 <class 'list'>
('a', 'b') 	 <class 'tuple'>
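
To make the immutability concrete, here is a small sketch (the values are illustrative only). It shows that item assignment on a tuple fails, while a mutable object stored inside the tuple can still be changed.

```python
tu = (1, 1.5, "simran", [1, 2])

try:
    tu[0] = 99            # tuples do not support item assignment
except TypeError as err:
    print("TypeError:", err)

tu[3].append(3)           # the list held inside the tuple is still mutable
print(tu)

x, y = (10, 20)           # tuple unpacking into two variables
print(x + y)
```

So immutability applies to the tuple's own slots, not to the objects those slots refer to.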

1.3 Dictionary

A dictionary stores key-value pairs.

Individual values are accessed via their keys, not through index positions.

Thus a value can be modified by specifying its key.

A dictionary is declared within curly brackets, with a colon between each key and its associated value, and the key-value pairs are separated by commas. The values can be of any type. The keys can also be of mixed types, but each key must be unique and hashable (for example a string, a number or a tuple of hashable items).

In [6]:
d = {
    "name" : "simran",
    "age" : 22,
    "marks" : {
        "eng" : 95,
        "maths" : 81,
    },
    "some list" : [1,2,3]
}

print(d.keys())
print(d.values())
print(d.items())

d['name']="simran singh" #assigning new value to a key
print(d)
d['marks']['maths']=91 #altering the value of a nested dict
print(d)
dict_keys(['name', 'age', 'marks', 'some list'])
dict_values(['simran', 22, {'eng': 95, 'maths': 81}, [1, 2, 3]])
dict_items([('name', 'simran'), ('age', 22), ('marks', {'eng': 95, 'maths': 81}), ('some list', [1, 2, 3])])
{'name': 'simran singh', 'age': 22, 'marks': {'eng': 95, 'maths': 81}, 'some list': [1, 2, 3]}
{'name': 'simran singh', 'age': 22, 'marks': {'eng': 95, 'maths': 91}, 'some list': [1, 2, 3]}
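
The rule about keys can be checked directly. The sketch below uses a made-up dictionary, info, to show keys of mixed types, the error raised for an unhashable key, and two common lookups (get() and the in operator).

```python
info = {"name": "simran", 1: "int key", (2, 3): "tuple key"}

try:
    info[[1, 2]] = "list key"        # a list is mutable, hence unhashable
except TypeError as err:
    print("TypeError:", err)

print(info.get("age", "unknown"))    # get() returns a default instead of raising KeyError
print("name" in info)                # membership is tested against the keys

for key, value in info.items():      # iterate over key-value pairs
    print(key, "->", value)
```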

1.4 Set

A set stores only unique elements, and its elements cannot be accessed by index.

The elements are unordered and can be added or removed. Each element must be hashable, so elements of different immutable types can be mixed.

Sets are commonly used for union and intersection problems.

Unlike lists or tuples, a set cannot store mutable containers such as lists, sets or dictionaries as elements; a tuple is allowed as long as everything inside it is hashable.

In [7]:
s = {1,2.5,"simran",print(),3,4,5}
print(s)
{1, 2.5, 3, 4, 5, None, 'simran'}
In [8]:
s.pop()  # removes an ARBITRARY element (sets are unordered)
print(s)
s.pop() 
print(s)
s.pop() 
print(s)
{2.5, 3, 4, 5, None, 'simran'}
{3, 4, 5, None, 'simran'}
{4, 5, None, 'simran'}
In [9]:
s1 = {1,2,3,4,5}
s2 = {2,3,4}
print(s1.union(s2))
print(s1.intersection(s2))
print(s2.issubset(s1))
print(s1.issuperset(s2))
{1, 2, 3, 4, 5}
{2, 3, 4}
True
True
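
The properties of sets described in this section can be verified with a short sketch. The set named unique is a made-up example; the operators | , & and - are the built-in shorthand for union, intersection and difference.

```python
unique = {3, 1, 2, 2, 1}          # duplicates are dropped on creation
print(len(unique))

unique.add((4, 5))                # a tuple of hashable items is a valid element
try:
    unique.add([6, 7])            # a list is unhashable and is rejected
except TypeError as err:
    print("TypeError:", err)

s1, s2 = {1, 2, 3, 4, 5}, {2, 3, 4}
print((s1 | s2) == s1.union(s2))          # operator form of union
print((s1 & s2) == s1.intersection(s2))   # operator form of intersection
print(sorted(s1 - s2))                    # difference, sorted for a stable display
```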

1.5 Use Cases

In [10]:
#to calculate character frequency in a given string
my_string = "hey there, wassup?"
freq = dict()
#creating a dictionary to keep count of each character in string
for char in my_string:
    if char in freq:
        freq[char] = freq[char] + 1 
        #we can modify the values by specifying the key  --> dict[key] = new value
    else:
        freq[char] = 1
        
print(freq)
{'h': 2, 'e': 3, 'y': 1, ' ': 2, 't': 1, 'r': 1, ',': 1, 'w': 1, 'a': 1, 's': 2, 'u': 1, 'p': 1, '?': 1}
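
As an alternative to the manual loop, the standard library offers collections.Counter, a dictionary subclass built for counting. This is a different technique from the one used in the cell above, shown here only as a sketch on the same string.

```python
from collections import Counter

my_string = "hey there, wassup?"
freq = Counter(my_string)         # counts every character in one pass

print(freq["e"])                  # count for a single character
print(freq.most_common(1))        # the most frequent character and its count
print(freq["z"])                  # missing keys return 0 instead of raising KeyError
```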
In [11]:
#finding pairs of elements in a given list that add up to 10
some_list = [1,8,9,3,7,1]
target = 10

visited= set()
#to keep check of elements visited already
result = list()
#the list that would return combo pairs

for first_num in some_list:
    required_num = target - first_num
    
    if required_num in visited : 
        result.append([required_num, first_num]) 
        #append takes single argument, so pushing each pair as a list
        
    visited.add(first_num)

print(result)
[[1, 9], [3, 7], [9, 1]]
In [12]:
# list() on a dictionary keeps only its keys; a flat list cannot be typecast back into a dictionary
list_new = list({
    "name" : "simran",
    "age" : 22
})
print(list_new)
['name', 'age']
In [13]:
# to typecast into a dictionary, we need to zip the individual lists 
car = ["swift","polo","innova"]
color = ["silver","blue","white"]
price = [5,7,12]

a_zipped_list = list(zip(car,color,price))
print(a_zipped_list)
[('swift', 'silver', 5), ('polo', 'blue', 7), ('innova', 'white', 12)]
In [14]:
# for dictionaries we need to pass only 2 parameters in the zip function i.e keys and the values
vehicle = dict(zip(car,color))
print(vehicle)
{'swift': 'silver', 'polo': 'blue', 'innova': 'white'}

2. Functions

A function is a piece of code that can be called at multiple points within the same program, which reduces repetition.

Syntax for creating user defined functions:

def my_func():
    pass

As in most programming languages, a function can accept several kinds of arguments. The general order is:

  1. Positional arguments, then named (keyword) arguments
  2. Variable positional arguments (*args), then variable keyword arguments (**kwargs)

A parameter declared after *args can only be passed by name, so it is usually given a default value.

def my_func(var_a, *args ,var_b = "100"):
    pass

Parameters vs Arguments - Parameters are the names declared in a function definition; arguments are the values passed when the function is called.

In [15]:
def my_func(var_a,var_b,var_c):
    print(var_a,var_b,var_c,sep="\n")
In [16]:
my_func("I am A", var_c = "I am C", var_b="I am B")
I am A
I am B
I am C
In [17]:
def my_func2(var_d,*args):
    print(var_d)
    print(*args)
In [18]:
my_func2(10,20,30.6,"hello")
10
20 30.6 hello
In [19]:
def last_func(a,b,c,*args,d="I am D"):
    print(a,b,c,*args,d,sep = "\n")
In [20]:
last_func("I am A","I am B","I am C","I am *args1","I am *args2",d="I am new D")
I am A
I am B
I am C
I am *args1
I am *args2
I am new D
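
The cells above cover positional, named and *args parameters, but not **kwargs. The sketch below puts all four kinds in one signature; the function describe and its parameter names are hypothetical and chosen only for illustration.

```python
def describe(name, *args, greeting="Hello", **kwargs):
    # name     : ordinary positional parameter
    # *args    : extra positional arguments, collected into a tuple
    # greeting : keyword-only parameter with a default (it follows *args)
    # **kwargs : extra keyword arguments, collected into a dict
    return f"{greeting} {name} | args={args} | kwargs={kwargs}"

message = describe("Simran", 1, 2, greeting="Hi", city="Delhi")
print(message)
```

Because greeting comes after *args, writing describe("Simran", 1, 2, "Hi") would place "Hi" into args rather than into greeting.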

3. Classes and Objects

Like functions, classes promote code reusability; they are the fundamental building blocks of object-oriented programming.

A class is a blueprint for a group of similar objects: a Car class, for example, encompasses objects such as Swift, Polo and Innova. It provides a general template for its objects; for example, every object possesses attributes such as seats and wheels.

class Car():
    pass

The objects of a class hold state and behaviour. For example, a car's color and age make up its state, while its behaviour includes functions like starting the engine or turning on the AC.

swift=Car()
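
The state and behaviour described above might be sketched as follows. The attribute and method names (wheels, color, age, engine_on, start_engine) are assumptions made for this illustration, not part of the original notebook.

```python
class Car:
    wheels = 4                       # class attribute, shared by every car

    def __init__(self, color, age):
        self.color = color           # state: stored per object
        self.age = age
        self.engine_on = False

    def start_engine(self):          # behaviour: a method acting on the object's state
        self.engine_on = True
        return "engine started"

swift = Car("silver", 2)
polo = Car("blue", 5)

print(swift.color, swift.wheels)
print(swift.start_engine())
print(polo.engine_on)                # polo is unaffected: each object has its own state
```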
In [21]:
class Human():
    
    population=0
    
    def __init__(self, name, money=0):
        self.name=name
        self.money=money
        Human.population+=1
        
    def party(self):
        if(self.money>300):
            self.money-=300
        else:
            print("broke!")
            
    def borrow(self,friend,amount):
        if(friend.money>amount):
            friend.money-=amount
            self.money+=amount
            
    def __confidential(self):                  # encapsulation: the double underscore triggers name mangling
        print("This is confidential.")
        
    def super_power(self):
        self.__confidential()
    
    @classmethod                               # class method: acts on the class itself, not on one instance
    def thanos(cls):
        Human.population//=2
In [22]:
simran=Human("Simran Singh",200)
someone=Human("Mr. Someone",1000)
In [23]:
simran.party()
broke!
In [24]:
simran.borrow(someone,650)
simran.party()
simran.money
Out[24]:
550
In [25]:
someone.party()
someone.money
Out[25]:
50
In [26]:
simran.super_power()
This is confidential.
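
A method whose name starts with two underscores is name-mangled by Python, which is why it can be called from inside the class but not directly from outside. The sketch below uses a hypothetical Vault class to keep the example self-contained.

```python
class Vault:                              # hypothetical class, for illustration only
    def __secret(self):
        return "This is confidential."

    def reveal(self):
        return self.__secret()            # allowed: the call happens inside the class

v = Vault()
print(v.reveal())

try:
    v.__secret()                          # fails: outside the class the name is not mangled
except AttributeError as err:
    print("AttributeError:", err)

print(v._Vault__secret())                 # the mangled name is still reachable
```

The last line shows that this is privacy by convention rather than strict access control.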
In [27]:
p1 = Human("Person 1")
p2 = Human("Person 2")
p3 = Human("Person 3")
p4 = Human("Person 4")
In [28]:
Human.population
Out[28]:
6
In [29]:
Human.thanos()
Human.population
Out[29]:
3
In [30]:
# inheritance
class Child(Human):
    
    # overriding
    def party(self):
        print("You are too young to party")

kid = Child("Mr. Kid")
kid.party()
You are too young to party
In [31]:
Human.population
Out[31]:
4
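
An overriding method does not have to discard the parent's behaviour; it can extend it through super(). The following sketch uses trimmed-down, hypothetical classes (Person and Minor) rather than the Human and Child classes above, so that it runs on its own.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def party(self):
        return f"{self.name} is partying"

class Minor(Person):
    def __init__(self, name, guardian):
        super().__init__(name)            # reuse the parent initialiser
        self.guardian = guardian

    def party(self):                      # override, then extend the parent result
        return super().party() + f" (with {self.guardian})"

kid = Minor("Mr. Kid", "Simran")
print(kid.party())
print(isinstance(kid, Person))            # an instance of the subclass is also a Person
```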

4. File Handling

Before performing read/write operations on a file, we call the open() function, which returns a file object. It is usually given two arguments: the filename and the mode.

The modes available are:

  1. r : read
  2. w : write
  3. a : append

If the mode isn't specified, Python opens the file in read mode by default.

In certain cases, we need to explicitly declare the type along with the mode: for example wt for write-text and wb for write-binary.

By default the type is text, but for pictures and other non-text files we need to specify binary.
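
As a rough sketch of the difference, binary mode works with bytes objects instead of strings. The file name sample_bytes.bin is hypothetical, and a temporary directory is used so that nothing is left behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample_bytes.bin")   # hypothetical file name

    with open(path, "wb") as f:        # binary write: expects bytes, not str
        f.write(bytes([0, 255, 16]))

    with open(path, "rb") as f:        # binary read: returns bytes
        data = f.read()

print(type(data))
print(list(data))
```

Passing a str to f.write() in binary mode raises a TypeError, which is a quick way to notice a wrong mode.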

r+ allows both read and write operations, but it can be used only on existing files.

w+ also allows both reading and writing; like w, it creates the file if it is not already present and erases the existing content otherwise.

While write mode overwrites anything stored in the file, append mode adds content at the end of the file.
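
The behaviour of these modes can be compared side by side. This is a minimal sketch using a hypothetical file, modes_demo.txt, inside a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "modes_demo.txt")     # hypothetical file name

    try:
        open(path, "r+")               # r+ requires the file to exist already
    except FileNotFoundError:
        print("r+ failed: file does not exist yet")

    with open(path, "w") as f:         # w creates the file
        f.write("first")
    with open(path, "w") as f:         # w again: the old content is erased
        f.write("second")
    with open(path, "a") as f:         # a adds to the end
        f.write(" third")

    with open(path, "r+") as f:        # r+ reads and writes without truncating
        before = f.read()
        f.seek(0)                      # move back to the start before writing
        f.write("SECOND")

    with open(path) as f:
        after = f.read()

print(before)
print(after)
```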

After a write/append operation we can call flush() to push any data buffered in the stream on to the file. Calling close() flushes the stream automatically.

f = open("file_name.txt","w")
f.write("This will be written to file")
f.close()

A simpler syntax, which automatically closes the file is :

with open("file_name", "mode") as variable:
    variable.function()
In [32]:
with open('sample.txt','w+') as f:
    f.write("Hey there!")
In [33]:
with open('sample.txt') as f:
    print(f.read())
Hey there!
In [34]:
with open('sample.txt',"a") as f:
    f.write("\nWassup")
In [35]:
with open('sample.txt') as f:
    print(f.read())
Hey there!
Wassup