String in Python
This is not the first time we have encountered strings since we started learning Python. Many of the previous tutorials used strings in their examples or discussed them, so they should not come as a surprise. Nonetheless, this chapter gives more insight into how strings are used, manipulated and implemented in Python. We will also check out some handy string functions. So, without wasting time, let's jump right into it.
What is a String?
A string can be defined as a sequence of characters, and that is the most basic explanation you can give. This definition has two important terms: sequence and characters. If you are here after finishing the last tutorial, we have already explained what the Sequence data type is and how strings are a type of sequence. Just for revision: in Python, a sequence is an ordered collection of elements, such as integers, floats, characters or strings, each of which can be accessed by its position.
Note: Every character has a unique code assigned to it. This coding convention is called Unicode. It covers characters from almost every language, and emoticons too (yes, emoticons are characters as well).
Hence, a string can be considered a special type of sequence in which every element is a character. For example, the string "Hello, World" is basically the sequence ['H', 'e', 'l', 'l', 'o', ',', ' ', 'W', 'o', 'r', 'l', 'd'], and its length can be calculated by counting the characters in the sequence, which gives 12.
Note: Yes, the space and the comma count too. Everything inside the quotes is a character, and each character is itself a string of length 1.
Most programming languages have a separate data type dedicated to single characters. Python has no character data type; a character is simply treated as a string of length 1.
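The points above can be checked with a short sketch. It uses only Python built-ins (`len`, `type`, `ord`, `chr`); the variable names are chosen here purely for illustration.

```python
greeting = "Hello, World"

# A string behaves as a sequence of characters.
characters = list(greeting)
length = len(greeting)          # counts every character, including ',' and ' '

# There is no separate character type: one character is a str of length 1.
first = greeting[0]
first_type = type(first)

# Every character has a Unicode code point.
code_point = ord("H")           # character -> number
back_again = chr(code_point)    # number -> character

print(characters)
print(length)        # 12
print(first_type)    # <class 'str'>
print(code_point)    # 72
print(back_again)    # H
```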
Declaration of Strings
>>> mystring = "This is not my first String"
>>> print(mystring)
This is not my first String
You can access each individual character of a string too. Just like accessing each element of a sequence, we use index numbers for this purpose. To access the first character of mystring, we can do the following:
>>> print(mystring[0])
T
T is the first character of our string This is not my first String, hence it has index number 0 (zero). Similarly, for further characters we can use index 1, 2, 3 and so on, i.e., in order to access the ith character we have to use index (i-1).
There is another trick to access the elements of a sequence from its end. For example, if you want to access the last character of the string, just do the following:
>>> print(mystring[-1])
g
An index of -1 means you are asking for the 1st element from the end. Similarly, to access the 2nd last element use -2 as the index, for the 3rd last use -3 and so on, i.e., for the ith element from the end use -i as the index. That settles the general rule for accessing each character of a string from both the front and the back. Note that a positive index number means you are accessing a character from the front, while a negative index number means you are accessing it from the rear end.
We can summarize what we have covered till now in a simple table. Consider the string PYTHON. Each of its characters can be accessed in two ways: from the front, or from the rear end.
Character:        P   Y   T   H   O   N
Forward index:    0   1   2   3   4   5
Backward index:  -6  -5  -4  -3  -2  -1
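As a minimal sketch, the two indexing directions for PYTHON can also be listed programmatically. The variable names below are illustrative only.

```python
word = "PYTHON"

# Pair every character with its index from the front and from the rear.
rows = []
for forward in range(len(word)):
    backward = forward - len(word)   # 0 -> -6, 1 -> -5, ..., 5 -> -1
    rows.append((word[forward], forward, backward))

for char, forward, backward in rows:
    # Both indexes select the same character, so the last column is True.
    print(char, forward, backward, word[forward] == word[backward])
```

Subtracting the length of the string from a forward index gives the matching backward index, which is why -1 always refers to the last character.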
Suppose you want a string to store a quote by Mahatma Gandhi.
“You must be the change you wish to see in the world” – Gandhi
This is the exact line you want to display in the console, including the quotes surrounding the sentence. As you go ahead and print the statement, you will see that it isn't that simple:
>>> print(""You must be the change you wish to see in the world" - Gandhi")
Python will instantly return a syntax error. This is because of the extra double quotes that we added. If you type this statement in IDLE, you will notice that Gandhi's quoted text is shown in black, while "- Gandhi" is shown in green. IDLE highlights all the characters inside a string in green (the colour can differ depending on the text editor, Python version, OS etc.). This clearly means that Python isn't treating the You must be the change you wish to see in the world part of the sentence as a string. In other words, once we open a quote and close it to declare a string, whatever we write after the closing quote is read as Python code, not as part of the string.
In the statement above, the string starts with two double quotes, followed directly by You must be the change you wish to see in the world. The second double quote already closes the (empty) string before this phrase begins, so Python tries to read the whole phrase as code, which it cannot understand. After the phrase comes another double quote, then - Gandhi, and finally the closing double quote. Since the - Gandhi part sits within a pair of double quotes, it is a perfectly legitimate string.
Now you understand the problem we face when quotes are not paired the way we intend. Let's see how we can actually have a quote inside a string. There are two ways to do so:
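- The first one is a bit of a compromise. You can use single quotes inside double quotes, like:
>>> print("'You must be the change you wish to see in the world' - Gandhi")
'You must be the change you wish to see in the world' - Gandhi
Hence, it's legitimate to use single quotes inside double quotes. The reverse works as well, i.e., a string enclosed in single quotes may contain double quotes:
>>> print('"You must be the change you wish to see in the world" - Gandhi')
"You must be the change you wish to see in the world" - Gandhi
What gives an error is using, inside the string, the same kind of quote as the one that encloses it.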
- The second one is for those who hate to compromise, or just want to use double quotes throughout. For you people, there is something called an escape sequence, which starts with a backslash \. You can use it like:
>>> print("\"You must be the change you wish to see in the world\" - Gandhi")
"You must be the change you wish to see in the world" - Gandhi
Can you guess what happened? We used a backslash at two places, just before each quote that we want to print. A backslash placed before a quote tells the interpreter to treat that quote as an ordinary character of the string rather than as the end of the string. Also remember that each escaped character needs its own backslash. For example, in order to print 5 double quotes, we have to use 5 backslashes, one before each quote, like this:
>>> print("\"\"\"\"\"")
"""""
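The quoting options can be compared side by side in a small sketch. The variable names are illustrative, and a plain hyphen is used before the author's name.

```python
# Single quotes inside a double-quoted string.
single_inside = "'You must be the change you wish to see in the world' - Gandhi"

# Double quotes inside a single-quoted string.
double_inside = '"You must be the change you wish to see in the world" - Gandhi'

# Double quotes inside a double-quoted string, each escaped with a backslash.
escaped = "\"You must be the change you wish to see in the world\" - Gandhi"

# The backslashes are not stored in the string; both spellings give the same text.
same_text = (double_inside == escaped)

# Five double quotes need five backslashes, one per quote.
five_quotes = "\"\"\"\"\""

print(single_inside)
print(escaped)
print(same_text)          # True
print(len(five_quotes))   # 5
```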
Input and Output for String
Input and output methods have already been discussed in detail in the Input and Output tutorial. It is recommended to go through that tutorial if you haven't already.
Operations on String
String handling in Python requires very little effort, since the common string operations are built into the language and are short to write. Let's see how we can play around with strings.
- Concatenation: No, wait! What? This word may sound a bit complex for absolute beginners, but all it means is to join two strings. For example, joining "Hello" and "World" makes "HelloWorld". Yes, that's it.
>>> print("Hello" + "World")
HelloWorld
Yes. A plus sign + is enough to do the trick. When used with strings, the + sign joins the two strings. Let's have one more example:
>>> s1 = "Name Python "
>>> s2 = "had been adapted "
>>> s3 = "from Monty Python"
>>> print(s1 + s2 + s3)
Name Python had been adapted from Monty Python
- Repetition: Suppose we want to write the same text multiple times on the console, like repeating "Hi!" 100 times. One option is to write it all manually, like "Hi!Hi!Hi!…" a hundred times, or just do the following:
>>> print("Hi!" * 100)
Suppose you want the user to input some number n, and based on that you want a text to be printed on the console n times. How can you do it? It's simple. Just create a variable n, use the input() function to get a number from the user, convert it to an integer with int() (input() always returns a string), and then multiply the text by n:
>>> n = int(input("Number of times you want the text to repeat: "))
Number of times you want the text to repeat: 5
>>> print("Text" * n)
TextTextTextTextText
- Check existence of a character or a sub-string in a string: The keyword in is used for this. For example, if there is a text India won the match and you want to check whether won exists in it, go to IDLE and try the following:
>>> "won" in "India won the match"
True
Amongst the other datatypes in Python, there is the Boolean datatype, which can have one of two possible values, i.e., either True or False. Since we are checking whether something exists in a string, the possible outcomes are either Yes, it exists or No, it doesn't; therefore either True or False is returned. This should also give you an idea about where to use the Boolean datatype while writing programs.
- not in keyword: This is just the opposite of the in keyword. You're pretty smart if you guessed that right. Its usage is also very similar to that of the in keyword:
>>> "won" not in "India won the match"
False
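The operations above can be combined in one short sketch. `repeat_text` is a hypothetical helper written for this example; it takes the count as a string because that is what `input()` would hand back.

```python
def repeat_text(text, times_typed):
    """Repeat text as many times as the user typed (hypothetical helper)."""
    # input() always returns a string, so convert it before multiplying.
    times = int(times_typed)
    return text * times


s1 = "Hello"
s2 = "World"
joined = s1 + s2                      # concatenation
repeated = repeat_text("Hi!", "3")    # "3" stands in for a value read with input()
found = "won" in "India won the match"
missing = "lost" not in "India won the match"

print(joined)     # HelloWorld
print(repeated)   # Hi!Hi!Hi!
print(found)      # True
print(missing)    # True
```

Multiplying a string by another string raises a TypeError, which is why the count is converted with `int()` first.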
Converting String to Int or Float datatype and vice versa
This is a very common doubt amongst beginners: a number enclosed in quotes becomes a string in Python, and if you try to perform mathematical operations on it, you will get an error or an unexpected result.
numStr = '123'
In the statement above, 123 is not a number but a string.
Hence, in such a situation, to convert a numeric string into the int or float datatype, we can use the int() and float() functions:
numStr = '123'
numFloat = float(numStr)  # 123.0
numInt = int(numStr)      # 123
And then you can easily perform mathematical operations on the numeric value.
Similarly, to convert an int or float variable to a string, we can use the str() function:
num = 123
# so simple
numStr = str(num)
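Not every string can be converted: `int()` and `float()` raise a ValueError when the text is not a valid number. A minimal sketch of handling that case follows; `to_number` is a hypothetical helper, not a built-in.

```python
def to_number(text):
    """Convert a numeric string to int or float (hypothetical helper).

    Returns None when the text is not a valid number.
    """
    try:
        return int(text)          # works for strings such as '123'
    except ValueError:
        pass
    try:
        return float(text)        # works for strings such as '12.5'
    except ValueError:
        return None               # not a number at all


total = to_number("123") + 1      # arithmetic works after conversion
price = to_number("12.5")
invalid = to_number("abc")
as_text = str(123) + "4"          # back to a string: + now concatenates

print(total)     # 124
print(price)     # 12.5
print(invalid)   # None
print(as_text)   # 1234
```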
Slicing is yet another string operation. Slicing lets you extract a part of any string based on a start index and an end index. For example, if we have the string This is Python tutorial and we want to extract a part of it, or just a character, we can use slicing. First let's get familiar with its syntax:
string_name[starting_index : finishing_index : character_iterate]
- string_name is the name of the variable holding the string.
- starting_index is the index of the first character that you want in your sub-string.
- finishing_index is one more than the index of the last character that you want in your sub-string.
- character_iterate is the step, i.e., how many positions to move forward for each character picked. To understand this, let us consider that we have the string Hello Brother! and we want to use the slicing operation on it to extract a sub-string. This is our code:
>>> text = "Hello Brother!"
>>> print(text[0:10:2])
HloBo
text[0:10:2] means we want to extract a sub-string starting from index 0 (the beginning of the string) up to, but not including, index 10, and the last parameter means that we want every second character, counting from the starting index. Hence the output is HloBo. H is at index 0; then, skipping e, the second character from H is printed, which is l; then, skipping the second l, the second character from the first l is printed, which is o, and so on.
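One way to see what the step does is to build the same result by hand. This sketch assumes the string is held in a variable named `text` (an illustrative name) and visits the indexes 0, 2, 4, 6 and 8 with `range`.

```python
text = "Hello Brother!"

# Slice from index 0 up to (not including) index 10, taking every 2nd character.
sliced = text[0:10:2]

# The same result built by hand: visit indexes 0, 2, 4, 6, 8.
picked = ""
for index in range(0, 10, 2):
    picked += text[index]

print(sliced)   # HloBo
print(picked)   # HloBo
```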
It will be clearer with a few more examples:
Let's take a string with 10 characters, ABCDEFGHIJ. The index numbers begin from 0 and end at 9.
Now try the following commands:
>>> s = "ABCDEFGHIJ"
>>> print(s[0:5:1])
ABCDE
Here slicing is done from the 0th character to the 4th character (5-1), moving 1 character in each jump.
Now remove the last number and its colon, and just write this:
>>> print(s[0:5])
ABCDE
You'll see that both outputs are the same.
You can practice by changing the values. Also try changing the character_iterate value to some number n; the slice will then contain every nth character from the starting index up to the finishing index.
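A few variations on the ten-character string, as a small sketch to experiment with (the variable names are illustrative):

```python
s = "ABCDEFGHIJ"

first_five = s[0:5:1]         # step of 1
same_without_step = s[0:5]    # the step defaults to 1 when omitted
every_second = s[0:10:2]      # indexes 0, 2, 4, 6, 8
every_third = s[0:10:3]       # indexes 0, 3, 6, 9

print(first_five)          # ABCDE
print(same_without_step)   # ABCDE
print(every_second)        # ACEGI
print(every_third)         # ADGJ
```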