Top 50 Python Interview Questions for Data Analysts (2025)
1. What is Python and why is it popular for data analysis?
Python is a high-level, interpreted programming language known for simplicity and readability. It’s popular in data analysis due to its rich ecosystem of libraries like Pandas, NumPy, and Matplotlib that simplify data manipulation, analysis, and visualization.
2. Differentiate between lists, tuples, and sets in Python.
⦁ List: Mutable, ordered, allows duplicates.
⦁ Tuple: Immutable, ordered, allows duplicates.
⦁ Set: Mutable, unordered, no duplicates.
3. How do you handle missing data in a dataset?
Common methods: removing rows/columns with missing values, filling with mean/median/mode, or using interpolation. Pandas provides .dropna() and .fillna() to do this easily.
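A minimal sketch of both approaches (the DataFrame and column names are made up for illustration):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"age": [25, np.nan, 31], "city": ["NY", "LA", None]})

dropped = df.dropna()  # drop any row containing a missing value
filled = df.fillna({"age": df["age"].mean(), "city": "unknown"})  # fill per column
```

Dropping is simplest but discards data; filling keeps every row at the cost of introducing estimated values.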
4. What are list comprehensions and how are they useful?
Concise syntax to create lists from iterables using a single readable line, often replacing loops for cleaner and faster code.
Example: [x**2 for x in range(5)] → [0, 1, 4, 9, 16]
5. Explain Pandas DataFrame and Series.
⦁ Series: 1D labeled array, like a column.
⦁ DataFrame: 2D labeled data structure with rows and columns, like a spreadsheet.
6. How do you read data from different file formats (CSV, Excel, JSON) in Python?
Using Pandas:
⦁ CSV: pd.read_csv('file.csv')
⦁ Excel: pd.read_excel('file.xlsx')
⦁ JSON: pd.read_json('file.json')
7. What is the difference between Python’s append() and extend() methods?
⦁ append() adds its argument as a single element to the end of a list.
⦁ extend() iterates over its argument, adding each element to the list.
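A quick sketch of the difference:

```python
a = [1, 2]
a.append([3, 4])   # the whole list is added as one element
b = [1, 2]
b.extend([3, 4])   # each element is added individually
print(a)  # [1, 2, [3, 4]]
print(b)  # [1, 2, 3, 4]
```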
8. How do you filter rows in a Pandas DataFrame?
Using boolean indexing: df[df['column'] > value] filters rows where ‘column’ is greater than value.
9. Explain the use of groupby() in Pandas with an example.
groupby() splits data into groups based on column(s); you can then apply an aggregation.
Example: df.groupby('category')['sales'].sum() gives total sales per category.
10. What are lambda functions and how are they used?
Anonymous, inline functions defined with the lambda keyword. Used for quick, throwaway functions without formally defining them with def.
Example: df['new'] = df['col'].apply(lambda x: x*2)
11. How do you merge or join two DataFrames?
Use pd.merge(df1, df2, on='key_column', how='inner') with options:
⦁ how='inner' (default) for the intersection of keys,
⦁ how='left', how='right', or how='outer' for the other joins.
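A small sketch contrasting an inner and an outer join (the key and column names are hypothetical):

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

inner = pd.merge(left, right, on="key", how="inner")  # keys present in both
outer = pd.merge(left, right, on="key", how="outer")  # union of all keys
```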
12. What is the difference between .loc[] and .iloc[] in Pandas?
⦁ .loc[] selects data by label (index names).
⦁ .iloc[] selects data by integer position (0-based).
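A short sketch where the two accessors reach the same cell by different routes (index labels are made up):

```python
import pandas as pd

df = pd.DataFrame({"sales": [100, 200, 300]}, index=["jan", "feb", "mar"])

by_label = df.loc["feb", "sales"]   # label-based selection
by_position = df.iloc[1, 0]         # integer-position-based selection
```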
13. How do you handle duplicates in a DataFrame?
Use df.duplicated() to find duplicates and df.drop_duplicates() to remove them.
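For instance (sample data invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2], "val": ["a", "a", "b"]})
mask = df.duplicated()           # True for rows that repeat an earlier row
deduped = df.drop_duplicates()   # keeps the first occurrence by default
```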
14. How do you deal with outliers in data?
Detect outliers using statistical methods like IQR or Z-score, then either remove, cap, or transform them depending on context.
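A sketch of the IQR rule (flag values outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR]); the sample series is invented:

```python
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 95])  # 95 is an obvious outlier
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
trimmed = s[(s >= q1 - 1.5 * iqr) & (s <= q3 + 1.5 * iqr)]  # drop the outlier
```

Capping (clipping to the bounds) or a log transform are alternatives when removing rows is too costly.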
15. What is data normalization and how can it be done in Python?
Scaling data to a standard range (e.g., 0 to 1). Can be done using sklearn’s MinMaxScaler or manually using (x - min) / (max - min).
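The manual version is a one-liner (sample values invented):

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])
scaled = (s - s.min()) / (s.max() - s.min())  # min-max scaling to [0, 1]
```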
16. Describe different data types in Python.
Common types: int, float, str, bool, list, tuple, dict, set, NoneType.
17. How do you convert data types in Pandas?
Use df['col'].astype(new_type) to convert columns, e.g., astype('int') or astype('category').
18. What are Python dictionaries and how are they useful?
Collections of key-value pairs (insertion-ordered since Python 3.7), useful for fast lookups, mapping, and structured data storage.
19. How do you write efficient loops in Python?
Use list comprehensions, generator expressions, and built-in functions instead of traditional loops, or leverage libraries like NumPy for vectorization.
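A small sketch comparing a comprehension with NumPy vectorization for the same computation:

```python
import numpy as np

nums = list(range(5))
squares_comp = [n * n for n in nums]          # list comprehension
squares_vec = (np.array(nums) ** 2).tolist()  # NumPy vectorized version
```

On small inputs the difference is negligible; vectorization pays off on large arrays because the loop runs in C rather than Python.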
20. Explain error handling in Python with try-except.
Wrap code that might cause errors in try: block and handle exceptions in except: blocks to prevent crashes and manage errors gracefully.
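A minimal sketch (the function name is made up for illustration):

```python
def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None  # handle the error instead of crashing

print(safe_divide(10, 2))  # 5.0
print(safe_divide(10, 0))  # None
```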
21. How do you perform basic statistical operations in Python?
Use libraries like NumPy (np.mean(), np.median(), np.std()) and Pandas (df.describe()) for statistics like mean, median, variance, etc.
22. What libraries do you use for data visualization?
Common ones are Matplotlib, Seaborn, Plotly, and sometimes Bokeh for interactive plots.
23. How do you create plots using Matplotlib or Seaborn?
In Matplotlib:
import matplotlib.pyplot as plt
plt.plot(x, y)
plt.show()
In Seaborn:
import seaborn as sns
sns.barplot(x='col1', y='col2', data=df)
24. What is the difference between .apply() and .map() in Pandas?
⦁ .apply() works on entire Series or DataFrames and accepts arbitrary functions.
⦁ .map() maps values in a Series based on a dict, Series, or function.
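A short sketch of both (sample data invented):

```python
import pandas as pd

s = pd.Series(["a", "b", "c"])
mapped = s.map({"a": 1, "b": 2})  # dict lookup; unmatched values become NaN
applied = s.apply(str.upper)      # arbitrary function applied per element
```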
25. How do you export Pandas DataFrames to CSV or Excel files?
Use df.to_csv('file.csv') or df.to_excel('file.xlsx').
26. What is the difference between Python’s range() and xrange()?
In Python 2, range() returns a list while xrange() returns a lazy sequence for better memory usage. In Python 3, range() behaves like xrange().
27. How can you profile and optimize Python code?
Use modules like cProfile, timeit, or line profilers to find bottlenecks, then optimize with better algorithms or vectorization.
28. What are Python decorators and give a simple example?
Functions that modify other functions without changing their code.
Example:
def decorator(func):
    def wrapper():
        print("Before")
        func()
        print("After")
    return wrapper

@decorator
def say_hello():
    print("Hello")
29. How do you handle dates and times in Python?
Use the datetime module and helpers like pandas.to_datetime() or dateutil to parse, manipulate, and format dates.
30. Explain list slicing in Python.
Get sublists using the syntax list[start:stop:step]. Example: lst[1:5:2] picks items from index 1 up to (but not including) index 5, taking every second item.
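A quick sketch, including the common reversal idiom:

```python
lst = [0, 1, 2, 3, 4, 5]
print(lst[1:5:2])  # [1, 3] - indices 1 and 3
print(lst[::-1])   # reversed copy of the list
```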
31. What are the differences between Python 2 and Python 3?
Python 3 introduced many improvements: print is a function (print())
, better Unicode support, integer division changes, and removed deprecated features. Python 2 is now end-of-life.
32. How do you use regular expressions in Python?
With the re module, e.g., re.search(), re.findall(). They help match, search, or replace patterns in strings.
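A small sketch (the sample string is invented):

```python
import re

text = "Order 42 shipped on 2025-01-15"
numbers = re.findall(r"\d+", text)             # all digit runs
match = re.search(r"\d{4}-\d{2}-\d{2}", text)  # first date-like pattern
```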
33. What is the purpose of the with statement?
It manages resources like file opening/closing automatically, ensuring cleanup, e.g.,
with open('file.txt') as f:
    data = f.read()
34. Explain how to use virtual environments.
Isolate project dependencies using venv or virtualenv to avoid conflicts between package versions across projects.
35. How do you connect Python with SQL databases?
Using libraries like sqlite3, SQLAlchemy, or pymysql to execute SQL queries and fetch results into Python.
36. What is the role of the __init__.py file?
Marks a directory as a Python package and can initialize package-level code.
37. How do you handle JSON data in Python?
Use the json module: json.load() to parse JSON files and json.dumps() to serialize Python objects to JSON.
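A round-trip sketch using the string variants (the sample payload is invented):

```python
import json

raw = '{"name": "Ada", "scores": [90, 95]}'
data = json.loads(raw)   # parse a JSON string into Python objects
back = json.dumps(data)  # serialize back to a JSON string
```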
38. What are generator functions and why use them?
Functions that yield values one at a time using the yield keyword, saving memory through lazy evaluation; ideal for large datasets.
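A minimal sketch (the generator name is made up):

```python
def squares(n):
    for i in range(n):
        yield i * i  # values are produced lazily, one at a time

gen = squares(4)
print(next(gen))   # 0
print(list(gen))   # [1, 4, 9] - the remaining values
```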
39. How do you perform feature engineering with Python?
Create or transform variables using Pandas (e.g., creating dummy variables, extracting date parts), normalization, or combining features.
40. What is the purpose of the Pandas .pivot_table() method?
Creates spreadsheet-style pivot tables for summarizing data, allowing aggregation by multiple indices.
41. How do you handle categorical data?
Use encoding techniques like one-hot encoding (pd.get_dummies()), label encoding, or ordinal encoding to convert categories into numeric values.
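A one-hot encoding sketch (the column and categories are invented):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red"]})
encoded = pd.get_dummies(df, columns=["color"])  # one indicator column per category
```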
42. Explain the difference between deep copy and shallow copy.
⦁ Shallow copy copies an object but references nested objects.
⦁ Deep copy copies everything recursively, creating independent objects.
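A sketch showing how the two behave after mutating a nested object:

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)      # outer list copied, inner lists shared
deep = copy.deepcopy(original)     # everything copied recursively

original[0].append(99)             # mutate a nested list
# shallow sees the change; deep does not
```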
43. What is the use of the enumerate() function?
It adds a counter to an iterable, yielding (index, value) pairs; useful in loops when you need the item's index as well.
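For example:

```python
fruits = ["apple", "banana"]
pairs = list(enumerate(fruits, start=1))  # optional start offset
print(pairs)  # [(1, 'apple'), (2, 'banana')]
```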
44. How do you detect and handle multicollinearity?
Use correlation matrix or Variance Inflation Factor (VIF). Handle by removing or combining correlated features.
45. How can you improve Python script performance?
Use efficient data structures, built-in functions, vectorized operations with NumPy/Pandas, and profile code to identify bottlenecks.
46. What are Python’s built-in data structures?
List, Tuple, Set, Dictionary, String.
47. How do you automate repetitive data tasks with Python?
Write scripts or use task schedulers (like cron or Windows Task Scheduler) with libraries such as pandas, openpyxl, and other automation tools.
48. Explain the use of Assertions in Python.
Used for debugging by asserting conditions that must be true, raising errors if violated:
assert x > 0, "x must be positive"
49. How do you write unit tests in Python?
Use the unittest or pytest frameworks to write test functions/classes that verify code behavior automatically.
50. How do you handle large datasets in Python?
Use chunking with Pandas read_csv(chunksize=…), Dask for parallel computing, or databases to process data in parts rather than all at once.
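A chunking sketch using an in-memory CSV stand-in (io.StringIO) so it runs without a file on disk:

```python
import io
import pandas as pd

csv_data = io.StringIO("x\n1\n2\n3\n4\n5\n")
total = 0
for chunk in pd.read_csv(csv_data, chunksize=2):  # read 2 rows at a time
    total += chunk["x"].sum()  # aggregate each piece, never holding it all
```

In practice you would pass a file path instead of a StringIO; the per-chunk aggregation pattern is the same.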