Introduction to Data and Data Analysis Concepts - 22.5%

Define and Classify Data
- Define data and explain how it becomes meaningful
  - Define data and explain its role in decision-making, business, and everyday life.
  - Distinguish between data, information, and knowledge, and describe how raw data gains meaning through processing and interpretation.
  - Describe how raw data is processed into usable insights for decision-making.
- Classify data by type and format
  - Identify and classify data as quantitative or qualitative.
  - Differentiate structured, semi-structured, and unstructured data using real-world examples.

Describe Data Sources, Collection Methods, and Storage
- Identify data sources and collection methods
  - Identify and describe various data sources, including APIs, web pages, databases, IoT devices, surveys, and logs.
  - Explain common data collection methods such as surveys, interviews, observations, automated systems, and web scraping.
  - Discuss the role of representative sampling and the implications of biased or incomplete data.
  - Compare advantages and limitations of different data collection techniques for qualitative and quantitative research.
- Explain how data is stored and organized
  - Describe data formats (CSV, JSON, Excel, databases) and storage systems (data lakes, warehouses, relational databases).
  - Explain the role of metadata and compare storage solutions based on the type, structure, and purpose of the data.
  - Evaluate the suitability of different storage options based on data structure, scale, and use case.

Explain the Data Lifecycle and Its Management
- Describe the data lifecycle
  - List and explain the stages of the data lifecycle: collection, storage, processing, analysis, visualization/reporting, archiving, and deletion.
  - Explain how errors or issues at any stage (e.g., missing, inaccurate, or poorly stored data) can influence final results and decision-making.
  - Identify tools and techniques associated with each stage of the lifecycle.
- Discuss the value and challenges of lifecycle management
  - Explain the importance of managing data throughout its lifecycle for ensuring quality, security, and compliance.
  - Describe challenges in managing large-scale data and strategies to address them (e.g., cloud storage, data pipelines).

Understand the Scope of Data Science, Analytics, and Analysis
- Differentiate between Data Analysis, Data Analytics, and Data Science
  - Define data analysis, data analytics, and data science, and explain how they relate to each other.
  - Compare the scope, tools, and goals of each field using real-world examples.
  - Describe the roles and responsibilities of professionals in each area.
  - Identify typical tasks that belong to each field (e.g., statistical summaries vs. machine learning modeling).
- Explain the data analytics workflow
  - Describe the four major types of analytics: descriptive, diagnostic, predictive, and prescriptive.
  - Identify the questions each type of analytics answers and their business relevance.
  - Explain the key steps in the data analytics process: data collection, preprocessing, analysis, and reporting.
  - Match each analytics type to a real-world example scenario.

Identify Ethical and Legal Considerations in Data Analytics
- Describe key ethical principles and legal frameworks
  - Explain transparency, consent, privacy, fairness, and accountability in data handling.
  - Identify major laws such as GDPR, HIPAA, and CCPA, and explain how they guide responsible data use.
  - Describe methods like anonymization and encryption that support ethical and legal compliance.

Python Basics for Data Analysis - 32.5%

Work with Variables and Data Types
- Use variables and data types, and perform basic operations (see the sketch below).
  - Define and assign variables in Python using the assignment operator =.
  - Perform simple operations with numbers (e.g., addition, subtraction) and strings (e.g., concatenation, repetition).
  - Use type() and isinstance() to inspect variable types.
  - Identify common Python data types: int, float, str, and bool.
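
A minimal sketch of these basics, using made-up sample values (all names here are illustrative):

    # Assign variables of the four common types
    count = 10             # int
    price = 19.99          # float
    name = "widget"        # str
    in_stock = True        # bool

    # Simple numeric and string operations
    total = count + 5                # addition -> 15
    label = name + "-A"              # concatenation -> 'widget-A'
    banner = "=" * 10                # repetition -> '=========='

    # Inspect types
    print(type(price))               # <class 'float'>
    print(isinstance(count, int))    # True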

Use Python Data Collections and Sequences
- Create and manipulate lists (example sketch below).
  - Create and access list elements using indexing and slicing.
  - Use list methods: append(), insert(), pop(), remove(), sort(), reverse(), count(), and index() to manage, modify, and analyze collections.
  - Use list comprehensions to transform or filter data.
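
A short sketch of common list operations on a made-up list of scores:

    scores = [88, 92, 75]
    scores.append(60)          # [88, 92, 75, 60]
    scores.insert(1, 90)       # [88, 90, 92, 75, 60]
    scores.remove(75)          # drops the first 75
    last = scores.pop()        # removes and returns 60
    scores.sort()              # [88, 90, 92]
    scores.reverse()           # [92, 90, 88]
    print(scores[0], scores[-1], scores[0:2])    # indexing and slicing
    print(scores.count(90), scores.index(92))    # 1 0

    # List comprehensions: transform and filter in one expression
    curved = [s + 2 for s in scores]             # [94, 92, 90]
    passing = [s for s in scores if s >= 90]     # [92, 90]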

- Work with tuples and sets (see the sketch below).
  - Create and access tuples using indexing.
  - Explain tuple immutability and when to use tuples over lists.
  - Create sets and perform set operations (add(), remove(), union(), intersection(), isdisjoint(), difference()).
  - Use sets to remove duplicates and test for membership.
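
A minimal sketch with invented values:

    # Tuples: ordered, indexable, immutable -- good for fixed records
    point = (3, 4)
    x = point[0]               # 3
    # point[0] = 5             # would raise TypeError: tuples cannot change

    # Sets: unordered collections of unique values
    a = {1, 2, 3}
    b = {3, 4}
    a.add(5)                   # {1, 2, 3, 5}
    a.remove(1)                # {2, 3, 5}
    print(a.union(b))          # {2, 3, 4, 5}
    print(a.intersection(b))   # {3}
    print(a.difference(b))     # {2, 5}
    print(a.isdisjoint(b))     # False -- they share 3

    # Deduplicate a list and test membership
    unique_ids = set([7, 7, 8, 9])    # {8, 9, 7}
    print(8 in unique_ids)            # True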

- Use dictionaries for data storage, grouping, and lookup (example sketch below).
  - Create dictionaries with key-value pairs.
  - Access, update, and delete values using keys.
  - Use dict.get() to safely retrieve values with a default.
  - Loop through dictionaries using for key in dict and the items() method.
  - Apply dictionaries in basic counting, lookup, and categorization tasks.
  - Represent data as lists of dictionaries (e.g., [{ 'product': 'Laptop', 'price': 999 }, ...]).
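
A compact sketch of these dictionary patterns, with made-up data:

    # Create, update, and delete by key
    prices = {"laptop": 999, "mouse": 25}
    prices["monitor"] = 199           # add or update
    del prices["mouse"]               # delete
    print(prices.get("tablet", 0))    # 0 -- default instead of a KeyError

    # Loop over keys, and over key-value pairs
    for product in prices:
        print(product)
    for product, price in prices.items():
        print(product, price)

    # Counting pattern, and a small list-of-dictionaries dataset
    counts = {}
    for color in ["red", "blue", "red"]:
        counts[color] = counts.get(color, 0) + 1    # {'red': 2, 'blue': 1}
    inventory = [{"product": "Laptop", "price": 999},
                 {"product": "Mouse", "price": 25}]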

- Work with strings as sequences and apply string methods (see the sketch below).
  - Treat strings as character sequences (e.g., indexing, slicing, looping).
  - Work with strings using common built-in methods: startswith(), endswith(), find(), capitalize(), isdigit(), isalpha().
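
A brief sketch using an invented filename:

    filename = "report_2024.csv"
    print(filename[0], filename[-4:])        # 'r' '.csv' -- indexing and slicing
    print(filename.startswith("report"))     # True
    print(filename.endswith(".csv"))         # True
    print(filename.find("2024"))             # 7 -- index of substring, -1 if absent
    print("data".capitalize())               # 'Data'
    print("2024".isdigit(), "abc".isalpha()) # True True
    for ch in "abc":                         # strings are iterable sequences
        print(ch)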

Use Functions and Handle Exceptions
- Define and call functions (example sketch below)
  - Create reusable code blocks using the def keyword.
  - Use parameters to pass values into functions; distinguish between positional, keyword, and default parameters.
  - Return values using return, and explain how None is used when no return is specified.
  - Use pass to define placeholder function bodies during development.
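
A minimal sketch (function names and data are invented):

    def average(values, precision=2):        # precision is a default parameter
        if not values:
            return None                      # explicit None for empty input
        return round(sum(values) / len(values), precision)

    def todo():
        pass                                 # placeholder body during development

    print(average([1, 2, 4]))                # 2.33 -- positional argument
    print(average([1, 2, 4], precision=1))   # 2.3 -- keyword argument

    result = todo()                          # no return statement...
    print(result)                            # ...so the call evaluates to None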

- Understand scope and variable behavior in functions (see the sketch below)
  - Distinguish between local and global variables in a data script.
  - Explain name shadowing and how reusing variable names inside functions affects program behavior.
  - Use global variables only when necessary and understand when to prefer local scope.
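
A small sketch of shadowing, with invented names:

    threshold = 10                # global variable

    def count_high(values):
        threshold = 50            # local name shadows the global one
        return len([v for v in values if v > threshold])

    print(count_high([20, 60, 80]))   # 2 -- the local threshold (50) applies
    print(threshold)                  # 10 -- the global is untouched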

- Handle errors with try-except blocks (example sketch below)
  - Identify common runtime errors (TypeError, ValueError, IndexError) that can occur in data handling.
  - Wrap function calls in try-except blocks to make analysis scripts more robust.
  - Print or log meaningful error messages for debugging and clarity.
  - Use exception handling to prevent crashes when reading files (FileNotFoundError), converting values, or indexing lists.
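
A sketch of both patterns; the filename is hypothetical:

    def to_number(text):
        try:
            return float(text)               # may raise ValueError
        except ValueError:
            print(f"Could not convert {text!r}; using None")
            return None

    print(to_number("3.14"))    # 3.14
    print(to_number("N/A"))     # logs a message, returns None

    try:
        with open("missing.csv") as f:       # hypothetical file
            rows = f.readlines()
    except FileNotFoundError as err:
        print(f"File error: {err}")          # script keeps running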

Control Program Flow with Conditionals and Loops
- Apply Boolean logic and comparisons (see the combined sketch below)
  - Use comparison operators (==, !=, <, >, >=, <=) to evaluate expressions.
  - Apply logical operators (and, or, not) to combine multiple conditions.
  - Use Boolean expressions to drive data filtering and validation logic.
- Use conditional statements to control logic (see the combined sketch below)
  - Write if, elif, and else blocks to choose between actions based on data values.
  - Use conditional logic to check for missing data, outliers, or invalid input.
  - Nest conditionals for more complex decision-making.
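
A combined sketch of Boolean checks and conditionals on a made-up record:

    age = -3        # sample (invalid) value
    score = 85

    # Comparisons and logical operators drive validation
    is_valid_age = age >= 0 and age <= 120
    needs_review = score < 60 or not is_valid_age
    print(needs_review)                  # True

    if not is_valid_age:
        print("Invalid age: out of range")
    elif score >= 80:
        print("High score")
    else:
        print("Normal record")

    # Nested conditional: only grade records that pass validation
    if is_valid_age:
        if score >= 60:
            print("pass")
        else:
            print("fail")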

- Write loops for repeated tasks (see the sketch below)
  - Use for loops to iterate over strings, lists, dictionaries, and ranges.
  - Use while loops for condition-controlled repetition.
  - Apply break, continue, and else with loops to manage control flow.
  - Combine loops with conditionals to perform data cleaning, aggregation, or transformation.
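
A loop sketch on invented data, combining cleaning with control flow:

    values = [4, None, 15, -2, 8]
    cleaned = []
    for v in values:
        if v is None:
            continue              # skip missing entries
        if v < 0:
            break                 # stop at the first invalid value (example rule)
        cleaned.append(v)
    else:
        # runs only when the loop ends without break (not printed here: -2 broke out)
        print("Loop finished without break")

    total = 0
    i = 0
    while i < len(cleaned):       # condition-controlled repetition
        total += cleaned[i]
        i += 1
    print(cleaned, total)         # [4, 15] 19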

Use Modules and Packages
- Import and use Python modules and packages (example sketch below)
  - Import built-in modules using import, from ... import, and aliases.
  - Access and use functions from standard libraries (math, random, statistics, collections, os, datetime) in data-related tasks.
  - Use the csv module to read from and write to CSV files.
  - Understand the difference between built-in and third-party packages, and when to use them in data analysis.
  - Navigate and interpret official documentation (docs.python.org).
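
A standard-library sketch; the CSV filename is hypothetical:

    import csv
    import math
    import os
    import random
    import statistics
    from collections import Counter       # from ... import style
    from datetime import date

    print(math.sqrt(16))                       # 4.0
    print(statistics.mean([2, 4, 6]))          # 4
    print(random.choice(["a", "b", "c"]))      # random pick
    print(Counter("mississippi").most_common(2))
    print(date.today(), os.getcwd())

    # Write, then read, a small CSV file
    with open("demo.csv", "w", newline="") as f:
        csv.writer(f).writerows([["name", "score"], ["Ada", 95]])
    with open("demo.csv") as f:
        for row in csv.reader(f):
            print(row)                         # each row is a list of strings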

- Use external libraries in data workflows (see the sketch below)
  - Install and import external libraries (e.g., numpy) using pip.
  - Import and use numpy to work with arrays and perform numeric analysis.
  - Navigate and interpret official documentation (numpy.org).
  - Use documentation to troubleshoot errors, learn new functions, or understand unfamiliar behavior.
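
A minimal numpy sketch, assuming the package has been installed with pip install numpy:

    import numpy as np                  # the conventional alias

    data = np.array([3, 7, 1, 9])       # list -> array
    print(data.mean(), data.sum())      # 5.0 20
    print(np.arange(0, 10, 2))          # [0 2 4 6 8]
    print(np.linspace(0, 1, 5))         # [0.   0.25 0.5  0.75 1.  ]
    # When a call is unclear, consult the reference at numpy.org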

Working with Data and Performing Simple Analyses - 32.5%

Read and Write Data Using Files
- Read and write plain text files using Python built-ins (example sketch below)
  - Use open(), read(), readlines(), and write() to store and retrieve simple datasets through text file input and output.
  - Use with statements to open files safely and automatically close them.
  - Work with file paths and check file existence using the os module (os.path.exists()).
  - Apply try-except blocks to catch file-related errors such as FileNotFoundError.
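
A text-file sketch; notes.txt is a hypothetical file name:

    import os

    path = "notes.txt"
    with open(path, "w") as f:          # with closes the file automatically
        f.write("first line\nsecond line\n")

    if os.path.exists(path):            # check the path before reading
        try:
            with open(path) as f:
                lines = f.readlines()   # list of lines, newlines included
            print(len(lines), lines[0].strip())    # 2 'first line'
        except FileNotFoundError:
            print("File not found")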

- Read and write CSV files using the csv module (see the sketch below)
  - Use csv.reader() to read structured data from CSV files line by line.
  - Use csv.writer() to write tabular data into CSV format.
  - Manually parse and clean lines using .strip() and .split(',') where appropriate.
  - Write formatted summaries using f-strings for clean file output.
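
A CSV round-trip sketch with invented data (scores.csv is hypothetical):

    import csv

    rows = [["name", "score"], ["Ada", "95"], ["Grace", "88"]]
    with open("scores.csv", "w", newline="") as f:
        csv.writer(f).writerows(rows)              # write tabular data

    total, count = 0, 0
    with open("scores.csv") as f:
        reader = csv.reader(f)
        next(reader)                               # skip the header row
        for name, score in reader:
            total += int(score)
            count += 1
    # A raw line could also be parsed by hand: line.strip().split(',')
    print(f"Average score: {total / count:.1f}")   # Average score: 91.5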

Clean and Prepare Data for Analysis
- Identify and handle missing or invalid data (example sketch below)
  - Use conditionals and list comprehensions to detect missing or null-like values (e.g., None, empty strings).
  - Replace or remove missing values using logical checks.
  - Use if statements to check for invalid types, unexpected formats, or out-of-range values (e.g., negative age, empty name field) before processing data.
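
A cleaning sketch on an invented column of ages:

    ages = [34, None, "", 28, -5, 41]

    # Detect null-like values with a comprehension
    missing = [a for a in ages if a is None or a == ""]    # [None, '']

    cleaned = []
    for a in ages:
        if a is None or a == "":
            continue              # remove missing values (or substitute a default)
        if not isinstance(a, int) or a < 0 or a > 120:
            continue              # drop wrong types and out-of-range values
        cleaned.append(a)
    print(cleaned)                # [34, 28, 41]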

- Remove duplicates and normalize values (see the sketch below)
  - Use set(), dictionary keys, or comprehension-based filtering to eliminate duplicates.
  - Apply min-max normalization manually using list expressions.
  - Apply transformations using enumeration when index tracking is needed.
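
A short sketch with made-up values:

    values = [10, 20, 20, 40]

    # Deduplicate while preserving order, via dictionary keys
    unique = list(dict.fromkeys(values))                  # [10, 20, 40]

    # Manual min-max normalization into the 0..1 range
    lo, hi = min(unique), max(unique)
    normalized = [(v - lo) / (hi - lo) for v in unique]   # [0.0, 0.33..., 1.0]

    # enumerate() when the index matters during a transformation
    for i, v in enumerate(normalized):
        print(i, round(v, 3))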

- Clean and format strings (example sketch below)
  - Use built-in string methods like .strip(), .lower(), .upper(), .replace(), and .title() for text normalization.
  - Chain string operations to perform multi-step cleaning (e.g., .strip().lower().replace()).
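
A two-line sketch of chained cleaning on invented text:

    raw_names = ["  ALICE  ", "bob ", " Carol"]
    cleaned = [n.strip().lower().title() for n in raw_names]    # ['Alice', 'Bob', 'Carol']

    column = " Unit Price "
    print(column.strip().lower().replace(" ", "_"))             # 'unit_price'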

- Convert and format data for analysis and storage (see the sketch below)
  - Convert between common types using int(), float(), str(), and bool().
  - Format numbers using f-strings for precision (e.g., f'{value:.2f}').
  - Manipulate string fields using .split() and .join().
  - Parse and format dates and times using datetime.strptime() and strftime() for time-based data processing.
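
A conversion-and-formatting sketch on an invented record:

    from datetime import datetime

    raw = {"price": "19.99", "qty": "3", "date": "2024-05-01"}

    price = float(raw["price"])             # str -> float
    qty = int(raw["qty"])                   # str -> int
    print(f"Total: {price * qty:.2f}")      # Total: 59.97

    parts = "red,green,blue".split(",")     # ['red', 'green', 'blue']
    print(" | ".join(parts))                # red | green | blue

    # Parse text into a datetime, then format it back out
    d = datetime.strptime(raw["date"], "%Y-%m-%d")
    print(d.strftime("%d %b %Y"))           # 01 May 2024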

Perform Basic Analytical Computations
- Perform aggregations using Python built-ins (example sketch below)
  - Use len(), sum(), min(), max(), and round() to summarize data and compute simple aggregations.
  - Count values using .count() or dictionary accumulation patterns.
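
A built-ins-only sketch with invented sales figures:

    sales = [120.5, 99.0, 310.25, 99.0]
    print(len(sales), round(sum(sales), 2))    # 4 628.75
    print(min(sales), max(sales))              # 99.0 310.25
    print(sales.count(99.0))                   # 2

    # Dictionary accumulation: counting categories without imports
    tally = {}
    for region in ["east", "west", "east"]:
        tally[region] = tally.get(region, 0) + 1
    print(tally)                               # {'east': 2, 'west': 1}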

- Calculate descriptive statistics using built-in libraries (see the sketch below)
  - Calculate mean, median, and standard deviation using the statistics module (statistics.mean(), statistics.median(), statistics.stdev()).
  - Use the math module for basic numeric computations (math.sqrt(), math.ceil(), math.floor()).
  - Use collections.Counter() to compute frequency counts for categorical data.
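
A standard-library statistics sketch on made-up scores:

    import math
    import statistics
    from collections import Counter

    scores = [70, 85, 90, 85, 60]
    print(statistics.mean(scores))      # 78
    print(statistics.median(scores))    # 85
    print(statistics.stdev(scores))     # sample standard deviation (~12.55)
    print(math.sqrt(2), math.ceil(2.1), math.floor(2.9))    # 1.414... 3 2
    print(Counter(["a", "b", "a"]))     # Counter({'a': 2, 'b': 1})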

- Perform numerical operations with NumPy (example sketch below)
  - Convert lists to arrays using numpy.array().
  - Apply numpy functions to perform array-based statistics (numpy.mean(), numpy.median(), numpy.std(), numpy.sum()).
  - Generate number sequences using numpy.arange() and numpy.linspace().
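
A numpy sketch on an invented array (assumes numpy is installed):

    import numpy as np

    arr = np.array([4, 8, 15, 16, 23, 42])     # list -> array
    print(np.mean(arr), np.median(arr))        # 18.0 15.5
    print(np.std(arr), np.sum(arr))            # population std by default; 108
    print(np.arange(1, 6))                     # [1 2 3 4 5]
    print(np.linspace(0, 100, 5))              # [  0.  25.  50.  75. 100.]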

- Calculate conditional metrics based on filters or categories (see the sketch below)
  - Use if statements or list comprehensions to calculate metrics (e.g., average or count) for subsets of data.
  - Group values by simple categories (e.g., gender, region, pass/fail) and calculate summaries per group using dictionaries or loops.
  - Combine multiple conditions using and/or to create more specific filters (e.g., scores above 80 and in a specific class).
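
A grouped-metrics sketch on an invented roster:

    students = [{"name": "Ada", "class": "A", "score": 91},
                {"name": "Bo",  "class": "B", "score": 72},
                {"name": "Cy",  "class": "A", "score": 84}]

    # Conditional metric with a combined filter: scores above 80 in class A
    high_a = [s["score"] for s in students
              if s["score"] > 80 and s["class"] == "A"]
    print(sum(high_a) / len(high_a))            # 87.5

    # Group by category with a dictionary, then summarize each group
    groups = {}
    for s in students:
        groups.setdefault(s["class"], []).append(s["score"])
    for cls, vals in groups.items():
        print(cls, sum(vals) / len(vals))       # A 87.5 / B 72.0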

Conduct Basic Exploratory Data Analysis (EDA)
- Identify patterns and trends using sorting and filtering (example sketch below)
  - Sort data using sorted() or numpy.sort().
  - Filter data using filter(), list comprehensions, or logical conditions.
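
A sorting-and-filtering sketch on invented readings:

    import numpy as np

    readings = [17, 3, 42, 8, 25]
    print(sorted(readings))                    # [3, 8, 17, 25, 42]
    print(sorted(readings, reverse=True))      # top values first
    print(np.sort(readings))                   # numpy equivalent, returns an array

    # Two equivalent filters
    over_10 = [r for r in readings if r > 10]           # comprehension
    over_10_alt = list(filter(lambda r: r > 10, readings))
    print(over_10, over_10_alt)                # [17, 42, 25] twice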

- Identify unique values and frequencies (see the sketch below)
  - Use set() and numpy.unique() to identify distinct values.
  - Use Counter() to count the frequency of items in lists.
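
A distinct-values sketch on a made-up list of labels:

    import numpy as np
    from collections import Counter

    colors = ["red", "blue", "red", "green", "blue", "red"]
    print(set(colors))                      # distinct values, unordered
    print(np.unique(colors))                # ['blue' 'green' 'red'] -- sorted array
    print(Counter(colors).most_common())    # [('red', 3), ('blue', 2), ('green', 1)]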

- Perform simple correlation checks and detect outliers (example sketch below)
  - Use numpy.corrcoef() to compute correlations between numeric lists or arrays.
  - Detect outliers using simple rules (e.g., thresholds, standard deviation) and conditional logic.
  - Filter outliers using numpy boolean indexing or conditionals.
  - Interpret basic patterns and anomalies found through code-based exploration.
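
A correlation-and-outlier sketch on invented study data:

    import numpy as np

    hours = np.array([1, 2, 3, 4, 5, 20])       # 20 looks suspicious
    scores = np.array([52, 55, 61, 64, 70, 71])

    # corrcoef returns a 2x2 matrix; [0, 1] is the hours-vs-scores coefficient
    print(np.corrcoef(hours, scores)[0, 1])

    # Simple rule: flag values more than 2 standard deviations from the mean
    mean, std = hours.mean(), hours.std()
    is_outlier = np.abs(hours - mean) > 2 * std
    print(hours[is_outlier])                     # [20] -- boolean indexing
    print(hours[~is_outlier])                    # [1 2 3 4 5] -- outlier removed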

Communicating Insights and Reporting - 12.5%

Understand Basic Principles of Data Visualization
- Recognize common types of visualizations and their purposes
  - Identify bar charts, line charts, and pie charts, and explain when to use each.
  - Discuss the strengths and limitations of each visualization type.
  - Select appropriate visuals based on data type and communication goals.
- Interpret simple data visualizations
  - Describe trends, comparisons, and proportions represented in basic visuals.
  - Identify misleading or unclear visuals and explain how they can be improved.
  - Assess whether a visualization supports or confuses the intended insight.

Apply Fundamentals of Data Storytelling
- Structure and communicate data insights as a narrative
  - Explain the basic structure of a data story: introduction, insights, conclusion.
  - Lead with a key message supported by evidence.
  - Use transitions and signposting to create flow between sections.
  - Adjust tone, language, and depth based on audience knowledge and needs.

Create Clear and Concise Analytical Reports
- Summarize and organize analytical results effectively
  - Write short summaries of key patterns and findings with supporting data (e.g., averages, proportions).
  - Use a logical structure: problem, analysis, insight, recommendation.
  - Apply formatting (headings, bullet points, visuals) to improve clarity and readability.

Communicate Insights Effectively in Presentations
- Present data insights clearly using visual and verbal techniques
  - Use accessible and clean design principles (labels, titles, colors, font size).
  - Explain charts or results clearly during presentations.
  - Respond to questions using evidence from visuals or numeric findings.