Analysis Overview
Block Type Distribution
Complexity Analysis
Total Complexity: 67
Average Complexity: 1.86
Max Complexity: 5
Min Complexity: 1
Function Analysis
| Function Name | Parameters | Line Count | Complexity | Calls Made | Variables Used | Async |
|---|---|---|---|---|---|---|
| age_category | 1 | 2 | 2 | | 5 | No |
| add_age_category | 1 | 2 | 1 | | 0 | No |
| filter_and_select | 1 | 2 | 1 | | 0 | No |
| aggregate_stats | 1 | 2 | 1 | | 0 | No |
| repartition_and_cache | 1 | 2 | 1 | | 0 | No |
Block Analysis
| Block ID | Type | Description | Lines | Complexity | Dependencies | Variables |
|---|---|---|---|---|---|---|
| 12345_IMPORT001_L3 | import | Import: (unknown, from_import, quality: 100.0) | 3-3 | 1 | 0 | R:0 W:0 |
| 12345_IMPORT002_L8 | import | Import: (unknown, from_import, 2 items, quality: 100.0) | 8-8 | 2 | 0 | R:0 W:0 |
| 12345_IMPORT003_L14 | import | Import: (unknown, from_import, 2 items, quality: 100.0) | 14-14 | 2 | 0 | R:0 W:0 |
| ASSIGN_...L22 | assign | Assignment: spark = ... (quality: 96.0) | 22-22 | 3 | 0 | R:7 W:1 |
| ASSIGN_...L102 | assign | Assignment: data1 = ... (quality: 100.0) | 102-102 | 2 | 0 | R:3 W:1 |
| ASSIGN_...L190 | assign | Assignment: columns = ... (quality: 100.0) | 190-190 | 1 | 0 | R:2 W:1 |
| ...L240 | df_op | Assignment: df1 = ... (quality: 100.0) | 240-240 | 1 | 3 | R:3 W:1 |
| ASSIGN_...L294 | assign | Assignment: data2 = ... (quality: 100.0) | 294-294 | 2 | 0 | R:2 W:1 |
| ...L360 | df_op | Assignment: df2 = ... (quality: 92.0) | 360-360 | 1 | 3 | R:9 W:1 |
| FUNC_...L429 | func_def | Function: age_category (complexity: 2, quality: 100.0) | 429-430 | 2 | 0 | R:5 W:0 |
| ...L446 | control | Control flow: if_statement (conditional, complexity: 3, quality: 100.0) | 446-448 | 3 | 0 | R:1 W:0 |
| ...L446 | control | Control flow: if_statement (conditional, complexity: 3, quality: 100.0) | 446-448 | 3 | 0 | R:1 W:0 |
| ASSIGN_...L519 | assign | Assignment: age_category_udf = ... (quality: 96.0) | 519-519 | 2 | 0 | R:7 W:1 |
| FUNC_...L598 | func_def | Function: add_age_category (complexity: 1, quality: 100.0) | 598-599 | 1 | 0 | R:0 W:0 |
| FUNC_...L748 | func_def | Function: filter_and_select (complexity: 1, quality: 100.0) | 748-749 | 1 | 0 | R:0 W:0 |
| FUNC_...L1015 | func_def | Function: aggregate_stats (complexity: 1, quality: 100.0) | 1015-1016 | 1 | 0 | R:0 W:0 |
| FUNC_...L1267 | func_def | Function: repartition_and_cache (complexity: 1, quality: 100.0) | 1267-1268 | 1 | 0 | R:0 W:0 |
| ...L1311 | control | Control flow: if_statement (conditional, complexity: 2, quality: 99.0) | 1311-1323 | 2 | 3 | R:43 W:7 |
| ASSIGN_...L1354 | assign | Assignment: df1_cat = ... (quality: 100.0) | 1354-1354 | 1 | 1 | R:1 W:1 |
| ASSIGN_...L1354 | assign | Assignment: df1_cat = ... (quality: 100.0) | 1354-1354 | 1 | 1 | R:1 W:1 |
| ASSIGN_...L1399 | assign | Assignment: df2_cat = ... (quality: 96.0) | 1399-1399 | 1 | 1 | R:7 W:1 |
| ASSIGN_...L1399 | assign | Assignment: df2_cat = ... (quality: 96.0) | 1399-1399 | 1 | 1 | R:7 W:1 |
| ASSIGN_...L1458 | assign | Assignment: df_union = ... (quality: 99.2) | 1458-1458 | 4 | 2 | R:3 W:1 |
| ASSIGN_...L1458 | assign | Assignment: df_union = ... (quality: 99.2) | 1458-1458 | 4 | 2 | R:3 W:1 |
| ASSIGN_...L1600 | assign | Assignment: df_filtered = ... (quality: 90.0) | 1600-1600 | 1 | 1 | R:10 W:1 |
| ASSIGN_...L1600 | assign | Assignment: df_filtered = ... (quality: 90.0) | 1600-1600 | 1 | 1 | R:10 W:1 |
| ASSIGN_...L1676 | assign | Assignment: df_stats = ... (quality: 98.0) | 1676-1676 | 1 | 1 | R:6 W:1 |
| ASSIGN_...L1676 | assign | Assignment: df_stats = ... (quality: 98.0) | 1676-1676 | 1 | 1 | R:6 W:1 |
| ASSIGN_...L1734 | assign | Assignment: df_stats_cached = ... (quality: 94.0) | 1734-1734 | 1 | 1 | R:8 W:1 |
| ASSIGN_...L1734 | assign | Assignment: df_stats_cached = ... (quality: 94.0) | 1734-1734 | 1 | 1 | R:8 W:1 |
| ...L1811 | df_op | Assignment: df_joined = ... (quality: 84.2) | 1811-1811 | 5 | 2 | R:7 W:1 |
| ...L1811 | df_op | Assignment: df_joined = ... (quality: 84.2) | 1811-1811 | 5 | 2 | R:7 W:1 |
| FUNC_...L2021 | func_call | Expression: show(...) (quality: 90.0) | 2021-2021 | 2 | 1 | R:5 W:0 |
| FUNC_...L2021 | func_call | Expression: show(...) (quality: 90.0) | 2021-2021 | 2 | 1 | R:5 W:0 |
| FUNC_...L2077 | func_call | Expression: stop(...) (quality: 90.0) | 2077-2077 | 2 | 1 | R:1 W:0 |
| FUNC_...L2077 | func_call | Expression: stop(...) (quality: 90.0) | 2077-2077 | 2 | 1 | R:1 W:0 |
Program Flow Visualization
💡 Shows the sequential flow of program execution and block relationships
Program Flow Graph
Function Call Relationships
🔗 Visualizes function relationships and call patterns throughout your code
Function Call Graph
Block Dependencies
🔀 Displays how code blocks depend on each other and data flow
Block Dependencies Graph
Interactive Enhanced Code
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit, when, udf
from pyspark.sql.types import IntegerType, StringType

# `builder` is a property, not a method, so it takes no parentheses.
spark = SparkSession.builder.appName("LineageTest").getOrCreate()

data1 = [("Alice", 34), ("Bob", 45), ("Charlie", 23)]
columns = ["name", "age"]
df1 = spark.createDataFrame(data1, columns)

data2 = [("Dave", 29), ("Eve", 52)]
df2 = spark.createDataFrame(data2, columns)

def age_category(age):
    if age < 30:
        return "Young"
    else:
        return "Old"

age_category_udf = udf(age_category, StringType())

def add_age_category(input_df):
    return input_df.withColumn("age_category", age_category_udf(col("age")))

def filter_and_select(input_df, min_age):
    return (
        input_df
        .filter(col("age") > min_age)
        .select(
            col("name"),
            col("age_category"),
            (col("age") * lit(2)).alias("double_age"),
        )
    )

def aggregate_stats(input_df):
    return (
        input_df
        .groupBy("age_category")
        .agg({"age": "avg", "*": "count"})
        .withColumnRenamed("avg(age)", "avg_age")
        .withColumnRenamed("count(1)", "count")
    )

def repartition_and_cache(input_df, num_partitions):
    return input_df.repartition(num_partitions).cache()

if __name__ == "__main__":
    df1_cat = add_age_category(df1)
    df2_cat = add_age_category(df2)
    df_union = (
        df1_cat.union(df2_cat)
        .dropDuplicates(["name"])
        .orderBy("name")
    )
    df_filtered = filter_and_select(df_union, min_age=30)
    df_stats = aggregate_stats(df1_cat)
    df_stats_cached = repartition_and_cache(df_stats, num_partitions=2)
    df_joined = (
        df_filtered
        .join(df_stats_cached, on="age_category", how="left")
        .orderBy(col("count").desc())
        .limit(10)
    )
    df_joined.show(truncate=False)
    spark.stop()
```
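As a rough cross-check of what the pipeline computes, here is a pure-Python sketch (no Spark required) of the `aggregate_stats` step applied to `data1`, using the same `age_category` rule. The `stats` dict shape is illustrative only, not the tool's output format.

```python
from collections import defaultdict

# Same categorization rule as the age_category UDF in the listing above.
def age_category(age):
    return "Young" if age < 30 else "Old"

data1 = [("Alice", 34), ("Bob", 45), ("Charlie", 23)]

# Mirror groupBy("age_category").agg({"age": "avg", "*": "count"})
groups = defaultdict(list)
for name, age in data1:
    groups[age_category(age)].append(age)

stats = {cat: {"avg_age": sum(ages) / len(ages), "count": len(ages)}
         for cat, ages in groups.items()}
print(stats)  # {'Old': {'avg_age': 39.5, 'count': 2}, 'Young': {'avg_age': 23.0, 'count': 1}}
```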
Detailed Metrics & Statistics
Block Statistics
Total Blocks: 36
Functions: 5
Variables: 14
Block Type Distribution
Function Statistics
Total Functions: 5
Async Functions: 0
Avg Parameters: 0.0
Total Parameters: 0
Function Complexity Chart
Complexity Analysis
Complexity 1: 18 blocks
Complexity 2: 11 blocks
Complexity 3: 3 blocks
Complexity 4: 2 blocks
Complexity 5: 2 blocks
Complexity Distribution Chart
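The headline complexity figures can be reproduced directly from this distribution; a minimal sketch, using the block counts as reported:

```python
# Complexity histogram as reported: {complexity: number_of_blocks}
distribution = {1: 18, 2: 11, 3: 3, 4: 2, 5: 2}

total_blocks = sum(distribution.values())                    # 36 blocks
total_complexity = sum(c * n for c, n in distribution.items())
avg_complexity = round(total_complexity / total_blocks, 2)

print(total_complexity, avg_complexity)  # 67 1.86
```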
Quality Metrics
No quality scores available.
Quality Score Distribution
No quality scores available.
Dependency Analysis
Total Dependencies: 31
Blocks with Dependencies: 21
Max Dependencies per Block: 3
Average Dependencies: 0.9
Dependencies per Block
Variable Usage
Total Variable Reads: 179
Total Variable Writes: 28
Blocks Reading Variables: 29
Blocks Writing Variables: 22
Read/Write Ratio: 6.39
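The read/write ratio is guarded against division by zero before formatting; a minimal sketch of that computation, with the figures taken from this report:

```python
total_reads, total_writes = 179, 28

# Guard against zero writes before dividing; report infinity otherwise.
ratio = round(total_reads / total_writes, 2) if total_writes > 0 else float("inf")
print(ratio)  # 6.39
```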