Analysis Overview
Block Type Distribution
Complexity Analysis
Total Complexity: 67
Average Complexity: 1.86
Max Complexity: 5
Min Complexity: 1
Function Analysis
| Function Name | Parameters | Line Count | Complexity | Calls Made | Variables Used | Async |
|---|---|---|---|---|---|---|
| age_category | 1 | 2 | 2 | | 5 | No |
| add_age_category | 1 | 2 | 1 | | 0 | No |
| filter_and_select | 1 | 2 | 1 | | 0 | No |
| aggregate_stats | 1 | 2 | 1 | | 0 | No |
| repartition_and_cache | 1 | 2 | 1 | | 0 | No |
Block Analysis
| Block ID | Type | Description | Lines | Complexity | Dependencies | Variables |
|---|---|---|---|---|---|---|
| 12345_IMPORT001_L3 | import | Import: (unknown, from_import, quality: 100.0) | 3-3 | 1 | 0 | R:0 W:0 |
| 12345_IMPORT002_L8 | import | Import: (unknown, from_import, 2 items, quality: 100.0) | 8-8 | 2 | 0 | R:0 W:0 |
| 12345_IMPORT003_L14 | import | Import: (unknown, from_import, 2 items, quality: 100.0) | 14-14 | 2 | 0 | R:0 W:0 |
| ASSIGN_...L22 | assign | Assignment: spark = ... (quality: 96.0) | 22-22 | 3 | 0 | R:7 W:1 |
| ASSIGN_...L102 | assign | Assignment: data1 = ... (quality: 100.0) | 102-102 | 2 | 0 | R:3 W:1 |
| ASSIGN_...L190 | assign | Assignment: columns = ... (quality: 100.0) | 190-190 | 1 | 0 | R:2 W:1 |
| ...L240 | df_op | Assignment: df1 = ... (quality: 100.0) | 240-240 | 1 | 3 | R:3 W:1 |
| ASSIGN_...L294 | assign | Assignment: data2 = ... (quality: 100.0) | 294-294 | 2 | 0 | R:2 W:1 |
| ...L360 | df_op | Assignment: df2 = ... (quality: 92.0) | 360-360 | 1 | 3 | R:9 W:1 |
| FUNC_...L429 | func_def | Function: age_category (complexity: 2, quality: 100.0) | 429-430 | 2 | 0 | R:5 W:0 |
| ...L446 | control | Control flow: if_statement (conditional, complexity: 3, quality: 100.0) | 446-448 | 3 | 0 | R:1 W:0 |
| ...L446 | control | Control flow: if_statement (conditional, complexity: 3, quality: 100.0) | 446-448 | 3 | 0 | R:1 W:0 |
| ASSIGN_...L519 | assign | Assignment: age_category_udf = ... (quality: 96.0) | 519-519 | 2 | 0 | R:7 W:1 |
| FUNC_...L598 | func_def | Function: add_age_category (complexity: 1, quality: 100.0) | 598-599 | 1 | 0 | R:0 W:0 |
| FUNC_...L748 | func_def | Function: filter_and_select (complexity: 1, quality: 100.0) | 748-749 | 1 | 0 | R:0 W:0 |
| FUNC_...L1015 | func_def | Function: aggregate_stats (complexity: 1, quality: 100.0) | 1015-1016 | 1 | 0 | R:0 W:0 |
| FUNC_...L1267 | func_def | Function: repartition_and_cache (complexity: 1, quality: 100.0) | 1267-1268 | 1 | 0 | R:0 W:0 |
| ...L1311 | control | Control flow: if_statement (conditional, complexity: 2, quality: 99.0) | 1311-1323 | 2 | 3 | R:43 W:7 |
| ASSIGN_...L1354 | assign | Assignment: df1_cat = ... (quality: 100.0) | 1354-1354 | 1 | 1 | R:1 W:1 |
| ASSIGN_...L1354 | assign | Assignment: df1_cat = ... (quality: 100.0) | 1354-1354 | 1 | 1 | R:1 W:1 |
| ASSIGN_...L1399 | assign | Assignment: df2_cat = ... (quality: 96.0) | 1399-1399 | 1 | 1 | R:7 W:1 |
| ASSIGN_...L1399 | assign | Assignment: df2_cat = ... (quality: 96.0) | 1399-1399 | 1 | 1 | R:7 W:1 |
| ASSIGN_...L1458 | assign | Assignment: df_union = ... (quality: 99.2) | 1458-1458 | 4 | 2 | R:3 W:1 |
| ASSIGN_...L1458 | assign | Assignment: df_union = ... (quality: 99.2) | 1458-1458 | 4 | 2 | R:3 W:1 |
| ASSIGN_...L1600 | assign | Assignment: df_filtered = ... (quality: 90.0) | 1600-1600 | 1 | 1 | R:10 W:1 |
| ASSIGN_...L1600 | assign | Assignment: df_filtered = ... (quality: 90.0) | 1600-1600 | 1 | 1 | R:10 W:1 |
| ASSIGN_...L1676 | assign | Assignment: df_stats = ... (quality: 98.0) | 1676-1676 | 1 | 1 | R:6 W:1 |
| ASSIGN_...L1676 | assign | Assignment: df_stats = ... (quality: 98.0) | 1676-1676 | 1 | 1 | R:6 W:1 |
| ASSIGN_...L1734 | assign | Assignment: df_stats_cached = ... (quality: 94.0) | 1734-1734 | 1 | 1 | R:8 W:1 |
| ASSIGN_...L1734 | assign | Assignment: df_stats_cached = ... (quality: 94.0) | 1734-1734 | 1 | 1 | R:8 W:1 |
| ...L1811 | df_op | Assignment: df_joined = ... (quality: 84.2) | 1811-1811 | 5 | 2 | R:7 W:1 |
| ...L1811 | df_op | Assignment: df_joined = ... (quality: 84.2) | 1811-1811 | 5 | 2 | R:7 W:1 |
| FUNC_...L2021 | func_call | Expression: show(...) (quality: 90.0) | 2021-2021 | 2 | 1 | R:5 W:0 |
| FUNC_...L2021 | func_call | Expression: show(...) (quality: 90.0) | 2021-2021 | 2 | 1 | R:5 W:0 |
| FUNC_...L2077 | func_call | Expression: stop(...) (quality: 90.0) | 2077-2077 | 2 | 1 | R:1 W:0 |
| FUNC_...L2077 | func_call | Expression: stop(...) (quality: 90.0) | 2077-2077 | 2 | 1 | R:1 W:0 |
Program Flow Visualization
💡 Shows the sequential flow of program execution and block relationships
Program Flow Graph
Function Call Relationships
🔗 Visualizes function relationships and call patterns throughout your code
Function Call Graph
Block Dependencies
🔀 Displays how code blocks depend on each other and data flow
Block Dependencies Graph
Interactive Enhanced Code
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit, when, udf
from pyspark.sql.types import IntegerType, StringType

# `builder` is a property, not a method, so it takes no parentheses.
spark = SparkSession.builder.appName("LineageTest").getOrCreate()

data1 = [("Alice", 34), ("Bob", 45), ("Charlie", 23)]
columns = ["name", "age"]
df1 = spark.createDataFrame(data1, columns)

data2 = [("Dave", 29), ("Eve", 52)]
df2 = spark.createDataFrame(data2, columns)

def age_category(age):
    if age < 30:
        return "Young"
    else:
        return "Old"

age_category_udf = udf(age_category, StringType())

def add_age_category(input_df):
    return input_df.withColumn("age_category", age_category_udf(col("age")))

def filter_and_select(input_df, min_age):
    return (
        input_df
        .filter(col("age") > min_age)
        .select(
            col("name"),
            col("age_category"),
            (col("age") * lit(2)).alias("double_age"),
        )
    )

def aggregate_stats(input_df):
    return (
        input_df
        .groupBy("age_category")
        .agg({"age": "avg", "*": "count"})
        .withColumnRenamed("avg(age)", "avg_age")
        .withColumnRenamed("count(1)", "count")
    )

def repartition_and_cache(input_df, num_partitions):
    return input_df.repartition(num_partitions).cache()

if __name__ == "__main__":
    df1_cat = add_age_category(df1)
    df2_cat = add_age_category(df2)
    df_union = (
        df1_cat.union(df2_cat)
        .dropDuplicates(["name"])
        .orderBy("name")
    )
    df_filtered = filter_and_select(df_union, min_age=30)
    df_stats = aggregate_stats(df1_cat)
    df_stats_cached = repartition_and_cache(df_stats, num_partitions=2)
    df_joined = (
        df_filtered
        .join(df_stats_cached, on="age_category", how="left")
        .orderBy(col("count").desc())
        .limit(10)
    )
    df_joined.show(truncate=False)
    spark.stop()
```
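As a rough cross-check of what the pipeline computes, here is a pure-Python sketch (no Spark required) of the `aggregate_stats` step applied to `data1`, using the same `age_category` rule. The `stats` dict shape is illustrative only, not the tool's output format.

```python
from collections import defaultdict

# Same categorization rule as the age_category UDF in the listing above.
def age_category(age):
    return "Young" if age < 30 else "Old"

data1 = [("Alice", 34), ("Bob", 45), ("Charlie", 23)]

# Mirror groupBy("age_category").agg({"age": "avg", "*": "count"})
groups = defaultdict(list)
for name, age in data1:
    groups[age_category(age)].append(age)

stats = {cat: {"avg_age": sum(ages) / len(ages), "count": len(ages)}
         for cat, ages in groups.items()}
print(stats)  # {'Old': {'avg_age': 39.5, 'count': 2}, 'Young': {'avg_age': 23.0, 'count': 1}}
```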
Detailed Metrics & Statistics
Block Statistics
Total Blocks: 36
Functions: 5
Variables: 14
Block Type Distribution
Function Statistics
Total Functions: 5
Async Functions: 0
Avg Parameters: 0.0
Total Parameters: 0
Function Complexity Chart
Complexity Analysis
Complexity 1: 18 blocks
Complexity 2: 11 blocks
Complexity 3: 3 blocks
Complexity 4: 2 blocks
Complexity 5: 2 blocks
Complexity Distribution Chart
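The headline complexity figures can be reproduced directly from this distribution; a minimal sketch, using the block counts as reported:

```python
# Complexity histogram as reported: {complexity: number_of_blocks}
distribution = {1: 18, 2: 11, 3: 3, 4: 2, 5: 2}

total_blocks = sum(distribution.values())                    # 36 blocks
total_complexity = sum(c * n for c, n in distribution.items())
avg_complexity = round(total_complexity / total_blocks, 2)

print(total_complexity, avg_complexity)  # 67 1.86
```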
Quality Metrics
No quality scores available.
Quality Score Distribution
No quality scores available.
Dependency Analysis
Total Dependencies: 31
Blocks with Dependencies: 21
Max Dependencies per Block: 3
Average Dependencies: 0.9
Dependencies per Block
Variable Usage
Total Variable Reads: 179
Total Variable Writes: 28
Blocks Reading Variables: 29
Blocks Writing Variables: 22
Read/Write Ratio: 6.39
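The read/write ratio is guarded against division by zero before formatting; a minimal sketch of that computation, with the figures taken from this report:

```python
total_reads, total_writes = 179, 28

# Guard against zero writes before dividing; report infinity otherwise.
ratio = round(total_reads / total_writes, 2) if total_writes > 0 else float("inf")
print(ratio)  # 6.39
```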