[GSoC 2024] Potential Proposal and Discussion - Lexical Scopes for `swift-syntax`

Hey Everyone!

Intro

Happy to see Swift accepted after another successful year with some exciting potential projects for this years Google Summer of Code! It was indeed quite difficult to pick due to the brilliant ideas set by the mentors that further improve, add new features, reuse code and lot more!

Description

As described by @Douglas_Gregor to summarise, the idea involves implementing lexical scopes as part of swift-syntax that would introduce a new library to perform lookups for names and scopes for particular syntax node just like in a typical implementation for any programming language.

Defining and Representing Scopes

We would define a class to represent lexical scopes and this would include information about the variable names and their respective visibilities depending on what context where we are talking here. At the basic level, we will have a record which can be a dictionary to store information like the name and it's parent, so while performing a lookup we can access it in a bottom-up fashion.

// A class to represent a lexical scope.
class LexicalScope {
    var variables: [VariableDeclaration] = []
    var parentScope: LexicalScope?
    
    // Scope-related information...
}
// A structure to represent a variable declaration.
struct VariableDeclaration {
    let name: String
    let varType: <SomeType>

    // Variable info...
}

Scope Tracking and Variable Extraction

The processNode helper function during traversal would be used to break down and extract the type of node we are interested in, which for example can be a simple variable declaration. Again, the special cases like if let, guard, etc. or temporary scopes should have special handling to immediately destroy or remove it from the context.

// Function to traverse the syntax tree and track lexical scopes
func traverseSyntaxTree(node: SyntaxNode, currentScope: LexicalScope) {
    // Process the current node
    processNode(node, currentScope)
    
    // Recursively traverse child nodes
    for child in node.children {
        let childScope = LexicalScope()
        childScope.parentScope = currentScope
        traverseSyntaxTree(node: child, currentScope: childScope)
    }
}
// Function to process a syntax node and update the scope
func processNode(_ node: SyntaxNode, _ currentScope: LexicalScope) {
    switch node.kind {
    case .variableDeclaration:
        if let variableName = extractVariableName(from: node) {
            let variableDeclaration = VariableDeclaration(name: variableName)
            currentScope.variables.append(variableDeclaration)
        }
    // Handle other kinds of nodes and scope-related logic
    }
}
// Function to extract variable name from a variable declaration node
func extractVariableName(from node: SyntaxNode) -> String? {
    // Extract variable name logic here based on the syntax tree
    // For example, if node is a variable declaration, extract the name from its tokens
    // This can be more complex depending on Swift syntax
    return nil
}

Name Lookup API

Although, this is relatively simple, it can be further extended on the basic idea and additional features can be added depending on the requirements.

// Function to perform name lookup in lexical scopes
func findVariableInScopes(_ variableName: String, _ currentScope: LexicalScope) -> VariableDeclaration? {
    var scope: LexicalScope? = currentScope
    
    // Traverse up the scope hierarchy
    while let current = scope {
        if let variable = current.variables.first(where: { $0.name == variableName }) {
            return variable
        }
        scope = current.parentScope
    }
    
    return nil
}

Integration with SourceKit-LSP

Once the APIs are all set we can consider integrating our library with SourceKit-LSP to enhance IDE support. This could involve building features like code completion, navigation, and refactoring based on lexical scope information. My personal favourite is the jump-to-definition (and callers) which really helps developers to navigate with ease.

Testing

The testing part will require a lot of work to handle comprehensive unit tests and validate the correctness of the scope tracking and name lookup implementation. Various Swift constructs can be used to cover a wide range of scenarios.

Potential Challenges

  • In my understanding the lookup part is the one that needs to be optimised the most and this could be discussed.

  • Tests may not be able to cover all scenarios initially and hence will need frequent updates in the beginning.

  • Integration with SourceKitLSP and Swift Syntax could get complex since the logic and implementation needs to conform and adapt with the libraries to have consistency in the codebase while still having good performance.

  • Currently the C++ implementation needs to be replaced which would require some work, as it is an important step in replacing scope-related logic within the Swift Compiler.

Conclusion

I am looking forward for feedback from @Douglas_Gregor and the Swift Community for further improving on this and would love to know what changes and additions can be made!

Thanks :heart:
-- Rajveer (GitHub - Rajveer100)

3 Likes

Some of these types might be implicit, or built from the existing syntax nodes. For example, variables a and b that are declared in

let (a, b) = c

are going to show up as IdentifierPatternSyntax nodes in the syntax tree. A function declaration like func f() { } will show up as a FunctionDeclSyntax node. Rather than introduce a new VariableDeclaration, we should be able to abstract over the kinds of syntax nodes that introduce names (say, with a protocol).

Here, we might also consider going in the opposite direction---starting from the innermost syntax node (which you can find with token(at:)) and walking up the scope tree. Maybe we don't even need a representation of scopes, if that walk is efficient enough? I'm not sure, but the syntax tree has a lot of the information needed to establish scope readily available.

Definitely. Some clients want to know about a specific name ("what would name 'x' refer to at this point in the source file?") while others want to know about everything ("find all declarations visible from this point in the source file").

Name lookup occurs quite often, so yes, it's important that it be efficient.

Within swift-syntax, we've put a lot of emphasis on making it easy to write good, comprehensive unit tests. That would be a great thing to strive for here as well.

Getting to this point is certainly the dream, but we should recognize that it would be a significant stretch for a GSoC project to get to complete replacement. It also depends on other work that's going on now in the Swift compiler, which is replacing the existing parser with the swift-syntax parser.

Doug

2 Likes

Indeed, we could instead introduce a new LexicalScope protocol that would have these members (say parentScope) and also dependent conforming methods.

If you meant traversing bottom to top that would indeed be from the innermost node which is what I too meant. Although in the opposite case, i.e, going from top to bottom would be inefficient since at each level we would have to search the matching node to choose the right path till we reach the leaf.

@Douglas_Gregor

We could have a protocol something like this:

import SwiftSyntax

// Protocol representing a lexical scope
protocol Scope {
    var enclosingScope: Scope? { get }
    var symbols: [String: Syntax] { get }

    // ...
}

class IfStatementScope: Scope {
    let enclosingScope: Scope?
    var symbols: [String: Syntax] = [:]

    init(enclosingScope: Scope? = nil) {
        self.enclosingScope = enclosingScope
    }

    // ...
}

I think this would be more generic.

I also had a look at the traversal logic in Syntax file, it looks like the complexity is good, i.e, O(maxHeight) ~ O(log(N)). So we good go top down as well, by maintaining two pointers that will have the absolute position and eventually shorten and reach like binary search to the bottom child nodes.

|-------------------------------|
        |---------------|
            |-------|
                 |--|
                  .

@Douglas_Gregor
I will try and land a final proposal in the next few days, let me know any particular aspects that you expect to have.

@Douglas_Gregor
How are we integrating this into the codebase, would this be a stand-alone package since a complete replacement would be a stretch as you mentioned earlier?

@Douglas_Gregor @ktoso

Here is the final proposal for this project, let me know for any further revisions.

Also, I believe the project size remains Medium (~175 hours)?

Please make sure to submit the proposal over here on the summer of code page: https://summerofcode.withgoogle.com/

Forums posts are good for discussion but the official proposals must go through the GSoC website

1 Like

Sure Konrad, of course will do that!

1 Like