
Conversation

@datvo06

@datvo06 datvo06 commented Dec 4, 2025

A more detailed check and better type context construction for the Synthesis Handler.
Attempts to close #361 and subsumes #421.

@kiranandcode
Contributor

kiranandcode commented Dec 8, 2025

This may not be possible to avoid, but inspect.getsource doesn't work in the REPL, so in the following code submitted to the REPL, the LLM does not see the definition of MovieSummary:

@dataclass
class MovieSummary:
    name: str
    summary: str
    score: int

@Template.define
def get_movie_classification_function(database: str) -> Callable[[str], MovieSummary]:
    """Generate a function that makes CURL requests to retrieve a movie summary from the database {database}."""
    ...

with handler(LiteLLMProvider()), handler(ProgramSynthesis()):
    imdb_classification = get_movie_classification_function('imdb') 
    letterboxd_classification = get_movie_classification_function('letterboxd')
    print(inspect.getsource(imdb_classification))
    print(inspect.getsource(letterboxd_classification))
    print(imdb_classification("die hard!"))
    print(letterboxd_classification("die hard!"))


source_code = textwrap.dedent(code)
# Construct the full function code
code = f"def {func_name}({param_sig}) -> {return_type}:\n{body}"
Contributor

When I tried sending a few requests, the LLM often returns an entire module body, not a function body.

I.e., it returns something like:

print(result.body)
import requests

class MovieSummary:
    def __init__(self, title, summary, year, director, cast):
        self.title = title
        self.summary = summary
        self.year = year
        self.director = director
        self.cast = cast

    def __str__(self):
        return f"Title: {self.title}\nYear: {self.year}\nDirector: {self.director}\nCast: {', '.join(self.cast)}\nSummary: {self.summary}"


def fetch_movie_summary_from_letterboxd(movie_title: str) -> MovieSummary:
    url = f"https://api.letterboxd.com/v0/movie/{movie_title}"
    headers = {
        'Authorization': "Bearer YOUR_ACCESS_TOKEN",
        'Content-Type': 'application/json'
    }

    response = requests.get(url, headers=headers)

    if response.status_code == 200:
        data = response.json()
        summary = data.get('summary', 'No summary available.')
        year = data.get('releaseYear', 'Unknown')
        director = data.get('director', 'Unknown')
        cast = data.get('cast', [])
        
        return MovieSummary(title=movie_title, summary=summary, year=year, director=director, cast=cast)
    else:
        raise Exception(f"Failed to fetch data from letterboxd: {response.status_code}")
(Pdb) 

So in this case, it would make more sense to compile the entire output as a module and extract the last definition. As currently wrapped, this code compiles successfully, but the resulting function returns None and does no computation.
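A minimal sketch of that approach, assuming the LLM output parses as a well-formed module (the helper name and the sample source below are hypothetical, not this PR's implementation):

```python
import ast


def extract_last_function(module_source: str):
    """Exec the generated source as a module and return the last
    top-level function defined in it."""
    tree = ast.parse(module_source)
    func_defs = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    if not func_defs:
        raise ValueError("no top-level function definition found")
    namespace: dict = {}
    # Execute the whole module so helper classes/imports are in scope.
    exec(compile(tree, "<synthesized>", "exec"), namespace)
    return namespace[func_defs[-1].name]


src = """
def helper(x):
    return x + 1

def main_entry(x):
    return helper(x) * 2
"""
fn = extract_last_function(src)
print(fn.__name__, fn(3))  # main_entry 8
```

Because the whole module is executed into one namespace, the extracted function can still see its sibling definitions (here, helper).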

Contributor

I.e., the generated function is:

print(inspect.getsource(gs[func_name]))
def fetch_movie_summary_from_letterboxd(movie_title: str) -> MovieSummary:
    import requests

    class MovieSummary:
        def __init__(self, title, summary, year, director, cast):
            self.title = title
            self.summary = summary
            self.year = year
            self.director = director
            self.cast = cast

        def __str__(self):
            return f"Title: {self.title}\nYear: {self.year}\nDirector: {self.director}\nCast: {', '.join(self.cast)}\nSummary: {self.summary}"


    def fetch_movie_summary_from_letterboxd(movie_title: str) -> MovieSummary:
        url = f"https://api.letterboxd.com/v0/movie/{movie_title}"
        headers = {
            'Authorization': "Bearer YOUR_ACCESS_TOKEN",
            'Content-Type': 'application/json'
        }

        response = requests.get(url, headers=headers)

        if response.status_code == 200:
            data = response.json()
            summary = data.get('summary', 'No summary available.')
            year = data.get('releaseYear', 'Unknown')
            director = data.get('director', 'Unknown')
            cast = data.get('cast', [])
        
            return MovieSummary(title=movie_title, summary=summary, year=year, director=director, cast=cast)
        else:
            raise Exception(f"Failed to fetch data from letterboxd: {response.status_code}")

Author

Thanks! I agree. I noticed that this can happen frequently; generating the full module may be a better approach.

Author

One drawback is that we won't be able to enforce constrained decoding if we generate the full module. But I think we already have a bigger problem than that. (cc'ing @jfeser: I'll try generating the full module instead.)

Contributor

I was going to suggest this as well. If you want the LLM to tell you the function name separately, that would be straightforward to do.

return fwd()

# Collect all types referenced in the signature
referenced_types = collect_referenced_types(ret_type)
Contributor

Why are the definitions only given for the return type? What about the arguments? Also, I think it might make sense to do this based on lexical context as well; that would be more consistent with the semantics for tool calling.

Author

I agree. I'm leaving this for a separate issue #427, but happy to incorporate this here once the prior issues are addressed.

@kiranandcode
Contributor

We could provide definitions of any functions in the context:

import inspect
import sys
import textwrap
import types
from typing import Dict


def collect_lexical_functions() -> Dict[str, str]:
    # Inspect the caller's frame: calling globals()/locals() inside this
    # helper would only see its own scope, not the caller's.
    caller = sys._getframe(1)
    lexical_context = {**caller.f_globals, **caller.f_locals}
    current_module_name = caller.f_globals.get('__name__', '__main__')
    collected = {}
    for name, obj in lexical_context.items():
        if isinstance(obj, types.FunctionType) and \
           getattr(obj, "__module__", None) == current_module_name:
            try:
                collected[name] = textwrap.dedent(inspect.getsource(obj)).strip()
            except OSError:
                collected[name] = f"<function {obj.__name__} from {obj.__module__}>: {obj.__doc__}"
    return collected

@datvo06
Author

datvo06 commented Dec 8, 2025

@kiranandcode Thanks! Let me try that.

@datvo06
Author

datvo06 commented Dec 8, 2025

@kiranandcode @jfeser Thanks. I added the following:

  1. Collect the lexical context in Template.define.
  2. Add these symbols to the synthesized function's module.
  3. Add tests checking that synthesized functions can use the other functions in scope.

Also closes #427

@datvo06
Author

datvo06 commented Dec 8, 2025

Maybe we can also treat types the same way: they are also symbols within this lexical scope.
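A hedged sketch of what that could look like, mirroring the function collector suggested earlier (collect_lexical_classes is a hypothetical helper; a real implementation would also filter by the defining module, as the function collector does):

```python
import inspect
import sys
import textwrap


def collect_lexical_classes() -> dict:
    # Scan the caller's frame for class objects visible in its scope.
    caller = sys._getframe(1)
    ctx = {**caller.f_globals, **caller.f_locals}
    collected = {}
    for name, obj in ctx.items():
        if inspect.isclass(obj):
            try:
                collected[name] = textwrap.dedent(inspect.getsource(obj)).strip()
            except (OSError, TypeError):
                # Source unavailable (e.g. defined in a REPL): fall back
                # to a name/docstring summary.
                collected[name] = f"<class {name}>: {obj.__doc__}"
    return collected


class MovieSummary:
    """A tiny class visible in the caller's scope."""


print("MovieSummary" in collect_lexical_classes())  # True
```

The OSError fallback matters for the REPL case discussed above, where inspect.getsource has no file to read.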

@eb8680
Contributor

eb8680 commented Dec 8, 2025

Can we do the lexical context collection in a separate PR? It's a critical element of the overall design and, as discussed in #427, it's not specific to synthesis.

@datvo06
Author

datvo06 commented Dec 8, 2025

@eb8680 yes. Let me factor it out.

@datvo06
Author

datvo06 commented Dec 9, 2025

I refactored the lexical context collection out into #434; I will rebase this PR on top of that one.

@datvo06
Author

datvo06 commented Dec 16, 2025

Note: one problem I encountered in test_handlers_llm_provider_synthesis and test_handlers_llm_provider_image is that from __future__ import annotations breaks Encodable.
With the import:

from __future__ import annotations

def foo(x: str) -> Callable[[str], int]: ...

Annotations are stored as strings: "str", "Callable[[str], int]".
Without the import:

def foo(x: str) -> Callable[[str], int]: ...

Annotations are stored as actual types: str, Callable[[str], int].

type_to_encodable_type uses singledispatch, which caches dispatch results in a WeakKeyDictionary.
When a string is passed in:

type_to_encodable_type("str")

it tries to use the string "str" as a cache key:

# WeakKeyDictionary.cache["str"] = ...

which raises:

TypeError: cannot create weak reference to 'str' object

Strings (and other built-in immutable types) cannot be weakly referenced.
For now, I avoided from __future__ import annotations in these tests.
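For reference, the failure mode can be reproduced in isolation with only the standard library (type_to_encodable here is a stand-in for effectful's dispatcher, not its actual code):

```python
from __future__ import annotations  # PEP 563: annotations stored as strings

from functools import singledispatch


def foo(x: str) -> int: ...


# With the future import, annotations are plain strings, not types.
assert foo.__annotations__ == {"x": "str", "return": "int"}


@singledispatch
def type_to_encodable(ty):
    return ty


# dispatch() expects a class; handing it the *string* "str" makes the
# internal WeakKeyDictionary cache try to take a weak reference to it.
try:
    type_to_encodable.dispatch(foo.__annotations__["x"])
except TypeError as e:
    print(e)  # cannot create weak reference to 'str' object
```

One upstream-style fix would be to resolve string annotations (e.g. via typing.get_type_hints) before they ever reach the dispatcher, rather than registering str as a special case.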

@datvo06 datvo06 requested review from eb8680 and jfeser December 16, 2025 21:42
t = str

@classmethod
def encode(cls, context: LexicalContext) -> str:
Contributor

I would expect EncodableLexicalContext.encode or its equivalent to recursively call type_to_encodable_type, not implement custom serialization logic for a bunch of other types internally.

right form and with the right type.
"""

def __init__(self, type_check: bool = False):
Contributor

I would expect all of this typechecking-related logic to live in a separate handler for a separate operation typecheck that is called by some other handler, not be part of ProgramSynthesis and its handler for Template.__call__.



@type_to_encodable_type.register(LexicalContext)
class EncodableLexicalContext(
Contributor

I'm not sure we want an EncodableLexicalContext of this form at all - we should never be decoding a LexicalContext instance from the output of an LLM, and it probably shouldn't be possible to do so in the first place. The encoding logic should probably live somewhere else.

sources: list[str] = []
seen_names: set[str] = set()

for name, obj in context.items():
Contributor

All the cases in this loop should really be standalone Encodable implementations for the relevant types, e.g. types.ModuleType, collections.abc.Callable, Template, etc.

I would strongly suggest doing each of those in its own standalone PR to expedite code review and improve code quality.

return "\n\n".join(sources)

@classmethod
def decode(cls, source: str) -> LexicalContext:
Contributor

As noted above, we should never be decoding a LexicalContext - decode is for the result of an LLM call, and LLM calls should never be generating LexicalContexts.

Comment on lines 76 to 79
# NOTE: Register str explicitly to avoid WeakKeyDictionary cache issues with singledispatch
@type_to_encodable_type.register(str)
def _type_encodable_type_str[T](ty: type[T]) -> Encodable[T]:
return _type_encodable_type_base(ty)
Contributor

I'm not really sure from the comment what's meant to be happening here, but I don't think this is correct. At the very least, whatever the issue is should probably be resolved closer to the source.

if not is_callable:
return fwd()

# Include the full lexical context - all functions, types, values available to synthesized code
Contributor

I'm not sure we want to do this by default? It seems like it could quickly blow up the context. At the very least we shouldn't be incorporating all source code for every function, class and module by default.
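One possible mitigation (an assumption on my part, not something this PR implements) is to prune the collected context to names the synthesized code could plausibly reference, e.g. with a small ast walk over the template's source (referenced_names is a hypothetical helper):

```python
import ast


def referenced_names(source: str) -> set[str]:
    # Collect every bare identifier the code reads, so the lexical
    # context sent to the LLM can be pruned to just those names.
    tree = ast.parse(source)
    return {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }


src = "def f(x):\n    return helper(x) + CONSTANT\n"
print(sorted(referenced_names(src) - {"x"}))  # ['CONSTANT', 'helper']
```

This only catches direct references, so transitively used helpers would need a closure over the collected sources, but it bounds the context far below "everything in scope".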

@datvo06
Author

datvo06 commented Dec 22, 2025

Some tests from Jax are failing. Would merging staging-llm fix this?

@eb8680
Contributor

eb8680 commented Dec 22, 2025

@datvo06 can we please break out each of the new Encodables in this PR into smaller standalone PRs, one for each case? They still need work, but they aren't specific to synthesis or typechecking, and as always smaller PRs mean faster review cycles. Once those land, the remaining typechecking-specific changes here will be simple and straightforward.

Some tests from Jax are failing. Would merging staging-llm fix this?

They're failing on master too (#452) so I think we'll need to fix this on master and merge that into staging-llm first.

@datvo06
Author

datvo06 commented Dec 22, 2025

@datvo06 can we please break out each of the new Encodables in this PR into smaller standalone PRs, one for each case? They still need work, but they aren't specific to synthesis or typechecking, and as always smaller PRs mean faster review cycles. Once those land, the remaining typechecking-specific changes here will be simple and straightforward.

Some tests from Jax are failing. Would merging staging-llm fix this?

They're failing on master too (#452) so I think we'll need to fix this on master and merge that into staging-llm first.

Thanks! I'm on it.

Successfully merging this pull request may close these issues.

effectful.handlers.llm.synthesis should type-check generated code

5 participants