Adhoc derive macro facility, for easily autogenerating code from struct definitions¶

Project information and coordinates¶

Main coordination point: IRC: irc.oftc.net ##tor-hackweek-derive-adhoc

Proposer/coordinator: Ian Jackson aka @Diziet

Team: Diziet, nickm (partial), trinity-1686a (partial, maybe)

gitlab: https://gitlab.torproject.org/Diziet/rust-derive-adhoc

If you are interested, please come onto irc and introduce yourself. You're welcome to join the ##tor-hackweek-derive-adhoc channel. Or you can ping Diziet on #tor-dev.

Hackweek link: https://hackweek.onionize.space/hackweek/talk/EVCPVP/

We may use voice chat if we want to talk about how we're going to do things etc., if all the participants are OK with that.

Background¶

Rust has two primary macro systems: macro_rules, which is easy to use, and proc macros, which are powerful. In particular, proc macros can derive, that is, autogenerate code from data structure definitions.
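As a small illustration of the two styles (a sketch only; `square`, `Point` and `point_demo` are names invented here, and the built-in `Debug` derive stands in for a proc-macro derive):

```rust
// A macro_rules macro: matches a token pattern and substitutes into a template.
macro_rules! square {
    ($x:expr) => {
        $x * $x
    };
}

// A derive: the impls are generated from the struct definition itself.
#[derive(Debug, Clone, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

pub fn point_demo() -> (i32, String) {
    let p = Point { x: 3, y: 4 };
    let norm_squared = square!(p.x) + square!(p.y);
    (norm_squared, format!("{:?}", p))
}
```

The derive sees the field names and types without the user repeating them, which is the property this project wants to make available ad hoc.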

Project goal and outline¶

This project will provide a facility which can be used to easily autogenerate code, ad-hoc, based on data structures, without having to write proc macro code (which is hard, and unsuitable for one-off use cases) and without having to try to write macro_rules pattern matchers for (sort-of-) struct definitions.

The project proposer has a plan for how to achieve this.

There are motivating use cases in the Arti code base, such as:

 * The channel operational parameters struct is wrapped up in a macro call so that a parameters update struct, and support code, can be automatically generated.
 * Where Arti's configuration contains lists of things, we use macros to generate the types and accessors, but the arrangements require us to recapitulate the struct field definitions.
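For a feel of the first use case, here is a rough macro_rules sketch of the "wrap the struct in a macro call" approach. It is not Arti's actual macro: the macro name `define_with_updates`, its `Updates for struct` syntax, the field `padding_ms` and the method `apply_updates` are all made up for illustration.

```rust
// Hypothetical macro: re-emits the struct it wraps, and also generates an
// "updates" struct in which every field is wrapped in Option.
macro_rules! define_with_updates {
    (
        $updates:ident for
        $(#[$meta:meta])*
        $vis:vis struct $name:ident {
            $( $fvis:vis $field:ident : $ty:ty ),* $(,)?
        }
    ) => {
        $(#[$meta])*
        $vis struct $name { $( $fvis $field: $ty, )* }

        #[derive(Debug, Default, Clone, PartialEq)]
        $vis struct $updates { $( $fvis $field: Option<$ty>, )* }

        impl $name {
            /// Overwrite every field for which `updates` holds `Some`.
            $vis fn apply_updates(&mut self, updates: $updates) {
                $( if let Some(v) = updates.$field { self.$field = v; } )*
            }
        }
    };
}

define_with_updates! {
    ChannelsParamsUpdates for
    #[derive(Debug, Clone, PartialEq)]
    pub struct ChannelsParams {
        pub enabled: bool,
        pub padding_ms: u32,
    }
}
```

The cost is visible in the matcher: the macro has to pattern-match something that looks sort of like a struct definition, and anything the pattern does not anticipate (generics, field attributes, tuple structs) is rejected.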

Skills and resources¶

Collaborators should have prior experience of Rust and a Rust development environment.

We will be writing proc macro code, but you can learn that on the job :-). Indeed, this might be a fun opportunity to play with proc macros.

Current status and next steps¶

Have discovered a problem with using an attribute macro; we must use a function-like macro for the invocation, which is less nice. See below.

Next steps:

 * Decide on an invocation syntax: wombat2
 * Update the plan in NOTES.txt and the commentary in macros.rs and the README.md
    to correspond to the new reality
 * Complete the proof of concept: it seems to work now

 * Settle on template syntax, at least for MVP, with room for expansion
 * Move derive_adhoc_expand impl into its own module, and have it talk only about proc_macro2::TokenStream, like with derive_adhoc.
 * Actually have derive_adhoc_expand parse the input, to syn::DeriveInput I guess

Big ticket todos:

 * Implement some kind of actual templating engine in derive_adhoc_expand that isn't total cheese
 * Add features to templating engine (filtering of things like attributes; filtering of fields according to attribute presence, etc.)
 * Reusable macros (see NOTES.txt)
 * Tests (trybuild?, and unit tests for the templater)
 * Docs

Example of what using this facility might look like:¶

    #[derive(Adhoc)]
    pub struct Config {
        enabled: bool,
        padding: PaddingParameters,
    }

    #[derive_adhoc(ChannelsParams)]
    pub struct ChannelsParamsUpdates {
        $(
            pub(crate) $field: Option<$ty>,
        )*
    }

Update: this doesn't work, because #[derive_adhoc] is an attribute macro, and attribute macros can only apply to "items". That means the compiler has already tried to parse the template as an item, and the $( )* in it is not valid Rust syntax.
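The difference can be seen with plain macro_rules, which is also function-like. A minimal sketch (`count_tts` and `TOKEN_COUNT` are names invented here): the input only has to be a sequence of token trees, so a stray `$` is accepted even though the text is not a valid item.

```rust
// Counts the token trees it is given. The input is never parsed as Rust
// syntax; it only needs balanced delimiters.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

// `type Wombat2 = $ Struct;` would be rejected as an item, but a
// function-like macro sees it as six tokens: type Wombat2 = $ Struct ;
pub const TOKEN_COUNT: usize = count_tts!(type Wombat2 = $ Struct;);
```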

Experimentation reveals the following syntaxes:

/// Possible invocation syntaxes

derive_adhoc!{
    #[derive_adhoc(ChannelsParams)] // this is fake, not really an attribute
    type Wombat1 = $ Struct;
}

derive_adhoc!{
    ChannelsParams:
    type Wombat2 = $ Struct;
}

#[derive_adhoc(ChannelsParams)] x!{
    type Wombat3 = $ Struct;
}

derive_adhoc_apply_ChannelsParams!{
    type Wombat4 = $ Struct;
}

Note in all of these cases the "type ..." bit is standing in for some more full thing that's probably a fn or a struct or something. Now at least we can have several in the same macro invocation, rather than having to repeat the #[derive_adhoc(ChannelsParams)].

I'm not sure I understand how those would actually look if you wrote the whole thing out here; there may be too much abbreviation in the above examples for me to grok it.

I think that the one with Wombat2 is the least ugly though. Other ideas (no idea if they work) based on Wombat2:

derive_adhoc!{
    ChannelsParams =>
    type Wombat2b = $ Struct;
}

derive_adhoc!{
    ChannelsParams => {
        type Wombat2c = $ Struct;
    }
}

[ stuff repeated in README.md deleted from pad ]

iteration control example¶

#[derive(Adhoc)]
enum Extremity {
    Hand { num_fingers: usize },
    Foot { num_toes: usize },
    Nose,
}

derive_adhoc!{
    Extremity:
    $(
        println!("variant name {:?} field name {:?}",
                 stringify!($vname),
                 stringify!($fname));
    )
}

Currently fails

error: inconsistent repetition depth: firstly, variants inferred here
  --> src/bin/second.rs:85:34
   |
85 |     stringify!($fname));
   |                ^^^^^

error: inconsistent repetition depth: secondly, fields inferred here
  --> src/bin/second.rs:85:34
   |
85 |     stringify!($fname));
   |                ^^^^^
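One reading of the error is that $vname repeats once per variant and $fname once per field, so a single $( ) has no one depth to iterate at. For comparison, this is how plain macro_rules copes, by making the nesting explicit. It is only an analogy, not derive_adhoc syntax; `variant_field_names` and `extremity_names` are names invented here.

```rust
// Collects (variant name, field name) pairs from an enum-like input.
macro_rules! variant_field_names {
    (
        enum $ename:ident {
            $( $vname:ident $( { $( $fname:ident : $fty:ty ),* $(,)? } )? ),* $(,)?
        }
    ) => {{
        let mut names: Vec<(&'static str, &'static str)> = Vec::new();
        $(          // outer repetition: once per variant
            $(      // zero or one time: only variants with a braced body
                $(  // inner repetition: once per field of that variant
                    names.push((stringify!($vname), stringify!($fname)));
                )*
            )?
        )*
        names
    }};
}

pub fn extremity_names() -> Vec<(&'static str, &'static str)> {
    variant_field_names! {
        enum Extremity {
            Hand { num_fingers: usize },
            Foot { num_toes: usize },
            Nose,
        }
    }
}
```

`Nose` has no fields, so it contributes no pairs. Whether derive_adhoc should infer this nesting or require it to be written out is exactly the open question.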

Wacky ideas¶

Annotations on source structure¶

Allow additional annotations on the fields of the source structure that can get interpreted somehow by the later #[derive_adhoc()] application

#[derive(Adhoc)]
#[adhoc(antipodean)]
struct Biome {
    #[adhoc(wobbly, piquant)]
    wombats: u8,
    #[adhoc(moist, type="monotreme")]
    platypodes: Option<String>,
}

Turns into ${field:wobbly} or something I guess ? You want $( thing thing ${field:wobbly} )? maybe?

Lots of questions here about what the template syntax for this stuff looks like. But yes we should definitely have this !

I have no idea what the template syntax would look like here, but I am excited :)

I think one of the things that would be worth padding is possible syntaxes. There are so many options.

You also want things like ${field_attr=serde} or something?

(padding?) putting in pad

I think that might be a good idea though I dunno ...

expansion template syntax (and semantics) ideas¶

 * $field_attrs gives all attrs vs $field_attr expands to one attr and must be in $( )*
 * ${somevariable:...conditions?modifiers??...}
 * filtering, ${field_attr=serde} ?
 * separate out first para of doc comment
 * conditional things for if the thing is present, eg, those custom attrs, if-ish? ${field_attr=spicy}?( ... ) errrr not sure this is great
 * needs a quick syntax for testing #[adhoc(spicy)]
 * also std macro $( )* etc. cannot do "defaulting" ie "if this doesn't expand to anything, expand this other thing instead" maybe $!{field_attr=spicy}, which expands 0 or 1 times to nothing?
 * how much to reproduce the magical expansion count thing that macro_rules does? Maybe it's familiar but it's kinda funky

Constraints and thoughts:

 * We can define our own identifiers here and use whatever case
 * Probably we should use $name for basic expansions
 * We cannot suffix it, $name:something because ambiguity with $name : something
 * Probably we should use ${name... additional options or modifiers} ?
 * Should we keep $( for repetition? It's a clumsy syntax but everyone is familiar with it
 * It would be nice for things like attributes to have both a thing that expands to the list, and a thing that expands to each one for handling them more fine-grained-ly
 * We could support conditionals via things that expand 0 or 1 times so you can do something like $( spicy_field: SpiceThing< ${attr=spice_type} >, )* which would get only fields #[adhoc(spice_type=Garlic)] etc.
 * But maybe we want "normal" conditionals; if they are repetitions with $( ) then the repetition count goes after ) which is crazy, so we need a syntax starting ${
 * Are we going to expect users to use paste! ? We could reimplement paste! or always call it or something. Expecting users to write it themselves gives another layer of { } nesting which is quite undesirable.
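The "expands 0 or 1 times" idea already exists in macro_rules as $( )?, so here is a small sketch of it for comparison. Nothing here is proposed syntax: the `[spice_type = ...]` marker, the macro `spicy_fields` and the function `spicy_demo` are invented for the example.

```rust
// Each field may carry an optional `[spice_type = X]` marker. The inner
// `$( ... )?` expands once for marked fields and zero times otherwise, so it
// acts as a conditional nested inside the per-field repetition.
macro_rules! spicy_fields {
    ( $( $field:ident $( [spice_type = $spice:ident] )? ),* $(,)? ) => {{
        let mut out: Vec<String> = Vec::new();
        $(
            $(
                out.push(format!(
                    "{}: SpiceThing<{}>",
                    stringify!($field),
                    stringify!($spice)
                ));
            )?
        )*
        out
    }};
}

pub fn spicy_demo() -> Vec<String> {
    spicy_fields!(plain, hot [spice_type = Garlic], mild, zesty [spice_type = Ginger])
}
```

Note how the repetition marker `?` trails the closing parenthesis, far from the condition it belongs to. That is the readability problem mentioned above, and an argument for a syntax starting ${.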

#[derive(Adhoc)]
#[derive_adhoc(ListBuilderAccessors)]
struct SomeArtiConfig {
    #[adhoc(config_list="Fallback")]
    fallbacks: FallbackListBuilder,
}

instead of define_list_builder_accessors!,

define_derive_adhoc!{ ListBuilderAccessors =

$(
  /// but here we need to conditionalise this on the #[adhoc(config_list)]
  /// so maybe
  ${attr=config_list}   // expands to nothing once or zero times?  But that means multiple repetitions in the same $( )* which we are now multiplying, maybe it's an ad-hoc condition
pub fn $field(&mut self, Vec< ${attr:config_list   do we tokenise the literal string or what  } >) -> &mut Self {... }
 )*

                            ^ How do we know to say "Fallback" here?
                            ^ does "config_list" need to be "config_list(item="Fallback")"?
                            Well we could
                               Vec< [< ${field_type:regexp:.*ListBuilder:$&} >] or some thing less nightmare

^ IMO that's too magic.  I know we want to do things here that can't be done with hygienic macros, but that's downright perl6.

  lol


I think *adding* should be allowed (a la paste!)

Adding is naughty, but not evil. So sure.

Or maybe ${if:attr:config_list}( $) but this is kind of weird because the scoping of attr:config_list is only really sensible inside $( ) which repeats field

Maybe this whole magical repetition matching thing from macro_rules is just too funky and we should have

$fields(

  or something

I think that the magical part that seems inconvenient here is creating a new ${} thing with its own multiplicity in 0/1 , and using that to control whether an expansion happens. I think it's probably better to somehow have an expand-or-don't thing.

${if:attr=config_list: $(

)}

dunno.

or maybe $IF:

one danger is that this wants to turn into a full programming language. :p

Yes. But OTOH we want features. I think I'm ok with "no regexps or other kinds of adhoc string disassembly". As the boundary.

I think we'll want "attr is present", "attr is absent", "attr is present with value = CONST". and "attr is present with value, bind that value to $var". The latter (binding thing) may be a thing that we can expand.

I think we'll need NOT and AND, and should probably have OR for convenience.
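To make that list concrete, here is a tiny evaluator sketch for such conditions. It is a model in ordinary Rust, not template syntax and not how derive_adhoc works; the names `Attrs`, `Cond` and `bind` are invented, and attribute values are plain strings for simplicity.

```rust
use std::collections::HashMap;

/// The #[adhoc(...)] attributes on one field. A bare flag such as
/// `#[adhoc(spicy)]` is stored with value `None`;
/// `#[adhoc(config_list = "Fallback")]` is stored with `Some("Fallback")`.
pub type Attrs = HashMap<String, Option<String>>;

pub enum Cond {
    Present(String),
    Absent(String),
    HasValue(String, String),
    Not(Box<Cond>),
    And(Vec<Cond>),
    Or(Vec<Cond>),
}

impl Cond {
    pub fn eval(&self, attrs: &Attrs) -> bool {
        match self {
            Cond::Present(name) => attrs.contains_key(name),
            Cond::Absent(name) => !attrs.contains_key(name),
            Cond::HasValue(name, want) => {
                matches!(attrs.get(name), Some(Some(v)) if v == want)
            }
            Cond::Not(c) => !c.eval(attrs),
            // `all` and `any` stop at the first deciding operand.
            Cond::And(cs) => cs.iter().all(|c| c.eval(attrs)),
            Cond::Or(cs) => cs.iter().any(|c| c.eval(attrs)),
        }
    }
}

/// "attr is present with value, bind that value": returns the value if any.
pub fn bind<'a>(attrs: &'a Attrs, name: &str) -> Option<&'a str> {
    attrs.get(name)?.as_deref()
}
```

Binding is kept separate from testing here, which matches the suggestion that the binding form is a thing that can be expanded rather than a condition.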

How about we say "empty string is false, nonempty is true" and then we can have booleans which do short circuit and an explicit test for empty separately from expanding it And I guess #[adhoc(list)] causes ${attr:list} to expand to something nonempty

"booleans aren't booleans" historically leads to confusion once you need to distinguish false from "" or 0 or whatever. It does sound convenient to implement tho

The advantage of doing that is we don't need a separate type system and/or two separate expansion sublanguages.

Hmm. So the alternative would be to say there's one language and one type; that the type is something like enum {bool,string}, and then we either define the operations for weird cases like and(string,bool) to be something where strings are all true, or we define them to expand to error.

Not string, anyway: TokenStream (sure)
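The two options side by side, as a sketch. `Value`, `truthy` and `strict_and` are invented names, and a String stands in for TokenStream so that the example has no dependencies.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    /// Stand-in for a TokenStream in this sketch.
    Tokens(String),
}

/// Option A, truthiness: an empty expansion is false, anything else is true.
pub fn truthy(v: &Value) -> bool {
    match v {
        Value::Bool(b) => *b,
        Value::Tokens(t) => !t.is_empty(),
    }
}

/// Option B, strict: feeding tokens to a boolean operator is an error.
pub fn strict_and(a: &Value, b: &Value) -> Result<bool, String> {
    match (a, b) {
        (Value::Bool(x), Value::Bool(y)) => Ok(*x && *y),
        _ => Err(format!("`and` expects two booleans, got {:?} and {:?}", a, b)),
    }
}
```

Option A needs no second sublanguage, but it cannot tell "attribute absent" from "attribute present with an empty value". Option B can, at the price of an explicit emptiness test.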

Let's park this a moment and think about what kind of modifiers we are going to want to have on these expansion thingies.

We might want some conversion things, eg "parse this literal string as tokens" (since there are limits to what tokens you can put in the attribute reasonably and sometimes the spans come out funny, which is why serde and builder and people let you specify strings for exprs). Do we want "parse this thing as " ?

^ I think this isn't madness, since the macro system already has ty/tt/expr/etc Right. We might want a richer set of things. We have to decide on naming. Probably not syn's? Or we could use syn's but support only a small subset.

I don't care strongly about that.  I'd like to use the current macros-by-example names as a subset of the names here, to leverage existing knowledge.  Sure, that seems reasonable.  (Notably the two namespaces are disjoint because of case...)

We will definitely want some modifiers for handling generics (generics happen both for the whole struct, and for field types). We need "extract the parameter names" and "extract the where clauses" separately, so that you can write your impls or whatever.
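A hand-written example of why the pieces must be available separately (`Wrapper` and `Describe` are invented for illustration). In proc macro code, syn's `Generics::split_for_impl` hands back the same three pieces.

```rust
use std::fmt::Debug;

// Source struct: generic parameters with bounds, plus a where clause.
pub struct Wrapper<T: Clone, const N: usize>
where
    T: Debug,
{
    pub items: [T; N],
}

pub trait Describe {
    fn describe(&self) -> String;
}

// Writing an impl needs three different views of the struct's generics:
//   1. the parameter declarations, with bounds:  <T: Clone, const N: usize>
//   2. the bare parameter names:                 <T, N>
//   3. the where clause:                         where T: Debug
impl<T: Clone, const N: usize> Describe for Wrapper<T, N>
where
    T: Debug,
{
    fn describe(&self) -> String {
        format!("{} items: {:?}", N, self.items)
    }
}
```

Pasting the declarations (piece 1) where the names (piece 2) belong does not compile, so a template needs a separate expansion for each.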

We want to be able to say "just paste all the attributes here" as well as "iterate over the attributes".

$xyz:meta seems appropriate for that?

Do we want to be able to iterate in stacking orders that do not correspond to those in the actual struct definition ? I guess maybe. ${foreach:field( ${foreach:field(

Conceivably but maybe we don't need to figure it out day 1 v1.

I think we need a model of iteration that is not insane. Frankly the macro_rules one is ... well, very confusing at the very least. IMO it's fine when you're using it in simple ways, and gets confusing when you're doing fancy tricks.

Let's talk next steps? I think we need some examples here of trivial things, simpler than list-config-foobar. Maybe a trivial Hash implementation, or Debug, or Visitable or something?
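For instance, a trivial Hash might come out like this. The impl is written by hand as the expansion we would hope to get; the template line in the comment is hypothetical syntax, and `Pair` and `hash_of` are invented names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub struct Pair {
    pub a: u32,
    pub b: String,
}

// A template along the lines of
//     $( self.$field.hash(state); )*
// (hypothetical syntax) would expand to one line per field:
impl Hash for Pair {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.a.hash(state);
        self.b.hash(state);
    }
}

/// Helper for trying the impl out: hash a value with the default hasher.
pub fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Since tuples hash their elements in order, `Pair { a, b }` hashes the same as the tuple `(a, b)`, which gives a cheap check that the expansion visits every field.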

I think examples are a great idea but I don't much think in examples as you have noticed :-). How about you make a file in-tree with examples of things you think should work? I mean, just write it out as if it had been magically implemented.

sure. How about playground/ ... ? Sure

I would like to play about writing some kind of template-interpretation code and I think that would make some of the questions concrete.

The big question is the iteration model which is the whole computation model and I have an idea: $( )* infers the loop control variable from what is inside, somehow; and you can also write $for_fields( ) (syntax to be argued over later)

(digression: Not sure what the =$var is for - are you imagining comparing one bit of the derive input with another? no. I'm imagining a way to do #[adhoc(config_list="Fallback")] and using the type "Fallback" in the expansion. Oh your value = $var was binding, not use. That should be done ${attr:config_list} or something. )

possible additional things to think about now: * What if we want to apply two different derive_adhocs() to a struct? do we need to namespace the #[adhoc(attribute)]s?

I think adhoc attributes are just a sub-namespace of normal derive attributes, so the answer is yes. That is, it's the users' responsibility not to invent clashing attribute names.

well, consider:

#[derive(Builder, Serialize)]
struct Config {
    #[builder(default)]
    #[serde(default)]
    username: String
}

and contrast

#[derive(Adhoc)]
#[derive_adhoc(MyBuilder, MySerialize)]
struct Config {
    #[adhoc(BuilderDefault, SerializeDefault)]
    // ^ like that?  yes only conventionally snake case I think
    #[adhoc(builder(default))]
    #[adhoc(serde(default))]
    username: String
}

If we're doing it like that, we could require that the identifier matches something. Like, if you're doing MyBuilder then you say "I want "builder"" and then you get to look at adhoc(builder(...)) but not adhoc(serde(...)).

You get to look at all of them, anyway, since actual derive macros can see each others' attributes. I think that's good, but it oughtn't to be the default. So a convenience thing for "my attributes" which ignores the others. And a "$all_attrs".
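A sketch of that split between "my attributes" and everything. The data model is made up for illustration: `AdhocAttr` records the namespace and body of one `#[adhoc(namespace(body))]`, and `my_attrs` is the convenience filter; the unfiltered slice plays the role of "$all_attrs".

```rust
/// One `#[adhoc(namespace(body))]` attribute, e.g. `#[adhoc(builder(default))]`.
#[derive(Debug, Clone, PartialEq)]
pub struct AdhocAttr {
    pub namespace: String,
    pub body: String,
}

/// The convenience view: only the attributes in the caller's own namespace.
pub fn my_attrs<'a>(all: &'a [AdhocAttr], namespace: &str) -> Vec<&'a AdhocAttr> {
    all.iter().filter(|a| a.namespace == namespace).collect()
}
```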

NB this namespace thing is only for "defined" reusable adhoc derives, not adhoc adhoc derives. The adhoc ones don't have a name, but I think that's fine.

ack

Applications to arti¶

Let's list a few external proc-macro type crates that Arti uses, and consider whether they could be profitably rewritten with this.

Maybe. I reckon this will be slow to compile and harder to read than some existing crate that users are familiar with. But we should consider it for "rare" things where we found a thing under a rock.

The other possibility would be the config stuff but I don't want to disturb all of that. There are definitely bits round the edges where this adhocery is going to be useful.

Right; I'm not proposing (for example) to actually replace derive_builder or educe or derive_more with this. But I am thinking it would be clever to ask: 'what additional features would this need in order to be able to emulate those crates?'

Ah yes, very good thinking.

(if this is good enough maybe it --or one of its descendants-- gets eventually rolled into Rust as a standard feature.)

testing¶

Does one test proc macros by having input-and-expected-output pairs? That could be fun.

Yes. It's super fragile.

There's also a crate (name I've forgotten) you use for this, and it can check that "this thing doesn't compile and produces such-and-such an error". That's even more fragile as the wrong compiler version => wrong message.
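A sketch of the input-and-expected-output style, with one small measure against fragility: compare after collapsing whitespace. In a real test `actual` would come from running the expansion; here both sides are string literals, and `normalize`, `Case` and `failures` are invented names.

```rust
/// Collapse every run of whitespace to one space, so that pure formatting
/// differences between expected and actual output do not fail the test.
pub fn normalize(src: &str) -> String {
    src.split_whitespace().collect::<Vec<_>>().join(" ")
}

pub struct Case {
    pub name: &'static str,
    pub expected: &'static str,
    pub actual: &'static str,
}

/// Names of the cases whose actual output differs from the expected output.
pub fn failures(cases: &[Case]) -> Vec<&'static str> {
    cases
        .iter()
        .filter(|c| normalize(c.expected) != normalize(c.actual))
        .map(|c| c.name)
        .collect()
}
```

This only helps with line breaks and indentation. Differences in spacing between tokens (`Option<u8>` versus `Option < u8 >`) would still fail, so comparing parsed token streams would probably be more robust.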

Some way to reuse #[derive_adhoc] blocks.¶

Suppose that somebody wants to define a #[derive_adhoc] block in a way that it can be used ... multiple times! ... or by other crates! How would that look?

Eg, Instead of saying ...

    #[derive_adhoc(ChannelsParams)]
    pub struct ChannelsParamsUpdates {
        $(
            pub(crate) $field: Option<$ty>,
        )*
    }
    #[derive_adhoc(CircsParams)]
    pub struct CircsParamsUpdates {

        $(
            pub(crate) $field: Option<$ty>,
        )*
    }

maybe there could be ...

```
#[define_adhoc(UpdatesDerivation out_struct)]
pub struct $out_struct {
    $(
        pub(crate) $field: Option<$ty>,
    )*
}

#[derive_adhoc_with(UpdatesDerivation ChannelsParamsUpdates ChannelsParams)]
#[derive_adhoc_with(UpdatesDerivation CircsParamsUpdates CircsParams)]
```

or something like that that you could export and apply elsewhere...?

(This syntax above is probably awful for implementor and user alike)

We definitely want something like this. I like the #[define_adhoc] name.

Then you say

#[derive(Adhoc)]
#[derive_adhoc_apply(UpdatesDerivation)] // ?? naming ??
struct Thing { }

?

Doing this involves plumbing the macrology the other way round but the core expansion routine is the same, I think.