Meandering Musings

Everything not fit to publish

A Practical Example of Metaprogramming in Ruby

Practical is relative

What we will be implementing here actually exists in the standard library, so it is not all that practical from that standpoint.

It is, however, practical because learning how to write the kind of library code you use every day is useful. Using a linked list from an API is scut work. Creating one, along with the algorithms to operate on it, teaches you something. Okay, a linked list is hardly a challenge, but knowing how it is created has value. It not only teaches how a specific data structure works, it also helps you understand its performance characteristics and write efficient algorithms that operate on it. In the same way, learning to read and write bytes directly from and to a network connection or hard drive helps you understand network programming and file systems respectively, even though it is rarely needed.

This article will not create a linked list, raw sockets, or a file system reader; perhaps another day. Instead, it will describe how to create a ‘struct’ class. It is vaguely like a struct in C, except it is very flexible at runtime.

A struct in C is a very basic data structure. It holds a collection of related values of various basic types.

A simple struct in C
#include <stdio.h>
#include <math.h>

struct Point {
  float x,y;
};

int main(int argc, char *argv[])
{
  struct Point p1,p2;
  float temp;

  p1.x=5;
  p1.y=7;

  p2.x=10;
  p2.y=22;

  temp = ((p2.x-p1.x)*(p2.x-p1.x)) + ((p2.y-p1.y) * (p2.y-p1.y));

  printf("Distance of the line is: %.4f\n", sqrt(temp));

  return 0;
}

Hopefully, there are no errors in that. I just realized I am a bit rusty with C. The types in the struct do not have to be the same, I just lack imagination.

Our example will replicate the basic usage of a class called OpenStruct and, hopefully, add some useful functionality of its own. OpenStruct is a dynamic struct: you can add elements to it and use them much like the members of a C struct, but it is a plain old Ruby object. Under the hood, we don’t have to worry about memory alignment, padding, and the other things the compiler handles for us in C.

Instance variables are reached through an accessor and a mutator, aka ‘getters’ and ‘setters’, because instance variables in Ruby are private and their visibility can not be changed.

These methods and their instance variable are created dynamically the first time you call them. The alternative is to create a class that has every possible variable name along with its getter and setter. Have fun with that.

OpenStruct Example
require 'ostruct'

p1 = OpenStruct.new x: 5, y: 7
p2 = OpenStruct.new x: 10, y: 22

puts p1.y
=> 7
#could also write it p2.x=(3) since x= is the method name
p2.x = 3
puts p2.x
=> 3

#can add more variables to each object dynamically
p1.name = 'some name'

puts p1
#<OpenStruct x=5, y=7, name="some name">

That shows the basic functionality and more can always be added as needed during runtime, both variables and methods. The point of this class is to be able to store data dynamically and use it dynamically and allow for changes during runtime. It generally isn’t a good idea to do this sort of thing to more concrete classes, so this is a good way to keep them obviously separate from the rest of your program. It is great for prototyping and if you have data sets where the names of the values will change during runtime.

Our class will not be a perfect copy. OpenStruct stores the variable names and values in a hash; we will create actual instance variables. There is not a huge functional difference and it is not really any more or less complex. This is an example of what more could be done, and over time we will see if there are advantages or pitfalls in our implementation. How exciting!
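To make the difference a little more concrete, here is a small sketch. PlainPoint is a made-up class used only for the comparison, and the calls on OpenStruct (to_h and the [] lookup) are the ones I am assuming you have available in the ostruct library that ships with Ruby.

```ruby
require 'ostruct'

point = OpenStruct.new x: 5, y: 7
point.name = 'some name'

# OpenStruct can hand its contents back as a plain hash...
p point.to_h
# ...and it also allows hash-style lookups by key.
p point[:x]

# PlainPoint is a hypothetical class that keeps its data in real
# instance variables, which is the approach our class will take.
class PlainPoint
  attr_accessor :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

p PlainPoint.new(5, 7).instance_variables
```

The last line lists @x and @y, which is what we will be looking at later when we inspect our own objects.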

Getting Started

The first step is figuring out a name and, for me, that is often the hardest part of programming. I am not very creative. The reason OpenStruct is named that and not Struct is that there is already a Struct class defined in the Ruby standard library. In Kotlin, there is something similar to OpenStruct called a data class. But Data is already defined in the standard library too, so we will go with Example::Struct: our class Struct resides in the Example module. As I said, I am not very creative. If I figure out a better module name, I will change it. Using modules to scope classes makes the problem of collisions a non-issue. Without modules, you might accidentally open an existing class that is automatically loaded, which adds confusion.

For example, if we just named it Struct without the class residing in a unique module, the class Struct that already exists would be changed dynamically. That is a potential pitfall of many dynamic languages. If you use the same class name in the same scope as another in a language like Java, the compiler will barf on you. Ruby will assume that you really meant to open the existing class and get out of your way. You will only hear about the conflict when the two definitions can not be reconciled: if the existing name is not a class at all (a module, for instance), or if you declare a different superclass, Ruby raises a TypeError. All languages, whether statically or dynamically typed, have pros and cons, sharp edges, traps, and pitfalls. It just takes time, and often walking into the traps, to learn them and how to effectively leverage the benefits they bring.

Yes, if you were to write a class named String - which exists in all OO languages that I am aware of - you would get no warnings or errors. The runtime would assume that you meant to open the existing String class for modification. This is not necessarily an error; there are good reasons to want to add functionality to existing classes.
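A quick sketch of both behaviors, reopening a core class and hiding a new class inside a module. The method name shout is invented purely for this illustration.

```ruby
# Reopening an existing class: no warning, no error.
class String
  # `shout` is a made-up method added only for this example.
  def shout
    upcase + '!'
  end
end

module Example
  # A brand-new class. The built-in ::Struct is left untouched.
  class Struct
  end
end

puts 'hello'.shout                      # HELLO!
puts Example::Struct.equal?(::Struct)   # false
```

Every string in the program now responds to shout, which is exactly why the module wrapper is worth the extra two lines for our own class.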

Here is the shell of our soon-to-be-hopefully-awesome program.

Example::Struct
module Example
  class Struct

  end
end

# This doesn't do anything useful but can be instantiated
# and it has the basic functionality and knowledge of itself
# that all Ruby objects have.
my_struct = Example::Struct.new

If anyone is actually reading this and following along, use a decent text editor to create this. If you are on Windows, WordPad or even File Explorer is liable to save it as ‘data.rb.txt’, which will work but is clunky. Ruby itself doesn’t care too much about file extensions; that is a Windows fetish. An actual text editor will save it as just a .rb file. You do not have to name it ‘data.rb’. The file name doesn’t have to match any name in the file itself, but matching names is a good way to help document your application. I am using Ruby version 2.7.1. The code should work on version 2.5 or higher, because it calls attr_accessor and remove_method from outside the class body and those methods were private before 2.5 - there are few reasons to use an older version on this date - and on any reasonable alternative implementation such as JRuby. Traditionally, new major releases for Ruby come on Christmas day and hopefully, 3.0 is on track for this year. It looks to be a mostly great release and I will be testing it out and writing about the cool aspects sometime after that date. I can’t guarantee this code will work in 3.0. Most likely it will work without change, but no guarantees.

Initializing values and creating an accessor and mutator

The first functionality we will add is dynamically creating the getter and setter for a single variable and storing the value, as the code above shows: p1.name = 'some name'. We are doing this first because passing keys and values to the initializer needs this functionality. Once we can create one getter and setter and store one value at a time, writing the initializer is trivial.

To do this we need to override method_missing, which is briefly discussed in my first Ruby ramblings. The runtime calls this method if the object can not respond to the message - aka method name - being passed to it, and this is where we create two instance methods and an instance variable. In this case, the message is name=. There is also the case where just name is passed, and method_missing needs to account for it. It is very possible to create only the exact method called and not its pair, but it is better to do both since this class will need both for each instance variable.

method_missing example
module Example
  class Struct
    def method_missing m, *args, &block
      puts "method called: #{m}"
      puts "#{args}"
    end
  end
end

my_struct = Example::Struct.new
my_struct.x = 7
#Output
# method called: x=
# [7]   (the array of arguments)

The method_missing arguments are as follows. m is the name of the method that was called but does not exist; it is passed as a symbol. *args is an array of the method arguments. The asterisk denotes that the method can accept any number of arguments; inside the method you drop the ‘*’ and just use args. &block converts any passed block to a Proc. Most methods that take a block do not declare a reference to it and simply invoke it with the yield keyword. For our usage, at most one method argument will be passed in, so any additional arguments can be ignored. I left in the &block argument as an example of the full method signature, but Ruby doesn’t require that you match the signature to override it. The block argument is useful for creating more complicated methods dynamically with custom code passed in. When I write about an example of a DSL, it will become useful.
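If the splat and the block argument are new to you, this small sketch shows them outside of method_missing. describe_call is a hypothetical helper that just mirrors the same signature.

```ruby
# `describe_call` is a made-up helper with the same shape as method_missing.
def describe_call(name, *args, &block)
  summary = "#{name} got #{args.size} argument(s): #{args.inspect}"
  # `block` is a Proc when a block was passed, otherwise it is nil.
  summary += " and a block returning #{block.call.inspect}" if block
  summary
end

puts describe_call('x=', 7)
# x= got 1 argument(s): [7]
puts describe_call('sum', 1, 2, 3) { :done }
# sum got 3 argument(s): [1, 2, 3] and a block returning :done
```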

All an accessor - getter - method does is return the value of the variable we want to get at. The variable name matches the method name with a ‘@’ prepended to it. The ‘@’ is a sigil that denotes the variable’s scope. In this case, it is an object variable. There are also variable names with ‘@@’ or ‘$’ prepended, and names with nothing prepended. They mean class, global, and local scope respectively. So the name method will look for the value of the variable called @name.

A mutator - setter - method will change the value of the relevant object variable to the value passed to it. It is named the same as the getter with a ‘=’ appended to it. So if we are looking to change the variable @name it will be called name=.
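Written out by hand, a getter and setter pair looks something like the sketch below. Person is just a stand-in class for the illustration; this is the code we want our Struct to generate on the fly.

```ruby
# `Person` is a hypothetical class used only to show the naming pattern.
class Person
  def initialize(name)
    @name = name          # '@' marks an instance (object) variable
  end

  # accessor (getter): the variable name without the '@'
  def name
    @name
  end

  # mutator (setter): the getter's name with '=' appended
  def name=(new_name)
    @name = new_name
  end
end

person = Person.new('Ada')
person.name = 'Grace'     # the same call as person.name=('Grace')
puts person.name          # Grace
```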

Create getter/setter and instance variable
module Example
  class Struct
    def method_missing m, *args
      new_method = m.to_s.delete '='
      if new_method.split(' ').size==1
        create_methods new_method.strip, args[0]
      else
        raise "Invalid method name: #{new_method}"
      end
    end
    private
    def create_methods m, val
      self.class.attr_accessor m
      instance_variable_set "@#{m}",val
    end
  end
end

When a currently unknown method is called on the Struct object, the runtime calls method_missing and things start to happen. If it is not overridden, the default method_missing simply raises a NoMethodError, which will cause execution to halt.
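For comparison, this is roughly what happens with a class that does not override method_missing. Plain is an empty, made-up class.

```ruby
# `Plain` is a hypothetical class with no method_missing of its own.
class Plain
end

begin
  Plain.new.x = 7
rescue NoMethodError => e
  # The exception carries the name of the message that was not understood.
  puts "#{e.class}: #{e.name}"   # NoMethodError: x=
end
```

Without the rescue, the program would stop right at the assignment.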

Like all languages, Ruby has rules for method names; one of them is that the name can not have multiple words separated by whitespace. Such a name typically can’t even be called, because the runtime will assume the second word is an argument and raise an exception for improper arguments, since they are not delimited by a comma. The ‘=’ is removed via m.to_s.delete '=' because otherwise we would try to create an instance variable such as @name=, which is not a valid variable name. If there is an ‘=’ anywhere else in the name, delete will remove that too and cause problems down the line, but we are ignoring that for now. attr_accessor adds the ‘=’ itself when it creates the setter name=, so it has to be handed the bare name. The to_s call is necessary because the method name is passed in as a symbol. Symbols are basically immutable, interned strings and, as such, have no delete method.

The name also can not have leading or trailing whitespace, so that is trimmed off using strip.

Certainly, there are lots of other things to look out for, such as an appended ‘!’ or ‘?’, which are legal in method names but don’t make sense for our usage, or illegal characters in a method or variable name. Of course, any invalid characters will cause one or the other line in create_methods to raise an exception. A more real-world example would add something to validate the name, most likely a regular expression - aka regex. I just put an article on the mathematics, theory, and usage of regular expressions on my rather large to-write list.
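As a rough idea of what that validation might look like, here is a minimal sketch. VALID_NAME and valid_name? are names I made up, and the pattern is deliberately conservative: it assumes the trailing ‘=’ has already been stripped and only allows ASCII letters, digits, and underscores.

```ruby
# Hypothetical validation helper, not part of the class in this article.
# Allows names that start with a lowercase letter or underscore, followed
# by letters, digits, or underscores.
VALID_NAME = /\A[a-z_][a-zA-Z0-9_]*\z/

def valid_name?(name)
  name.to_s.match?(VALID_NAME)
end

puts valid_name?('name')         # true
puts valid_name?(:first_name)    # true
puts valid_name?('first name')   # false
puts valid_name?('name!')        # false
puts valid_name?('9lives')       # false
```

Ruby itself accepts more than this (non-ASCII names, for instance), so treat the pattern as a starting point rather than the full rule.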

Like one of my favorite professors says in some of his programming assignments: “Assume user utopia.” User utopia assumes that the user won’t do anything unexpected, crazy, or whathaveyou, to varying degrees. This is completely divorced from reality but helps in learning basic concepts. It is like a beginning physics class allowing students to disregard friction and air resistance in their problem solving, especially before they are even covered in class. That way the implementation won’t get messy obscuring basic metaprogramming concepts but there are at least a few small examples of data validation.

As mentioned here, attr_accessor is a class-level method that automatically creates both an accessor and a mutator from the name passed to it - as a symbol or string. Those methods read and write an instance variable of the same name, which comes into existence the first time it is assigned. This makes it super simple to accomplish our task.

There are several ways to call attr_accessor. I chose self.class, which simply calls it on the class of the current object, which is obviously Example::Struct. Example::Struct.attr_accessor also works.

The method instance_variable_set does exactly what the name implies. This method is public and can be called from outside the object. It accepts two arguments, the name of the variable in string form and the value to set it to.
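Here is a small sketch of that method, and its partner instance_variable_get, being called from outside an object. Box is an empty class invented for the example.

```ruby
# `Box` is a hypothetical class with no methods or variables of its own.
class Box
end

box = Box.new
box.instance_variable_set '@label', 'fragile'   # a symbol works too
puts box.instance_variable_get(:@label)         # fragile
p box.instance_variables                        # [:@label]

# Setting the variable does not create any methods.
p box.respond_to?(:label)                       # false
```

That last line is the reason we still need attr_accessor: the variable exists, but nothing outside the object can reach it through a normal method call.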

What if attr_accessor did not exist? It would make this more verbose, but it is still possible to accomplish our task. This is how it can be done when dynamically creating more complex methods than getters and setters.

Pretend attr_accessor isn't a thing
module Example
  class Struct
    def method_missing m, *args
      new_method = m.to_s.delete '='
      if new_method.split(' ').size==1
        new_method.strip!
        create_setter new_method, args[0]
        create_getter new_method
      else
        raise "Invalid method name: #{new_method}"
      end
    end

    private
    def create_setter method_name,value=nil
      instance_variable_set "@#{method_name}",value
      self.class.send :define_method, "#{method_name}=" do |value|
        instance_variable_set "@#{method_name}",value
      end
    end

    def create_getter method_name
      self.class.send :define_method, method_name do
        instance_variable_get "@#{method_name}"
      end
    end
  end
end

That is still not too bad but it might require a few explanations.

Off the top of my head, I can’t think of a replacement for instance_variable_set and instance_variable_get. I believe that would require digging into the language itself and basically reimplementing those methods, which is not worth the hassle.

method_missing starts out the same, removing any ‘=’ from the name and making sure it is a single word. strip! is called instead of strip because new_method is passed to two methods. In Ruby, a method name ending in an exclamation mark denotes that the method is ‘dangerous’ or ‘destructive’. Typically that means the object is mutated in place rather than a modified copy being returned. As far as I know, that is what it means throughout the Ruby standard library, but other libraries such as ActiveRecord have other definitions of dangerous. Typically, it is better to use the non-destructive forms of the methods, but in this case it is just easier to type and I am lazy. Otherwise we would need to write new_method=new_method.strip, or put new_method.strip in the calls to both create_setter and create_getter, or call strip inside both methods. “Don’t repeat yourself” is an important concept to keep in mind.
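The difference between the two forms is easy to see in a few lines. This is only a sketch; dup is there so the example also behaves on setups where string literals are frozen.

```ruby
padded = '  name  '.dup

trimmed = padded.strip   # returns a new, trimmed string
p padded                 # "  name  " (the original is untouched)
p trimmed                # "name"

padded.strip!            # trims the receiver in place
p padded                 # "name"

# One wrinkle: strip! returns nil when there was nothing to trim,
# so its return value should not be chained or passed along.
p 'name'.dup.strip!      # nil
```

That wrinkle is why the code above calls strip! on its own line and then passes new_method, rather than passing the result of strip! directly.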

In create_setter, the first thing it does is create and set the instance variable. The next step is to create the method using define_method. This takes two arguments. The first is the method name. It can be a string or symbol, but since we need to append ‘=’ to it, we build a string using string interpolation. #{method_name} is automatically converted to a string for us.

The second argument is a block, denoted by either do end or {}. The version with the braces is typically used for single-line blocks, but that is not enforced. attr_accessor would create this method body for us, but here we need to write it manually. A block is simply an anonymous method (or function in other languages), which is invoked by the use of yield or call depending on the context. Inside create_setter the block is not invoked, which is why the instance variable is set separately at the top of the method when it is first created. The runtime will automatically execute the block whenever name= is invoked after creation. |value| is the argument that name= will accept. Anything within the pipes is considered an argument for the block.
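A stripped-down sketch of the same two ideas, a block becoming a method body through define_method and a block being invoked with yield. Greeter and twice are names invented for the example.

```ruby
# `Greeter` is a hypothetical class; define_method turns the block into
# the body of a method named greet, and |name| becomes its argument.
class Greeter
  define_method(:greet) { |name| "Hello, #{name}!" }
end

# `twice` is a made-up method showing that a block only runs when
# something invokes it, here via yield.
def twice
  yield
  yield
end

count = 0
twice { count += 1 }

puts count                       # 2
puts Greeter.new.greet('Ruby')   # Hello, Ruby!
```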

Adding in the initialize method

So far everything should be pretty straightforward, hopefully. Now that the creation of the methods and instance variables is in a private method, creating an initializer that accepts a hash of names and values is trivial. This way we can add multiple instance variables in one call.

The whole program so far
module Example
  class Struct
    def initialize name_vals={}
      name_vals.each_pair {|name,val| method_missing(name, val)}
    end

    def method_missing method_name, *args
      method_name = method_name.to_s.delete '='
      if method_name.split(' ').size==1
        create_methods method_name.strip, args[0]
      else
        #will add proper and complete method validation in another article
        raise "Invalid method name: #{method_name}"
      end
    end

    private
    def create_methods m, val
      self.class.attr_accessor m
      instance_variable_set "@#{m}",val
    end
  end
end

Everything is the same except for the addition of the initialize method, which is, not surprisingly, the object initializer. new is the constructor; it is defined on Class, allocates the object, and then calls initialize. initialize accepts a hash, but if one is not passed, an empty hash is used. This is called a default argument value. {} is the literal for a hash and is equivalent to Hash.new. each_pair is called on the hash; it iterates over the hash and executes the block {|name,val| method_missing(name, val)} on each iteration. The block arguments are the current key and value in the hash.
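The default argument and each_pair can be tried on their own, away from the class. list_pairs is a hypothetical helper with the same parameter shape as our initialize.

```ruby
# `list_pairs` is a made-up method that mirrors initialize's signature.
def list_pairs(name_vals = {})
  lines = []
  # each_pair hands the block one key and one value per iteration.
  name_vals.each_pair { |name, val| lines << "#{name} -> #{val.inspect}" }
  lines
end

p list_pairs                         # []
p list_pairs(val: 5, str: 'test')    # ["val -> 5", "str -> \"test\""]
```

Calling it with no arguments falls back to the empty hash, so the block simply never runs.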

The initializer takes a single hash argument. Because Ruby lets you leave the braces off a hash that is the last argument of a call, Example::Struct.new val: 5, str: 'test' can be written the same way as the OpenStruct.new x: 5, y: 7 call shown earlier. It looks like a nifty feature called keyword arguments, but what the initializer receives is one ordinary hash, and the end result is the same. In a future article any remaining differences from OpenStruct will be brought into line, but for now this is perfectly fine.

Now, because we are lazy - like every proper programmer - we make use of the existing code in method_missing by simply calling it. The alternative would be to make the runtime call it for each name/value pair by turning the pair into an invocation of name = value. That is less efficient, since we would have to use send to pass the name of the method, let the runtime search the method table for it, and, when it is not found, have it call method_missing for us. Initially, it might seem like the good old-fashioned laziness that all good programmers prize, making the runtime do more work, but it actually takes more effort to get the runtime to behave properly in this case, so it is simply bad design.
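To show that both routes end up in the same place, here is a small sketch. Recorder is a hypothetical class whose method_missing only remembers what it was asked; it is not part of our Struct.

```ruby
# `Recorder` is a made-up class that logs every message it does not understand.
class Recorder
  attr_reader :seen

  def initialize
    @seen = []
  end

  def method_missing(name, *args)
    @seen << [name, args]
    nil
  end
end

recorder = Recorder.new

# Route 1: hand the name and value straight to method_missing, which is
# what our initializer does. (send is used only because we are outside
# the object here.)
recorder.send(:method_missing, :x, 5)

# Route 2: build the setter name, let the runtime look it up, fail, and
# then call method_missing on our behalf.
recorder.send('y=', 7)

p recorder.seen
```

Both entries land in seen. The second simply took a detour through a failed method lookup and needed a string built for it first.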

If there is an invalid name, an exception is raised, and a possible flaw of our implementation is that any methods and variables created from the hash before the error will still exist. There are ways to manage that automatically, but they would clutter the example. I am not sure that is something that would need to be addressed in a real-world implementation. It is the sort of thing that could cause hours-long nerd arguments.

It should be noted that this needs testing to flesh out any traps and pitfalls but it should suffice for this example.

This program is an example of a type of programming called metaprogramming. What that means is that this is an example of ‘code that writes code’. It is extremely powerful and usually concise, especially in Ruby. Other languages such as Elixir and the Lisp family of languages have even more powerful metaprogramming abilities. Outside of that, you would be hard-pressed to name a mainstream language that has more powerful metaprogramming abilities. You could write something very vaguely similar in Java but it would require a ton of esoteric code and/or massive use of strings. It would be a clunky mess, which is an apt description of that language.

There are certainly extremely powerful Ruby libraries, such as Active Record and RSpec, that make use of metaprogramming, and the implementations of these wonderful libraries are certainly a bit convoluted - let’s face it, a powerful database ORM and a full testing framework are never going to have a simple implementation - and can be difficult to follow if you need to read the code. That generally isn’t a huge issue for users of the library because the interface is extremely clean, concise, and simple to use for most use cases, and it works well enough that it is unlikely you will need to read the library code. If you can’t find a way to do something in Active Record, you can easily use SQL and Active Record will pass it straight through to the database.

If you read up on libraries like Active Record, you will see complaints that it does a lot of “magic” and that somehow makes them bad libraries to use. They use the term magic, to denote that it does mysterious things, and is a black box. Neither is true and like tricks from a magician, it only looks like magic if you don’t understand the mechanics of the trick, but is obviously not real magic. Many Ruby libraries seem like magic to people that don’t understand the Ruby object model and its metaprogramming facilities. Of course, at some point, the abstractions below the point you are working at will be somewhat of a black box. I understand the basics of the CPU - cache, adders, pipeline, out-of-order execution, etc but as a complete working unit, it is mostly a black box to me. I am also nowhere near an expert of the Ruby VM internals - but I understand the gist. There is no such thing as magic in programming, no matter how neat the ‘trick’ looks.

What is great about our little class is that it can deal with an infinite number of changes in our dataset during runtime with absolutely no need to change this code.

Anyway, these 25 lines give us the basic functionality of OpenStruct, and the ‘manual’ version was not really that much longer. Of course, there is more functionality to fill out so they can be altered further, printed, and whatnot. The class we are copying has more functionality that we don’t need to fill in for this example.

Adding the ability to remove methods and instance variables

Dynamically adding methods and instance variables gets us most of the way to matching the very basic functionality of OpenStruct, but to keep our objects clean when the data changes, the ability to remove unwanted methods and instance variables would add a lot to our humble little class.

Removing methods and instance variables
   #omitting the rest of the class code
   def remove_data name
      remove_instance_variable "@#{name}"
      #remove the getter and its matching setter
      self.class.remove_method name, "#{name}="
   end

We remove the variable and, since remove_method is a class-level method, not an object method, we need the qualifier self.class to get to the class object named Struct. I really need to do an article on the object model of Ruby, but for this example just know that Struct is itself an object, an instance of Class. remove_method is not typically used this way, and neither is attr_accessor.
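A tiny sketch of the ‘classes are objects’ idea, using a throwaway class named Sample. The send call is only there so the example also runs on Rubies older than 2.5, where remove_method is private.

```ruby
# `Sample` is a hypothetical class with a single method.
class Sample
  def hello
    'hi'
  end
end

p Sample.class                      # Class
p Sample.new.class                  # Sample
p Sample.instance_methods(false)    # [:hello]

# remove_method is sent to the class object, not to an instance.
Sample.send(:remove_method, :hello)
p Sample.instance_methods(false)    # []
```

Once the method is gone from the class, it is gone for every instance of that class, which is worth keeping in mind for our Struct as well.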

Normal usages for remove_method and attr_accessor
class MyClass
  attr_accessor :var1, :var2
  remove_method :var2= #we want var2 to be read-only for some reason...

  def initialize
    # blah blah blah
  end
end

In this context, they very much look like keywords, but they are methods that accept symbols. There are very few keywords in Ruby; much of what looks like syntax, including many operators, is actually a method call.

Usage Example

Complete Struct Class(so far)
module Example
  class Struct
    def initialize name_vals={}
      name_vals.each_pair {|name,val| method_missing(name, val)}
    end

    def method_missing method_name, *args
      method_name = method_name.to_s.delete '='
      if method_name.split(' ').size==1
        create_methods method_name.strip, args[0]
      else
        raise "Invalid method name: #{method_name}"
      end
    end

    def remove_data name
      remove_instance_variable "@#{name}"
      #remove the getter and its matching setter
      self.class.remove_method name, "#{name}="
    end

    private
    def create_methods m, val
      self.class.attr_accessor m
      instance_variable_set "@#{m}",val unless val.nil?
    end
  end
end

We should test it a little bit and see if it actually works. Instead of a full testing setup, this will just verify that it works for basic cases and future articles will test it more thoroughly and in a repeatable way.

Code to test out our class
require_relative 'ourstruct'

struct = Example::Struct.new
struct1 = Example::Struct.new val: 5, str: 'test'

struct.value = 15
puts struct.value
=> 15

p struct1.instance_variables
=> [:@val, :@str]

struct1.str = [:new, :array, 'of', 'stuff']
puts struct1.str
=> new
   array
   of
   stuff

struct1.remove_data :str
p struct1.instance_variables
=> [:@val]

To run this, put our struct code in a file named ourstruct.rb, or whatever name you like, or paste it into the REPL. The test code can go in the REPL, in the ourstruct.rb file, or in a different file in the same directory. If you made a file for testing called teststruct.rb, you can run it from the command prompt by making sure the prompt is in the correct directory and typing ruby teststruct.rb.

The => just denotes the output and is not part of the program. If you blindly copy/paste the runtime will spew out error messages.
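If you would rather have something you can paste and run as-is, here is a self-contained sketch of the same checks. The class is repeated so the file stands alone, and check is a throwaway helper I made up; it is no substitute for a real test framework. I am assuming Ruby 2.5 or later, and remove_data here removes the setter along with the getter.

```ruby
module Example
  class Struct
    def initialize(name_vals = {})
      name_vals.each_pair { |name, val| method_missing(name, val) }
    end

    def method_missing(method_name, *args)
      method_name = method_name.to_s.delete('=')
      raise "Invalid method name: #{method_name}" unless method_name.split(' ').size == 1
      create_methods method_name.strip, args[0]
    end

    def remove_data(name)
      remove_instance_variable "@#{name}"
      self.class.remove_method name, "#{name}="
    end

    private

    def create_methods(m, val)
      self.class.attr_accessor m
      instance_variable_set "@#{m}", val unless val.nil?
    end
  end
end

# `check` is a hypothetical helper that prints one line per expectation.
def check(label, condition)
  puts "#{condition ? 'ok  ' : 'FAIL'} #{label}"
end

first = Example::Struct.new val: 5, str: 'test'
second = Example::Struct.new

check 'initializer sets values', first.val == 5 && first.str == 'test'

second.value = 15
check 'setter creates and stores the value', second.value == 15

first.remove_data :str
check 'remove_data drops the variable', first.instance_variables == [:@val]

# Accessors are defined on the class, so every instance shares them.
check 'accessors are shared between instances', first.respond_to?(:value)
```

The last check is one of the pitfalls mentioned earlier: because attr_accessor is called on the class, a name added through one object shows up as a method on all of them, even though the other objects have no value stored for it.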

There is much more to do to test and expand our Struct class and it will be written about later on, along with a usage example based on a silly diversion I do during football season. It will necessitate more functionality and also show how to extend this class via polymorphism. Very exciting, no?
