Syntax
Most of the syntax is the same as Smalltalk-80. For example, comments
are contained within double quotes:
"this is ignored"
The few minor additions to Smalltalk-80 syntax are to accommodate
compilation from plain text files, variadic blocks (and methods), an
expanded range of literal types, and direct access to non-printing
characters in Character and String literals.
Programs are translated one source file (with zero or more
additional source files being imported) at a time. To the compiler,
this body of code is called a translation unit. The compiler
always processes one complete translation unit at a time, and
currently (this is a temporary limitation) a translation unit must
contain an entire program (all object and method definitions required,
with no external or unresolved references).
A translation unit consists of a sequence of definitions
and imperatives. Definitions either create a new
prototype or add a method to an existing prototype. Imperatives are
sequences of code that are executed in-order when the program is run.
Imperatives
A literal block can appear at the top-level (outside any other kind of
definition):
[ statements ]
The code within the block is executed at the moment 'control'
nominally reaches the block within the source file at runtime. This
is handy for initialising complex data structures (think of it as a
means to obtain behaviour similar to class initialisation methods) and
also for starting the whole program in motion at the end of the source
(something akin to a 'main' method, if you like).
Top-level imperatives can also take the form:
{ directive optionalArguments... }
Currently only one directive is recognised:
{ import name }
will search for a file called 'name.st' and substitute its
contents in place of the directive. (The Scanner is currently pretty
stupid and wants to see precisely '{ import ', with one space
after the '{' and one space after 'import', in order to
recognise this directive.)
Prototype definitions
Two top-level forms provide for the creation of new prototypes:
name ( listOfSlots )
creates a new 'root' prototype (it has no parent, or 'delegate') and
binds it to name. The prototype contains zero or more named
slots, similar to instance variables. The definition could be read
as: "name is listOfSlots".
Such a prototype has no useful behaviour (it can't even clone
itself to create useful application objects). Adding a minimum of
primitive behaviour (e.g., cloning) is the first thing you'll want to
do to such an object.
The second form:
name : parent ( listOfSlots )
is similar, except the new prototype delegates to the
named parent object and inherits the parent object's
slots before adding its own. Such definitions could be read as:
"name extends parent with listOfSlots".
(This is every bit as bogus as a single inheritance mechanism
being used to share state and behaviour, but I'm still trying to
figure out how to separate delegation from the sharing of state
without sacrificing performance. Only allowing slots to be accessed
by name in their defining prototype, forcing inherited slots to be
accessed by message send, is probably the way to go. Better still,
making all state accesses into message sends -- especially
assignments.)
Method definitions
Methods are just 'named blocks', tied to a particular prototype only
by permitting direct access to the state within that prototype.
(Therein lies yet another reason to abolish direct access to state.)
This is reflected in the syntax of the top-level form for adding
methods (named blocks) to a prototype:
name pattern [ statements ]
where name identifies a prototype object (defined as described
above), pattern looks (more or less) like a Smalltalk-80 message
pattern, and statements (notice the brackets) is the body of a
block. (The block can take arguments, but these are contained in
the pattern and so the syntax prohibits explicit block
arguments from appearing at the start of the statements
sequence. This restriction applies only to blocks used as method
implementations.)
The pattern component can be a unary, binary or keyword
message pattern. Extending Smalltalk's fixed-arity messages, blocks
associated with keyword patterns can be variadic (accommodating
zero or more additional arguments, beyond those associated with
explicit keywords). This is indicated by an ellipsis in the message
pattern. Expanding the pattern part of the above syntax, the
four valid forms of message pattern are therefore:
name unarySelector [ statements ]
name binarySelector argumentName [ statements ]
name keywords: arguments [ statements ]
name keywords: arguments ... [ statements ]
where 'keywords: arguments' are 'keyword: argument'
pairs, repeated as many times as necessary, and '...' means
an explicit ellipsis (and does not mean 'more
keywords/arguments, as required'). (See the discussion on message
sends below for the syntax of sending a message with optional 'rest'
arguments.)
(Simply for lack of time, there is currently no friendly syntax to
recover the 'rest' arguments within the body of a message. Wizards,
however, can easily recover these arguments by writing some low-level
'magic' inside an external block. There will be an example of
this later.)
Blocks
Blocks are similar to Smalltalk-80 blocks, but allow for local
(block-level) temporaries:
[ statements ]
[ :arguments | statements ]
[ | temporaries | statements ]
[ :arguments | | temporaries | statements ]
Both arguments and temporaries are strictly local to the block and
will not conflict (other than in name) with similarly-named arguments
or temporaries in lexically disjoint blocks. The compiler currently
disallows the shadowing of names.
(This means that you cannot set a method-level temporary by naming
it as a block argument. It also means two blocks in the same method
that share an argument or temporary name will each refer to a
completely different value, regardless of the common name.)
Assignment
The Smalltalk-80 'left arrow' assignment operator is gone. The
corresponding form is:
identifier := expression
with the ':=' operator having the lowest precedence of any operator
(including keyword message sends) and associating from right to left.
Message sends
Message sends are similar to Smalltalk-80: unary, binary and keyword messages have
the same precedence as in Smalltalk-80 and cascaded messages (with the
';' operator) work in exactly the same manner.
primary unarySelector
unaryMessage binarySelector unaryMessage
binaryMessage keywords: binaryMessages
receiver messageSend ; messageSend
(Treating the binary selectors differently, by introducing several
levels of implicit precedence based on the operator name to provide
the traditional arithmetic order of evaluation, would also be a
possibility.)
Extending the Smalltalk-80 syntax is the ability to send a keyword
message with 'anonymous' arguments. (See the discussion above on
variadic message patterns.) The simplest possible change that would
allow this is to drop the name part of the keyword (but keep the
colon):
receiver keywords: arguments : anonymousArgument
with as many ': argument' pairs as required. (Anonymous
arguments can only appear after arguments associated with a proper
keyword; no more 'keyword: argument' pairs are allowed after
the first ': anonymousArgument' occurring in a keyword message
send.)
Parentheses
If you don't like the precedence defined by unary, binary, and keyword
sends, put parentheses around expressions to force evaluation order.
Literals
Literals are immutable. In other words: literals created by the
compiler cannot be modified by the program. This was done for two
reasons:
- It's cleaner, making the semantics simpler to explain (no more
confusing behaviour when a program inadvertently modifies a literal
causing some method someplace to have behaviour different to that
implied by its source code).
- My C compiler puts literals in a read-only data section, at one
point causing me a certain amount of stress while debugging what was
ultimately correct code but containing an attempt to write into a
read-only location. If all compiler-generated literals are immutable
then this particular platform idiosyncrasy ceases to be of any concern
whatsoever.
A handful of new classes (ImmutableArray, ImmutableByteArray,
ImmutableWordArray) are present in the library to accommodate the
above.
In addition to literal Arrays
#( elements )
we also have literal WordArrays
#{ integers }
and ByteArrays
#[ integers ]
(where each integer must be between 0 and 255). In Array
literals, nested Array, ByteArray and WordArray literals can appear
without the initial '#' (although one can be supplied if you like).
Integer literals themselves are in decimal by default, with the
usual
radixInteger r valueInteger
syntax supported. For the hackers out there, I saw no reason to avoid supporting
0xvalueInteger
for hexadecimal integers too. Digits greater than '9' in hexadecimal
literals (in either of the above syntaxes) or in literals of any base
greater than ten (in the 'r' syntax) can be specified using upper- or
lower-case letters.
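The integer literal rules above can be sketched in C as follows. This is only an illustration of the two radix syntaxes, not the Scanner's actual code: the names 'parse_integer_literal' and 'digit_value' are hypothetical, the upper limit of 36 on the radix is an assumption, and signs and overflow are deliberately ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: numeric value of one digit character, or -1.
 * Letters are accepted in either case, as described in the text. */
static int digit_value(int c)
{
    if (isdigit(c)) return c - '0';
    if (isalpha(c)) return tolower(c) - 'a' + 10;
    return -1;
}

/* Parse an unsigned literal such as "42", "16r1F", "2r101" or "0x1f".
 * Stores the result in *out and returns 1; returns 0 if the text is
 * malformed.  Assumption: the radix must lie between 2 and 36. */
int parse_integer_literal(const char *text, long *out)
{
    long base = 10, value = 0;
    const char *digits, *r, *p;

    assert(text != NULL && out != NULL);
    digits = text;
    if (text[0] == '0' && text[1] == 'x') {
        base = 16;                      /* 0xvalueInteger */
        digits = text + 2;
    } else if ((r = strchr(text, 'r')) != NULL) {
        base = 0;                       /* radixInteger r valueInteger */
        for (p = text; p < r; ++p) {
            if (!isdigit((unsigned char)*p)) return 0;
            base = base * 10 + (*p - '0');
            if (base > 36) return 0;
        }
        if (base < 2) return 0;
        digits = r + 1;
    }
    if (*digits == '\0') return 0;      /* at least one digit is required */
    for (p = digits; *p != '\0'; ++p) {
        int d = digit_value((unsigned char)*p);
        if (d < 0 || d >= base) return 0;
        value = value * base + d;
    }
    *out = value;
    return 1;
}
```

With this sketch '16r1F' and '0x1f' both yield 31, while '2r102' is rejected because '2' is not a binary digit.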
Smalltalk-80 Character literals are supported:
$character
as are non-printing Characters either by mnemonic or by explicit value
(following the ANSI 'escape sequence' conventions):
| syntax | asciiValue | ASCII designation |
| $\a | 7 | bel (alert) |
| $\b | 8 | bs (backspace) |
| $\t | 9 | ht (horizontal tab) |
| $\n | 10 | nl (newline) |
| $\v | 11 | vt (vertical tab) |
| $\f | 12 | np (new page, or form feed) |
| $\r | 13 | cr (carriage return) |
| $\e | 27 | esc (escape) |
| $\\ | 92 | \ (a single backslash character) |
(Extended mnemonic names such as '$\newline' for '$\n' could easily be
supported too.) In the event that a non-printing character literal
not in the above list is required, a generic octal escape is provided:
$\octalNumber
where octalNumber is precisely three (no more, no less) octal
digits in the range '000' to '377' specifying the value of the
Character. In other words, '$\n' and '$\012' are the same Character,
and '$\000' is the 'nul' Character (ascii value zero).
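For the curious, the escape rules in the table and the octal form can be expressed as a small C function. This is a sketch under stated assumptions, not the Scanner's real implementation: 'decode_escape' is a hypothetical name, and it receives only the text that follows the backslash.

```c
#include <assert.h>
#include <stddef.h>

/* Decode the text that follows '$\' in a Character literal.
 * Answers the character value (0..255), or -1 if the escape is not
 * one of the mnemonics in the table or exactly three octal digits
 * in the range 000..377. */
int decode_escape(const char *s)
{
    static const char mnemonics[] = "abtnvfre\\";
    static const int  values[]    = { 7, 8, 9, 10, 11, 12, 13, 27, 92 };
    size_t i;

    assert(s != NULL);
    /* Generic octal escape: precisely three digits, first one at most 3. */
    if (s[0] >= '0' && s[0] <= '3' &&
        s[1] >= '0' && s[1] <= '7' &&
        s[2] >= '0' && s[2] <= '7' && s[3] == '\0')
        return (s[0] - '0') * 64 + (s[1] - '0') * 8 + (s[2] - '0');
    /* Single-letter mnemonic. */
    if (s[0] != '\0' && s[1] == '\0')
        for (i = 0; mnemonics[i] != '\0'; ++i)
            if (mnemonics[i] == s[0])
                return values[i];
    return -1;
}
```

Here 'n' and '012' decode to the same value (10), and a two-digit sequence such as '12' is rejected because the octal form needs all three digits.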
String literals obey much the same rules as Smalltalk-80.
Adjacent String literals:
'like''this'
are concatenated with an intervening single quote:
like'this
However, the conventions that apply to '\' in escaping single
Character literals also apply to characters within a String. You
could write a String literal that contains two lines, each terminated
by a newline with the whole String terminated by a nul Character:
'like\nthis\n\000'
(I was very, very tempted to make consecutive String literals simply
concatenate without the implicit intervening single quote, as in other
languages that support juxtaposed String literals. I may yet change
this so that single quotes inside Strings must be escaped
'like\'this'
to bring them into line with other languages. [Escaping the embedded
single quote does already work just fine, but it isn't currently the
unique means to introduce a single quote into a String -- which is a
bug.] If you think that's bad, just consider that it took all my self
control to avoid making Character literals look like 'a' 'b' and 'c',
and Strings look like "abc" -- with an obviously necessary change to
comments too.)
Note: The 'character escape' rules above apply to
Symbols too. If you want to write the literal symbol for the
'remainder on division' binary message, you have to say
'#\\\\' (since the first and third backslash characters
escape the second and fourth). I think this is a bug (character
escapes should only be recognised if the Symbol is created from a
String [so '#'\\\\' == #\\' would hold]) and intend to fix it
sometime. In the meantime: beware!
Anything else...?
If you find something (either some feature in the sources that I
wrote, or something you think should work but doesn't, that does not
seem to be explained here) then please let me know so I can fix this
document.
Semantics
The semantics are similar to Smalltalk-80, with three main differences:
- There is no built-in distinction between classes and instances.
There are only objects. If you want to make some objects behave in
a class-like manner, and others in an instance-like manner, that's
your choice. Even the 'Smalltalk kernel' library, used by the
Compiler and provided for reuse in your own programs if you want,
gets by just fine without classes. (Although some ideas on how to
separate class/instance concerns in a more traditional fashion are
given below.)
- Blocks are real closures.
- Everything with a name (with one exception) is first-class (i.e.,
a variable that you can modify).
Blocks
The restrictions placed on Blocks by Smalltalk-80 have been
eliminated, and the (end-user) notion of BlockContext has been
replaced by BlockClosure (in several variations according to
optimisability). When you write a block '[...]' in a program, what
you create is a BlockClosure (and not a partially-crippled,
half-initialised activation context, as would be the case in
Smalltalk-80).
Block contexts (activated BlockClosures) have strictly
local arguments and temporaries. The value of an argument or
temporary can never come into contact with, nor be affected in any way
by, an enclosing lexical context. They are quite literally
inaccessible. You cannot, for example, implicitly assign to a method
temporary by naming it as a block argument.
BlockClosures can 'close-over' local state defined in a
lexically-enclosing scope. In such cases, the closed-over state will
be preserved on exit from the enclosing scope, leaving it accessible
to future activations of blocks defined within that scope. Each time
the defining scope is entered, fresh copies of closed-over state are
created. (In other words, block closures 'see' the state associated
with the activation in which they were created, rather than
that associated with the closure in which they were created.
Things like 'fixTemps' are completely unnecessary.)
All BlockClosures are first-class (they can be stored or passed
upward for activation at a later time) although
block activations are strictly LIFO, with no
exceptions. (Your hardware really, really wants things to be this
way.)
(For the terminally-curious: closed-over state, corresponding to
any variables that appear 'free' within a lexically-nested scope, are
stored in a heap-allocated 'state vector' independent of the defining
method or block activation context. These state vectors persist for
as long as there are reachable block closures that reference them --
either explicitly, as their defining context, or implicitly, by
holding a reference to a free variable stored within the vector.)
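The state-vector idea can be imitated in plain C. The following is a minimal sketch, assuming a single closed-over counter; the names 'state_vector', 'closure', 'make_counter' and 'call_closure' are invented for illustration and do not appear in the compiler's output. Plain malloc stands in for the collector, so nothing here is reclaimed automatically.

```c
#include <assert.h>
#include <stdlib.h>

/* Heap-allocated home for variables that appear free in a nested block. */
struct state_vector { int count; };

/* A closure pairs code with the state vector of the activation in
 * which it was created. */
struct closure {
    int (*code)(struct state_vector *);
    struct state_vector *state;
};

/* Body of the block: roughly [ count := count + 1 ]. */
static int increment(struct state_vector *state)
{
    return ++state->count;
}

/* Each call models a fresh entry into the defining scope, so each
 * closure created here gets its own copy of the closed-over state. */
struct closure make_counter(void)
{
    struct closure c;
    c.code  = increment;
    c.state = malloc(sizeof *c.state);
    assert(c.state != NULL);
    c.state->count = 0;
    return c;
}

/* Activate the closure; the state outlives make_counter's own frame. */
int call_closure(struct closure c)
{
    return c.code(c.state);
}
```

Two closures obtained from two calls of 'make_counter' count independently, which is the behaviour described above for re-entering the defining scope.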
Non-local returns
An explicit return statement inside a block behaves just like in
Smalltalk-80: the method activation in which the block closure
was originally created will return the indicated value.
There is currently one limitation: blocks containing non-local
returns make no attempt to detect whether their defining method
context has already returned. Attempting to return from a block whose
method activation has already exited, rather than resulting in a
friendly runtime error along the lines 'this block cannot return',
will most likely provoke a segmentation fault and core dump. (This is
really easy to fix; I'm just too lazy to deal with it right now.)
Prototypes and objects
Well, it's all just objects really.
Objects are created by being cloned, which creates
an uninitialised shallow copy of the original object. By
convention the 'reusable' object that you clone, to make a new object
to be modified and otherwise abused, is the 'prototype' for its 'clone
family'. All members of a clone family share the same behaviour
(response to messages), including the 'prototype' at the head of the
clone family. If you modify the behaviour of the prototype (or any
other member of its clone family) then the behaviour of all
members of the clone family (including that of the prototype) is
modified, identically. This is something of a compromise between
Lieberman-style prototypes (much simpler and more general
organisational conventions, but very difficult to implement
efficiently) and class-instance systems (easier to implement
efficiently, but imposing much more complex organisational conventions
on their surrounding systems).
In other words, a prototype (in the sense of the present
discussion) is nothing more than an object that has been:
- cloned from some 'parent' prototype object;
- had named slots added to it (according to its prototype
specification);
- been associated with a fresh (empty) method dictionary, in which
unbound messages are delegated to the 'parent' prototype object; and
- bound to a name by which it is known to further prototype
specifications and method definitions.
In yet other words, writing:
Foo : Point ()
is equivalent to:
" add 'Foo' to the set of visible named prototypes, then... "
Foo := Smalltalk allocate: Point byteSize + N "size of Foo slots in bytes".
Foo methodDictionary: (MethodDictionary new parent: Point methodDictionary).
This results in a useful idiom for creating shared structures:
BadVisibilityZone : Dictionary ()
[
(BadVisibilityZone := BadVisibilityZone new)
at: 'Archer' put: #below;
at: 'Warrior' put: #below;
at: 'Sparrowhawk' put: #above;
at: 'Cardinal' put: #above.
]
(although I'm not suggesting that this is either the best idiom
or, by a long way, a secure and desirable one.)
Note that the explicit reinitialisation (by sending 'new') of the
prototype is required since the implicit cloning in the prototype
specification creates an uninitialised object (in all respects other
than having a valid method dictionary installed in it).
Random thoughts on class-like behaviour
The easiest thing is just to mix 'meta' and 'application' behaviour:
Point : Object ( x y )
Point new
[
self := super new.
x := 0.
y := 0.
]
Point magnitude
[
^((x * x) + (y * y)) sqrt
]
The only 'bizarre' (or not, according to your perspective) thing about
this is that any 'instance' of 'Point' will be able to create
new 'Points' in response to 'new'.
Another possibility would be to create parallel hierarchies, with
class behaviour defined in one and instance behaviour in the other.
Point : Object () "the 'class' side"
aPoint : anObject ( x y ) "the 'instance' side"
Point new
[
self := aPoint clone.
x := 0.
y := 0.
]
aPoint magnitude
[
^((x * x) + (y * y)) sqrt
]
Everything is first-class
In case you hadn't already noticed, 'self' is a variable. (As are
'nil', 'true', and 'false'.) If you assign to 'self' inside a method,
the receiver instantly changes identity and retains the new identity
through to the end of the method (or the next assignment to 'self'),
including any implicit return of 'self' at the end of the method. The
following have exactly the same behaviour:
Point new
[
self := self clone.
x := y := 0
]
Point new
[
^super new setX: 0 setY: 0
]
(assuming the existence of 'setX:setY:'), although the former is: (a)
cleaner, (b) more in keeping with 'prototype and clone' style (as
opposed to 'class and instance' style), and (c) faster. The
disadvantage is that 'super new' might not return a Point, after which
assigning to 'x' and 'y' directly might not be a good idea. (Yet
another reason to abolish direct manipulation of 'inherited' state
within methods...)
The only 'special name' to which you cannot assign is 'super'.
(Actually, I never tried to assign to super. I don't think the Parser
will let you, but you might just be able to assign to 'self' by
calling it 'super'. Of course, the correct response to assigning to
'super' should be to dynamically re-parent 'self', but that's fraught
with semantic complications -- not to mention problems with
maintaining consistency in methods that access state directly. Again,
a great reason to get rid of it.)
Pragmatics
The ABI (executable code conventions) is entirely C-compatible. The
intention is to integrate seamlessly with other
languages/applications, platform libraries and data types, without
having (in the vast majority of cases) to leave the object-message
paradigm.
In the meantime, primitive behaviour has to be hand-coded (by a
wizard) and inserted explicitly into the compiled code at the
appropriate point. Code appearing between braces '{...}' is copied
verbatim to the output. Such external blocks are legal
- at the top-level (executed in-order during initialisation, along
with other definitions, directives, etc.);
- as the body of a method (the external block is the entire method body);
- wherever a statement would be legal (in a top-level 'Smalltalk'
code block, in the body of a method, or in the body of a block
closure)
provided that the code cannot be confused with a directive
('{ import ...') or WordArray literal ('#{...}').
Here's a trivial example, showing how to send a 'Character' to the
'console', answering 'true' or 'false' depending on whether the
operation succeeded:
Character : Object
(
value "character's value as a Smalltalk integer"
)
Character putchar
{
struct t_Character *this= (struct t_Character *)self;
int value= _integerValue(this->value);
return putchar(value) >= 0 ? v_true : v_false;
}
A few things to note:
- 'self' is known as 'self', and has type 'oop' (generic 'pointer
to object');
- arguments, temporaries and globals are prefixed with 'v_' (so a
local called 'foo' can be accessed as 'v_foo');
- a named prototype leaves behind a struct declaration of the same
name prefixed with 't_' (so a reference to a 'Foo' can be cast to
'struct t_Foo *' for access to the structure members);
- (not shown) free variables are not readily accessible (you have
to know the offset within the state vector to which they were assigned
by the compiler);
- (not shown) non-local returns are not readily accessible (for a
similar reason, but involving knowing where to look for a 'pointer'
to the 'home' context).
Alternatively, the above example could be written to raise a
'primitive failed' error on failure (more in keeping with traditional
Smalltalk-80 primitive methods):
Character putchar
[
{
struct t_Character *this= (struct t_Character *)self;
int value= _integerValue(this->value);
if (putchar(value) >= 0) return self;
}.
" fall through to failure code... "
^self primitiveFailed
]
The function '_integerValue(anObject)' used in the above examples is
one of several 'helper functions' defined for external code to use.
The full set is as follows:
sel_t _selector(char *name)
invokes the message send '_selector
_intern: name'.
oop _proto(oop parent)
invokes the message send 'parent
_delegated'.
void _method(oop prototype, sel_t selector, imp_t method)
invokes the message send 'prototype _methodAt: selector
put: method'.
(The effects of the message sends performed by the above functions are
explained in detail in the section describing the runtime system.)
imp_t _bind(oop object, sel_t selector)
performs a memoized lookup of 'selector' in 'object',
yielding a method implementation for the corresponding message
response.
imp_t _rebind(oop object, sel_t selector)
similar to _bind, except that the result is not placed
in a point-of-send inline cache (if such are enabled). (This is
critical for the correct implementation of 'Object perform:' and
similar.)
void *_newPointers(int size)
answers the address of a (collectible) block of
uninitialised memory at least size bytes in length. (Pointers to
objects within this memory will be considered by the garbage
collector during marking.)
void *_newBytes(int size)
answers the address of an atomic (uncollectible) block
of uninitialised memory at least size bytes in length. (The
contents of this memory are ignored by the garbage
collector.)
oop _integerObject(int value)
answers an object corresponding to the given integer
value.
int _integerValue(oop object)
answers the integer value corresponding to the given
object. (No check that the object is in fact an integer is
performed.)
int _isIntegerObject(oop object)
answers nonzero if the given object is an
integer.
int _areIntegerObjects(oop a, oop b)
answers nonzero if both a and b are integer
objects.
Two global variables are predefined to make writing command-line
applications a little easier:
int _argc
contains a copy of the original value of argc
passed to the program at startup.
char **_argv
contains a copy of the original value of argv
passed to the program at startup.
For some examples of the above in use, search for '{' within the
Smalltalk library source code.
Finally, here is the 'variadic method' example promised earlier in
this document:
Foo sum: firstArgument ...
[
" Add all arguments until one of them is nil, then stop. Answer the sum. "
| sum next |
sum := firstArgument.
{ va_list ap; va_start(ap, v_firstArgument) }. " start scanning additional arguments "
[{ v_next= va_arg(ap, oop) }. " read next argument "
next notNil]
whileTrue:
[sum := sum + next].
{ va_end(ap) }. " stop scanning arguments "
^sum
]
[
| total |
total := Foo sum: 1 : 2 : 3 : nil. " leaves 6 in total "
]
If that didn't make much sense, type 'man stdarg' on any Unix-based
machine.
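For readers without a manual page to hand, here is the same pattern in plain C, independent of the compiler. It is a sketch only: 'sum_until_null' is a hypothetical function, and it takes pointers to ints so that a null pointer can play the role that nil plays in the example above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Add the ints pointed to by the arguments, stopping at the first
 * null pointer.  The caller must terminate the list with a null
 * pointer of type 'const int *'. */
int sum_until_null(const int *first, ...)
{
    va_list ap;
    const int *next;
    int sum;

    assert(first != NULL);
    sum = *first;
    va_start(ap, first);                     /* start scanning additional arguments */
    while ((next = va_arg(ap, const int *)) != NULL)
        sum += *next;                        /* read and accumulate next argument */
    va_end(ap);                              /* stop scanning arguments */
    return sum;
}
```

Called as 'sum_until_null(&one, &two, &three, (const int *)NULL)' with the obvious values, this answers 6. The cast on the terminator matters, since a bare NULL is not guaranteed to have pointer type in a variadic call.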
The runtime system: introspection and intercession
The only intrinsic runtime operation (in the sense that it is
inaccessible to user-level programs) is memoized 'secondary' dynamic
binding, taking place entirely within the method cache. Every other
runtime operation (prototype creation, cloning objects, method
dictionary creation, message lookup ['primary' dynamic binding,
outside the method cache], etc.) is achieved by sending messages to
objects, is expressed entirely in idst, and is therefore
accessible, exposed and available for arbitrary modification
by any user-level program.
Runtime structures
Four types of object are used within the runtime system, and are the
basis for all computation. The hierarchy looks like this:
_object ()
_selector ( size _name next )
_binding ( selector _method )
_vtbl ( size capacity _bindings delegate )
(Slots starting with an underscore '_' are primitive types useful
for their state only -- you cannot send messages to these objects. All
other slots contain pointers to real objects that respond to
messages.)
_object is a singleton prototype that defines behaviour
common to all objects. This behaviour includes message lookup
(dynamic binding), which is achieved by sending (real) messages to the
objects involved. In other words, every single object created must
eventually delegate to _object, otherwise it would be impossible to
interact with (send messages to) that object. In yet other words,
_object is necessarily the parent of every other object
in the system. If you write
RootPrototype ()
then the runtime system will tacitly convert this into
RootPrototype : _object ()
to ensure that you (and, more importantly, the system itself) can send
messages to RootPrototype (and its clones).
_selector is an interned (unique) string, much like a
Smalltalk Symbol. The selector itself is stored as a 'size' (in
bytes) and a '_name' (a primitive array of bytes; i.e., of type 'char
*'). The 'next' field links all the selectors into a list for
purposes of interning.
_binding associates a _selector (in the 'selector' slot)
with the address of native code implementing a method (in the
'_method' slot).
_vtbl is a 'virtual table', similar to a MethodDictionary in
Smalltalk-80. Virtual tables map selectors to method implementations
for a particular clone family (one _vtbl is shared between all clones
in a given family). The '_bindings' slot points to a primitive vector
of pointers to _binding objects describing the virtual table's
mapping. The 'size' slot contains the number of entries in _bindings,
and 'capacity' is the maximum number of entries that _bindings can
contain (without being grown). Finally, 'delegate' points to another
_vtbl to which all unrecognised messages are delegated.
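A rough C model of these structures may help to fix ideas. It is a sketch under several assumptions rather than the contents of the runtime: the struct and function names are invented, the selector's 'size' field is omitted, the bindings are held in a plain array (the text describes a vector of pointers to _binding objects), and tables never grow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef void (*imp_t)(void);      /* stand-in for the address of native method code */

struct selector { const char *name; struct selector *next; };
struct binding  { struct selector *selector; imp_t method; };
struct vtbl {
    size_t          size;         /* number of bindings in use */
    size_t          capacity;     /* bindings available without growing */
    struct binding *bindings;     /* simplified: array of bindings */
    struct vtbl    *delegate;     /* where unrecognised selectors go next */
};

static struct selector *all_selectors = NULL;   /* list used for interning */

/* Answer the unique selector for name, creating it on first use.
 * Assumption: the name string outlives the selector. */
struct selector *intern(const char *name)
{
    struct selector *s;
    assert(name != NULL);
    for (s = all_selectors; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    s = malloc(sizeof *s);
    assert(s != NULL);
    s->name = name;
    s->next = all_selectors;
    all_selectors = s;
    return s;
}

/* Add or replace a binding; answers 0 if the table is full. */
int vtbl_add(struct vtbl *v, struct selector *sel, imp_t method)
{
    size_t i;
    for (i = 0; i < v->size; ++i)
        if (v->bindings[i].selector == sel) {
            v->bindings[i].method = method;
            return 1;
        }
    if (v->size == v->capacity)
        return 0;
    v->bindings[v->size].selector = sel;
    v->bindings[v->size].method   = method;
    v->size++;
    return 1;
}

/* Search this table, then each delegate in turn; NULL if unbound. */
imp_t vtbl_lookup(const struct vtbl *v, const struct selector *sel)
{
    for (; v != NULL; v = v->delegate) {
        size_t i;
        for (i = 0; i < v->size; ++i)
            if (v->bindings[i].selector == sel)
                return v->bindings[i].method;
    }
    return NULL;
}

/* Two do-nothing methods, present only so the sketch can be exercised. */
void method_a(void) {}
void method_b(void) {}
```

Because selectors are interned, lookup compares pointers rather than strings. A table whose delegate binds 'foo' answers the delegate's method until it installs its own binding for 'foo'.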
Essential protocol of runtime objects
Compiled code assumes the existence of responses to the following
messages:
- _object _delegated
creates a new prototype (with an empty protocol) whose clones
delegate unimplemented messages to the receiver's family. This is the
only mechanism for creating a prototype hierarchy, including during
object/prototype initialisation at program startup. The source form
Foo : Bar (...)
is equivalent to creating a new name Foo and then evaluating:
Foo := Bar _delegated
(The '_delegated' message has an initial underscore to avoid
over-polluting the protocol of user prototypes derived from _object.)
- _selector _intern: _cString
creates a new unique selector whose name is the given C string (a
primitive string, of type 'char *' ). This is the only mechanism for
creating new selectors. (The initial underscore in the selector is a
convention indicating that the argument '_cString' is a primitive,
non-object type.)
- _object _methodAt: aSelector put: _aMethod
extends (or modifies) the protocol of the receiver's clone family.
Subsequent lookups of aSelector in the receiver's family will be
resolved to _aMethod. This is the only mechanism for adding protocol
to a prototype. The source form
Foo bar: baz [ ... ]
is equivalent to evaluating:
Foo _methodAt: (_selector _intern: "bar:") put: barMethod
where "bar:" is a primitive string ('char *')
and barMethod is the address of the native code implementing
'Foo bar:'. (The initial underscore in the selector is to avoid
polluting _object's protocol.)
- _vtbl lookup: aSelector
answers (a raw pointer to the native code of) the method
implementing the response to aSelector within the receiver. This is
the only mechanism for performing message lookup (dynamic binding)
within the system. (Note that for performance reasons the results of
'lookup:' may be memoized by the runtime system. There is currently
no way to prevent this, meaning that a given _vtbl might only have one
chance to influence the meaning of a given message send. This is a
limitation [read: bug] and will be fixed soon.)
Ambitious applications can therefore (amongst other tricks) redefine
'_object _methodAt:put:' and/or '_vtbl lookup:' to implement unusual
dynamic binding behaviour.
The implementation of the above methods (along with several
potentially useful auxiliary methods in the runtime classes) can be
found in the file 'Smalltalk/runtime.st'.
Additional protocol of runtime objects
Several additional methods are defined in runtime objects for
convenience. Amongst these are:
- _vtbl flushCache
causes the runtime system to purge all 'memoized' lookups stored
in method caches. After invoking this method, the next send of a
given selector to an object in a given protocol family is guaranteed
to invoke '_vtbl lookup:' to resolve the message send. This method
should be invoked after every modification of (other than monotonic
additions to) protocol to ensure the changes take effect
immediately.
- _object _beTaggedType
_object _beNilType
causes the runtime to consider tagged objects (whose oops are
odd addresses) and nil (oop zero), respectively, to belong to the
receiver's clone family. (For example, these messages are sent from
within 'Smalltalk/kernel.st' to the prototypes 'SmallInteger' and
'UndefinedObject' to advise the runtime of the peculiar behaviour of
their corresponding object pointers.)
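The interaction between memoized lookups and 'flushCache' can be pictured with a small direct-mapped cache. This is only a sketch of the general technique: the real method cache's size, hashing and layout are not described here, so 'CACHE_SIZE', 'cached_bind', 'flush_cache' and the demonstration lookup function are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*imp_t)(void);
typedef imp_t (*lookup_fn)(const void *vtbl, const void *selector);

#define CACHE_SIZE 64   /* arbitrary power of two chosen for this sketch */

struct cache_entry { const void *vtbl; const void *selector; imp_t method; };
static struct cache_entry cache[CACHE_SIZE];

static size_t cache_index(const void *vtbl, const void *selector)
{
    uintptr_t h = ((uintptr_t)vtbl >> 3) ^ ((uintptr_t)selector >> 3);
    return (size_t)(h & (CACHE_SIZE - 1));
}

/* Answer the memoized method if present; otherwise perform the slow
 * lookup (the analogue of sending 'lookup:') and remember the result. */
imp_t cached_bind(const void *vtbl, const void *selector, lookup_fn slow_lookup)
{
    struct cache_entry *e;
    assert(vtbl != NULL && selector != NULL && slow_lookup != NULL);
    e = &cache[cache_index(vtbl, selector)];
    if (e->vtbl != vtbl || e->selector != selector) {
        e->method   = slow_lookup(vtbl, selector);
        e->vtbl     = vtbl;
        e->selector = selector;
    }
    return e->method;
}

/* Forget every memoized lookup, forcing the slow path next time. */
void flush_cache(void)
{
    size_t i;
    for (i = 0; i < CACHE_SIZE; ++i) {
        cache[i].vtbl     = NULL;
        cache[i].selector = NULL;
        cache[i].method   = NULL;
    }
}

/* Demonstration lookup: counts how often the slow path is taken. */
int slow_lookups = 0;
void demo_method(void) {}
imp_t demo_lookup(const void *vtbl, const void *selector)
{
    (void)vtbl;
    (void)selector;
    ++slow_lookups;
    return demo_method;
}
```

Binding the same selector twice takes the slow path once; after 'flush_cache' the next bind takes it again, which is why a flush is needed before a changed binding can be seen.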
Runtime examples
The file 'Smalltalk/runtime.st' contains a disabled (commented)
section of code at the end. Remove the comments to see the above
methods bringing the entire system up, in excruciating detail.
A (somewhat contrived) example showing how to control the
modification of object protocol and the behaviour of method lookup can
be found in 'example/intercede.st'. To build and run it, type
make PROGRAM=intercede
./intercede
from within the 'example' directory.
Object layout and object pointers
Objects have a single header word followed by zero or more bytes
corresponding to the named slots containing the state of the object.
The header word is a pointer to the object's virtual table.
Message sends to the object are resolved (when not present in the
method cache) by sending 'lookup:' to the header object. This is the
only explicit relationship between an object and the value stored in
its header word.
Object pointers correspond to the address in memory of the first
slot of an object, one word beyond the object's header (_vtbl
pointer). In other words, the object header (containing the _vtbl
pointer) is in the word before the one referenced by the
object's oop. This is done to allow 'toll-free bridging' of idst
objects to C/C++ structs/classes, Objective-C instances, or to native
objects in any other language that does not use the same convention of
putting a header in the word before an object's address. Allocating
the idst _vtbl pointer before (e.g.) a C/C++/ObjC object effectively
'wraps' the foreign object in an 'invisible' idst object, whose layout
is identical to (and whose state is stored at the same address as)
that expected by the native implementation of the foreign object.
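The layout can be demonstrated in a few lines of C. This is a sketch, assuming malloc in place of the collector and a placeholder 'struct vtbl'; 'allocate_object', 'vtbl_of' and 'free_object' are hypothetical names, and the treatment of nil and tagged oops follows the description of '_beNilType' and '_beTaggedType' given earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct vtbl { const char *family_name; };   /* placeholder for the real _vtbl */
typedef void *oop;                           /* generic pointer to object */

/* Allocate one header word plus the payload and answer the address of
 * the payload, so the _vtbl pointer sits in the word before the oop. */
oop allocate_object(struct vtbl *vtbl, size_t payload_bytes)
{
    struct vtbl **memory = malloc(sizeof *memory + payload_bytes);
    assert(memory != NULL);
    memory[0] = vtbl;
    return (oop)(memory + 1);
}

/* Find the virtual table for any oop: nil (zero) and tagged (odd)
 * oops have no header, so their families are supplied by the caller. */
struct vtbl *vtbl_of(oop object, struct vtbl *nil_vtbl, struct vtbl *tagged_vtbl)
{
    if (object == NULL)
        return nil_vtbl;
    if ((uintptr_t)object & 1)
        return tagged_vtbl;
    return ((struct vtbl **)object)[-1];
}

/* Release an object created by allocate_object. */
void free_object(oop object)
{
    free((struct vtbl **)object - 1);
}
```

An oop answered by 'allocate_object' can be cast directly to a pointer to an ordinary C struct, which never sees the header word in front of it. Payloads needing stricter alignment than a pointer would require extra padding that this sketch does not provide.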
Caveats and gotchas for Smalltalk programmers
Pepsi includes a pervasive experiment in zero-relative indexing.
- The first element in a SequenceableCollection is at index 0. The
last element is at index 'size - 1'. This simplifies a lot of code by
removing frequent adjustments of indices by 1.
- Messages that operate on a contiguous range of elements within a
SequenceableCollection (such as 'replaceFrom:to:with:startingAt:')
have an inclusive (zero-relative) lower bound, but an exclusive
(zero-relative) upper bound. In other words, to replace the first
five elements of a collection you would specify '0' and '5' as the
'from:' and 'to:' indices. This simplifies a lot of arithmetic that
calculates an index from a combination of sizes involving two or more
collections. (If you don't believe me, look at the definition of the
concatenation message ','.)
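To illustrate why the half-open convention simplifies index arithmetic, here is concatenation written in C on top of a 'replaceFrom:to:with:startingAt:' look-alike. It is a sketch, not the library's definition of ',': the function names are hypothetical and the collections are plain byte buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy source[start .. start + (to - from)) into dest[from .. to).
 * Both bounds are zero-relative; 'from' is inclusive, 'to' exclusive. */
void replace_from_to(char *dest, size_t from, size_t to,
                     const char *source, size_t start)
{
    assert(from <= to);
    memcpy(dest + from, source + start, to - from);
}

/* Concatenate a and b into result, which must have room for
 * a_size + b_size bytes.  No index needs adjusting by one. */
void concatenate(char *result,
                 const char *a, size_t a_size,
                 const char *b, size_t b_size)
{
    replace_from_to(result, 0,      a_size,          a, 0);
    replace_from_to(result, a_size, a_size + b_size, b, 0);
}
```

The upper bound of the first range is the lower bound of the second, and the final upper bound is simply the sum of the two sizes.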
Numbers are signed (positive or negative) and the scanner is not
context-sensitive (the syntactic type of each token is uniquely
determined by its spelling, irrespective of its position).
- '+1' is legal, being the number '1' with a sign that is ignored.
(In Smalltalk-80 the sign would not be legal.)
- '--1' is legal, with the negative signs cancelling out.
(Similarly illegal in Smalltalk-80.)
- '3+4' is illegal, being the integer literal '3' followed by the
literal '+4'. (In Smalltalk-80, the sign would be parsed as a
selector.)
- '6-5' is similarly illegal, being '6' followed by '-5'. (In
Smalltalk-80, the sign would be parsed as a selector.)
- The symbol '#foo:bar' is illegal, since it is not a well-formed
keyword. (Smalltalk-80 happily accepts identifiers in the last
position of a multi-part keyword symbol.)
- There are probably others, but I've yet to stumble across
them.
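The sign-handling cases in the list above fall out of a scanner that decides each token from its spelling alone. The following C sketch shows the idea for decimal integers only; 'scan_integer' is a hypothetical name, and accepting mixed runs of '+' and '-' is an assumption (the text only mentions '+1' and '--1').

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Scan one signed decimal integer token from the start of text.
 * Stores its value in *value and answers a pointer just past the
 * token, or NULL if the text does not start with a number. */
const char *scan_integer(const char *text, long *value)
{
    const char *p = text;
    int negative = 0;
    long magnitude = 0;

    assert(text != NULL && value != NULL);
    while (*p == '+' || *p == '-') {     /* signs belong to the literal */
        if (*p == '-')
            negative = !negative;        /* pairs of '-' cancel out */
        ++p;
    }
    if (!isdigit((unsigned char)*p))
        return NULL;
    while (isdigit((unsigned char)*p))
        magnitude = magnitude * 10 + (*p++ - '0');
    *value = negative ? -magnitude : magnitude;
    return p;
}
```

Scanning '3+4' with this function yields the literal 3 followed immediately by the literal +4, with no selector between them, which is why the expression is rejected unless spaces are placed around the operator.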