Skip to content

bieber/col

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

50 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Copyright 2012, Robert Bieber

col is a simple programming language, inspired by John Backus' FP language, 
although not strictly adhering to it.  Currently the interpreter only supports 
file execution, a full REPL environment will come later.  

-------------------
BUILDING COL
-------------------

To build the col interpreter, extract the project files into a directory and 
cd to it.  Then run the following commands:

  mkdir build
  cd build
  cmake ..
  make

You can install col with the command:

  make install
  
If you modify any of the primitive functions or functional forms in 
primitives.h/c and forms.h/c, you will also need to run the script

  utils/gendoc.py
  
Be sure to execute this script from the project root directory.  It will update 
auto-generated function pointer tables in the src/gen directory as well as the 
PRIMITIVE_FUNCTIONS and FUNCTIONAL_FORMS documentation files in the root 
directory.

-------------------
RUNNING COL
-------------------

To execute a file, simply run

  colint [-v] <source file> [optional command line arguments]

and the interpreter will load the code in program.col and execute its main
function.  The main function is called with any command-line arguments passed
to it as a sequence of strings (see language reference for details).  If the -v 
flag is passed, the interpreter will print debugging information before and 
after running the program.

-------------------
LANGUAGE REFERENCE 
-------------------

1 - Constants and Identifiers
  Allowable constants in col are numbers, characters, strings, bottom,
  true or false, and sequences.  
  
  A constant number is any integer or floating point number, with an optional 
  +/- in front of it.  Numbers may also be specified in scientific notation.
  Examples of valid numerical constants:
    5
    5.2
    +5.2
    -5.2
    0.2
    .2
    5.2e12
    5.2E12
  
  A character is any character enclosed in single quote marks.  Alternatively, 
  some two-character escape sequences are allowed using the backslash.  Valid 
  escape sequences are listed below:
    \\ - Backslash
    \' - Single quote
    \n - Newline
    \t - Tab character
    
  A string is any series of characters enclosed in double quote marks.  Strings 
  also support two-character escape sequences using the backslash.  All of the 
  escape sequences used in characters are supported with the exception of the 
  single quote, which need not be escaped in a string.  Instead, the double 
  quote may be escaped in strings.  Valid escape sequences are listed below:
    \\ - Backslash
    \" - Double quote
    \n - Newline
    \t - Tab character
  
  Bottom represents an undefined value.  Sequences and functions in col are 
  bottom-preserving.  This means that any sequence which contains bottom as an
  element is equivalent to bottom, and any function that accepts bottom as an
  argument will return bottom as its result.  Bottom is simply written in 
  lowercase text as follows:
    bottom
    
  Logical true and false are simply written in lowercase text, as follows:
    true
    false
  
  A sequence is any series of valid constants surrounded by angle brackets and 
  separated by commas.  A sequence may be any length, and its elements need not
  be of the same type.  Examples of valid sequence constants:
    <1, 2, 3>
    <"Sequence", "of", "strings">
    <"Nested", <"Sequence", true>>
    <>
    
  An identifier is any combination of lower or uppercase letters, numbers, and 
  the symbols +, -, *, /, !, _, and -, with some caveats.  An identifier must 
  not be of a form that could be read as a number: any text which is a valid 
  numerical constant will be read as one, and never as an identifier.  
  Identifiers also cannot use the reserved words bottom, true, or false, and 
  using the name of a built-in function or functional form as an identifier will
  cause an error.  Examples of valid identifiers:
    my-function
    my_function
    1+
    !
    25!
  In the interest of readability, it is advisable to avoid using identifiers 
  that might be too easily mistaken for numbers, but any combination of the 
  allowable characters that cannot be read as a number will be read as a valid 
  identifier.

2 - Comments
  The # symbol marks the beginning of a comment.  The lexer will disregard any 
  text between the # and the following newline.

3 - Data Types
  Types are never written explicitly in a col program, but col is a strongly 
  typed language, and primitive functions will normally only behave correctly 
  when operating on a limited number of types.  To represent the listed types as
  constants, see section 1.

  Integers: An integer is a positive or negative whole number.  They are 
  represented internally using C's int type, which will define the limitations 
  on their size.
  
  Floating Point Numbers: Floating point numbers are represented internally 
  using C's double type, with corresponding limitations.  When reading numeric 
  constants, the interpreter will regard any number with a decimal point or 
  using scientific notation as floating point.

  Characters: ASCII characters, represented internally by C's char type.
  
  Strings: Any series of characters.  Note that strings are not equivalent to 
  sequences of characters.
  
  Boolean Values: Either true or false.  Note that false is not equivalent to 
  bottom.
  
  Bottom: Bottom represents an undefined value.  All functions and sequences in
  col are bottom-preserving, meaning that any function which receives bottom as 
  an argument will return it as a result, and any sequence containing bottom is
  equivalent to bottom.  Bottom is commonly returned as a result of failed 
  computation.
  
  Sequence: A sequence is an ordered collection of values.  The elements of a 
  sequence may be of any type, including sequences.  The elements may also be 
  heterogeneous.  The empty sequence must always be written "<>", there is no 
  special symbol for it, and it is distinct from logical false and bottom.

4 - Functions
  Every function in col accepts a single argument and produces a single return 
  value.  col functions, with the exception of I/O operations, cannot (at 
  present) have any side-effects.  Aside from I/O, every col function will 
  always maintain referential transparency.
  
  col functions are always defined in terms of other functions, combined using 
  functional forms (defined below).  The language provides a set of primitive 
  functions, and more complex functions may be constructed from them.
  
  col functions do not have type signatures.  A function may accept any value as
  an argument, and some primitive functions may accept many possible input types
  and return multiple corresponding output types.  If a function receives as 
  input a value that it cannot logically handle, it will return bottom.
  
  Functions are normally represented simply by their identifier, but some 
  primitive functions may also accept arguments at load-time (think of these as
  similar to templates in C++ or generics in Java) which alter their behavior.  
  These are called specializers, and they must be constant values.  As an 
  example, consider the function const, which always returns the same value 
  regardless of its input (unless its input is bottom, of course).  Because 
  including a separate const function for every possible constant value would be
  impractical, the const function accepts a specializer which specifies the 
  constant that it should return.  For instance, the primitive function written
  
    const(15)
    
  will always return the value 15, regardless of its input.  At this time, only
  primitive functions can accept specializers.
  
5 - Functional Forms
  Functional forms are the tools with which functions are combined to create 
  more complex functions.  A functional form accepts one or more functions as 
  arguments, and creates a function that differs from the original(s) in some 
  useful way.  Take, for instance, the compose functional form.  It accepts 
  two or more functions and creates a new function which accepts an argument, 
  feeds it to the last function in the argument list, feeds that function's 
  result to the next-to-last function in the list as its argument, and so on 
  until it reaches the first function in the list, whose argument becomes the 
  return value of the combined function.  Consider the following function 
  definition (more on function definitions later):
  
    inc-then-print = compose{ print, str, 1+ }
    
  The primitive function 1+ increments a numeric value by one, the str function 
  converts it to a string, and the print function writes a string out to the 
  screen.  The result of combining the three with the compose form is a function
  which first increments the value passed to it and then prints the incremented 
  value to the screen.
  
6 - Function Definitions
  A program in col is written as a series of function definitions.  The 
  interpreter reads the input file, loads the definition, and then runs the 
  function called main (which must be present).  It passes as arguments to main
  a sequence of strings containing the command-line arguments given to the 
  interpreter, or the empty sequence if none were present.
  
  An actual function definition is written as a valid identifier followed by the
  assignment operator, '=', followed by either the name of a function to 
  duplicate or, more likely, a functional form combining one or more existing 
  functions into a new, more useful function.  Functional forms may be nested, 
  so a function definition can be as complex as necessary to achieve the 
  desired functionality.  As an example, consider the following function 
  definition for double, a function which accepts a numerical value and doubles 
  it.
  
    double = compose{ *, construct{ id, const(2) }}
    
  We have already defined the compose functional form and the const primitive 
  function.  The id primitive function simply returns its input as its output.  
  The construct functional form accepts one or more functions and creates a 
  function which creates a sequence from the results of applying each of its 
  argument functions to its input.  The * primitive function accepts a sequence 
  of numerical values and multiplies them together.
  
  To understand the operation of this function, consider its application to the 
  number 3.  The compose functional form will first pass this value to its 
  last argument, which is 
    
    construct{ id, const(2)}
    
  The construct function will in turn pass that value to each of its argument 
  functions, and return a sequence from the results.  The id function always 
  returns its argument, so it will return 3.  The function const(2) always 
  returns the number 2, so it will return 2.  The construct form combines them 
  into the sequence 
  
    <3, 2>
    
  The compose form will then take this result and pass it to its next function 
  argument to the left, in this case *.  * accepts the sequence and multiplies 
  its elements together, returning 6.  Because * is the first function in 
  compose's argument list, it is returned as the result of the combined 
  function.  The end result is that the number 3 has been doubled to yield 6.
  
  The arguments to functional forms can be either primitive or user-defined 
  functions, so more complicated functions can be broken down into more simple 
  sub-functions, and existing functions can be reused in more complicated ones.
  Because all function definitions are loaded before executing the main 
  function, their order in the source file is inconsequential.  If there are any
  references to functions that don't exist, an error will be triggered at 
  runtime.
  
7 - Grammar
  What follows is the grammar of the col language in Backus-Naur Form (BNF).  
  This should match the textual description given in section 6.  The 
  <identifier> and <value> nonterminals are lexed according to the 
  descriptions given in section 1.  Comments are not included in the grammar, 
  but discarded by the lexer before parsing.
  
    program ::= <function-definition> 
              | <function-definition> <program>
    
    <function-definition> ::= <identifier> "=" <function-expression>
    
    <function-expression> ::= <identifier>
                            | <identifier> "(" <constant-arguments> ")"
                            | <identifier> "{" <function-arguments> "}"
                          
    <constant-arguments> ::= <constant>
                           | <constant> "," <constant-arguments>
                         
    <function-arguments> ::= <function-expression>
                           | <function-expression> "," <function-arguments>

    <constant> ::= <value>
                 | "<" <constant-arguments> ">"

8 - Further Reference
  All of the available primitive functions and functional forms are documented 
  in the files FUNCTIONAL_FORMS and PRIMITIVE_FUNCTIONS in the project root 
  directory.  They are listed in alphabetical order with a definition and 
  description of each provided.


About

A programming language loosely based on John Backus' FP

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published