Source Files

The main source file is opened by the SB-Assembler when you start the program. This main source file may open as many include source files as you need, one at a time. The include files follow exactly the same rules as the main source file.
The maximum size of a source file is limited only by your operating system, and the main source file can open any number of include files. In Version 3 of the SB-Assembler include files may even be nested. This means that any include file can open any number of other include files, almost indefinitely. In Version 2 of the SB-Assembler include files cannot be nested.
Every source file is interpreted from top to bottom on a line by line basis. The parsing of a source file is put on hold when an include file is opened. This include file is then interpreted from top to bottom again. Once an include file is completed the SB-Assembler will continue with the last file which was put on hold, until all included files and the main source file are completed.
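
The hold-and-resume behaviour described above can be pictured as a stack of open files. The following Python sketch is only an illustration and not the SB-Assembler's own code. The directive name .IN, the file names and the in-memory FILES table are all assumptions made for this example.

```python
# Illustrative only: FILES stands in for files on disk, and ".IN" is an
# assumed name for the include directive.
FILES = {
    "main.asm": [" .CR 6502", " .IN init.asm", " JMP START"],
    "init.asm": [" LDA #$FF", " .IN stack.asm", " RTS"],
    "stack.asm": [" TAX"],
}


def read_order(main, files, include_directive=".IN"):
    """Return (file name, line) pairs in the order the lines are interpreted."""
    order = []
    stack = [(main, iter(files[main]))]       # files that are currently open
    while stack:
        name, lines = stack[-1]               # the file that was opened last
        line = next(lines, None)
        if line is None:                      # file completed: resume the
            stack.pop()                       # file that was put on hold
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0].upper() == include_directive:
            # Put the current file on hold and start the include file.
            stack.append((parts[1], iter(files[parts[1]])))
        else:
            order.append((name, line.strip()))
    return order


for name, line in read_order("main.asm", FILES):
    print(f"{name:10} {line}")
```

In Version 2 terms the stack would never grow deeper than two entries, because an include file cannot open another include file.
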
The SB-Assembler is a two-pass assembler, which means that it interprets all source files twice. Labels and macros are collected and defined during the first pass only. The second pass produces the actual code.
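
Why two passes are needed is easiest to see with forward references. The sketch below is a toy model in Python, not the SB-Assembler's implementation. The tuple layout (label, size in bytes, referenced label) and the instruction sizes are assumptions chosen for the illustration.

```python
def assemble(lines):
    """Toy two-pass run over (label, size, referenced_label) tuples."""
    symbols = {}

    # Pass 1: only collect the address of every label.
    pc = 0
    for label, size, _ in lines:
        if label:
            symbols[label] = pc
        pc += size

    # Pass 2: all labels are known now, so references to labels that are
    # defined further down in the source can be resolved as well.
    output = []
    pc = 0
    for _, size, ref in lines:
        output.append((pc, symbols[ref] if ref else None))
        pc += size
    return symbols, output


PROGRAM = [
    ("RESET", 2, None),     # RESET  LDA #$FF
    (None, 1, None),        #        TAX
    (None, 3, "INIT"),      #        JSR INIT   (INIT is defined later)
    (None, 3, "START"),     #        JMP START  (START is defined later)
    ("INIT", 1, None),
    ("START", 1, None),
]

symbols, output = assemble(PROGRAM)
print(symbols)
print(output)
```

A single pass would reach JSR INIT before the address of INIT is known, which is exactly the situation the first pass prevents.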

In Version 2 source files should contain only plain standard ASCII characters. Extended ASCII characters (the full IBM character set) are discouraged because they may lead to unexpected results. This restriction does not apply to comment fields, however. Formatted files produced by word processors such as Word or LibreOffice cannot be used by the SB-Assembler.

In Version 3 source files are expected to be in UTF-8. It is recommended that you use only plain ASCII characters in your code for platform compatibility reasons. Comments may contain any UTF-8 characters. However, some characters may be printed incorrectly on Windows computers, because Windows doesn't use UTF-8.
Strings in your code are expected to be in plain ASCII. Non-ASCII characters may cause encoding errors, which can result in unexpected characters in your strings.

A wide variety of text editors (not word processors!) is available that can be used to edit your source files. Starting at the lowest end in a Microsoft environment there is EDLIN (created by the makers of Windows), then upwards to COPY CON, followed by EDIT, NE, QE, E, ME, ED, PE and many, many others. Even Windows Notepad can be used to create your source files.
For Linux and Unix a wide variety of suitable editors exists, such as, in no particular order: vi, vim, joe, emacs, tex, jed, nano, gedit and bluefish.

The SB-Assembler is case insensitive, which means that it doesn't matter whether you use upper or lower case characters in your source files for labels, mnemonics, directives and operands.

Naming Your Source Files In Version 2

The names of source files and all other files used or created by the SB-Assembler must obey the DOS rules. Mainly this means that the file name may contain a maximum of 8 characters followed by a dot and a 3 character extension. File names may be preceded by a drive letter (e.g. C: or M: ) and a path name. The total length of drive letter, path name and file name may not exceed 127 characters, or less if the DOS you use is the limiting factor.

The default extension for source files is .asm . This means that you do not have to specifically supply this extension if you save your source files with the .asm file extension.
Thus you can start the assembler by typing sbasm source, or by sbasm source.asm. If your file name ends in anything else, for instance .src, you will have to supply the extension.
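
The default extension rule can be summarised in a few lines of Python. This is a sketch of the rule as described above, not the assembler's own code. The function name source_file_name is hypothetical, and edge cases such as a name ending in a bare dot are not covered by the text.

```python
import os.path


def source_file_name(argument, default_ext=".asm"):
    """Append the default extension when the argument has no extension."""
    _root, ext = os.path.splitext(argument)   # looks at the last path part only
    return argument if ext else argument + default_ext


for arg in ("source", "source.asm", "source.src"):
    print(arg, "->", source_file_name(arg))
```
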

Naming Your Source Files In Version 3

On a Windows system you can use the standard Windows file naming conventions. The limitations of the 8.3 naming convention no longer exist, so you may use longer names if you like. You may also include drive letters and path names to direct the assembler to the proper files. Path names and file names are separated by a backslash \ or a forward slash /.
However, I strongly discourage the use of file names with special characters. File and path names with spaces in them can be very confusing to the system and are therefore discouraged as well.
Please note that a Windows system makes no distinction between upper and lower-case characters in file and path names.

Different rules apply on Linux, Unix and Mac systems. For starters, in these operating systems file names ARE case sensitive. This means that source.asm and Source.asm are two different files.
Please note that this also applies to the file extensions. Default file extensions, like .asm are always in lower case.
Drive letters don't exist in these operating systems. Path names and file names are separated by a forward slash / .
On the command prompt you may use the backslash character \ to escape special characters, like spaces.

The default extension for source files is .asm . This means that you do not have to specifically supply this extension if you save your source files with the .asm file extension.
Thus you can start the assembler by typing sbasm source, or by sbasm source.asm. If your file name ends in anything else, for instance .src, you will have to supply the extension.

Source File Contents

A source file consists of one or more program lines. These lines are interpreted by the SB-Assembler one after the other. Every program line contains up to 4 fields. Some fields may be empty, others may only be empty if the rest of the line can be treated as a comment field.
Space or TAB characters separate the different fields from each other on a line.
In the description below the space character is equivalent to the TAB character. EOL means End Of Line, which is indicated by a CR/LF pair on a Microsoft system, or a single LF character on a Linux/Unix/Mac system.
Please note that the SB-Assembler does not require you to put all fields in neatly organized columns in your source file. However, I do recommend that you make neat columns in your source files, for better readability for us humans.

                .CR     6502            Use 6502 overlay
RESET           LDA     #$FF            Initialize stack
                TAX
                JSR     INIT            Initialize machine
                JMP     START           Get going

 .CR 6502 Use 6502 overlay
RESET LDA #$FF Initialize stack
 TAX
     JSR INIT     Initialize machine
  JMP       START    Get going

Both of these imaginary programs do exactly the same thing. The SB-Assembler doesn't care which one of these two programs it has to translate. It's up to you to decide which one is the easiest for a human to read.

Completely empty lines are only listed in the list file and are ignored by the assembler.

Below is a typical source line containing all fields. All fields are separated from each other by at least one space or TAB character.

LabelField    InstructionField    OperandField    CommentField

Label Field

The first field on a line is the label field. This field always starts at the first position on the program line. This means that no spaces or TABs may be in front of this field.

The label field contains a label if the first character of the line is a letter A to Z, a dot or a colon. As of Version 3.01 a label may also start with an underscore. The label field will end at the first space, TAB or EOL character on the program line.

The label field is empty if the first character on the program line is a space or a TAB character.

If the first character on the program line is a semicolon ( ; ) or an asterisk ( * ) the whole line is treated as a comment line. Comment lines are only listed and are ignored by the assembler. Clearly they are intended for human readability only.
In Version 3 of the SB-Assembler a # symbol is also treated as a comment identifier.
As of version 3.01.00 an @ symbol is also added as a comment identifier. This enables Notepad++ users to use @START and @END directives to fold back part of the code you're not currently working on.

Any other character found on the first position of a program line will result in a Bad symbol error message in Version 2, or an Illegal label name error message in Version 3.

; Here are some legal label names
LABEL
GLOBAL_LABEL
.LOCAL_LABEL
:MACRO_LABEL
; Comment line
* Also a comment line
# A comment line, only in Version 3

!  This line is illegal because of its first character !
0  A digit is also an illegal first character in the label field
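
The first-character rules of this section can be collected into one small decision function. The Python sketch below is a reading aid under stated assumptions and not the assembler's real parser. The function name is hypothetical, and version=3 is taken to mean Version 3.01 or later, so the underscore and the @ symbol are included.

```python
import string


def classify_first_char(line, version=3):
    """Classify a program line by the character in its first position."""
    if line == "":
        return "empty line"
    first = line[0]
    if first in " \t":
        return "no label"                     # label field is empty
    comment_chars = ";*"
    label_starts = ".:"
    if version >= 3:                          # assumption: 3.01 or later
        comment_chars += "#@"
        label_starts += "_"
    if first in comment_chars:
        return "comment line"
    if first in string.ascii_letters or first in label_starts:
        return "label"
    return "illegal"                          # Bad symbol / Illegal label name


for sample in ("LABEL", ".LOCAL_LABEL", ":MACRO_LABEL", "; Comment line",
               "# Version 3 comment", "!  illegal", "0  illegal", " TAX"):
    print(repr(sample), "->", classify_first_char(sample))
```
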

Instruction Field

The second field on a program line is the instruction field, also known as the mnemonic field. This field starts at the next non-space character following the label field. If the label field is empty the instruction field starts at the first non-space character of the program line.

The instruction field is assumed to be empty if no non-space character follows the label field. It is perfectly legal to have only a label field on a source line, in which case the label gets assigned the current program counter value.

The instruction field must contain a legal directive if it starts with a dot. Otherwise an Unknown directive error is reported.
The instruction field may contain a single = symbol which is interpreted as an .EQ directive.

An instruction field starting with the > symbol is a macro call. Please refer to the description of Macros for more details.

An instruction field starting with a semicolon ; or an asterisk * will cause the rest of the source line to be treated as comments. The rest of the source line will be ignored by the SB-Assembler. This effectively means that this line does not contain an operand field.
Version 3 of the SB-Assembler will also accept the # symbol as a comment identifier in the instruction field.
As of version 3.00.00.B08 an @ symbol is also added as a comment identifier. This enables Notepad++ users to use @START and @END directives to fold back part of the code you're not currently working on.

In all other cases the instruction field should hold a legal mnemonic. Mnemonics are exclusively interpreted by Cross-Overlays. This means that mnemonics can't be interpreted without a loaded Cross-Overlay (see the .CR directive). In Version 2 of the SB-Assembler mnemonics may not be more than 10 characters long. In Version 3 there is no maximum length.

The instruction field ends at the next space or TAB character or at the EOL.

COUNTER         .EQ    50

LABEL           NOP
                RET

.LOCAL          NEG
EXAMPLE         ; A comment example
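
As a rough illustration of how the label and instruction fields are located and how the instruction field is then interpreted, consider the Python sketch below. It is not the SB-Assembler's parser. The names split_fields and instruction_kind are hypothetical, comment lines are assumed to have been filtered out beforehand, and whether a mnemonic or directive is actually legal is left out entirely.

```python
import re

# label = everything up to the first space/TAB (may be empty),
# instr = the next run of non-blank characters, rest = what follows.
FIELDS = re.compile(r"(?P<label>[^ \t]*)[ \t]*(?P<instr>[^ \t]*)(?P<rest>.*)")


def split_fields(line):
    """Return (label field, instruction field, rest of the line)."""
    m = FIELDS.match(line.rstrip("\r\n"))
    return m.group("label"), m.group("instr"), m.group("rest")


def instruction_kind(instr, version=3):
    """Say how an instruction field would be interpreted."""
    if instr == "":
        return "empty"                 # label only: gets the program counter
    if instr == "=":
        return "directive"             # single = is read as .EQ
    comment_chars = ";*" + ("#@" if version >= 3 else "")
    if instr[0] in comment_chars:
        return "comment"
    if instr[0] == ".":
        return "directive"
    if instr[0] == ">":
        return "macro call"
    return "mnemonic"                  # handed to the loaded Cross-Overlay


for text in ("COUNTER         .EQ    50", "LABEL           NOP",
             "                RET", "EXAMPLE         ; A comment example"):
    label, instr, rest = split_fields(text)
    print(repr(label), repr(instr), instruction_kind(instr))
```
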

Operand Field

The third field is the operand field. The operand field starts at the next non-space character after the instruction field.

The operand field should contain the operands that are expected by the mnemonic or directive in the instruction field.

Some mnemonics or directives don't expect an operand, in which case the operand field is treated as a comment field.

Other mnemonics or directives have optional operands. In those cases the operand should follow the instruction field within 10 space characters, or at most 1 TAB character (within 9 spaces for Version 2). Otherwise the operand field is treated as a comment field, effectively omitting the optional operand. You can force a comment field sooner by using a semicolon as the first character in the operand field. The other comment prefixes, # and *, are not accepted here and may have different meanings.
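
The distance rule for optional operands could be sketched as follows in Python. This is one possible reading of the rule and not the assembler's actual code. In particular, "within 10 spaces" is assumed to mean at most 10, a gap that mixes spaces and TABs is assumed to be too far, and the function name optional_operand is hypothetical.

```python
def optional_operand(rest, version=3):
    """Return the optional operand found in 'rest', or None.

    'rest' is the text that follows the instruction field, starting with
    the blanks that separate it from whatever comes next.
    """
    tail = rest.lstrip(" \t")
    gap = rest[:len(rest) - len(tail)]
    if not tail or tail.startswith(";"):
        return None                           # nothing there, or forced comment
    max_spaces = 10 if version >= 3 else 9    # assumed to be inclusive limits
    only_spaces = set(gap) == {" "}
    near = gap == "\t" or (only_spaces and len(gap) <= max_spaces)
    if not near:
        return None                           # too far away: it is a comment
    return tail.split(None, 1)[0]             # operand ends at the next blank


print(optional_operand("  5   close enough to be an operand"))
print(optional_operand(" " * 12 + "far enough to be a comment"))
```
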

Still other mnemonics or directives require one or more operands, in which case the distance between the instruction field and the operand field is not important. In this case the operand field may not start with a semicolon, because an operand is expected!

Multiple operands in an operand field should be separated from each other by commas. The operand field ends at the next space or TAB character or at the EOL, except when the operand is a delimited string of course.
In Version 3 of the SB-Assembler a space directly following a comma is allowed for better readability. Be careful though: putting spaces after commas makes your programs incompatible with Version 2 of the SB-Assembler. Thus avoid spaces following commas if cross compatibility is important to you.

All other rules for the operands are dictated by the mnemonic or directive in the instruction field.

LABEL           LD      A,B
                LD      A, B    Only allowed in Version 3
                INC     A

.LOCAL          CALL    DISPLAY
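
The comma rule, including the Version 3 allowance for a space after a comma, can be illustrated with a short Python sketch. It is a simplification and not the assembler's own code. Delimited strings are ignored, exactly one space after a comma is assumed to be tolerated, and the function name split_operands is hypothetical.

```python
def split_operands(text, version=3):
    """Split an operand field; 'text' starts at the first operand character.

    Returns (list of operands, remaining comment text).
    """
    operands = []
    current = ""
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == ",":
            operands.append(current)
            current = ""
            i += 1
            # Version 3 only: skip a single space directly after the comma.
            if version >= 3 and i < len(text) and text[i] == " ":
                i += 1
            continue
        if ch in " \t":
            break                      # end of the operand field
        current += ch
        i += 1
    operands.append(current)
    return operands, text[i:].strip()


print(split_operands("A,B"))
print(split_operands("A, B    Only allowed in Version 3"))
print(split_operands("A, B    Only allowed in Version 3", version=2))
```

In the Version 2 reading of the last line the operand field already ends at the space after the comma, so the second operand comes out empty and B ends up in the comment text.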

Comment Field

The fourth and last field is the comment field. This field is only meant for us humans and will always be ignored by the SB-Assembler.

The comment field automatically starts when the SB-Assembler doesn't expect any more operands. This also means that it is not necessary to start a comment field with a semicolon, as with most other assemblers. It won't hurt if you do start the comment field with a semicolon though.

Everything is allowed in the comment field, for the SB-Assembler has already stopped interpreting your source line here.

LABEL           LD      A,B        Auto comment field
                INC     A          ; Forced comment field

.LOCAL          NOP                No operand required, auto comment
CAUTION       ; Requires forced comment, otherwise expects operand here
; Line contains only comment, forced comment is required now