News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Manipulation of strings

Started by Herakles, May 17, 2010, 04:40:50 PM

Previous topic - Next topic

Herakles

QuoteWrite a SPIM-programme, which is able to read in a string, scrolling character by character through it and replacing every small letter with capital characters, without touching the other characters. The programme must be able to print the modified string on the shell. Also describe the algorithm of your conversion function with pseudo code in the form of a commentary right before your actual programme starts. Don't forget to write comments wherever it's necessary.
Hints:

- In case you are using a UNIX system, you can get an overview of ASCII codes with man ascii and on the internet you can find it on http://www.asciitable.com.
- The actual conversion from small to capital letters can be implemented through the subtraction of a constant.


Hi, folks! :wink

First of all I want to say again a big THANKS for your unequalled help recently.

As I just started to work on the new exercise, I'll be able to post my code later on.

BTW: Unfortunately I am not a native English speaker nor am I living in a country where English is an official language, so please feel free to ask me in case parts of the task above sound strange or absurd.

Regards,
Herakles

redskull

while not end-of-string
     if (ascii_value >= 97) and (ascii_value <=122)
          ascii_value = ascii_value - 0x30 0x20

edit - nice catch, clive
Strange women, lying in ponds, distributing swords, is no basis for a system of government

clive

Quote from: redskull on May 17, 2010, 04:45:25 PM
while not end-of-string
     if (ascii_value >= 97) and (ascii_value <=122)
          ascii_value = ascii_value - 0x30

That should be 0x20 (32), 'A' = 0x41, 'a' = 0x61

SPIM is a MIPS simulator, for those unfamiliar with the question
http://pages.cs.wisc.edu/~larus/spim.html

It would be appropriate to post your initial stab at the code first, before we all pile on..


It could be a random act of randomness. Those happen a lot as well.

Herakles

Sorry, I should have mentioned that we are talking about the MIPS simulator SPIM.

In the meanwhile I've read the documents they gave us for help, but they are completely useless.

I didn't really get if they want us to write a programme that is able to read in a string which was typed in the shell from a user or if it is enough to define a string once for all in the code?


I am really desperate since I have no clue how to start this exercise, not to mention how to implement it.
You must imagine that I am a totally layman. I hear the task "only small letters to capital letters conversion" and my first reaction is OMG.

The code which redskull posted must be the pseudo code the answer asked for, right?

redskull

Check out the ASCII chart (http://www.asciitable.com/), which is what maps the numerical representation of characters that the CPU uses to the little pictures of letters that the shell will spit out.  If you peruse it further, you'll see that all the lowercase letters fall in one range, and all the uppercase in another.  The project consists of scanning a string (a sequential range of bytes, terminated by a '00') for numbers in the one range (lowercase), and then converting them to the equivelent number in the other (uppercase).  As the hint on the problem statment goes, they are seperated by hex 20.  If my MIPS class memory serves me, the SPIM simulator emulates some psudo "syscall" instructions to do input and output to the simulator, but that was many moons ago; google SPIM and SYSCALL

-r
Strange women, lying in ponds, distributing swords, is no basis for a system of government

clive

print_string is syscall # 4, read_string is syscall #8

By scrolling, I suspect it means just traverse the input string, and perhaps printing out each char as you process it.

Pseudo code in C of conversion task

char chartoupper(char a)
{
  if ((a >= 'a') && (a <= 'z'))
    a = a - 32;

  return(a);
}

void stringtoupper(char *s)
{
  while(*s)
    *s++ = chartoupper(*s);
}


Convert char to upper case, branching

        andi    $t0,$t0,0xFF    # Mask it to a character
        addiu   $t1,$t0,-0x61   # 'a'
        sltu    $t1,$t1,26      # a-z 26 character span
        beq     $t1,$0, upper   # $t1 == 0?
        nop                     # stuff branch delay slot

        addiu   $t0,-0x20       # -('a' - 'A')

upper:


Converting char to upper case, non branching (MIPS IV)

        andi    $t0,$t0,0xFF    # Mask it to a character
        addiu   $t1,$t0,-0x61   # 'a'
        sltu    $t1,$t1,26      # a-z 26 character span
        addiu   $t2,$t0,-0x20   # -('a' - 'A')
        movn    $t0,$t2,$t1     # if $t1 NZ then $t0 = $t2
It could be a random act of randomness. Those happen a lot as well.

Herakles

#6
Thanks redskull. :U

In case I really understood it, I have to say it is easy.
I have to take the Hx value of the small character and then I have to subtract Hx 20 to get the Hx value for the capital letter...?

The most difficult thing now is to implement the idea...


Thanks clive. :U
Yes, "to traverse" puts it in a nutshell.

I am not totally sure if they will accept the pseudo code in C... I think they want a pseudo code in MODULA. But it's not that important (I hope :lol ).

clive, the two last codes you posted, are they two alternatives and which one would you say is more easy to implement for laymans...?

In the German language there are some "strange" letters, e.g. the so-called "umlauts" (Ü ü, Ä ä, Ö ö). Is there a way to let SPIM handle them, too?


EDIT:
The commands sltu and movn aren't mentioned in the alphabetical command overview they gave us... Are there different SPIM versions?

clive

Correct, subtracting 32 is how to do it with ASCII.

With umlauts, etc, I think I would probably use a case conversion table as that would be more efficient than doing multiple comparisons/conversions. I think the IBM PC (ASCII) character set has them in the high order characters (>=128)

a = uppertbl[a];

SLTU is a MIPS 3000 instruction, MOVN/MOVZ are MIPS IV, they both work on the current SPIM cited earlier.

I come at this from an ARM perspective, I would try to avoid branches whenever possible. The pipeline penalty is reasonably cheap for the original MIPS design, but as more pipeline stages get added, the more important things like branch prediction become. It was reasonably easy for me to demonstrate a ~13 (16) cycle stall on the Intel Atom.

The layman would probably use the compare/branch approach. On ARM which has always had conditional execution, one would probably try to use that instead of branches.
It could be a random act of randomness. Those happen a lot as well.

Herakles

Thank you very much so far.

The only thing is that we are using the MIPS R2000... Do you think it's allowed to use the two commands?

clive

Well the solution to that is probably to provide multiple answers to the question, providing timing estimates of each.

You could also ask the professor.

I don't have definitive documents in front of me, but Googling some more suggest SLT,SLTI,SLTU,SLTIU,SLTUI are present in the MIPS R2000 instruction set. Most of the SoC stuff I deal with is M4K, MIPS32, MIP16, so I'll apologize for not having the antiquated stuff all straight.

It could be a random act of randomness. Those happen a lot as well.

Herakles

Never mind! :wink

I am too silly to write a working programme! :'(

That's what I have so far:

.data
str1: .ascii "Please type in chars.\n"
.ascii "In case you enter 0 the programme will stop \n"
.asciiz "the input and convert all small letters to capital ones.\n"
askstr: .asciiz "\n?-> "
answstr: .asciiz "The new string is: "
str2: .asciiz "\n\n"

.text
main: li $v0, 4
la $a0, str1       # load str1
syscall                 # PRINT_STR(str1)

loop: li $v0, 4
la $a0, askstr     
syscall               # PRINT_STR(askstr)

li $v0, 8
syscall # $v0 := READ_STRING;
li $t2, 41
bgeu $v0, $t2, upper # jump to upper if v0 >= t2


upper:

exit: li $v0, 4
la $a0, answstr   
syscall                 # PRINT_STR(answstr)



li $v0, 4
la $a0, str2       
syscall # PRINT_STR(str2)

li $v0, 10 # Systemaufrufnr. 10 = EXIT
syscall


I think it's the best to describe my issues, so you can see where I have problems.

1.) Is it possible to subtract Hx 20 from a entered character without "saying" to the programme "Hey, this is a character, you have to find it's Hx value and then subtract Hx 20!". For example is it allowed to say "c - Hx 20"?
2.) How do I implement the idea of "if the Hx value of the character is >= 61 and <= 7A then subtract Hx 20 else don't change anything"?
3.) How do I implement the idea of traversing the input string character by character, while converting to capital letters when it is necessary, without deleting the capitals checked/converted so far?
For example: the entered string is "AbCcDd" and the output string must be "ABCCDD" not "bcd" or "d"...
4.) It think it's the best when the programme is able to present the converted string in one line and not character by character (vertical order). But how do I implement this?

As you can see, my mind's IT architectural thinking is very poor.
I know that the whole thing about SPIM is the implementation with labels and that shouldn't be very difficult, but although the course started three weeks ago, I still have huge problems. :'(

Thank you very much for your help and your patience.

clive

1) You have to load the value into a register. Once you have it in a register you can add, subtract, compare the value

2) Once the value is in a register, you can compare it against values in other registers (constants), or immediate values. You can test whether the value is equal, different, above, below, etc. and make decisions/branches based on those comparisons.

3) Again, you need to load the address into a register. Then use that address to read the content into another register. This second register holds the character you want to process. If the character is zero (NUL), it means the end of the string, if you encounter this character you stop. Otherwise you add one to the register holding the address, and then read the content from the new address, and so on.

4) You need to look at the Read String SYSCALL function some more. You can specify a buffer, and a length. This way it can read multiple characters into a buffer, zero terminate it, and return. So you type the sentence and hit enter/return. Once you have the whole string in a buffer, you process it the manner covered in 3)
It could be a random act of randomness. Those happen a lot as well.

clive

Please type in chars, then ENTER.
The input is converted to upper case.

?-> The Quick Brown Fox Jumped Over The Lazy Dog
The original strings is: The Quick Brown Fox Jumped Over The Lazy Dog

The new string is: THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG


# Assumes no Load Delays, and no Delay Slots for branches

        .data

str1:   .ascii "Please type in chars, then ENTER.\n"
        .asciiz "The input is converted to upper case.\n"

askstr: .asciiz "\n?-> "

orgstr: .asciiz "The original string is: "

answstr:.asciiz "The new string is: "

crlf:   .asciiz "\n\n"

strbuf: .asciiz "12345678901234567890123456789012345678901234567890123456789012345678901234567890"

        .text

main:   li      $v0, 4                  # Print String
        la      $a0, str1               # load address str1
        syscall

        li      $v0, 4                  # Print String
        la      $a0, askstr             # load address askstr
        syscall

        li      $v0, 8                  # Read String
        la      $a0, strbuf             # Buffer Address
        li      $a1, 81                 # Characters to read + 1
        syscall

        li      $v0, 4                  # Print String
        la      $a0, orgstr             # Original Message
        syscall

        li      $v0, 4                  # Print String
        la      $a0, strbuf
        syscall

        li      $v0, 4                  # Print String
        la      $a0, crlf
        syscall

        la      $a0, strbuf             # load address of string buffer

loop:

        lbu     $t0, 0($a0)             # Load Byte (unsigned) from [$a0 + 0] to $t0
        beq     $t0, $zero, done        # NUL encountered

        addiu   $t1, $t0, -0x61         # 'a'
        sltu    $t1, $t1, 26            # a-z 26 character span
        beq     $t1, $zero, upper       # $t1 == 0?

        addiu   $t0, -0x20              # -('a' - 'A')
        sb      $t0, 0($a0)             # Store Byte (only when we change it)

upper:

        addi    $a0, $a0, 1             # $a0++
        b       loop

done:

        li      $v0, 4                  # Print String
        la      $a0, answstr            # Answer Message
        syscall

        li      $v0, 4                  # Print String
        la      $a0, strbuf
        syscall

        li      $v0, 4                  # Print String
        la      $a0, crlf
        syscall

        li      $v0, 10                 # Exit
        syscall
It could be a random act of randomness. Those happen a lot as well.

Herakles

Thank you very much, clive!

Concerning my issues: You hit that nail square on the head! :U

I read your second last post first and I intentionally ignored your last post in order to implement your explanations by my own (learning effect). Like mad I made it with branches and now I do see why you said "avoid branches whenever it's possible." My code works but I'll keep your warning always in mind.

BTW: Are you from the USA?

clive

Technically I'm English, and just live/work in the US.

I dug up my copy of "MIPS RISC ARCHITECTURE, KANE 1998" ISBN 0-13-584749-4
http://www.alibris.com/booksearch?binding=&mtype=&keyword=0-13-584749-4&hs.x=0&hs.y=0&hs=Submit
http://www.amazon.com/MIPS-R2000-RISC-Architecture-Gerry/dp/0135847494/ref=sr_1_1?ie=UTF8&s=books&qid=1274303952&sr=8-1

It's a 1st edition, 4th printing from 1989. This covers mostly the R2000, with a bit on the R3000. The SLTxx instructions look to be pretty safe.
It could be a random act of randomness. Those happen a lot as well.