CMSC 231 Intelligent Robotics – End of Semester Evaluation
Nazmus Saquib
Instructors: Rebecca Thomas and Sven Anderson
Projects Undertaken So Far:
- Trilateration using Microphones
- Teaching the robot how to make its own map (a naive machine learning method)
- Finding the width of a gap between two obstacles
I worked in the vision group and used the ACTS libraries and the C++ code in colorfollowingexample.cpp to build a system that tracks a blob of a specific color and moves the robot towards it. To teach the ACTS API what the blob looks like, I used a program called ACTS EZ-Trainer, which takes input from the camera and saves the color information the user selects by clicking on the blob in the video window. This color information is saved in a channel file with the extension .lut. A single channel file can store an object’s colors under different lighting conditions, and several channel files can be stored for the same session, so that the robot can chase differently colored objects at the same time. The data for the whole session is stored in a runtime configuration file. The code in colorfollowingexample.cpp, found in the ACTS demo folder, reads its parameters from the runtime configuration file, combines them with the camera image to track the chosen blob, and controls the robot accordingly. In other words, the code needs the runtime configuration file in order to search for the color in the image received from the camera. The camera setting we use with the PC (not the Sony camera used with the robot) is Svideo0. I used this system to train the robot to find a particular color under different lighting conditions and tested it on MobileSim. Unfortunately, we could not get the framegrabber working during the week of the opening, so the project was never demonstrated.
The day before opening day, I tried to write a thread method that would integrate the speech recognition methods, but I did not have a good grasp of pointer programming and could not implement it. I then worked with Maxim and Rob to develop code, based on Becky’s dead reckoning code, that drives the robot randomly around the lobby and has it comment in different regions. The speech recognition group had already written the template, but we modified it to say random sentences. I wrote a method that integrates Becky’s methods and the speech recognition group’s methods into a single method, with the appropriate calls from the main() function. I was present during the demonstration to help manage the robot and answer people’s questions.
This was done as homework for the class problem of creating an action that moves the robot towards the closest object and stops it 2 ft away. I worked with Rob to edit the code of RKC opening day tour03.cpp and add an action called Approach. It was much like the Turn behavior, but it compared the left and right ranges to decide which obstacle was closer, then turned towards that obstacle and moved towards it. An obvious problem appeared when the robot approached the corner of an obstacle and could not decide which range was smaller: if it turned left, say, the right range would become the smaller one, so it would then try to turn right, at which point the left range became smaller again. The robot would therefore start to oscillate around the corner. Rob edited the RKC map to add several obstacles and we tested our program on it. It worked well apart from that problem.

Finding closest distance
Oscillating Problem
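The decision rule of the Approach action described above can be sketched roughly as follows. This is only a minimal illustration, not the code we added to tour03.cpp: the Command type, the function name and the deadBand parameter are all hypothetical, and the robot's real action classes are not shown.

```cpp
#include <cassert>

// Hypothetical command type standing in for the robot's real action output.
enum class Command { TurnLeft, TurnRight, Forward, Stop };

// Picks a command from the left and right range readings (in feet).
// stopDistance is the 2 ft limit from the homework problem.
// deadBand is an ASSUMED tolerance that was not part of our Approach action.
Command chooseApproachCommand(double leftRange, double rightRange,
                              double stopDistance = 2.0,
                              double deadBand = 0.0) {
    assert(leftRange >= 0.0 && rightRange >= 0.0 && deadBand >= 0.0);
    const double closest = leftRange < rightRange ? leftRange : rightRange;
    if (closest <= stopDistance) return Command::Stop;

    const double diff = leftRange - rightRange;
    if (diff < -deadBand) return Command::TurnLeft;   // left obstacle is closer
    if (diff > deadBand) return Command::TurnRight;   // right obstacle is closer
    return Command::Forward;  // ranges are (nearly) equal: drive straight
}
```

With deadBand left at 0 the function flips between TurnLeft and TurnRight whenever the two ranges swap order, which is the oscillation seen at corners. A non-zero deadBand is one possible way to damp it, but we have not tested that on the robot.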
C++ Learning (Pointer Programming):
I had some trouble understanding pointer-based programming, so I read several articles on the net and went through some example programs to understand and review the concepts.
1. Though I did not concentrate fully on VisLib, I tried to implement an efficient algorithm to find an object in an image. It started as a homework assignment to implement some form of object detection, but I became interested in finding an algorithm that makes effective use of other people’s object detection functions. I am working in collaboration with Maksim, since he has already developed a detection algorithm and code. The problem with it is that a starting point must be defined by clicking on the image. For real-time video we probably need an algorithm that finds the starting point without the O(n^2) cost of checking every pixel row by row and column by column. I have come up with a (probabilistic, probably!) algorithm that divides the image into a certain number of sections of equal area, say a grid of 20 cells, generates 10 random points in each cell, and tests those points for the color of the blob. If the color is not found in any cell (though it is very likely to be found), we run the method a second time, this time ensuring that the previous points are not repeated. To ensure this, we can use two different number-generating formulas that always produce distinct sets of numbers. I am not sure whether calling a randomize function to generate random numbers is more time consuming, so I think it is better to use a number-generating formula. In the worst case the linear algorithm checks 640 * 480 = 307200 pixels, whereas with 20 cells this algorithm checks at most n * 20 * 10 = 200n pixels, where n is the number of times we repeat the test. I am still working on the code, though progress is slow at the moment because I am working on several tasks at once. I will probably be able to finish it over the winter break and test it.
2. Another idea that I gave to Maksim was to use a skip value in the loop, checking only every few pixels for the specific color instead of every pixel, since we do not need very sharp edge detection for the object. Though it should work in theory, we have not been able to implement it yet. Maksim is still working on it.
(Though we have object detection functions directly available in VisLib, I took these tasks as an extension to the homework problem, and of course, to have some fun finding some new methods and algorithms!)
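The grid-sampling idea from item 1 can be sketched as below. This is a rough illustration under stated assumptions, not the code I am working on: the function name and the matches callback (a stand-in for the blob color test) are hypothetical, and it uses the standard std::mt19937 generator in place of the number-generating formula discussed above. It also does not yet guarantee that a second pass avoids the points of the first.

```cpp
#include <cassert>
#include <functional>
#include <random>

// Tests a few random points in every grid cell and reports the first match.
// matches(x, y) is a hypothetical colour test supplied by the caller.
// Returns true and fills seedX/seedY when a matching pixel is found.
bool findSeedPixel(int width, int height, int cols, int rows,
                   int samplesPerCell,
                   const std::function<bool(int, int)>& matches,
                   std::mt19937& rng, int& seedX, int& seedY) {
    assert(width > 0 && height > 0 && cols > 0 && rows > 0);
    assert(cols <= width && rows <= height && samplesPerCell > 0);
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            // Pixel bounds of this cell, half-open: [x0, x1) by [y0, y1).
            const int x0 = c * width / cols;
            const int x1 = (c + 1) * width / cols;
            const int y0 = r * height / rows;
            const int y1 = (r + 1) * height / rows;
            std::uniform_int_distribution<int> pickX(x0, x1 - 1);
            std::uniform_int_distribution<int> pickY(y0, y1 - 1);
            for (int s = 0; s < samplesPerCell; ++s) {
                const int x = pickX(rng);
                const int y = pickY(rng);
                if (matches(x, y)) {
                    seedX = x;
                    seedY = y;
                    return true;
                }
            }
        }
    }
    return false;  // no sampled point had the blob colour
}
```

Calling it with width 640, height 480, cols 5, rows 4 and samplesPerCell 10 gives the 20 cells and 10 points per cell from the text, so one pass tests at most 200 pixels. The returned seed could then be handed to a detection routine that needs a starting point.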
3. Right now I’m working on a method of depth perception from the image of the tennis ball. I tried to formulate a relation between the lens’ focal length and the image size of the tennis ball. Another approach was to use similar triangles to find the actual image size, given that the height of the object (the tennis ball) is known. The problem with both methods is that the image size obtained from the camera image is not the actual image size: we get a size in pixels, which is not a physical measure from which we can calculate real distance. So I am trying to find results experimentally for different image sizes and use an array of values to estimate the range of the tennis ball from the robot. This is neither efficient nor very accurate, and I’m still thinking about the formulas. I think this is important because it gives the robot a sense of its distance from the tennis ball at any time, so it knows how far to go to reach the ball.
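The "array of values" approach from item 3 could look roughly like the sketch below. It is only an illustration: the Sample struct and estimateRange are hypothetical names, and linear interpolation between neighbouring measurements is my assumption about how to read values between table entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One calibration measurement: apparent ball diameter in pixels and the
// range (distance to the ball) at which that diameter was observed.
struct Sample {
    double pixels;
    double range;
};

// Estimates the range for a measured pixel diameter by linear interpolation.
// The table must be sorted by increasing pixels (i.e. decreasing range).
// Values outside the table are clamped to the nearest end.
double estimateRange(const std::vector<Sample>& table, double pixels) {
    assert(!table.empty());
    if (pixels <= table.front().pixels) return table.front().range;
    if (pixels >= table.back().pixels) return table.back().range;
    for (std::size_t i = 1; i < table.size(); ++i) {
        if (pixels <= table[i].pixels) {
            const Sample& a = table[i - 1];
            const Sample& b = table[i];
            const double t = (pixels - a.pixels) / (b.pixels - a.pixels);
            return a.range + t * (b.range - a.range);
        }
    }
    return table.back().range;  // not reached for a sorted table
}
```

The table entries have to come from actual measurements with the camera and the tennis ball. The more samples are taken, the less the estimate depends on the straight-line assumption between them.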
Trilateration Using Microphones:
I am working on a system to find the direction of a sound of a particular frequency, which might be useful for detecting a human voice and moving towards the sound source. I tried to find a mathematical relationship for the angle between the robot and the sound source using two microphones rather than three, since that would lessen the hassle, but it turned out to be pretty challenging. I was able to find a vague relation using the direction of the wavefronts and the average speed of sound, but I always ended up with an equation in 3 unknown variables. I am still trying to find a method that does not use the mathematical equations directly but instead estimates the angle empirically.


In the empirical method, I will try to generate a look-up table for different wavefront directions making an angle theta with the M1M2 axis. (I’ll write more about this later on this page.)
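A look-up table of this kind might be organised as in the sketch below. This is an assumption-heavy illustration and not a finished method: it assumes the quantity measured is the arrival-time difference between M1 and M2, and it fills the table from a far-field (plane wavefront) model, delay = spacing * cos(theta) / speed of sound, with a speed of sound of 343 m/s. All names are hypothetical, and the delay column could equally be filled with measured values.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct DelayEntry {
    double thetaDeg;  // wavefront direction relative to the M1M2 axis
    double delaySec;  // expected arrival-time difference between M1 and M2
};

// Builds a table for theta = 0, step, 2*step, ... up to 180 degrees,
// ASSUMING plane wavefronts: delay = spacing * cos(theta) / speedOfSound.
std::vector<DelayEntry> buildDelayTable(double spacingM, double stepDeg,
                                        double speedOfSound = 343.0) {
    assert(spacingM > 0.0 && stepDeg > 0.0 && speedOfSound > 0.0);
    const double pi = std::acos(-1.0);
    const int steps = static_cast<int>(180.0 / stepDeg);
    std::vector<DelayEntry> table;
    for (int i = 0; i <= steps; ++i) {
        const double theta = i * stepDeg;
        const double delay =
            spacingM * std::cos(theta * pi / 180.0) / speedOfSound;
        table.push_back({theta, delay});
    }
    return table;
}

// Returns the tabulated angle whose delay is closest to the measured one.
double lookupAngle(const std::vector<DelayEntry>& table,
                   double measuredDelaySec) {
    assert(!table.empty());
    const DelayEntry* best = &table.front();
    for (const DelayEntry& e : table) {
        if (std::fabs(e.delaySec - measuredDelaySec) <
            std::fabs(best->delaySec - measuredDelaySec)) {
            best = &e;
        }
    }
    return best->thetaDeg;
}
```

Note that with only two microphones a source and its mirror image on the other side of the M1M2 axis give the same delay, so a table like this can only return an angle between 0 and 180 degrees.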
Another problem is integrating the microphones with the currently running system. There are two solutions for this. We could build our own multiplexer to combine all the inputs into a single signal and read it through the serial port, but I don’t yet know how we would interpret the serial cable data, and the integration may be really daunting. Instead, I am searching for free C++ engines/libraries that take input directly from the microphone ports. For multiple microphones, we can use a sound card with more channels for mic input. I don’t think we will be able to finish this task this semester, but we can continue the work next semester.
Other works attempted that are incomplete:
- Make another 2D array that represents the whole room (more like a grid, where each unit represents a coordinate)
- Copy the 2D array values into the appropriate positions in the Room array
- Now scan the Room array for the coordinates that are left out, that is, not visited.
- For each empty (unvisited) element, search the neighboring unvisited elements for a reasonable pattern (for example a square or a rectangle), and mark them.
- Repeat this for other unvisited and unmarked elements.
- Finally save the array in a file with the extension .map
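The copy, scan and mark steps above can be sketched as follows. This is a simplified illustration, not the program I intended to write: the cell values and function names are hypothetical, and instead of searching for squares or rectangles it simply groups neighbouring unvisited cells with a flood fill, which is a swapped-in, simpler technique. Saving to a .map file is left out because the file format is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Cell states for the Room grid (illustrative values).
constexpr int kUnvisited = 0;
constexpr int kVisited = 1;
// Labels 2, 3, ... mark groups of unvisited cells (candidate obstacles).

using Grid = std::vector<std::vector<int>>;

// Copies the visited cells of a smaller 2D array into the Room array,
// placing the patch's top-left corner at (row0, col0).
void copyIntoRoom(Grid& room, const Grid& patch,
                  std::size_t row0, std::size_t col0) {
    for (std::size_t r = 0; r < patch.size(); ++r) {
        for (std::size_t c = 0; c < patch[r].size(); ++c) {
            assert(row0 + r < room.size() &&
                   col0 + c < room[row0 + r].size());
            if (patch[r][c] == kVisited) room[row0 + r][col0 + c] = kVisited;
        }
    }
}

// Gives every 4-connected group of unvisited cells its own label.
// Assumes a rectangular grid. Returns the number of groups found.
int markUnvisitedRegions(Grid& room) {
    assert(!room.empty());
    const int rows = static_cast<int>(room.size());
    const int cols = static_cast<int>(room[0].size());
    const int dr[] = {1, -1, 0, 0};
    const int dc[] = {0, 0, 1, -1};
    int nextLabel = 2;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (room[r][c] != kUnvisited) continue;
            std::vector<std::pair<int, int>> stack;
            stack.push_back(std::make_pair(r, c));
            room[r][c] = nextLabel;
            while (!stack.empty()) {
                const std::pair<int, int> cell = stack.back();
                stack.pop_back();
                for (int k = 0; k < 4; ++k) {
                    const int nr = cell.first + dr[k];
                    const int nc = cell.second + dc[k];
                    if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                    if (room[nr][nc] != kUnvisited) continue;
                    room[nr][nc] = nextLabel;
                    stack.push_back(std::make_pair(nr, nc));
                }
            }
            ++nextLabel;
        }
    }
    return nextLabel - 2;
}
```

After markUnvisitedRegions runs, every cell is either visited or carries the label of its group, so each label corresponds to one candidate obstacle whose shape and size can be read off the grid.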

This figure depicts a first run and what the actual map grid looks like after the first run.
So we have a roughly accurate method to find the static obstacles in the room and their shapes and sizes. We can run the robot again and let it roam around the area to make corrections to the Room array. For a correction, we need to write another program that compares the current Room array with the new version of the array, and un-tags the unvisited elements as necessary.
But I did not continue with the project because: 1. We don’t need it for the robot right now; we can make the map more easily by taking measurements ourselves! Right now, my method is not efficient at all! :P 2. Coding and modifying this suggested algorithm was going to be time consuming.
I have already thought of some more detailed improvements to this method, but I don’t have time to put them all on the page right now (need to write my Fysem essay!). I hope to work on this algorithm, code it and modify it during the winter break. Anyone interested in suggesting improvements or writing code, please email me! I will be updating the page with the improvements I make (provided I get access to the robotics lab(!) :P). It will be fun to modify this method and experiment with it.

I would like to work on these ideas over the winter as a kind of research project, both to have some fun and to prepare a paper for the CCNSCE conference.
Teaching the robot how to make a map of an unknown area by itself – If no one has already done the job! I will search on the net to make sure this kind of algorithm does not exist! :P (ha ha ha)
Depth perception from image by comparing other objects’ sizes to the target object’s size:
Basically I had two plans:
1. Implementing stereoscopic vision using 2 cameras, which is easier. But considering our hassles with a single camera and the specific frame-grabber problem, I may have to stop thinking along these lines.
2. Hence, I have been thinking about a new way to approach the problem. There are algorithms available that extract useful information from a given image. By useful information, I mean that the algorithms find the areas in the image that have 'reasonable' shapes and colors and segment those areas into recognizable objects by marking them (some algorithms are about 90% statistically correct in finding these). Now, given that I use the other Vision group's blob tracking function, my algorithm would use these algorithms to constantly detect other objects in the environment and compare the size of the target object (the blob) with their sizes to estimate the range. The most important concern is the time efficiency of the methods. If the framegrabber grabs 30 frames/sec, we don't have to use all 30 images to determine the range; updating the image system 4 or 5 times every second will probably do the job. The method can run as a thread alongside the main system. But I still need time to learn, implement and experiment with these methods.
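The frame-rate point in plan 2 can be illustrated with a small sketch. It only covers choosing which frames to use for a range update, not the range estimation or the threading, and the function names are hypothetical.

```cpp
#include <cassert>

// Number of frames to advance between two range updates. Integer division
// rounds the stride down, so the update rate is at least the requested one.
int frameStride(int framesPerSecond, int updatesPerSecond) {
    assert(framesPerSecond > 0 && updatesPerSecond > 0);
    const int stride = framesPerSecond / updatesPerSecond;
    return stride < 1 ? 1 : stride;
}

// True when frame number frameIndex (0, 1, 2, ...) should be used
// for a range update; all other frames are skipped.
bool shouldProcessFrame(long frameIndex, int stride) {
    assert(frameIndex >= 0 && stride > 0);
    return frameIndex % stride == 0;
}
```

For a framegrabber running at 30 frames/sec and 5 updates per second, frameStride gives 6, so only every sixth frame would be passed on to the slower object detection and size comparison.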