Monday, April 18, 2016
Compiling and Testing TORO on Windows 10
I am currently using MINGW64 to compile TORO from the command line. It is only necessary to run make in toro-code/tests and then run the corresponding .sh file to compile an example, e.g., ./ToroHello.sh for ToroHello. The latest commits fix the problems during compilation; the output should look like the picture. The tools needed are MINGW64 (for make), NASM (to compile the bootloader) and FPC 2.6.4 (for build.pas). Once they are all installed, run the procedure above.
Debugging works well with the latest version of QEMU. Once it is installed, it is only necessary to invoke the corresponding .sh file with the -e argument, e.g., ./ToroHello.sh -e. By doing so, we get a pretty TORO output saying that it is alive.
Wednesday, February 04, 2015
FOSDEM2015
Hi folks,
After a weekend in Brussels, I am back in Nice. FOSDEM was a really interesting experience and I really enjoyed it. I have posted some pictures of the talk, and I also left the link to the presentation. In any case, I think the presentation will be online soon at fosdem.org.
Regards, Matias.
Sunday, January 11, 2015
Toro Hello world
In this post, we explain how Toro's examples can be compiled and then simulated.
1) Install YASM and FPC/Lazarus.
$ apt-get install yasm
To install Lazarus/FPC, please follow the instructions here.
2) Clone the TORO repository.
$ git clone git://toro.git.sourceforge.net/gitroot/toro/toro
3) Compile tools
Before compiling Toro's examples, we have to compile the tools needed to generate Toro's image. To do so, run "make" in /toro/tests.
4) Toro Hello world.
The toro/tests directory contains examples that can be simulated with QEMU. They are a good starting point for Toro. Here, we explain how the hello world example can be compiled and simulated. It is only necessary to run:
$ bash ToroHello.sh -e
This will compile the ToroHello.pas example and, if the compilation succeeds, generate the ToroImage.img image. The "-e" parameter tells the script to launch QEMU and simulate the generated image.
Matias E. Vara
torokernel.io
Wednesday, December 10, 2014
TORO in FOSDEM 2015
I have the pleasure of giving a talk about TORO at FOSDEM 2015. The talk will be in the microkernel devroom on Sunday. Here is a summary of the talk. I will be at the conference the whole weekend, so don't hesitate to contact me for a chat.
Matias E. Vara
Sunday, November 25, 2012
Debugging TORO with BOCHS and GDB
In the following lines I'll try to explain how to debug Toro using Bochs + GDB. I usually use QEMU instead of Bochs, but in the last few days I ran into some strange behavior, so I replaced QEMU with Bochs, with good results.
Before starting, we have to compile Bochs for this purpose. After downloading the source, run the following commands to build it:
> ./configure --enable-cpu-level=6 --enable-all-optimizations --enable-x86-64 --enable-pci --enable-vmx --enable-disasm --enable-debugger-gui --enable-logging --enable-fpu --enable-3dnow --enable-sb16=dummy --enable-cdrom --enable-x86-debugger --enable-iodebug --disable-plugins --disable-docbook --enable-gdb-stub
> make install
Note that it is not possible to enable SMP support and the GDB stub at the same time. If the compilation went well, we'll be able to run Bochs with GDB support.
Now we need to compile the Toro application with debug symbols. If there are any old .o or .ppu files, they must be deleted, because they don't carry symbol information. In the Toro source directory, we should execute the following commands:
> fpc ToroHello.pas -g -oToroHello -Fu../rtl/ -Fu../rtl/drivers
> ./build 4 ToroHello boot.o ToroHello.img
This procedure is for ToroHello.pas, but it's the same for any other Toro application.
So far, we have Bochs and the Toro image; now we have to write the .bochsrc file and launch Bochs. The following lines may be useful:
megs: 32
ata0-master: type=disk, path="ToroHello.img"
boot: c
log: bochsout.txt
gdbstub: enabled=1
Check that the gdbstub is indeed enabled in the Bochs configuration file.
The next step is to run bochs and then GDB:
> bochs
> gdb ToroHello
If we run gdb in the toro/tests directory, .gdbinit will be used; otherwise we have to connect to Bochs manually as follows:
> (gdb) target remote localhost:1234
If everything goes well, we can set breakpoints and use all of GDB's tools. For instance, we could do:
> (gdb) b KERNELSTART
> (gdb) c
The first line sets a breakpoint at KERNELSTART; the virtual machine then continues until control comes back when the breakpoint is reached.
Many commands can be useful at this point, such as n to run line by line, step to step into, info registers, and so on.
There is a lot of information we can get at this point, but that's for another tutorial ;)
Matias E. Vara
www.torokernel.org
Friday, September 21, 2012
Toro's article in Microelectronic Congress 2012
Here is the link to the article accepted by the Microelectronic Congress 2012; its title is "Memory module designing for embedded purpose". It is about the Toro kernel's memory module, and it deals with the improvements made when an operating system runs in a dedicated environment such as an embedded system.
This time, the congress will be held in the city of Rosario, Argentina. Unfortunately, I will not be able to present the article in person as I did last year.
As before, the article is in Spanish, in accordance with the congress rules.
www.torokernel.org
Labels:
embedded,
Hypertransport,
memory,
NUMA,
SMP
Saturday, June 23, 2012
Slides about migration threading in TORO
Here are a few slides about the migration procedure in the TORO kernel. I made a brief comparison with the Linux mechanism, and I also said a few words about migration in a real-time environment.
Enjoy!
Matias E. Vara
www.torokernel.org
Sunday, May 20, 2012
Git workflow for Toro
"Git workflow" is a well-known subject; you can find thousands of documents about it, with more than enough information to make another tutorial unnecessary. Still, when I have to make a change in Toro's source, I don't follow any pattern: I just make the changes and commit them.
To make this procedure cleaner, I decided to write down a kind of "git protocol" to follow every time I make a change in Toro.
I realized how important it is to be disciplined when changing the source, especially when the project is large and every little modification can make it unstable. It helps to understand the design and also to find a possible bug faster.
To implement a new feature or fix a bug, we should start by creating a new branch. It keeps our work independent of master, and we can make mistakes without worrying.
> git checkout -b new_branch
This creates a local branch and switches to it.
Make the changes, then commit once you have made enough of them; try to avoid very short commits that don't add meaningful information.
> git commit -a
Note about commit message:
"- Write in the present tense.
- The first line of the description is a short summary of the change.
- If I'm fixing a bug, include a small example of the bug or the issue I was addressing.
- If I'm changing a feature, or adding a new one, explain why I think it is the right choice and why it is acceptable to make that sort of change at this point of the product cycle."
To share your work before finishing it, push the branch to the remote repo.
> git push -u origin new_branch
Then, to show all branches, both local and remote, execute:
> git branch -a
And after that, to check the branch out locally, run:
> git checkout -b new_branch origin/new_branch
It can be useful to write a test in toro/tests/ to check the new feature and the stability of the kernel.
When you are sure about the changes, merge to master:
> git checkout master
> git merge --no-ff new_branch
A comment about the --no-ff flag, taken from here:
"The --no-ff flag causes the merge to always create a new commit object, even if the merge could be performed with a fast-forward. This avoids losing information about the historical existence of a feature branch and groups together all commits that together added the feature"
Finally, we should remove the remote and local branches:
> git push origin :new_branch
> git branch -d new_branch
Matias E. Vara
www.torokernel.org
Tuesday, May 01, 2012
Toro debug with ECLIPSE
Hi everyone! I noticed that the server hosting the video "TORO debug with ECLIPSE" is down, so I've uploaded it again. It shows the TORO Builder running on Windows 2003 x64. It is interesting to see how we can do step-by-step debugging and set breakpoints. QEMU is emulating an x86_64 machine.
Enjoy!
www.torokernel.org
Monday, April 30, 2012
Toro's article in Microelectronic Congress
Here is the link to the article I presented at the Microelectronic Congress 2011, Faculty of Engineering, University of La Plata, Argentina. The only thing is that it is in Spanish. It describes the TORO project and shows some benchmarks I made for my graduation project. Enjoy!
Matias E. Vara
www.torokernel.org
Labels:
atomic operations,
kernel,
linux,
locks,
multiprocessing,
NUMA,
SMP,
threading cooperative,
windows,
x86,
x86-64
Tuesday, January 03, 2012
IOAPIC supported!
Until the latest TORO sources, I was using the 8259 controller to catch interrupts, but the problem was that all interrupts were captured by the Bootstrap Processor (core #0). This does not fit the dedicated-hardware model.
So I have started to use the IOAPIC instead of the 8259 when TORO runs in a multicore environment. This way, I can redirect each IRQ to the core to which the hardware is dedicated. The dedication must be specified by the user.
This is a new step toward a real multicore kernel.
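The redirection described above boils down to programming an entry of the I/O APIC redirection table. As a rough illustration (the bit layout follows Intel's 82093AA I/O APIC datasheet; the helper function is our own sketch, not Toro's actual code), the 64-bit entry that routes an IRQ to a given core looks like this:

```c
#include <stdint.h>

/* Sketch of an I/O APIC redirection-table entry (82093AA layout).
 * Fixed delivery mode, physical destination, edge-triggered,
 * active-high, unmasked: all those fields stay zero here.
 * The function name is illustrative, not Toro's identifier. */
static uint64_t ioapic_redirection_entry(uint8_t vector, uint8_t dest_apic_id)
{
    uint64_t entry = vector;                /* bits 0-7: interrupt vector       */
    entry |= (uint64_t)dest_apic_id << 56;  /* bits 56-63: destination LAPIC id */
    return entry;
}
```

For example, routing an IRQ through vector 0x21 to the core whose Local APIC ID is 1 yields the entry 0x0100000000000021; the two 32-bit halves of that value are then written through the IOREGSEL/IOWIN window at the I/O APIC's MMIO base.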
Matias E. Vara
www.torokernel.org
Monday, December 12, 2011
Fixed an important bug in the thread migration procedure
That's just a brief post about a recent change in the way that Toro migrates threads.
Previously, when a thread running on core #0 wanted to create a new thread on core #1, ThreadCreate allocated the TThread structure, the TLS and the stack, and then migrated the whole TThread structure to core #1.
The main problem with this mechanism was that all the memory blocks were allocated on the parent core. This is a serious violation of the NUMA model: the TThread structure, the TLS and the stack are no longer local memory.
So I rewrote the way threads are migrated. When a thread wants to create a new one remotely, Toro still invokes ThreadCreate, BUT it is executed on the remote core. Instead of migrating the TThread structure, Toro now migrates a set of arguments to be passed to ThreadCreate. When ThreadCreate finishes, the parent thread retrieves the TThreadID value, or nil if it fails.
As we can see, while a local thread is created immediately when ThreadCreate is invoked, a remote thread costs two extra steps of latency: one to migrate the parameters and one to retrieve the result.
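The idea can be pictured with a small sketch (in C for brevity; every name here is illustrative, not one of Toro's actual identifiers): the parent core packs only the creation arguments into a small message for the target core, which runs the creation locally, so the descriptor, TLS and stack end up in the target's local memory.

```c
#include <stddef.h>

/* Illustrative only: the message a parent core would send to the remote
 * core so that thread creation, and therefore all allocation of the
 * descriptor, TLS and stack, happens in the target core's local memory. */
typedef struct {
    void (*entry)(void *);  /* thread entry point                   */
    void  *arg;             /* argument passed to the entry point   */
    size_t stack_size;      /* stack to be allocated on the target  */
    int    target_core;     /* core that will execute the creation  */
} create_request;

static create_request pack_create_request(void (*entry)(void *), void *arg,
                                          size_t stack_size, int target_core)
{
    create_request r;
    r.entry = entry;
    r.arg = arg;
    r.stack_size = stack_size;
    r.target_core = target_core;
    return r;  /* this small struct, not a whole thread descriptor, is migrated */
}
```

The parent then blocks (or polls) until the target core answers with the new thread's ID, which is the second latency step mentioned above.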
Matias E. Vara
www.torokernel.org
Thursday, August 25, 2011
Patching GDB 7.3 for QEMU remote kernel debug
This time I will try to explain how to patch GDB 7.3 in order to debug a kernel under QEMU through remote debugging. If we try to debug remotely, we will find an error message like:
Remote packet too long: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 ...
I am not sure about the root cause, but I suppose it is related to register size. When the virtual machine jumps from real mode to long/protected mode, the register size changes, but GDB doesn't know that. Thus, when GDB receives a bigger packet than it expects, it fails. The patch simply enlarges the buffer in those cases.
The first step is to download GDB 7.3 from http://www.gnu.org/s/gdb/download/. I implemented the patch on version 7.3, but I think it works on older versions too.
Once it is downloaded and uncompressed, edit the file gdb-7.3/gdb/remote.c and go to line 5693, in the process_g_packet procedure. Look for the original code and replace it with the following lines:
  /* Further sanity checks, with knowledge of the architecture.  */
  //if (buf_len > 2 * rsa->sizeof_g_packet)
  //  error (_("Remote 'g' packet reply is too long: %s"), rs->buf);
  if (buf_len > 2 * rsa->sizeof_g_packet)
    {
      rsa->sizeof_g_packet = buf_len;
      for (i = 0; i < gdbarch_num_regs (gdbarch); i++)
        {
          if (rsa->regs[i].pnum == -1)
            continue;
          if (rsa->regs[i].offset >= rsa->sizeof_g_packet)
            rsa->regs[i].in_g_packet = 0;
          else
            rsa->regs[i].in_g_packet = 1;
        }
    }
Finally, it just remains to execute:
$ ./configure
$ make
On some systems it may be necessary to install the termcap library; simply execute:
$ sudo apt-get install libncurses5-dev
After compilation, the binary can be found at gdb-7.3/gdb/gdb; that should be enough to run GDB correctly.
Matias E. Vara
www.torokernel.org
Tuesday, August 23, 2011
Toro in Microelectronic Conference (UNLP)
Toro will be shown at the Microelectronic Conference at the University of La Plata, Argentina. In this work I will show the kernel's capabilities and a few tests comparing Toro with a general-purpose operating system. The talk will be on the 8th of September in "Sala A" at 16:20.
Matias E. Vara
www.torokernel.org
Saturday, July 30, 2011
Toro in Ubuntu 11.04!
It is now possible to compile and test TORO easily on Linux Ubuntu 11.04. In fact, I am moving the whole project to a Linux environment. In addition, I have started to use Git in order to simplify development. I have updated the wiki with the instructions to compile and test TORO on Ubuntu.
Enjoy!
Matias E. Vara
www.torokernel.org
Friday, June 24, 2011
Toro bootloader
How do I start? The bootloader is a project in itself: if you want to write a hobby OS, you do not have to start from the bootloader. First, it will take you a lot of time; second, it is very hard to debug, so you will get disappointed quickly and won't finish. I think that the important and interesting things happen inside the kernel. Anyway, there are a few crazy guys who want to write one anyway. For that kind of people, I have just started to write some documentation about Toro's bootloader in the wiki. I hope you find it interesting and appreciate the effort (yes, I don't like writing documentation, but I know it is very important ;) ).
Matias E. Vara
www.torokernel.org
Sunday, April 17, 2011
Memory Protection in a multicore environment
This post is part of the final paper by Matias Vara, "Paralelizacion de Algoritmos Numericos con TORO Kernel", for the degree in Electronic Engineering from Universidad de La Plata. These theoretical documents help to understand the kernel design.
Introduction
When a kernel is designed for a multicore system, shared memory must be protected from concurrent write accesses. Memory protection increases kernel code complexity and decreases operating system performance. If two or more processors access the same data at the same time, mutual exclusion must be used to protect the shared data.
In a mono-processor multitasking system, the scheduler periodically switches tasks, so the only risk is that the scheduler takes a task off the CPU while it is modifying the data. Protection in this case is easy: disable the scheduler while the task is in a critical section, and enable it again afterwards.
In a multiprocessor system that solution cannot be used. When we have tasks running in parallel, two or more tasks may execute the same line at the same time; hence, the scheduler's state doesn't matter.
Resources protection
To protect resources in a multiprocessing system, we need atomic operations. These are implemented as a single assembler instruction, but take several clock cycles.
Atomic operations
On every processor, single write and read operations are always atomic. This means that while the operation is executing, nobody else is using that memory area.
For certain kinds of operations the processor locks the memory; for this purpose the #LOCK signal is provided, which is used for critical memory operations. While this signal is asserted, accesses from other processors are blocked.
Memory bus access is non-deterministic: the first processor to ask gets the bus. All processors compete for the bus, so in a system with many processors this becomes a bottleneck.
But why do we need atomic operations? Suppose we have to increment a counter; the Pascal source is:
counter := counter +1;
If this line is executed at the same time on several processors, the result will be incorrect unless the increment is atomic.
With two processors, the correct final value is 2; using atomic operations, the processors access the variable one at a time and the result is correct. The synchronization time grows with the number of processors. Common atomic operations are "test-and-set" and "compare-and-swap".
Impact of atomic operations
In a system with a few processors, atomic operations are not a big deal and are a fast solution to the shared-memory problem; however, as we increase the number of processors, they become a bottleneck.
On a computer with 8 cores at 1.45 GHz [1], while the average instruction takes 0.24 ns, an atomic increment takes 42.09 ns. The time wasted on locking becomes critical.
[1] Paul E. McKenney: RCU vs. Locking Performance on Different Types of CPUs.
http://www.rdrop.com/users/paulmck/RCU/LCA2004.02.13a.pdf, 2005
Labels:
atomic operations,
locks,
protection,
SMP
Tuesday, April 05, 2011
Memory organization in a multicore system: Conclusion.
From the programmer's point of view, access to local and remote memory is transparent: NUMA could be implemented in an SMP system without any problem. However, the OS must do an efficient memory assignment to take advantage of these technologies.
In the case of SMP, memory administration is easy to implement, while in NUMA it is not. The system has to assign memory depending on the CPU where the process is running, since every CPU has its own memory bank. System performance is poor if there are more remote accesses than local ones.
Windows has supported NUMA since the 2003 version, and Linux since 2.6.x. Both of them provide syscalls to exploit NUMA.
The TORO kernel is optimized for NUMA technologies, with modern processors in mind. The only way to support NUMA properly is to use the dedicated buses; in high-performance environments these improvements must not be forgotten.
Matias E. Vara
www.torokernel.org
Sunday, March 13, 2011
e1000 driver for TORO
I have just started the implementation of a driver for e1000-like NICs (Intel Gigabit or compatible). I am using the Minix 3 source as a reference and QEMU as the emulator (since version 0.12.0 it supports e1000 emulation). The detection procedure is complete, as you can see in the picture; it has been uploaded to SVN.

Regards!
Matias E. Vara
www.torokernel.org
Sunday, March 06, 2011
Memory organization in a multicore system II
Continuation of the article Memory organization in a multicore system.
Non uniform memory architectures.
In a NUMA system, each processor has an assigned memory region that it can access faster than the others, although every processor can access every memory position. Message passing is used for remote memory accesses. The programmer sees a continuous memory space; the hardware provides that abstraction.
The pioneer in NUMA technology was Sequent Computer Systems, which introduced NUMA in the '90s. It was later acquired by IBM, and the technology was implemented in POWER processors.
Separately, IBM made its own NUMA implementation called SE (Shared Everything). This implementation is present in POWER6 processors.
Intel's NUMA implementation is called QuickPath Interconnect. It allows memory to be shared between processors and is transparent to the operating system. Each processor has a point-to-point controller.
AMD's implementation uses fast links called HyperTransport links. In this implementation, each processor has a memory controller and a local memory. The processors are connected to each other through a coherent HyperTransport link. Furthermore, each processor has a bidirectional non-coherent bus for I/O devices.
With point-to-point controllers, a processor can access one memory region faster than the others, and there is a significant latency when it tries to access remote memory. Thus, we have two kinds of memory: local and remote.
Matias E. Vara
www.torokernel.org