Saturday 26 June 2010

GSoC Report: Week 5

In my last post I said I had complete the warm-up phase and that I will start to think about how to handle caches which are no longer available.

The caching mechanism relies in the files _darcs/prefs/sources and ~/.darcs/sources, basically the content of those file is used to generate the cache entries, each of the entries in that file indicate an alternative source to get files. If we want to specify global caches we put that in ~/.Darcs/sources but if we want an alternative repositories to pull from, we specify that in the repository sources file which is in _Darcs/prefs/sources, also each time we do a pull from an external repository it is added to the sources file
automatically.

The problem of expiring caches is given because sometimes it happens that repositories that were available, can become unavailable. For example if I had pulled from 3 different repositories and 2 of them stop being available, it could take up to 2 minutes to get each patch, because Darcs could try to fetch every patch it needs from those 2 not longer available repositories, it tries to establish a connection, an then waits for a time-out or a bad response code. After the problem I just mentioned, the idea is to design a mechanism which can help Darcs to establish which entries should be expired.


We can split the cache entries in two groups: locals and remotes. Dealing with local non-longer reachable repositories is not a big deal since if we don't find the local entries we can assume they don't exist and we can drop them from the cache and stop trying to fetch files from them. Remote repositories are more tricky, for example I can't eliminate an entry just because it gives a time-out when it tries to establish a connection with it, there are other external factors which could interfere with that particular entry in a given moment
(firewalls).

So as we seem this is not an easy task, handling remote repositories is out of our hands, we don't have control over the external sources, we don't have control over the network configuration and so on. So a first approach to this is to mark the entries which are not working and ignore them for the rest of the pulling since we don't want to try to establish a connection with an entry which we know is not available. If we try to establish a connection and fails we can mark it as a bad entry but also I think it could be awkward to wait for a 60 seconds time out, something we could implement is a default time for waiting for a connection to succeed (10-15 seconds maybe) if it doesn't happen between that time we can skip it, mark it as bad, and don't try for the rest of the patches that particular entry. Other approach suggested in the bug tracker was to try to establish a connection with all of the entries and use the one that responds first but then what if all the entries are bad entries?. I have to think more about it, I will discuss on irc and the mailing and then with a clearer idea I will start to code a patch to solve the issue.

Also I sent a first version of a failing test for the case of unreachable entries, but I need to amend, as there are some missing cases.

More of my progress can be found in the wiki.

Sunday 20 June 2010

GSoC Report: Week 4

I have completed my phase of warm up issues which was oriented to allow me to get familiar with the Darcs system, specifically the cache part. I have sent the patches and they have been applied.

During the week I worked in finishing issue 1176, continued with the documentation part, more specifically describing in a higher level how a patch is fetch with a given hash, making easier for someone with a non-technical background understand what is happening, fixing a test which was failing in Windows when it shouldn't and finding out why the IO operations over hashedRepos were put in a different module.

For fixing issue 1176 I did some modifications over some code I had already sent which was about keeping the caches sorted by locality, the initial idea was that anything which was local should be first in the list and then the remote sources, but we realized that between the remotes also exist a "wanted" hierarchy, basically we would prefer to access first http repos over ssh, so the new sorting keeps all the locals first, http in the middle and ssh repos at the end. One of the problems this solve is that weird behaviour of darcs trying to establish a ssh connection when pulling from a http or local repository.


While I was working in the test which was failing we redefine what gets saved in the _darcs/prefs/sources, the global caches were one of those things, it wasn't necessary to have them there because that's why a global cache configuration file exist (~/.darcs/sources), so basically we just drop anything related with global caches before saving the sources file of a repository.


For the next week I will start to work in the problem of how to deal with the unused cache, the idea is by the end of the week to have a work plan and write a test case.

You can check out more of my progress in the wiki.

Saturday 5 June 2010

GSoC Week 2

After last week's meeting with Eric ( my mentor) we got a better time line written down, and now I have clear goals for each week until the midterms evaluation, for this week my goals were:
* Complete issue1503
* Complete issue1210
* Description of cache usage for each module in Darcs.Repository

I'm happy to say that issue 1503 was closed, and I sent the patch for issue1210 but is waiting for revision, also I noticed that I'm more familiar with the darcs structure ( at least the part in which I'm working ) and I've started to know where I have to search for something each time I need a particular functionality.

When I first sent the patch for issue 1503, Eric and Petr did some comments about design and style which took me to rewrite the original patch, thanks to them I have learn to push myself more into thinking in a bigger view each time I plan to introduce changes somewhere, sometimes you just develop bad practices which you don't see unless someone else point it to you, so I feel more convinced that a good way to learn and become a better programmer is contributing to this kind of projects, where you interact with other people, where they are commenting on you code, making you think better about what you are doing, all this kind of stuff is something you won't learn from the university, but just getting involve in something in the real world.

Other thing that I learnt was that I had a wrong idea of the concept of tests, I thought the test case for a certain issue would be one were it used to fail before applying the changes, but I didn't thought about how should the test behave in case of failing, here thanks again to Eric for explaining it to me :).

For the next week I plan to work on issue1176, start to elaborate a test plan for issue 1599, write the test cases for issues 1503 and 1210 and continue my work in the document of the darcs cache.


You can always check out my advance and know more about my project in the darcs wiki where we are documenting everything.