What’s the coolest development platform out there right now? (Why?)

Today I had to answer this simple question: what’s the coolest development platform out there right now? Here’s how I went about it.

Well, I think mobile application development is going to be huge; that’s where we’re heading. Looking back over the past few decades, there has been enormous growth in computational power: devices keep getting more powerful while shrinking in size.
More and more people use smartphones these days and we’ll see new ways of using them in the future.
If we look at the technology in the mobile space today, there are more or less three platforms: iPhone, Android and BlackBerry. iPhone has a big customer base, and Android is catching up, backed by Google.
My recent experience with iPhone development has been positive in terms of technology. It’s a good platform: the API is clean and consistent, the documentation from Apple is great, and the language (Objective-C) is powerful. Objective-C has its own drawbacks, of course, but overall it’s a great language, and it’s kind of sad that it’s not more popular. So, for me it’s the iPhone, and mobile application development in general.

What’s your take on this? How would you answer this question?

Developing an augmented reality app for iPhone. Preface.

Lately I’ve been doing some iPhone development and found it quite fascinating. I’ve learned a lot about Objective-C, Cocoa Touch and the platform itself. It’s been a good experience, and I’m glad I set Java aside for a little while and took a closer look at how things are done on the Mac.

I’ve written two iPhone applications, and here I’m going to be marketing them. :) But it wouldn’t be fair if I only did that, so in the next few posts I’m going to talk more about the implementation, the math and the rest. Stay tuned.

[Figure: WikiEye screenshot]

Meanwhile, have a look at the applications:

  • WikiEye – shows Wikipedia articles around you using augmented reality. (Promo codes: 3PJR7K7LM3NX 6E696TJH4M43 HMFLL6EXJ9FT XAH7PJE464F7 E7FE6J3LFHJJ )
  • Facemosaic – uses the OpenCV framework for face detection: it captures the faces of your friends and randomly splits them into parts. The result is unexpected and rather funny to see. :)

Curious case of XML

Can you quickly grasp what’s happening here?

 <mix:for-each select="$dataInTwoMonths">
    <mix:variable name="hour" select="xfn:format-dateTime(xfn:sql-dateTime-to-xml-dateTime(./date_time), 'HH')"/>
    <mix:variable name="dayOfWeek" select="{getDayOfTheWeek(./date_time)}"/>
    <mix:choose>
        <!--if it's Monday - Friday-->
        <mix:when test="$dayOfWeek &gt;=2 and $dayOfWeek &lt;=6 ">
            <mix:choose>
                <!--if from noon to 7 pm-->
                <mix:when test="$hour &gt;= 12 and $hour &lt; 19 ">
                    <mix:global-variable name="afternoon_spendings"
                        select="$afternoon_spendings + ./spendings_per_hour"/>

                    <mix:variable name="peak_date">
                        <mix:call-function name="isPeakEventDate">
                            <mix:with-param name="date">{./date_time}</mix:with-param>
                        </mix:call-function>
                    </mix:variable>
                    <mix:if test="$peak_date">
                        <mix:choose>
                            <mix:when test="$hour &gt;= 12 and $hour &lt; 15">
                                <mix:global-variable name="moderate_peak_event_spendings" 
                                    select="$moderate_peak_event_spendings + ./spendings_per_hour"/>
                            </mix:when>
                            <mix:when test="$hour &gt;= 15 and $hour &lt; 18">
                                <mix:global-variable name="critical_peak_event_spendings" 
                                    select="$critical_peak_event_spendings + ./spendings_per_hour "/>
                            </mix:when>
                        </mix:choose>
                    </mix:if>
                </mix:when>
                <!--if 10 am - noon or 7pm - 9pm-->
                <mix:when test="($hour &gt;= 10 and $hour &lt; 12) or ($hour &gt;= 19 and $hour &lt; 21)">
                    <mix:global-variable name="morning_evening_spendings"
                        select="$morning_evening_spendings + ./spendings_per_hour"/>
                </mix:when>
                <!--if 9pm - 10 am-->
                <mix:when test="$hour &gt;= 21 or $hour &lt; 10 ">
                    <mix:global-variable name="night_spendings"
                        select="$night_spendings + ./spendings_per_hour"/>
                </mix:when>
            </mix:choose>
        </mix:when>
        <mix:otherwise>
            <!--weekend-->
            <mix:global-variable name="weekend_spendings" select="$weekend_spendings + ./spendings_per_hour"/>
        </mix:otherwise>
    </mix:choose>
</mix:for-each>

Was it easy?
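For contrast, here is roughly the same bucketing logic sketched in plain Java. The Row record, the bucket field names and the isPeakEventDate stub are my own hypothetical stand-ins for whatever the real data layer provides:

    import java.time.DayOfWeek;
    import java.time.LocalDateTime;
    import java.util.List;

    // Hypothetical stand-in for the real data source.
    record Row(LocalDateTime dateTime, double spendingsPerHour) {}

    class SpendingsBuckets {
        double afternoon, morningEvening, night, weekend;
        double moderatePeak, criticalPeak;

        void accumulate(List<Row> dataInTwoMonths) {
            for (Row row : dataInTwoMonths) {
                int hour = row.dateTime().getHour();
                DayOfWeek day = row.dateTime().getDayOfWeek();
                double spent = row.spendingsPerHour();

                if (day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY) {
                    weekend += spent;
                } else if (hour >= 12 && hour < 19) {              // weekday noon - 7 pm
                    afternoon += spent;
                    if (isPeakEventDate(row.dateTime())) {
                        if (hour < 15)      moderatePeak += spent; // noon - 3 pm
                        else if (hour < 18) criticalPeak += spent; // 3 pm - 6 pm
                    }
                } else if ((hour >= 10 && hour < 12) || (hour >= 19 && hour < 21)) {
                    morningEvening += spent;                       // 10 am - noon, 7 pm - 9 pm
                } else {
                    night += spent;                                // 9 pm - 10 am
                }
            }
        }

        boolean isPeakEventDate(LocalDateTime dt) { return false; } // stub
    }

Same branches, same buckets; only the syntax differs.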

CPU Profiling Explained

These days I play a lot with CPU profiling tools (i.e., the tools that help optimize the execution time of a program), and I’ve spent some time explaining to myself the terminology and the approaches to interpreting profiling results. From a naive point of view, speeding up a program is simple: glance at a flat list of function execution times, pick the heaviest function and rewrite it. But as we’ll see, this is only the most trivial case of optimization: it isn’t always available (what if a library function turns out to be the heaviest?), and there are actually more possibilities for optimization.

How programs are profiled

If you have ever tried to go beyond recording the current time before and after some lengthy process and logging the delta, then you have certainly turned toward the special tools called profilers. The task of a profiler is to measure a program’s resource usage (in our case, its CPU time consumption). In an ideal world the results would not be affected by the measurement method used, but in reality they are. This is why several measurement approaches exist.

A profiler needs to record which functions were invoked and how much time it took to execute them. The simplest way of obtaining this data is sampling. With this method, a profiler interrupts program execution at specified intervals and logs the state of the program’s call stack. The problem with sampling is that even with hardware support it’s not 100% accurate: some function calls can fall through the ‘holes’ of the sampling grid and will not be seen in the profile. But this usually means those functions are not worth optimizing, since the CPU spent very little time executing them. The great benefit of the sampling approach is that it barely affects program execution, so it can be used successfully in a production environment.
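To make the mechanics concrete, here is a naive sampling sketch in Java. This is a toy of my own, not how production profilers are built: it polls the JVM’s stack-trace API from a background thread instead of using timer interrupts, and it counts only the top frame of each stack:

    import java.util.HashMap;
    import java.util.Map;

    // A toy sampling profiler: every 10 ms, record which function each
    // thread is currently executing (the top stack frame).
    public class MiniSampler implements Runnable {
        private final Map<String, Integer> selfSamples = new HashMap<>();
        private volatile boolean running = true;

        @Override
        public void run() {
            while (running) {
                for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
                    if (stack.length == 0) continue;
                    StackTraceElement top = stack[0]; // the currently executed function
                    selfSamples.merge(top.getClassName() + "." + top.getMethodName(),
                                      1, Integer::sum);
                }
                try {
                    Thread.sleep(10); // the sampling interval
                } catch (InterruptedException e) {
                    return;
                }
            }
        }

        public void stop() { running = false; }

        // Read after stop(): function name -> number of samples (self weight).
        public Map<String, Integer> results() { return selfSamples; }
    }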

Instrumentation is often considered a more ‘precise’ approach to profiling. Instrumenting a program means inserting special code that counts calls or measures execution time into the prologs and epilogs of functions. The problem with this approach is that it can seriously skew the profiling results. Consider a function that takes just a dozen CPU cycles to execute; add another dozen cycles of instrumentation, and the instrumented function runs twice as long as the original! Worse, getting the current time is a system call, which burns many more CPU cycles still. That’s why some profilers (e.g. GNU gprof) instrument code only to record call counts (a relatively cheap operation) and measure execution time by sampling. Another solution is to restrict instrumentation to a selected subset of the program’s functions.
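To show what instrumentation amounts to, here is a hand-written Java illustration of the prolog/epilog code an instrumenting profiler would inject automatically (the function and map names are mine):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.LongAdder;

    public class Instrumented {
        static final ConcurrentHashMap<String, LongAdder> callCounts = new ConcurrentHashMap<>();
        static final ConcurrentHashMap<String, LongAdder> nanosSpent = new ConcurrentHashMap<>();

        static int workToMeasure(int n) {
            // --- injected prolog: count the call, remember the start time ---
            callCounts.computeIfAbsent("workToMeasure", k -> new LongAdder()).increment();
            long start = System.nanoTime();
            try {
                int acc = 0;                          // the function's real body
                for (int i = 0; i < n; i++) acc += i;
                return acc;
            } finally {
                // --- injected epilog: accumulate the elapsed time ---
                nanosSpent.computeIfAbsent("workToMeasure", k -> new LongAdder())
                          .add(System.nanoTime() - start);
            }
        }
    }

For a body this small, the bookkeeping visibly rivals the work being measured, which is exactly the skew described above.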

Presenting profiling results

[Figure: stack samples]

Raw data collected during a profiling session needs to be presented in some compact and useful way. Recall that during a session, call stacks (stack traces) are logged. A call stack is a sequence of function calls: from the currently executed function up to the program’s entry point.

The most complete and at the same time compact representation of profiling data is a call graph. Each node of a call graph represents a function, and each directed edge between nodes represents a call of one function by another. Both the nodes and the edges of a call graph have associated weights reflecting their CPU usage.

[Figure: call graph]

The weight of an edge is the number of stack traces in which that call can be found. For a node, two weights are of interest: the self weight is the number of stack traces in which the corresponding function was the one currently executed; the total weight (also known as cumulative weight) is the number of stack traces in which the corresponding function can be found at all. Let’s assume for now that every function appears in a stack trace no more than once; that is, we disallow recursive and mutually recursive functions. This is only for simplicity; we’ll deal with them later.

[Figure: node weights]

The weights defined above are related in the following way:

  • the total weight of a node is equal to the sum of the weights of its incoming edges;
  • the total weight is equal to the node’s self weight plus the sum of the weights of its outgoing edges;
  • thus, the self weight of a node is always less than or equal to its total weight, and the two are equal for leaf nodes (nodes with no outgoing edges).

When producing output, normalized weight values are also used: a node’s weight can be related either to the total number of samples collected, or to its parent’s total weight.

To construct a call graph, one needs to process all the stack samples gathered. For each function in a sample, the node corresponding to that function needs to be found or created, and its total weight increased by one. For the node corresponding to the currently executed function, the self weight also needs to be increased. Edges corresponding to calls between functions need to be established as well, and their weights increased.
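Here is a sketch of this construction in Java, under the no-recursion assumption made above. Each sample is taken to be ordered from the currently executed function down to the entry point:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class CallGraph {
        static class Node {
            int selfWeight, totalWeight;
            Map<String, Integer> outEdges = new HashMap<>(); // callee -> edge weight
        }

        final Map<String, Node> nodes = new HashMap<>();

        // sample.get(0) is the top frame, sample.get(size - 1) the entry point.
        void addSample(List<String> sample) {
            for (int i = 0; i < sample.size(); i++) {
                Node n = nodes.computeIfAbsent(sample.get(i), k -> new Node());
                n.totalWeight++;            // the function is present in this trace
                if (i == 0) n.selfWeight++; // it was the one currently executed
                if (i + 1 < sample.size()) {
                    // sample.get(i + 1) called sample.get(i): bump that edge.
                    nodes.computeIfAbsent(sample.get(i + 1), k -> new Node())
                         .outEdges.merge(sample.get(i), 1, Integer::sum);
                }
            }
        }
    }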

The problem with call graphs is that they are not easy to visualize using standard GUI controls, and almost impossible to visualize with text output. That’s why other representations are often used: the flat view and the tree view. The flat view appeared first historically; it is produced from a call graph by simply discarding all the edges and listing only the nodes. For each list item, the self and total weights (or ratios) are printed. The algorithm for building a flat view is the same as for a call graph, except that relations between functions are not taken into account.

In contrast to a flat view, a tree view retains the relations between nodes. The tree’s top-level node can correspond to the program’s entry point (a top-down tree), or the top-level nodes can correspond to every function that was ever the currently executed one (a bottom-up tree). The latter view is useful because it starts from the functions that were the direct CPU consumers, not from their callers. The problem with a bottom-up view is that it is counterintuitive: a node’s children are actually the callers of its function, which makes working with self and total weights somewhat tricky.

[Figure: top-down tree]

A top-down tree can be constructed from the list of gathered call stacks using the following algorithm. A stack sample is viewed as a route from the tree’s root to some leaf. The route starts at the program’s entry point. Along the way, non-existent nodes are created, and the total weights of the nodes passed through are increased by one. When the node corresponding to the sample’s last function is reached, its self weight is also increased by one.
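A sketch of this algorithm, with samples now ordered from the entry point down to the currently executed function:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class TopDownTree {
        static class TreeNode {
            int selfWeight, totalWeight;
            Map<String, TreeNode> children = new HashMap<>();
        }

        final TreeNode root = new TreeNode(); // synthetic root above the entry point

        void addSample(List<String> rootFirstSample) {
            TreeNode current = root;
            for (String function : rootFirstSample) {
                current = current.children.computeIfAbsent(function, k -> new TreeNode());
                current.totalWeight++;   // every node along the route gets +1
            }
            current.selfWeight++;        // the leaf was the currently executed one
        }
    }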

One significant difference between a tree and a graph is that in a tree every node has only a single parent. Thus, when mapping a call graph onto the corresponding top-down tree, every node with multiple incoming edges is split into several ‘instances’, with its weights split accordingly. That is, the sum of the self weights of the tree nodes corresponding to the same function must be equal to the self weight of the call graph node, and the same holds for the total weight.

[Figure: bottom-up tree]

A bottom-up tree is more interesting. It can be constructed from call stacks using the same algorithm as for a top-down tree, but iterating through each call stack from the opposite end: the currently executed function. In the resulting bottom-up tree the call graph’s nodes are also split among different branches; for example, the program’s root function is a leaf node of every branch. An interesting property of a bottom-up tree is that a node can appear at the top level if and only if it has a non-zero self weight. So the self weights of all nodes are concentrated in the top-level nodes, while the total weights are distributed among the non-top-level nodes, which can only have zero self weight. Thus it is enough to keep a single weight value per node: for top-level nodes this is the self weight of the corresponding call graph node, and for the rest it is a piece of the total weight. Similarly to a top-down tree, summing the weights of all of a call graph node’s instances gives its total weight.
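In code, the bottom-up variant can simply reuse the TopDownTree sketch above: feed it each sample reversed, so the route starts at the currently executed function:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class BottomUpTree {
        final TopDownTree tree = new TopDownTree(); // the sketch above

        void addSample(List<String> rootFirstSample) {
            List<String> reversed = new ArrayList<>(rootFirstSample);
            Collections.reverse(reversed);
            tree.addSample(reversed);
            // totalWeight now plays the role of the single per-node weight
            // described above; for a top-level node it equals the self weight
            // of the corresponding call graph node.
        }
    }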

Recursive functions

[Figure: recursive function]

Here comes the fun part. If you have a recursive function, or a set of mutually recursive functions, trying to calculate node and edge weights for them leads to a paradox, because a node’s incoming edges can at the same time be its outgoing edges. This breaks the weight relations given above.

To deal with this problem, recall that recursion is equivalent to iteration: a recursive function can be rewritten using a loop, and in that case calculating its weights is no longer a problem. Thus, for a recursive function, we can attribute the weight of its self-call edge to the node’s self weight and assign no weight to the edge itself.

While processing stack samples, it is important to distinguish between the time spent in function calls (i.e. the number of stack samples) and the number of function calls themselves. A stack sample can contain multiple occurrences of a recursive function (and thus multiple self-calls), but this contributes only a single unit to the weights of the corresponding call graph node and edge. So, when constructing a flat view, every function in a stack sample needs to be counted only once. Self weight is always calculated correctly, because in every stack sample there is only one currently executed function.
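A sketch of flat view construction that tolerates recursion by counting each function at most once per sample:

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    public class FlatView {
        static class Entry { int selfWeight, totalWeight; }

        final Map<String, Entry> entries = new HashMap<>();

        // topFirstSample.get(0) is the currently executed function.
        void addSample(List<String> topFirstSample) {
            Set<String> seen = new HashSet<>();
            for (String function : topFirstSample) {
                if (!seen.add(function)) continue; // recursive occurrence: skip
                entries.computeIfAbsent(function, k -> new Entry()).totalWeight++;
            }
            entries.get(topFirstSample.get(0)).selfWeight++;
        }
    }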

[Figure: detecting recursion]

During tree view construction, the right decision is not to perform cycle detection at all. This preserves the relation between the weights of parent and child nodes: the total weight of a parent node equals its self weight plus the sum of its children’s total weights. But when several tree nodes are gathered up to calculate the weights of the corresponding call graph node, the total weights of recursive calls must be ignored. In a tree view, recursive calls are easy to detect: a node with the same label exists in the node’s chain of parents.

Using profiles to detect bottlenecks

Armed with the knowledge of how profiling data is collected and visualized, let’s figure out how to use it to speed up a program. I can enumerate the following cases that are detectable by profiling paired with source code analysis:

  • an individual heavy function (e.g. a sorting function);
  • excessive usage of a library function (e.g. of a memory copying function);
  • repeatedly appearing sequence of function calls.

For finding an individual heavy function, a flat view is enough. Sort it by self weight in decreasing order, and such functions will bubble up to the top. They are especially good candidates for optimization if their total weight is close to their self weight, which means they do all the work by themselves. This method works well for both recursive and non-recursive functions.

If a hot function can’t be optimized by itself (e.g. it is a library function), a bottom-up view is helpful for looking at its callers. Take the caller with the most weight (that is, the one responsible for most of the function’s time), and look in the source code to see why it uses the function so heavily. It often turns out that the result of the call can be cached, or that there is a more effective way of doing the same thing that doesn’t require calling the function at all.
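A sketch of the caching remedy, with a hypothetical expensiveParse standing in for the hot library call:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class CachedCalls {
        private final Map<String, Integer> cache = new ConcurrentHashMap<>();

        // Repeated calls with the same input hit the cache instead of the
        // expensive function.
        int parse(String input) {
            return cache.computeIfAbsent(input, CachedCalls::expensiveParse);
        }

        private static int expensiveParse(String input) {
            return Integer.parseInt(input.trim()); // pretend this is costly
        }
    }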

Sorting a flat view by total weight in decreasing order, and secondarily by self weight, can also reveal functions that trigger disproportionately heavy processing. Consider an example: say we see a function ‘top5’ with a big total weight but a small self weight. Looking at its callees in the top-down view, we see that ‘top5’s time is mostly occupied by a call to ‘qsort’. Looking at the source of ‘top5’, we see that the call is redundant, because the top 5 array items can be found in a single linear scan, without sorting the array at all.
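One way to write that single pass is with a small bounded min-heap, which keeps the whole operation close to linear instead of the full sort’s n log n (a sketch):

    import java.util.PriorityQueue;

    public class Top5 {
        static int[] top5(int[] values) {
            PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap
            for (int v : values) {
                heap.offer(v);
                if (heap.size() > 5) heap.poll(); // evict the smallest of six
            }
            int[] result = new int[heap.size()];
            for (int i = result.length - 1; i >= 0; i--) result[i] = heap.poll();
            return result; // largest first
        }
    }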

A call graph view can actually be used in all of these cases, especially if it can be adjusted to highlight nodes and edges based on their weights. The only problem with a call graph, as I said, is that it can be big, so a good, scalable approach to viewing it is required.

But a call graph really shines when one wants to find a repeatedly appearing sequence of function calls. Since every function has exactly one corresponding node (unlike in a tree view), and edges carry weights, it is relatively easy to detect expensive sequences. After finding them and looking through the functions’ code, it often turns out that the data calculated by a function is used only partially, or even thrown away, by the caller (this happens when programs are built from big reusable ‘universal’ functions). Crafting a specialized version of a function that doesn’t perform the unneeded calculations can speed up a program dramatically.

Image manipulation in Java

I’ve created a small and simple utility class for performing common operations on images in Java. Using it you can load and save images, resize them while maintaining the aspect ratio, save JPEG images with a specified compression level, rotate images to any given angle, and watermark them. Hope it’ll save you some time.
Read more…
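As a taste of what’s inside, here is a minimal sketch of the aspect-ratio-preserving resize; the method name is mine, and the full class (rotation, watermarking, JPEG quality control) is behind the link above:

    import java.awt.Graphics2D;
    import java.awt.RenderingHints;
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;
    import javax.imageio.ImageIO;

    public class ImageUtil {
        // Scale src to fit within maxWidth x maxHeight, keeping the aspect ratio.
        public static BufferedImage resizeToFit(BufferedImage src, int maxWidth, int maxHeight) {
            double scale = Math.min((double) maxWidth / src.getWidth(),
                                    (double) maxHeight / src.getHeight());
            int w = (int) Math.round(src.getWidth() * scale);
            int h = (int) Math.round(src.getHeight() * scale);

            BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
            Graphics2D g = dst.createGraphics();
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, w, h, null);
            g.dispose();
            return dst;
        }

        public static void main(String[] args) throws IOException {
            BufferedImage in = ImageIO.read(new File(args[0]));
            ImageIO.write(resizeToFit(in, 800, 600), "jpg", new File(args[1]));
        }
    }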

On the Edge—Trying Chromium on Mac

[Figure: about Chromium]

Immediately after the Google Chrome announcement, it was known that it would come to Linux and Mac. Some day… And this day is coming: right now it is possible to compile Chromium on Mac, and it will be a working browser!

Well, almost. I’m trying to use it, and it works; this post is being composed in Mac Chromium. But it’s definitely far from a polished consumer product yet. The major problems I’ve found so far: the Omnibox (the navigation edit box) only accepts URLs, downloads don’t work, and tabs crash from time to time. The last isn’t a big problem, though, because Chromium is a multi-process browser, so a tab crash doesn’t affect other tabs. That’s pretty cool compared to Safari 4 Beta.

[Figure: Chromium]

What is really exciting is that Mac Chromium will get better and better every day. And any developer can help by contributing to the project.

Moving Editor Configuration into the Cloud

I know, I know: everybody does it these days. But the results are so sweet that I can’t resist the temptation to write a post myself. So, if your editor configuration is already uploaded to some hosting service and version-controlled, please go away; you won’t learn anything new here. If not, read on: this is how it should be.

My migration was driven by circumstances. I currently develop for three platforms simultaneously, which means three developer hosts, and of course I want the same editor configuration on all of them. Previously I copied my .emacs file around, and that soon became annoying, especially since I’m subscribed to tips & tricks feeds and always experimenting with new packages.

So the decision was made to move all the configuration under publicly hosted version control, so I could retrieve it wherever I may roam. Now the details: which version control system and which hosting. As I can’t live without git, I chose it, which made github quite a natural choice for hosting.

Moving all the setup into a repository and integrating the current setups from different machines was the next task at hand. I decided to be as conservative as possible and move in baby steps. Step one: move the current config file into a repository and load it from the ‘main’ .emacs file. Done, works. Great, on to the next step: introduce variables to gain some flexibility. Done, works. And so on. Pretty soon I had gathered all the packages in the repository and uploaded it to github. The next step: clone it on the second machine, integrate the changes and push them back. Then sync the repository on the first machine and check that everything still works. After a few iterations, all the stuff was in place.

After several days of using the new setup, I discovered another advantage of keeping editor configuration under version control: I can now experiment freely with new stuff, with no more commented-out blobs of configuration code. If a package or macro doesn’t work but looks promising, and I don’t have time to deal with it now, I just move it into a separate branch. And if some setting turns out to be less useful than it first seemed, I revert it with a single command. Boy, now I’m agile! ;)
