[Cocci] excluding a function from coccinelle transformation

Discussion:

Jeff King

2018-08-24 06:42:29 UTC

In Git's Coccinelle patches, we sometimes want to suppress a
transformation inside a particular function. For example, in finding
conversions of hashcmp() to oidcmp(), we should not convert the call in
oidcmp() itself, since that would cause infinite recursion. We write the
semantic patch like this:

@@
identifier f != oidcmp;
expression E1, E2;
@@
f(...) {...
- hashcmp(E1->hash, E2->hash)
+ oidcmp(E1, E2)
...}

This catches some cases, but not all. For instance, there's one case in
sequencer.c which it does not convert. Now here's where it gets weird.
If I instead use the angle-bracket form of ellipses, like this:

@@
identifier f != oidcmp;
expression E1, E2;
@@
f(...) {<...
- hashcmp(E1->hash, E2->hash)
+ oidcmp(E1, E2)
...>}

then we do generate the expected diff! Here's a much more cut-down
source file that demonstrates the same behavior:

int foo(void)
{
if (1)
if (!hashcmp(x, y))
return 1;
return 0;
}

If I remove the initial "if (1)" then a diff is generated with either
semantic patch (and the particulars of the "if" are not important; the
same thing happens if it's a while-loop. The key thing seems to be that
the code is not in the top-level block of the function).

And here's some double-weirdness. I get those results with spatch 1.0.4,
which is what's in Debian unstable. If I then upgrade to 1.0.6 from
Debian experimental, then _neither_ patch produces any results! Instead
I get:

init_defs_builtins: /usr/lib/coccinelle/standard.h
(ONCE) Expected tokens oidcmp hashcmp hash
Skipping:foo.c

(whereas before, even the failing case said "HANDLING: foo.c").

And then one final check: I built coccinelle from the current tip of
https://github.com/coccinelle/coccinelle (1.0.7-00504-g670b2243).
With my cut-down case, that version generates a diff with either
semantic patch. But for the full-blown case in sequencer.c, it still
only works with the angle brackets.

So my questions are:

- is this a bug in coccinelle? Or I not understand how "..." is
supposed to work here?

(It does seem like there was possibly a separate bug introduced in
1.0.6 that was later fixed; we can probably ignore that and just
focus on the behavior in the current tip of master).

- is there a better way to represent this kind of "transform this
everywhere _except_ in this function" semantic patch? (preferably
one that does not tickle this bug, if it is indeed a bug ;) ).

-Peff

Julia Lawall

2018-08-24 11:04:27 UTC

Permalink

Post by Jeff King
In Git's Coccinelle patches, we sometimes want to suppress a
transformation inside a particular function. For example, in finding
conversions of hashcmp() to oidcmp(), we should not convert the call in
oidcmp() itself, since that would cause infinite recursion. We write the
@@
identifier f != oidcmp;
expression E1, E2;
@@
f(...) {...
- hashcmp(E1->hash, E2->hash)
+ oidcmp(E1, E2)
...}

The problem is with how how ... works. For transformation, A ... B
requires that B occur on every execution path starting with A, unless that
execution path ends up in error handling code.
(eg, if (...) { ... return; }). Here your A is the start if the function.
So you need a call to hashcmp on every path through the function, which
fails when you add ifs.

If you use * (searching) instead of - and + (transformation) it will only
require that a path exists. * is mean for bug finding, where you often
want to find eg whether there exists a path that is missing a free.

If you want the exists behavior with a transformation rule, then you can
put exists at the top of the rule between the initial @@. I don't suggest
this in general, as it can lead to inconsistencies.

What you want is what you ended up using, which is <... P ...> which
allows zero or more occurrences of P.

However, this can all be very expensive, because you are matching paths
through the function definition which you don't really care about. All
you care about here is the name. So another approach is

@@
position p : script:python() { p[0].current_element != "oldcmp" };
expression E1,E2;
@@

- hashcmp(E1->hash, E2->hash)
+ oidcmp(E1, E2)

(I assume that "not equals" is written != in python)

Another issue with A ... B is that by default A and B should not appear in
the matched region. So your original rule matches only the case where
every execution path contains exactly one call to hashcmp, not more than
one. So that was another problem with it.

julia

Post by Jeff King
This catches some cases, but not all. For instance, there's one case in
sequencer.c which it does not convert. Now here's where it gets weird.
@@
identifier f != oidcmp;
expression E1, E2;
@@
f(...) {<...
- hashcmp(E1->hash, E2->hash)
+ oidcmp(E1, E2)
...>}
then we do generate the expected diff! Here's a much more cut-down
int foo(void)
{
if (1)
if (!hashcmp(x, y))
return 1;
return 0;
}
If I remove the initial "if (1)" then a diff is generated with either
semantic patch (and the particulars of the "if" are not important; the
same thing happens if it's a while-loop. The key thing seems to be that
the code is not in the top-level block of the function).
And here's some double-weirdness. I get those results with spatch 1.0.4,
which is what's in Debian unstable. If I then upgrade to 1.0.6 from
Debian experimental, then _neither_ patch produces any results! Instead
init_defs_builtins: /usr/lib/coccinelle/standard.h
(ONCE) Expected tokens oidcmp hashcmp hash
Skipping:foo.c
(whereas before, even the failing case said "HANDLING: foo.c").
And then one final check: I built coccinelle from the current tip of
https://github.com/coccinelle/coccinelle (1.0.7-00504-g670b2243).
With my cut-down case, that version generates a diff with either
semantic patch. But for the full-blown case in sequencer.c, it still
only works with the angle brackets.
- is this a bug in coccinelle? Or I not understand how "..." is
supposed to work here?
(It does seem like there was possibly a separate bug introduced in
1.0.6 that was later fixed; we can probably ignore that and just
focus on the behavior in the current tip of master).
- is there a better way to represent this kind of "transform this
everywhere _except_ in this function" semantic patch? (preferably
one that does not tickle this bug, if it is indeed a bug ;) ).
-Peff
_______________________________________________
Cocci mailing list
https://systeme.lip6.fr/mailman/listinfo/cocci

Jeff King

2018-08-24 20:53:50 UTC

Permalink

Post by Julia Lawall

Thank you! This explanation (and the one below about A and B not
appearing in the matched region) helped my understanding tremendously.

Post by Julia Lawall
What you want is what you ended up using, which is <... P ...> which
allows zero or more occurrences of P.

And now this makes much more sense (I stumbled onto it through brute
force, but now I understand _why_ it works).

Post by Julia Lawall
However, this can all be very expensive, because you are matching paths
through the function definition which you don't really care about. All
you care about here is the name. So another approach is

Yeah, it is. Using the pre-1.0.7 version, the original patch runs in
~1.3 minutes on my machine. With "<... P ...>" it's almost 4 minutes.
Your python suggestion runs in about 1.5 minutes.

Curiously, 1.0.4 runs the original patch in only 24 seconds, and the
angle-bracket one takes 52 seconds. I'm not sure if something changed in
coccinelle, or if my build is simply less optimized (my 1.0.4 is from
the Debian package, and I'm building 1.0.7 from source; I had trouble
building 1.0.4 from source).

Post by Julia Lawall
@@
position p : script:python() { p[0].current_element != "oldcmp" };
expression E1,E2;
@@
- hashcmp(E1->hash, E2->hash)
+ oidcmp(E1, E2)

Aha, this is exactly the magic I was hoping for. I agree this is the
best way to express it. I just had to tweak the patch to include the
position:

- ***@p(E1->hash, E2->hash)

and it worked great. Unfortunately, Debian's spatch is not built with
python support. :(

I'm not sure if we (the Git project) want to make the jump to requiring
a more specific spatch. OTOH, only a handful of developers actually run
it, and the python support does seem quite useful. And 1.0.4 is rather
old at this point.

Again, thanks very much for your response. I have a much better
understanding of what's going on now, and what our options are for
moving forward.

-Peff

Julia Lawall

2018-08-24 21:00:38 UTC

Permalink

Post by Jeff King

Post by Julia Lawall

Thank you! This explanation (and the one below about A and B not
appearing in the matched region) helped my understanding tremendously.

Post by Julia Lawall
What you want is what you ended up using, which is <... P ...> which
allows zero or more occurrences of P.

And now this makes much more sense (I stumbled onto it through brute
force, but now I understand _why_ it works).

Yeah, it is. Using the pre-1.0.7 version, the original patch runs in
~1.3 minutes on my machine. With "<... P ...>" it's almost 4 minutes.
Your python suggestion runs in about 1.5 minutes.
Curiously, 1.0.4 runs the original patch in only 24 seconds, and the
angle-bracket one takes 52 seconds. I'm not sure if something changed in
coccinelle, or if my build is simply less optimized (my 1.0.4 is from
the Debian package, and I'm building 1.0.7 from source; I had trouble
building 1.0.4 from source).

I don't remember the exact status of 1.0.4. It is possible that an
optimization was found to pose problems and was removed in the meantime.

<... ...> can be useful when you expect it to eg match an if branch. For
a function with over 1000 lines and many conditionals, it might not be a
good idea. Actually, the main problem is with loops. If there is a loop
in the function the performance can be much slower.

julia

Post by Jeff King

Post by Julia Lawall
@@
position p : script:python() { p[0].current_element != "oldcmp" };
expression E1,E2;
@@
- hashcmp(E1->hash, E2->hash)
+ oidcmp(E1, E2)

Aha, this is exactly the magic I was hoping for. I agree this is the
best way to express it. I just had to tweak the patch to include the
and it worked great. Unfortunately, Debian's spatch is not built with
python support. :(
I'm not sure if we (the Git project) want to make the jump to requiring
a more specific spatch. OTOH, only a handful of developers actually run
it, and the python support does seem quite useful. And 1.0.4 is rather
old at this point.
Again, thanks very much for your response. I have a much better
understanding of what's going on now, and what our options are for
moving forward.
-Peff

Jeff King

2018-08-25 08:22:51 UTC

Permalink

Post by Jeff King

Post by Julia Lawall
However, this can all be very expensive, because you are matching paths
through the function definition which you don't really care about.
All you care about here is the name. So another approach is

Yeah, it is. Using the pre-1.0.7 version, the original patch runs in
~1.3 minutes on my machine. With "<... P ...>" it's almost 4 minutes.
Your python suggestion runs in about 1.5 minutes.
Curiously, 1.0.4 runs the original patch in only 24 seconds, and the
angle-bracket one takes 52 seconds. I'm not sure if something changed in
coccinelle, or if my build is simply less optimized (my 1.0.4 is from
the Debian package, and I'm building 1.0.7 from source; …

I guess that such run time comparisons are interesting for further software
development considerations.
* How often would you run discussed SmPL scripts?

We run them as part of every automated CI build, and some people run
them manually periodically (I'd guess once every few days or so).

I don't think speed is critical for our use case. I mostly wanted to see
if the angle-bracket approach was intolerably slow (it's not, which
makes it an acceptable tradeoff against the need to have spatch built
with python support). And I found the speed difference between the two
versions mostly a curiosity.

-Peff

Jeff King

2018-08-26 06:36:49 UTC

Permalink

Post by Jeff King
I don't think speed is critical for our use case. I mostly wanted to see
if the angle-bracket approach was intolerably slow (it's not,
which makes it an acceptable tradeoff against the need to have spatch
built with python support). And I found the speed difference between
the two versions mostly a curiosity.

I am curious if your software behaviour expectations will vary any more.
Will the mentioned run time differences accumulate to more meaningful
numbers over time?

Maybe. My plan is to cross that bridge when we come to it.

Would you dare to take the application of the SmPL construct “<+... … ...+>”
into account?
http://coccinelle.lip6.fr/docs/main_grammar004.html

I don't know what you mean by "dare". Are you suggesting that the "<+"
form might solve my problem better?

-Peff