Discussion:
[Cocci] Determination for the absence of an option in a function call
SF Markus Elfring
2018-02-17 16:00:21 UTC
Hello,

I am working with the following specification in some scripts for the semantic
patch language.


target = action(...);



This source code search pattern shows that a return value from a function call
should be stored somewhere. The concrete call is restricted by a selection of
function names. Such an approach works to some degree as long as restrictions
on function call parameters can be omitted.

But a safer source code analysis requires distinguishing these parameters in
more detail.

1. How can it be ensured that a specific option was not passed?

2. The parameter number then also becomes relevant.
How should functions be split based on their signature?

Regards,
Markus
Julia Lawall
2018-02-17 16:05:40 UTC
Post by SF Markus Elfring
Hello,
I am working with the following specification in some scripts for the semantic
patch language.


target = action(...);


This source code search pattern shows that a return value from a function call
should be stored somewhere. The concrete call is restricted by a selection of
function names. Such an approach works to some degree as long as restrictions
on function call parameters can be omitted.
But a safer source code analysis requires distinguishing these parameters in
more detail.
1. How can it be ensured that a specific option was not passed?
2. The parameter number then also becomes relevant.
How should functions be split based on their signature?

I don't understand the questions. What do you mean by option? A
command-line option of Coccinelle? A particular argument of action?

For the second question, maybe you are looking for the following:

@r@
expression list[n] es;
@@

target = action(es)

Now r.n is the number of arguments to action.

julia
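
As a rough illustration outside of Coccinelle, the argument count that `expression list[n] es` binds to `n` can be approximated on plain text with a small Python helper. This is only a sketch under stated assumptions: the name `split_top_level_args` is hypothetical, the input is the text between the parentheses of a call, and commas inside string or character literals are not handled.

```python
from typing import List


def split_top_level_args(arg_text: str) -> List[str]:
    """Split the text between a call's parentheses at top-level commas.

    Hypothetical helper for illustration; it tracks bracket nesting only
    and does not understand string or character literals.
    """
    args: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in arg_text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma outside all brackets ends the current argument.
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail or args:
        args.append(tail)
    return args


# The length of the result plays the role of r.n in the SmPL rule above.
example = split_top_level_args("dev, sizeof(*p), GFP_KERNEL | __GFP_NOWARN")
argument_count = len(example)
```

Inside Coccinelle itself the `expression list[n]` metavariable is the more reliable choice, since it works on the parsed code rather than on text.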
Julia Lawall
2018-02-17 16:42:52 UTC
Post by Julia Lawall
Post by SF Markus Elfring
But a safer source code analysis requires distinguishing these parameters in
more detail.
1. How can it be ensured that a specific option was not passed?
2. The parameter number then also becomes relevant.
How should functions be split based on their signature?
I don't understand the questions. What do you mean by option?
Enumeration values (or preprocessor symbols) are often used for this kind
of function parameters.
Do you prefer the wording “flag”?
Post by Julia Lawall
A command-line option of Coccinelle?
Not in this clarification attempt.
Post by Julia Lawall
A particular argument of action?
Yes.
I have been working for a while on the identification of memory allocation
function calls in Linux source files.
It matters in this software domain if the option “__GFP_NOWARN” was applied
(or not).
<+...__GFP_NOWARN...+> in the appropriate argument position.
Post by Julia Lawall
@r@
expression list[n] es;
@@
target = action(es)
Now r.n is the number of arguments to action.
This information can be useful for other analysis goals than what
I have got in mind here.
Each function name is usually connected with a specific argument count.
This fact has got some consequences for the development of corresponding
SmPL scripts.
I still have no idea what you are looking for here.

julia
Julia Lawall
2018-02-17 17:09:20 UTC
Post by Julia Lawall
I have been working for a while on the identification of memory allocation
function calls in Linux source files.
It matters in this software domain if the option “__GFP_NOWARN” was applied
(or not).
<+...__GFP_NOWARN...+> in the appropriate argument position.
It is easy to check the presence of such an identifier.
But I find it very challenging to determine (by script code)
if it is actually not passed (as an option) in a function call.
It's not clear what you want. You will have to send some examples.
Post by Julia Lawall
Each function name is usually connected with a specific argument count.
This fact has got some consequences for the development of corresponding
SmPL scripts.
I still have no idea what you are looking for here.
I imagine that SmPL disjunctions (or further SmPL rules) will be
relevant to distinguish the known parameter numbers.
How would you manage the information which of the parameters
would get the argument “gfp” (for example)?
You have to match the definition of the function to find out what
parameter position you are interested in. If the function is defined in
another file you may need to use iteration. See demos/iteration.cocci.

julia
Julia Lawall
2018-02-17 17:44:07 UTC
Post by Julia Lawall
It is easy to check the presence of such an identifier.
But I find it very challenging to determine (by script code)
if it is actually not passed (as an option) in a function call.
It's not clear what you want.
Another try 

Post by Julia Lawall
You will have to send some examples.
When we look at concrete Linux source code, we mostly see that
the option “__GFP_NOWARN” is just missing for a call of a function
like “devm_kmalloc”.
Another analysis tool can show that such an identifier
is referenced in only 207 files (from Linux 4.16-rc1).
But how can the Coccinelle software help here to exclude these source
code places from specific transformation attempts?
(
f(...,<+...__GFP_NOWARN...+>,...)
|
transformation
)

Alternatively,

@ok@
position p;
@@
f@p(...,<+...__GFP_NOWARN...+>,...)

@@
position p != ok.p;
@@
- f@p(...)
+ whatever
Post by Julia Lawall
How would you manage the information which of the parameters
would get the argument “gfp” (for example)?
You have to match the definition of the function to find out what
parameter position you are interested in.
It seems to be feasible to encode such knowledge for a small number
of function names (in SmPL disjunctions or regular expressions).
But what does the situation look like when you would like
to automate the search for interesting positions as much as possible?
My iteration suggestion covers this case.

julia
Julia Lawall
2018-02-17 18:17:13 UTC
I guess that it covers only a part of the desired search automation.
The generic handling of variations in parameter positions is
more challenging, isn't it?
With iteration you can collect some information on one pass and use it on
another pass. This is discussed in the following set of slides:
http://coccinelle.lip6.fr/papers/cocciwk4_talk2.pdf

julia
Julia Lawall
2018-02-17 19:05:30 UTC
Post by Julia Lawall
I guess that it covers only a part of the desired search automation.
The generic handling of variations in parameter positions is
more challenging, isn't it?
With iteration you can collect some information on one pass and use it on
another pass. This is discussed in the following set of slides:
http://coccinelle.lip6.fr/papers/cocciwk4_talk2.pdf
It would be nice if a function database were usable.
Database queries can group the involved function names to some degree.
You can write python code to do whatever you want.

julia
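
A minimal Python sketch of such a "function database" might look like the following. The table name `GFP_ARG_INDEX`, both helper functions, and the choice of a plain dictionary are assumptions made for illustration; the argument positions follow the usual Linux prototypes of these allocators, but should be checked against the kernel version being analysed.

```python
from collections import defaultdict

# Hypothetical table: function name -> zero-based index of the gfp argument.
GFP_ARG_INDEX = {
    "kmalloc": 1,        # kmalloc(size, flags)
    "kzalloc": 1,        # kzalloc(size, flags)
    "kcalloc": 2,        # kcalloc(n, size, flags)
    "devm_kmalloc": 2,   # devm_kmalloc(dev, size, gfp)
    "devm_kzalloc": 2,   # devm_kzalloc(dev, size, gfp)
}


def names_by_gfp_index(table=GFP_ARG_INDEX):
    """Group function names by the position of their gfp argument."""
    groups = defaultdict(list)
    for name, index in table.items():
        groups[index].append(name)
    return {index: sorted(names) for index, names in groups.items()}


def gfp_argument(name, args, table=GFP_ARG_INDEX):
    """Return the text of the gfp argument, or None if it cannot be located."""
    index = table.get(name)
    if index is None or index >= len(args):
        return None
    return args[index]


groups = names_by_gfp_index()
flags = gfp_argument("devm_kmalloc", ["dev", "len", "GFP_KERNEL"])
```

Each group could then feed one SmPL disjunction or one regular-expression constraint per argument position, which is one way to read the grouping idea raised above.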
Julia Lawall
2018-02-17 19:47:23 UTC
Post by Julia Lawall
f(...,<+...__GFP_NOWARN...+>,...)
Does this SmPL specification mean that the identifier can appear anywhere
within the function call parameters?
Yes.
Would it be acceptable, given the risk of false positives, to omit
the check for the actual parameter position?
Up to you to see what happens.

julia
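
The trade-off between "anywhere in the arguments" and "in the expected argument position" can be illustrated with a small Python sketch. The helper names and the sample argument lists below are hypothetical, and the check works on argument text rather than on parsed code as Coccinelle does.

```python
import re


def mentions(expr, flag):
    """True if `flag` occurs as a whole identifier in the expression text."""
    return re.search(r"\b" + re.escape(flag) + r"\b", expr) is not None


def flag_anywhere(args, flag="__GFP_NOWARN"):
    """Loose check: the flag appears in any argument of the call."""
    return any(mentions(arg, flag) for arg in args)


def flag_at(args, index, flag="__GFP_NOWARN"):
    """Strict check: the flag appears in the argument at `index`."""
    return index < len(args) and mentions(args[index], flag)


# Typical case: the flag sits in the gfp argument (index 2 here).
typical = ["dev", "len", "GFP_KERNEL | __GFP_NOWARN"]
# Contrived case: the identifier shows up in a different argument.
contrived = ["dev", "pick_size(__GFP_NOWARN)", "GFP_KERNEL"]

loose_typical = flag_anywhere(typical)
strict_typical = flag_at(typical, 2)
loose_contrived = flag_anywhere(contrived)
strict_contrived = flag_at(contrived, 2)
```

The two checks agree on the typical call and differ only on the contrived one, so the loose variant would exclude that call from a transformation even though its gfp argument does not carry the flag. How often such calls occur in practice is something only a run over real code can show.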
Julia Lawall
2018-02-17 20:25:18 UTC
Post by Julia Lawall
Post by Julia Lawall
f(...,<+...__GFP_NOWARN...+>,...)
Does this SmPL specification mean that the identifier can appear anywhere
within the function call parameters?
Yes.
Would it be acceptable, given the risk of false positives, to omit
the check for the actual parameter position?
Up to you to see what happens.
Thanks for another clarification.
Does it increase the chances of integrating SmPL scripts
for the transformation of questionable error messages after
failed memory allocations into a directory which you maintain?
Which confidence categorisation would fit here?
Low. The script has no idea whether the printed string is useful or not.

julia
Julia Lawall
2018-02-17 20:36:50 UTC
Post by Julia Lawall
Which confidence categorisation would fit here?
Low.
May scripts with this view be integrated?
It's possible. It depends on the benefit of the transformation provided.
Post by Julia Lawall
The script has no idea whether the printed string is useful or not.
This is a general data processing challenge. How will it influence
the software situation further?
I have no idea what "it" refers to, nor "software situation".

Maybe you can identify some cases that are particularly likely to be
useless and only report on those.

julia
Julia Lawall
2018-02-17 20:55:24 UTC
Post by Julia Lawall
Which confidence categorisation would fit here?
Low.
May scripts with this view be integrated?
It's possible.
Will the integration make more sense when the duplication of
regular expressions for SmPL constraints can be avoided?
This point is completely irrelevant.
Post by Julia Lawall
It depends on the benefit of the transformation provided.
Should the benefit be clearer after I published hundreds of update
suggestions for this change pattern?
Well, many people have rejected the patches as well. And checkpatch
already highlights the issue.

julia
