[Cocci] Determination for the absence of an option in a function call

Discussion:

SF Markus Elfring

2018-02-17 16:00:21 UTC

Hello,

I am working with the following specification in some scripts for the semantic
patch language.

…
target = action(...);
…

This source code search pattern shows that a return value from a function call
should be stored somewhere. The concrete call is restricted by a selection of
function names. Such an approach works to some degree when restrictions
on function call parameters can be omitted.

But safer source code analysis requires distinguishing these parameters in
more detail.

1. How can it be ensured that a specific option was not passed?

2. The parameter number also becomes relevant then.
How should functions be split based on their signature?

Regards,
Markus

Julia Lawall

2018-02-17 16:05:40 UTC

Post by SF Markus Elfring
Hello,
I am working with the following specification in some scripts for the semantic
patch language.
…
target = action(...);
…
This source code search pattern shows that a return value from a function call
should be stored somewhere. The concrete call is restricted by a selection of
function names. Such an approach works to some degree when restrictions
on function call parameters can be omitted.
But safer source code analysis requires distinguishing these parameters in
more detail.
1. How can it be ensured that a specific option was not passed?
2. The parameter number also becomes relevant then.
How should functions be split based on their signature?

I don't understand the questions. What do you mean by option? A
command-line option of Coccinelle? A particular argument of action?

For the second question, maybe you are looking for the following:

@r@
expression list[n] es;
@@

target = action(es)

Now r.n is the number of arguments to action.

julia

Julia Lawall

2018-02-17 16:42:52 UTC

Post by Julia Lawall

Post by SF Markus Elfring
But safer source code analysis requires distinguishing these parameters in
more detail.
1. How can it be ensured that a specific option was not passed?
2. The parameter number also becomes relevant then.
How should functions be split based on their signature?

I don't understand the questions. What do you mean by option?

Enumeration values (or preprocessor symbols) are often used for this kind
of function parameters.
Do you prefer the wording “flag”?

Post by Julia Lawall
A command-line option of Coccinelle?

Not in this clarification attempt.

Post by Julia Lawall
A particular argument of action?

Yes.
I have been working on the identification of memory allocation functions
in Linux source files for a while.
It matters in this software domain whether the option “__GFP_NOWARN” was applied
(or not).

<+...__GFP_NOWARN...+> in the appropriate argument position.

Post by Julia Lawall
@r@
expression list[n] es;
@@
target = action(es)
Now r.n is the number of arguments to action.

This information can be useful for other analysis goals than what
I have got in mind here.
Each function name is usually connected with a specific argument count.
This fact has got some consequences for the development of corresponding
SmPL scripts.

I still have no idea what you are looking for here.

julia

Julia Lawall

2018-02-17 17:09:20 UTC

Post by Julia Lawall

I have been working on the identification of memory allocation functions
in Linux source files for a while.
It matters in this software domain whether the option “__GFP_NOWARN” was applied
(or not).

<+...__GFP_NOWARN...+> in the appropriate argument position.

It is easy to check the presence of such an identifier.
But I find it very challenging to determine (by script code)
if it is actually not passed (as an option) in a function call.

It's not clear what you want. You will have to send some examples.

Post by Julia Lawall

Each function name is usually connected with a specific argument count.
This fact has got some consequences for the development of corresponding
SmPL scripts.

I still have no idea what you are looking for here.

I imagine that SmPL disjunctions (or further SmPL rules) will be
relevant to distinguish the known parameter numbers.
How would you keep track of which parameter
receives the argument “gfp” (for example)?

You have to match the definition of the function to find out what
parameter position you are interested in. If the function is defined in
another file you may need to use iteration. See demos/iteration.cocci.

julia

Julia Lawall

2018-02-17 17:44:07 UTC

Post by Julia Lawall

It is easy to check the presence of such an identifier.
But I find it very challenging to determine (by script code)
if it is actually not passed (as an option) in a function call.

It's not clear what you want.

Another try …

Post by Julia Lawall
You will have to send some examples.

When we look at concrete Linux source code, we mostly see that
the option “__GFP_NOWARN” is just missing for a call of a function
like “devm_kmalloc”.
Another analysis tool can show that such an identifier
is referenced in only 207 files (from Linux 4.16-rc1).
But how can the Coccinelle software help here to exclude these source
code places from specific transformation attempts?

(
f(...,<+...__GFP_NOWARN...+>,...)
|
transformation
)

Alternatively,

@ok@
position p;
@@
f(...,<+...__GFP_NOWARN...+>,...)

@@
position p != ok.p;
@@
- f@p(...)
+ whatever
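
Outside of SmPL, the logic of these two rules amounts to a set difference over call positions: first remember where the flag occurs, then act only on the remaining positions. The following is a rough Python sketch of that idea, not Coccinelle code. The `CallSite` record, its field names and the function `sites_to_transform` are hypothetical, and the plain substring test stands in for the `<+...__GFP_NOWARN...+>` nest.

```python
from collections import namedtuple

# Hypothetical record for one call site; the field names are assumptions.
CallSite = namedtuple("CallSite", "file line func args")


def sites_to_transform(call_sites, flag="__GFP_NOWARN"):
    """Return the call sites that do not mention the flag anywhere."""
    # First pass (like rule "ok"): remember the positions where the flag occurs.
    ok_positions = {(s.file, s.line) for s in call_sites if flag in s.args}
    # Second pass (like "position p != ok.p"): keep only the other positions.
    return [s for s in call_sites if (s.file, s.line) not in ok_positions]
```

As in the SmPL version, the flag is accepted anywhere in the argument list here, so the check is not tied to a particular parameter position.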

Post by Julia Lawall

How would you keep track of which parameter
receives the argument “gfp” (for example)?

You have to match the definition of the function to find out what
parameter position you are interested in.

It seems to be feasible to encode such knowledge for a small number
of function names (in SmPL disjunctions or regular expressions).
But what does the situation look like when you would like
to automate the search for interesting positions as much as possible?

My iteration suggestion covers this case.

julia

Julia Lawall

2018-02-17 18:17:13 UTC

I guess that it covers only a part of the desired search automation.
The generic handling of variations in parameter positions is
more challenging, isn't it?

With iteration you can collect some information on one pass and use it on
another pass. This is discussed in the following set of slides:
http://coccinelle.lip6.fr/papers/cocciwk4_talk2.pdf

julia

Julia Lawall

2018-02-17 19:05:30 UTC

Post by Julia Lawall

I guess that it covers only a part of the desired search automation.
The generic handling of variations in parameter positions is
more challenging, isn't it?

With iteration you can collect some information on one pass and use it on
another pass. This is discussed in the following set of slides:
http://coccinelle.lip6.fr/papers/cocciwk4_talk2.pdf

It would be nice if a function database were usable.
Database queries can group the involved function names to some degree.

You can write python code to do whatever you want.
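
As a rough illustration of what such Python code might look like, the sketch below keeps a small table that maps function names to the position of their gfp argument and checks whether a flag is absent from that position. The table contents, the helper names `split_args` and `lacks_flag`, and the idea of passing the argument list as plain text are all assumptions made for this example. Nothing here is part of Coccinelle itself.

```python
import re

# Illustrative table (an assumption, not taken from the thread): zero-based
# index of the gfp flags argument for a few Linux allocation functions.
GFP_ARG_INDEX = {
    "kmalloc": 1,       # kmalloc(size, flags)
    "kzalloc": 1,       # kzalloc(size, flags)
    "kcalloc": 2,       # kcalloc(n, size, flags)
    "devm_kmalloc": 2,  # devm_kmalloc(dev, size, gfp)
}


def split_args(arg_text):
    """Split the text of an argument list on top-level commas only."""
    args, depth, current = [], 0, []
    for ch in arg_text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail or args:
        args.append(tail)
    return args


def lacks_flag(func, arg_text, flag="__GFP_NOWARN"):
    """True if the flag is absent from the gfp argument, False if present,
    None if the function is unknown or has too few arguments."""
    index = GFP_ARG_INDEX.get(func)
    args = split_args(arg_text)
    if index is None or index >= len(args):
        return None
    return re.search(r"\b" + re.escape(flag) + r"\b", args[index]) is None
```

With this table, `lacks_flag("devm_kmalloc", "dev, len, GFP_KERNEL")` reports that the flag is missing, while a call whose third argument is `GFP_KERNEL | __GFP_NOWARN` is reported as carrying it. Grouping functions by signature then reduces to grouping the table entries by index.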

julia

Julia Lawall

2018-02-17 19:47:23 UTC

Post by Julia Lawall
f(...,<+...__GFP_NOWARN...+>,...)

Does this SmPL specification mean that the identifier can appear anywhere
within the function call parameters?

Yes.

Would it be acceptable, given the risk of false positives, to omit
the check for the actually appropriate parameter position?

Up to you to see what happens.

julia

Julia Lawall

2018-02-17 20:25:18 UTC

Post by Julia Lawall

Post by Julia Lawall
f(...,<+...__GFP_NOWARN...+>,...)

Does this SmPL specification mean that the identifier can appear anywhere
within the function call parameters?

Yes.

Would it be acceptable, given the risk of false positives, to omit
the check for the actually appropriate parameter position?

Up to you to see what happens.

Thanks for another clarification.
Does it increase the chances to integrate any SmPL scripts
for transformation of questionable error messages after
failed memory allocations into a directory which you maintain?
Which confidence categorisation would fit here?

Low. The script has no idea whether the printed string is useful or not.

julia

Julia Lawall

2018-02-17 20:36:50 UTC

Post by Julia Lawall

Which confidence categorisation would fit here?

Low.

May scripts with this view be integrated?

It's possible. It depends on the benefit of the transformation provided.

Post by Julia Lawall
The script has no idea whether the printed string is useful or not.

This is a general data processing challenge. How will it influence
the software situation further?

I have no idea what "it" refers to, nor "software situation".

Maybe you can identify some cases that are particularly likely to be
useless and only report on those.

julia

Julia Lawall

2018-02-17 20:55:24 UTC

Post by Julia Lawall

Which confidence categorisation would fit here?

Low.

May scripts with this view be integrated?

It's possible.

Will the integration make more sense when the duplication of
regular expressions for SmPL constraints can be avoided?

This point is completely irrelevant.

Post by Julia Lawall
It depends on the benefit of the transformation provided.

Should the benefit be clearer after I have published hundreds of update
suggestions for this change pattern?