Discussion:
[Cocci] 0079-netdev-destructor.cocci very slow
Johannes Berg
2018-09-18 09:22:23 UTC
On Mon, 2018-09-17 at 23:55 +0200, Hauke Mehrtens wrote:
> The 0079-netdev-destructor.cocci spatch in backports is very slow for
> me. For bigger files I get a warning that it takes over 15 seconds to
> apply it to just one file; for the complete backports tree it takes over
> an hour to apply.
>
> This is the patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/backports/backports.git/tree/patches/0079-netdev-destructor.cocci
>
> When I remove the <... ...> in the first rule, it is applied in a few
> seconds on the complete tree, a speed improvement of about 100
> times, but it is not working correctly any more. ;-)
>
> Is this normal, or how can I improve the spatch to be faster? I am using
> coccinelle 1.0.7 built with default configure arguments against the
> libraries from Debian stable.

We've had this discussion before :-)
I think we determined that it was normal.

> If this is normal I should probably try to reduce the number of files it
> tries to apply this against in gentree.py before spatch gets started.

spatch should already try that internally, but perhaps with some extra
knowledge we can do a better job ...

johannes
Johannes Berg
2018-09-19 08:43:46 UTC
On Tue, 2018-09-18 at 23:52 +0200, Hauke Mehrtens wrote:

> > spatch should already try that internally, but perhaps with some extra
> > knowledge we can do a better job ...

> Yes, we talked about this topic some months ago on IRC.
> If there is really no better solution, then I will grep all files for
> needs_free_netdev and priv_destructor and only apply this to the files
> that match. This list should be pretty short.

Right. No objection to that. Perhaps we should have some sort of special
comment header for our spatches that the script can consume?

Something like

// restrict-files: grep -qE 'needs_free_netdev|priv_destructor'

and we'd run that on all files? Or perhaps the API should be more a la
"grep -lE" so we can run it on many files and get a list of matching
files out?
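
(For illustration, with the -l variant the header could sit at the top of the
spatch roughly like this; the "restrict-files" key and the tooling that would
consume it are hypothetical at this point, nothing parses it yet:)

// restrict-files: grep -lE 'needs_free_netdev|priv_destructor'
//
// Hypothetical convention: the build script (gentree.py, say) reads this
// line, runs the command over the candidate files, and hands only the
// files it prints to spatch instead of the whole tree.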

> It looks like coccinelle already does such a grep when I remove the
> <... ...> from the patch, because this is about 100 times faster.

Good point, not sure why it doesn't do that with the <... ...>?

johannes
Julia Lawall
2018-09-19 08:49:25 UTC
On Wed, 19 Sep 2018, Johannes Berg wrote:

> On Tue, 2018-09-18 at 23:52 +0200, Hauke Mehrtens wrote:
>
> > > spatch should already try that internally, but perhaps with some extra
> > > knowledge we can do a better job ...
>
> > Yes, we talked about this topic some months ago on IRC.
> > If there is really no better solution, then I will grep all files for
> > needs_free_netdev and priv_destructor and only apply this to the files
> > that match. This list should be pretty short.
>
> Right. No objection to that. Perhaps we should have some sort of special
> comment header for our spatches that the script can consume?
>
> Something like
>
> // restrict-files: grep -qE 'needs_free_netdev|priv_destructor'
>
> and we'd run that on all files? Or perhaps the API should be more a la
> "grep -lE" so we can run it on many files and get a list of matching
> files out?
>
> > It looks like coccinelle already does such a grep when I remove the
> > <... ...> from the patch, because this is about 100 times faster.
>
> Good point, not sure why it doesn't do that with the <... ...>?

Because <... ...> means 0 or more of what is inside. <+... ...+> looks
for one or more and may be faster. On the other hand, it ensures that
there is one or more, which can also be expensive.

It could be better to just have a rule:

@worthwhile@
@@

(
functions(...)
|
you(...)
|
like(...)
)

and then have the <... ...> rule depend on worthwhile.
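
(For concreteness, a guard rule along these lines for the identifiers
mentioned in this thread might look roughly as follows; this is only a
sketch, not the actual contents of 0079-netdev-destructor.cocci:)

// Sketch only: assumes the expensive rule keys off these two assignments.
@worthwhile@
expression dev, e;
@@

(
dev->needs_free_netdev = e;
|
dev->priv_destructor = e;
)

// The existing <... ...> rule would then carry "depends on worthwhile" in
// its @@ header, so files where neither assignment appears are skipped
// before the expensive matching starts.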

julia
Johannes Berg
2018-09-19 09:02:09 UTC
On Wed, 2018-09-19 at 10:49 +0200, Julia Lawall wrote:

> > > It looks like coccinelle already does such a grep when I remove the
> > > <... ...> from the patch, because this is about 100 times faster.
> >
> > Good point, not sure why it doesn't do that with the <... ...>?
>
> Because <... ...> means 0 or more of what is inside.

Oops, right.

> <+... ...+> looks for one or more and may be faster.

Indeed, it's two orders of magnitude faster (running it on just
drivers/net/wireless goes from ~500s to ~2s for me) as it can throw away
almost all files immediately.

> On the other hand, it ensures that
> there is one or more, which can also be expensive.

That doesn't really matter all that much for us - the (really) expensive
part is running it on all files that don't even contain it at all.

> It could be better to just have a rule:
>
> @worthwhile@
> @@
>
> (
> functions(...)
> |
> you(...)
> |
> like(...)
> )
>
> and then have the <... ...> rule depend on worthwhile.

Good idea too.

Thanks!

johannes