Discussion:
[Cocci] [PATCH 01/18] coccinelle: change strncpy+truncation to strlcpy
Dominique Martinet
2018-07-13 01:14:43 UTC
Permalink
Besides being simpler, using strlcpy instead of strncpy+truncation
fixes part of the following class of new gcc warnings:

drivers/gpu/drm/i915/intel_tv.c: In function ‘intel_tv_get_modes’:
drivers/gpu/drm/i915/intel_tv.c:1358:3: error: ‘strncpy’ specified bound 32 equals
destination size [-Werror=stringop-truncation]
strncpy(mode_ptr->name, input->name, DRM_DISPLAY_MODE_LEN);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Note that this is not a proper fix for this warning (and not all of the
occurences give the warning either - the strings are not always static).
The warning was intended to have developers check the return code of
strncpy and act in case of truncation (print a warning, abort the
function or something similar if the original string was not nul
terminated); the change to strlcpy only works because gcc does not
handle the function the same way.

Suggested-by: Ville Syrjälä <***@linux.intel.com>
Signed-off-by: Dominique Martinet <***@codewreck.org>
---

Running this fixes 30 occurences of the problem in 17 different
components of the kernel, and while the produced patches are fairly
straight-forward I'm not sure who I should expect to pick this up as
it is sent as a series.
I expect each maintainer will pick their share of the patchs if they
agree with it and the rest will just be dropped?

.../coccinelle/misc/strncpy_truncation.cocci | 41 +++++++++++++++++++
1 file changed, 41 insertions(+)
create mode 100644 scripts/coccinelle/misc/strncpy_truncation.cocci

diff --git a/scripts/coccinelle/misc/strncpy_truncation.cocci b/scripts/coccinelle/misc/strncpy_truncation.cocci
new file mode 100644
index 000000000000..28b5c2a290ac
--- /dev/null
+++ b/scripts/coccinelle/misc/strncpy_truncation.cocci
@@ -0,0 +1,41 @@
+/// Use strlcpy rather than strncpy(dest,..,sz) + dest[sz-1] = '\0'
+///
+// Confidence: High
+// Comments:
+// Options: --no-includes --include-headers
+
+virtual patch
+virtual context
+virtual report
+virtual org
+
+@r@
+expression dest, src, sz;
+position p;
+@@
+
+***@p(dest, src, sz);
+dest[sz - 1] = '\0';
+
+@script:python depends on org@
+p << r.p;
+@@
+
+cocci.print_main("strncpy followed by truncation can be strlcpy",p)
+
+@script:python depends on report@
+p << r.p;
+@@
+
+msg = "SUGGESTION: strncpy followed by truncation can be strlcpy"
+coccilib.report.print_report(p[0],msg)
+
+@ok depends on patch@
+expression r.dest, r.src, r.sz;
+position r.p;
+@@
+
+-***@p(
++strlcpy(
+ dest, src, sz);
+-dest[sz - 1] = '\0';
--
2.17.1
Himanshu Jha
2018-07-13 07:44:55 UTC
Permalink
Post by Dominique Martinet
Besides being simpler, using strlcpy instead of strncpy+truncation
drivers/gpu/drm/i915/intel_tv.c:1358:3: error: ‘strncpy’ specified bound 32 equals
destination size [-Werror=stringop-truncation]
strncpy(mode_ptr->name, input->name, DRM_DISPLAY_MODE_LEN);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Note that this is not a proper fix for this warning (and not all of the
occurences give the warning either - the strings are not always static).
The warning was intended to have developers check the return code of
strncpy and act in case of truncation (print a warning, abort the
function or something similar if the original string was not nul
terminated); the change to strlcpy only works because gcc does not
handle the function the same way.
---
Running this fixes 30 occurences of the problem in 17 different
components of the kernel, and while the produced patches are fairly
straight-forward I'm not sure who I should expect to pick this up as
it is sent as a series.
I expect each maintainer will pick their share of the patchs if they
agree with it and the rest will just be dropped?
Masahiro Yamada <***@socionext.com> takes coccinelle patches,
so please cc him or your patch would be lost.
Post by Dominique Martinet
.../coccinelle/misc/strncpy_truncation.cocci | 41 +++++++++++++++++++
1 file changed, 41 insertions(+)
create mode 100644 scripts/coccinelle/misc/strncpy_truncation.cocci
diff --git a/scripts/coccinelle/misc/strncpy_truncation.cocci b/scripts/coccinelle/misc/strncpy_truncation.cocci
new file mode 100644
index 000000000000..28b5c2a290ac
--- /dev/null
+++ b/scripts/coccinelle/misc/strncpy_truncation.cocci
@@ -0,0 +1,41 @@
+/// Use strlcpy rather than strncpy(dest,..,sz) + dest[sz-1] = '\0'
+///
+// Confidence: High
+// Options: --no-includes --include-headers
+
+virtual patch
+virtual context
You might consider adding context rule or remove this line perhaps ?
Post by Dominique Martinet
+virtual report
+virtual org
+
+expression dest, src, sz;
+position p;
+
+dest[sz - 1] = '\0';
+
+p << r.p;
+
+cocci.print_main("strncpy followed by truncation can be strlcpy",p)
+
+p << r.p;
+
+msg = "SUGGESTION: strncpy followed by truncation can be strlcpy"
+coccilib.report.print_report(p[0],msg)
+
+expression r.dest, r.src, r.sz;
+position r.p;
+
++strlcpy(
+ dest, src, sz);
+-dest[sz - 1] = '\0';
The above rule produces an output that I think is not correct:
--------------------------------------------------------------
diff =
diff -u -p a//ti/wl1251/acx.c b//ti/wl1251/acx.c
--- a//ti/wl1251/acx.c
+++ b//ti/wl1251/acx.c
@@ -150,14 +150,7 @@ int wl1251_acx_fw_version(struct wl1251
}

/* be careful with the buffer sizes */
- strncpy(buf, rev->fw_version, min(len, sizeof(rev->fw_version)));
-
- /*
- * if the firmware version string is exactly
- * sizeof(rev->fw_version) long or fw_len is less than
- * sizeof(rev->fw_version) it won't be null terminated
- */
- buf[min(len, sizeof(rev->fw_version)) - 1] = '\0';
+ strlcpy(buf, rev->fw_version, min(len, sizeof(rev->fw_version)));

-----------------------------------------------------------------

I think the comment is useful and should not be removed. Also, consider
changing Confidence level appropriately.

Thanks.
--
Himanshu Jha
Undergraduate Student
Department of Electronics & Communication
Guru Tegh Bahadur Institute of Technology
Himanshu Jha
2018-07-13 09:14:12 UTC
Permalink
Post by Himanshu Jha
Post by Dominique Martinet
I expect each maintainer will pick their share of the patchs if they
agree with it and the rest will just be dropped?
so please cc him or your patch would be lost.
Thanks, will do.
Post by Himanshu Jha
Post by Dominique Martinet
+virtual patch
+virtual context
You might consider adding context rule or remove this line perhaps ?
Victim of copypasta, I'll remove this.
Post by Himanshu Jha
Post by Dominique Martinet
++strlcpy(
+ dest, src, sz);
+-dest[sz - 1] = '\0';
--------------------------------------------------------------
diff =
diff -u -p a//ti/wl1251/acx.c b//ti/wl1251/acx.c
--- a//ti/wl1251/acx.c
+++ b//ti/wl1251/acx.c
@@ -150,14 +150,7 @@ int wl1251_acx_fw_version(struct wl1251
}
/* be careful with the buffer sizes */
- strncpy(buf, rev->fw_version, min(len, sizeof(rev->fw_version)));
-
- /*
- * if the firmware version string is exactly
- * sizeof(rev->fw_version) long or fw_len is less than
- * sizeof(rev->fw_version) it won't be null terminated
- */
- buf[min(len, sizeof(rev->fw_version)) - 1] = '\0';
+ strlcpy(buf, rev->fw_version, min(len, sizeof(rev->fw_version)));
-----------------------------------------------------------------
I think the comment is useful and should not be removed.
I agree this comment is useful now that I'm taking a closer look, I
glanced at this too fast.
I'm not sure how to make coccinelle not remove comments between lines
though?
Well, there is no such facility in Coccinelle to ignore comments.
You can hack with other facilities provided in SmPL though ;)

Try this:

$ spatch -D patch --sp-file strlcopy.cocci --very-quiet drivers/net/wireless/ti/wl1251/acx.c

---------------------------------------------------------------------
virtual patch

@depends on patch@
expression dest, src, sz;
identifier f;
@@

(
- strncpy(
+ strlcpy(
dest, src, sizeof(sz));
- dest[sizeof(sz) - 1] = '\0';
|
- strncpy(
+ strlcpy(
dest, src, f);
- dest[f - 1] = '\0';
)
---------------------------------------------------------------------

This eliminates that case because expression is generic metavariable and
it somehow matched whole "min(len, sizeof(...)..", so it better to
divide the rules as done above to be more specific about the matching
pattern.

I thought to replace "identifier f" with "constant F" but that misses
few cases.

Also, it is advised to put a space affer '+/-'

Thanks.
--
Himanshu Jha
Undergraduate Student
Department of Electronics & Communication
Guru Tegh Bahadur Institute of Technology
Julia Lawall
2018-07-13 09:44:06 UTC
Permalink
Post by Himanshu Jha
Post by Himanshu Jha
Post by Dominique Martinet
I expect each maintainer will pick their share of the patchs if they
agree with it and the rest will just be dropped?
so please cc him or your patch would be lost.
Thanks, will do.
Post by Himanshu Jha
Post by Dominique Martinet
+virtual patch
+virtual context
You might consider adding context rule or remove this line perhaps ?
Victim of copypasta, I'll remove this.
Post by Himanshu Jha
Post by Dominique Martinet
++strlcpy(
+ dest, src, sz);
+-dest[sz - 1] = '\0';
--------------------------------------------------------------
diff =
diff -u -p a//ti/wl1251/acx.c b//ti/wl1251/acx.c
--- a//ti/wl1251/acx.c
+++ b//ti/wl1251/acx.c
@@ -150,14 +150,7 @@ int wl1251_acx_fw_version(struct wl1251
}
/* be careful with the buffer sizes */
- strncpy(buf, rev->fw_version, min(len, sizeof(rev->fw_version)));
-
- /*
- * if the firmware version string is exactly
- * sizeof(rev->fw_version) long or fw_len is less than
- * sizeof(rev->fw_version) it won't be null terminated
- */
- buf[min(len, sizeof(rev->fw_version)) - 1] = '\0';
+ strlcpy(buf, rev->fw_version, min(len, sizeof(rev->fw_version)));
-----------------------------------------------------------------
I think the comment is useful and should not be removed.
I agree this comment is useful now that I'm taking a closer look, I
glanced at this too fast.
I'm not sure how to make coccinelle not remove comments between lines
though?
Well, there is no such facility in Coccinelle to ignore comments.
You can hack with other facilities provided in SmPL though ;)
$ spatch -D patch --sp-file strlcopy.cocci --very-quiet drivers/net/wireless/ti/wl1251/acx.c
---------------------------------------------------------------------
virtual patch
@depends on patch@
expression dest, src, sz;
identifier f;
@@
(
- strncpy(
+ strlcpy(
dest, src, sizeof(sz));
- dest[sizeof(sz) - 1] = '\0';
|
- strncpy(
+ strlcpy(
dest, src, f);
- dest[f - 1] = '\0';
)
---------------------------------------------------------------------
This eliminates that case because expression is generic metavariable and
it somehow matched whole "min(len, sizeof(...)..", so it better to
divide the rules as done above to be more specific about the matching
pattern.
I thought to replace "identifier f" with "constant F" but that misses
few cases.
Also, it is advised to put a space affer '+/-'
Thanks Himanshu for the suggestions.

However, I'm not sure to follow the discussion. The original problem was
that Coccinelle was removing a comment that should be preserved. I think
that this occurs because the line just below the comment is completely
removed. Coccinelle considers that the comment belongs with that line and
if the line is removed the comment won't make much sense.

In Himanshu's solution, the code is just not transformed at all, so as a
side effect the comment stays too. Is that what is wanted in this case?

julia
Himanshu Jha
2018-07-13 10:21:03 UTC
Permalink
Post by Julia Lawall
Thanks Himanshu for the suggestions.
However, I'm not sure to follow the discussion. The original problem was
that Coccinelle was removing a comment that should be preserved. I think
that this occurs because the line just below the comment is completely
removed. Coccinelle considers that the comment belongs with that line and
if the line is removed the comment won't make much sense.
In Himanshu's solution, the code is just not transformed at all, so as a
side effect the comment stays too. Is that what is wanted in this case?
Yes, there is no transformation with my solution which I advised to
prevent comment removal(which i thought was useful).

My rule is more narrower approach than the regular ones which used
"expression" metavariable.

Rest upto Dominique to choose whatever suits better :)
--
Himanshu Jha
Undergraduate Student
Department of Electronics & Communication
Guru Tegh Bahadur Institute of Technology
Julia Lawall
2018-07-13 10:50:13 UTC
Permalink
For the file drivers/net/wireless/ti/wl1251/acx.c, the following keeps the
comment before the buf update and moves the strlcpy call below it. It
does however drop the comment just before the original call to strncpy.

julia


@r@
expression dest, src;
expression f;
@@

- strncpy(dest,src,f);
dest[f - 1] = '\0'
+ + strlcpy(dest,src,f)
;

@@
expression r.dest, r.src;
expression r.f;
@@

- dest[f - 1] = '\0' + strlcpy(dest,src,f);
+ strlcpy(dest,src,f);
Himanshu Jha
2018-07-13 16:42:13 UTC
Permalink
Post by Himanshu Jha
(
- strncpy(
+ strlcpy(
dest, src, sizeof(sz));
- dest[sizeof(sz) - 1] = '\0';
|
- strncpy(
+ strlcpy(
dest, src, f);
- dest[f - 1] = '\0';
)
How do you think about the following code transformation specification
which would work with SmPL disjunctions at other places?
-strncpy
+strlcpy
(dest, src, \( sizeof(sz) \| f \) );
-dest[\( sizeof(sz) \| f \) - 1] = '\0';
Great!!!
Much cleaner than mine I would say.

Btw, Markus, I am curious to know how do you benchmark Coccinelle rules?
Post by Himanshu Jha
Also, it is advised to put a space affer '+/-'
It depends on the circumstances.
I advised only for improving readability and what is followed in every
Cocci rule in the mainline kernel.

Bikeshedding ;)
* Will you become interested in the usage of the double addition token?
http://coccinelle.lip6.fr/docs/main_grammar005.html#sec11
* The Coccinelle software contains software development challenges as can be
seen in the feature request “Support replacement of arithmetic operators
with SmPL”.
https://github.com/coccinelle/coccinelle/issues/144
Ah, I see.

Great working on improving Coccinelle.

I wish to see Coccinelle handling headers in source file in future.
--
Himanshu Jha
Undergraduate Student
Department of Electronics & Communication
Guru Tegh Bahadur Institute of Technology
Himanshu Jha
2018-07-13 18:41:06 UTC
Permalink
[ removing Dominique Martinet as I don't want to spoil his weekend ;) ]
Post by Himanshu Jha
Btw, Markus, I am curious to know how do you benchmark Coccinelle rules?
Such software profiling functionality belongs to the current
standard functionality (to some degree).
I suggest to take another look at information sources like
the following.
* Increase precision for display of SmPL script execution durations
https://systeme.lip6.fr/pipermail/cocci/2018-May/005102.html
Agreed! Though we have --profile option.

I couldn't see your patch in the above link to do what you requested.
Did I miss something ?
* Selection of benchmarks
https://github.com/coccinelle/coccinelle/issues/133
Hmm.

97 issues authored by you. Seems like you put a lot of time exploring
Coccinelle. But I don't see your "patches" in the cocci mailing-list !?
Post by Himanshu Jha
I wish to see Coccinelle handling headers in source file in future.
Data processing is also supported for header files to some degree
if you pass corresponding command line parameters.
* Are you looking for any special source code there?
Suppose for eg I wish to replace:

- #include <linux/module>
+ #include <linux/iio.h>

as iio.h to already includes(assume) module.h

Or another eg. would be to replace all <asm/...> with <linux/...>
https://lkml.org/lkml/2017/10/5/106
* Do you find run time characteristics for data parsing acceptable?
For the zalloc-simple.cocci, I would like to have some profiling
facilities to determine the best approach for framing the rule.
Isn't it ?
--
Himanshu Jha
Undergraduate Student
Department of Electronics & Communication
Guru Tegh Bahadur Institute of Technology
Himanshu Jha
2018-07-13 20:26:50 UTC
Permalink
Post by Himanshu Jha
* Increase precision for display of SmPL script execution durations
https://systeme.lip6.fr/pipermail/cocci/2018-May/005102.html
Agreed! Though we have --profile option.
I couldn't see your patch in the above link to do what you requested.
Did I miss something ?
Do you find Julia's feedback more interesting for one of my
update suggestions around related topics?
https://systeme.lip6.fr/pipermail/cocci/2018-May/005104.html
https://systeme.lip6.fr/pipermail/cocci/2018-June/005105.html
https://systeme.lip6.fr/pipermail/cocci/2018-June/005106.html
Post by Himanshu Jha
97 issues authored by you.
I suggest to recheck this information.
https://github.com/coccinelle/coccinelle/issues?author%3Aelfring
I am trained to report various items. ;-)
Yeah, I can figure that out.
But...

"Talk is cheap. Show me the code."

That what I have saw in my nearly one year experience ;)
Post by Himanshu Jha
Seems like you put a lot of time exploring Coccinelle.
This is an usual “side effect” from my work also with this software
for years.
Ow.
Post by Himanshu Jha
But I don't see your "patches" in the cocci mailing-list !?
* The main developers omitted attribution for some changes
which were influenced (or even triggered) by my direct feedback.
* Would you like to inspect any pull requests once more?
https://github.com/coccinelle/coccinelle/pulls
I will check them out, later.
Post by Himanshu Jha
- #include <linux/module>
+ #include <linux/iio.h>
The replacement of paths for include statements is another software
development challenge.
Did you ever look at the following feature request?
Support data processing for include statements together with
the use of metavariables
https://github.com/coccinelle/coccinelle/issues/138
I don't know Ocaml or else would've tried adding such feature.
Did you try to send a patch instead ?

Nobody is looking for ideas, you need to work it out yourself IMHO. And
Coccinelle mailing-list queries/suggestions/rants is handled by Julia
only, not sure if Michael or any past cocci reviewers are still present.
So, its difficult.
Post by Himanshu Jha
* Do you find run time characteristics for data parsing acceptable?
For the zalloc-simple.cocci, I would like to have some profiling
facilities to determine the best approach for framing the rule.
Which further development ideas did you get for this SmPL script
in the meantime (since previous clarification attempts got bumpy)?
I didn't get time to look after the rule got merged, and I started
working on IIO and am still working
https://summerofcode.withgoogle.com/projects/#6691473790074880

Few months back I worked on a rule with help of Julia:
https://lkml.org/lkml/2018/3/7/1009

But somehow there was no development in that direction :(
--
Himanshu Jha
Undergraduate Student
Department of Electronics & Communication
Guru Tegh Bahadur Institute of Technology
Dominique Martinet
2018-07-13 15:25:39 UTC
Permalink
Post by Dominique Martinet
+msg = "SUGGESTION: strncpy followed by truncation can be strlcpy"
+coccilib.report.print_report(p[0],msg)
I would prefer to omit an intermediate Python variable (similar to the previous
SmPL rule) just for the simple display of such a message.
+coccilib.report.print_report(p[0],
+ "SUGGESTION: strncpy followed by truncation can be strlcpy.")
Out of curiosity, is the performance cost of using an intermediate
variable high in spatch?
I personally do not mind either way, but that does make for a pretty
long line once indented and I know many who would prefer the initial
version.
Post by Dominique Martinet
+-strncpy at p(
++strlcpy(
+ dest, src, sz);
How do you think about to adjust another SmPL code transformation specification
like the following?
++strlcpy
+ (dest, src, sz);
This also came from the example I picked, but if this does not make a
difference as it sounds like I will update to that.
--
Dominique Martinet
Julia Lawall
2018-07-13 15:28:46 UTC
Permalink
Post by Dominique Martinet
Post by Dominique Martinet
+msg = "SUGGESTION: strncpy followed by truncation can be strlcpy"
+coccilib.report.print_report(p[0],msg)
I would prefer to omit an intermediate Python variable (similar to the previous
SmPL rule) just for the simple display of such a message.
+coccilib.report.print_report(p[0],
+ "SUGGESTION: strncpy followed by truncation can be strlcpy.")
Out of curiosity, is the performance cost of using an intermediate
variable high in spatch?
There is no reason that it would have any cost.
Post by Dominique Martinet
I personally do not mind either way, but that does make for a pretty
long line once indented and I know many who would prefer the initial
version.
Post by Dominique Martinet
+-strncpy at p(
++strlcpy(
+ dest, src, sz);
How do you think about to adjust another SmPL code transformation specification
like the following?
++strlcpy
+ (dest, src, sz);
This also came from the example I picked, but if this does not make a
difference as it sounds like I will update to that.
Probably not removing something just to add it back would be a good idea.

julia
Post by Dominique Martinet
--
Dominique Martinet
_______________________________________________
Cocci mailing list
https://systeme.lip6.fr/mailman/listinfo/cocci
Dominique Martinet
2018-07-13 16:11:38 UTC
Permalink
Post by Himanshu Jha
$ spatch -D patch --sp-file strlcopy.cocci --very-quiet drivers/net/wireless/ti/wl1251/acx.c
---------------------------------------------------------------------
virtual patch
@depends on patch@
expression dest, src, sz;
identifier f;
@@
(
- strncpy(
+ strlcpy(
dest, src, sizeof(sz));
- dest[sizeof(sz) - 1] = '\0';
|
- strncpy(
+ strlcpy(
dest, src, f);
- dest[f - 1] = '\0';
)
---------------------------------------------------------------------
This eliminates that case because expression is generic metavariable and
it somehow matched whole "min(len, sizeof(...)..", so it better to
divide the rules as done above to be more specific about the matching
pattern.
I thought to replace "identifier f" with "constant F" but that misses
few cases.
My first test started with 'constant sz' as well and I found expression
to be better.
If I understand this correctly, it just makes sure not to match the
'min(...)' expression so the problem doesn't arise, but it's not really
a solution as it is really a chance that this comment comes here for
this more complex expression.
I'd rather just advise to pay attention to comments and drop the
confidence level
Post by Himanshu Jha
Also, it is advised to put a space affer '+/-'
Ok, thanks
Post by Himanshu Jha
For the file drivers/net/wireless/ti/wl1251/acx.c, the following keeps the
comment before the buf update and moves the strlcpy call below it. It
does however drop the comment just before the original call to strncpy.
Nice, doing it in two passes does the trick; it even keeps the comment
before strncpy if there was no comment in between.

I'm sorry to write that after being provided with such a nice
work-around though, now that I'm a bit less tired I've had a look at the
comments again and it does not make sense to keep the second comment as
is -- the whole point of using strlcpy (or now strscpy as per feedback
elsewhere) is that the string will be null terminated properly after the
call, so I'll stick to the original version.


I also see the position does not seem to be useful for the patch phase,
spatch automatically only applies only to what was matched earlier in
report mode?
Post by Himanshu Jha
Post by Dominique Martinet
Post by Dominique Martinet
++strlcpy
+ (dest, src, sz);
This also came from the example I picked, but if this does not make a
difference as it sounds like I will update to that.
Probably not removing something just to add it back would be a good
idea.
Actually playing with your example made me realize that removing and
re-adding argument does fix the indentation issue in the original code I
had noticed for mptctl, so it might actually unexpectedly be a good idea
to go in the opposite direction and make coccinelle remove/add arguments
in the general case (e.g. if strncpy and strlcpy hadn't been the same
length)

The jury is still out for this one :)


Thanks for all the feedbacks,
--
Dominique Martinet
Dominique Martinet
2018-07-14 09:16:15 UTC
Permalink
How do you think about to adjust the initial meta-data a bit more?
* SPDX identifier
Oh, right, 7/55 of the cocci scripts have one... I'll add one in a v3 of
the patch on Tuesday, I want to give a bit more time for other comments
if any come.
* Copyright information
I left that one out on purpose, as I do not want to give the copyright
to anyone and do not particularily care for myself.
I'm doing that on my free time and this is not related to my work (as
opposed to e.g. the work I'm doing on 9P where I use my work e-mail;
which is also on my free time but relies on knowledge I owe to my work),
and I mostly see people attribute themselves copyright when related to
their work establishment.

Now I'm looking a bit closer I see this is not necessarily the case, but
I'd still rather leave this out unless there's a reason to add it.
the only exceptions would be if someone relied on strncpy to fill the end
of the buffer with zero to not leak data somewhere but that is not easy
to judge by itself (although I hope rare enough)
Would you dare to develop a corresponding source code search as another
safety check?
Hmm, good question. It would be handy but will limit the matches more
than required I think.

Taking an example at random in the reports of the current patch,
cpumask in tools/accounting/getdelays.c is not zeroed out before the
strncpy so would be ruled out -- but when it's actually used, it only
sends to the network 'strlen(cpumask)+1' bytes of cpumask so the usage
made is perfectly safe.

My second argument here is a bad one (I just have to learn ;)) but while
I could easily check if dest has been memset'd/allocated with kzalloc,
I'm not sure how to express 'dest is a member of struct s, s was
allocted with kzalloc' which is probably much more common.

I'm also not sure how far back coccinelle would be able to check that?
For example in drivers/gpu/drm/i915/intel_tv.c we have 'mode_ptr =
drm_mode_create(...)' followed by 'strncpy(mode_ptr->name...), and
'drm_mode_create' did allocate with kzalloc; would coccinelle look that
far?

Thanks,
--
Dominique Martinet
Julia Lawall
2018-07-14 11:41:27 UTC
Permalink
Post by Dominique Martinet
* Copyright information
I left that one out on purpose, as I do not want to give the copyright
to anyone and do not particularily care for myself.
I'm doing that on my free time and this is not related to my work (as
opposed to e.g. the work I'm doing on 9P where I use my work e-mail;
which is also on my free time but relies on knowledge I owe to my work),
and I mostly see people attribute themselves copyright when related to
their work establishment.
Now I'm looking a bit closer I see this is not necessarily the case, but
I'd still rather leave this out unless there's a reason to add it.
I don't care what you want to do with the copyright. It' just the
opportunity to do something if you want to. On the other hand, it is
helpful to have the name of the person who proposed the semantic patch
present in the file, if there are future concerns about false positives.
Perhaps you could add something to the comments fiel, if you don't want to
put a copyright.
Post by Dominique Martinet
the only exceptions would be if someone relied on strncpy to fill the end
of the buffer with zero to not leak data somewhere but that is not easy
to judge by itself (although I hope rare enough)
Would you dare to develop a corresponding source code search as another
safety check?
Hmm, good question. It would be handy but will limit the matches more
than required I think.
Taking an example at random in the reports of the current patch,
cpumask in tools/accounting/getdelays.c is not zeroed out before the
strncpy so would be ruled out -- but when it's actually used, it only
sends to the network 'strlen(cpumask)+1' bytes of cpumask so the usage
made is perfectly safe.
My second argument here is a bad one (I just have to learn ;)) but while
I could easily check if dest has been memset'd/allocated with kzalloc,
I'm not sure how to express 'dest is a member of struct s, s was
allocted with kzalloc' which is probably much more common.
I'm also not sure how far back coccinelle would be able to check that?
For example in drivers/gpu/drm/i915/intel_tv.c we have 'mode_ptr =
drm_mode_create(...)' followed by 'strncpy(mode_ptr->name...), and
'drm_mode_create' did allocate with kzalloc; would coccinelle look that
far?
Coccinele works on one function at a time. You can collect information in
one rule and use it in another. But you can't be sure that eg an x
=kzalloc and an x in another function refer to the same thing.

Basically, you have two choices. You can try to make the rule more
defensive, at least in the patch case. Or you can reduce the confidence
and add a discussion at the top about what false positives may arise. See
for example tests/doublebitand.cocci.

julia
Dominique Martinet
2018-07-14 08:12:31 UTC
Permalink
Besides being simpler, using strscpy instead of strncpy+truncation
fixes part of the following class of new gcc warnings:

drivers/gpu/drm/i915/intel_tv.c: In function ‘intel_tv_get_modes’:
drivers/gpu/drm/i915/intel_tv.c:1358:3: error: ‘strncpy’ specified bound 32 equals
destination size [-Werror=stringop-truncation]
strncpy(mode_ptr->name, input->name, DRM_DISPLAY_MODE_LEN);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Note that this is note a proper fix for this warning. The warning was
intended to have developers check the return code of strncpy and
act in case of truncation (print a warning, abort the function or
something similar if the original string was not nul terminated);
the change to strscpy only works because gcc does not handle the
function the same way.

v2:
- Use strscpy instead of strlcpy, as strlcpy can read after the number
of requested bytes in the source string, and none of the replaced users
want to know the source string size length
- Add longer semantic patch information, warning in particular for
information leak
- Lowered Confidence level to medium because of the possibility of
information leak, that needs manual checking
- Fix spacing of the diff section and removed unused virtual context

Signed-off-by: Dominique Martinet <***@codewreck.org>
---
Thanks for the many feedback I received; I hope I didn't miss any.
In particular, I have conciously not removed the intermediate msg
variable; as I made the message longer I actually added one more of
these for the org mode section.
I also have decided to let spatch remove the second comment, given the
confidence level has been lowered, the user should be able to manually
adjust the result if required.

I will resend the other patchs of the series a much smaller number at
a time after doing all the appropriate checks and giving them a better
comment, after this has been merged.

.../coccinelle/misc/strncpy_truncation.cocci | 50 +++++++++++++++++++
1 file changed, 50 insertions(+)
create mode 100644 scripts/coccinelle/misc/strncpy_truncation.cocci

diff --git a/scripts/coccinelle/misc/strncpy_truncation.cocci b/scripts/coccinelle/misc/strncpy_truncation.cocci
new file mode 100644
index 000000000000..dd157fc8ec5f
--- /dev/null
+++ b/scripts/coccinelle/misc/strncpy_truncation.cocci
@@ -0,0 +1,50 @@
+/// Use strscpy rather than strncpy(dest,..,sz) + dest[sz-1] = '\0'
+///
+//# This makes an effort to find occurences of strncpy followed by forced
+//# truncation, which can generate gcc stringop-truncation warnings when
+//# source and destination buffers are the same size.
+//# Using strscpy will always do that nul-termination for us and not read
+//# more than the maximum bytes requested in src, use that instead.
+//#
+//# The result needs checking that the destination buffer does not need
+//# its tail zeroed (either cleared beforehand or will never leave the
+//# kernel) so as not to leak information
+//
+// Confidence: Medium
+// Comments:
+// Options: --no-includes --include-headers
+
+virtual patch
+virtual report
+virtual org
+
+@r@
+expression dest, src, sz;
+position p;
+@@
+
+***@p(dest, src, sz);
+dest[sz - 1] = '\0';
+
+@script:python depends on org@
+p << r.p;
+@@
+
+msg = "strncpy followed by truncation can be strscpy, if the destination buffer does not need to be fully overwritten"
+cocci.print_main(msg, p)
+
+@script:python depends on report@
+p << r.p;
+@@
+
+msg = "SUGGESTION: strncpy followed by truncation can be strscpy, if the destination buffer does not need to be fully overwritten"
+coccilib.report.print_report(p[0], msg)
+
+@ok depends on patch@
+expression r.dest, r.src, r.sz;
+@@
+
+- strncpy
++ strscpy
+ (dest, src, sz);
+- dest[sz - 1] = '\0';
--
2.17.1
Julia Lawall
2018-07-14 11:54:29 UTC
Permalink
Post by Dominique Martinet
+msg = "SUGGESTION: strncpy followed by truncation can be strscpy, if the destination buffer does not need to be fully overwritten"
+coccilib.report.print_report(p[0], msg)
This is the first SUGGESTION. I don't know if anyone out there is relying
on it always being WARNING or ERROR.

julia
Julia Lawall
2018-07-20 05:33:42 UTC
Permalink
Using strscpy instead of strncpy+truncation is simpler and fixes part
drivers/gpu/drm/i915/intel_tv.c:1358:3: error: ‘strncpy’ specified bound 32 equals
destination size [-Werror=stringop-truncation]
strncpy(mode_ptr->name, input->name, DRM_DISPLAY_MODE_LEN);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Note that this is not a proper fix for this warning. The warning was
intended to have developers check the return code of strncpy and
act in case of truncation (print a warning, abort the function or
something similar if the original string was not nul terminated);
the change to strscpy only works because gcc does not handle the
function the same way.
A previous version of this patch suggested to use strlcpy instead,
but strscpy is preferred because it does not read more than the given
length of the source string unlike strlcpy, which could read after the
end of the buffer in case of unterminated string.
strscpy does however not clear the end of the destination buffer, so
there is a risk of information leak if the full buffer is copied as is
out of the kernel - this needs manual checking.
As fasr as I can tell from lkml, only one of these patches has been
accepted? There was also a concern about an information leak that there
was no response to. Actually, I would prefer that more of the generated
patches are accepted before accepting the semantic patch, for something
that is not quite so obviously correct.

julia
---
- Use strscpy instead of strlcpy, as strlcpy can read after the number
of requested bytes in the source string, and none of the replaced users
want to know the source string size length
- Add longer semantic patch information, warning in particular for
information leak
- Lowered Confidence level to medium because of the possibility of
information leak, that needs manual checking
- Fix spacing of the diff section and removed unused virtual context
- Add license/copyright
- Rewording of commit message
I didn't see many other remarks, but kept SUGGESTION as discussed.
I didn't move all virtuals in a single line because none of the other
kernel patch do it, and still do not see any advantage of moving the
string to not use a variable so kept that as well.
This should hopefully be the last version :)
.../coccinelle/misc/strncpy_truncation.cocci | 52 +++++++++++++++++++
1 file changed, 52 insertions(+)
create mode 100644 scripts/coccinelle/misc/strncpy_truncation.cocci
diff --git a/scripts/coccinelle/misc/strncpy_truncation.cocci b/scripts/coccinelle/misc/strncpy_truncation.cocci
new file mode 100644
index 000000000000..7732cde23a85
--- /dev/null
+++ b/scripts/coccinelle/misc/strncpy_truncation.cocci
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+/// Use strscpy rather than strncpy(dest,..,sz) + dest[sz-1] = '\0'
+///
+//# This makes an effort to find occurences of strncpy followed by forced
+//# truncation, which can generate gcc stringop-truncation warnings when
+//# source and destination buffers are the same size.
+//# Using strscpy will always do that nul-termination for us and not read
+//# more than the maximum bytes requested in src, use that instead.
+//#
+//# The result needs checking that the destination buffer does not need
+//# its tail zeroed (either cleared beforehand or will never leave the
+//# kernel) so as not to leak information
+//
+// Confidence: Medium
+// Copyright: (C) 2018 Dominique Martinet
+// Options: --no-includes --include-headers
+
+virtual patch
+virtual report
+virtual org
+
+expression dest, src, sz;
+position p;
+
+dest[sz - 1] = '\0';
+
+p << r.p;
+
+msg = "strncpy followed by truncation can be strscpy, if the destination buffer does not need to be fully overwritten"
+cocci.print_main(msg, p)
+
+p << r.p;
+
+msg = "SUGGESTION: strncpy followed by truncation can be strscpy, if the destination buffer does not need to be fully overwritten"
+coccilib.report.print_report(p[0], msg)
+
+expression r.dest, r.src, r.sz;
+
+- strncpy
++ strscpy
+ (dest, src, sz);
+- dest[sz - 1] = '\0';
--
2.17.1
Julia Lawall
2018-07-20 05:49:59 UTC
Permalink
Post by Julia Lawall
strscpy does however not clear the end of the destination buffer, so
there is a risk of information leak if the full buffer is copied as is
out of the kernel - this needs manual checking.
As fasr as I can tell from lkml, only one of these patches has been
accepted? There was also a concern about an information leak that there
was no response to. Actually, I would prefer that more of the generated
patches are accepted before accepting the semantic patch, for something
that is not quite so obviously correct.
As I'm pointing to the script which generated the patch in the generated
patches, I got told that it would be better to get the coccinelle script
accepted first, and asked others to hold on taking the patches at
several places - I didn't resend any v2 of these with strscpy yet mostly
for that reason.
I can't accept a semantic patch for which I can't judge the correctness.
It would be better to put a proper commit message in the individual
patches and get them accepted first.

The actual change is made by a script that is only a few lines long. You
can put those lines in your commit message if you like.
There were concerns for information leaks that I believe I adressed in
the specific patch that was pointed out by the concern (I might have
missed some?), but I'll take the time to check all the patches
individually before resending as well as filling in better commit
messages which also was one of the main concerns.
I'm however a bit stuck if I'm waiting for the cocinelle script to be
accepted to resend the patches, but you're waiting for the individual
patches to be accepted to take the script... :)
I guess there is no value in the script landing first by itself, I'll
just remove the script path from the commit messages and resend the
first few this weekend.
It's not that there is no value to the script. The problem is that I
don't know if the script is correct - I'm not familiar with these string
functions. Once the script is in the kernel, it stays there beyond your
patches, so I would prefer to know that it is correct up front, rather
than having to remove it afterwards.

julia
Julia Lawall
2018-07-20 06:03:35 UTC
Permalink
Post by Julia Lawall
I guess there is no value in the script landing first by itself, I'll
just remove the script path from the commit messages and resend the
first few this weekend.
It's not that there is no value to the script. The problem is that I
don't know if the script is correct - I'm not familiar with these string
functions. Once the script is in the kernel, it stays there beyond your
patches, so I would prefer to know that it is correct up front, rather
than having to remove it afterwards.
I understand, I didn't say there is no value in the script ("landing
first by itself" doesn't mean it should/can not be taken later)
I'll bump this thread again in a couple of weeks after having resent
most of the other patches
Thanks.

The rule is also not so efficient in the patch case, because you have the
rule r that matches the pattern, and then the ok rule at the end that
matches the same pattern. It would be better to put depends on org ||
report in the rule r, and let the patch rule be freestanding, ie just
declare dest, src, and sz, not r.dest, etc.

If you like, you can also add the context case by just putting a * in
front of the strncpy call in the r rule. That highlights the change in
diff-like output, which can in general be useful to see the context in
which the issue occurs.

julia

Julia Lawall
2018-07-14 20:36:09 UTC
Permalink
Not a big deal, but actually the v2 goes below the ---
I've seen both being done (if you look at the git log of the linux
kernel and search for 'v2' you will have some matches)
I guess. Normally I would conseider that since the v1 is not in the git
history, no one care about the delta between the v1 and v2. If there is
important information it should just be in the commit message.
The list was a bit long in this case, but I think it's worth at least
mentioning that the previous version used strlcpy and why I changed in
the commit message.
I guess, but you could say that strlcpy was not used for a certain reason,
without making it historical information.
Post by Dominique Martinet
+msg = "SUGGESTION: strncpy followed by truncation can be strscpy, if the destination buffer does not need to be fully overwritten"
+coccilib.report.print_report(p[0], msg)
This is the first SUGGESTION. I don't know if anyone out there is relying
on it always being WARNING or ERROR.
Eh, I must have been really unlucky with the scripts I looked at, one
just happened to have SUGGESTION used like this (misc/warn.cocci), but
now you said that I can see it's the only one!
I'm not sure on what to do here, if you think there could be scripts
relying on that then I'll change this to WARNING, but the wording feels
a bit strong and "suggestion" leaves more room for interpretation.
I guess that if there is already one, then another won't hurt.
Copyright stuff in the other sub-thread
Replying here instead to limit the number of mails sent,
I think people would look at git blame/log if there is no name in the
file, but I can understand it is simpler if a name is present.
One less command to type.
Just a nitpick on format, all copyright comments on cocci scripts end
with the license; since that will be added as an SPDX tag instead do you
mind if I do not list it again there?
I know nothing about SPDX tags. If something is added, I don't know how
it is done.

julia
Also just a head's up, I'll be AFK for the next ~48 hours; I'll post a
v3 of the patch with license/copyright added, possibly suggestion
changed, and whatever else comes up by then :)
Thanks,
--
Dominique Martinet
Loading...