The BEDOPS suite includes `bedmap`

, which performs statistical calculations on BED elements. For your question, the `--bases`

operator returns the number of bases of overlap between the reference element and mapping elements. If we use the input BED file for both reference and mapping, then we simply get back the number of bases that make up each element.

For example, let's say we have the following BED file:

```
chr2 160 210 id-2 4
chr2 220 490 id-3 10
```

We can run `bedmap`

on this file to get the number of bases in each element:

```
$ bedmap --echo --bases --delim '\t' test.bed
chr2 160 210 id-2 4 50
chr2 220 390 id-3 10 170
```

We can pipe to `awk`

or any other downstream tool for filtering:

```
$ bedmap --echo --bases --delim '\t' test.bed | awk '{if ($6 < 200) print}' -
chr2 160 210 id-2 4 50
```

This result can be piped to `wc -l`

to get the number of filtered elements:

```
$ bedmap --echo --bases --delim '\t' test.bed | awk '{if ($6 < 200) print $0}' - | wc -l
1
```

Or we can pipe to `cut`

to get back filtered elements:

```
$ bedmap --echo --bases --delim '\t' test.bed | awk '{if ($6 < 200) print}' - | cut -f1-5
chr2 160 210 id-2 4
```

In all, this seems a more complicated approach than what you use for your example case. However, BEDOPS `bedmap`

can perform many statistical operations on BED elements, and so you might find it useful for other applications.

BEDOPS also has a very low memory footprint and scales efficiently to very large genomic datasets, whereas it can be easier to run into memory, performance and scaling issues with using R with large datasets.

+1 for a very nice R tutorial link

thx! very nice R tutorial!