Java library for matching text ranges

Description

Java library to find:

  • text ranges (defined by a start string and an end string) that can be nested inside each other.

  • a specific string sequence located outside a range.

Text range

First example

In the following text, you want to match ranges that start with ( and end with ):

5 + (4 + (1 + 2) / 3 - 5) * 10 / (3 + 2)

The first range is expected to be (4 + (1 + 2) / 3 - 5).

The result (4 + (1 + 2), which pairs the first opening bracket with the first closing bracket, is wrong.

The result (4 + (1 + 2) / 3 - 5) * 10 / (3 + 2), which pairs the first opening bracket with the last closing bracket, is also wrong.

Corresponding Java code with SubstringFinder:

String text = "5 + (4 + (1 + 2) / 3 - 5) * 10 / (3 + 2)";

SubstringFinder finder = SubstringFinder.define("(", ")");
Optional<Range> findRange = finder.nextRange(text);
if (findRange.isPresent()) {
    Range range = findRange.get();
    String substring = text.substring(range.getRangeStart(), range.getRangeEnd());
    assertEquals("(4 + (1 + 2) / 3 - 5)", substring);
}

Where SubstringFinder corresponds to this imported class: fr.jmini.utils.substringfinder.SubstringFinder.
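
For contrast, the two wrong results described above are what a naive search with the JDK methods String.indexOf and String.lastIndexOf returns. The sketch below is only an illustration and is not part of the library; the class and method names are made up for this page.

```java
// Plain-JDK illustration (not part of the library): the two naive
// strategies and the wrong substrings they produce.
class NaiveBracketMatch {

    // Pairs the first opening bracket with the first closing bracket.
    static String firstOpenToFirstClose(String text) {
        int start = text.indexOf('(');
        int end = text.indexOf(')', start);
        return text.substring(start, end + 1);
    }

    // Pairs the first opening bracket with the last closing bracket.
    static String firstOpenToLastClose(String text) {
        int start = text.indexOf('(');
        int end = text.lastIndexOf(')');
        return text.substring(start, end + 1);
    }

    public static void main(String[] args) {
        String text = "5 + (4 + (1 + 2) / 3 - 5) * 10 / (3 + 2)";
        System.out.println(firstOpenToFirstClose(text)); // (4 + (1 + 2)
        System.out.println(firstOpenToLastClose(text));  // (4 + (1 + 2) / 3 - 5) * 10 / (3 + 2)
    }
}
```

Neither strategy keeps track of how many brackets are currently open, which is why both miss the expected range.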

Second example

Find the correct range, defined by { and }, that corresponds to the body of the main method:

public static void main(String[] args) {
    if(args != null) {
        for (String s : args) {
            printArg(s);
        }
    }
}

public static void printArg(String arg) {
    System.out.println("Arg: " + arg);
}
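
The idea behind matching the right closing brace can be sketched with a depth counter: each start string increases the depth, each end string decreases it, and the range ends when the depth returns to zero. The code below uses only the JDK and is an assumption about how such matching can be done, not the library's implementation; NestedRangeSketch, firstRange and SAMPLE are hypothetical names.

```java
// Minimal sketch of nested-range matching with a depth counter.
// Not the library's implementation; all names are hypothetical.
class NestedRangeSketch {

    // The source text of the second example.
    static final String SAMPLE = String.join("\n",
            "public static void main(String[] args) {",
            "    if(args != null) {",
            "        for (String s : args) {",
            "            printArg(s);",
            "        }",
            "    }",
            "}",
            "",
            "public static void printArg(String arg) {",
            "    System.out.println(\"Arg: \" + arg);",
            "}");

    // Returns the substring from the first `open` to its matching `close`
    // (both included), or null when no complete range exists.
    // Assumes `open` and `close` are non-empty and different.
    static String firstRange(String text, String open, String close) {
        int start = text.indexOf(open);
        if (start < 0) {
            return null;
        }
        int depth = 0;
        int i = start;
        while (i < text.length()) {
            if (text.startsWith(open, i)) {
                depth++;
                i += open.length();
            } else if (text.startsWith(close, i)) {
                depth--;
                i += close.length();
                if (depth == 0) {
                    return text.substring(start, i);
                }
            } else {
                i++;
            }
        }
        return null; // the range was opened but never closed
    }

    public static void main(String[] args) {
        // Prints the body of main, from its opening { to its matching }.
        System.out.println(firstRange(SAMPLE, "{", "}"));
    }
}
```

The same helper applied to the text of the first example with ( and ) yields (4 + (1 + 2) / 3 - 5).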

Exclude another range

Consider this example:

package tmp;

@SomeAnnotation(arg1="value ;-)", arg2="other value")
public class SomeClass {
}

If you would like to find the content of the @SomeAnnotation value, you can define your range like this:

  • start: @SomeAnnotation(

  • end: )

In this case, however, you also need to exclude the content between the quotes (" .. "), so that the ) of the ;-) inside the string is not taken as the end of the range.

String text = ""
        + "package tmp;\n"
        + " \n"
        + "@SomeAnnotation(arg1=\"value ;-)\", arg2=\"other value\")\n"
        + "public class SomeClass {\n"
        + "}\n";

SubstringFinder finder = SubstringFinder.define("@SomeAnnotation(", ")", "\"", "\"");
Optional<Range> findRange = finder.nextRange(text);
if (findRange.isPresent()) {
    Range range = findRange.get();
    String substring = text.substring(range.getRangeStart(), range.getRangeEnd());
    assertEquals("@SomeAnnotation(arg1=\"value ;-)\", arg2=\"other value\")", substring);
}
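
To make the effect of the excluded range concrete, here is a simplified plain-JDK sketch: while looking for the end string, it toggles a flag on every quote and ignores any end string seen while the flag is set. It is an illustration under stated simplifications (no nested start strings, no escaped quotes) and not the library's implementation; ExcludedRangeSketch and firstRange are hypothetical names.

```java
// Simplified sketch of a range search that ignores quoted content.
// Not the library's implementation; all names are hypothetical.
class ExcludedRangeSketch {

    // Returns the text from `open` to the first `close` that is not inside
    // a region delimited by `quote` (both ends included), or null if none.
    // Simplifications: `open` is not nested and quotes are never escaped.
    static String firstRange(String text, String open, String close, char quote) {
        int start = text.indexOf(open);
        if (start < 0) {
            return null;
        }
        boolean inQuotes = false;
        for (int i = start + open.length(); i < text.length(); i++) {
            if (text.charAt(i) == quote) {
                inQuotes = !inQuotes; // entering or leaving the excluded range
            } else if (!inQuotes && text.startsWith(close, i)) {
                return text.substring(start, i + close.length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "@SomeAnnotation(arg1=\"value ;-)\", arg2=\"other value\")\n";
        // Prints the whole annotation, up to the closing ) after "other value".
        System.out.println(firstRange(text, "@SomeAnnotation(", ")", '"'));
    }
}
```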

String positions outside a range

First example

In the following text, you want to find each comma , located outside the ranges delimited by single quotes ':

'Hello,world',5,true

The two correct matches are:

  • Between ' and 5

  • Between 5 and t

The first comma should be ignored because it is between ' and '.

Corresponding Java code with PositionFinder:

String text = "'Hello,world',5,true";

PositionFinder finder = PositionFinder.define(",", "'", "'");
List<Integer> findPositions = finder.indexesOf(text);
assertEquals(2, findPositions.size(), "size");

assertEquals("'Hello,world'", text.substring(0, findPositions.get(0)));
assertEquals("5", text.substring(findPositions.get(0) + 1, findPositions.get(1)));
assertEquals("true", text.substring(findPositions.get(1) + 1));

Where PositionFinder corresponds to this imported class: fr.jmini.utils.substringfinder.PositionFinder.

Second example

Find the word video that is not inside < and >:

This video: <video width="320" height="240" controls>
  <source src="movie.mp4" type="video/mp4">
</video>
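
One way to picture this search is a single pass over the text that tracks whether the current position is inside a range and records a match only when it is not. The following plain-JDK sketch is an assumption about how this can be done, not the library's implementation; OutsidePositionsSketch and indexesOutside are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of finding a string outside delimited ranges.
// Not the library's implementation; all names are hypothetical.
class OutsidePositionsSketch {

    // Returns the start indexes of `search` that are not inside a range
    // delimited by `open` and `close`. The close string is tested first
    // while inside a range, so identical delimiters (such as ') also work.
    static List<Integer> indexesOutside(String text, String search, String open, String close) {
        if (search.isEmpty() || open.isEmpty() || close.isEmpty()) {
            throw new IllegalArgumentException("search, open and close must not be empty");
        }
        List<Integer> result = new ArrayList<>();
        int depth = 0;
        int i = 0;
        while (i < text.length()) {
            if (depth > 0 && text.startsWith(close, i)) {
                depth--;
                i += close.length();
            } else if (text.startsWith(open, i)) {
                depth++;
                i += open.length();
            } else if (depth == 0 && text.startsWith(search, i)) {
                result.add(i);
                i += search.length();
            } else {
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "This video: <video width=\"320\" height=\"240\" controls>\n"
                + "  <source src=\"movie.mp4\" type=\"video/mp4\">\n"
                + "</video>";
        // Only the occurrence in "This video:" lies outside < and >.
        System.out.println(indexesOutside(text, "video", "<", ">")); // [5]
    }
}
```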

First position outside a range

This case is similar to the previous examples, except that you are only interested in the first position of the comma , outside the ranges delimited by ( and ):

lorem(Hello,world),ipsum

Corresponding Java code with PositionFinder:

String text = "lorem(Hello,world),ipsum";

PositionFinder finder = PositionFinder.define(",", "(", ")");
Optional<Integer> findPosition = finder.indexOf(text);
assertEquals(true, findPosition.isPresent(), "isPresent");

assertEquals("lorem(Hello,world)", text.substring(0, findPosition.get()));
assertEquals("ipsum", text.substring(findPosition.get() + 1));

Where PositionFinder corresponds to this imported class: fr.jmini.utils.substringfinder.PositionFinder.

Download

Starting with version 1.0.1, the library is hosted on Maven Central.

Maven coordinates of the library
<dependency>
  <groupId>fr.jmini.utils</groupId>
  <artifactId>substring-finder</artifactId>
  <version>1.1.1</version>
</dependency>

Build

This project uses Gradle.

Command to build the sources locally:

./gradlew build

Command to deploy to your local Maven repository:

./gradlew publishToMavenLocal

Command to build the documentation page:

./gradlew asciidoctor

The output of this command is an HTML page located at <git repo root>/build/docs/html5/index.html.

For project maintainers

To sign the artifacts, signing.gnupg.keyName and signing.gnupg.passphrase must be set in your local gradle.properties file. To publish to a remote repository, sonatypeUser and sonatypePassword must be set as well.

Command to build and publish the result to Maven Central:

./gradlew publishToSonatype

Command to upload the documentation page to GitHub Pages:

./gradlew gitPublishPush

Command to perform a release:

./gradlew release -Prelease.useAutomaticVersion=true

Using ssh-agent

Some tasks need to push to the remote Git repository (the release task, or updating the gh-pages branch). If they fail with an error like this:

org.eclipse.jgit.api.errors.TransportException: ... Permission denied (publickey).

Then ssh-agent can be used.

eval `ssh-agent -s`
ssh-add ~/.ssh/id_rsa

(source for this approach)

Get in touch

Use the issue tracker on GitHub.

You can also contact me on Twitter: @j2r2b

License

Code is under the Eclipse Public License - v 2.0. Documentation and slides are under the Creative Commons BY-SA 4.0 license.