Project author: diogoap

Project description:
Hamming Calculator program
Language: C#
Repository: git://github.com/diogoap/HammingCalculator.git
Created: 2018-11-03T20:25:11Z
Project page: https://github.com/diogoap/HammingCalculator

Hamming Calculator

This program calculates the Hamming distance between two binary inputs.

The solution is written in C# and targets .NET Core 2.1. The source code is integrated with Travis CI for continuous integration.

The program expects the following arguments:

  1. Data source: the argument name must be either -inline or -file.

    • Two values are required.
    • If the source is -inline:
      • The values must be two strings, each containing a sequence of bits. Example: “0101010101010101”.
      • Both strings must be the same length.
    • If the source is -file:
      • The values must be two valid file names. Example: “C:\file.txt”.
      • The contents of both files must be the same size.
  2. Method of calculation: the argument name is -method.

    • Optional.
    • The value should be one of:
      • Standard: selects the default calculation method.
      • Parallel: performs the calculation in parallel. Recommended for large amounts of data.
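
The input rules above, together with a straightforward sequential calculation, can be sketched as follows. This is a minimal illustration and not the project's actual code: the class name `HammingSketch` and its method are hypothetical, and it assumes each character of the input represents one bit.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the HammingCalculator source.
public static class HammingSketch
{
    // Counts the positions at which two equal-length bit strings differ.
    public static int Distance(string a, string b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));
        if (a.Length != b.Length)
            throw new ArgumentException("Inputs must be the same length.");

        int distance = 0;
        for (int i = 0; i < a.Length; i++)
        {
            // Assumption: only the characters '0' and '1' are valid input.
            if (!IsBit(a[i]) || !IsBit(b[i]))
                throw new ArgumentException("Inputs must contain only '0' and '1'.");

            if (a[i] != b[i]) distance++;
        }
        return distance;
    }

    private static bool IsBit(char c)
    {
        return c == '0' || c == '1';
    }
}
```

For example, `HammingSketch.Distance("0011", "1101")` returns 3, because the two strings differ in their first three positions.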

Usage instructions

To call the program, specify the data source and provide two strings containing either inline data or the file names where data will be read from. For example:

    dotnet HammingCalculator.App.dll -inline 0011 1101

    dotnet HammingCalculator.App.dll -file "C:\temp\file1.txt" "C:\temp\file2.txt"

Also, the calculation method can be specified. For example:

    dotnet HammingCalculator.App.dll -file "C:\temp\file1.txt" "C:\temp\file2.txt" -method Parallel

    dotnet HammingCalculator.App.dll -inline 0011 1101 -method Standard

If the -method argument is not provided, the Standard method is used.
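
The argument rules, including the default method, could be handled along the following lines. This is only a sketch under stated assumptions: `OptionsSketch` and `CalculationMethod` are hypothetical names (the project's own parsing lives in its ProgramOptions class, whose details are not shown here), the arguments are assumed to appear in the order used in the examples above, and method names are assumed to be case-sensitive.

```csharp
using System;

// Hypothetical types for illustration; the real ProgramOptions class may differ.
public enum CalculationMethod { Standard, Parallel }

public sealed class OptionsSketch
{
    public bool FromFile { get; private set; }
    public string First { get; private set; }
    public string Second { get; private set; }
    public CalculationMethod Method { get; private set; }

    // Expects: (-inline | -file) <value1> <value2> [-method (Standard | Parallel)]
    public static OptionsSketch Parse(string[] args)
    {
        if (args == null || args.Length < 3)
            throw new ArgumentException("Expected a data source followed by two values.");

        var options = new OptionsSketch();

        if (args[0] == "-inline") options.FromFile = false;
        else if (args[0] == "-file") options.FromFile = true;
        else throw new ArgumentException("Data source must be -inline or -file.");

        options.First = args[1];
        options.Second = args[2];

        // No -method argument: fall back to the Standard method.
        options.Method = CalculationMethod.Standard;
        if (args.Length == 3) return options;

        if (args.Length != 5 || args[3] != "-method")
            throw new ArgumentException("Only an optional '-method <value>' may follow the two values.");

        if (args[4] == "Standard") options.Method = CalculationMethod.Standard;
        else if (args[4] == "Parallel") options.Method = CalculationMethod.Parallel;
        else throw new ArgumentException("Method must be Standard or Parallel.");

        return options;
    }
}
```

With this sketch, `OptionsSketch.Parse(new[] { "-inline", "0011", "1101" })` yields options whose `Method` is `CalculationMethod.Standard`.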

Dealing with large amounts of data

The program provides an alternative approach for calculating the Hamming distance over large amounts of data. When selected, this approach splits the data into small chunks and processes the chunks concurrently through the .NET Parallel library.
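
The chunk-and-combine idea can be sketched as below. This is an illustration, not the project's HammingDistanceCalculatorParallel implementation: the class name `ParallelHammingSketch`, the `chunkSize` parameter and its default value are assumptions, and the use of `Parallel.For` with `Interlocked.Add` is one possible way to apply the technique.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of the HammingCalculator source.
public static class ParallelHammingSketch
{
    // Splits the inputs into fixed-size chunks, counts the differing positions
    // of each chunk in parallel, and adds the per-chunk counts to a shared total.
    public static long Distance(string a, string b, int chunkSize = 64 * 1024)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));
        if (a.Length != b.Length)
            throw new ArgumentException("Inputs must be the same length.");
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));

        // Number of chunks, rounded up; computed in long to avoid overflow.
        int chunkCount = (int)(((long)a.Length + chunkSize - 1) / chunkSize);
        long total = 0;

        Parallel.For(0, chunkCount, chunk =>
        {
            int start = chunk * chunkSize;
            int end = (int)Math.Min((long)start + chunkSize, a.Length);

            // Count locally first so the shared total is touched once per chunk.
            int local = 0;
            for (int i = start; i < end; i++)
            {
                if (a[i] != b[i]) local++;
            }

            Interlocked.Add(ref total, local);
        });

        return total;
    }
}
```

Accumulating into a local counter and publishing it once per chunk keeps contention on the shared total low. Unlike a validating version, this sketch does not check that the inputs contain only bit characters.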

With the Parallel method, the calculation runs about 75% faster than with the Standard method, and sometimes faster still.
Here is a comparison of the two calculation methods on 91.8 MB files. First, the Standard method:

    dotnet HammingCalculator.App.dll -file "C:\temp\file1.txt" "C:\temp\file2.txt" -method Standard
    Calculating Hamming Distance for the inputs below:
    Input 1: C:\temp\file1.txt
    Input 2: C:\temp\file2.txt
    Strategy: HammingCalculator.Lib.DistanceCalculator.HammingDistanceCalculatorStandard
    Hamming distance for the provided inputs is 5.
    Elapsed Milliseconds: 410

Now the Parallel execution method:

    dotnet HammingCalculator.App.dll -file "C:\temp\file1.txt" "C:\temp\file2.txt" -method Parallel
    Calculating Hamming Distance for the inputs below:
    Input 1: C:\temp\file1.txt
    Input 2: C:\temp\file2.txt
    Strategy: HammingCalculator.Lib.DistanceCalculator.HammingDistanceCalculatorParallel
    Hamming distance for the provided inputs is 5.
    Elapsed Milliseconds: 216

In this run, the Standard method took 410 ms and the Parallel method took 216 ms, a speedup of roughly 1.9x.

Improvements

Here are some suggestions for future improvements:

  • Use a DI container to instantiate the IHammingDistanceCalculatorStrategy implementation instead of using a factory. This change would simplify maintenance.
  • Adopt a third-party library to parse the program arguments. Currently, the ProgramOptions class is responsible for this task. A third-party library would bring more capabilities and remove this concern from the application.
  • Expand the CI capabilities by adding automatic deployment to the CI pipeline.