

The use of magnetism to discriminate

between different laser printers

Name: M.M.J. Mieremet BSc
Student nr.: 1308505
Education: MSc Applied Mathematics
Institute: Delft University of Technology
Organisation: Netherlands Forensic Institute
Supervisor: Ir. K. Herlaar
Confidential: Not confidential
Date: 31-07-2013
Version: Version 1.0


Institute: Delft University of Technology
Education: MSc Applied Mathematics

Organisation: Netherlands Forensic Institute
Laan van Ypenburg 6
2497 GB The Hague
P.O. Box 24044
2490 AA The Hague
www.forensischinstituut.nl

Department: Digital Technology and Biometrics
Team: Handwriting, Documents and Fingerprints
Expertise: Questioned Document Examination
Supervisor: Ir. K. Herlaar
Contact: T. +31 (0)70 888 6327, k.herlaar@nfi.minvenj.nl

The use of magnetism to discriminate

between different laser printers

Date: 31-07-2013
Version: Version 1.0
Status: Published
Confidential: Not confidential

Name: M.M.J. Mieremet BSc
Student number: 1308505
Contact: T. +31 (0)6 1191 5595


Preface

The purpose of this report is to outline the study into the magnetic properties of fused toner particles in questioned document examination and to answer the question whether it is possible to use magnetism to discriminate between different laser printers on a quantitative basis. This new subject was treated during an internship at the Netherlands Forensic Institute. The internship is part of the master program Applied Mathematics at Delft University of Technology.

The report is relevant for forensic scientists, especially those interested in questioned document examination. Note that the report uses the likelihood-ratio framework, so some background in statistics is desirable.

First, chapter 1 gives a short introduction to questioned document examination and this particular subject. Chapter 2 deals with the physics of a laser printer. Chapters 3 and 4 contain an overview of the reference pages and the measuring device, and the following chapters inform the reader about the measuring process, the storage of measurements and the processing of all data. Chapter 7 discusses the choices made during a first analysis, while chapter 8 contains the actual results, both accompanied by many histograms and graphs. Some discussion, a brief conclusion and recommendations for future research can be found in chapters 9, 10 and 11 respectively.

Finally, I would like to thank my supervisor Koen Herlaar for giving me the opportunity to study this subject. Furthermore, I would like to thank Brenda van Daelen, Iris den Ouden, Mignonne Fakkel and other colleagues at the Netherlands Forensic Institute for their help and support.

Special thanks go to Joost Vlek from Advanced Security Solutions BV for making the Regula Magmouse Model 4197 available to the Netherlands Forensic Institute.

Miriam Mieremet The Hague, 2013


Contents

Abstract

1 Introduction to questioned document examination

2 Technology of a laser printer
2.1 The electrophotographic process
2.2 Mono versus dual component developer

3 Reference material

4 Measuring device
4.1 Visualization of magnetic properties
4.2 Measurements of magnetic properties
4.3 Soft- versus hard-magnetic properties
4.4 Variation in measurements

5 Research method
5.1 Measurements with the Magmouse
5.2 Correlation between flux and magnetic induction
5.3 Likelihood-ratio framework
5.4 Validity of the likelihood-ratio framework

6 Calculations in Matlab

7 First analysis
7.1 Correlation between flux and magnetic induction
7.2 Diversity of the measurements
7.3 Linear and logarithmic scale
7.4 Probability density function
7.5 Scoring method

8 Results & Interpretation
8.1 Black magnetic toners
8.2 Black non-magnetic toners
8.3 Colored toners
8.4 Multiple samples
8.5 Multiple measurements
8.6 Different fonts

9 Discussion

10 Conclusion

11 Recommendations

Appendix A
Appendix B
Appendix C

References


Abstract

Magnetism used to be a qualitative property to discriminate between different laser printers. Nowadays even small magnetic activity can be measured accurately, so the question has arisen whether magnetism can be used to discriminate on a quantitative basis.

This study uses many samples from both magnetic and non-magnetic toners to answer this question. For each sample the flux and magnetic induction are measured with the Regula Magmouse Model 4197; the measurements can then be mutually compared with the help of several distance measures and correlation measures.

With the help of the log-likelihood-ratio cost the author has chosen the best comparison method: the 1-norm in a finite vector space, with data on a logarithmic scale and the kernel density estimate as probability density function.

The author has found that only black magnetic toner particles can be discriminated by magnetism on a quantitative basis, with the likelihood ratio as evidential power. Multiple samples or multiple measurements in one comparison appear to improve the results, and there are also positive signs for comparing similar but different fonts.

More research is necessary before conclusions can be used in legal cases, but magnetism could then contribute to several forensic topics in questioned document examination.


1

Introduction to questioned document examination

At the Netherlands Forensic Institute in The Hague a lot of forensic investigation is done. A team within the department Digital Technology and Biometrics is specialized in questioned document examination. The forensic document examiners work for example on financial documents, contracts and anonymous letters.

A lot of different topics can be addressed during a questioned document examination:
• Visualizing indented writing;
• Reconstruction of documents;
• Mutual comparison of documents;
• Linking documents to a machine;
• Authenticity of documents;
• Origin of anonymous letters.

With a mutual comparison of documents examiners try to find out whether two or more documents have the same origin. In case of a written text the ink can be chemically analyzed, the paper can be analyzed and last but not least the handwriting can be compared.

Where people often have specific aspects in their handwriting, like pressure and curvature, machines do not have specific characteristics due to mass production. For example, all printers of the same type share the same common characteristics, like printing technique and resolution. The more characteristics the documents have in common, the higher the evidential power that the documents have the same origin.

In the past the magnetic properties of fused toner particles have come up as a characteristic for laser printers. Earlier it was only possible to test for magnetism on a qualitative basis. However, with the introduction of more modern magneto-optical visualizers, it is now possible to test for magnetism on a quantitative basis.

The question now is whether these quantitative measurements give new insight in questioned document examination: Is it possible to use magnetism to discriminate between different laser printers on a quantitative basis?

When the answer to this question turns out to be yes, this study will contribute to several above-mentioned topics, e.g. mutual comparison of documents, linking documents to a machine and authenticity of documents.


2

Technology of a laser printer

Before investigating the magnetic properties of fused toner particles, it is useful to know the cause of the magnetic properties. For that reason the technology of a laser printer is treated in this chapter. The first section gives insight to the general process of a laser printer, while the second section explains why some toner particles are magnetic and others are not.

2.1

The electrophotographic process

Laser printers have been able to produce printouts from computers since the late 1970s. The technique is called the electrophotographic process and consists of six process steps that are extensively described by Williams (1993). In this section an overview of these process steps is given, which is graphically shown in figure 2.1.

Initially the photoconductor drum, or shortly drum, is uncharged. Then a corona device charges the drum and creates a uniform surface potential (1). Second, the drum is exposed in a specific pattern consistent with the image to be printed: a flash lamp draws the image in one go (2a), while a laser beam forms the image spot by spot (2b). The magnetic brush then transfers the magnetic developer towards the drum and leaves toner particles at the exposed spots (3).

After the image is developed on the drum, the drum almost touches the plain paper and toner particles are transferred from the drum to the paper with help of an attractive electric field from the transfer corona (4). Fusing the toner by heat and pressure will finally make the image permanent (5).

It is not uncommon that some toner particles are left on the drum. With an erase lamp the surface of the drum is completely discharged (6a) and the remainder of the toner is removed (6b). The drum is now ready for a new image.


2.2

Mono versus dual component developer

In the previous section it is explained that the magnetic brush ‘transfers the magnetic developer towards the drum’. These days two sorts of developers are commonly used in laser printers. Dual component developer consists of toner particles and carrier beads. The toner particles are charged by triboelectrification and adhere to the much larger carrier beads. The carrier beads are magnetic and can be transported easily under influence of a permanent magnet towards the drum. Near the drum the toner particles are rubbed from the carrier beads and transferred to the drum, while the carrier beads stay behind (see figure 2.2).

Figure 2.2 Developing unit for a dual component developer (Williams, 1993, p.122)

The toner-carrier adhesion is not easily broken, so the toner particles do not always transfer from the carrier beads to the drum. This problem is eliminated with the use of mono component developer. This developer consists of magnetic toner particles, so carrier beads are no longer necessary in the developing unit (see figure 2.3). The magnetic properties are usually caused by the magnetic pigment iron oxide (Fe3O4) in the toner particles.

Figure 2.3 Developing unit for a mono component developer (Takahashi, 1982, p.254)

The difference between both developers is measurable, because the printed image of a mono component developer will show some magnetic activity, while the printed image of a dual component developer will hardly do so. These properties will be used in the present study.


3

Reference material

With the background theory in mind, it is now necessary to gather reference material. Many reference pages are available for all sorts of goals. Some contain text and others contain pictures. Some are black and white and others are in full color. Which reference pages are well suited for a certain study depends on availability and properties.

At the NFI a major collection of reference pages from the ‘Kwaliteitskring Documenten en Betaalmiddelen’ is available, printed by both inkjet printers and laser printers from 2001 to 2006. A scaled version of the reference page is given in Appendix A.1. This particular reference page will be referred to as reference page A from now on. From the collection, 113 prints from 70 different color laser printers are used in the study into magnetic properties of toner particles.

In Appendix A.2 and A.3 two other reference pages (named reference page B and C) are added. Reference page B contains a text that originally comes from handwriting examination. It shows the whole alphabet and all numeric characters. Reference page C is an altered version of a testprint of the company ImageXpert. The pages are designed by Herlaar (2010) especially for the Printer Project Primex.

All employees of the NFI were asked to print reference page B four times and reference page C once on their inkjet printer during 2009 and 2010. However, some prints from laser printers were collected as well. Those 72 reference pages B and 7 reference pages C from 18 different laser printers can be used in the present study.

Because more than 80% of those printers were color laser printers, a third collection was made in 2013, especially for this study. More test prints, mostly from black and white laser printers, were requested from friends and family of the author. This request resulted in a collection of 16 copies of reference page A, 136 copies of reference page B and 16 copies of reference page C from 35 different laser printers.

The total collection of 360 reference pages contains a wide range of brands. Figure 3.1 shows that HP, Samsung and Xerox represent more than half of the collection. HP and Samsung are often used in households, while Xerox is a brand common for companies.


Microsoft Excel is used for keeping a database. On each row the information of one print is gathered (see figure 3.2), from which brand, type and date are most important.


4

Measuring device

Besides reference material, a measuring device is also very important. In figure 4.1 the Regula Magmouse Model 4197 is shown. This magneto-optical visualizer is developed by the company Regula, which is based in Belarus.

The original purpose of the Magmouse is detection of counterfeit money, forged cheques and forged identity papers.

However, the device might also be useful in questioned document examination.

The device visualizes magnetic properties of ink or toner, and is also able to measure the magnetic induction and flux.

4.1

Visualization of magnetic properties

The device can be used in combination with CADR software. The CADR software has a direct link with the Magmouse, which makes it possible to import continuous images, like a video camera does. However, the Magmouse does not show the paper and toner, but only the magnetic properties. In figures 4.2 and 4.3 images of a non-magnetic toner and a magnetic toner are shown. These pictures show that it is possible to use magnetism to discriminate between different laser printers on a qualitative basis.

Figure 4.2 Image of a non-magnetic toner

Figure 4.3 Image of a magnetic toner


4.2

Measurements of magnetic properties

The question to be answered in this report is whether it is possible to use magnetism to discriminate between different laser printers on a quantitative basis. The CADR software has a special measurement module implemented that can be used to answer this question. For every pixel the magnetic field in normal direction is measured, the numbers are scaled if necessary and the results are plotted in a histogram together with an estimated flux. Again there is a major difference between a non-magnetic toner and a magnetic toner (see figures 4.4 and 4.5), only this time it is possible to quantify the difference in an objective way. In section 5.3 several options for this quantification are discussed.

Figure 4.4 Histogram of magnetic induction of a non-magnetic toner

Figure 4.5 Histogram of magnetic induction of a magnetic toner



4.3

Soft- versus hard-magnetic properties

The Magmouse consists of a removable magnet that causes an external magnetic field. This removable magnet is called a magnetic bias system (see figure 4.6). When the magnetic bias system is attached to the Magmouse the external magnetic field magnetizes the toner particles. This magnetic field in normal direction is then measured by the CADR software.

After removing the magnetic bias system, soft-magnetic materials do not sustain their magnetic field, while hard-magnetic materials do. This change is also measurable by the CADR software and makes it possible to discriminate between soft-magnetic and hard-magnetic materials.

Note that the external magnetic field from the magnetic bias system is not measured by the CADR software, because this field is not in a normal direction, but in a tangential direction.

Figure 4.6 Removable magnetic bias system (Regula, n.d., p.7)

4.4

Variation in measurements

In a comparative study the variation is expressed in the degree of repeatability and reproducibility. The repeatability is characterized by keeping all conditions the same, while for reproducibility the effect of each condition is examined by changing only one condition at a time. Of course the same sample is measured every time: ‘Bob’, Times New Roman, 12pt, HP Laserjet P1006, print #1.

In this study the following conditions are considered:
• Location
• Measuring area
• Researcher

Location

In the laboratory of questioned document examination many tables, chairs and electrical equipment are available. Some parts influence the magnetic field in the laboratory, so the location of the measurements can influence the results.

The standard location was on a small wooden table. For the test on reproducibility ten different locations have been chosen: three locations on a small wooden table, three locations on the floor, three locations on a table with metal parts and one location on a marble surface.

Measuring area

The measuring area is a magneto-optical sensitive element at the bottom of the Magmouse. The element is 14×18 mm, which corresponds to 1024×1280 pixels. Because it is possible to select fragments, it is necessary to know whether the location in the measuring area can influence the results. For that reason ten different locations in the measuring area are chosen (see figure 4.8).

Researcher

Also a test with ten different researchers was done. However, this test appeared to be unusable, because not only the researcher changed, but also the measuring area. This contradicts the rule that only one condition at a time is allowed to change. For that reason these results are ignored in this study.

Figure 4.8 Ten different locations in the measuring area

Repeatability

The repeatability test consists of 30 measurements on a wooden table (location 2 in the laboratory) in the upper left corner of the measuring area (location 1 in the measuring area). The results are shown in the left graph of figure 4.9. With help of Matlab it is easy to find the mean (60.4 mT) and the standard deviation (2.4 mT).

Reproducibility

The reproducibility test was supposed to consist of three times 10 measurements, but for the above-mentioned reason only two times 10 measurements are discussed. The right graph in figure 4.9 shows two boxplots that represent the variation when the location is changed and the variation when the measuring area is changed. Matlab shows a slightly higher mean (62.9 mT) and a similar standard deviation (2.1 mT) for the changing location, but the difference is much bigger when the measuring area changes: the mean is then 69.6 mT and the standard deviation even 4.6 mT. In a comparative study the mean is less important than the standard deviation. For that reason we may conclude that the location will not influence the results much, but the measuring area will.

Figure 4.9 Variation in flux

All measurements for the rest of the study are done at location 2 in the laboratory and at location 1 in the measuring area, so for the flux a standard deviation of 2.4 mT needs to be taken into account under the assumption of homoscedasticity (equal standard deviation for all samples).
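The mean and corrected sample standard deviation used throughout this section are simple to compute. A Python sketch (the study itself used Matlab; the readings below are illustrative values, not the actual flux data):

```python
def mean_and_std(xs):
    """Mean and corrected (n-1) sample standard deviation of a list of values."""
    n = len(xs)
    m = sum(xs) / n
    s = (sum((x - m) ** 2 for x in xs) / (n - 1)) ** 0.5
    return m, s

# Illustrative flux readings in mT (not the actual measurements from the study)
readings = [58.1, 60.2, 61.0, 59.5, 62.4, 60.9, 59.8, 61.3]
m, s = mean_and_std(readings)
```

The corrected (n−1) denominator matches the corrected sample standard deviation used later for the folded normal distribution.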


5

Research method

In the prior chapters the reference material and the measuring device are introduced. The next step is to measure all reference material. The choices that have been made concerning the measurements are discussed in section 5.1.

When all data is collected it is important to check which part is useful for analysis. In section 5.2 the correlation between the flux and magnetic induction is introduced to see whether the flux gives additional information besides the magnetic induction.

Section 5.3 deals with the statistical analysis based on the magnetic induction. The likelihood-ratio framework is briefly explained and ten different comparison methods are introduced.

The chapter ends with a metric for validity of the likelihood-ratio framework.

5.1

Measurements with the Magmouse

The measuring device is available to the NFI for only a short time. All information has to be gathered during a period of six weeks, so it is important to make the right choices before starting the measurements.

The reference material consists of three different reference pages. On reference page A it is possible to take square samples of 500 by 500 pixels in yellow, magenta, cyan and black (see figure 5.1). These samples are easy to measure and appear on reference page C as well. Besides the comparison of those four samples with the same samples from another laser printer, it might also be possible to compare between the colors on one page.

Figure 5.1 Four samples of 500 by 500 pixels

Reference page B consists only of black text in different fonts. Because three different fonts are used, it is convenient to take one sample from each font. To be able to compare between the fonts on one page, it is necessary to take a word that appears in all fonts. The word ‘Bob’ satisfies this condition and can be captured in a sample of 500 by 250 pixels (see figure 5.2). On line 1 the name is in Times New Roman (12pt), on line 20 the name is in Arial (11pt) and finally on line 23 the name is in Calibri (12pt).

Figure 5.2 Three samples of 500 by 250 pixels

The samples have been chosen, so now it is possible to actually start the measurements. The measurements are done according to a clear script that is attached to this report as appendix B.1. In this script a transparent model of the measuring device is used to put the measuring device in the right position. A non-transparent version of the model is given in appendix B.2.

The values of the flux and the histogram of magnetic induction are saved in the same database that was put forward in chapter 3. The flux is a single value and the histogram consists of 33 values on a logarithmic scale.


5.2

Correlation between flux and magnetic induction

The next step is to find out which data can be used for questioned document examination and which data can be neglected. The CADR-software gives measurements of both the flux and the magnetic induction in the normal direction. An example of the data from the histogram of magnetic induction is presented in table 5.1.

i  |   1    2    3    4    5    6    7    8    9   10   11
Bi | -2.8 -2.7 -2.6 -2.5 -2.4 -2.2 -2.0 -1.7 -1.4 -1.1 -0.9
Ni |  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.4  2.6

i  |  12   13   14   15   16   17   18   19   20   21   22
Bi | -0.7 -0.5 -0.3 -0.2 -0.1  0.0  0.1  0.2  0.3  0.5  0.7
Ni |  3.5  4.3  4.9  5.2  4.8  5.8  4.7  5.3  5.0  4.6  4.1

i  |  23   24   25   26   27   28   29   30   31   32   33
Bi |  0.9  1.1  1.4  1.7  2.0  2.2  2.4  2.5  2.6  2.7  2.8
Ni |  3.5  2.5  1.6  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

Table 5.1 Data from the histogram of magnetic induction of the black square sample of a copy of reference page A from the Canon CP 660 from 05-04-2001 (Bi in mT, 10^Ni in pixels).

The flux (φ) and the magnetic induction in the normal direction (Bn) are related in the following way:

\(\varphi = \int_A B_n \, dA,\)

where the physical unit of the flux is the weber (Wb) and the physical unit of the magnetic induction is the tesla (T = Wb/m²). It is thus also possible to calculate the flux from the magnetic induction. Agalidi et al. (2011) state that the physical length and width of a pixel is 16 µm for this particular device. Because the physical area of a pixel then equals A = 256·10⁻¹² m²†, it is now possible to calculate the flux with the slightly adapted equation

\(\varphi = A \sum_{i=1}^{33} 10^{N_i} B_i.\)

When the estimated flux and the calculated flux are highly correlated, it is no longer necessary to consider both variables in the analysis, because the estimated flux does not supply additional information. Indeed, section 7.1 shows that both variables are highly correlated, with a correlation coefficient close to one (r = 0.9994).

For that reason the statistical analysis that will be introduced in the next section is done based on the magnetic induction.

† Regula confirmed the physical area of a pixel (256·10⁻¹² m²).
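The adapted equation can be evaluated directly from a stored histogram row. A minimal Python sketch, assuming B holds the bin values Bi (in mT) and N the exponents Ni as in table 5.1:

```python
PIXEL_AREA = 256e-12  # m^2: a 16 um x 16 um pixel, as stated by Agalidi et al.

def flux_from_histogram(B, N):
    """phi = A * sum_i 10^(N_i) * B_i.
    B: bin values of magnetic induction; N: log10 of the pixel count per bin.
    The unit of the result follows the unit of B (mT * m^2 for B in mT)."""
    return PIXEL_AREA * sum((10.0 ** n) * b for b, n in zip(B, N))
```

Comparing this calculated flux with the flux estimated by the CADR software is exactly the correlation check of section 7.1.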


5.3

Likelihood-ratio framework

As in many sciences, statistics is also used in forensic science. There is a distinction between descriptive statistics and inferential statistics. The latter is used to draw conclusions from data, and that is exactly what is needed in forensic science. The likelihood-ratio framework is part of inferential statistics. In this section the general theory is applied to questioned document examination with respect to the magnetic properties of toner particles.

Hypotheses

In questioned document examination often two documents are compared. The question in this study is whether the documents come from the same type of printer. For that reason the hypotheses can be formulated in the following way:

H0 Documents A and B come from the same type of printer, H1 Documents A and B come from different types of printers.

Score

In this report the magnetism of toner particles is measured and from both documents the flux and the magnetic induction are available, of which the flux has already been discarded. Now it is necessary to quantify the difference between both documents, but there are several possibilities for this quantification. In this study ten different options are discussed, which can be subdivided into three sets: distance measures in a finite vector space, distance measures in a function space and correlation measures.

Distance measures in a finite vector space

Each histogram of magnetic induction measured with the Magmouse and made by CADR, is represented by a row vector of dimension n=33. Therefore it is logical to compare the two row vectors when two documents are compared.

Haase (2012) presents an introduction to inner products, norms and metrics for finite vector spaces. The best known distance measure for two vectors in a finite vector space is the norm. In table 5.2 the p-norms are shown for p=1,2,∞. The distance between two documents printed by the same type of printer is likely to be about zero, while the distance between two documents printed by different types of printers is likely to be bigger.

Table 5.2 Different p-norms (p = 1, 2, ∞) for vectors x and y:

1-norm: \(\sum_{i=1}^{n} |x_i - y_i|\)
2-norm: \(\left(\sum_{i=1}^{n} (x_i - y_i)^2\right)^{1/2}\)
∞-norm: \(\max_{i=1..n} |x_i - y_i|\)

Distance measures in a function space

The histogram can also be seen as a function in a function space, because the values represent a piecewise linear function. After all, each value in the row vector belongs to a certain magnetic induction and the dots are connected by straight lines. The p-norm is also defined for functions. The sum is replaced by an integral and the dimension n = 33 is replaced by the interval [a,b] = [-2.8, 2.8], as can be seen in table 5.3. The distance between two functions is thus given by the p-norm of the difference. Due to the similarity with the distance measure in a finite vector space, it is not strange that the expected values are also zero or bigger in this case. The infinity norms are even equal, because the function is piecewise linear.

Table 5.3 Different p-norms (p = 1, 2, ∞) for functions f(x) and g(x):

1-norm: \(\int_a^b |f(x) - g(x)| \, dx\)
2-norm: \(\left(\int_a^b (f(x) - g(x))^2 \, dx\right)^{1/2}\)
∞-norm: \(\max_{x \in [a,b]} |f(x) - g(x)|\)
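For the finite-dimensional case, the three norms of table 5.2 are one-liners. A Python sketch for two equal-length histogram vectors:

```python
def p_norm_distances(x, y):
    """1-, 2- and infinity-norm of the difference of two equal-length vectors
    (the distance measures of table 5.2)."""
    d = [abs(a - b) for a, b in zip(x, y)]
    one = sum(d)                      # 1-norm: sum of absolute differences
    two = sum(e * e for e in d) ** 0.5  # 2-norm: Euclidean distance
    inf = max(d)                      # infinity-norm: largest single difference
    return one, two, inf
```

Two documents from the same type of printer should give values near zero for all three norms.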


Correlation measures

A correlation measure is completely different from a distance measure. According to Dekking (2005) the correlation between random variables is a dimensionless quantification of how they influence one another.

Pallant (2010) offers two different kinds of correlation coefficients. Pearson's product-moment correlation coefficient (r) assigns a value between -1 and 1 to the degree of linearity between two variables. Spearman's rank-order correlation coefficient (ρ) assigns a value between -1 and 1 to the degree of monotonicity between two variables. These correlation coefficients can also be used when two different documents are compared: the row vectors of dimension n = 33 are put in a scatterplot with one document on the x-axis and the other document on the y-axis. This results in a scatterplot with 33 dots.

Two documents that come from the same type of printer should have similar row vectors, so all dots should lie around the line y = x. In this case both correlation coefficients should be approximately 1, because the line y = x is both linear and monotonically increasing.

In order to have a score that is zero or bigger, like the distance measures, the correlation measures are slightly altered: \(s_1 = 1 - r\) and \(s_2 = 1 - \rho\). However, both correlation coefficients have disadvantages. Pearson's product-moment correlation coefficient is also equal to 1 for other linear functions, a characteristic that is unwanted. On the other hand, Spearman's rank-order correlation coefficient is equal to 1 for all monotonically increasing functions. A solution to these disadvantages is to find the estimated regression line y = ax + b with the method of least squares and compare this line with the line y = x. A measure for the distance between the lines is the sum of the distance at 0 and the distance at m, which is an appropriate upper bound for all measurements (see figure 5.3).

This comparison score needs to be made symmetric by some scaling, because comparing document A with document B should result in the same score as comparing document B with document A.

Figure 5.3 Scatterplot with regression line

Some trial and error results in the following two symmetric correlation measures:

\(s_3 = \frac{|b| + |am + b - m|}{a^{1/2}} \quad \text{and} \quad s_4 = \frac{|b| + |am + b - m|}{1 + a}.\)

One should however note that these scores do not take the distance between the dots and the regression line into account, while correlation coefficients do. For that reason all correlation measures will be examined in chapter 7.
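The four correlation-based scores can be prototyped in a few lines. A plain-Python sketch (function names and the explicit upper-bound parameter m are choices of this sketch, not of the report; s3 assumes a positive slope a):

```python
import math

def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(x):
    """Ranks of the elements of x, with average ranks for ties."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def scores(x, y, m):
    """s1 = 1 - r, s2 = 1 - rho, and s3, s4 comparing the least-squares
    line y = ax + b with y = x at the points 0 and m (section 5.3)."""
    s1 = 1 - pearson(x, y)
    s2 = 1 - pearson(ranks(x), ranks(y))  # Spearman's rho via ranks
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    b = my - a * mx
    num = abs(b) + abs(a * m + b - m)  # distance at 0 plus distance at m
    s3 = num / math.sqrt(a)
    s4 = num / (1 + a)
    return s1, s2, s3, s4
```

For two identical vectors all four scores are zero, as expected for documents from the same type of printer.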


Probability density function

The best scoring method has not been defined yet, so for every scoring method the same procedure is followed. All combinations in the database are given a score. Afterwards these scores are split into two sets: one set with all scores that come from two documents printed by the same type of printer and one set with all scores that come from two documents printed by different types of printers. For convenience both sets are converted to a histogram and a probability density function is fitted to the data.

Folded normal distribution

A common choice for the probability density function is the normal distribution. This probability density is a smooth function and is known to be an appropriate probability density function in many scientific topics.

However, the normal distribution allows negative scores, which is not possible in this study. For that reason it is necessary to switch to the folded normal distribution: a normal distribution folded at x = 0 that respects the sample mean and corrected sample standard deviation (see figure 5.4).

Figure 5.4 Example of the folded normal distribution and its underlying normal distribution

Let µf and σf be the mean and standard deviation of the folded normal distribution, and µ and σ the mean and standard deviation of the underlying normal distribution. To plot the probability density function of the folded normal distribution, µ and σ² are necessary, but only µf and σf can be estimated, by the sample mean and the corrected sample standard deviation:

µf = (1/n) Σ_{i=1}^{n} x_i   and   σf = sqrt( 1/(n−1) Σ_{i=1}^{n} (x_i − µf)² ).
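The forward relation, from (µ, σ) of the underlying normal distribution to (µf, σf) of the folded distribution, has a closed form; Leone's table inverts it. A Python sketch with illustrative parameter values, checked by folding a large simulated normal sample at x=0:

```python
import numpy as np
from math import erf, exp, pi, sqrt

mu, sigma = 2.0, 1.5  # illustrative parameters of the underlying normal

# Closed-form mean and standard deviation of |X| for X ~ N(mu, sigma^2)
Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))
mu_f = (sigma * sqrt(2.0 / pi) * exp(-mu**2 / (2.0 * sigma**2))
        + mu * (1.0 - 2.0 * Phi(-mu / sigma)))
sigma_f = sqrt(mu**2 + sigma**2 - mu_f**2)

# Monte Carlo check: fold a large normal sample at x = 0
x = np.abs(np.random.default_rng(0).normal(mu, sigma, 1_000_000))
```

The sample mean and corrected sample standard deviation of the folded sample agree with the closed-form µf and σf to about two decimals.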


Luckily, Leone (1961) explains how to estimate µ and σ with help of µf and σf. In table 5.4 the ratios σf/σ and µ/σ can be looked up with help of the ratio µf/σf, from which µ and σ can be estimated.

Table 5.4 Ratios σf/σ and µ/σ corresponding to various ratios of µf/σf (Leone, 1961, p.545)‡

The ratio µf/σf will hardly ever be smaller than 1.3236, so the first line of ratios also covers this situation. On the other hand, when the ratio µf/σf is bigger than 3.00, folding hardly influences the mean and standard deviation, so σ=σf and µ=µf are appropriate.

Kernel density estimate

A second option for the probability density function is the use of a kernel density estimate. To respect the impossibility of negative scores, a folded normal kernel should be applied instead of a normal kernel. One takes µf=xi and an appropriate bandwidth σf for each element xi in the dataset. The procedure above is then used to find the underlying normal kernels. Finally the kernel density estimate is produced by summing all folded normal kernels.


Within-and-between variation

For both sets it is now possible to plot the within-and-between variation. The probability density function of the set of scores from two documents printed by the same type of printer is called the within curve, while the probability density function of the set of scores from two documents printed by different types of printers is called the between curve.

Likelihood ratio

Note first that the evidence equals the comparison score of the two questioned documents. The likelihood ratio is then an expression for the evidential power, given by the quotient of the probability of the evidence given that both documents are printed by the same type of printer over the probability of the evidence given that both documents are printed by different types of printers.

Some probability analysis results in the fact that the likelihood ratio must then be equal to the quotient of the within curve over the between curve. Because there exists no actual evidence in this study, the likelihood ratio will be plotted with the evidence as a variable:

LR(E) = P(E|H0) / P(E|H1) = f_w(E) / f_b(E).
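A minimal Python sketch of this quotient (the study's own implementation is in Matlab and additionally applies the Leone correction to each kernel; here a plain folded-normal kernel with a fixed, hypothetical bandwidth is used):

```python
import numpy as np

def folded_kernel(x, center, bw):
    # normal kernel with mean `center` and bandwidth `bw`, folded at x = 0
    c = 1.0 / (bw * np.sqrt(2.0 * np.pi))
    return c * (np.exp(-(x - center)**2 / (2.0 * bw**2)) +
                np.exp(-(x + center)**2 / (2.0 * bw**2)))

def kde(x, scores, bw):
    # kernel density estimate: mean of one folded kernel per score
    return np.mean([folded_kernel(x, s, bw) for s in scores], axis=0)

def likelihood_ratio(e, within_scores, between_scores, bw):
    # LR(E) = f_w(E) / f_b(E)
    return kde(e, within_scores, bw) / kde(e, between_scores, bw)

within = [7.0, 8.0, 8.5, 9.0]      # illustrative same-printer scores
between = [5.0, 15.0, 25.0, 40.0]  # illustrative different-printer scores
```

At a score near the within peak the ratio exceeds one, supporting the same-printer hypothesis; far away from it the ratio drops below one.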

5.4 Validity of the likelihood-ratio framework

According to Morrison (2011) the validity of the likelihood-ratio framework cannot be determined by correct classifications and classification errors alone, because the evidential power is not taken into account. He therefore introduces the log-likelihood-ratio cost of Brümmer and du Preez as a metric of validity.

For all N pairs of documents printed by the same type of printer the likelihood ratio (LRequal) is estimated, and for all M pairs of documents printed by different types of printers the likelihood ratio (LRdiff) is estimated. The log-likelihood-ratio cost is then given by the mean of two means of penalties:

Cllr = (1/2) [ (1/N) Σ_{i=1}^{N} log2(1 + 1/LRequal,i) + (1/M) Σ_{j=1}^{M} log2(1 + LRdiff,j) ].

The closer the log-likelihood-ratio cost is to zero, the better the validity. With help of the log-likelihood-ratio cost it is thus possible to choose the best scoring method.
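Computed directly from two arrays of likelihood ratios, the cost looks as follows in Python (an illustrative sketch; the study computes it in Matlab):

```python
import numpy as np

def cllr(lr_equal, lr_diff):
    # log-likelihood-ratio cost: penalises small LRs for same-printer
    # pairs and large LRs for different-printer pairs
    lr_equal = np.asarray(lr_equal, dtype=float)
    lr_diff = np.asarray(lr_diff, dtype=float)
    return 0.5 * (np.mean(np.log2(1.0 + 1.0 / lr_equal)) +
                  np.mean(np.log2(1.0 + lr_diff)))

print(cllr([1.0], [1.0]))  # an uninformative system (always LR = 1) costs exactly 1.0
```

The cost shrinks toward zero as same-printer likelihood ratios grow large and different-printer likelihood ratios grow small.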


6 Calculations in Matlab

In this research method five main steps are distinguished in Matlab (see figure 6.1). First all data are imported from Excel into Matlab. Then all data are compared and every pair gets a score. Because some pairs of documents are printed by the same type of printer and other pairs by different types of printers, the data are sorted into these two groups. Then a probability density function is fitted to the data and a graphic overview of the results is plotted. Finally the validity is calculated.

Figure 6.1 Flowchart of Matlab

Each step has its own function file and all function files are available in Appendix C. Information on existing functions in Matlab (written in italics, e.g. norm.m) can be found in the Documentation Center (Mathworks, n.d.).

Loading

The database in Excel contains all information on one worksheet. To give better access to this information, all data are transferred to a new Excel file with nine worksheets, sorted by reference page and magnetic properties.

The existing function xlsread.m imports data from one worksheet at a time. The brand and type are saved in a string array, while scalars are saved in a matrix.

In this step the decision is also made whether to use the data on a linear scale (10^Ni) or on a logarithmic scale (Ni), as given by CADR.

Comparing

The next step is to compare all samples with each other. The scoring method determines which function file is used. ComparingVectorSpace.m contains all distance measures in a finite vector space, ComparingFunctionSpace.m contains all distance measures in a function space and ComparingCorrelation.m contains all correlation measures.

Distance measures in a finite vector space can use the existing function norm.m, while distance measures in a function space are estimated with help of the trapezoidal rule (see figure 6.2). At last ComparingCorrelation.m uses the existing function corr.m for correlation measures s1 and s2 and anonymous functions for correlation measures s3 and s4. Note that the anonymous functions incorporate m=10^6 in case of a linear scale and m=6 in case of a logarithmic scale.

Figure 6.2 The area under a function is estimated with help of trapezoids
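Both kinds of distance measures can be sketched in Python (the study's actual implementations are the Matlab files named above); the trapezoidal rule is written out explicitly so that no particular library version is assumed:

```python
import numpy as np

def vector_distance(a, b, p):
    # p-norm distance in a finite vector space (cf. norm.m)
    return np.linalg.norm(np.asarray(a, float) - np.asarray(b, float), p)

def trapezoid(y, x):
    # area under y(x): sum of the trapezoid areas between grid points
    y, x = np.asarray(y, float), np.asarray(x, float)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x) / 2.0))

def function_distance(x, f, g, p):
    # p-norm distance in a function space: integrate |f - g|^p with
    # the trapezoidal rule, then take the p-th root
    diff = np.abs(np.asarray(f, float) - np.asarray(g, float))
    return trapezoid(diff**p, x) ** (1.0 / p)

x = np.linspace(0.0, 1.0, 1001)
d = function_distance(x, x, np.zeros_like(x), 2)  # ~ sqrt(1/3)
```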



Sorting

One way to sort all scores is to compare brand and type. The list with the brand and type of all printers is already imported in step 1, so all that remains is to compare them with help of the function strcmp.m. A for-loop sorts all comparisons into two column vectors ‘Equal’ and ‘Diff’.

Plotting

When the probability density function of some folded normal distribution needs to fit the data, it is necessary to estimate µ and σ. First the sample mean µf and the corrected sample standard deviation σf are calculated with help of mean.m and std.m. The function file fitting.m then automatically compares the ratio with table 5.4 and performs the appropriate subsequent steps to calculate µ and σ. At last the probability density function can be calculated with help of normpdf.m.

However, the kernel density estimate needs to fit a folded normal distribution with an appropriate bandwidth for each kernel (determined by trial and error to be 0.8*dx, where dx is the bin width of the histogram). fitting.m is thus used repeatedly in this procedure. All kernels are summed and form the kernel density estimate.

Next a graphic overview of all results is given, because a picture is worth a thousand words. The existing function hist.m gives the data for the histogram, while bar.m converts the data into a graph. The histograms are combined with the probability density functions plotted with plot.m. Finally these functions are also used to plot the within-and-between variation and the graph of the likelihood ratio.

Validating

Finally the performance of the likelihood-ratio framework is calculated in two steps. First for each score the likelihood ratio is estimated from its graph. Then the log-likelihood-ratio cost is calculated according to its equation.


7 First analysis

Before it is possible to discuss the results based on all data and their interpretation, it is necessary to make some important choices. Section 7.1 explains the correlation between the flux and the magnetic induction. Section 7.2 then determines which data are most promising. Sections 7.3 to 7.5 treat the decision-making on, respectively, the scale, the probability density function and the score, based on the most promising data.

7.1 Correlation between flux and magnetic induction

In section 5.2 the physical relation between flux and magnetic induction is introduced. This relation already suggests that the flux is probably correlated with the magnetic induction. When this is the case, the flux can be neglected and the analysis can be done with use of the measurements of magnetic induction only.

A total of 1310 samples are measured with the Magmouse. For every sample the flux and the magnetic induction are measured. The flux is also calculated from the magnetic induction. The measured flux and the calculated flux should give similar results, so each sample should produce a dot on the line y=x in the scatterplot. Figure 7.1 shows that this is indeed the case for all samples.

Figure 7.1 Scatterplot of measured flux vs calculated flux (Number of measurements n =1310, Correlation r=0.9994)

This shows that the flux and the magnetic induction are highly correlated, so it is justified to continue with the magnetic induction data only in the rest of this study.


7.2 Diversity of the measurements

The next step is to see what data should be used in the decision-making in the last three sections. This step is very important, because not all data will provide proper results.

First, table 7.1 shows that reference page B has the most prints for both non-magnetic and magnetic toners. Because more prints result in higher reliability, table 7.1 favours reference page B.

                    Non-magnetic toners (pages / printers)   Magnetic toners (pages / printers)
Reference page A    123 / 83                                 6 / 4
Reference page B    136 / 34                                 72 / 19
Reference page C    23 / 23                                  0 / 0

Table 7.1 Subdivision of the reference pages

The next question is whether to use the non-magnetic toners or the magnetic toners. Helpful in this decision is the diversity of the magnetic induction. Because the magnetic induction depends on 33 variables, its diversity is hard to show. However, the flux consists of only one variable and is highly correlated with the magnetic induction, so boxplots of the flux represent the diversity (see figure 7.2).

Figure 7.2 Boxplots of the measurements of all reference pages B

The boxplots show that the diversity for magnetic toners is many times bigger than the diversity for non-magnetic toners and also bigger than the standard deviation of the Magmouse. For that reason the set of magnetic toners has the most promising data, and will be used in the rest of this chapter to make some important decisions.


7.3 Linear and logarithmic scale

The data of the histogram of the magnetic induction are saved on a logarithmic scale. This, however, does not mean that the data should be compared on a logarithmic scale. There is a possibility that the data give better results when they are considered on a linear scale.

In figures 7.3 and 7.4 the histograms for the 1-norm in a vector space are shown after comparison on a linear scale and on a logarithmic scale. Because there are fewer peaks in figure 7.4, the logarithmic scale is preferable.

Figure 7.3 Histograms after comparison on linear scale (1-norm, vector space)

Figure 7.4 Histograms after comparison on logarithmic scale (1-norm, vector space)

Similar results are found in figures 7.5 and 7.6, where the 2-norm in a function space is considered, and other scores also lead to the conclusion that a logarithmic scale is preferable.

Figure 7.5 Histograms after comparison on linear scale (2-norm, function space)


7.4 Probability density function

In section 5.3 two possibilities for the probability density function are suggested: the probability density function of the folded normal distribution and the kernel density estimate.

Figure 7.7 gives the histogram for the 2-norm in a function space with the probability density function of the folded normal distribution, while figure 7.8 gives the kernel density estimate. For magnetic toners from the same type of printer both probability density functions are approximately the same, while the functions differ for magnetic toners from different types of printers.

Figure 7.7 Folded normal distribution fitting the histograms (2-norm, function space)

Figure 7.8 Kernel density estimate fitting the histograms (2-norm, function space)

There is one big difference that favours the kernel density estimate over the folded normal distribution. According to the folded normal distribution the probability of a score around zero is much higher than according to the kernel density estimate, while according to the histogram a score of zero hardly ever occurs. The kernel density estimate thus follows the expectation that equal measurements rarely occur when documents come from different types of printers. Other histograms, for example the histogram for the infinity norm, show some asymmetry. No folded normal distribution can fit these histograms, as can be seen in figure 7.9, which again favours the kernel density estimate over the folded normal distribution.


7.5 Scoring method

In the previous sections the logarithmic scale and the kernel density estimate have been chosen. With these choices in mind, the different scores are compared in three subsets: the distance measures in a finite vector space, the distance measures in a function space and the correlation measures. The section ends with an overview of the validity.

Distance measures in a finite vector space

In figure 7.10 the within-and-between variation and the likelihood ratio are plotted for the 1-norm in a finite vector space. The within curve shows a peak, while the between curve is much lower and more spread out. The likelihood ratio is not very high, but its shape is still a positive sign. The validity is represented by a log-likelihood-ratio cost of 0.5665.

Figure 7.10 Within-and-between variation and likelihood ratio (1-norm, vector space)

The 2-norm in a finite vector space has a similar plot of the likelihood ratio (see figure 7.11). However, the log-likelihood-ratio cost is slightly higher (0.6029), because the interval on which the likelihood ratio is bigger than one captures a larger area beneath the between curve (more type II errors).

Figure 7.11 Within-and-between variation and likelihood ratio (2-norm, vector space)

The infinity norm tells a different story. The within-and-between variation shows many wiggles, and the likelihood ratio is lower on the interval where the within curve is relatively high (see figure 7.12). This is reflected in the validity (Cllr = 0.6386).


Distance measures in a function space

One would expect that the 1-norm and 2-norm in a function space give similar results as in a finite vector space. However, this is not the case for the 1-norm. The within and between curves have the same shape, which results in low likelihood ratios. The effect is a log-likelihood-ratio cost of 0.7955, much larger than that of the 1-norm in a finite vector space.

Figure 7.13 Within-and-between variation and likelihood ratio (1-norm, function space)

The 2-norm does show similarities with the 1- and 2-norm in a finite vector space (figure 7.14). Nevertheless the log-likelihood-ratio cost is higher (Cllr = 0.6572).

Figure 7.14 Within-and-between variation and likelihood ratio (2-norm, function space)

Correlation measures

Besides all distance measures, chapter 5 has also introduced some correlation measures. Pearson’s product-moment correlation coefficient only gives the degree of linearity, which is quite a general formulation. As expected, the results of correlation measure s1 (see figure 7.15) are not sufficient, as the validity (Cllr = 0.6910) confirms.


Spearman’s rank-order correlation coefficient is even more general than Pearson’s product-moment correlation coefficient, denoting the degree of monotonicity. The within curve and between curve in figure 7.16 coincide at multiple points, which makes the likelihood ratio for correlation measure s2 useless. The validity is therefore omitted.

Figure 7.16 Within-and-between variation and likelihood ratio (correlation measure s2)

The third correlation measure is score s3. Figure 7.17 shows a peak in the within curve, while the between curve is relatively low. This seems to be a good measure, and its log-likelihood-ratio cost (Cllr = 0.6085) is in the top 5.

Figure 7.17 Within-and-between variation and likelihood ratio (correlation measure s3)

The fourth and last correlation measure, s4, has the same numerator as correlation measure s3, but its denominator causes a change in range (see figure 7.18). This results in a hardly changed validity (Cllr = 0.6079).


Overview

With help of the validity of all mentioned measures, a choice should be made. In table 7.2 an overview of the results is presented. Based on this table one may conclude that the 1-norm in a finite vector space is the best option.

Score                                Cllr
1-norm (vector space)                0.5665
2-norm (vector space)                0.6029
∞-norm (vector space)                0.6386
1-norm (function space)              0.7955
2-norm (function space)              0.6572
Correlation measure s1 (Pearson)     0.6910
Correlation measure s2 (Spearman)    N/A
Correlation measure s3               0.6085
Correlation measure s4               0.6079

Table 7.2 Validity of all scoring methods


8 Results & Interpretation

In the previous chapter several choices have been made: all data are evaluated on a logarithmic scale, a kernel density estimate with folded normal kernels is used as the probability density function, and the 1-norm in a finite vector space is used as the comparison method.

The present chapter will provide the final results and how to interpret them. Section 8.1 continues with black magnetic toners, while sections 8.2 and 8.3 deal with black non-magnetic toners and colored toners.

Sections 8.4 and 8.5 study possibilities to improve the scoring method, for example the use of multiple samples or multiple measurements. Finally, section 8.6 investigates whether samples with similar fonts and sizes can be compared instead of exact matches only, and whether the method is then still appropriate.

8.1 Black magnetic toners

Samples of black toner do not only occur on reference page B; reference pages A and C also contain black regions. When all samples of black magnetic toners are taken into account, there are 317 within pairs and 7366 between pairs. Of course only exactly the same samples can form a pair.

Applying the likelihood-ratio framework to these pairs results in the graphs in figure 8.1. The within histogram shows a peak at a score of about 8, while the between histogram is lower and more spread out. Both curves follow the histograms nicely, so the likelihood ratio in the fourth graph is a good representation of the black magnetic toners in the database.

Figure 8.1 Results of black magnetic toners

From the third and fourth graph the intervals can be determined on which the evidence favours the null hypothesis over the alternative hypothesis and on which the opposite holds. The boundary between both intervals is given by the score at the intersection of the within and between curves, which is equivalent to the score for which the likelihood ratio equals one. With help of Matlab this score is determined to be 11.1600.
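Locating that boundary amounts to finding a sign change of f_w − f_b on the evaluation grid. An illustrative Python sketch with two synthetic normal curves (the value 11.1600 itself comes from the study's Matlab code and data):

```python
import numpy as np

def crossing_scores(x, f_w, f_b):
    # grid points just before the within and between curves cross,
    # i.e. where the likelihood ratio f_w / f_b passes through one
    d = f_w - f_b
    idx = np.where(np.sign(d[:-1]) != np.sign(d[1:]))[0]
    return x[idx]

# synthetic example: two normal densities with equal spread cross
# exactly halfway between their means
pdf = lambda x, m, s: np.exp(-(x - m)**2 / (2.0 * s**2)) / (s * np.sqrt(2.0 * np.pi))
x = np.linspace(0.0, 20.0, 2000)
cross = crossing_scores(x, pdf(x, 5.0, 2.0), pdf(x, 15.0, 2.0))  # ~ 10
```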



The best way to explain the interpretation is with help of two (fictitious) examples:

Example #1

A threatening letter (printed by a laser printer) has been intercepted by a security service. An investigation has started and a suspect has come up. During a search the police found a laser printer in the suspect’s house. A test print is made with this laser printer and compared to the threatening letter. The comparison results in a score of 8.2 and thus a likelihood ratio of 2.0. A likelihood ratio bigger than one means that the evidence favours the null hypothesis over the alternative hypothesis, and a likelihood ratio close to one means a low evidential power.

This means that whatever you believed before this evidence was presented, you should now consider it 2 times more likely than before that the threatening letter and the test print come from the same type of printer.

Example #2

Two companies signed a contract (printed by a laser printer) in duplicate. One company claims that the other company altered the contract, but cannot show its own copy due to poor record-keeping. The word ‘the’ can be found multiple times in the contract, including in the questioned sentence. These words are compared, resulting in a score of 22.3. Now the likelihood ratio (0.005) is smaller than one, which means the evidence favours the alternative hypothesis over the null hypothesis. So whatever you believed before this evidence was presented, you should now consider it 200 times more likely than before that the contract is printed with help of two different types of printers. Compared to the first example the evidential power is much larger.

The likelihood ratio thus gives the evidential power in favour of one of the hypotheses. Note, however, that further study is necessary before the results shown here may be used in legal cases.

8.2 Black non-magnetic toners

The same method is also used for black non-magnetic toners. Samples are available from reference pages A, B and C. Only exactly the same samples are compared, including the black squares from reference pages A and C.

The result of this subset is shown in figure 8.2. The within and between curves have a similar shape, which makes the likelihood ratio about one. The peak at a score of 40 is probably a round-off error with major consequences. No conclusions can be drawn.


8.3 Colored toners

These days many laser printers contain, besides black toner, also cyan, magenta and yellow toners. All but one of these toners are non-magnetic and show the same diversity as the black non-magnetic toners. Due to the similarities between the colored toners, only the graphs for the magenta toner are shown (figure 8.3). The within and between curves intersect multiple times, so wiggles around one in the likelihood-ratio graph are no exception. Again no conclusions can be drawn.

Figure 8.3 Results of magenta toners

8.4 Multiple samples

Note that section 5.1 describes how three samples are taken from each reference page B. Until now, each sample was handled individually, but it is also possible to combine the three samples: instead of three vectors of length 33, one can compare one vector of length 99. The differences between the within and between curves then become larger, as can be seen in figure 8.4.


One result of combining multiple samples is that the likelihood ratios can become higher (up to 18 instead of 8), which provides a stronger evidential power. On the other hand fewer between pairs are false positives, due to a slight stretch of the between curve. Both consequences contribute to a lower log-likelihood-ratio cost: indeed 0.4531 is an improvement over the 0.5665 found in section 7.5.

8.5 Multiple measurements

Besides multiple samples, it is also possible to do multiple measurements. The Magmouse provides two options: soft-magnetic measurements and hard-magnetic measurements. The resulting vector of length 66 can be compared, and the log-likelihood-ratio cost belonging to figure 8.5 becomes 0.5271. Note that it is also possible to repeat one kind of measurement two or more times.

Figure 8.5 Results of multiple measurements in one comparison

8.6 Different fonts

On reference page B the word ‘Bob’ appears in three different fonts. Earlier only exactly the same samples were compared. This section deals with the question whether different but comparable fonts can be compared as well. First we take a look at the diversity of the measurements. Figure 8.6 shows that the diversity is similar for all fonts, which is necessary for this extension to succeed.

When one allows the comparison between different but comparable fonts and runs the Matlab code, one gets figure 8.7. This extension changes the graphs and the log-likelihood-ratio cost (0.6457) slightly for the worse, but does not make the method unusable.

Figure 8.6 Boxplots of the flux for three different but comparable fonts


9 Discussion

Each study deals with quality issues and assumptions. In the present study a database is used, for which the sample size and the representativeness need to be discussed. Furthermore, two assumptions, the homoscedasticity of the measurements and the bandwidth of the kernels, are discussed. Finally some comments on the performance test are given.

Quality of the database

The database that is used in this study seems large, with 360 prints from 163 laser printers from 11 brands. However, the non-magnetic toners appear to be useless, because all their measurements are approximately the same. Only 78 prints from 23 laser printers from 4 brands are left for the study. This size is large enough for a preliminary investigation, but for a proper representation one should have more prints from more printers.

Besides the sample size another point of interest arises. The database only records the brand and type of the printer, not the brand and type of the toner. It is assumed that the toner does not influence the magnetic activity, although this is not very realistic.

Assumption of homoscedasticity

In chapter 4 thirty measurements are used to estimate the standard deviation of the flux measurements by the Magmouse. This is done only for the HP LaserJet P1006, which had a mean of 60.4 mT, and homoscedasticity is assumed for the other types and brands. This assumption is questionable, because the standard deviation of the flux measurements may well be smaller or bigger for a laser printer with a mean of 20 mT.

Bandwidth in kernel density estimate

The bandwidth of a kernel depends on the assumption of homoscedasticity, because homoscedasticity implies a constant bandwidth. By trial and error the bandwidth is chosen to be 0.8*dx. Even if homoscedasticity turns out to hold, it is possible to find a better approximation of the bandwidth with help of multiple measurements. This idea is outlined in chapter 11.

Performance test

It is common to do a performance test for new research methods. Morrison (2011) mentioned both validity and reliability for the likelihood-ratio framework. Due to lack of time only the validity is incorporated, which makes the analysis incomplete.


10 Conclusion

This study shows that it is possible to use magnetism to discriminate between different laser printers on a quantitative basis. This follows from comparisons between reference pages from 23 different laser printers with a black magnetic toner. For this the magnetic induction is measured using the Regula Magmouse Model 4197. The measurements are compared with help of the 1-norm in a finite vector space, the scoring method with the best validity compared to eight other scoring methods. Unfortunately, the method cannot be used for laser printers with a non-magnetic toner: the differences that appear in those measurements fall within the range of the measurement uncertainties.

The results will contribute to research regarding questioned document examination, for example the mutual comparison of documents, linking documents to a machine and determining the authenticity of documents. The likelihood-ratio framework provides a small evidential power, but some adjustments could increase it. Suggestions for that are the use of multiple samples or multiple measurements.

Note, finally, that further investigation is necessary before a cut-and-dried method is available for legal cases.


11 Recommendations

In this chapter some solutions are suggested for the items in the discussion and some extensions are put forward.

Future database

The present database is not representative. A future database, which can be used in legal cases, should be composed carefully. With help of market research the composition and sample size of the database can be determined. Because colored toners are hardly ever magnetic, the database can be limited to reference pages with black text. Reference page B remains a good option, because the complete alphabet and all numeric characters are incorporated. Furthermore it is advised to include samples from different printers of the same type and from printers of the same type with different kinds of toner.

Homoscedasticity versus heteroscedasticity

Before continuing with the assumption of homoscedasticity, one should verify it. This study presents the standard deviation for a printer with a flux of approximately 60 mT. Only when the standard deviation is the same for printers with a flux of approximately 20 mT and 40 mT can one continue with the assumption.

Bandwidth in kernel density estimate

The assumption of homoscedasticity can also be replaced by a better approximation of the bandwidth of a kernel in the kernel density estimate. A researcher could do, say, 5 measurements for each sample. One comparison between two samples would then consist of 25 comparisons between measurements, from which a mean and standard deviation can be calculated. This mean and standard deviation can then be used for that particular kernel.
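The suggested procedure can be sketched as follows; the helper name, the distance function and the numbers are all hypothetical (Python here, although the study's own code is in Matlab):

```python
import numpy as np

def kernel_parameters(meas_a, meas_b, dist):
    # compare every repeated measurement of sample A with every repeated
    # measurement of sample B (5 x 5 = 25 comparisons for five repetitions)
    # and summarise them by a mean and a corrected standard deviation,
    # to be used for this particular kernel
    scores = [dist(a, b) for a in meas_a for b in meas_b]
    return np.mean(scores), np.std(scores, ddof=1)

# hypothetical repeated flux measurements (mT) of two samples
a = [60.1, 60.4, 60.6, 60.3, 60.5]
b = [58.9, 59.2, 59.0, 59.1, 59.3]
mu_k, sigma_k = kernel_parameters(a, b, lambda u, v: abs(u - v))
```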

Performance test

A recommendation for the performance test is to use both the validity and the reliability mentioned by Morrison (2011). Because this can change which scoring method is considered best, it is recommended to re-evaluate the performance of the four best methods (1-norm in a finite vector space, 2-norm in a finite vector space, correlation score s3 and correlation score s4).

Extensions

In this study the influence of paper and time is not incorporated. It is not known whether the magnetic properties of a print decrease after some period or whether the adhesion of toner particles differs per kind of paper. These possible influences are important to investigate, because questioned document examiners can deal with older documents and different kinds of paper.


Appendix B.1 Script for measuring with the Regula Magmouse Model 4197

1. Connect the Magmouse to the computer.

(For soft-magnetic measurements the magnetic bias system should be on the Magmouse, while for hard-magnetic measurements the magnetic bias system should be removed.)

2. Open ‘CADR.exe’.

3. Click on ‘Background image’.

4. Hold the Magmouse 10 cm above the surface, click on any button of the Magmouse and wait for about 10 seconds.

5. Put the reference page beneath the Magmouse.

6. Position the Magmouse above some text or figure with help of the transparent model. Make sure that the sample is measured in the upper left corner of the measuring area.
7. Click on ‘Image input’ and wait for about 10 seconds.

8. When the text or figure is highlighted on the screen, the toner is magnetic.

When the text or figure is not highlighted on the screen, the toner is not magnetic.
9. Save the conclusion in Excel.

10. Press F4, set the size and position of the fragment and press Enter.

11. Press Ctrl+Del to cut the fragment.
12. Click on ‘Magnetization’.


14. Click on ‘Magnetic measurements’ and wait for about 30 seconds.

15. Press F4, set the size and position of the fragment again and press Enter.
16. Save the flux in Excel.

17. Save the data from the histogram in Excel.

18. Right-click on the histogram and convert it to an image.

19. Save the image in an appropriate folder.
20. Close all frames.

21. Repeat steps 3 to 20 for other texts or figures.
22. Close ‘CADR.exe’.


Appendix C.1: Loading.m

function [List,Data] = Loading(Page,Printers)
% Use this script in the study into magnetism of toner particles.
% This script will load flux and magnetic induction measurements
% from one of nine worksheets in the excel-file 'Database.xls'.
% Copyright 2013 Miriam Mieremet

disp('Loading...')
List = [];
Data = [];
switch Page
    case 1 % reference page A
        xlstab = 1;
        switch Printers
            case 1 % non-magnetic printers
                Data = xlsread('Database.xls',xlstab,'Q5:EV127');
                [N,List] = xlsread('Database.xls',xlstab,'C5:D127');
            case 2 % soft-magnetic measurements of magnetic printers
                Data = xlsread('Database.xls',xlstab+1,'Q5:EV10');
                [N,List] = xlsread('Database.xls',xlstab+1,'C5:D10');
            case 3 % hard-magnetic measurements of magnetic printers
                Data = xlsread('Database.xls',xlstab+2,'Q5:AX10');
                [N,List] = xlsread('Database.xls',xlstab+2,'C5:D10');
            otherwise
                disp('The second input is wrong.')
        end
    case 2 % reference page B
        xlstab = 4;
        switch Printers
            case 1 % non-magnetic printers
                Data = xlsread('Database.xls',xlstab,'Q5:DN140');
                [N,List] = xlsread('Database.xls',xlstab,'C5:D140');
            case 2 % soft-magnetic measurements of magnetic printers
                Data = xlsread('Database.xls',xlstab+1,'Q5:DN76');
                [N,List] = xlsread('Database.xls',xlstab+1,'C5:D76');
            case 3 % hard-magnetic measurements of magnetic printers
                Data = xlsread('Database.xls',xlstab+2,'Q5:AX76');
                [N,List] = xlsread('Database.xls',xlstab+2,'C5:D76');
            otherwise
                disp('The second input is wrong.')
        end
    case 3 % reference page C
        xlstab = 7;
        switch Printers
            case 1 % non-magnetic printers
                Data = xlsread('Database.xls',xlstab,'Q5:EV27');
                [N,List] = xlsread('Database.xls',xlstab,'C5:D27');
            case 2 % soft-magnetic measurements of magnetic printers
                disp('No such printers')
            case 3 % hard-magnetic measurements of magnetic printers
                disp('No such printers')
            otherwise
                disp('The second input is wrong.')
        end
    otherwise
        disp('The first input is wrong.')
end
disp('Completed')
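The worksheet-selection logic of Loading.m can be summarised compactly: each (Page, Printers) pair maps to one of the nine worksheets of 'Database.xls', with reference pages A, B and C starting at worksheet 1, 4 and 7 respectively. The following Python sketch (the function name is hypothetical and not part of the study's MATLAB code) illustrates this mapping:

```python
def worksheet_index(page, printers):
    """Map a (page, printers) pair to the worksheet index (xlstab)
    used by Loading.m in 'Database.xls'."""
    base = {1: 1, 2: 4, 3: 7}  # first worksheet per reference page A/B/C
    if page not in base:
        raise ValueError('The first input is wrong.')
    if printers not in (1, 2, 3):
        raise ValueError('The second input is wrong.')
    if page == 3 and printers in (2, 3):
        # reference page C has no magnetic printers
        raise ValueError('No such printers')
    return base[page] + (printers - 1)
```

For example, reference page B with hard-magnetic measurements (Page = 2, Printers = 3) corresponds to worksheet 6.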


Appendix C.2: ComparingVectorSpace.m

function [Score] = ComparingVectorSpace(Data,z)
% Use this script in the study into magnetism of toner particles.
% This script will compare the magnetic induction for each pair
% of samples with the p-norm in a finite vector space.
% Each distance measure is illustrated in the study.
% Copyright 2013 Miriam Mieremet

disp('Comparing...')

[n,m] = size(Data);
Score = zeros(n,n);

% find p-norm in a finite vector space for each pair of samples
for i = 2:n
    for j = 1:i-1
        Score(i,j) = norm(Data(i,:)-Data(j,:),z);
    end
end

disp('Completed')
return
